KR20230017554A

KR20230017554A - Method and system for evaluating quality of voice counseling

Info

Publication number: KR20230017554A
Application number: KR1020210099202A
Authority: KR
Inventors: 이건수; 김찬호; 김기원; 신종호
Original assignee: 주식회사 씨앤에이아이
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2023-02-06
Also published as: KR102583434B1

Abstract

The present invention relates to a method for effectively evaluating the quality of voice counseling. The method comprises the steps of: when voice counseling has been conducted, receiving recorded data associated with the voice counseling; generating, based on the received recorded data, text data corresponding to the voices of a counselor and a customer over time; dividing the generated text data into sentence units to generate a plurality of text sentences; generating a mel spectrogram associated with the recorded data and using the generated mel spectrogram to verify the accuracy of the generated text sentences; and evaluating the quality of the voice counseling using the verified text sentences, arranged according to the time intervals and timeline of the voice counseling.

Description

METHOD AND SYSTEM FOR EVALUATING QUALITY OF VOICE COUNSELING

The present invention relates to a method and system for evaluating the quality of voice counseling, and more particularly, to a method and system that evaluate the quality of voice counseling by converting speech into text.

With the arrival of the untact (contact-free) era, the online industry is growing ever faster. As it grows, customer management has become more important, and so has the contact center that sits at the point of contact with customers. Contact center work is broadly divided into counseling, performed by counselors, and counselor management. As part of counselor management, periodically evaluating each counselor's counseling quality is essential to customer management.

Counseling quality evaluation may be performed by training instructors. However, because instructors are far fewer than counselors, it is impossible for them to review every counselor's calls. A technique that automates the quality evaluation of voice counseling with high accuracy is therefore needed.

To solve the above problems, the present invention provides a method for evaluating the quality of voice counseling, a computer program stored in a recording medium, and a system (apparatus).

The present invention may be implemented in various ways, including as a method, a system (apparatus), or a computer program stored in a readable storage medium.

According to an embodiment of the present invention, a method for evaluating the quality of voice counseling, performed by at least one processor, includes: when voice counseling has been conducted, receiving recorded data associated with the voice counseling; generating, based on the received recorded data, text data corresponding to the voices of the counselor and the customer over time; dividing the generated text data into sentence units to generate a plurality of text sentences; extracting silent sections contained in the recorded data and verifying the accuracy of the generated text sentences based on the extracted silent sections; and evaluating the quality of the voice counseling using the verified text sentences according to the time intervals and timeline of the voice counseling.

According to an embodiment of the present invention, the step of extracting silent sections contained in the recorded data and verifying the accuracy of the text sentences based on them includes generating a mel spectrogram associated with the recorded data and extracting the silent sections using the generated mel spectrogram.
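
The silence-extraction step can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation: the HTK mel formula, the 512-sample frame, the 160-sample hop (10 ms at 16 kHz), the 40 mel bands, the -40 dB threshold relative to the loudest frame, and the 0.3 s minimum duration are all illustrative assumptions not given in the source.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def mel_spectrogram_db(signal, sr, n_fft=512, hop=160, n_mels=40):
    """Log-mel spectrogram in dB, shape (n_frames, n_mels)."""
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + 1e-10)  # small floor avoids log(0)


def silent_sections(signal, sr, hop=160, rel_db=-40.0, min_dur=0.3):
    """(start_s, end_s) spans whose loudest mel band is rel_db below the loudest frame."""
    frame_db = mel_spectrogram_db(signal, sr, hop=hop).max(axis=1)
    quiet = frame_db < frame_db.max() + rel_db
    sections, start = [], None
    for i, q in enumerate(quiet):
        if q and start is None:
            start = i
        elif not q and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(quiet)))
    # convert frame indices to seconds and drop very short gaps
    return [(s * hop / sr, e * hop / sr) for s, e in sections if (e - s) * hop / sr >= min_dur]
```

A production system would more likely use a library such as librosa for the mel spectrogram; the hand-rolled version only keeps the sketch dependency-light.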

According to an embodiment of the present invention, the step of evaluating the quality of the voice counseling includes: extracting, from the plurality of text sentences, a first set of text sentences that fall within a specific time range; dividing the extracted first set of text sentences into a plurality of morphemes; and determining whether the morphemes include a greeting keyword associated with a greeting.
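
The opening-greeting check can be sketched as follows. This is a hedged illustration: the 10-second window and the keyword set are hypothetical, and the `morphemes` helper is a plain whitespace tokenizer standing in for a real morphological analyzer, which Korean text would require.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    start: float  # seconds from the start of the call
    end: float
    text: str


# Hypothetical keywords; a real deployment would derive these from the center's script.
GREETING_KEYWORDS = frozenset({"hello", "thank", "calling"})


def morphemes(text):
    """Stand-in for a morphological analyzer: lowercase whitespace tokens."""
    return [tok.strip(".,!?").lower() for tok in text.split()]


def opening_greeting_keywords(sentences, window_s=10.0, keywords=GREETING_KEYWORDS):
    """Return the greeting keywords found in sentences starting within window_s seconds."""
    first_set = [s for s in sentences if s.start < window_s]
    found = {m for s in first_set for m in morphemes(s.text)}
    return found & keywords
```

Returning the matched keywords rather than a bare boolean lets a report show which required parts of the greeting were present.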

According to an embodiment of the present invention, the step of evaluating the quality of the voice counseling includes: dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer; and calculating the number of times at least some of the second set of text sentences overlap with at least some of the third set of text sentences.
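
One way to count overlaps, assuming each text sentence carries start and end times, is to test every counselor/customer pair for interval intersection. This is a sketch of that assumption, not the specification's exact rule; sentences that merely touch at a boundary are not counted.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    speaker: str  # "counselor" or "customer"
    start: float  # seconds
    end: float
    text: str


def count_overlaps(sentences):
    """Count counselor/customer sentence pairs whose time intervals intersect."""
    second_set = [s for s in sentences if s.speaker == "counselor"]
    third_set = [s for s in sentences if s.speaker == "customer"]
    return sum(
        1
        for a in second_set
        for b in third_set
        if a.start < b.end and b.start < a.end  # strict: touching is not overlap
    )
```

The pairwise loop is quadratic; for long calls, sorting both sets and sweeping with two pointers gives the same count in linear time after sorting.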

According to an embodiment of the present invention, the step of evaluating the quality of the voice counseling includes: extracting, based on the time intervals of the voice counseling and the timed text sentences, a section in which muting (no speech) lasts for a specific time; determining, among the text sentences, the text sentence immediately preceding the muted section; and determining whether that preceding text sentence contains a hold-request keyword.
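
A minimal sketch of the muted-section check, assuming a muted section is simply a gap of at least `min_gap` seconds between timed sentences. The 5-second threshold and the hold-request phrases are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    speaker: str
    start: float
    end: float
    text: str


# Hypothetical hold-request phrases.
HOLD_KEYWORDS = ("please hold", "one moment", "wait a moment")


def unannounced_silences(sentences, min_gap=5.0, keywords=HOLD_KEYWORDS):
    """Return (start, end) of long gaps not preceded by a hold request."""
    ordered = sorted(sentences, key=lambda s: s.start)
    if not ordered:
        return []
    flagged = []
    last = ordered[0]  # sentence that has ended latest so far
    for nxt in ordered[1:]:
        if nxt.start - last.end >= min_gap:
            before = last.text.lower()
            if not any(k in before for k in keywords):
                flagged.append((last.end, nxt.start))
        if nxt.end > last.end:
            last = nxt
    return flagged
```

Tracking the sentence with the latest end time keeps overlapping speech from producing negative or spurious gaps.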

According to an embodiment of the present invention, the step of evaluating the quality of the voice counseling includes: dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer; extracting, from the third set, a personal-information text sentence containing the customer's personal information; extracting, from the second set, the text sentence immediately following the extracted personal-information text sentence; and determining whether that following text sentence contains a keyword corresponding to the customer's personal information.
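
The repeat-back check can be sketched with a phone number as the example of personal information. The regular expression is a hypothetical pattern for Korean-style mobile numbers, and it assumes the STT output renders spoken digits as numerals.

```python
import re
from dataclasses import dataclass


@dataclass
class Sentence:
    speaker: str
    start: float
    end: float
    text: str


# Hypothetical pattern for numbers such as 010-1234-5678 or 01012345678.
PHONE = re.compile(r"\d{2,3}-?\d{3,4}-?\d{4}")


def digits_only(text):
    return re.sub(r"\D", "", text)


def check_repeat_back(sentences):
    """For each number the customer states, did the counselor's next sentence repeat it?"""
    ordered = sorted(sentences, key=lambda s: s.start)
    results = []
    for i, s in enumerate(ordered):
        if s.speaker != "customer":
            continue
        m = PHONE.search(s.text)
        if not m:
            continue
        reply = next((r for r in ordered[i + 1:] if r.speaker == "counselor"), None)
        confirmed = reply is not None and digits_only(m.group()) in digits_only(reply.text)
        results.append((m.group(), confirmed))
    return results
```

Comparing digit strings makes the check insensitive to whether either party's text includes hyphens.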

According to an embodiment of the present invention, the step of evaluating the quality of the voice counseling includes: extracting a fourth set of text sentences consisting of a specific number of the last text sentences; and determining, using a named entity recognition algorithm, whether the fourth set of text sentences contains information related to the counselor.
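
The closing check can be sketched as below. Note the substitution: instead of a named entity recognition model, this sketch assumes the counselor's name is known (for example, from a staff roster) and searches for it with a word-bounded regular expression. The `last_k=3` default is likewise an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Sentence:
    speaker: str
    start: float
    end: float
    text: str


def closing_mentions_counselor(sentences, counselor_name, last_k=3):
    """True if a counselor sentence among the last last_k sentences names the counselor."""
    ordered = sorted(sentences, key=lambda s: s.start)
    fourth_set = ordered[-last_k:]
    # Dictionary lookup stands in for NER in this sketch.
    pattern = re.compile(r"\b" + re.escape(counselor_name) + r"\b", re.IGNORECASE)
    return any(s.speaker == "counselor" and pattern.search(s.text) for s in fourth_set)
```

An NER model would also catch variants the roster lookup misses, such as a given name alone or a team name.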

According to an embodiment of the present invention, the method further includes generating a quality evaluation report by visualizing the result data of the quality evaluation of the voice counseling for each evaluation item.

A computer program stored in a computer-readable recording medium is provided for executing, on a computer, the above-described method for evaluating the quality of voice counseling according to an embodiment of the present invention.

A voice counseling quality evaluation system according to an embodiment of the present invention includes: a text conversion unit that, when voice counseling has been conducted, receives recorded data associated with the voice counseling and generates, based on the received recorded data, text data corresponding to the voices of the counselor and the customer over time; a sentence generation unit that divides the generated text data into sentence units to generate a plurality of text sentences; a sentence verification unit that extracts silent sections contained in the recorded data and verifies the accuracy of the generated text sentences based on the extracted silent sections; and a quality evaluation unit that evaluates the quality of the voice counseling using the verified text sentences according to the time intervals and timeline of the voice counseling.

In various embodiments of the present invention, the quality of voice counseling can be evaluated effectively without the user having to listen to every counseling call of every counselor.

In various embodiments of the present invention, counselors can be trained efficiently using the automatically generated quality evaluation report, without preparing separate documents for improving their counseling skills or for additional training.

In various embodiments of the present invention, the processor can easily determine whether a greeting with a predetermined structure was uttered within a specific time after the voice counseling starts.

In various embodiments of the present invention, the processor can use the text sentences to recognize sections where the counselor's and customer's voices overlap, and thereby effectively evaluate the counselor's listening skills.

In various embodiments of the present invention, the processor can extract, from the silent sections that occur during voice counseling, only those not preceded by a hold request, and evaluate counseling quality on that basis.

In various embodiments of the present invention, when the customer states personal information, the processor can easily recognize whether the counselor repeated it back to reconfirm its accuracy.

The effects of the present invention are not limited to those mentioned above; other effects not mentioned will be clearly understood from the claims by those of ordinary skill in the art to which the present invention pertains ("those skilled in the art").

BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention will be described with reference to the accompanying drawings described below, in which like reference numerals denote like elements, but the embodiments are not limited thereto.
FIG. 1 is a diagram illustrating an example in which the quality of voice counseling is evaluated according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the internal configuration of a voice counseling quality evaluation system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of generating a mel spectrogram according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an example of a method for evaluating the quality of voice counseling according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating an example of a method for evaluating the introduction according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an example of a method for evaluating listening skills according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating an example of a method for evaluating silence according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating an example of a method for evaluating information confirmation according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating an example of a method for evaluating the closing greeting according to an embodiment of the present invention.

Hereinafter, specific details for carrying out the present invention will be described with reference to the accompanying drawings. In the following description, however, detailed descriptions of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present invention.

In the accompanying drawings, identical or corresponding elements are given the same reference numerals. In the following description of the embodiments, redundant descriptions of identical or corresponding elements may be omitted. Omitting the description of an element, however, does not mean that the element is excluded from an embodiment.

The advantages and features of the disclosed embodiments, and the methods of achieving them, will become clear with reference to the embodiments described below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that the disclosure is complete and fully conveys the scope of the invention to those skilled in the art.

The terms used in this specification are briefly explained first, followed by a detailed description of the disclosed embodiments. The terms used herein have been chosen, as far as possible, from general terms currently in wide use, in consideration of their functions in the present invention; they may nevertheless vary with the intention of those working in the field, legal precedent, the emergence of new technologies, and so on. In certain cases the applicant has chosen terms arbitrarily, and their meanings are then described in detail in the corresponding part of the description. The terms used in the present invention should therefore be defined based on their meaning and on the overall content of the present invention, not merely on their names.

In this specification, singular expressions include plural ones unless the context clearly specifies the singular, and plural expressions include singular ones unless the context clearly specifies the plural. Throughout the specification, when a part is said to include a component, this means that it may further include other components rather than excluding them, unless stated otherwise.

In the present invention, terms such as "comprise" and "comprising" indicate the presence of features, steps, operations, elements, and/or components, but do not preclude the addition of one or more other features, steps, operations, elements, components, and/or combinations thereof.

In the present invention, when a specific component is referred to as being "coupled," "combined," "connected," or "reacting" to another component, it may be directly coupled, combined, connected, or reacting to the other component, but is not limited thereto; for example, one or more intermediate components may exist between them. In addition, "and/or" may include each of one or more listed items or a combination of at least some of them.

In the present invention, terms such as "first" and "second" are used to distinguish one component from another and do not limit those components. For example, a "first" component may be of the same or a similar form as a "second" component.

In the present invention, 'recorded data' is voice data recorded from a voice counseling session between a counselor and a customer, and may include information on the loudness (amplitude), pitch, and so on of the counselor's and customer's voices over time.

In the present invention, a 'spectrogram' may refer to an image that represents a sound by its principal features, including how the intensity of the sound is distributed across frequencies. A mel spectrogram may refer to a spectrogram whose frequency axis is mapped onto the mel scale, which approximates how humans perceive pitch.

FIG. 1 is a diagram illustrating an example in which the quality of voice counseling is evaluated according to an embodiment of the present invention. According to an embodiment, a counselor 110 and a customer 120 may conduct voice counseling. In this case, a voice counseling recording server 130 may record the voices of the counselor 110 and the customer 120 in real time. For example, the voice counseling recording server 130 may record the voice of the counselor 110 and the voice of the customer 120 separately, or may record the whole conversation and then separate it into the voice of the counselor 110 and the voice of the customer 120.

Recorded data obtained by the voice counseling recording server 130 may be stored and managed in a recording DB 140. For example, recorded data may be stored in the recording DB 140 separately for each counselor or for each customer, but is not limited thereto. In another example, recorded data may be stored in the recording DB 140 matched with the names of the counselor and the customer.

A voice counseling quality evaluation system 150 may receive or retrieve recorded data 142 stored in the recording DB 140, and may use the received recorded data 142 to evaluate the quality of the voice counseling conducted by the counselor 110. For example, the quality evaluation may include an introduction evaluation, a listening skill evaluation, a silence evaluation, an information confirmation evaluation, a closing greeting evaluation, and so on, without being limited thereto, and may further include additional evaluation using bonus and deduction items. For example, bonus points may be awarded if the evaluation determines that the counselor handled the call kindly, or if the counselor used cushion words (softening expressions) n or more times during the call (where n is a natural number). In another example, points may be deducted if the counselor is determined to have handled the call unkindly, or if the counselor's recording of the counseling history falls short of a specified standard.
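
The bonus and deduction rules can be sketched as a small scoring function. Everything concrete here is a hypothetical placeholder: the cushion phrases, the threshold n = 3, and the point values. The kindness judgment and the history-record check are passed in as flags because the text does not say how they are computed.

```python
# Hypothetical English cushion phrases; a Korean deployment would use its own list.
CUSHION_PHRASES = ("sorry", "excuse me", "if you don't mind", "would you mind")


def count_cushions(counselor_texts):
    """Count cushion-phrase occurrences across the counselor's sentences."""
    return sum(t.lower().count(p) for t in counselor_texts for p in CUSHION_PHRASES)


def total_score(item_scores, counselor_texts, n=3, bonus=2, penalty=5,
                unkind=False, history_ok=True):
    """Sum per-item scores, then apply hypothetical bonus/deduction rules."""
    score = sum(item_scores.values())
    if count_cushions(counselor_texts) >= n:
        score += bonus
    if unkind:
        score -= penalty
    if not history_ok:
        score -= penalty
    return score
```

Keeping bonus and deduction items separate from the per-item scores makes it easy to show them as distinct lines in the final report.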

For the quality evaluation, the voice counseling quality evaluation system 150 may generate, based on the received recorded data 142, text data corresponding to the voices of the counselor 110 and the customer 120 over time. Any STT (Speech To Text) algorithm or machine learning model may be used to convert the speech into text. The resulting text data may be divided into text data for the utterances of the counselor 110 and text data for the utterances of the customer 120. Because the text data is generated to track the speech over time, the order of the utterances of the counselor 110 and the customer 120 in the voice counseling is preserved in the text data; in other words, the text data may include time information associated with the voice counseling.

According to an embodiment, the voice counseling quality evaluation system 150 may divide the generated text data into sentence units to generate a plurality of text sentences, using any algorithm, machine learning model, or the like. For example, the system 150 may treat a portion of the text data that contains a subject and a verb as one text sentence, but is not limited thereto.

The voice counseling quality evaluation system 150 may verify the accuracy of the generated text sentences, that is, whether the text data has been correctly divided into individual sentences. Voice counseling may contain not only well-formed sentences but also irregular ones, and some of these irregular sentences may not be divided correctly. By verifying the accuracy of the initially divided text sentences, the system can correctly identify and re-divide every sentence in the text data.

According to an embodiment, the voice counseling quality evaluation system 150 may extract silent sections contained in the recorded data 142 and verify the accuracy of the generated text sentences based on the extracted silent sections. Specifically, the system 150 may generate a mel spectrogram associated with the recorded data 142, extract the silent sections using the generated mel spectrogram, and then verify the text sentences based on those silent sections. For example, if a divided text sentence reads "Well, about that (pause) it's 25,000 won," the system 150 may re-divide it at the pause into two new text sentences.
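
The re-division step can be sketched as follows, assuming the STT output provides per-word timestamps and that silent sections arrive as (start, end) pairs in seconds. A sentence is split between two words when a silent section covers at least `min_overlap` seconds of the gap between them; that threshold is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


def resplit(words, silences, min_overlap=0.2):
    """Split one sentence's words wherever a silent section fills the gap between two words."""
    if not words:
        return []
    pieces, current = [], [words[0]]
    for prev, word in zip(words, words[1:]):
        gap_start, gap_end = prev.end, word.start
        # overlap length between the inter-word gap and each silent section
        covered = any(min(gap_end, e) - max(gap_start, s) >= min_overlap for s, e in silences)
        if covered:
            pieces.append(current)
            current = []
        current.append(word)
    pieces.append(current)
    return [" ".join(w.text for w in p) for p in pieces]
```

Requiring a minimum overlap, rather than any overlap, keeps brief hesitations inside a word gap from fragmenting a sentence.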

The voice counseling quality evaluation system 150 may then evaluate the quality of the voice counseling using the verified text sentences according to the time intervals and timeline of the voice counseling. That is, the system 150 may perform the introduction evaluation, listening skill evaluation, silence evaluation, information confirmation evaluation, closing greeting evaluation, and so on, and combine the results in a defined format to generate a quality report 152. The quality report 152 may be transmitted to the user terminals of, for example, the user performing the quality evaluation and/or the counselor 110.

FIG. 2 is a functional block diagram showing the internal configuration of the voice counseling quality evaluation system 150 according to an embodiment of the present invention. As shown, the voice counseling quality evaluation system 150 may include a text conversion unit 210, a sentence generation unit 220, a sentence verification unit 230, a quality evaluation unit 240, and so on. As described above, the system 150 may communicate with the recording DB and the like to exchange the data and/or information needed for quality evaluation.

The text conversion unit 210 may receive the recorded data associated with the voice counseling and generate, based on the received recorded data 142, text data corresponding to the voices of the counselor and the customer over time. For example, the text conversion unit 210 may convert the recorded data into text data using any STT algorithm, machine learning model, or the like. After STT conversion, the text data may consist of the speech recognition results together with the time information of each result.

Evaluating the opening greeting in the introduction, the handling of the call, and the closing greeting at the end may require processing the content of the voice counseling sentence by sentence. The sentence generation unit 220 may therefore convert the text data into a sentence-level script. For example, the sentence generation unit 220 may generate the text sentences by combining, among the words in the text data, those that together form a sentence. In this case, the sentence generation unit 220 may generate each text sentence so that it contains a subject and a verb from an adjacent time range, but is not limited thereto.
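
A sentence-level script can be built from timestamped STT tokens roughly as follows. This sketch swaps in a simpler heuristic than the subject-and-verb rule described above: it closes a sentence at sentence-final punctuation or when the speaker changes. Both the `Token` structure and the punctuation rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    speaker: str  # "counselor" or "customer"
    text: str
    start: float
    end: float


@dataclass
class Sentence:
    speaker: str
    start: float
    end: float
    text: str


END_MARKS = (".", "?", "!")


def build_sentences(tokens):
    """Group time-ordered tokens into sentences (punctuation/speaker-change heuristic)."""
    sentences, buf = [], []

    def flush():
        if buf:
            sentences.append(Sentence(buf[0].speaker, buf[0].start, buf[-1].end,
                                      " ".join(t.text for t in buf)))
            buf.clear()

    for tok in sorted(tokens, key=lambda t: t.start):
        if buf and tok.speaker != buf[0].speaker:
            flush()  # a new speaker always starts a new sentence
        buf.append(tok)
        if tok.text.endswith(END_MARKS):
            flush()
    flush()
    return sentences
```

Unpunctuated fragments, such as the customer's turn in the test below, survive as one sentence until the speaker changes; the silence-based verification step is what would split them further.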

The sentence verification unit 230 may verify the plurality of text sentences generated by the sentence generation unit 220 using the recorded data 142. According to one embodiment, the counselor and the customer may converse in incomplete sentences during counseling, so the recorded data 142 may contain irregular (non-standard) sentences, at least some of which the sentence generation unit 220 may fail to delimit correctly. To address this, the sentence verification unit 230 may extract the silent sections contained in the recorded data 142 and verify the accuracy of the generated text sentences based on the extracted silent sections. In this case, the sentence verification unit 230 may generate a mel spectrogram associated with the recorded data 142, extract the silent sections using the generated mel spectrogram, and then split a text sentence into multiple distinct text sentences at the extracted silent sections.
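Once silent sections are known, splitting a generated sentence at them can be sketched as follows, assuming each word carries start/end times and each silent section is a `(start, end)` pair in seconds. A sentence is split wherever a silent section falls inside the gap between two consecutive words; the `tolerance` parameter is a hypothetical allowance for small timing mismatches between the STT timestamps and the silence detector.

```python
from typing import List, NamedTuple, Sequence, Tuple


class TimedWord(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


def split_at_silences(
    words: Sequence[TimedWord],
    silences: Sequence[Tuple[float, float]],
    tolerance: float = 0.05,
) -> List[str]:
    """Split one generated sentence wherever a detected silent section
    falls in the gap between two consecutive words."""
    if not words:
        return []
    pieces: List[List[TimedWord]] = [[words[0]]]
    for prev, nxt in zip(words, words[1:]):
        gap_start, gap_end = prev.end, nxt.start
        silence_in_gap = any(
            s >= gap_start - tolerance and e <= gap_end + tolerance
            for s, e in silences
        )
        if silence_in_gap:
            pieces.append([])  # start a new sentence after the silence
        pieces[-1].append(nxt)
    return [" ".join(w.text for w in piece) for piece in pieces]


sentence = [
    TimedWord("감사합니다", 0.0, 1.0),
    TimedWord("카드", 1.8, 2.2),
    TimedWord("분실", 2.3, 2.7),
    TimedWord("신고요", 2.8, 3.3),
]
parts = split_at_silences(sentence, silences=[(1.05, 1.75), (3.4, 5.0)])
```

Here the first silence lies between "감사합니다" and "카드", so the single generated sentence becomes two; the second silence lies after the sentence and changes nothing.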

The quality evaluation unit 240 may evaluate the quality of the voice counseling using the plurality of generated and verified text sentences. That is, the quality evaluation unit 240 may perform the quality evaluation using the verified text sentences arranged by the time sections and times of the voice counseling session. The quality evaluation may be divided into several steps according to a predetermined scheme, and any algorithm or machine learning model may be used for each step. After the quality evaluation is complete, the voice counseling quality evaluation system 150 may visualize the result data of the evaluation (e.g., data in JSON format) for each evaluation item to generate a quality evaluation report 152, and may provide the generated quality evaluation report 152 to users.
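The source mentions JSON result data but not its schema. As a hedged illustration, the sketch below serializes a pass/fail result for each evaluation item described in FIGS. 5 to 9; the field names (`call_id`, `items`, `passed_count`, `total_items`) and item keys are hypothetical.

```python
import json
from typing import Dict


def build_result_json(call_id: str, item_results: Dict[str, bool]) -> str:
    """Serialize per-item evaluation results as JSON (hypothetical schema)."""
    payload = {
        "call_id": call_id,
        "items": [
            {"item": name, "passed": passed} for name, passed in item_results.items()
        ],
        "passed_count": sum(item_results.values()),
        "total_items": len(item_results),
    }
    # ensure_ascii=False keeps any Korean text readable in the output.
    return json.dumps(payload, ensure_ascii=False, indent=2)


report_json = build_result_json(
    "call-0001",
    {
        "opening_greeting": True,
        "listening": False,
        "silence": True,
        "information_check": True,
        "closing_greeting": True,
    },
)
```

Keeping one entry per item makes it straightforward for a report generator to draw one chart or row per evaluation item.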

Although the functional components of the voice counseling quality evaluation system 150 are described separately in FIG. 2, this is only to aid understanding of the invention; a single computing device may perform two or more of these functions. With this configuration, the quality of voice counseling can be evaluated effectively without the user having to listen to every counseling call of every counselor. In addition, counselors can be trained efficiently using the automatically generated quality evaluation report 152, without preparing separate documents for improving their counseling skills or for additional training.

FIG. 3 is a diagram illustrating an example of generating a mel spectrogram 340 according to an embodiment of the present invention. As described above, the voice counseling quality evaluation system (150 in FIG. 1) may generate the mel spectrogram 340 from the recorded data 310, and the generated mel spectrogram 340 may be used to verify the accuracy of the text sentences. The mel spectrogram 340 may be generated and visualized in the form of a graph, but is not limited thereto.

According to an embodiment, the voice counseling quality evaluation system may generate a spectrum 320 from the recorded data 310. For example, the system may apply a Fourier transform to the recorded data 310 to extract the frequency information associated with it, and generate the spectrum 320 from the extracted frequency information. Specifically, the recorded data 310 may contain information on the amplitude of the voice over time, while the spectrum 320 may contain information on amplitude as a function of frequency.
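A minimal NumPy sketch of turning amplitude-over-time samples into amplitude-over-frequency, assuming the recording is available as a real-valued array with a known sample rate (8 kHz narrowband telephone audio is assumed here). A pure 440 Hz tone stands in for speech so the expected peak is known.

```python
import numpy as np


def magnitude_spectrum(signal: np.ndarray, sample_rate: int):
    """Return (frequencies in Hz, amplitudes) of a real-valued signal via the FFT."""
    amplitudes = np.abs(np.fft.rfft(signal))
    frequencies = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return frequencies, amplitudes


sample_rate = 8000                        # assumed narrowband telephone audio
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
frequencies, amplitudes = magnitude_spectrum(tone, sample_rate)
peak_hz = frequencies[np.argmax(amplitudes)]
```

`rfft` is used because audio samples are real-valued, so only the non-negative frequencies carry information.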

The voice counseling quality evaluation system may also generate a spectrogram 330 based on the spectrum 320. For example, the system may apply the Fourier transform to each time frame to extract amplitude-versus-frequency information for every point in time. The system may then convert the amplitudes to decibels and apply a logarithmic scale to the frequency axis to generate the spectrogram 330.
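The per-frame Fourier transform described above is a short-time Fourier transform (STFT). The sketch below frames the signal, applies a Hann window, takes the FFT of each frame, and converts amplitudes to decibels. The 25 ms frame and 10 ms hop at 16 kHz (`frame_length=400`, `hop_length=160`) are assumed values, not taken from the source. The logarithmic frequency axis is usually applied when plotting (for example with matplotlib's `set_yscale("log")`) and is left out here.

```python
import numpy as np


def spectrogram_db(signal, frame_length=400, hop_length=160, eps=1e-10):
    """Magnitude STFT in decibels, shape (n_frames, frame_length // 2 + 1).

    Each row is the Fourier transform of one Hann-windowed time frame.
    """
    if len(signal) < frame_length:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitudes + eps)  # amplitude -> dB; eps avoids log(0)


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 300.0 * t)
spec = spectrogram_db(tone)
```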

Thereafter, the voice counseling quality evaluation system may generate a mel spectrogram 340 from the spectrogram 330. For example, the mel spectrogram 340 may be generated by mapping the frequencies of the spectrogram 330 onto the mel scale. Here, the mel scale is a scale designed around the fact that human hearing is more sensitive to differences in the low-frequency band than in the high-frequency band; it describes the relationship between physical frequency and the frequency actually perceived by humans. As described above, the system may use the mel spectrogram 340 generated in this way to extract silent sections from the recorded data 310, and verify the accuracy of the text sentences based on the extracted silent sections. For example, if a single text sentence actually contains two or more sentences, the system may split it at a silent section into two distinct text sentences: the one before the silent section and the one after it.
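The following NumPy-only sketch shows one way to go from audio to a mel spectrogram and then to silent sections. It assumes the widely used mel formula m = 2595 log10(1 + f/700), 40 triangular mel filters, and a hypothetical silence rule: a frame is silent when its total mel energy is more than 40 dB below the loudest frame, and only runs of at least 0.3 s are kept. All thresholds are illustrative, not values from the source.

```python
import numpy as np


def hz_to_mel(f):
    # Common (HTK-style) mel formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels=40):
    """Triangular filters evenly spaced on the mel scale: (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=400, hop=160, n_mels=40):
    """Mel-band power per frame: (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return power @ mel_filterbank(sample_rate, n_fft, n_mels).T


def silent_sections(signal, sample_rate, n_fft=400, hop=160,
                    threshold_db=-40.0, min_duration=0.3, eps=1e-10):
    """Return (start_s, end_s) spans whose mel energy is threshold_db below the loudest frame."""
    mel = mel_spectrogram(signal, sample_rate, n_fft=n_fft, hop=hop)
    frame_db = 10.0 * np.log10(mel.sum(axis=1) + eps)
    is_silent = frame_db < frame_db.max() + threshold_db
    sections, run_start = [], None
    for i, silent in enumerate(np.append(is_silent, False)):  # sentinel closes a final run
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            start = run_start * hop / sample_rate
            end = ((i - 1) * hop + n_fft) / sample_rate  # end of the last silent frame
            if end - start >= min_duration:
                sections.append((start, end))
            run_start = None
    return sections


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
audio = np.concatenate([tone, np.zeros(sr), tone])  # sound, 1 s pause, sound
pauses = silent_sections(audio, sr)
```

A relative threshold (below the loudest frame) is used so that the rule does not depend on the recording's absolute gain. Real call audio contains line noise, so an actual deployment would need to tune `threshold_db` and `min_duration`.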

FIG. 4 is a flowchart illustrating an example of a method 400 for evaluating the quality of voice counseling according to an embodiment of the present invention. According to an embodiment, the method 400 may be performed by a processor (e.g., at least one processor of the voice counseling quality evaluation system). As shown, once a voice counseling session has been conducted, the method 400 may begin by receiving the recorded data associated with that session (S410).

Based on the received recorded data, the processor may generate text data corresponding to the counselor's and the customer's speech over time (S420). The processor may then divide the generated text data into sentence units to generate a plurality of text sentences (S430). In this way, the processor extracts from the recorded data the sentences spoken by the counselor and the customer who took part in the voice counseling session.

The processor may extract the silent sections contained in the recorded data and verify the accuracy of the generated text sentences based on the extracted silent sections (S440). For example, the processor may generate a mel spectrogram associated with the recorded data and extract the silent sections using the generated mel spectrogram. The processor may then evaluate the quality of the voice counseling using the verified text sentences arranged by the time sections and times of the session (S450). In this case, the processor may generate a quality evaluation report by visualizing the result data of the evaluation for each evaluation item.

FIG. 5 is a flowchart illustrating an example of an opening evaluation method 500 according to an embodiment of the present invention. According to an embodiment, the opening evaluation method 500 may be performed by a processor (e.g., at least one processor of the voice counseling quality evaluation system). As shown, the method 500 may begin with the processor extracting, from the plurality of text sentences, a first set of text sentences falling within a specific time range (S510). Here, the specific time range may be a range within a predetermined time from the start of the conversation. For example, it may be twice the average time taken to utter a greeting: if a greeting is determined to take 2 seconds on average, the specific time range may cover the first 4 seconds after the conversation starts.
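Step S510 can be sketched as a simple time filter. This assumes each sentence carries a start time in seconds and that "the start of the conversation" means the start of the first utterance (measuring from the start of the recording would be an equally valid choice). The 2-second average greeting duration and the factor of 2 mirror the example in the text; the sample sentences, including the bank name, are made up.

```python
from typing import List, NamedTuple


class Sentence(NamedTuple):
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds


def first_set(sentences: List[Sentence], avg_greeting_s: float = 2.0,
              factor: float = 2.0) -> List[Sentence]:
    """Sentences that start within factor * avg_greeting_s of the first utterance."""
    if not sentences:
        return []
    conversation_start = min(s.start for s in sentences)
    window_end = conversation_start + factor * avg_greeting_s  # e.g. 2 s * 2 = 4 s
    return [s for s in sorted(sentences, key=lambda s: s.start) if s.start < window_end]


sentences = [
    Sentence("counselor", "안녕하세요", 0.3, 1.2),
    Sentence("counselor", "한빛은행 상담사 김철수입니다", 1.4, 3.6),
    Sentence("customer", "네 카드 분실 신고하려고요", 5.0, 7.1),
]
opening = first_set(sentences)
```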

The processor may perform the opening evaluation based on whether a greeting containing a salutation, the counselor's affiliation, and/or the counselor's name was uttered within the specific time range. For the opening evaluation, the processor may divide the extracted first set of text sentences into a plurality of morphemes (S520) and determine whether the resulting morphemes include a greeting keyword (S530). Here, a greeting keyword is a keyword associated with greetings and may be a core keyword commonly used across various greetings; it is not limited thereto and may further include keywords for the counselor's predetermined affiliation, name, and the like. For example, the processor may use morphological analysis to check whether each element is present in the utterance. Because greetings may be phrased in many acceptable variations, the processor may determine that a greeting is present whenever the morphological analysis result includes a greeting keyword.

Additionally or alternatively, the processor may use a named entity recognition (NER) algorithm to determine whether the counselor's name is included. That is, the processor may detect nouns that take the form of a person's name through named entity recognition and thereby determine whether the counselor's name appears in the text sentences. In this case, the processor may perform named entity recognition using a machine learning model such as a transformer-based BERT or ELECTRA model. Additionally or alternatively, the processor may store the counselor's name in advance and determine whether the name is included based on whether a noun matching the stored name exists. With this configuration, the processor can easily determine whether a greeting with the prescribed components was uttered within a specific time after the voice counseling session starts.
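The keyword checks in S520 to S530, together with the stored-name alternative to NER, can be sketched as below. The block assumes the first set has already been split into morphemes by some Korean morphological analyzer (the morphemes shown are illustrative, not the output of any specific analyzer), and the keyword set, affiliation, and name are hypothetical. An NER model such as BERT or ELECTRA is not implemented here.

```python
from typing import Dict, Iterable, List

# Hypothetical keyword set; in practice this would be configured per contact center.
GREETING_KEYWORDS = {"안녕", "반갑"}


def has_any(morphemes: Iterable[str], keywords: Iterable[str]) -> bool:
    keyword_set = set(keywords)
    return any(m in keyword_set for m in morphemes)


def evaluate_opening(sentence_morphemes: List[List[str]], affiliation: str,
                     counselor_name: str) -> Dict[str, bool]:
    """Check each required opening element against pre-split morphemes.

    The name check matches a stored counselor name instead of running an NER model.
    """
    morphemes = [m for sentence in sentence_morphemes for m in sentence]
    return {
        "greeting": has_any(morphemes, GREETING_KEYWORDS),
        "affiliation": affiliation in morphemes,
        "name": counselor_name in morphemes,
    }


# Illustrative morphemes for the two opening sentences.
opening_morphemes = [
    ["안녕", "하", "세요"],
    ["한빛은행", "상담사", "김철수", "이", "ㅂ니다"],
]
result = evaluate_opening(opening_morphemes, affiliation="한빛은행", counselor_name="김철수")
```

Matching on morphemes rather than whole sentences is what lets varied phrasings (for example "안녕하세요" and "안녕하십니까") satisfy the same greeting keyword.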

FIG. 6 is a flowchart illustrating an example of a listening ability evaluation method 600 according to an embodiment of the present invention. According to an embodiment, the listening ability evaluation method 600 may be performed by a processor (e.g., at least one processor of the voice counseling quality evaluation system). Here, listening ability is a quantitative evaluation factor indicating how attentively the counselor listened to what the customer said, and it may be determined by detecting the number of times the counselor cut in while the customer was speaking. As shown, the method 600 may begin with the processor dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer (S610).

The processor may then evaluate the counselor's listening ability by counting the number of times at least part of the second set of text sentences overlaps at least part of the third set (S620). That is, the processor may count how many times the counselor's speech overlapped the customer's speech while the customer was already speaking. In this case, the processor may count as an overlap only those cases in which the overlap lasts at least a specific time (e.g., 0.5 seconds) and the counselor keeps speaking after the overlap. Additionally, even if the customer continues speaking after the overlap, the case may still be counted as an overlap if it exceeds another specific time (e.g., 1 second). The higher the overlap count, the lower the counselor's listening ability may be rated. With this configuration, the processor can recognize from the text sentences the sections in which the counselor's and the customer's voices overlap, and thereby evaluate the counselor's listening ability effectively.
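The counting rule in S610 to S620 can be sketched as follows, assuming each utterance has a speaker label and start/end times in seconds. This reading of the rule counts a counselor utterance that starts while a customer utterance is in progress when either (a) the overlap is at least 0.5 s and the counselor is still talking after the customer stops, or (b) the overlap exceeds 1 s regardless of who continues. The interpretation and the function name are assumptions.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    speaker: str
    start: float  # seconds
    end: float    # seconds


def count_interruptions(utterances: List[Utterance], min_overlap: float = 0.5,
                        long_overlap: float = 1.0) -> int:
    """Count counselor utterances that cut into an ongoing customer utterance."""
    counselor = [u for u in utterances if u.speaker == "counselor"]  # second set
    customer = [u for u in utterances if u.speaker == "customer"]    # third set
    count = 0
    for c in customer:
        for a in counselor:
            if not (c.start < a.start < c.end):
                continue  # the customer must already be speaking when the counselor starts
            overlap = min(c.end, a.end) - a.start
            counselor_continues = a.end > c.end
            if (overlap >= min_overlap and counselor_continues) or overlap > long_overlap:
                count += 1
    return count


utterances = [
    Utterance("customer", 0.0, 5.0),
    Utterance("counselor", 2.0, 2.3),    # short backchannel: not counted
    Utterance("counselor", 4.2, 7.0),    # 0.8 s overlap, counselor keeps talking
    Utterance("customer", 10.0, 15.0),
    Utterance("counselor", 11.0, 13.0),  # 2.0 s overlap while the customer continues
]
interruptions = count_interruptions(utterances)
```

The 0.5 s floor keeps brief acknowledgements such as "네" from being counted as interruptions.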

FIG. 7 is a flowchart illustrating an example of a silence evaluation method 700 according to an embodiment of the present invention. According to an embodiment, the silence evaluation method 700 may be performed by a processor (e.g., at least one processor of the voice counseling quality evaluation system). For example, when the counselor must consult external information during voice counseling to meet the customer's needs, the resulting mute may be acceptable, but a mute occurring in any other case may be treated as grounds for a lower counseling quality rating. Accordingly, the processor may perform the silence evaluation method 700 by detecting mute sections that were not preceded by a request to wait. As shown, the method 700 may begin by extracting, based on the text sentences arranged by the time sections and times of the voice counseling, the sections in which a mute lasted for a specific time (S710). Here, a muted section may refer to a section in which silence continued for a specific time (e.g., 5 seconds), and it may occur, for example, between particular text sentences.

The processor may identify, among the plurality of text sentences, the text sentence immediately preceding each extracted muted section (S720), and determine whether that sentence contains a wait-request keyword (S730). If it does, the muted section may be treated as an exception; if it does not, the section may be judged to be a muted section for evaluation purposes. According to an embodiment, the check for a wait-request keyword may be performed similarly to the opening evaluation method (500 in FIG. 5). For example, the processor may split the text sentence preceding the muted section into morphemes and determine whether the morphemes include a wait-request keyword, thereby performing the silence evaluation. With this configuration, the processor can effectively extract, from the silent sections occurring during voice counseling, only those not preceded by a request to wait, and use them to evaluate counseling quality.
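Steps S710 to S730 can be sketched together, assuming time-stamped sentences and a hypothetical wait-request keyword list. For brevity, substring matching on the sentence text stands in for the morpheme comparison described above. The gap is measured from the sentence that has ended latest so far, so a long utterance overlapping a shorter one does not create a false mute.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Sentence(NamedTuple):
    speaker: str
    text: str
    start: float
    end: float


# Hypothetical wait-request keywords; substring matching stands in for morpheme matching.
WAIT_KEYWORDS = ("잠시만", "기다려")


def unexcused_mutes(sentences: Sequence[Sentence], min_mute: float = 5.0,
                    wait_keywords: Sequence[str] = WAIT_KEYWORDS) -> List[Tuple[float, float]]:
    """Return (start, end) of mutes >= min_mute not preceded by a wait request."""
    flagged: List[Tuple[float, float]] = []
    previous = None  # sentence that has ended latest so far
    for s in sorted(sentences, key=lambda s: s.start):
        if previous is not None:
            gap = s.start - previous.end
            asked_to_wait = any(k in previous.text for k in wait_keywords)
            if gap >= min_mute and not asked_to_wait:
                flagged.append((previous.end, s.start))
        if previous is None or s.end > previous.end:
            previous = s
    return flagged


dialogue = [
    Sentence("counselor", "확인해 보겠습니다 잠시만 기다려 주세요", 0.0, 3.0),
    Sentence("counselor", "기다려 주셔서 감사합니다", 10.0, 12.0),  # 7 s mute, excused
    Sentence("customer", "네 그리고 하나 더요", 12.5, 14.0),
    Sentence("counselor", "네 말씀하세요", 20.0, 21.0),            # 6 s mute, flagged
]
mutes = unexcused_mutes(dialogue)
```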

FIG. 8 is a flowchart illustrating an example of an information confirmation evaluation method 800 according to an embodiment of the present invention. According to an embodiment, the information confirmation evaluation method 800 may be performed by a processor (e.g., at least one processor of the voice counseling quality evaluation system). Here, information confirmation is the process of checking, before providing the service the customer requests, whether the customer is authorized to access that service; it may refer to verifying the customer's name, contact details, account number, and the like. To confirm this information, the counselor may request it from the customer, repeat back what the customer provided, and then thank the customer. As shown, the method 800 may begin by dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer (S810).

The processor may extract, from the third set of text sentences, personal information text sentences containing the customer's personal information (S820). For example, a personal information text sentence may be a sentence containing the customer's name, contact details, account number, or the like. The processor may extract such sentences using the named entity recognition and/or morphological analysis described above, but is not limited thereto; any algorithm for extracting text that contains specific information may be used.

The processor may extract, from the second set of text sentences, the sentence immediately following each extracted personal information text sentence (S830), and determine whether that sentence contains a keyword corresponding to the customer's personal information (S840). According to one embodiment, the processor may extract keywords corresponding to the customer's personal information from the personal information text sentence, and determine whether matching keywords appear in the counselor's utterance immediately following it. Additionally, the processor may determine whether a text sentence containing an expression of thanks or the like follows the counselor's confirmation of the customer's personal information. With this configuration, when the customer states personal information, the processor can easily recognize whether the counselor repeated it back to reconfirm its accuracy.
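As a narrow illustration of S820 to S840, the sketch below treats only digit strings (such as phone or account numbers) as personal-information keywords, using a hypothetical regular expression in place of NER or morphological analysis. It assumes the STT output writes numbers as digits; numbers transcribed as Korean words would need normalization first. Names and other non-numeric information are not handled, and the thank-you check is omitted.

```python
import re
from typing import List, NamedTuple


class Sentence(NamedTuple):
    speaker: str
    text: str
    start: float


# Hypothetical personal-information pattern: digit strings such as phone or
# account numbers, optionally separated by hyphens or spaces.
DIGIT_RUN = re.compile(r"\d[\d\- ]{2,}\d")


def info_keywords(text: str) -> List[str]:
    """Extract personal-information keywords as bare digit strings."""
    return [re.sub(r"\D", "", m) for m in DIGIT_RUN.findall(text)]


def check_read_back(sentences: List[Sentence]) -> List[bool]:
    """For each customer sentence containing personal info, report whether the
    next counselor sentence repeats every extracted keyword."""
    ordered = sorted(sentences, key=lambda s: s.start)
    results: List[bool] = []
    for i, s in enumerate(ordered):
        if s.speaker != "customer":
            continue
        keywords = info_keywords(s.text)
        if not keywords:
            continue
        reply = next((r for r in ordered[i + 1:] if r.speaker == "counselor"), None)
        reply_digits = re.sub(r"\D", "", reply.text) if reply else ""
        results.append(all(k in reply_digits for k in keywords))
    return results


dialogue = [
    Sentence("customer", "제 번호는 010-1234-5678 입니다", 0.0),
    Sentence("counselor", "010 1234 5678 맞으십니까", 3.0),
    Sentence("customer", "계좌는 110-222-333333 이에요", 6.0),
    Sentence("counselor", "네 감사합니다", 9.0),
]
read_back = check_read_back(dialogue)
```

Comparing digits with separators removed lets "010-1234-5678" and a spoken read-back transcribed as "010 1234 5678" match.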

FIG. 9 is a flowchart illustrating an example of a closing greeting evaluation method 900 according to an embodiment of the present invention. According to an embodiment, the closing greeting evaluation method 900 may be performed by a processor (e.g., at least one processor of the voice counseling quality evaluation system). Here, the closing greeting given at the end of the voice counseling session may consist of a check for further inquiries, the counselor's affiliation, the counselor's name, and a farewell. That is, the processor may evaluate the closing greeting based on whether the counselor uttered, at the end of the session, a closing greeting including a check for further inquiries, the affiliation, the name, and/or a farewell. As shown, the method 900 may begin with the processor extracting a fourth set of text sentences corresponding to the last specific number (e.g., 5) of the plurality of text sentences (S910).

The processor may use a named entity recognition (NER) algorithm to determine whether the counselor's name appears in the fourth set of text sentences. That is, the processor may detect nouns that take the form of a person's name through named entity recognition and thereby determine whether the counselor's name is included in the text sentences. In this case, the processor may perform named entity recognition using a machine learning model such as a transformer-based BERT or ELECTRA model.

Additionally, the processor may divide the fourth set of text sentences into a plurality of morphemes and determine whether the resulting morphemes include a closing keyword. Here, a closing keyword may be a keyword associated with the farewell, the check for further inquiries, and the like; it is not limited thereto and may further include keywords for the counselor's predetermined affiliation, name, and the like. For example, the processor may use morphological analysis to check whether each element is present in the utterance. Because closing greetings may be phrased in many acceptable variations, the processor may determine that a closing greeting is present when the morphological analysis result includes a closing keyword.
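Steps S910 onward can be sketched as a check over the last five sentences. The keyword lists, affiliation, and name are hypothetical, and plain substring matching stands in for both the morpheme comparison and the NER step; the name check ignores spaces so that "김철수였습니다" still matches the stored name.

```python
from typing import Dict, List

# Hypothetical closing keywords per element; substring matching stands in for
# the morpheme comparison described above.
CLOSING_KEYWORDS = {
    "further_inquiry": ("궁금하신", "문의"),
    "farewell": ("감사합니다", "좋은 하루"),
}


def evaluate_closing(sentences: List[str], affiliation: str, counselor_name: str,
                     last_n: int = 5) -> Dict[str, bool]:
    """Check the last `last_n` sentences (the fourth set) for each closing element."""
    tail = " ".join(sentences[-last_n:])
    result = {item: any(k in tail for k in keywords)
              for item, keywords in CLOSING_KEYWORDS.items()}
    result["affiliation"] = affiliation in tail
    result["name"] = counselor_name in tail.replace(" ", "")
    return result


transcript_sentences = [
    "카드 분실 신고하려고요",
    "네 정지 처리 도와드렸습니다",
    "더 궁금하신 점 있으신가요",
    "아니요 없어요",
    "한빛은행 상담사 김철수였습니다",
    "감사합니다 좋은 하루 되세요",
]
closing = evaluate_closing(transcript_sentences, affiliation="한빛은행", counselor_name="김철수")
```

Reporting each element separately, rather than a single pass/fail, matches the per-item visualization described for the quality evaluation report.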

The methods and/or various embodiments described above may be realized with digital electronic circuitry, computer hardware, firmware, software, and/or combinations thereof. Various embodiments of the present invention may be executed by a data processing apparatus, e.g., one or more programmable processors and/or one or more computing devices, or may be implemented as a computer-readable recording medium and/or a computer program stored on a computer-readable recording medium. Such a computer program may be written in any form of programming language, including compiled or interpreted languages, and may be distributed in any form, such as a stand-alone program, module, or subroutine. A computer program may be deployed on one computing device, on multiple computing devices connected through the same network, and/or on multiple computing devices distributed across multiple different networks.

The methods and/or various embodiments described above may be performed by one or more processors configured to execute one or more computer programs that process, store, and/or manage any function or the like by operating on input data or generating output data. For example, the methods and/or various embodiments of the present invention may be performed by special-purpose logic circuitry such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), and an apparatus and/or system for performing them may be implemented as such special-purpose logic circuitry.

The one or more processors that execute the computer program may include general-purpose or special-purpose microprocessors and/or one or more processors of any kind of digital computing device. A processor may receive instructions and/or data from a read-only memory, a random access memory, or both. In the present invention, the components of a computing device performing the methods and/or embodiments may include one or more processors for executing instructions and one or more memory devices for storing instructions and/or data.

According to one embodiment, a computing device may exchange data with one or more mass storage devices for storing data. For example, a computing device may receive data from, and/or transfer data to, a magnetic disk or an optical disc. Computer-readable storage media suitable for storing instructions and/or data associated with a computer program may include, but are not limited to, any form of non-volatile memory, including semiconductor memory devices such as erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), and flash memory devices. For example, computer-readable storage media may include magnetic disks such as internal hard disks or removable disks, magneto-optical disks, and CD-ROM and DVD-ROM discs.

To provide interaction with a user, the computing device may include, without limitation, a display device for presenting information to the user (e.g., a cathode ray tube (CRT) or a liquid crystal display (LCD)) and a pointing device or other input device through which the user can provide input and/or commands to the computing device (e.g., a keyboard, mouse, or trackball). That is, the computing device may further include any other type of device for providing interaction with a user. For example, the computing device may provide the user with any form of sensory feedback, including visual, auditory, and/or tactile feedback. Conversely, the user may provide input to the computing device through various modalities, such as visual, voice, or motion gestures.

In the present invention, the various embodiments may be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), and/or a front-end component. In this case, the components may be interconnected by any form or medium of digital data communication, such as a communication network. For example, the communication network may include a local area network (LAN), a wide area network (WAN), and the like.

A computing device according to the example embodiments described herein may be implemented using hardware and/or software configured to interact with a user, including a user device, a user interface (UI) device, a user terminal, or a client device. For example, the computing device may include a portable computing device such as a laptop computer. Additionally or alternatively, the computing device may include, without limitation, a personal digital assistant (PDA), a tablet PC, a game console, a wearable device, an internet of things (IoT) device, a virtual reality (VR) device, an augmented reality (AR) device, and the like. The computing device may further include other types of devices configured to interact with a user. The computing device may also include a portable communication device suitable for wireless communication over a network such as a mobile communication network (e.g., a mobile phone, smartphone, or wireless cellular phone). The computing device may be configured to communicate wirelessly with a network server using wireless communication technologies and/or protocols such as radio frequency (RF), microwave frequency (MWF), and/or infrared ray frequency (IRF).

The various embodiments described herein, including their specific structural and functional details, are exemplary. Accordingly, the embodiments of the present invention are not limited to those described above and may be implemented in many different forms. The terms used in the present invention are intended to describe certain embodiments and are not to be construed as limiting them. For example, singular forms and the terms "the" and "said" may be construed to include the plural as well, unless the context clearly indicates otherwise.

In the present invention, unless otherwise defined, all terms used in this specification, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the concept belongs. Commonly used terms, such as those defined in dictionaries, should be interpreted as having meanings consistent with their meanings in the context of the related art.

Although the present invention has been described herein in connection with some embodiments, various modifications and changes that would be understood by those skilled in the art to which the invention pertains can be made without departing from the scope of the present invention. Such modifications and changes should be regarded as falling within the scope of the claims appended hereto.

110: Counselor 120: Customer
130: Voice consultation recording server 140: Recording DB
142: Recorded data 150: Voice consultation quality evaluation system
152: Evaluation report

Claims

1. A method for evaluating the quality of a voice consultation, performed by at least one processor, the method comprising:
receiving, when a voice consultation is performed, recorded data associated with the voice consultation;
generating, based on the received recorded data, text data corresponding to the voices of a counselor and a customer over time;
generating a plurality of text sentences by dividing the generated text data into sentence units;
extracting a silent section included in the recorded data and verifying the accuracy of the plurality of generated text sentences based on the extracted silent section; and
performing a quality evaluation of the voice consultation using the time interval of the voice consultation and the plurality of verified text sentences arranged over time.
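The claim does not say how the silent section is used to verify sentence accuracy. One plausible reading, sketched here purely as an assumption, is that a correctly segmented sentence should end at or near a pause in the audio, so a sentence boundary with no nearby silence is flagged as a likely mis-split. The `Sentence` record, the `verify_sentences` name, and the 0.3-second tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sentence:
    speaker: str  # hypothetical labels: "counselor" or "customer"
    start: float  # seconds from the start of the recording
    end: float
    text: str


def verify_sentences(
    sentences: List[Sentence],
    silences: List[Tuple[float, float]],
    tolerance: float = 0.3,
) -> List[Tuple[Sentence, bool]]:
    """Pair each sentence with True if its end boundary is backed by a silent section.

    Assumed rule: a boundary is trusted when the sentence's end time lies inside a
    silent section widened by `tolerance` seconds on each side. The final sentence
    is always accepted because the recording itself ends there.
    """
    results = []
    for i, s in enumerate(sentences):
        if i == len(sentences) - 1:
            results.append((s, True))
            continue
        backed = any(lo - tolerance <= s.end <= hi + tolerance for lo, hi in silences)
        results.append((s, backed))
    return results


# Example: the customer's request was split mid-utterance at 4.0 s, where there is no pause.
sents = [
    Sentence("counselor", 0.0, 2.1, "Hello, this is the customer center."),
    Sentence("customer", 2.6, 4.0, "I would like to check my bill"),
    Sentence("customer", 4.0, 5.5, "from last month."),
]
flags = [ok for _, ok in verify_sentences(sents, silences=[(2.1, 2.6)])]
print(flags)  # [True, False, True]
```

A sentence flagged `False` could, for example, be merged with the following sentence or routed to manual review; the claim does not specify the follow-up action.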

2. The method of claim 1, wherein extracting the silent section included in the recorded data and verifying the accuracy of the plurality of generated text sentences based on the extracted silent section comprises:
generating a mel spectrogram associated with the recorded data and extracting the silent section using the generated mel spectrogram.
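A minimal sketch of the silence-extraction step, assuming the log-mel spectrogram has already been computed (for example with `librosa.feature.melspectrogram` followed by `librosa.power_to_db`) and is passed in as a list of frames, each a list of per-band dB values. A frame is treated as silent when even its loudest mel band is below a threshold, and only runs of silent frames lasting at least `min_duration` seconds are reported. The function name, the -60 dB threshold, and the minimum duration are illustrative assumptions, not values from the source.

```python
from typing import List, Sequence, Tuple


def silent_sections(
    mel_db: Sequence[Sequence[float]],
    hop_seconds: float,
    threshold_db: float = -60.0,
    min_duration: float = 0.2,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) for each run of silent frames in a log-mel spectrogram."""
    sections = []
    run_start = None
    # A trailing None acts as a non-silent sentinel so a run at the very end is closed.
    for i, frame in enumerate(list(mel_db) + [None]):
        silent = frame is not None and max(frame) < threshold_db
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            start, end = run_start * hop_seconds, i * hop_seconds
            if end - start >= min_duration:
                sections.append((start, end))
            run_start = None
    return sections


loud, quiet = [-20.0, -30.0], [-75.0, -80.0]
frames = [loud, quiet, quiet, quiet, loud, quiet, loud, quiet, quiet]
print(silent_sections(frames, hop_seconds=0.25, min_duration=0.5))
# [(0.25, 1.0), (1.75, 2.25)]
```

The single silent frame at index 5 lasts only 0.25 s and is dropped, which keeps brief inter-word gaps from being mistaken for sentence-level pauses.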

3. The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
extracting, from among the plurality of text sentences, a first set of text sentences falling within a specific time range;
dividing the extracted first set of text sentences into a plurality of morphemes; and
determining whether the plurality of morphemes includes a greeting keyword associated with a greeting.
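The greeting check can be sketched as follows. For Korean transcripts the morpheme split would come from a morphological analyzer (for instance `Okt().morphs(text)` from KoNLPy); to keep the sketch self-contained, a lowercase word tokenizer stands in for it and the transcript is in English. The 10-second opening window, the keyword list, and all function names are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Sentence:
    speaker: str
    start: float  # seconds
    end: float
    text: str


GREETING_KEYWORDS = {"hello", "welcome"}  # hypothetical keyword dictionary


def simple_morphs(text: str) -> List[str]:
    """Stand-in for a morphological analyzer: lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def has_greeting(
    sentences: Iterable[Sentence],
    window: Tuple[float, float] = (0.0, 10.0),
    keywords: set = GREETING_KEYWORDS,
    morphs: Callable[[str], List[str]] = simple_morphs,
) -> bool:
    lo, hi = window
    # First set: sentences that start inside the opening time window.
    first_set = [s for s in sentences if lo <= s.start < hi]
    # Split into morphemes (here: word tokens) and look for any greeting keyword.
    return any(m in keywords for s in first_set for m in morphs(s.text))


opening = [
    Sentence("counselor", 0.5, 3.0, "Hello, thank you for calling."),
    Sentence("customer", 3.5, 5.0, "I have a question about my order."),
]
print(has_greeting(opening))  # True
```

Passing the analyzer in as `morphs` keeps the time-window logic independent of the language-specific tokenizer.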

4. The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer; and
calculating the number of times at least some of the second set of text sentences overlap at least some of the third set of text sentences.
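Counting overlaps reduces to an interval-intersection test between the time spans of the counselor's and the customer's sentences. The sketch below, using a hypothetical `Sentence` record and speaker labels, counts each overlapping counselor/customer pair once. The quadratic loop is adequate for a single call; a sort-and-sweep would scale better for long recordings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    speaker: str
    start: float  # seconds
    end: float
    text: str


def count_overlaps(
    sentences: List[Sentence], counselor: str = "counselor", customer: str = "customer"
) -> int:
    second_set = [s for s in sentences if s.speaker == counselor]
    third_set = [s for s in sentences if s.speaker == customer]
    count = 0
    for a in second_set:
        for b in third_set:
            # [a.start, a.end) and [b.start, b.end) intersect iff each starts before the other ends.
            if a.start < b.end and b.start < a.end:
                count += 1
    return count


call = [
    Sentence("counselor", 0.0, 3.0, "Let me explain the plan."),
    Sentence("customer", 2.5, 4.0, "Wait, one question."),  # talks over the counselor
    Sentence("counselor", 4.0, 6.0, "Sure, go ahead."),
    Sentence("customer", 6.0, 7.0, "Thanks."),  # touches but does not overlap
    Sentence("counselor", 7.5, 9.0, "As I was saying"),
    Sentence("customer", 8.0, 8.5, "Right."),
]
print(count_overlaps(call))  # 2
```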

5. The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
extracting, based on the time interval of the voice consultation and the plurality of text sentences over time, a section in which a mute lasting a specific time occurred;
determining, from among the plurality of text sentences, the text sentence preceding the muted section; and
determining whether the determined text sentence preceding the muted section includes a hold request keyword.
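One way to implement the mute check, sketched under assumptions: a mute is any gap of at least `min_mute` seconds during which no sentence is being spoken, the sentence preceding it is the one that ended last before the gap, and the hold-request keywords are a hypothetical English list. Tracking the latest end time keeps overlapping speech from producing false gaps.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sentence:
    speaker: str
    start: float  # seconds
    end: float
    text: str


HOLD_KEYWORDS = ("please hold", "one moment", "wait a moment")  # hypothetical


def check_holds(
    sentences: List[Sentence],
    min_mute: float = 5.0,
    keywords: Sequence[str] = HOLD_KEYWORDS,
) -> List[Tuple[float, float, bool]]:
    """Return (mute_start, mute_end, announced) for each mute of at least min_mute seconds."""
    ordered = sorted(sentences, key=lambda s: s.start)
    results = []
    if not ordered:
        return results
    latest = ordered[0]  # sentence with the latest end time seen so far
    for nxt in ordered[1:]:
        if nxt.start - latest.end >= min_mute:
            # Was the mute announced by a hold request just before it?
            announced = any(k in latest.text.lower() for k in keywords)
            results.append((latest.end, nxt.start, announced))
        if nxt.end > latest.end:
            latest = nxt
    return results


call = [
    Sentence("counselor", 0.0, 3.0, "Let me check that. Please hold for a moment."),
    Sentence("counselor", 10.0, 12.0, "Thank you for waiting."),
    Sentence("customer", 12.5, 13.0, "OK."),
    Sentence("counselor", 20.0, 21.0, "Sorry, I found the record."),
]
print(check_holds(call))  # [(3.0, 10.0, True), (13.0, 20.0, False)]
```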

6. The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
dividing the plurality of text sentences into a second set of text sentences associated with the counselor and a third set of text sentences associated with the customer;
extracting, from the third set of text sentences, a personal information text sentence including personal information of the customer;
extracting, from the second set of text sentences, the text sentence immediately following the extracted personal information text sentence; and
determining whether the text sentence immediately following the extracted personal information text sentence includes a keyword corresponding to the personal information of the customer.
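A sketch of the personal-information check, limited for illustration to Korean mobile phone numbers matched with a regular expression. The claim does not say whether a counselor reply containing the customer's information counts as a confirmation (desirable) or a privacy lapse (reading it back aloud). The function therefore only reports whether the reply contains the information and leaves the scoring policy open. The pattern, the function name, and the `Sentence` record are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sentence:
    speaker: str
    start: float  # seconds
    end: float
    text: str


# Illustrative pattern: Korean mobile numbers such as 010-1234-5678 (hyphens optional).
PHONE = re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b")


def check_pii_replies(
    sentences: List[Sentence], pattern: re.Pattern = PHONE
) -> List[Tuple[Sentence, Optional[Sentence], bool]]:
    """For each customer sentence containing personal information, find the counselor's
    next sentence and report whether it contains the same value."""
    ordered = sorted(sentences, key=lambda s: s.start)
    findings = []
    for i, s in enumerate(ordered):
        match = pattern.search(s.text) if s.speaker == "customer" else None
        if match is None:
            continue
        reply = next((r for r in ordered[i + 1:] if r.speaker == "counselor"), None)
        value = re.sub(r"\D", "", match.group())  # compare digits only
        contains = reply is not None and value in re.sub(r"\D", "", reply.text)
        findings.append((s, reply, contains))
    return findings


call = [
    Sentence("customer", 0.0, 3.0, "My number is 010-1234-5678."),
    Sentence("counselor", 3.5, 6.0, "Confirming 010-1234-5678, is that right?"),
    Sentence("customer", 7.0, 9.0, "You can also reach me at 010-9876-5432."),
    Sentence("counselor", 9.5, 11.0, "Thank you, noted."),
]
print([contains for _, _, contains in check_pii_replies(call)])  # [True, False]
```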

7. The method of claim 1, wherein performing the quality evaluation of the voice consultation comprises:
extracting a fourth set of text sentences consisting of a specific number of final text sentences among the plurality of text sentences; and
determining, using a named entity recognition (NER) algorithm, whether the fourth set of text sentences includes information associated with the counselor.
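The closing check can be sketched with the NER model passed in as a function that returns `(text, label)` pairs. With spaCy, for example, such a function could return `[(e.text, e.label_) for e in nlp(text).ents]`, although label names differ between models. To stay self-contained, the sketch uses a toy gazetteer recognizer that tags known names as `PERSON`; that recognizer, the `last_n=3` default, and the other names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sentence:
    speaker: str
    start: float  # seconds
    end: float
    text: str


Entity = Tuple[str, str]  # (entity text, entity label)


def gazetteer_ner(names: List[str]) -> Callable[[str], List[Entity]]:
    """Toy NER stand-in: tags every known name found in the text as PERSON."""
    def recognize(text: str) -> List[Entity]:
        return [(name, "PERSON") for name in names if name in text]
    return recognize


def closing_mentions_counselor(
    sentences: List[Sentence],
    counselor_name: str,
    ner: Callable[[str], List[Entity]],
    last_n: int = 3,
) -> bool:
    if last_n <= 0:
        return False
    # Fourth set: the last `last_n` sentences of the call, in time order.
    fourth_set = sorted(sentences, key=lambda s: s.start)[-last_n:]
    return any(
        label == "PERSON" and text == counselor_name
        for s in fourth_set
        for text, label in ner(s.text)
    )


ner = gazetteer_ner(["Kim Minji", "Lee Jisoo"])
call = [
    Sentence("counselor", 0.0, 2.0, "Hello, this is Kim Minji."),
    Sentence("customer", 2.5, 4.0, "Hi, I need help with a refund."),
    Sentence("counselor", 4.5, 8.0, "The refund has been processed."),
    Sentence("customer", 8.5, 9.0, "Great, thanks."),
    Sentence("counselor", 9.5, 12.0, "This was counselor Kim Minji. Have a nice day."),
]
print(closing_mentions_counselor(call, "Kim Minji", ner))  # True
```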

8. The method of claim 1, further comprising:
generating a quality evaluation report by visualizing the result data of the quality evaluation of the voice consultation for each quality evaluation item.
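The claim leaves the visualization format open. As a minimal, dependency-free sketch, per-item results normalized to the range 0 to 1 can be rendered as a fixed-width text bar chart; a production report would more likely use a charting library. The item names and the `render_report` function are illustrative.

```python
from typing import Dict


def render_report(scores: Dict[str, float], width: int = 20) -> str:
    """Render {evaluation item: score in [0, 1]} as one text bar per item."""
    label_w = max((len(item) for item in scores), default=0)
    lines = []
    for item, score in scores.items():
        score = min(max(float(score), 0.0), 1.0)  # clamp; booleans become 0.0 or 1.0
        filled = round(score * width)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{item:<{label_w}} |{bar}| {score:.0%}")
    return "\n".join(lines)


report = render_report({"Greeting": True, "Hold notice": 0.5, "Closing": 0.0}, width=10)
print(report)
# Greeting    |##########| 100%
# Hold notice |#####.....| 50%
# Closing     |..........| 0%
```

Pass/fail items such as the greeting check can be passed as booleans, while count-based items such as overlaps would first need a normalization rule, which the source does not define.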

9. A computer program stored in a computer-readable recording medium for causing a computer to execute the method for evaluating the quality of a voice consultation according to any one of claims 1 to 8.

10. A voice consultation quality evaluation system, comprising:
a text conversion unit configured to, when a voice consultation is performed, receive recorded data associated with the voice consultation and generate, based on the received recorded data, text data corresponding to the voices of a counselor and a customer over time;
a sentence generation unit configured to generate a plurality of text sentences by dividing the generated text data into sentence units;
a sentence verification unit configured to extract a silent section included in the recorded data and verify the accuracy of the plurality of generated text sentences based on the extracted silent section; and
a quality evaluation unit configured to perform a quality evaluation of the voice consultation using the time interval of the voice consultation and the plurality of verified text sentences arranged over time.
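The system's sentence generation unit can be sketched as follows. The sketch assumes the text conversion unit emits word-level results with speaker labels and timestamps; many speech-to-text engines can report word timings, but the exact format here is hypothetical. Words are grouped into a sentence until sentence-final punctuation appears or the speaker changes. The resulting sentence start and end times are what the verification and evaluation units would consume.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    speaker: str
    start: float  # seconds
    end: float
    text: str


@dataclass
class Sentence:
    speaker: str
    start: float
    end: float
    text: str


SENTENCE_END = (".", "?", "!")


def split_sentences(words: List[Word]) -> List[Sentence]:
    """Hypothetical sentence generation unit: group word-level STT output into sentences.

    A sentence ends at sentence-final punctuation or when the speaker changes.
    """
    sentences: List[Sentence] = []
    buf: List[Word] = []

    def flush() -> None:
        if buf:
            text = " ".join(w.text for w in buf)
            sentences.append(Sentence(buf[0].speaker, buf[0].start, buf[-1].end, text))
            buf.clear()

    for w in sorted(words, key=lambda w: w.start):
        if buf and w.speaker != buf[0].speaker:
            flush()  # speaker change closes the current sentence
        buf.append(w)
        if w.text.endswith(SENTENCE_END):
            flush()
    flush()  # emit any trailing words without final punctuation
    return sentences


words = [
    Word("counselor", 0.0, 0.4, "Hello,"),
    Word("counselor", 0.5, 0.9, "support"),
    Word("counselor", 1.0, 1.3, "center."),
    Word("customer", 1.8, 2.0, "Hi"),
    Word("customer", 2.1, 2.5, "there"),
    Word("counselor", 3.0, 3.4, "Yes?"),
]
for s in split_sentences(words):
    print(s.speaker, s.start, s.end, s.text)
```

Korean transcripts often lack reliable punctuation from STT, in which case the silent-section verification step becomes the main safeguard against mis-split sentences.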