WO2014069121A1 - Conversation analysis device and conversation analysis method - Google Patents

Conversation analysis device and conversation analysis method

Info

Publication number
WO2014069121A1
WO2014069121A1 (PCT/JP2013/075243)
Authority
WO
WIPO (PCT)
Prior art keywords
data
conversation
expression data
expression
apology
Prior art date
Application number
PCT/JP2013/075243
Other languages
French (fr)
Japanese (ja)
Inventor
真 寺尾
祥史 大西
真宏 谷
岡部 浩司
Original Assignee
日本電気株式会社
Priority date
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to JP2014544379A priority Critical patent/JP6365304B2/en
Publication of WO2014069121A1 publication Critical patent/WO2014069121A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/42 Systems providing special services or facilities to subscribers
    • H04M 3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M 3/51 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M 2201/40 Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M 2203/20 Aspects of automatic or semi-automatic exchanges related to features of supplementary services
    • H04M 2203/2038 Call context notifications

Definitions

  • the present invention relates to a conversation analysis technique.
  • An example of a technology for analyzing conversation is a technology for analyzing call data.
  • data of a call performed in a department called a call center or a contact center is analyzed.
  • Hereinafter, such a department, which specializes in the business of responding to customer calls such as inquiries, complaints, and orders regarding products and services, is referred to as a contact center.
  • In Patent Document 1, it is determined whether or not a keyword issued at the time of a complaint is included in a call by performing voice recognition on the contents of the call between the customer and the operator, and a method of judging the customer's CS (customer satisfaction) level from the determination result is proposed.
  • However, with such a method, the degree of satisfaction or dissatisfaction of the persons who participate in the conversation (hereinafter referred to as "conversation participants"), that is, the degree of satisfaction or dissatisfaction of the customer, may not be determined appropriately.
  • This is because even expressions (keywords) that can express satisfaction may be uttered regardless of satisfaction. For example, the thank-you expression "Thank you" can express satisfaction. However, it does not necessarily indicate satisfaction when used in a dialogue such as the following. Operator: "If that is the symptom, please first restart the PC, and ..." Customer: "Thank you. However, I have already tried that."
  • In addition, in the voice recognition used in such a method, misrecognitions such as insertion errors and omission errors may occur. Due to such misrecognition, an expression that is not actually uttered in the conversation (call) may be recognized, or an expression that is actually uttered in the conversation may fail to be recognized. As a result, the keyword to be extracted is erroneously detected or dropped, and the estimation accuracy of customer satisfaction or dissatisfaction based on that keyword decreases.
  • the present invention has been made in view of such circumstances, and provides a technique for accurately estimating the degree of satisfaction or dissatisfaction of a conversation participant.
  • the degree of satisfaction or dissatisfaction of a conversation participant means the degree of satisfaction or dissatisfaction that at least one conversation participant felt in the conversation.
  • The degree of satisfaction also includes merely indicating the presence or absence of satisfaction, and the degree of dissatisfaction also includes merely indicating the presence or absence of dissatisfaction.
  • the first aspect relates to a conversation analysis device.
  • The conversation analysis device according to the first aspect includes an expression detection unit that detects, from data corresponding to the voice of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
  • the second aspect relates to a conversation analysis method executed by at least one computer.
  • The conversation analysis method according to the second aspect includes detecting, from data corresponding to the voice of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and estimating the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
  • Another aspect of the present invention may be a program that causes at least one computer to implement each configuration of the first aspect, or a computer-readable recording medium on which such a program is recorded.
  • This recording medium includes a non-transitory tangible medium.
  • In the present embodiment, the conversation analysis apparatus includes an expression detection unit that detects, from data corresponding to the voice of only the closing section of the conversation between the first conversation participant and the second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
  • The conversation analysis method of the present embodiment is executed by at least one computer and includes detecting, from data corresponding to the voice of only the closing section of the conversation between the first conversation participant and the second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and estimating the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
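  • As a purely illustrative rendering of these two units, the following Python sketch shows how an expression detection unit restricted to the closing section and a threshold-based estimation unit could fit together. All class names, field names, and the threshold value are hypothetical assumptions made for illustration; they are not part of the specification.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RecognizedWord:
    """One word of voice text data with its utterance time (seconds)."""
    speaker: str      # "customer" (first conversation participant) or "operator"
    text: str
    time: float

class ExpressionDetector:
    """Detects thank-you / apology expressions in the closing section only."""
    def __init__(self, thanks: set, apologies: set):
        self.thanks = thanks
        self.apologies = apologies

    def detect(self, words: List[RecognizedWord], closing_start: float,
               closing_end: float) -> dict:
        counts = {"thanks": 0, "apology": 0}
        for w in words:
            if not (closing_start <= w.time <= closing_end):
                continue  # only the closing section is examined
            if w.speaker == "customer" and w.text in self.thanks:
                counts["thanks"] += 1
            if w.speaker == "operator" and w.text in self.apologies:
                counts["apology"] += 1
        return counts

class Estimator:
    """Estimates satisfaction / dissatisfaction from the detection result."""
    def estimate(self, counts: dict, threshold: int = 1) -> Optional[str]:
        if counts["thanks"] >= threshold:
            return "satisfied"
        if counts["apology"] >= threshold:
            return "dissatisfied"
        return None  # neutral / undetermined
```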
  • Here, a conversation means an exchange in which two or more speakers express their intentions to one another by uttering language.
  • Conversations include those in which the conversation participants talk directly, such as at a bank counter or a store cash register, and remote conversations such as telephone calls and video conferences.
  • the content and form of the target conversation are not limited, but a public conversation is more preferable as the target conversation than a private conversation such as a conversation between friends.
  • the above-mentioned thank-you expression data, apology expression data, and specific expression data are a word, a word string that is a sequence of a plurality of words, or a set of words scattered in a certain utterance in a conversation.
  • the thank-you expression data and the thank-you expression, the apology expression data and the apology expression, the specific expression data and the specific expression may be used without distinction.
  • For the thank-you expression data, for example, the single word "Thank you", a word string of consecutive words such as "Thank you", "Yes", and "Masu" (a segmentation of a polite Japanese thank-you phrase), and a word set such as "Truly" and "Thank you" can be included. Similarly, the apology expression data may include the single word "Sorry", a word string such as "Sorry", "present", "no", "n" (a segmentation of a polite Japanese apology phrase), and the like.
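  • The three forms of specific expression data described above (a single word, a consecutive word string, and a set of words scattered within one utterance) can be matched against word-segmented voice text data as in the following minimal sketch. The table entries are English stand-ins for the Japanese examples and are assumptions made only for illustration.

```python
def utterance_matches(utterance_words, expression):
    """Return True if `expression` (a str = single word, a tuple = consecutive
    word string, or a frozenset = words scattered anywhere in the utterance)
    occurs in the word-segmented utterance."""
    if isinstance(expression, str):                      # single word
        return expression in utterance_words
    if isinstance(expression, tuple):                    # consecutive word string
        n = len(expression)
        return any(tuple(utterance_words[i:i + n]) == expression
                   for i in range(len(utterance_words) - n + 1))
    if isinstance(expression, frozenset):                # scattered word set
        return expression.issubset(set(utterance_words))
    return False

# Illustrative specific expression table (English stand-ins).
THANKS_TABLE = ["thanks", ("thank", "you"), frozenset({"truly", "thank"})]

words = ["truly", "i", "thank", "you"]
print(any(utterance_matches(words, e) for e in THANKS_TABLE))  # True
```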
  • Conversation participants often express gratitude when they are satisfied with the conversation.
  • Conversely, a conversation participant often utters an apology when he or she senses that the conversation partner is dissatisfied.
  • However, an apology may also be uttered regardless of any dissatisfaction of the conversation partner. For example, a typical apology expression such as "I'm sorry, but please wait for a while" may be uttered; in this case, the conversation participant expresses an apology regardless of the conversation partner's emotion.
  • The present inventors have found that the emotions of conversation participants concerning the conversation as a whole, in particular satisfaction and dissatisfaction, tend to appear in the conversation-ending process, and further that thank-you and apology expressions uttered during the conversation-ending process are particularly likely to express those emotions.
  • Therefore, the present embodiment introduces the concept of a closing section, which means the conversation-ending process, and detects, from the data corresponding to the voice of only this closing section, specific expression data representing at least one of the thanks uttered by the first conversation participant and the apology uttered by the second conversation participant.
  • The end time of the closing section is set to the conversation end time.
  • the end of the conversation is represented, for example, by disconnecting the call in the case of a call, and by the dissolution of the conversation participants in the case of a conversation other than a call.
  • When a conversation is terminated by a specific sudden cause, such as circumstances beyond a conversation participant's control, the conversation may have no closing section.
  • In addition, noise information resulting from voice recognition errors on speech outside the closing section can also be excluded from the material used to estimate the satisfaction or dissatisfaction of the first conversation participant. Specifically, even if a thank-you or apology expression that was not actually uttered by a conversation participant is misrecognized outside the closing section, that misrecognized thank-you or apology expression is excluded from the estimation material.
  • Thus, the satisfaction or dissatisfaction level of the first conversation participant is estimated using only specific expression data that is likely to represent the satisfaction or dissatisfaction of the conversation participant. Therefore, according to the present embodiment, the satisfaction or dissatisfaction of the first conversation participant can be estimated with high accuracy from specific expression data of high purity, which excludes both specific expressions that do not reflect the satisfaction or dissatisfaction of the first conversation participant and noise data caused by voice recognition errors.
  • The conversation analysis apparatus and conversation analysis method described above are not limited to application to a contact center system that handles call data, and can be applied to various forms that handle conversation data. For example, they can also be applied to in-house call management systems other than contact centers, and to personal terminals such as PCs (Personal Computers), fixed telephones, mobile phones, tablet terminals, and smartphones owned by individuals.
  • Examples of conversation data include data representing a conversation between a person in charge and a customer at a bank counter or a store cash register.
  • A call handled in each embodiment refers to the speech exchanged between the call terminals of two callers from the time the call is connected until the time it is disconnected.
  • a continuous area in which a single caller is speaking in a call voice is referred to as an utterance or an utterance section.
  • For example, an utterance section is detected as a section in which an amplitude equal to or greater than a predetermined value continues in the caller's voice waveform. A normal call is composed of utterance sections of each caller, silent sections, and the like.
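  • As one hedged illustration of the amplitude-based detection mentioned above, the sketch below marks frames whose RMS energy is at or above a threshold and merges nearby active frames into utterance sections. The threshold, frame length, and merge gap are arbitrary assumed values; the embodiment does not prescribe a particular detection method.

```python
import numpy as np

def detect_utterance_sections(samples, rate, threshold=0.02,
                              frame_ms=20, min_gap_ms=300):
    """Return (start_sec, end_sec) spans where frame RMS >= threshold.
    Active spans closer than min_gap_ms are merged into one utterance section."""
    frame = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame
    active = [np.sqrt(np.mean(samples[i * frame:(i + 1) * frame] ** 2)) >= threshold
              for i in range(n_frames)]
    sections, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, n_frames))
    gap = int(min_gap_ms / frame_ms)          # merge short silent gaps
    merged = []
    for s, e in sections:
        if merged and s - merged[-1][1] <= gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return [(s * frame_ms / 1000.0, e * frame_ms / 1000.0) for s, e in merged]
```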
  • FIG. 1 is a conceptual diagram showing a configuration example of a contact center system 1 in the first embodiment.
  • the contact center system 1 in the first embodiment includes an exchange (PBX) 5, a plurality of operator telephones 6, a plurality of operator terminals 7, a file server 9, a call analysis server 10, and the like.
  • the call analysis server 10 includes a configuration corresponding to the conversation analysis device in the above-described embodiment.
  • the customer corresponds to the first conversation participant described above
  • the operator corresponds to the second conversation participant described above.
  • the exchange 5 is communicably connected via a communication network 2 to a call terminal (customer telephone) 3 such as a PC, a fixed telephone, a mobile phone, a tablet terminal, or a smartphone that is used by a customer.
  • the communication network 2 is a public network such as the Internet or a PSTN (Public Switched Telephone Network), a wireless communication network, or the like.
  • the exchange 5 is connected to each operator telephone 6 used by each operator of the contact center. The exchange 5 receives the call from the customer and connects the call to the operator telephone 6 of the operator corresponding to the call.
  • Each operator uses an operator terminal 7.
  • Each operator terminal 7 is a general-purpose computer such as a PC connected to a communication network 8 (LAN (Local Area Network) or the like) in the contact center system 1.
  • each operator terminal 7 records customer voice data and operator voice data in a call between each operator and the customer.
  • the customer voice data and the operator voice data may be generated by being separated from the mixed state by predetermined voice processing. Note that this embodiment does not limit the recording method and the recording subject of such audio data.
  • Each voice data may be generated by a device (not shown) other than the operator terminal 7.
  • the file server 9 is realized by a general server computer.
  • the file server 9 stores the call data of each call between the customer and the operator together with the identification information of each call.
  • Each call data includes a pair of customer voice data and operator voice data, and disconnection time data indicating the time when the call was disconnected.
  • the file server 9 acquires customer voice data and operator voice data from another device (each operator terminal 7 or the like) that records each voice of the customer and the operator. Further, the file server 9 acquires disconnection time data from each operator telephone 6, the exchange 5 and the like.
  • the call analysis server 10 estimates the degree of customer satisfaction or dissatisfaction for each call data stored in the file server 9.
  • the call analysis server 10 includes a CPU (Central Processing Unit) 11, a memory 12, an input / output interface (I / F) 13, a communication device 14 and the like as a hardware configuration.
  • the memory 12 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, a portable storage medium, or the like.
  • the input / output I / F 13 is connected to a device that accepts an input of a user operation such as a keyboard and a mouse, and a device that provides information to the user such as a display device and a printer.
  • the communication device 14 communicates with the file server 9 and the like via the communication network 8. Note that the hardware configuration of the call analysis server 10 is not limited.
  • FIG. 2 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the first embodiment.
  • the call analysis server 10 in the first embodiment includes a call data acquisition unit 20, a voice recognition unit 21, a closing detection unit 23, a specific expression table 25, an expression detection unit 26, an estimation unit 27, and the like.
  • Each of these processing units is realized, for example, by the CPU 11 executing a program stored in the memory 12. The program may be installed from a portable recording medium such as a CD (Compact Disc) or a memory card, or from another computer on the network via the input/output I/F 13, and stored in the memory 12.
  • the call data acquisition unit 20 acquires the call data of the call to be analyzed from the file server 9 together with the identification information of the call. As described above, the call data includes disconnection time data.
  • the call data may be acquired by communication between the call analysis server 10 and the file server 9, or may be acquired via a portable recording medium.
  • the voice recognition unit 21 performs voice recognition processing on each voice data of the operator and the customer included in the call data. Thereby, the voice recognition unit 21 acquires each voice text data and each utterance time data corresponding to the operator voice and the customer voice from the call data.
  • the voice text data is character data in which a voice uttered by a customer or an operator is converted into text. Each voice text data is divided for each word (part of speech). Each utterance time data includes utterance time data for each word of each voice text data.
  • The voice recognition unit 21 may detect the utterance sections of the operator and the customer from their respective voice data and acquire the start time and end time of each utterance section. In this case, the voice recognition unit 21 may determine an utterance time for each word string corresponding to each utterance section in each voice text data, and use the utterance time of each such word string as the utterance time data.
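  • A possible in-memory representation of such recognition results (voice text data divided into utterance sections, each with utterance time data) is sketched below; the data structure and the helper function are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    """One utterance section of recognized voice text data."""
    speaker: str          # "customer" or "operator"
    start: float          # start time of the utterance section (sec)
    end: float            # end time of the utterance section (sec)
    words: List[str]      # word-segmented text of the section

def words_in_range(utterances: List[Utterance], t0: float, t1: float,
                   speaker: str) -> List[str]:
    """Collect the words of one speaker whose utterance time falls in [t0, t1].
    Here the utterance time of a word string is approximated by the start time
    of its utterance section, as permitted in the embodiment."""
    return [w for u in utterances
            if u.speaker == speaker and t0 <= u.start <= t1
            for w in u.words]
```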
  • a voice recognition parameter (hereinafter referred to as a reference voice recognition parameter) adapted for a call in a contact center is used.
  • As this speech recognition parameter, for example, an acoustic model and a language model learned from a plurality of speech samples are used.
  • a known method may be used for the voice recognition process, and the voice recognition process itself and various voice recognition parameters used in the voice recognition process are not limited.
  • the method for detecting the utterance section is not limited.
  • The voice recognition unit 21 may perform the voice recognition processing on only one of the customer's and the operator's voice data, depending on the processing contents of the closing detection unit 23 and the expression detection unit 26. For example, when the closing section is detected by searching for a predetermined closing phrase as described later, the closing detection unit 23 requires the operator's voice text data. Moreover, the expression detection unit 26 requires the customer's voice text data.
  • The closing detection unit 23 detects the closing section of the target call based on the disconnection time data included in the call data and on the voice text data of the operator or the customer acquired by the voice recognition unit 21, together with its utterance time data.
  • the closing detection unit 23 generates closing section data including the start time and the end time of the detected closing section.
  • The end time of the closing section is set to the disconnection time indicated by the disconnection time data.
  • the start time of the closing section is set as follows, for example.
  • For example, the closing detection unit 23 determines, as the start time of the closing section, the start time of the utterance section located a predetermined number of utterances back from the call disconnection time.
  • Alternatively, the closing detection unit 23 may determine, as the start time of the closing section, a time point that is a predetermined time earlier than the call disconnection time. According to these methods of determining the start time of the closing section, the start time can be determined based only on the voice text data of whichever of the operator or the customer is used by the expression detection unit 26.
  • the predetermined number of utterances and the predetermined time for determining the width of the closing section are determined in advance according to a closing sentence described in an operator manual or the like, a result of listening to audio data at a contact center, or the like.
  • Alternatively, the closing detection unit 23 may determine, as the start time of the closing section, the utterance time of the first predetermined closing phrase appearing in the operator's voice text data.
  • the closing phrase is a phrase issued by the operator in the process of ending the call, such as a final greeting phrase.
  • In a contact center, the phrases to be uttered by an operator in the process of ending a call are often prescribed in a manual or the like. Therefore, the closing detection unit 23 may hold data of a plurality of such predetermined closing phrases in advance in an adjustable manner.
  • Such predetermined closing phrase data may be input by a user based on an input screen or the like, or may be acquired from a portable recording medium, another computer, or the like via the input / output I / F 13.
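  • The three ways of determining the start time of the closing section described above (a predetermined number of utterance sections back, a predetermined time back, or the first predetermined closing phrase in the operator's voice text data) could be combined roughly as in the following sketch. The function name, argument layout, and default values are assumptions.

```python
def closing_section(disconnect_time, utterance_starts=None, fixed_width=None,
                    operator_utterances=None, closing_phrases=None,
                    n_utterances=5):
    """Return (start, end) of the closing section; end = call disconnection time.
    Strategy 1: start of the n-th utterance section counted back from the end.
    Strategy 2: a fixed time width back from the disconnection time.
    Strategy 3: utterance time of the first predetermined closing phrase
                found in the operator's voice text data."""
    end = disconnect_time
    if closing_phrases and operator_utterances:
        for start_time, word_list in operator_utterances:
            text = " ".join(word_list)
            if any(p in text for p in closing_phrases):
                return (start_time, end)              # strategy 3
    if fixed_width is not None:
        return (max(0.0, end - fixed_width), end)     # strategy 2
    starts = sorted(t for t in (utterance_starts or []) if t < end)
    if starts:
        return (starts[-min(n_utterances, len(starts))], end)  # strategy 1
    return (0.0, end)
```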
  • the specific expression table 25 holds thanks expression data and apology expression data as specific expression data. Specifically, the specific expression table 25 holds the specific expression data to be detected by the expression detection unit 26 so that it can be distinguished into thank-you expression data and apology expression data. The specific expression table 25 may hold only one of thank-you expression data and apology expression data in accordance with the processing of the expression detection unit 26.
  • the expression detection unit 26 executes any one of the following three types of processing according to the specific expression data to be detected.
  • The first processing type detects only thank-you expression data, the second processing type detects only apology expression data, and the third processing type detects both thank-you expression data and apology expression data.
  • In the first processing type, the expression detection unit 26 extracts, from the customer's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance time falls within the time range indicated by the closing section data generated by the closing detection unit 23.
  • the expression detection unit 26 detects thank-you expression data held in the specific expression table 25 from the voice text data of the customer corresponding to the extracted closing section. Along with this detection, the expression detection unit 26 counts the number of thanks expression data detected.
  • In the second processing type, the expression detection unit 26 extracts, from the operator's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance time falls within the time range indicated by the closing section data generated by the closing detection unit 23.
  • the expression detection unit 26 detects apology expression data held in the specific expression table 25 from the voice text data of the operator corresponding to the extracted closing section. Along with this detection, the expression detection unit 26 counts the number of detected apology expression data.
  • In the third processing type, the expression detection unit 26 extracts, from the voice text data of both the customer and the operator acquired by the voice recognition unit 21, the voice text data whose utterance time falls within the time range indicated by the closing section data generated by the closing detection unit 23.
  • The expression detection unit 26 then detects apology expression data held in the specific expression table 25 from the operator's voice text data corresponding to the extracted closing section, and detects thank-you expression data held in the specific expression table 25 from the customer's voice text data corresponding to the extracted closing section.
  • the expression detection unit 26 separately counts the number of detected thank-you expression data and the number of detected apology expression data.
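  • A minimal sketch of the counting performed by the expression detection unit for the three processing types is shown below, assuming that the word lists have already been restricted to the closing section and that the specific expression table holds single-word entries; these assumptions simplify the illustration only.

```python
def count_specific_expressions(customer_words, operator_words,
                               thanks_table, apology_table,
                               processing_type="both"):
    """Count thank-you / apology expression data inside the closing section.
    customer_words / operator_words: word lists already restricted to the
    closing section. processing_type: "thanks", "apology", or "both"."""
    result = {"thanks": 0, "apology": 0}
    if processing_type in ("thanks", "both"):
        result["thanks"] = sum(w in thanks_table for w in customer_words)
    if processing_type in ("apology", "both"):
        result["apology"] = sum(w in apology_table for w in operator_words)
    return result
```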
  • the estimation unit 27 estimates at least one of customer satisfaction and dissatisfaction in the target call according to the number of thank-you expression data detected by the expression detection unit 26. For example, when the number of thank-you expression data detected is greater than or equal to a predetermined threshold, the estimation unit 27 estimates that there is satisfaction. Further, when the number of thank-you expression data detected is equal to or greater than a predetermined threshold, it may be estimated that there is no dissatisfaction. Furthermore, the estimation unit 27 may estimate that there is no satisfaction when the number of thank-you expression data detected is smaller than a predetermined threshold.
  • the predetermined threshold for estimating the presence or absence of satisfaction or dissatisfaction is determined in advance based on the result of listening to audio data at the contact center.
  • the table below shows the results of examining the relationship between the number of times the customer expressed gratitude in the closing period of the contact center call and the satisfaction and dissatisfaction of the customer. “Neutral” in the table indicates that the customer does not feel satisfaction or dissatisfaction. From the table below, it can be seen that the greater the number of times thanking in the closing section, the greater the probability that the customer feels satisfied and the less likely that the customer feels dissatisfied.
  • The threshold value for estimating the presence or absence of satisfaction or dissatisfaction is determined in advance based on such survey results. For example, based on the table below, it can be expected that the presence of satisfaction can be estimated with an accuracy of about 80% when the number of thanks is three or more. Moreover, when the number of thanks is less than one (that is, zero), it can be expected that the absence of satisfaction can be estimated with an accuracy of about 88%.
  • In the second processing type, the estimation unit 27 estimates at least one of the customer's dissatisfaction level and satisfaction level in the target call according to the number of apology expression data detected by the expression detection unit 26. For example, the estimation unit 27 estimates that there is dissatisfaction when the detected number of apology expression data is equal to or greater than a predetermined threshold. Moreover, the estimation unit 27 may determine a satisfaction level value or a dissatisfaction level value according to the number of detected thank-you expression data, and similarly may determine a dissatisfaction level value or a satisfaction level value according to the number of detected apology expression data.
  • In the third processing type, in which both thank-you expression data and apology expression data are detected, the estimation unit 27 may estimate at least one of the customer's satisfaction level and dissatisfaction level in the target call according to the detected numbers of both. For example, the estimation unit 27 estimates that there is satisfaction when the detected number of thank-you expression data is greater than the detected number of apology expression data, and estimates that there is dissatisfaction when the detected number of apology expression data is greater. In addition, the estimation unit 27 may determine a satisfaction level value and a dissatisfaction level value according to the respective detected numbers, or may determine a satisfaction level value or a dissatisfaction level value based on the difference between the two.
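  • The estimation rules described above can be summarized in a sketch such as the following; the concrete threshold values are assumptions, since the specification only states that thresholds are determined in advance, for example from listening surveys.

```python
def estimate(counts, thanks_threshold=3, apology_threshold=2):
    """Illustrative estimation rules; all threshold values are assumptions.
    Returns presence/absence flags and simple level values."""
    thanks, apology = counts["thanks"], counts["apology"]
    out = {
        "satisfied": thanks >= thanks_threshold,
        "not_satisfied": thanks == 0,
        "dissatisfied": apology >= apology_threshold,
        "satisfaction_level": thanks,           # level value from the count
        "dissatisfaction_level": apology,
    }
    # third processing type: compare the two detection counts
    if thanks > apology:
        out["overall"] = "satisfied"
    elif apology > thanks:
        out["overall"] = "dissatisfied"
    else:
        out["overall"] = "neutral"
    return out
```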
  • the estimation unit 27 generates output data including information indicating the estimation result, and outputs the determination result to the display unit or another output device via the input / output I / F 13.
  • the present embodiment does not limit the specific form of output of the determination result.
  • FIG. 3 is a flowchart showing an operation example of the call analysis server 10 in the first embodiment.
  • the call analysis server 10 acquires call data (S30).
  • the call analysis server 10 acquires call data to be analyzed from a plurality of call data stored in the file server 9.
  • the call analysis server 10 performs voice recognition processing on the customer voice data included in the call data acquired in (S30) (S31). Thereby, the call analysis server 10 acquires the customer's voice text data and utterance time data.
  • the customer's voice text data is divided for each word (part of speech).
  • the utterance time data includes utterance time data for each word or for each word string corresponding to each utterance section.
  • The call analysis server 10 detects the closing section of the target call based on the disconnection time data included in the call data acquired in (S30) and the utterance time data acquired in (S31) (S32). For example, the call analysis server 10 determines, as the start time of the closing section, a time point that is a predetermined time back from the call disconnection time indicated by the disconnection time data. As another example, the call analysis server 10 determines, as the start time of the closing section, the start time of the customer's utterance section located a predetermined number of utterances back from the call disconnection time. The call analysis server 10 then generates closing section data indicating the start time and end time of the detected closing section.
  • the call analysis server 10 extracts voice text data corresponding to the utterance time within the time range indicated by the closing section data generated in (S32) from the customer voice text data acquired in (S31). From the extracted speech text data, thank-you expression data as specific expression data is detected (S33). With this detection, the call analysis server 10 counts the number of thank-you expression data detected (S34).
  • the call analysis server 10 estimates the customer satisfaction of the target call based on the number of thank-you expression data detected in (S34) (S35). For example, when the number of thank-you expression data detected is greater than a predetermined threshold, the call analysis server 10 estimates that there is satisfaction and no dissatisfaction. When the number of thank-you expression data detected is smaller than the predetermined threshold, the call analysis server 10 estimates that there is no satisfaction. The call analysis server 10 generates output data indicating the presence or absence of the estimated satisfaction or dissatisfaction level, or a level value.
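  • Tying steps (S30) to (S35) together for the first processing type might look roughly like the following; the call-data fields, the speech recognizer interface, and the fixed-width closing section are all assumptions made to keep the sketch self-contained.

```python
def analyze_call(call_data, recognize, thanks_table,
                 closing_width_sec=60.0, threshold=1):
    """(S30)-(S35), first processing type. `call_data` carries the customer's
    voice data and the disconnection time; `recognize` is any speech
    recognizer returning [(word, utterance_time_sec), ...]."""
    words = recognize(call_data["customer_voice"])                      # (S31)
    start = max(0.0, call_data["disconnect_time"] - closing_width_sec)  # (S32)
    closing_words = [w for w, t in words
                     if start <= t <= call_data["disconnect_time"]]     # (S33)
    n_thanks = sum(w in thanks_table for w in closing_words)            # (S34)
    return {"satisfied": n_thanks >= threshold,                         # (S35)
            "thanks_count": n_thanks}
```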
  • For the second processing type, in (S31), the call analysis server 10 performs voice recognition processing on the operator's voice data included in the call data. Thereby, the call analysis server 10 acquires the operator's voice text data and utterance time data.
  • In (S32), the call analysis server 10 detects the closing section of the target call based on the disconnection time data included in the call data acquired in (S30) and the operator's voice text data acquired in (S31). In this case, the call analysis server 10 determines, as the start time of the closing section, the utterance time of the first predetermined closing phrase in the operator's voice text data.
  • In (S33), the call analysis server 10 extracts, from the operator's voice text data acquired in (S31), the voice text data whose utterance time falls within the time range indicated by the closing section data generated in (S32), and detects apology expression data as specific expression data from the extracted voice text data. In (S34), the call analysis server 10 counts the number of detected apology expression data.
  • the call analysis server 10 estimates the degree of dissatisfaction of the customer of the target call based on the detected number of apology expression data counted in (S34) (S35). The call analysis server 10 estimates that there is dissatisfaction if the detected number of apology expression data is greater than a predetermined threshold value, and otherwise estimates that there is no dissatisfaction.
  • For the third processing type, in (S31), the call analysis server 10 performs voice recognition processing on the voice data of both the customer and the operator. Thereby, the call analysis server 10 acquires voice text data and utterance time data for the customer and the operator, respectively.
  • the call analysis server 10 executes (S33) and (S34) in the above two cases, respectively. As a result, the number of detected thank-you expression data and the number of detected apology data are counted.
  • In (S35), the call analysis server 10 estimates at least one of the satisfaction level and dissatisfaction level of the customer of the target call based on the detected number of thank-you expression data and the detected number of apology expression data counted in (S34).
  • As described above, in the first embodiment, at least one of the customer's satisfaction level and dissatisfaction level in the target call is estimated based on at least one of the detected number of thank-you expression data uttered by the customer and the detected number of apology expression data uttered by the operator, both detected from the data corresponding to the voice of the closing section of the target call. According to this embodiment, since thank-you expressions and apology expressions are detected only in the closing section, the estimation uses specific expressions that are highly likely to reflect the customer's satisfaction or dissatisfaction and is not adversely affected by specific expressions misrecognized outside the closing section, so the customer's satisfaction or dissatisfaction can be estimated with high accuracy.
  • Furthermore, even in a form in which voice recognition processing is performed on only one of the customer's and the operator's voice data, the customer's satisfaction or dissatisfaction level can be estimated with high accuracy as described above. Therefore, according to the present embodiment, the load of the voice recognition processing can be reduced compared with a form in which voice recognition processing is performed on the voice data of both the customer and the operator.
  • At least one of the satisfaction level and dissatisfaction level of the customer of the target call can also be estimated based on both the detected number of thank-you expression data uttered by the customer and the detected number of apology expression data uttered by the operator. In this way, both the customer's thank-you expressions and the operator's apology expressions, which correlate strongly with the customer's satisfaction and dissatisfaction, are taken into account, so the accuracy of estimating the customer's satisfaction or dissatisfaction can be further improved.
  • FIG. 4 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the second embodiment.
  • the call analysis server 10 in the second embodiment further includes a voice recognition unit 41 in addition to the configuration of the first embodiment.
  • the voice recognition unit 41 is realized by executing a program stored in the memory 12 by the CPU 11, for example, in the same manner as the other processing units.
  • the voice recognition unit 21 performs voice recognition processing on the voice data of the operator included in the call data, using the reference voice recognition parameter LM-1. Since the voice text data acquired by the voice recognition process is used only by the closing detection unit 23, the voice recognition process may be performed only on the voice data of the operator. Note that the voice recognition unit 21 may perform voice recognition processing on the voice data of both the operator and the customer.
  • the voice recognition unit 21 holds in advance a reference voice recognition parameter LM-1 that has been learned in advance for general calls in the contact center.
  • The voice recognition unit 41 performs voice recognition processing on the voice data of the closing section of the target call using a recognition parameter (hereinafter referred to as a weighted speech recognition parameter) LM-2, which is obtained by weighting the reference voice recognition parameter LM-1 used by the voice recognition unit 21 so that the specific expression data detected by the expression detection unit 26 is recognized more easily than other word data.
  • the voice recognition unit 21 and the voice recognition unit 41 are distinguished from each other, but both may be realized as one processing unit, and the voice recognition parameters to be used may be switched.
  • the weighted speech recognition parameter LM-2 is calculated by a predetermined method based on the reference speech recognition parameter LM-1, for example, and is held in advance by the speech recognition unit 41.
  • The following equation shows an example of calculating the weighted speech recognition parameter LM-2 when an N-gram language model is used as the speech recognition parameter.
  • P_new(w_i | w_{i-n+1}^{i-1}) = P_old(w_i | w_{i-n+1}^{i-1}) × (P_new(w_i) / P_old(w_i))
  • Here, P_new(w_i | w_{i-n+1}^{i-1}) on the left side represents the N-gram language model corresponding to the weighted speech recognition parameter LM-2, that is, the appearance probability of the i-th word w_i under the condition of the word string from the (i-n+1)-th word to the (i-1)-th word. P_old(w_i | w_{i-n+1}^{i-1}) on the right side represents the N-gram language model corresponding to the reference speech recognition parameter LM-1, learned in advance for general calls in the contact center. P_new(w_i) on the right side is a unigram language model in which the appearance probabilities of the thank-you expressions and the apology expressions are increased, and P_old(w_i) is the corresponding unigram language model of the reference speech recognition parameter LM-1. In other words, the N-gram language model weighted by (P_new(w_i) / P_old(w_i)) so as to increase the appearance probabilities of thank-you expressions and apology expressions is calculated as the weighted speech recognition parameter LM-2.
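  • In code, this unigram-ratio weighting of an N-gram language model amounts to scaling each conditional probability by P_new(w_i)/P_old(w_i) and renormalizing per history, as in the following sketch. The dictionary-based model format and the boost factor are assumptions; real ASR toolkits apply the same idea inside their own model formats.

```python
def weight_ngram_probs(p_old_cond, p_old_uni, boosted_words, boost=10.0):
    """Build LM-2 from LM-1 by multiplying each conditional probability
    P_old(w | h) with P_new(w)/P_old(w), where P_new raises the unigram
    probability of thank-you / apology words, then renormalize per history.
    p_old_cond: {history_tuple: {word: prob}}, p_old_uni: {word: prob}."""
    # unigram with boosted specific expressions, renormalized
    p_new_uni = {w: p * (boost if w in boosted_words else 1.0)
                 for w, p in p_old_uni.items()}
    z = sum(p_new_uni.values())
    p_new_uni = {w: p / z for w, p in p_new_uni.items()}

    p_new_cond = {}
    for hist, dist in p_old_cond.items():
        scaled = {w: p * (p_new_uni[w] / p_old_uni[w]) for w, p in dist.items()}
        s = sum(scaled.values())
        p_new_cond[hist] = {w: p / s for w, p in scaled.items()}
    return p_new_cond
```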
  • The voice recognition unit 41 performs the voice recognition processing only on the voice data within the time range indicated by the closing section data generated by the closing detection unit 23. Depending on the processing content of the expression detection unit 26, the voice recognition unit 41 may target the voice data of both the customer and the operator, or only the voice data of one of them.
  • the expression detection unit 26 detects at least one of thanks expression data and apology expression data held in the specific expression table 25 from the voice text data acquired by the voice recognition unit 41.
  • FIG. 5 is a flowchart illustrating an operation example of the call analysis server 10 according to the second embodiment.
  • the same steps as those in FIG. 3 are denoted by the same reference numerals as those in FIG.
  • the call analysis server 10 applies weighted speech recognition parameters to the voice data in the time range indicated by the closing section data generated in (S32) among the voice data included in the call data acquired in (S30). Speech recognition using LM-2 is performed (S51). The call analysis server 10 detects at least one of thank-you expression data and apology expression data as specific expression data from the speech text data acquired in (S51) (S33).
  • the speech recognition process is performed on the speech data in the closing section using the weighted speech recognition parameters weighted so as to easily recognize the thanks and apologies. Then, at least one of thank-you expression data and apology expression data is detected from the speech text data acquired by this speech recognition process, and the satisfaction or dissatisfaction level of the customer of the target call is estimated based on the detection result.
  • Even when the detection rate of thank-you expressions and apology expressions is increased in this way, if no thank-you expression is detected, the estimation made from that detection result, namely that the customer is not satisfied, exhibits extremely high accuracy (purity). Therefore, according to the second embodiment, very high estimation accuracy can be expected by estimating that there is no satisfaction when the number of detected thank-you expressions is zero.
  • In the second embodiment, a language model weighted so that thank-you expressions are easily recognized is used; therefore, when the number of detected thank-you expressions is zero, there is a high possibility that the customer did not say thank you at all, and it is also possible to estimate that the customer is dissatisfied with the call.
  • FIG. 6 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the first modification.
  • the closing detection unit 23 detects a closing section using at least one of voice data and disconnection time data included in the call data acquired by the call data acquisition unit 20.
  • For example, the closing detection unit 23 may set the call disconnection time indicated by the disconnection time data as the end time of the closing section and determine a time point a predetermined time width back from the call disconnection time as the start time of the closing section. Alternatively, the closing detection unit 23 may hold the voice signal waveform of each closing phrase and acquire the utterance time of a closing phrase by collating each held waveform with the waveform of the voice data included in the call data.
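  • One conceivable way to realize the waveform collation mentioned above is a normalized cross-correlation between a stored closing-phrase waveform and the call waveform, as sketched below. This is an assumed, simplified substitute for whatever matching method an actual implementation uses; the hop size and score threshold are arbitrary.

```python
import numpy as np

def find_phrase_by_waveform(call_wave, phrase_wave, rate, min_score=0.6):
    """Slide the stored closing-phrase waveform over the call waveform and
    return the best-matching start time (seconds) if the normalized
    cross-correlation exceeds min_score, else None."""
    n = len(phrase_wave)
    pw = (phrase_wave - phrase_wave.mean()) / (phrase_wave.std() + 1e-9)
    best_score, best_pos = -1.0, None
    step = max(1, n // 4)                       # coarse hop to keep it cheap
    for pos in range(0, len(call_wave) - n, step):
        seg = call_wave[pos:pos + n]
        sg = (seg - seg.mean()) / (seg.std() + 1e-9)
        score = float(np.dot(pw, sg) / n)
        if score > best_score:
            best_score, best_pos = score, pos
    if best_pos is not None and best_score >= min_score:
        return best_pos / rate
    return None
```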
  • the voice recognition unit 21 may perform voice recognition processing on the voice data in the closing section of the target call.
  • the step (S31) shown in FIG. 3 may be executed after the step (S32) and before the step (S33).
  • FIG. 7 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the second modification.
  • the call analysis server 10 may not have the voice recognition unit 21.
  • the closing detection unit 23 detects a closing section using at least one of voice data and disconnection time data included in the call data acquired by the call data acquisition unit 20. Since the processing content of the closing detection unit 23 in the second modification may be the same as that in the first modification, description thereof is omitted here.
  • In the second modification, the step (S31) shown in FIG. 5 is omitted. According to the first and second modifications, since voice recognition is applied only to the section detected by the closing detection unit, there is an advantage that the calculation time required for estimating the degree of satisfaction or dissatisfaction of the customer can be reduced.
  • customer satisfaction or dissatisfaction is estimated based on the number of thank-you expression data detected and the number of apology expression data detected.
  • customer satisfaction or dissatisfaction may be estimated from other than the number of detections.
  • For example, a satisfaction point may be assigned in advance to each piece of thank-you expression data, and a dissatisfaction point to each apology expression. The customer's satisfaction level value may then be estimated from the total of the satisfaction points of the detected thank-you expression data, and the dissatisfaction level value from the total of the dissatisfaction points of the detected apology expression data.
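  • A sketch of this point-based variant is shown below; the point tables and the default weight are assumptions, since the specification only states that satisfaction and dissatisfaction points are assigned in advance.

```python
def point_based_levels(detected_thanks, detected_apologies,
                       thanks_points, apology_points, default=1.0):
    """Sum pre-assigned satisfaction points of detected thank-you expressions
    and dissatisfaction points of detected apology expressions."""
    satisfaction = sum(thanks_points.get(e, default) for e in detected_thanks)
    dissatisfaction = sum(apology_points.get(e, default)
                          for e in detected_apologies)
    return {"satisfaction_level": satisfaction,
            "dissatisfaction_level": dissatisfaction}
```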
  • the contact center system 1 is exemplified, and an example in which the reference voice recognition parameter is adapted (learned) for general calls in the contact center is shown.
  • the reference speech recognition parameters need only be adapted to the type of call being handled. For example, when a general call by a call terminal is handled, a reference speech recognition parameter adapted for such a general call may be used.
  • the call data includes disconnection time data
  • The disconnection time data is generated by each operator telephone 6, the exchange 5, or the like, and may also be generated by detecting a disconnection tone from the customer's voice data.
  • the disconnection time data may be generated by the file server 9 or the call analysis server 10.
  • the above-described call analysis server 10 may be realized as a plurality of computers.
  • For example, the call analysis server 10 may include only the expression detection unit 26 and the estimation unit 27, with the other processing units provided on another computer.
  • In this case, the closing detection unit 23 may acquire the closing section data through a user operation on an input device based on an input screen or the like, or via the input/output I/F 13 from a portable recording medium, another computer, or the like.
  • In each of the above-described embodiments, call data is handled. However, the conversation analysis device and conversation analysis method described above may also be applied to a device or system that handles conversation data other than calls.
  • a recording device for recording a conversation to be analyzed is installed at a place (conference room, bank window, store cash register, etc.) where the conversation is performed.
  • When the conversation data is recorded in a state in which the voices of a plurality of conversation participants are mixed, the conversation data is separated from the mixed state into voice data for each conversation participant by predetermined voice processing.
  • In the above-described embodiments, the call disconnection time data is used as the data indicating the end time of the conversation. For conversation data other than a call, an event indicating the end of the conversation may be detected automatically or manually, and the detection time point may be treated as conversation end time data. For the automatic detection, the end of the utterances of all conversation participants may be detected, or the movement of persons indicating the dissolution of the conversation participants may be detected by a sensor or the like. For the manual detection, an input operation by which a conversation participant notifies the end of the conversation may be detected.
  • In this case, the closing detection unit 23 may detect the closing section of the target conversation based on the conversation end time data included in the conversation data, the voice text data of the conversation participants acquired by the voice recognition unit 21, and their utterance time data.
  • In this case, the predetermined number of utterances and the predetermined time for determining the width of the closing section are determined according to the conversation type, such as conversations conducted at bank counters, at store cash registers, or at facility information centers.
  • a predetermined closing phrase is determined according to the conversation type.
  • The expression detection unit includes a speech recognition unit that performs speech recognition processing on the voice data of the closing section using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted for speech recognition of a predetermined form of conversation including the conversation, so that the specific expression data is recognized more easily than other word data, and the expression detection unit detects the specific expression data from the voice text data of the closing section of the conversation obtained by the speech recognition processing of the speech recognition unit. The conversation analysis device according to appendix 1.
  • The expression detection unit counts the number of detected thank-you expression data or the number of detected apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data in a manner distinguishable between thank-you expression data and apology expression data, and the estimation unit estimates at least one of satisfaction and dissatisfaction of the first conversation participant in the conversation based on the number of detected thank-you expression data or the number of detected apology expression data.
  • the conversation analyzer according to appendix 1 or 2.
  • The expression detection unit counts the number of detected thank-you expression data and the number of detected apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data in a manner distinguishable between thank-you expression data and apology expression data, and the estimation unit estimates at least one of satisfaction and dissatisfaction of the first conversation participant in the conversation based on the number of detected thank-you expression data and the number of detected apology expression data.
  • the conversation analyzer according to appendix 1 or 2.
  • The conversation analysis method further includes performing speech recognition processing on the voice data of the closing section using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted for speech recognition of a predetermined form of conversation including the conversation, so that the specific expression data is recognized more easily than other word data, and the specific expression data is detected from the voice text data of the closing section of the conversation obtained by the speech recognition processing.
  • The conversation analysis method further includes counting at least one of the number of detected thank-you expression data and the number of detected apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data in a manner distinguishable between thank-you expression data and apology expression data, and the estimating estimates at least one of satisfaction and dissatisfaction of the first conversation participant in the conversation based on the number of detected thank-you expression data or the number of detected apology expression data.
  • the conversation analysis method according to appendix 5 or 6.
  • Appendix 9 A program for causing at least one computer to execute the conversation analysis method according to any one of appendices 5 to 8.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Provided is a conversation analysis device comprising: an expression detection unit that detects data related to thanking expressions uttered by a first conversation participant and/or data related to apology expressions uttered by a second conversation participant as specific expression data from data corresponding to audio of only the closing segment of a conversation between the first conversation participant and the second conversation participant; and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation in accordance with the detection result of the specific expression data.

Description

Conversation analysis device and conversation analysis method
The present invention relates to a conversation analysis technique.
An example of a technology for analyzing conversation is a technology for analyzing call data. For example, data of calls performed in a department called a call center or a contact center is analyzed. Hereinafter, such a department, which specializes in the business of responding to customer calls such as inquiries, complaints, and orders regarding products and services, will be referred to as a contact center.
The voices of customers received at a contact center often reflect customer needs and satisfaction, and extracting such customer emotions and needs from calls with customers is very important for companies in order to increase repeat customers. Therefore, various methods have been proposed for extracting customer emotions (anger, irritation, discomfort, and the like) by analyzing voice. Patent Document 1 below proposes a method in which voice recognition is performed on the contents of a call between a customer and an operator to determine whether or not the call contains a keyword issued at the time of a complaint, and the customer's CS (customer satisfaction) level is judged from the determination result.
Patent Document 1: JP 2005-252845 A
However, in the proposed method, there is a possibility that the degree of satisfaction or dissatisfaction of a person who participates in the conversation (hereinafter referred to as a "conversation participant"), that is, the degree of satisfaction or dissatisfaction of the customer, cannot be determined appropriately. This is because even expressions (keywords) that can express satisfaction may be uttered regardless of satisfaction. For example, the thank-you expression "Thank you" can express satisfaction. However, the expression does not necessarily indicate satisfaction when used in a dialogue such as the following.
Operator: "If that is the symptom, please first restart the PC, and ..."
Customer: "Thank you. However, I have already tried that."
In addition, the speech recognition used in the proposed method may produce recognition errors such as insertion errors and deletion errors. Because of such misrecognition, an expression that was not actually uttered in the conversation (call) may be recognized, or an expression that was actually uttered may fail to be recognized. As a result, keywords to be extracted are falsely detected or missed, and the accuracy of estimating the customer's degree of satisfaction or dissatisfaction based on those keywords deteriorates.
The present invention has been made in view of such circumstances and provides a technique for estimating the degree of satisfaction or dissatisfaction of a conversation participant with high accuracy. Here, the degree of satisfaction or dissatisfaction of a conversation participant means the degree of satisfaction or dissatisfaction that at least one of the conversation participants presumably felt in the conversation. The degree of satisfaction also includes simply indicating the presence or absence of satisfaction, and the degree of dissatisfaction also includes simply indicating the presence or absence of dissatisfaction.
In order to solve the above problem, each aspect of the present invention adopts the following configuration.
A first aspect relates to a conversation analysis device. The conversation analysis device according to the first aspect includes: an expression detection unit that detects, from data corresponding to the speech of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data; and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation in accordance with the detection result of the specific expression data.
A second aspect relates to a conversation analysis method executed by at least one computer. The conversation analysis method according to the second aspect includes: detecting, from data corresponding to the speech of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data; and estimating the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation in accordance with the detection result of the specific expression data.
Another aspect of the present invention may be a program that causes at least one computer to implement each configuration of the first aspect, or a computer-readable recording medium on which such a program is recorded. The recording medium includes a non-transitory tangible medium.
According to each of the above aspects, it is possible to provide a technique for estimating the degree of satisfaction or dissatisfaction of a conversation participant with high accuracy.
The above-described object and other objects, features, and advantages will become more apparent from the preferred embodiments described below and the accompanying drawings.
FIG. 1 is a conceptual diagram showing a configuration example of a contact center system according to a first embodiment.
FIG. 2 is a diagram conceptually showing a processing configuration example of a call analysis server according to the first embodiment.
FIG. 3 is a flowchart showing an operation example of the call analysis server according to the first embodiment.
FIG. 4 is a diagram conceptually showing a processing configuration example of a call analysis server according to a second embodiment.
FIG. 5 is a flowchart showing an operation example of the call analysis server according to the second embodiment.
FIG. 6 is a diagram conceptually showing a processing configuration example of a call analysis server according to a first modification.
FIG. 7 is a diagram conceptually showing a processing configuration example of a call analysis server according to a second modification.
Hereinafter, embodiments of the present invention will be described. The embodiments given below are merely examples, and the present invention is not limited to the configurations of the following embodiments.
The conversation analysis device according to the present embodiment includes: an expression detection unit that detects, from data corresponding to the speech of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data; and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation in accordance with the detection result of the specific expression data.
The conversation analysis method according to the present embodiment is executed by at least one computer and includes: detecting, from data corresponding to the speech of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data; and estimating the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation in accordance with the detection result of the specific expression data.
Here, a conversation means that two or more speakers talk with each other by expressing their intentions, for example by uttering language. Some conversations take a form in which the participants talk face to face, such as at a bank counter or a store register, while others take place between participants at separate locations, such as a telephone call using call terminals or a video conference. In the present embodiment, the content and form of the target conversation are not limited, but a public (business) conversation is more suitable as a target conversation than a private one such as a conversation between friends.
The above-mentioned thank-you expression data, apology expression data, and specific expression data are each a word, a word string that is a sequence of a plurality of words, or a set of words scattered within a single utterance in a conversation. Hereinafter, thank-you expression data and thank-you expressions, apology expression data and apology expressions, and specific expression data and specific expressions may be used interchangeably. Examples of thank-you expression data include the single word 「ありがとう」 ("thank you"), the word string 「ありがとう」「ござい」「ます」 ("thank you very much"), and the word set 「本当」 ("really") and 「ありがとう」. Examples of apology expression data include the single word 「申し訳」 ("sorry") and the word string 「申し訳」「ござい」「ませ」「ん」 ("we are very sorry").
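As a rough illustration only, such a specific expression table could be held as a small keyword dictionary together with a matching helper; the entries, names, and matching rules below are assumptions made for this sketch, not wording fixed by the specification.

```python
# Minimal sketch of a specific expression table for thank-you / apology keywords.
SPECIFIC_EXPRESSIONS = {
    "thanks": [
        ("ありがとう",),                   # single word
        ("ありがとう", "ござい", "ます"),  # ordered word string
        {"本当", "ありがとう"},            # unordered word set scattered in one utterance
    ],
    "apology": [
        ("申し訳",),
        ("申し訳", "ござい", "ませ", "ん"),
    ],
}

def contains_word_string(words, expr):
    """True if expr (a tuple of words) appears as a contiguous run in words."""
    n = len(expr)
    return any(tuple(words[i:i + n]) == expr for i in range(len(words) - n + 1))

def matches(expression, utterance_words):
    """True if the expression occurs in the word-segmented utterance.
    A set is treated as unordered words scattered in the utterance,
    a tuple as an ordered word string."""
    if isinstance(expression, set):
        return expression.issubset(set(utterance_words))
    return contains_word_string(list(utterance_words), expression)
```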
A conversation participant often utters a thank-you expression when he or she feels satisfied with the conversation. On the other hand, a conversation participant often utters an apology expression upon sensing that the conversation partner is dissatisfied because of a fault on the participant's own side. However, as described above, even a thank-you expression may be uttered regardless of the participant's satisfaction. Similarly, an apology expression may be uttered regardless of the partner's dissatisfaction. For example, when a conversation participant temporarily leaves the conversation, he or she may utter a formulaic apology such as "I am sorry, but please wait a moment." In this case, the participant utters the apology expression without any direct relation to the partner's emotion.
The inventors found that, in the process of ending a conversation, the participants' feelings about the conversation as a whole, in particular satisfaction and dissatisfaction, tend to be expressed, and further found from this observation that thank-you expressions and apology expressions uttered in the ending process of a conversation are highly likely to reflect the emotions of the conversation participants.
Therefore, the present embodiment introduces the concept of a closing section, which means the ending process of a conversation, and detects, from data corresponding to the speech of only this closing section, specific expression data representing at least one of thanks uttered by the first conversation participant and an apology uttered by the second conversation participant. For example, the end time of the closing section is set to the end time of the conversation. The end of a conversation is represented, for example, by the disconnection of the call in the case of a telephone call, and by the dispersal of the participants in the case of a conversation other than a call. There are various methods for determining the start time of the closing section. In addition, when a conversation is terminated for a specific sudden reason, such as unavoidable circumstances of a participant, the conversation may have no closing section at all.
By narrowing the detection target of the specific expression data to data corresponding to the speech of the closing section in this way, the present embodiment excludes thank-you and apology expressions that are uttered regardless of the first conversation participant's satisfaction or dissatisfaction from the material used to estimate the first conversation participant's degree of satisfaction or dissatisfaction.
Furthermore, by narrowing the detection target of the specific expression data to data corresponding to the speech of the closing section as described above, the present embodiment can also exclude noise information caused by misrecognition of speech outside the closing section from the estimation material. Specifically, when a thank-you or apology expression that was not actually uttered by a conversation participant outside the closing section is misrecognized, that misrecognized expression is excluded from the estimation material.
As a result, in the present embodiment, the degree of satisfaction or dissatisfaction of the first conversation participant is estimated using only specific expression data that is highly likely to represent the participant's satisfaction or dissatisfaction. Therefore, according to the present embodiment, the degree of satisfaction or dissatisfaction of a conversation participant can be estimated with high accuracy from high-purity specific expression data from which specific expressions that do not reflect the first conversation participant's satisfaction or dissatisfaction, as well as noise data caused by misrecognition, have been removed.
The above embodiment will now be described in more detail. A first embodiment and a second embodiment are given below as detailed embodiments. Each of the following embodiments is an example in which the above-described conversation analysis device and conversation analysis method are applied to a contact center system. Note that the conversation analysis device and method described above are not limited to application to a contact center system handling call data, and can be applied to various modes handling conversation data. For example, they can also be applied to an in-house call management system other than a contact center, or to a call terminal owned by an individual, such as a PC (Personal Computer), a fixed-line telephone, a mobile phone, a tablet terminal, or a smartphone. Examples of conversation data further include data representing a conversation between a person in charge and a customer at a bank counter or a store register.
Hereinafter, a call handled in each embodiment means a call from the time a call connection is established between the call terminals of two callers until the call is disconnected. In the speech of a call, a continuous region in which one caller is speaking is referred to as an utterance or an utterance section. For example, an utterance section is detected as a section in which the amplitude of the caller's speech waveform remains at or above a predetermined value. A normal call is composed of the utterance sections of each caller, silent sections, and the like.
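As an illustration of the amplitude-based utterance section detection mentioned above, a minimal sketch might look like the following; the frame length, threshold, and minimum duration are assumed values chosen for the example, and the input is assumed to be a mono waveform as a NumPy array of floats.

```python
import numpy as np

def detect_utterance_sections(samples, sample_rate, amp_threshold=0.02,
                              frame_ms=20, min_duration_s=0.3):
    """Return (start_time, end_time) pairs, in seconds, for regions whose
    per-frame peak amplitude stays at or above amp_threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    active = [np.max(np.abs(samples[i * frame_len:(i + 1) * frame_len])) >= amp_threshold
              for i in range(n_frames)]

    sections, start = [], None
    for i, is_active in enumerate(active + [False]):   # sentinel closes a trailing run
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            t0, t1 = start * frame_ms / 1000, i * frame_ms / 1000
            if t1 - t0 >= min_duration_s:
                sections.append((t0, t1))
            start = None
    return sections
```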
[First Embodiment]
〔System Configuration〕
FIG. 1 is a conceptual diagram showing a configuration example of the contact center system 1 according to the first embodiment. The contact center system 1 according to the first embodiment includes an exchange (PBX) 5, a plurality of operator telephones 6, a plurality of operator terminals 7, a file server 9, a call analysis server 10, and the like. The call analysis server 10 includes a configuration corresponding to the conversation analysis device of the above-described embodiment. In the first embodiment, the customer corresponds to the above-described first conversation participant, and the operator corresponds to the above-described second conversation participant.
The exchange 5 is communicably connected, via a communication network 2, to a call terminal (customer telephone) 3 used by a customer, such as a PC, a fixed-line telephone, a mobile phone, a tablet terminal, or a smartphone. The communication network 2 is a public network such as the Internet or a PSTN (Public Switched Telephone Network), a wireless communication network, or the like. Furthermore, the exchange 5 is connected to each operator telephone 6 used by each operator of the contact center. The exchange 5 receives a call from a customer and connects the call to the operator telephone 6 of the operator who handles the call.
Each operator uses an operator terminal 7. Each operator terminal 7 is a general-purpose computer such as a PC connected to a communication network 8 (a LAN (Local Area Network) or the like) within the contact center system 1. For example, each operator terminal 7 records the customer's voice data and the operator's voice data in a call between the operator and a customer. The customer's voice data and the operator's voice data may be generated by separating a mixed signal through predetermined voice processing. Note that the present embodiment does not limit the recording method or the entity performing the recording of such voice data; each item of voice data may be generated by a device (not shown) other than the operator terminal 7.
The file server 9 is realized by a general server computer. The file server 9 stores the call data of each call between a customer and an operator together with identification information of the call. Each item of call data includes a pair of the customer's voice data and the operator's voice data, and disconnection time data indicating the time at which the call was disconnected. The file server 9 acquires the customer's voice data and the operator's voice data from other devices (such as the operator terminals 7) that record the voices of the customer and the operator, and acquires the disconnection time data from the operator telephones 6, the exchange 5, or the like.
The call analysis server 10 estimates the customer's degree of satisfaction or dissatisfaction for each item of call data stored in the file server 9.
As shown in FIG. 1, the call analysis server 10 has, as a hardware configuration, a CPU (Central Processing Unit) 11, a memory 12, an input/output interface (I/F) 13, a communication device 14, and the like. The memory 12 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, a portable storage medium, or the like. The input/output I/F 13 is connected to devices that accept user operation inputs, such as a keyboard and a mouse, and devices that provide information to the user, such as a display device and a printer. The communication device 14 communicates with the file server 9 and the like via the communication network 8. The hardware configuration of the call analysis server 10 is not limited.
〔Processing Configuration〕
FIG. 2 is a diagram conceptually showing a processing configuration example of the call analysis server 10 according to the first embodiment. The call analysis server 10 according to the first embodiment includes a call data acquisition unit 20, a voice recognition unit 21, a closing detection unit 23, a specific expression table 25, an expression detection unit 26, an estimation unit 27, and the like. Each of these processing units is realized, for example, by the CPU 11 executing a program stored in the memory 12. The program may be installed from a portable recording medium such as a CD (Compact Disc) or a memory card, or from another computer on the network, via the input/output I/F 13, and stored in the memory 12.
The call data acquisition unit 20 acquires, from the file server 9, the call data of a call to be analyzed together with identification information of the call. As described above, the call data includes the disconnection time data. The call data may be acquired through communication between the call analysis server 10 and the file server 9, or via a portable recording medium.
The voice recognition unit 21 performs voice recognition processing on each of the operator's and the customer's voice data included in the call data. The voice recognition unit 21 thereby obtains, from the call data, voice text data and utterance time data corresponding to the operator's voice and to the customer's voice. Here, the voice text data is character data in which the voice uttered by the customer or the operator has been converted into text. Each item of voice text data is segmented into words (parts of speech). Each item of utterance time data includes the utterance time of each word of the corresponding voice text data.
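As an example only, the per-word voice text data and utterance time data could be held in a structure like the following; the class and field names are hypothetical and chosen purely for the sketches that follow.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RecognizedWord:
    surface: str       # the word as text, e.g. "ありがとう"
    start_time: float  # utterance time of the word, in seconds from the start of the call

@dataclass
class ChannelTranscript:
    speaker: str                  # "customer" or "operator"
    words: List[RecognizedWord]   # word-segmented voice text data with utterance times
```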
The voice recognition unit 21 may also detect the utterance sections of the operator and the customer from their respective voice data and acquire the start time and end time of each utterance section. In this case, the voice recognition unit 21 may determine an utterance time for each word string corresponding to each utterance section in each item of voice text data, and use the utterance time of each such word string as the above utterance time data.
The voice recognition processing uses voice recognition parameters adapted for calls in a contact center (hereinafter referred to as reference voice recognition parameters). As the voice recognition parameters, for example, an acoustic model and a language model learned from a plurality of speech samples are used. In the present embodiment, any known method may be used for this voice recognition processing, and neither the voice recognition processing itself nor the various voice recognition parameters used in it are limited. The present embodiment also does not limit the method of detecting utterance sections.
The voice recognition unit 21 may perform the voice recognition processing on only one of the customer's and the operator's voice data, depending on the processing performed by the closing detection unit 23 and the expression detection unit 26. For example, when the closing section is detected by searching for predetermined closing phrases as described later, the closing detection unit 23 requires the operator's voice text data. The expression detection unit 26 requires the customer's voice text data when detecting thank-you expression data, and requires the operator's voice text data when detecting apology expression data.
The closing detection unit 23 detects the closing section of the target call based on the disconnection time data included in the call data and on the operator's or customer's voice text data and utterance time data acquired by the voice recognition unit 21. The closing detection unit 23 generates closing section data including the start time and the end time of the detected closing section. The end time of the closing section is set to the disconnection time indicated by the disconnection time data.
The start time of the closing section is set, for example, as follows. The closing detection unit 23 determines the start time of the utterance section located a predetermined number of utterances before the call disconnection time as the start time of the closing section. Alternatively, the closing detection unit 23 may determine a point in time a predetermined time before the call disconnection time as the start time of the closing section. With these methods, the start time of the closing section can be determined based only on the voice text data of whichever of the operator and the customer is used by the expression detection unit 26. The predetermined number of utterances and the predetermined time that determine the width of the closing section are decided in advance based on, for example, standard closing phrases described in an operator manual or the results of listening to voice data at the contact center.
Furthermore, the closing detection unit 23 may determine the utterance time of the earliest predetermined closing phrase in the operator's voice text data as the start time of the closing section. Here, a closing phrase is a phrase, such as a final greeting, that the operator utters in the process of ending a call. In a contact center, the phrases an operator should utter in the ending process of a call are often prescribed by a manual. Even for general callers who do not belong to a specialized department such as a contact center, certain more or less fixed phrases are uttered in the ending process of a call. The closing detection unit 23 may therefore hold data of a plurality of such predetermined closing phrases in an adjustable manner in advance. Such predetermined closing phrase data may be input by a user via an input screen or the like, or may be acquired from a portable recording medium, another computer, or the like via the input/output I/F 13.
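A minimal sketch of the three ways of determining the closing section described above (a fixed look-back, a fixed number of utterance sections back, or the earliest predetermined closing phrase) might look like the following. The look-back length, utterance count, and example closing phrase are assumptions for illustration, and the word items are the hypothetical RecognizedWord structure sketched earlier.

```python
def closing_by_fixed_lookback(disconnect_time, lookback_s=60.0):
    """Start the closing section a fixed time before the call disconnection."""
    return max(0.0, disconnect_time - lookback_s), disconnect_time

def closing_by_last_utterances(utterance_sections, disconnect_time, n_utterances=5):
    """Start the closing section at the start of the n-th utterance section
    counted back from the disconnection time."""
    starts = sorted(s for s, _ in utterance_sections)
    tail = starts[-n_utterances:] if starts else [disconnect_time]
    return tail[0], disconnect_time

def closing_by_phrase(operator_words, disconnect_time,
                      closing_phrases=(("お電話", "ありがとう", "ござい", "まし", "た"),)):
    """Start the closing section at the earliest occurrence of a predetermined
    closing phrase in the operator's word-segmented text (phrase list illustrative)."""
    for i in range(len(operator_words)):
        for phrase in closing_phrases:
            n = len(phrase)
            if tuple(w.surface for w in operator_words[i:i + n]) == phrase:
                return operator_words[i].start_time, disconnect_time
    return None  # no closing section found
```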
The specific expression table 25 holds thank-you expression data and apology expression data as specific expression data. Specifically, the specific expression table 25 holds the specific expression data to be detected by the expression detection unit 26 in such a way that thank-you expression data and apology expression data can be distinguished from each other. Depending on the processing of the expression detection unit 26, the specific expression table 25 may hold only one of the thank-you expression data and the apology expression data.
The expression detection unit 26 executes one of the following three types of processing according to the specific expression data to be detected. The first processing type detects only thank-you expression data, the second processing type detects only apology expression data, and the third processing type detects both thank-you expression data and apology expression data.
In the first processing type, the expression detection unit 26 extracts, from the customer's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance times fall within the time range indicated by the closing section data generated by the closing detection unit 23. The expression detection unit 26 then detects, from the customer's voice text data corresponding to the extracted closing section, the thank-you expression data held in the specific expression table 25, and counts the number of detected thank-you expression data.
In the second processing type, the expression detection unit 26 extracts, from the operator's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance times fall within the time range indicated by the closing section data generated by the closing detection unit 23. The expression detection unit 26 then detects, from the operator's voice text data corresponding to the extracted closing section, the apology expression data held in the specific expression table 25, and counts the number of detected apology expression data.
In the third processing type, the expression detection unit 26 extracts, from each of the customer's and the operator's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance times fall within the time range indicated by the closing section data generated by the closing detection unit 23. The expression detection unit 26 detects the apology expression data held in the specific expression table 25 from the operator's voice text data corresponding to the extracted closing section, and detects the thank-you expression data held in the specific expression table 25 from the customer's voice text data corresponding to the extracted closing section. Along with these detections, the expression detection unit 26 counts the number of detected thank-you expression data and the number of detected apology expression data separately.
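The detection-and-counting step could be sketched as follows, reusing the hypothetical matches() helper and RecognizedWord structure from the earlier sketches; grouping words into utterances by utterance section is a simplification made for this example, not a requirement of the specification.

```python
def count_specific_expressions(transcript, expressions, closing_start, closing_end,
                               utterance_sections):
    """Count occurrences of the given expressions (the thank-you or apology entries
    of the specific expression table) in one speaker's utterances whose utterance
    sections fall entirely within the closing section."""
    count = 0
    for sec_start, sec_end in utterance_sections:
        if sec_start < closing_start or sec_end > closing_end:
            continue  # outside the closing section
        words = [w.surface for w in transcript.words
                 if sec_start <= w.start_time <= sec_end]
        count += sum(1 for expr in expressions if matches(expr, words))
    return count
```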
The estimation unit 27 estimates at least one of the customer's degree of satisfaction and degree of dissatisfaction in the target call according to the number of thank-you expression data counted by the expression detection unit 26. For example, the estimation unit 27 estimates that satisfaction is present when the number of detected thank-you expression data is equal to or greater than a predetermined threshold. It may also estimate that dissatisfaction is absent when the number of detected thank-you expression data is equal to or greater than the predetermined threshold, and may estimate that satisfaction is absent when the number is smaller than the predetermined threshold. The predetermined thresholds for estimating the presence or absence of satisfaction or dissatisfaction are determined in advance based on, for example, the results of listening to voice data at the contact center.
The table below shows the result of examining the relationship between the number of times a customer expressed thanks in the closing section of contact center calls and the presence or absence of customer satisfaction and dissatisfaction. "Neutral" in the table indicates that the customer felt neither satisfaction nor dissatisfaction. The table shows that the greater the number of thanks in the closing section, the higher the probability that the customer felt satisfied and the lower the probability that the customer felt dissatisfied. The above thresholds for estimating the presence or absence of satisfaction or dissatisfaction are determined in advance based on such survey results. For example, based on the table below, setting the threshold to three or more thanks allows the presence of satisfaction to be estimated with an accuracy of about 80%, and setting it to fewer than one (that is, zero) allows the absence of satisfaction to be estimated with an accuracy of about 88%.
[Table 1: number of thanks in the closing section versus presence or absence of customer satisfaction and dissatisfaction]
The estimation unit 27 also estimates at least one of the customer's degree of dissatisfaction and degree of satisfaction in the target call according to the number of apology expression data counted by the expression detection unit 26. For example, the estimation unit 27 estimates that dissatisfaction is present when the number of detected apology expression data is equal to or greater than a predetermined threshold. The estimation unit 27 may also determine a satisfaction level value or a dissatisfaction level value according to the number of detected thank-you expression data, and may similarly determine a dissatisfaction level value or a satisfaction level value according to the number of detected apology expression data.
Furthermore, when the numbers of both thank-you expression data and apology expression data have been counted, the estimation unit 27 may estimate at least one of the customer's degree of satisfaction and degree of dissatisfaction in the target call according to both counts. For example, the estimation unit 27 estimates that satisfaction is present when the number of detected thank-you expression data is larger than the number of detected apology expression data, and estimates that dissatisfaction is present when the number of detected apology expression data is larger. The estimation unit 27 may also determine a satisfaction level value and a dissatisfaction level value according to each count, or may determine a satisfaction level value or a dissatisfaction level value from the difference between the two counts.
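A rule-of-thumb sketch of this estimation logic is shown below. The apology threshold is a purely hypothetical placeholder; the thank-you threshold of three is only loosely motivated by the survey figures quoted above and is not a value fixed by the specification.

```python
def estimate_satisfaction(thanks_count, apology_count=None,
                          thanks_threshold=3, apology_threshold=2):
    """Threshold-based estimation sketch; threshold values are illustrative."""
    result = {
        "satisfied": thanks_count >= thanks_threshold,
        "no_satisfaction": thanks_count == 0,
    }
    if apology_count is not None:
        result["dissatisfied"] = apology_count >= apology_threshold
        # When both counts are available, the larger one can decide the overall label.
        if thanks_count > apology_count:
            result["overall"] = "satisfied"
        elif apology_count > thanks_count:
            result["overall"] = "dissatisfied"
        else:
            result["overall"] = "neutral"
    return result
```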
The estimation unit 27 generates output data including information indicating the estimation result, and outputs the result to a display unit or another output device via the input/output I/F 13. The present embodiment does not limit the specific form in which the estimation result is output.
〔Operation Example〕
A call analysis method according to the first embodiment will be described below with reference to FIG. 3. FIG. 3 is a flowchart showing an operation example of the call analysis server 10 according to the first embodiment.
First, the call analysis method in the case where only thank-you expressions are used will be described.
The call analysis server 10 acquires call data (S30). In the first embodiment, the call analysis server 10 acquires the call data to be analyzed from among the plurality of call data stored in the file server 9.
The call analysis server 10 performs voice recognition processing on the customer's voice data included in the call data acquired in (S30) (S31). The call analysis server 10 thereby acquires the customer's voice text data and utterance time data. The customer's voice text data is segmented into words (parts of speech), and the utterance time data includes the utterance time of each word or of each word string corresponding to each utterance section.
The call analysis server 10 detects the closing section of the target call based on the disconnection time data included in the call data acquired in (S30) and the utterance time data acquired in (S31) (S32). For example, the call analysis server 10 determines a point in time a predetermined time before the call disconnection time indicated by the disconnection time data as the start time of the closing section. As another example, the call analysis server 10 determines the start time of the customer's utterance section located a predetermined number of utterances before the call disconnection time as the start time of the closing section. The call analysis server 10 generates closing section data indicating the start time and the end time of the detected closing section.
The call analysis server 10 extracts, from the customer's voice text data acquired in (S31), the voice text data corresponding to utterance times within the time range indicated by the closing section data generated in (S32), and detects thank-you expression data as specific expression data from the extracted voice text data (S33). Along with this detection, the call analysis server 10 counts the number of detected thank-you expression data (S34).
The call analysis server 10 estimates the degree of satisfaction of the customer of the target call based on the number of thank-you expression data counted in (S34) (S35). For example, when the number of detected thank-you expression data is greater than a predetermined threshold, the call analysis server 10 estimates that satisfaction is present and dissatisfaction is absent, and when the number is smaller than the predetermined threshold, it estimates that satisfaction is absent. The call analysis server 10 generates output data indicating the presence or absence of the estimated satisfaction or dissatisfaction, or a level value.
Next, the call analysis method in the case where only apology expressions are used will be described.
In this case, in (S31), the call analysis server 10 performs voice recognition processing on the operator's voice data included in the call data, and thereby acquires the operator's voice text data and utterance time data.
In (S32), the call analysis server 10 detects the closing section of the target call based on the disconnection time data included in the call data acquired in (S30) and the operator's voice text data acquired in (S31). In this case, the call analysis server 10 determines the utterance time of the earliest predetermined closing phrase in the operator's voice text data as the start time of the closing section.
In (S33), the call analysis server 10 extracts, from the operator's voice text data acquired in (S31), the voice text data corresponding to utterance times within the time range indicated by the closing section data generated in (S32), and detects apology expression data as specific expression data from the extracted voice text data. In (S34), the call analysis server 10 counts the number of detected apology expression data.
In (S35), the call analysis server 10 estimates the degree of dissatisfaction of the customer of the target call based on the number of apology expression data counted in (S34). The call analysis server 10 estimates that dissatisfaction is present when the number of detected apology expression data is greater than a predetermined threshold, and otherwise estimates that dissatisfaction is absent.
The call analysis method in the case where both thank-you expressions and apology expressions are used as specific expressions will now be described. In this case, in (S31), the call analysis server 10 performs voice recognition processing on each of the customer's and the operator's voice data, and thereby acquires voice text data and utterance time data for both the customer and the operator.
In (S33) and (S34), the call analysis server 10 executes (S33) and (S34) of both of the two cases described above, so that the number of detected thank-you expression data and the number of detected apology expression data are each counted.
In (S35), the call analysis server 10 estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the customer of the target call based on the number of thank-you expression data and the number of apology expression data counted in (S34).
〔Operation and Effects of the First Embodiment〕
As described above, in the first embodiment, at least one of the degree of satisfaction and the degree of dissatisfaction of the customer of the target call is estimated based on at least one of the number of thank-you expression data uttered by the customer and the number of apology expression data uttered by the operator, detected from the data corresponding to the speech of the closing section of the target call. According to the present embodiment, because thank-you expressions and apology expressions are detected only from the closing section, these specific expressions are highly likely to reflect the customer's satisfaction or dissatisfaction, and the estimation is not adversely affected by specific expressions misrecognized outside the closing section; the customer's degree of satisfaction or dissatisfaction can therefore be estimated with high accuracy.
Furthermore, according to the present embodiment, the customer's degree of satisfaction or dissatisfaction can be estimated with high accuracy from the voice text data of only one of the customer and the operator, as described above. Therefore, compared with a configuration that performs voice recognition processing on the voice data of both the customer and the operator, the load of the voice recognition processing can also be reduced.
In the first embodiment, at least one of the degree of satisfaction and the degree of dissatisfaction of the customer of the target call can also be estimated based on both the number of thank-you expression data uttered by the customer and the number of apology expression data uttered by the operator. In this way, both the customer's thank-you expressions and the operator's apology expressions, which are strongly correlated with customer satisfaction and dissatisfaction, are taken into account, so the accuracy of estimating the customer's degree of satisfaction or dissatisfaction can be further improved.
[Second Embodiment]
In the second embodiment, voice recognition processing is performed on the voice data of the closing section using voice recognition parameters weighted so that thank-you expressions and apology expressions are easier to recognize. The contact center system 1 according to the second embodiment will be described below, focusing on the differences from the first embodiment; descriptions of content common to the first embodiment will be omitted as appropriate.
〔Processing Configuration〕
FIG. 4 is a diagram conceptually showing a processing configuration example of the call analysis server 10 according to the second embodiment. The call analysis server 10 according to the second embodiment further includes a voice recognition unit 41 in addition to the configuration of the first embodiment. Like the other processing units, the voice recognition unit 41 is realized, for example, by the CPU 11 executing a program stored in the memory 12.
The voice recognition unit 21 performs voice recognition processing on the operator's voice data included in the call data, using a reference voice recognition parameter LM-1. Since the voice text data obtained by this processing is used only by the closing detection unit 23, the processing need only be performed on the operator's voice data; the voice recognition unit 21 may, however, perform voice recognition processing on the voice data of both the operator and the customer. The voice recognition unit 21 holds in advance the reference voice recognition parameter LM-1, learned in advance for contact center calls in general.
The voice recognition unit 41 performs voice recognition processing on the voice data of the closing section of the target call using a voice recognition parameter LM-2 (hereinafter referred to as a weighted voice recognition parameter), obtained by weighting the reference voice recognition parameter LM-1 used by the voice recognition unit 21 so that the specific expression data detected by the expression detection unit 26 are more easily recognized than other word data. Although the voice recognition unit 21 and the voice recognition unit 41 are shown separately in FIG. 4, they may be realized as a single processing unit whose voice recognition parameters are switched.
The weighted voice recognition parameter LM-2 is calculated, for example, by a predetermined method based on the reference voice recognition parameter LM-1, and is held in advance by the voice recognition unit 41. The following equation shows an example of calculating the weighted voice recognition parameter LM-2 when an N-gram language model is used as the voice recognition parameter.

P_new(w_i | w_{i-n+1}^{i-1}) = P_old(w_i | w_{i-n+1}^{i-1}) × P_new(w_i) / P_old(w_i)
The left-hand side P_new(w_i | w_{i-n+1}^{i-1}) of the above equation is the N-gram language model corresponding to the weighted voice recognition parameter LM-2, and represents the appearance probability of the i-th word w_i given the word string w_{i-n+1}^{i-1} from the (i-n+1)-th to the (i-1)-th word. P_old(w_i | w_{i-n+1}^{i-1}) on the right-hand side is the N-gram language model corresponding to the reference voice recognition parameter LM-1. P_new(w_i) on the right-hand side is a unigram language model in which the appearance probabilities of thank-you expressions and apology expressions have been increased, and P_old(w_i) is the unigram language model corresponding to the reference voice recognition parameter LM-1. According to the above equation, the N-gram language model learned in advance for contact center calls in general is weighted by (P_new(w_i) / P_old(w_i)) so that the appearance probabilities of thank-you expressions and apology expressions become larger, and the resulting N-gram language model is calculated as the weighted voice recognition parameter LM-2.
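A minimal sketch of this re-weighting is shown below. The dict-of-dicts representation {history_tuple: {word: probability}} of the N-gram model is an assumption made for the example and not the format used by the specification.

```python
def build_weighted_ngram(lm_old, unigram_old, unigram_new):
    """Compute P_new(w_i | history) = P_old(w_i | history) * P_new(w_i) / P_old(w_i),
    exactly as in the equation above. Words missing from the unigram tables are left
    unweighted; a practical system would typically also renormalize each distribution."""
    lm_new = {}
    for history, word_probs in lm_old.items():
        lm_new[history] = {
            w: p * unigram_new.get(w, unigram_old.get(w, 1.0)) / unigram_old.get(w, 1.0)
            for w, p in word_probs.items()
        }
    return lm_new
```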
The voice recognition unit 41 performs voice recognition processing only on the voice data within the time range indicated by the closing section data generated by the closing detection unit 23. Depending on the processing performed by the expression detection unit 26, the voice recognition unit 41 may target the voice data of both the customer and the operator, or only the voice data of one of them.
The expression detection unit 26 detects at least one of the thank-you expression data and the apology expression data held in the specific expression table 25 from the voice text data acquired by the voice recognition unit 41.
〔Operation Example〕
A call analysis method according to the second embodiment will be described below with reference to FIG. 5. FIG. 5 is a flowchart showing an operation example of the call analysis server 10 according to the second embodiment. In FIG. 5, steps that are the same as in FIG. 3 are given the same reference signs as in FIG. 3.
 通話分析サーバ10は、(S30)で取得された通話データに含まれる音声データの中の、(S32)で生成されたクロージング区間データで示される時間範囲の音声データに対して、加重音声認識パラメータLM-2を用いた音声認識を行う(S51)。
 通話分析サーバ10は、(S51)で取得された音声テキストデータの中から、特定表現データとしてのお礼表現データ及び謝罪表現データの少なくとも一方を検出する(S33)。
The call analysis server 10 applies weighted speech recognition parameters to the voice data in the time range indicated by the closing section data generated in (S32) among the voice data included in the call data acquired in (S30). Speech recognition using LM-2 is performed (S51).
The call analysis server 10 detects at least one of thank-you expression data and apology expression data as specific expression data from the speech text data acquired in (S51) (S33).
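The detection step (S33) can be pictured with the following sketch. The expression entries and the simple substring matching are assumptions made for illustration; the specification only requires that the specific expression table keep thank-you expression data and apology expression data distinguishable.

```python
from dataclasses import dataclass

# Hypothetical specific expression table (in the role of table 25): the
# entries keep thank-you data and apology data distinguishable.
SPECIFIC_EXPRESSION_TABLE = {
    "thanks":  ["thank you", "thanks a lot", "that was helpful"],
    "apology": ["sorry", "we apologize", "our apologies"],
}


@dataclass
class DetectionResult:
    thanks_count: int
    apology_count: int


def detect_specific_expressions(closing_text: str) -> DetectionResult:
    """Sketch of step (S33): count thank-you and apology expressions found
    in the speech text data of the closing section."""
    text = closing_text.lower()
    thanks = sum(text.count(e) for e in SPECIFIC_EXPRESSION_TABLE["thanks"])
    apology = sum(text.count(e) for e in SPECIFIC_EXPRESSION_TABLE["apology"])
    return DetectionResult(thanks_count=thanks, apology_count=apology)
```

For example, detect_specific_expressions("thank you, that was helpful") would return thanks_count=2 and apology_count=0 under the assumed table.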
[Operation and Effect of Second Embodiment]
As described above, in the second embodiment, the speech recognition processing is performed on the speech data of the closing section using the weighted speech recognition parameter weighted so that the thank-you expressions and the apology expressions are easily recognized. Then, at least one of thank-you expression data and apology expression data is detected from the speech text data acquired by this speech recognition processing, and the satisfaction or dissatisfaction level of the customer of the target call is estimated based on the detection result.
In the process of ending a call, the possibility that a thank-you expression or an apology expression is uttered is higher than in other sections. For this reason, the speech recognition processing performed on the speech data of the closing section uses the weighted speech recognition parameter weighted so that the thank-you expressions and the apology expressions are easily recognized. Therefore, according to the second embodiment, thank-you expression data and apology expression data can be reliably detected from the speech data of the closing section.
On the other hand, if speech recognition processing using such a weighted speech recognition parameter were performed on the speech data of sections other than the closing section, the recognition error rate for thank-you expressions and apology expressions would be more likely to increase, and the accuracy of estimating the customer's satisfaction or dissatisfaction could consequently decrease. In contrast, in the second embodiment, as described above, the speech recognition processing using the weighted speech recognition parameter is restricted to the speech data of the closing section, where thank-you expressions and apology expressions have a high appearance probability, so such a decrease in estimation accuracy can be avoided.
In the second embodiment, the detection rate of thank-you expressions and apology expressions is raised in this way. Therefore, if no thank-you expression is detected even so, the estimation that the customer felt no satisfaction, made in accordance with that detection result, exhibits extremely high accuracy (purity). Thus, according to the second embodiment, by estimating that there is no satisfaction when the number of detected thank-you expressions is zero, the estimation accuracy can be expected to be very high. Moreover, since the second embodiment uses a language model weighted so that thank-you expressions are easily recognized, a detection count of zero makes it particularly likely that the customer did not express any thanks at all, so it is also possible to estimate that the customer was dissatisfied with the call.
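The estimation discussed above can be summarized, again only as an illustrative sketch, by the following rule that reuses the DetectionResult of the earlier sketch. The exact decision policy and thresholds are assumptions introduced here, not values taken from the specification.

```python
def estimate_customer_state(result: DetectionResult) -> str:
    """Sketch of the estimation: when no thank-you expression is detected in
    the closing section, estimate 'no satisfaction' (or 'dissatisfied' when
    apology expressions were detected); otherwise compare the two counts."""
    if result.thanks_count == 0:
        return "dissatisfied" if result.apology_count > 0 else "no satisfaction"
    if result.thanks_count > result.apology_count:
        return "satisfied"
    return "undetermined"
```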
[First Modification]
Hereinafter, a modification of the call analysis server 10 according to the first embodiment will be described as a first modification. FIG. 6 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the first modification. In the first modification, the closing detection unit 23 detects the closing section using at least one of the voice data and the disconnection time data included in the call data acquired by the call data acquisition unit 20.
The closing detection unit 23 may set the call disconnection time indicated by the disconnection time data as the end time of the closing section and determine a point a predetermined time width before that call disconnection time as the start time of the closing section. Alternatively, the closing detection unit 23 may hold the audio signal waveform obtained from the voice data of each closing phrase and acquire the utterance time of a closing phrase by matching each of those waveforms against the waveform of the voice data included in the call data.
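As an illustration of the first of these options, the following sketch derives a closing section from the disconnection time. The 30-second width is an assumed value; the specification only speaks of a predetermined time width.

```python
def closing_section_from_disconnection(disconnect_time_s: float,
                                       window_s: float = 30.0) -> tuple[float, float]:
    """Set the closing section end to the call disconnection time and the
    start to a point a fixed (assumed) time width earlier."""
    return max(0.0, disconnect_time_s - window_s), disconnect_time_s
```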
In the first modification, the voice recognition unit 21 only needs to perform voice recognition processing on the voice data of the closing section of the target call.
In the call analysis method according to the first modification, the step (S31) shown in FIG. 3 only needs to be executed after the step (S32) and before the step (S33).
[Second Modification]
Hereinafter, a modification of the call analysis server 10 according to the second embodiment will be described as a second modification. FIG. 7 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the second modification. In the second modification, the call analysis server 10 does not have to include the voice recognition unit 21. The closing detection unit 23 detects the closing section using at least one of the voice data and the disconnection time data included in the call data acquired by the call data acquisition unit 20. Since the processing content of the closing detection unit 23 in the second modification may be the same as in the first modification, its description is omitted here.
In the call analysis method according to the second modification, the step (S31) shown in FIG. 5 is omitted. According to the first and second modifications, since voice recognition is applied only to the section detected by the closing detection unit, there is an advantage that the computation time required for estimating the customer's satisfaction or dissatisfaction can be reduced.
[Other Modifications]
In each of the above-described embodiments and modifications, the customer's satisfaction or dissatisfaction is estimated from the number of detected thank-you expression data and the number of detected apology expression data. However, the customer's satisfaction or dissatisfaction may be estimated from something other than the detection counts. For example, in the specific expression table 25, a satisfaction point may be assigned in advance to each thank-you expression data item and a dissatisfaction point to each apology expression data item, and the customer's satisfaction level value and dissatisfaction level value may then be estimated from the total satisfaction points of the detected thank-you expression data and the total dissatisfaction points of the detected apology expression data.
Each of the above-described embodiments and modifications exemplifies the contact center system 1, and therefore an example in which the reference speech recognition parameter is adapted (trained) for general calls in a contact center has been shown. The reference speech recognition parameter only needs to be adapted to the form of conversation being handled. For example, when general calls made with ordinary call terminals are handled, a reference speech recognition parameter adapted to such general calls may be used.
In each of the above-described embodiments and modifications, an example is shown in which the call data includes disconnection time data and the disconnection time data is generated by each operator telephone 6, the exchange 5, or the like. However, the disconnection time data may instead be generated by detecting a disconnection tone from the customer's voice data. In this case, the disconnection time data may be generated by the file server 9 or by the call analysis server 10.
The above-described call analysis server 10 may also be realized as a plurality of computers. In this case, for example, the call analysis server 10 has only the expression detection unit 26 and the estimation unit 27, and another computer has the other processing units. Furthermore, the closing detection unit 23 may acquire the closing section data through a user operating an input device on an input screen or the like, or may acquire it from a portable recording medium, another computer, or the like via the input/output I/F 13.
[Other Embodiments]
In each of the above-described embodiments and modifications, call data is handled. However, the above-described conversation analysis device and conversation analysis method may also be applied to devices or systems that handle conversation data other than calls. In this case, for example, a recording device for recording the conversation to be analyzed is installed at the place where the conversation takes place (a conference room, a bank counter, a store cash register, or the like). When the conversation data is recorded in a state in which the voices of a plurality of conversation participants are mixed, the mixed data is separated into voice data for each conversation participant by predetermined audio processing.
In each of the above-described embodiments and modifications, the call disconnection time data is used as the data indicating the end point of the conversation. In a form in which conversation data other than call data is handled, an event indicating the end of the conversation may be detected automatically or manually, and the detection time may be treated as the conversation end time data. With automatic detection, the end of utterances by all the conversation participants may be detected, or a movement of a person indicating that the conversation participants have dispersed may be detected with a sensor or the like. With manual detection, an input operation by a conversation participant for notifying the end of the conversation may be detected.
In a form in which conversation data other than call data is handled, the closing detection unit 23 may detect the closing section of the target conversation based on the conversation end time data included in the conversation data and on the voice text data of the conversation participants and its utterance time data acquired by the voice recognition unit 21. In this case, the predetermined number of utterances and the predetermined time for determining the width of the closing section are decided according to the conversation type, such as a conversation held at a bank counter, a conversation held at a store cash register, or a conversation held at a facility information center. Likewise, the predetermined closing phrases are decided according to the conversation type.
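The following sketch illustrates how such conversation-type-dependent parameters might be organized. The type names, the numeric values, and the way the two criteria are combined are all assumptions for illustration; the specification only states that they are decided according to the conversation type.

```python
# Illustrative per-conversation-type parameters for deciding the closing section.
CLOSING_PARAMS = {
    "contact_center_call": {"max_utterances": 6, "max_seconds": 30.0},
    "bank_counter":        {"max_utterances": 4, "max_seconds": 20.0},
    "store_register":      {"max_utterances": 3, "max_seconds": 15.0},
}


def closing_window(conversation_type: str, end_time_s: float,
                   utterance_start_times: list) -> tuple[float, float]:
    """Return a (start, end) closing window that ends at the conversation end
    time and covers at most the predetermined number of final utterances and
    the predetermined time width for the given conversation type."""
    p = CLOSING_PARAMS[conversation_type]
    last = utterance_start_times[-p["max_utterances"]:]
    start_by_count = last[0] if last else end_time_s
    start_by_time = end_time_s - p["max_seconds"]
    return max(0.0, start_by_count, start_by_time), end_time_s
```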
In the plurality of flowcharts used in the above description, a plurality of steps (processes) are described in order, but the execution order of the steps executed in the present embodiments is not limited to the described order. In the present embodiments, the order of the illustrated steps can be changed within a range that does not affect the substance of the processing. The above-described embodiments and modifications can also be combined to the extent that their contents do not conflict.
Some or all of the above embodiments and modifications can also be specified as in the following appendices. However, the embodiments and modifications are not limited to the following descriptions.
(Appendix 1)
A conversation analysis device comprising:
an expression detection unit that detects, as specific expression data, at least one of thank-you expression data uttered by a first conversation participant and apology expression data uttered by a second conversation participant from data corresponding to the speech of only a closing section of a conversation between the first conversation participant and the second conversation participant; and
an estimation unit that estimates a degree of satisfaction or a degree of dissatisfaction of the first conversation participant in the conversation according to a detection result of the specific expression data.
(Appendix 2)
The conversation analysis device according to appendix 1, wherein
the expression detection unit includes a speech recognition unit that performs speech recognition processing on the speech data of the closing section of the conversation using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted to speech recognition of a predetermined form of conversation including the conversation, so that the specific expression data is more easily recognized than other word data, and
the expression detection unit detects the specific expression data from the speech text data of the closing section of the conversation obtained by the speech recognition processing of the speech recognition unit.
(Appendix 3)
The conversation analysis device according to appendix 1 or 2, wherein
the expression detection unit counts the number of detections of at least one of the thank-you expression data and the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other, and
the estimation unit estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data or the number of detections of the apology expression data.
(Appendix 4)
The conversation analysis device according to appendix 1 or 2, wherein
the expression detection unit counts the number of detections of the thank-you expression data and the number of detections of the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other, and
the estimation unit estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data and the number of detections of the apology expression data.
(Appendix 5)
A conversation analysis method executed by at least one computer, the method comprising:
detecting, as specific expression data, at least one of thank-you expression data uttered by a first conversation participant and apology expression data uttered by a second conversation participant from data corresponding to the speech of only a closing section of a conversation between the first conversation participant and the second conversation participant; and
estimating a degree of satisfaction or a degree of dissatisfaction of the first conversation participant in the conversation according to a detection result of the specific expression data.
(Appendix 6)
The conversation analysis method according to appendix 5, further comprising
performing speech recognition processing on the speech data of the closing section of the conversation using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted to speech recognition of a predetermined form of conversation including the conversation, so that the specific expression data is more easily recognized than other word data,
wherein the detecting of the specific expression data detects the specific expression data from the speech text data of the closing section of the conversation obtained by the speech recognition processing.
(Appendix 7)
The conversation analysis method according to appendix 5 or 6, further comprising
counting the number of detections of at least one of the thank-you expression data and the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other,
wherein the estimating estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data or the number of detections of the apology expression data.
(Appendix 8)
The conversation analysis method according to appendix 5 or 6, further comprising
counting the number of detections of the thank-you expression data and the number of detections of the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other,
wherein the estimating estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data and the number of detections of the apology expression data.
(Appendix 9)
A program that causes at least one computer to execute the conversation analysis method according to any one of appendices 5 to 8.
(Appendix 10) A computer-readable recording medium on which the program according to appendix 9 is recorded.
This application claims priority based on Japanese Patent Application No. 2012-240750 filed on October 31, 2012, the entire disclosure of which is incorporated herein.

Claims (9)

1. A conversation analysis device comprising:
an expression detection unit that detects, as specific expression data, at least one of thank-you expression data uttered by a first conversation participant and apology expression data uttered by a second conversation participant from data corresponding to the speech of only a closing section of a conversation between the first conversation participant and the second conversation participant; and
an estimation unit that estimates a degree of satisfaction or a degree of dissatisfaction of the first conversation participant in the conversation according to a detection result of the specific expression data.
2. The conversation analysis device according to claim 1, wherein
the expression detection unit includes a speech recognition unit that performs speech recognition processing on the speech data of the closing section of the conversation using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted to speech recognition of a predetermined form of conversation including the conversation, so that the specific expression data is more easily recognized than other word data, and
the expression detection unit detects the specific expression data from the speech text data of the closing section of the conversation obtained by the speech recognition processing of the speech recognition unit.
3. The conversation analysis device according to claim 1 or 2, wherein
the expression detection unit counts the number of detections of at least one of the thank-you expression data and the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other, and
the estimation unit estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data or the number of detections of the apology expression data.
4. The conversation analysis device according to claim 1 or 2, wherein
the expression detection unit counts the number of detections of the thank-you expression data and the number of detections of the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other, and
the estimation unit estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data and the number of detections of the apology expression data.
5. A conversation analysis method executed by at least one computer, the method comprising:
detecting, as specific expression data, at least one of thank-you expression data uttered by a first conversation participant and apology expression data uttered by a second conversation participant from data corresponding to the speech of only a closing section of a conversation between the first conversation participant and the second conversation participant; and
estimating a degree of satisfaction or a degree of dissatisfaction of the first conversation participant in the conversation according to a detection result of the specific expression data.
6. The conversation analysis method according to claim 5, further comprising
performing speech recognition processing on the speech data of the closing section of the conversation using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted to speech recognition of a predetermined form of conversation including the conversation, so that the specific expression data is more easily recognized than other word data,
wherein the detecting of the specific expression data detects the specific expression data from the speech text data of the closing section of the conversation obtained by the speech recognition processing.
7. The conversation analysis method according to claim 5 or 6, further comprising
counting the number of detections of at least one of the thank-you expression data and the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other,
wherein the estimating estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data or the number of detections of the apology expression data.
8. The conversation analysis method according to claim 5 or 6, further comprising
counting the number of detections of the thank-you expression data and the number of detections of the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data such that the thank-you expression data and the apology expression data are distinguishable from each other,
wherein the estimating estimates at least one of the degree of satisfaction and the degree of dissatisfaction of the first conversation participant in the conversation based on the number of detections of the thank-you expression data and the number of detections of the apology expression data.
9. A program that causes at least one computer to execute the conversation analysis method according to any one of claims 5 to 8.
PCT/JP2013/075243 2012-10-31 2013-09-19 Conversation analysis device and conversation analysis method WO2014069121A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014544379A JP6365304B2 (en) 2012-10-31 2013-09-19 Conversation analyzer and conversation analysis method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012-240750 2012-10-31
JP2012240750 2012-10-31

Publications (1)

Publication Number Publication Date
WO2014069121A1 true WO2014069121A1 (en) 2014-05-08

Family

ID=50627037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/075243 WO2014069121A1 (en) 2012-10-31 2013-09-19 Conversation analysis device and conversation analysis method

Country Status (2)

Country Link
JP (1) JP6365304B2 (en)
WO (1) WO2014069121A1 (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4972107B2 (en) * 2009-01-28 2012-07-11 日本電信電話株式会社 Call state determination device, call state determination method, program, recording medium
US20100332287A1 (en) * 2009-06-24 2010-12-30 International Business Machines Corporation System and method for real-time prediction of customer satisfaction
JP5533219B2 (en) * 2010-05-11 2014-06-25 セイコーエプソン株式会社 Hospitality data recording device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010041507A1 (en) * 2008-10-10 2010-04-15 インターナショナル・ビジネス・マシーンズ・コーポレーション System and method which extract specific situation in conversation
JP2012047875A (en) * 2010-08-25 2012-03-08 Nippon Telegr & Teleph Corp <Ntt> Business section extracting method and device, and program therefor
JP2013156524A (en) * 2012-01-31 2013-08-15 Fujitsu Ltd Specific phoning detection device, specific phoning detection method and specific phoning detecting computer program

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750674A (en) * 2015-02-17 2015-07-01 北京京东尚科信息技术有限公司 Man-machine conversation satisfaction degree prediction method and system
JP2019101399A (en) * 2017-11-30 2019-06-24 日本電信電話株式会社 Favorability estimating apparatus, favorability estimating method, and program
JP2020126185A (en) * 2019-02-06 2020-08-20 日本電信電話株式会社 Voice recognition device, search device, voice recognition method, search method and program
JP7177348B2 (en) 2019-02-06 2022-11-24 日本電信電話株式会社 Speech recognition device, speech recognition method and program
WO2023119992A1 (en) * 2021-12-24 2023-06-29 ソニーグループ株式会社 Information processing device, information processing method, and program

Also Published As

Publication number Publication date
JPWO2014069121A1 (en) 2016-09-08
JP6365304B2 (en) 2018-08-01

Similar Documents

Publication Publication Date Title
JP6341092B2 (en) Expression classification device, expression classification method, dissatisfaction detection device, and dissatisfaction detection method
EP2717258B1 (en) Phrase spotting systems and methods
WO2014069076A1 (en) Conversation analysis device and conversation analysis method
JP6358093B2 (en) Analysis object determination apparatus and analysis object determination method
US9293133B2 (en) Improving voice communication over a network
US10592611B2 (en) System for automatic extraction of structure from spoken conversation using lexical and acoustic features
US9269357B2 (en) System and method for extracting a specific situation from a conversation
US8417524B2 (en) Analysis of the temporal evolution of emotions in an audio interaction in a service delivery environment
US9711167B2 (en) System and method for real-time speaker segmentation of audio interactions
JP6213476B2 (en) Dissatisfied conversation determination device and dissatisfied conversation determination method
CN102254556A (en) Estimating a Listener&#39;s Ability To Understand a Speaker, Based on Comparisons of Their Styles of Speech
US10199035B2 (en) Multi-channel speech recognition
JP6365304B2 (en) Conversation analyzer and conversation analysis method
CN113744742A (en) Role identification method, device and system in conversation scene
JP6327252B2 (en) Analysis object determination apparatus and analysis object determination method
JP7287006B2 (en) Speaker Determining Device, Speaker Determining Method, and Control Program for Speaker Determining Device
WO2014069443A1 (en) Complaint call determination device and complaint call determination method
WO2014069444A1 (en) Complaint conversation determination device and complaint conversation determination method
CN116975242A (en) Voice broadcast interrupt processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13851196

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014544379

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13851196

Country of ref document: EP

Kind code of ref document: A1