JP4761391B2

JP4761391B2 - Listening quality evaluation method and apparatus

Info

Publication number: JP4761391B2
Application number: JP2007001787A
Authority: JP
Inventors: 郷志上村; 徳広福元; 秀昭山田
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2007-01-09
Filing date: 2007-01-09
Publication date: 2011-08-31
Anticipated expiration: 2027-01-09
Also published as: JP2008172365A

Description

The present invention relates to a listening quality evaluation method and apparatus, and in particular to a listening quality evaluation method and apparatus that can objectively evaluate, on the terminal side, the quality of audio IP packet media services such as speech and music provided via a packet communication network.

In voice services such as VoIP that use a packet communication network, voice quality is strongly affected by packet transfer quality, for example by traffic fluctuations in the network. For this reason, techniques for measuring and evaluating voice quality on the terminal side have been studied, so that when voice quality degrades for some reason during service, the degraded quality can be determined and reflected in control that restores and maintains quality.

Conventionally, speech and music quality have been evaluated by a subjective evaluation method in which men and women of various ages listen to the actual audio data as test subjects and rate it on a five-point scale. The subjective evaluation method statistically processes the ratings obtained from all subjects to calculate a scalar quantity called the MOS (Mean Opinion Score), and uses that value to judge whether the speech or music quality is good or poor.

However, the subjective evaluation method requires a great deal of labor to recruit subjects and to set up the test environment, and it also places a heavy burden on the subjects. Furthermore, uncertain factors such as bias in the age or sex of the subjects and their physical condition on the day of the test strongly affect the results, so it is difficult to obtain consistent evaluation results every time.

Against this background, several objective evaluation methods have been proposed that evaluate quality without human intervention by selecting physical features (for example, distortion) that drive subjective ratings and measuring those features objectively.

Objective evaluation devices for speech and music signal quality fall broadly into two classes: intrusive methods, which use the original signal as a reference signal and calculate an estimated MOS value by comparing it with the signal under evaluation, and non-intrusive methods, which calculate an estimated MOS value by analyzing only the signal under evaluation. For speech signals, ITU-T Recommendation P.862 has been internationally standardized as an intrusive method, and ITU-T Recommendations P.563 and G.107 as non-intrusive methods.

ITU-T Recommendation P.862 is known as PESQ (Perceptual Evaluation of Speech Quality). It correlates well with MOS values obtained by subjective evaluation and is widely used as a highly reliable speech evaluation method. ITU-T Recommendation G.107 calculates a scalar quantity called the R value based on echo level and on packet transfer quality information such as packet loss rate and delay, and uses that value to judge whether voice quality is good or poor.

Patent Document 1 is a conventional technique for evaluating voice quality at the terminal end in a voice service that uses a packet communication network. Patent Document 1 evaluates voice quality by collecting IP packet transfer quality information such as packet loss rate, delay, and jitter, using RTP (Real-time Transport Protocol), a real-time communication protocol, and RTCP (RTP Control Protocol), its control protocol.
Patent Document 1: JP 2006-157507 A

The intrusive method described above needs a reference signal to calculate an objective evaluation value, so it is difficult to measure voice quality on the receiving side during service. The non-intrusive method does not need the reference signal, so quality can be measured on the receiving side during service. However, it must apply computationally heavy signal processing, such as frequency analysis and pitch extraction, to the target signal. Such processing is therefore difficult to perform on mobile terminal devices and similar equipment with limited CPU capacity and memory.

On the other hand, methods that calculate voice quality from packet transfer quality information such as packet loss rate, jitter, and delay, including ITU-T Recommendation G.107, can be implemented relatively easily even on such terminals. However, the voice quality estimated by these methods can deviate greatly from the voice quality that a person actually perceives. This is because the methods assume that every transmitted packet carries data equally important to voice quality evaluation, and therefore treat all packet losses as equivalent without distinction.

In ordinary conversational speech, there are silent segments such as breaths and pauses between words, and voiced segments are known to account for only about 40% of the whole conversation. Therefore, even if many packets are lost during a VoIP service, the user perceives almost no degradation in sound quality if all of the losses fall in silent segments.

Conversely, even a small number of packet losses makes the user feel that sound quality has degraded markedly if the losses occur in vowel portions or at the onset of an utterance. In addition, the voice quality perceived by the user differs greatly depending on whether packet losses occur in bursts or at random. It is therefore difficult to estimate listening quality accurately from the magnitude of the packet loss rate observed at the IP level alone.

An object of the present invention is to solve the problems of the prior art described above and to provide a listening quality evaluation method and apparatus that can objectively evaluate, on the terminal side, the quality of audio IP packet media services such as speech and music provided via a packet communication network.

To achieve the above object, the listening quality evaluation apparatus of the present invention comprises: packet loss detection means for detecting packet loss; feature amount prediction means for predicting a feature amount of a lost packet based on payload information of received packets; weighting processing means for applying to each lost packet, based on the predicted feature amount, a weight that reflects its influence on listening quality; physical parameter calculation means for calculating physical parameters relating to packet transfer quality based on the weight information of each lost packet; and signal quality evaluation means for calculating listening quality based on the calculated physical parameters.

The present invention is further characterized in that the feature amount prediction means predicts the coding information of a lost packet based on the normalized power, the zero-crossing count, and/or the coding information obtained from the payload information of received packets; the weighting processing means weights the lost packet based on the result of comparing the feature amount of the lost packet with a predetermined threshold; and listening quality is calculated based on the weighted lost packets.

The present invention achieves the following effects.
(1) Each lost packet is given a weight that reflects its influence on listening quality, and listening quality is evaluated based on the weighted lost packets. The quality of audio IP packet media services such as speech and music provided via a packet communication network can therefore be evaluated objectively on the terminal side.
(2) Evaluating listening quality does not require computationally heavy signal processing of the received signal, such as frequency analysis or pitch extraction. Listening quality can therefore be evaluated even on terminal devices whose processing capability is as limited as that of a mobile phone.

The best mode for carrying out the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the main part of a mobile phone terminal 1 equipped with a listening quality evaluation apparatus according to the present invention. Components not needed to explain the invention are omitted from the figure.

The packet loss detection unit 101 detects packet loss among the voice packets received during a fixed period (for example, 5 seconds). FIG. 2 shows the RTP header format of the real-time transport protocol RTP, which is used as the communication protocol for VoIP services.

The sequence number field in the header carries a value that increases by 1 with each packet, in the order in which the sender transmits the packets. Therefore, if the sequence number of the packet just received is greater than the sequence number of the previously received packet plus 1, it can be determined that packet loss occurred between the previous packet and the current one.

By observing the sequence number field of received packets, the packet loss detection unit 101 generates loss pattern information observable at the IP level, an example of which is shown in FIG. 3. The loss pattern information generated here is provided to the feature amount prediction unit 102 and the weighting processing unit 103.
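The sequence number check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `loss_pattern` and the 0/1 list representation of the loss pattern are assumptions, packets are assumed to arrive in order without duplicates, and the wraparound handling reflects the 16-bit width of the RTP sequence number field, which the text does not discuss.

```python
def loss_pattern(seq_numbers, modulo=1 << 16):
    """Build a 0/1 loss pattern (1 = lost) from received RTP sequence numbers.

    Assumes in-order arrival without duplicates (an assumption of this sketch).
    """
    pattern = []
    prev = None
    for seq in seq_numbers:
        if prev is not None:
            # Distance to the previous packet, tolerant of 16-bit wraparound.
            gap = (seq - prev) % modulo
            # gap == 1 means no loss; gap - 1 packets are missing otherwise.
            pattern.extend([1] * (gap - 1))
        pattern.append(0)  # the packet that actually arrived
        prev = seq
    return pattern


if __name__ == "__main__":
    print(loss_pattern([10, 11, 14, 15]))
```

A real receiver would also have to cope with reordered and duplicated packets, which this sketch deliberately ignores.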

The feature amount prediction unit 102 predicts the feature amount of a lost packet based on the payload information of the received packet that was transmitted immediately before the lost packet and arrived at the mobile phone terminal 1.

FIG. 4 illustrates the method of predicting the feature amount of a lost packet in this embodiment. It shows 8 seconds of audio data cut into 20-millisecond frames, with the normalized power calculated for each frame.

As the figure shows, the normalized power of a speech signal changes relatively smoothly when observed locally. In other words, two consecutive voice packets can be said to have similar normalized power. In this embodiment, therefore, when a packet loss is detected, the payload information corresponding to the speech signal is extracted from the received packet immediately preceding it, the normalized power of that payload is calculated, and this value is taken as the predicted feature amount of the lost packet. The feature amount (normalized power) calculated for each lost packet is supplied to the weighting processing unit 103.
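The prediction step above can be sketched as follows. The text does not define how normalized power is computed, so this sketch assumes it is the mean-square amplitude relative to 16-bit full scale, expressed in dB. The function names, the `floor_db` value, and the idea of passing already decoded frames are all hypothetical. For a burst of consecutive losses, every lost packet inherits the value of the last packet received before the burst.

```python
import math


def normalized_power_db(samples, full_scale=32768.0, floor_db=-100.0):
    """Mean-square power relative to full scale, in dB (assumed definition)."""
    if not samples:
        return floor_db
    mean_square = sum((s / full_scale) ** 2 for s in samples) / len(samples)
    if mean_square <= 0.0:
        return floor_db  # digital silence
    return max(floor_db, 10.0 * math.log10(mean_square))


def predict_lost_features(frames, pattern, default_db=-100.0):
    """Assign each lost packet the normalized power of the last received one.

    frames:  decoded sample lists, one per received packet, in arrival order
    pattern: 0/1 loss pattern (1 = lost) for the same observation period
    Returns a list aligned with pattern; received slots hold None.
    """
    features = []
    received = iter(frames)
    last_db = default_db  # used only if the period starts with a loss
    for flag in pattern:
        if flag == 0:
            last_db = normalized_power_db(next(received))
            features.append(None)
        else:
            features.append(last_db)
    return features
```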

The weighting processing unit 103 generates weighted loss pattern information based on the loss pattern information provided by the packet loss detection unit 101 and the feature amount (normalized power in this embodiment) of each lost packet provided by the feature amount prediction unit 102.

An influence degree table 103a is registered in advance in the weighting processing unit 103. The table quantitatively defines, based on the feature amount, the degree to which each lost packet actually affects voice listening quality. FIG. 5 shows an example of the influence degree table 103a, which associates the feature amount of a lost packet with a degree of influence α. The degree of influence α is a real number expressing how many lost packets each lost packet is equivalent to in terms of its effect on voice listening quality.

When normalized power is used as the feature amount of a lost packet, as in this embodiment, the influence degree table 103a is defined as in the example shown in FIG. 6. If the normalized power P is in the range from 0 to −P1, the packet loss is judged to have occurred in a voiced segment, and one lost packet is treated as one packet loss, as usual.

If, on the other hand, the normalized power P is below −P1, the payload of the lost packet corresponds to silence and the packet loss is judged to have occurred in a silent segment, so one lost packet is treated as zero packet losses. In other words, because the lost packet has almost no effect on voice listening quality, it is treated as if no packet loss had occurred.

As shown in FIG. 7, the weighting processing unit 103 then multiplies the loss pattern information provided by the packet loss detection unit 101 by the degree of influence α of each lost packet and outputs a weighted loss pattern.
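The two-level table and the multiplication step can be sketched as follows. This is an illustrative reading of the text, not a reproduction of FIG. 6 or FIG. 7: the dB unit, the value `p1_db=40.0`, the choice to count a value exactly at −P1 as voiced, and both function names are assumptions.

```python
def influence(power_db, p1_db=40.0):
    """Degree of influence α for a lost packet (two-level table, as in the text).

    P in [0, -P1] -> voiced segment, α = 1.0
    P below -P1   -> silent segment, α = 0.0
    The threshold value and the dB unit are placeholders.
    """
    return 1.0 if power_db >= -p1_db else 0.0


def weighted_loss_pattern(pattern, features, p1_db=40.0):
    """Multiply each entry of the 0/1 loss pattern by the α of that packet."""
    weighted = []
    for flag, feature in zip(pattern, features):
        if flag:
            weighted.append(flag * influence(feature, p1_db))
        else:
            weighted.append(0.0)  # received packets contribute no loss
    return weighted


if __name__ == "__main__":
    pattern = [0, 1, 1, 0, 1]
    features = [None, -6.0, -6.0, None, -70.0]
    print(weighted_loss_pattern(pattern, features))
```

In this toy input the final loss falls in a silent stretch, so the three losses observed at the IP level count as two in the weighted pattern.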

Returning to FIG. 1, the physical parameter calculation unit 104 calculates the values of those physical parameters, among the ones that the terminal device 1 can collect, that have been identified in advance as having a large influence on listening quality. The physical parameters that the terminal device 1 can collect include, for example, those defined by RTCP XR VoIP Metrics, as shown in FIG. 8.

In the present invention, parameters such as those shown in FIG. 9 are selected in advance from among these as the parameters with a large influence on listening quality. The physical parameter calculation unit 104 calculates the selected physical parameters using the weighted loss pattern provided by the weighting processing unit 103. The calculated physical parameters are provided to the signal quality evaluation unit 105.
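As a rough illustration of computing parameters from a weighted loss pattern, the sketch below derives two quantities. The text does not list which parameters FIG. 9 selects, so both quantities (a weighted loss rate and the longest weighted burst), their definitions, and the function names are assumptions chosen only to show how the weights enter the calculation.

```python
def weighted_loss_rate(weighted):
    """Sum of per-packet weights divided by the number of packet slots."""
    return sum(weighted) / len(weighted) if weighted else 0.0


def longest_weighted_burst(weighted):
    """Largest total weight over any run of consecutive weighted losses.

    A slot with weight 0.0 (received, or lost in silence) ends the run.
    """
    longest = 0.0
    current = 0.0
    for w in weighted:
        if w > 0.0:
            current += w
            longest = max(longest, current)
        else:
            current = 0.0
    return longest


if __name__ == "__main__":
    weighted = [0.0, 1.0, 1.0, 0.0, 0.0]
    print(weighted_loss_rate(weighted), longest_weighted_burst(weighted))
```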

As shown in FIG. 10, the signal quality evaluation unit 105 has a conversion function for calculating listening quality from the physical parameters calculated by the physical parameter calculation unit 104.

This conversion function can be obtained, for example, by multiple regression analysis with the physical parameters in use as explanatory variables and the subjective evaluation value of signal quality as the objective variable. Alternatively, the conversion function may be obtained by training a statistical model on pairs of the physical parameters in use and the subjective evaluation value of signal quality.
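The multiple regression option can be sketched in plain Python as an ordinary least-squares fit via the normal equations. Everything specific here is an assumption: the function names, the choice of solver, and the clamping of the estimate to the 1 to 5 range of a five-point MOS scale. The coefficients come from whatever training pairs are supplied, and none are given in the text.

```python
def fit_linear(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk (normal equations)."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept term
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_mos(coeffs, params):
    """Apply the fitted conversion function and clamp to the MOS range 1..5."""
    raw = coeffs[0] + sum(c * p for c, p in zip(coeffs[1:], params))
    return min(5.0, max(1.0, raw))
```

In use, `X` would hold one tuple of physical parameters per test condition and `y` the corresponding subjective scores; `estimate_mos` then maps newly measured parameters to an estimated score on the terminal.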

FIG. 11 shows the correlation between the voice listening quality evaluated by the present invention (MOS_proposed) and the voice listening quality evaluated according to ITU-T Recommendation P.862.1 (MOS_p862.1). In this result, the correlation coefficient between the two evaluations exceeds 0.85. FIG. 12 shows the estimation error calculated using the following equation (1).

In FIG. 12, about 84% of the evaluation data have an error within ±0.25, and about 97% have an error within ±0.5. These results show that the present invention can estimate voice listening quality accurately.

In the embodiment described above, the feature amount of a lost packet is predicted as normalized power from the payload information of a correctly received packet, and each lost packet is weighted based on this normalized power. The present invention is not limited to this, however. The zero-crossing count or coded data (for example, pitch or codebook index number) may instead be calculated from the payload information of the received packet and used as the feature amount of the lost packet.

FIG. 13 shows an example of the influence degree table 103b provided in the weighting processing unit 103 when the zero-crossing count is used as the feature amount of a lost packet. The degree of influence α is determined by whether the zero-crossing count is greater or less than the threshold P1.
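A zero-crossing feature and a threshold lookup of this kind can be sketched as follows. The text does not say which side of the threshold P1 maps to which α (that is in FIG. 13), so the sketch leaves both α values as caller-supplied arguments. The function names and the decision to ignore exact zeros when tracking sign are assumptions.

```python
def zero_crossings(samples):
    """Count sign changes in a frame, ignoring samples that are exactly zero."""
    count = 0
    prev_sign = 0
    for s in samples:
        sign = (s > 0) - (s < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                count += 1
            prev_sign = sign
    return count


def influence_by_threshold(value, threshold, alpha_below, alpha_at_or_above):
    """Two-level table lookup; the α on each side must come from the table."""
    return alpha_below if value < threshold else alpha_at_or_above
```

The same `influence_by_threshold` helper would serve for a pitch-based table, with a different threshold and α values.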

Similarly, FIG. 14 shows an example of the influence degree table 103c provided in the weighting processing unit 103 when coded data (pitch) is used as the feature amount of a lost packet. The degree of influence α is determined by whether the pitch is greater or less than the threshold P1.

In the embodiment described above, the feature amount prediction unit 102 predicts only one feature amount of a lost packet from the payload information of the received packet. The present invention is not limited to this, however. The feature amount prediction unit 102 may be provided with at least two of the following: a first prediction unit that predicts the normalized power of the lost packet based on the payload information of the received packet, a second prediction unit that predicts the zero-crossing count, and a third prediction unit that predicts the coded data. One of the prediction units is then selected according to the type of service under evaluation (for example, speech or music) and the coding scheme (for example, EVRC or AAC), and the weighting processing unit 103 generates the weighted loss pattern based on the selected feature amount.

For example, if the service type is speech, silent and voiced segments are relatively easy to distinguish, so normalized power or the zero-crossing count may be used as the feature amount. If the service type is music, silent and voiced segments are difficult to distinguish, so coded data may be used as the feature amount.

Furthermore, in the influence degree tables described with reference to FIGS. 5, 6, 13, and 14, the parameters P1 and P2 that relate the feature amount to the degree of influence α need not be fixed values. They may be varied according to the type of service under evaluation and the coding scheme.
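Selecting the feature and the thresholds per service type and coding scheme can be sketched as a lookup table. The pairing of speech with normalized power and music with coded data follows the example in the text. The `WeightingProfile` type, the keys, and every numeric value of P1 and P2 are placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightingProfile:
    feature: str  # which prediction unit to use
    p1: float     # threshold P1 (placeholder value)
    p2: float     # threshold P2 (placeholder value)


# Hypothetical profiles keyed by (service type, coding scheme).
PROFILES = {
    ("voice", "EVRC"): WeightingProfile("normalized_power", 40.0, 20.0),
    ("music", "AAC"): WeightingProfile("coded_data", 0.5, 0.8),
}


def select_profile(service, codec):
    """Return the weighting profile for a service type and coding scheme."""
    try:
        return PROFILES[(service, codec)]
    except KeyError:
        raise ValueError(f"no profile for {service!r} with {codec!r}") from None
```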

FIG. 1 is a block diagram of a mobile terminal equipped with a listening quality evaluation apparatus according to the present invention.
FIG. 2 shows the header format of an RTP packet.
FIG. 3 shows an example of a loss pattern generated by the packet loss detection unit.
FIG. 4 shows the relationship between a signal waveform and normalized power.
FIG. 5 shows an example of the influence degree table provided in the weighting processing unit.
FIG. 6 shows an example of an influence degree table that associates normalized power with the degree of influence α.
FIG. 7 shows the procedure for generating a weighted loss pattern based on the degree of influence α.
FIG. 8 shows the packet format of RTCP XR VoIP Metrics.
FIG. 9 shows an example of the physical parameters calculated by the physical parameter calculation unit.
FIG. 10 is a block diagram showing the configuration of the signal quality evaluation unit.
FIG. 11 shows the correlation between the voice listening quality evaluated by the present invention and the voice listening quality evaluated according to ITU-T Recommendation P.862.1.
FIG. 12 shows the estimation error of the voice quality evaluation value obtained by the present invention.
FIG. 13 shows an example of an influence degree table that associates the zero-crossing count with the degree of influence α.
FIG. 14 shows an example of an influence degree table that associates pitch with the degree of influence α.

Explanation of symbols

1: mobile phone terminal; 101: packet loss detection unit; 102: feature amount prediction unit; 103: weighting processing unit; 104: physical parameter calculation unit; 105: signal quality evaluation unit

Claims

A listening quality evaluation apparatus for evaluating the quality of an audio IP packet media service provided over a packet communication network, comprising:
packet loss detection means for detecting packet loss;
feature amount prediction means for predicting a feature amount of a lost packet based on payload information of the received packet immediately preceding the lost packet;
weighting processing means for applying to each lost packet, based on the predicted feature amount, a weight that reflects its influence on listening quality;
physical parameter calculation means for calculating physical parameters relating to packet transfer quality based on the weight information of each lost packet; and
signal quality evaluation means for calculating listening quality based on the calculated physical parameters.

The listening quality evaluation apparatus according to claim 1, wherein the feature amount prediction means includes first prediction means for predicting the normalized power of the lost packet based on the normalized power obtained from the payload information of the received packet immediately preceding the lost packet, and
the weighting processing means weights each lost packet based on the predicted normalized power.

The listening quality evaluation apparatus according to claim 1, wherein the feature amount prediction unit includes second prediction means for predicting the zero-crossing count of the lost packet based on the zero-crossing count obtained from the payload information of the received packet immediately preceding the lost packet, and
the weighting processing means weights each lost packet based on the predicted zero-crossing count.

The listening quality evaluation apparatus according to claim 1, wherein the feature amount prediction unit includes third prediction means for predicting the coding information of the lost packet based on the coding information obtained from the payload information of the received packet immediately preceding the lost packet, and
the weighting processing means weights each lost packet based on the predicted coding information.

The listening quality evaluation apparatus according to claim 1, wherein the feature amount prediction unit comprises:
a plurality of prediction means for predicting different feature amounts of the lost packet based on the payload information of the received packet immediately preceding the lost packet; and
selection means for selecting one of the prediction means according to the type of media service,
and wherein the weighting processing means weights each lost packet based on the selected feature amount.

The listening quality evaluation apparatus according to claim 5, wherein the feature amount prediction means comprises at least two of:
first prediction means for predicting the normalized power of the lost packet based on the normalized power obtained from the payload information of the received packet immediately preceding the lost packet; second prediction means for predicting the zero-crossing count of the lost packet based on the zero-crossing count obtained from the payload information of the received packet immediately preceding the lost packet; and third prediction means for predicting the coding information of the lost packet based on the coding information obtained from the payload information of the received packet immediately preceding the lost packet.

The listening quality evaluation apparatus according to any one of claims 1 to 6, wherein the weighting processing means compares the feature amount of the lost packet with a predetermined threshold and weights the lost packet based on the result of the comparison.

The listening quality evaluation apparatus according to claim 7, wherein the predetermined threshold differs according to the media service type.

The listening quality evaluation apparatus according to any one of claims 1 to 8, wherein the weighting processing means applies, to a loss pattern observable at the IP level, weighting that takes into account the influence on listening quality.

The listening quality evaluation apparatus according to any one of claims 1 to 9, wherein the physical parameter calculation means calculates physical parameters that affect listening quality.

The listening quality evaluation apparatus according to any one of claims 1 to 10, wherein the signal quality evaluation means has a conversion function for calculating listening quality based on the physical parameters calculated by the physical parameter calculation means.

The listening quality evaluation apparatus according to any one of claims 1 to 11, wherein the packet loss detection means detects packet loss based on the sequence number recorded in the header of each received packet.

A listening quality evaluation method for evaluating the quality of an audio IP packet media service provided over a packet communication network, comprising:
a step of detecting packet loss;
a step of predicting a feature amount of a lost packet based on payload information of the received packet immediately preceding the lost packet;
a step of applying to each lost packet, based on the predicted feature amount, a weight that reflects its influence on listening quality;
a step of calculating physical parameters relating to packet transfer quality based on the weight information of each lost packet; and
a step of calculating listening quality based on the calculated physical parameters.