CN112509603B

CN112509603B - Voice quality assessment method, device and system

Info

Publication number: CN112509603B
Application number: CN202011400170.0A
Authority: CN
Inventors: 吕非彼; 朱佳佳; 田元兵; 乔金剑; 刘亮; 马昱
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2020-12-01
Filing date: 2020-12-01
Publication date: 2023-08-08
Anticipated expiration: 2040-12-01
Also published as: CN112509603A

Abstract

The application provides a voice quality assessment method, device and system, relating to the technical field of communication, which can ensure the accuracy of voice quality assessment while reducing its cost. The method comprises: receiving record information of M calls collected by terminal equipment, where the record information of a call includes the call corpus of the call, the connection timestamp of the call, and the release timestamp of the call, and M is greater than or equal to 1; and inputting the record information of the M calls into a MOS algorithm model to calculate voice quality values of the M calls, where the MOS algorithm model supports parallel evaluation of the voice quality of N calls, and N is greater than or equal to 1.

Description

Voice quality assessment method, device and system

Technical Field

Embodiments of the present application relate to the field of communications technologies, and in particular, to a method, an apparatus, and a system for evaluating voice quality.

Background

With the continuous development of communication technology, the requirements on communication quality keep rising. To accurately evaluate voice quality during a call, the industry currently relies mainly on two kinds of scheme to evaluate the voice quality of a terminal in a call: the standard mean opinion score (MOS) voice quality evaluation scheme and the non-MOS voice quality evaluation scheme.

The main principle of the standard MOS voice quality assessment scheme is as follows: the test computer controls the calling terminal to call the called terminal, when the call is connected, a section of preset standard corpus is played at the calling terminal side, the called terminal answers in real time and sends the answering content to the MOS box, and then the MOS box compares the received voice with the standard corpus in real time through the MOS algorithm model, so that the MOS score of the voice is calculated.

The main principle of the non-MOS voice quality assessment scheme is as follows: the test computer controls the calling terminal to call the called terminal. During the voice call between the calling terminal and the called terminal, the test computer performs deep packet inspection (DPI) on the real-time transport protocol (RTP) packets, calculates indicators such as the packet loss rate, delay, and jitter of the voice call, and then calculates the voice quality of the call from these indicators using a proprietary algorithm.

It can be seen that the standard MOS voice quality assessment scheme requires a MOS box to be configured when assessing voice quality, which gives the scheme a significant purchase cost.

The non-MOS voice quality assessment scheme needs no MOS box and is therefore inexpensive, but it generally uses a proprietary algorithm in place of the standard MOS algorithm, so the accuracy of the voice quality assessment cannot be guaranteed.

Therefore, there is currently no voice quality assessment scheme that is low in cost while also ensuring assessment accuracy.

Disclosure of Invention

The method, the device and the system for evaluating the voice quality can ensure the accuracy of voice evaluation and reduce the cost of voice quality evaluation.

The application adopts the following technical scheme:

In a first aspect, the present application provides a voice quality assessment method, which may be applied to a voice quality assessment server. The method may include: receiving record information of M voice over long-term evolution (VoLTE) calls collected by terminal equipment, where the record information of a call includes the call corpus of the call, the connection timestamp of the call, and the release timestamp of the call, and M is greater than or equal to 1; and inputting the record information of the M calls into a mean opinion score (MOS) algorithm model to calculate voice quality values of the M calls, where the MOS algorithm model supports parallel calculation of the voice quality values of N calls, and N is greater than or equal to 1.

According to the voice quality evaluation method, the MOS algorithm model is directly integrated in the voice quality evaluation server, and after receiving the call record information collected by the terminal equipment, the voice quality evaluation server inputs the record information into the MOS algorithm model to calculate and obtain the voice quality value of the call. On one hand, as the MOS algorithm is still adopted to calculate the voice quality value, the method can meet the standard of the industry and ensure the accuracy of voice quality assessment; on the other hand, the cost of purchasing the MOS box is reduced, and the cost of voice quality evaluation is reduced.

With reference to the first aspect, in one possible implementation manner, if M is greater than N, N is greater than 1; inputting the record information of the M calls into the MOS algorithm model, and calculating to obtain voice quality values of the M calls, wherein the method can comprise the following steps: distributing the record information of M calls to N voice quality evaluation queues; and sequentially inputting the call record information included in the N voice quality evaluation queues into the MOS algorithm model in parallel, and calculating to obtain voice quality values of M calls. In this possible implementation manner, if the number of received recording information of the call is very large, for example, the number is greater than the number of parallel-calculated voice quality values supported by the voice quality evaluation server, the large number of voice quality values is calculated in a queuing manner; the processing efficiency of the voice quality assessment server is improved.
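As a non-limiting illustration, the queuing step described above can be sketched in Python as follows. The sketch assumes a round-robin distribution policy and a thread pool, neither of which is specified by the application; `distribute_round_robin`, `evaluate_queues`, and the `score` callable standing in for the MOS algorithm model are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def distribute_round_robin(records: Sequence[T], n_queues: int) -> List[List[T]]:
    """Split M records across N queues (round-robin is an assumed policy)."""
    if n_queues < 1:
        raise ValueError("n_queues must be at least 1")
    queues: List[List[T]] = [[] for _ in range(n_queues)]
    for index, record in enumerate(records):
        queues[index % n_queues].append(record)
    return queues


def evaluate_queues(
    queues: Sequence[Sequence[T]], score: Callable[[T], float]
) -> List[List[float]]:
    """Drain the queues in parallel; records inside one queue are scored serially."""

    def drain(queue: Sequence[T]) -> List[float]:
        return [score(record) for record in queue]

    # One worker per queue, so at most N records are being scored at any time.
    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        return list(pool.map(drain, queues))
```

With M = 7 records and N = 3 queues, the first queue holds three records and the other two hold two each, so no more than three records are ever evaluated concurrently.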

With reference to the first aspect or one of the foregoing possible implementation manners, in another possible implementation manner, inputting, in parallel, recording information of calls included in the N voice quality evaluation queues into the MOS algorithm model in sequence, and calculating to obtain voice quality values of M calls may include: for a first voice quality evaluation queue, serially inputting the record information of one or more calls in the first voice quality evaluation queue to a MOS algorithm model according to a preset sequence, and calculating to obtain voice quality values of the one or more calls in the first voice quality evaluation queue; the first speech quality assessment queue is any one of the N speech quality assessment queues. In the possible implementation manner, the voice quality value of the call in each queue is processed in series, and the preset sequence of the serial processing can be configured according to the actual requirement, so that the processing flexibility is improved, and the processing process is orderly and efficient.

With reference to the first aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the preset sequence includes: descending order of call priority. In this possible implementation, the calls in each queue are processed in order of priority from high to low, so that high-priority calls are evaluated first.
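As a non-limiting illustration, serial processing of one queue in descending priority order might look like the sketch below. The `QueuedCall` type, its integer `priority` field (larger meaning higher priority), and the `score` callable standing in for the MOS algorithm model are assumptions made for this example; the application does not define how priority is represented.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class QueuedCall:
    call_id: str
    priority: int  # assumption: a larger number means a higher priority


def score_queue_by_priority(
    queue: Sequence[QueuedCall], score: Callable[[QueuedCall], float]
) -> List[Tuple[str, float]]:
    """Score the calls of one queue one after another, highest priority first."""
    # sorted() is stable, so calls with equal priority keep their arrival order.
    ordered = sorted(queue, key=lambda call: call.priority, reverse=True)
    return [(call.call_id, score(call)) for call in ordered]
```

Because the ordering is an ordinary sort key, a different preset sequence (for example, earliest connection timestamp first) could be configured by swapping the key function.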

With reference to the first aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the call may include: a voice over long-term evolution (VoLTE) call; or a voice over new radio (VoNR) call.

In a second aspect, the present application further provides a voice quality assessment apparatus, which may be the voice quality assessment server in the first aspect or any one of the possible implementation manners of the first aspect, or the apparatus may be deployed at the voice quality assessment server. The apparatus may include a receiving unit and a processing unit. Wherein:

The receiving unit can be used for receiving record information of M voice over long-term evolution (VoLTE) calls collected by the terminal equipment; the record information of a call includes: the call corpus of the call, the connection timestamp of the call, and the release timestamp of the call; M is greater than or equal to 1.

The processing unit can be used for inputting the record information of the M calls into the MOS algorithm model and calculating the voice quality values of the M calls; the MOS algorithm model supports parallel calculation of the voice quality values of N calls; N is greater than or equal to 1.

It should be noted that, the voice quality evaluation device provided in the second aspect is configured to perform the voice quality evaluation method provided in the first aspect or any one of the possible implementation manners of the first aspect, and the specific implementation manner may refer to the specific implementation manner of the first aspect, which is not described herein.

In a third aspect, the present application provides a voice quality assessment server. The server may comprise a processor for implementing the voice quality assessment method described in the first aspect. The server may further comprise a memory coupled to the processor; when executing instructions stored in the memory, the processor can implement the voice quality assessment method of the first aspect or any one of the possible implementations of the first aspect. The server may also include a communication interface for communicating with other devices, which may be, for example, a transceiver, circuit, bus, module, or other type of communication interface. In one possible implementation, the server may include:

A memory, which may be used to store instructions.

The processor can be used for inputting the record information of the M calls into the MOS algorithm model and calculating the voice quality values of the M calls; the MOS algorithm model supports parallel calculation of the voice quality values of N calls; N is greater than or equal to 1.

The processor may be further configured to receive record information of M voice over long-term evolution (VoLTE) calls collected by the terminal equipment; the record information of a call includes: the call corpus of the call, the connection timestamp of the call, and the release timestamp of the call; M is greater than or equal to 1.

It should be noted that, in the present application, the instructions in the memory may be stored in advance, or may be downloaded from the internet and then stored when the device is used, and the source of the instructions in the memory is not specifically limited in the present application. The coupling in the embodiments of the present application is an indirect coupling or connection between devices, units, or modules, which may be in electrical, mechanical, or other form for information interaction between the devices, units, or modules.

In a fourth aspect, a speech quality assessment system is provided, which may comprise speech quality assessment means, which may be the means of the second aspect or any one of the possible implementations of the second aspect, and a terminal device.

In a fifth aspect, a speech quality assessment system is provided, which may comprise a speech quality assessment server, which may be a device according to the third aspect or any one of the possible implementations of the third aspect, and a terminal device.

In a sixth aspect, embodiments of the present application further provide a computer readable storage medium, including instructions that, when executed on a computer, cause the computer to perform the voice quality assessment method according to any one of the above aspects or any one of the possible implementation manners.

In a seventh aspect, embodiments of the present application further provide a computer program product, which when run on a computer, causes the computer to perform the speech quality assessment method according to any one of the above aspects or any one of the possible implementation manners.

In an eighth aspect, an embodiment of the present application provides a chip system, where the chip system includes a processor and may further include a memory, where the processor is configured to implement a function executed by the voice quality assessment server in the foregoing method. The chip system may be formed of a chip or may include a chip and other discrete devices.

The solutions provided in the second to eighth aspects are used to implement the voice quality assessment method provided in the first aspect, so that the same beneficial effects as those in the first aspect can be achieved, and no further description is given here.

The various possible implementations of any of the foregoing aspects may be combined without contradiction between the schemes.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Wherein the lines in the figures represent only the communication possible between two devices. The specific communication mode can be wireless communication or wired communication; can be determined according to the actual situation.

Fig. 1 is a schematic structural diagram of a standard MOS voice quality evaluation scenario provided in the prior art;

Fig. 2 is a schematic structural diagram of a non-MOS voice quality evaluation scenario provided in the prior art;

Fig. 3 is a schematic structural diagram of a network architecture according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a voice quality evaluation server according to an embodiment of the present application;

Fig. 5 is a flow chart of a voice quality evaluation method according to an embodiment of the present application;

Fig. 6 is a flowchart of another voice quality assessment method according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a voice quality evaluation device according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of another voice quality assessment server according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

In the embodiments of the present application, to describe the technical solutions clearly, the words "first", "second", etc. are used to distinguish identical or similar items having substantially the same function and effect. It will be appreciated by those skilled in the art that the words "first", "second", and the like do not limit quantity or order of execution, and that items so labeled are not necessarily different. Technical features described as "first" and "second" imply no sequence and no order of magnitude.

In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion that may be readily understood.

In the description of the present application, unless otherwise indicated, "/" means that the associated objects are in an "or" relationship; for example, A/B may represent A or B. The term "and/or" in this application merely describes an association between associated objects and means that three kinds of relations may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. Also, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "At least one of" or similar expressions means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.

In the embodiments of the present application, at least one may also be described as one or more, and a plurality may be two, three, four or more, which is not limited in this application.

To facilitate understanding, an existing speech quality assessment scheme is first introduced.

A simple explanation of a standard MOS voice quality assessment scheme is first provided.

As shown in fig. 1, in the standard MOS voice quality evaluation system, a test computer and one or more sets of voice quality test devices are mainly included. The set of voice quality testing devices may include a calling terminal, a called terminal, and a MOS box. The calling terminal and the called terminal are respectively connected and communicated with the test computer; the MOS box is connected and communicated with a called terminal in the voice quality testing device where the MOS box is positioned; the calling terminal in the same set of voice quality testing means may communicate with the called terminal.

Specifically, the test computer sends an instruction 1 to the calling terminal; the calling terminal receives the instruction 1 and initiates a call to the called terminal according to the instruction of the instruction 1; the called terminal answers the call initiated by the calling terminal; after detecting that the called terminal answers, the calling terminal plays a pre-stored standard corpus in a call; the called terminal answers the call content in real time and sends the call content (call corpus) to the MOS box in real time; the MOS box receives the call corpus sent by the called terminal, compares the received call corpus with the standard corpus through an MOS algorithm model, and calculates an MOS value of the call; and the MOS box forwards the calculated MOS value of the call to the test computer through the called terminal.

A brief description will now be made of a non-MOS voice quality evaluation scheme.

As shown in Fig. 2, the non-MOS voice quality evaluation system mainly includes a test computer, a plurality of calling terminals, and a plurality of called terminals. The calling terminal and the called terminal can each be connected to and communicate with the test computer, and the calling terminal can also communicate with the called terminal.

Specifically, the calling terminal sends a service establishment request to the test computer to request a voice call with the called terminal. When the test computer receives the service establishment request and determines that the corresponding service type is a voice call, two things happen. On the one hand, the test computer instructs the calling terminal to initiate the voice call to the called terminal, and the called terminal answers. After detecting that the called terminal has answered, the calling terminal takes the encoded voice data of the call content as the payload, loads the payload into the content part of a real-time transport protocol (RTP) packet, adds the corresponding RTP packet header, and transmits the RTP packet to the called terminal; after receiving the data, the called terminal performs protocol parsing and data decoding to restore the voice content (call corpus). On the other hand, the test computer counts the packet loss rate of the RTP packets sent by the calling terminal during the call between the calling terminal and the called terminal, and evaluates the voice quality of the call from that packet loss rate using a proprietary algorithm.
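As a non-limiting illustration of the packet loss statistic mentioned above, the loss rate can be estimated from gaps in the RTP sequence numbers seen at the receiver. This is one common way of counting loss and is an assumption of the example, not a method stated in the application; the sketch further assumes that the first and last packets of the call were received and that the 16-bit sequence number did not wrap around. The function name `rtp_packet_loss_rate` is hypothetical.

```python
from typing import Iterable


def rtp_packet_loss_rate(received_sequence_numbers: Iterable[int]) -> float:
    """Estimate the fraction of RTP packets lost, from received sequence numbers.

    Assumes no sequence-number wraparound and that the lowest and highest
    sequence numbers of the call are among those received.
    """
    unique = set(received_sequence_numbers)  # duplicates are counted once
    if not unique:
        raise ValueError("no packets received")
    expected = max(unique) - min(unique) + 1
    return (expected - len(unique)) / expected
```

For instance, receiving sequence numbers 1, 2, 4, 5 and 8 means eight packets were expected and five arrived, a loss rate of 0.375.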

As can be seen from the above two schemes, the standard MOS voice quality evaluation scheme requires a MOS box to be configured when evaluating voice quality. Because both the purchase cost of the MOS box and the licensing cost of the standard MOS algorithm are high, the scheme is expensive, and the more terminal groups whose voice quality must be evaluated, the higher the cost. For example, suppose the voice quality of calls nationwide needs to be evaluated: if the test team in each of the 31 provinces carries two MOS boxes, 62 MOS boxes must be purchased, and the purchase cost is very high.

The non-MOS voice quality evaluation scheme needs no MOS box and is inexpensive, but it generally uses a proprietary algorithm in place of the standard MOS algorithm. As a result, on the one hand, because the industry standard is the MOS algorithm, results from a proprietary algorithm are difficult to benchmark against the industry standard; on the other hand, the accuracy of the proprietary algorithm has not been rigorously demonstrated, so the accuracy of the voice quality assessment cannot be guaranteed.

Based on the above, the application provides a voice quality evaluation method, which directly integrates the MOS algorithm model into a voice quality evaluation server, and after receiving the call record information collected by the terminal equipment, the voice quality evaluation server inputs the record information into the MOS algorithm model to calculate and obtain the voice quality value of the call. On one hand, as the MOS algorithm is still adopted to calculate the voice quality value, the method can meet the standard of the industry and ensure the accuracy of voice quality assessment; on the other hand, the cost of purchasing the MOS box is reduced, and the cost of voice quality evaluation is reduced.

In order to facilitate understanding of the implementation process of the scheme in the embodiment of the present application, a network architecture in the embodiment of the present application is first described. The voice quality assessment method in the embodiment of the application can be applied to the following network architecture.

It should be noted that, the network architecture and the scenario are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation to the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the network architecture and the appearance of the new service scenario, the technical solution provided by the embodiment of the present application is equally applicable to similar architectures and scenarios.

Fig. 3 provides a schematic diagram of a network architecture. As shown in Fig. 3, the voice quality assessment system 30 may include one or more calling devices 301, one or more called devices 302, and a voice quality assessment server 303. The one or more calling devices 301 may communicate with the voice quality assessment server 303; the one or more called devices 302 may communicate with the voice quality assessment server 303; and a calling device 301 may communicate with a called device 302.

Specifically, the calling device 301 may also be referred to as a calling terminal, or a calling terminal device. Calling device 301 may be configured to communicate with voice quality assessment server 303; calling device 301 may also be used to communicate with called device 302. For example, calling device 301 may receive an instruction sent by voice quality assessment server 303; calling device 301 may also be used to talk to called device 302. Calling device 301 may include, but is not limited to, a mobile phone, a tablet computer, a wearable device (e.g., a smart watch, a smart bracelet), and other devices with voice call functionality.

The called device 302 may also be referred to as a called terminal, or a called terminal device. The called device 302 may be used to communicate with the voice quality assessment server 303; the called device 302 may also be used to communicate with the calling device 301. For example, the called device 302 may send the record information of the calls it collects to the voice quality assessment server 303; the called device 302 may also be used to talk to the calling device 301. The called device 302 may include, but is not limited to, a mobile phone, a tablet computer, a wearable device (such as a smart watch or a smart bracelet), and other devices with voice call functions.

It will be appreciated that the calling device is merely a functional description of the terminal device and should not constitute a unique definition of the terminal device. One terminal device may be either a calling device or a called device. For example, one terminal device may be a calling device at a first time and a called device at a second time.

A voice quality assessment server 303 may be used to communicate with the calling device 301, the called device 302. Illustratively, the voice quality assessment server 303 may be configured to receive recorded information of a call collected by the called device 302; the voice quality assessment server 303 may also be used to send instructions to the calling device 301. The voice quality evaluation server 303 may be various physical servers, or a cloud server, etc.

It should be noted that, in the embodiments of the present application, the number, connection manner, and the like of each device included in the network architecture are not specifically limited; the network architecture shown in fig. 3 is merely an exemplary architecture diagram.

The implementation of the examples of the present application will be described in detail below with reference to the accompanying drawings.

In one aspect, an embodiment of the present application provides a voice quality evaluation device, configured to execute the voice quality evaluation method provided by the present application. The voice quality assessment means may be the voice quality assessment server 303 of fig. 3; alternatively, the voice quality assessment apparatus may be deployed at the voice quality assessment server 303 of fig. 3; alternatively, the voice quality assessment apparatus may be another device that can interact with the voice quality assessment server 303 of fig. 3.

Fig. 4 is a schematic structural diagram of a voice quality assessment server according to an embodiment of the present application, and as shown in fig. 4, the voice quality assessment server 40 may include at least one processor 41, a memory 42, a communication interface 43, and a communication bus 44. The following describes each constituent element of the voice quality evaluation server 40 specifically with reference to fig. 4:

The processor 41 may be one processor or may be a collective term for a plurality of processing elements. For example, the processor 41 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, such as one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).

Among other things, the processor 41 may perform various functions by running or executing software programs stored in the memory 42 and invoking data stored in the memory 42. In a particular implementation, processor 41 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 4, as an embodiment.

In a specific implementation, the speech quality assessment server 40 may comprise a plurality of processors, such as processor 41 and processor 45 shown in fig. 4, as an example. Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

The memory 42 may be, but is not limited to: a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); magnetic disk storage media or other magnetic storage devices; or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 42 may stand alone and be coupled to the processor 41 via the communication bus 44. The memory 42 may also be integrated with the processor 41. The memory 42 stores the software program that carries out the scheme of the application, and execution of that program is controlled by the processor 41.

The communication interface 43 uses any transceiver-like means for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).

The communication bus 44 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, or an extended industry standard architecture (EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.

It should be noted that the components shown in fig. 4 do not constitute a limitation of the speech quality assessment server, and that the speech quality assessment server may comprise more or less components than shown in fig. 4, or some components in combination, or a different arrangement of components.

Specifically, the processor 41 performs the following functions by running or executing software programs and/or modules stored in the memory 42, and invoking data stored in the memory 42:

Receiving recording information of M calls acquired by terminal equipment, where the recording information of a call includes the call corpus of the call, the connection timestamp of the call, and the release timestamp of the call, and M is greater than or equal to 1; inputting the recording information of the M calls into a MOS algorithm model, and calculating the voice quality values of the M calls, where the MOS algorithm model supports parallel evaluation of the voice quality of N calls, and N is greater than or equal to 1.

On the other hand, the embodiment of the present application provides a voice quality evaluation method, which can be applied to the voice quality evaluation server 40 shown in fig. 4, for evaluating the voice quality of a call.

The call may include, but is not limited to, a voice over LTE (VoLTE) call, a voice over New Radio (VoNR) call, or another type of call.

Note that, the voice quality evaluation server 40 runs a MOS algorithm model that supports parallel evaluation of voice quality of N calls.

Wherein N is greater than or equal to 1.

The MOS algorithm model may include, but is not limited to, the perceptual objective listening quality analysis (POLQA) algorithm or the perceptual evaluation of speech quality (PESQ) algorithm.

Illustratively, the voice quality assessment server 40 has a POLQA algorithm installed, together with a purchased license that authorizes the POLQA algorithm to process N inputs in parallel; that is, the POLQA algorithm supports parallel assessment of the voice quality of N calls.

Specifically, when the voice quality evaluation method provided by the embodiment of the application is adopted to calculate the voice quality of a call, firstly, the voice quality evaluation server instructs each terminal device to collect the recording information of the call, then the recording information of the call collected by each terminal is sent to the voice quality evaluation server, and the voice quality evaluation server calculates the voice quality of the call according to the collected recording information of the call.

The process of the voice quality evaluation server instructing each terminal device to collect the recording information of the call is described in S503 to S510 below, which is not described herein.

As shown in fig. 5, the method may include:

S501, the voice quality evaluation server receives recording information of M calls collected by the terminal equipment.

Wherein M is greater than or equal to 1.

The recording information of one call may include: the call corpus of the call, the connection timestamp of the call, and the release timestamp of the call.
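The three items above can be pictured as a small record type. The following is a minimal sketch only: the class name `CallRecord`, the field names, and the choice of representing timestamps as seconds in a float are assumptions made for illustration, not a format defined by this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Recording information of one call (hypothetical field names)."""
    call_id: str        # illustrative identifier, not required by the method
    corpus: bytes       # call corpus captured by the terminal device
    connect_ts: float   # connection timestamp, in seconds (assumed unit)
    release_ts: float   # release (hang-up) timestamp, in seconds (assumed unit)

    def __post_init__(self) -> None:
        # A call cannot be released before it is connected.
        if self.release_ts < self.connect_ts:
            raise ValueError("release timestamp precedes connection timestamp")

    @property
    def duration(self) -> float:
        """Length of the connected portion of the call, in seconds."""
        return self.release_ts - self.connect_ts
```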

Specifically, the voice quality evaluation server receives the record information of the M calls collected by the terminal device through offline copying or network transmission or other transmission modes.

One terminal device can collect the recording information of one or more calls.

In one possible implementation, when M is equal to 1, the record information of the M calls is collected by one terminal device.

In another possible implementation manner, when M is greater than 1, the record information of M calls may be collected by one terminal device or may be collected by a plurality of terminal devices.

S502, the voice quality assessment server inputs the record information of the M calls into a MOS algorithm model, and calculates the voice quality values of the M calls.

Implementation of S502 may include, but is not limited to, method 1 or method 2 described below.

Specifically, when M is less than or equal to N, the voice quality values of M calls can be obtained by calculation using method 1; when M is greater than N, the method 2 can be used to calculate the voice quality values of M calls.
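The selection rule above reduces to a single comparison. As a hedged sketch (the function name `choose_method` is hypothetical and not part of the application):

```python
def choose_method(m: int, n: int) -> int:
    """Return 1 when all M calls fit into the N parallel inputs, else 2."""
    if m < 1 or n < 1:
        raise ValueError("M and N must both be greater than or equal to 1")
    return 1 if m <= n else 2
```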

Method 1: the voice quality assessment server inputs the recording information of the M calls into the MOS algorithm model, which supports parallel assessment of N calls. The MOS algorithm model processes the recording information of the M calls in parallel: for each call, it takes the portion of the call corpus between the connection timestamp and the release timestamp as the corpus under test, compares the corpus under test with the standard corpus, and calculates the voice quality value of the call.

Method 2: the voice quality assessment server distributes the recording information of the M calls to N voice quality assessment queues, then inputs the recording information held in the N queues into the MOS algorithm model, sequentially within each queue and in parallel across the queues, and calculates the voice quality values of the M calls.

In method 2, for a first voice quality assessment queue, calculating the voice quality values of the calls in that queue may be implemented as: the recording information of the one or more calls in the first voice quality assessment queue is input serially into the MOS algorithm model in a preset order, and the voice quality values of those calls are calculated.

The first voice quality assessment queue is any one of the N voice quality assessment queues.

Specifically, the MOS algorithm model takes the portion of the corpus of the first call in the queue between the connection timestamp and the release timestamp as the corpus under test of that call, compares the corpus under test with the standard corpus, and calculates the voice quality value of the call; the voice quality value of the next call in the queue is then obtained in the same way; polling continues in sequence until the voice quality value of every call in the first voice quality assessment queue is obtained.

The preset order may include, but is not limited to: descending call priority; call time from earliest to latest; or another order.
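The step of cutting out the corpus under test can be sketched as a filter on frame times. This is only an illustration under stated assumptions: the corpus is modelled as a list of `(timestamp, payload)` pairs, both endpoints are treated as inclusive, and the function name `corpus_under_test` is hypothetical. The comparison against the standard corpus is performed by the MOS algorithm model and is not reproduced here.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, bytes]  # (capture time in seconds, frame payload); assumed layout


def corpus_under_test(frames: Sequence[Frame],
                      connect_ts: float,
                      release_ts: float) -> List[bytes]:
    """Keep only the payloads captured between connection and release."""
    return [payload for ts, payload in frames
            if connect_ts <= ts <= release_ts]
```

For example, with frames captured at 0.5 s, 1.0 s, 2.0 s and 3.5 s and a call connected at 1.0 s and released at 3.0 s, only the second and third frames are kept.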
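Method 2 can be sketched as one worker per queue, with calls inside a queue handled serially in descending priority. This is a minimal sketch under assumptions: a call is modelled as a `(call_id, priority)` pair, the `score` callable is a placeholder standing in for the MOS algorithm model (it is not a real POLQA or PESQ interface), and a thread pool is just one possible way to obtain N parallel channels.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Call = Tuple[str, int]  # (call_id, priority); hypothetical representation


def evaluate_queue(queue: Sequence[Call],
                   score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Serially score one queue, highest priority first."""
    ordered = sorted(queue, key=lambda call: call[1], reverse=True)
    return [(call_id, score(call_id)) for call_id, _ in ordered]


def evaluate_queues(queues: Sequence[Sequence[Call]],
                    score: Callable[[str], float]) -> Dict[str, float]:
    """Run the queues in parallel, one worker per queue."""
    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        for scored in pool.map(evaluate_queue, queues, [score] * len(queues)):
            results.update(scored)
    return results
```

Sorting by priority corresponds to one of the preset orders named in the text; ordering by call time would only change the sort key.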

According to the voice quality evaluation method, the MOS algorithm model is directly integrated in the voice quality evaluation server, and after receiving call record information acquired by the terminal equipment, the voice quality evaluation server inputs the record information into the MOS algorithm model module to calculate and obtain a call voice quality value. On one hand, as the MOS algorithm model is still adopted to calculate the voice quality value, the method can meet the standard of the industry and ensure the accuracy of voice quality assessment; on the other hand, the cost of purchasing the MOS box is reduced, and the cost of voice quality evaluation is reduced.

Taking the acquisition of the recording information of the call between the first terminal and the second terminal as an example, the process of the voice quality evaluation server indicating each terminal device to acquire the recording information of the call will be described.

As shown in fig. 6, the process may include S503 to S510 described below.

S503, the voice quality assessment server configures standard corpus.

In one possible implementation, the speech quality assessment server configures the corpus input by the user as a standard corpus.

For example, the speech quality evaluation server receives a corpus input by a user through a USB flash drive or the like, and saves the corpus as the standard corpus.

In another possible implementation, the speech quality assessment server pre-stores a plurality of corpora, and selects the standard corpora based on a first operation of the user.

For example, the speech quality evaluation server stores a plurality of corpora in advance, and the user selects the standard corpus by clicking the identifier or the file name of one of them.

Optionally, the voice quality assessment server may also configure other parameters of the MOS algorithm model. Other parameters of the MOS algorithm model may include, but are not limited to, one or more of the following: bandwidth information, coding scheme.

Illustratively, based on user input, the bandwidth information is configured as 12.2 kilohertz (kHz), and the coding scheme is configured as adaptive multi-rate (AMR) coding.

If no other parameters of the MOS algorithm model are configured, the system default parameters are adopted to perform the evaluation calculation.
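The fallback to system defaults can be sketched as a simple merge. The parameter keys and the placeholder default values below are assumptions; the application does not state what the system defaults actually are.

```python
from typing import Dict, Optional

# Placeholder defaults: the real system default values are not given in the text.
SYSTEM_DEFAULTS: Dict[str, object] = {
    "bandwidth": "system-default",
    "coding_scheme": "system-default",
}


def resolve_mos_parameters(
        user_params: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    """Overlay user-configured parameters on the system defaults."""
    resolved = dict(SYSTEM_DEFAULTS)
    for key, value in (user_params or {}).items():
        if key not in SYSTEM_DEFAULTS:
            raise KeyError(f"unknown MOS parameter: {key}")
        if value is not None:  # None means "not configured", so keep the default
            resolved[key] = value
    return resolved
```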

S504, the voice quality assessment server sends a first instruction to the first terminal equipment.

The first terminal equipment is any calling equipment.

In one possible implementation, S504 may be implemented as: the voice quality assessment server sends a first instruction to the first terminal device to instruct the first terminal device to initiate a call to the second terminal device.

It should be noted that, when the speech quality evaluation server and the first terminal device negotiate in advance and save the standard corpus, in other words, when the first terminal device can determine the standard corpus, the speech quality evaluation server only sends the first instruction to the first terminal device.

S505, the first terminal equipment receives a first instruction sent by the voice quality evaluation server.

The first instruction received by the first terminal device in S505 is the first instruction sent by the voice quality assessment server in S504.

Optionally, when the first terminal cannot determine the standard corpus, the voice quality evaluation method provided in the embodiment of the present application may further include S504A and S505A.

S504A, the voice quality evaluation server sends the standard corpus to the first terminal equipment or sends the identification of the standard corpus so that the first terminal equipment can determine the standard corpus.

The execution timing of S504A may be configured according to actual requirements, which is not limited in the embodiment of the present application.

For example, S504A may be performed after S504, or S504A may be performed before S504, or simultaneously with S504.

S504A may be implemented as: the voice quality evaluation server adopts a network transmission or offline copying mode to send the standard corpus to the first terminal equipment, or adopts a network transmission or offline copying mode to send the identification of the standard corpus, so that the first terminal equipment determines the standard corpus.

S505A, the first terminal equipment receives standard corpus transmitted by the voice quality evaluation server or identification of the standard corpus.

Wherein, the content received by the first terminal device in S505A is the content sent by the voice quality assessment server in S504A.

S506, the first terminal equipment initiates a call to the second terminal.

Specifically, the first terminal device initiates a call to the second terminal according to the indication of the first instruction.

Illustratively, the first terminal device dials the call number of the second terminal device according to the indication of the first instruction, and initiates the call.

S507, the second terminal equipment is connected with the call initiated by the first terminal equipment.

Illustratively, the user clicks or slides the relevant location in the second terminal device, and the second terminal device responds to the clicking or sliding operation of the user to switch on the call initiated by the first terminal device.

S508, the first terminal device transmits the standard corpus to the second terminal device through conversation.

Specifically, the first terminal device encodes the standard corpus, loads the encoded corpus as the payload into the content part of a Real-time Transport Protocol (RTP) packet, adds the RTP packet header, and transmits the encapsulated RTP packet to the second terminal device.

The specific coding mode may be configured according to actual requirements, which is not specifically limited in the embodiments of the present application. For example, the coding scheme may be AMR.
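The encapsulation step can be illustrated with the fixed 12-byte RTP header (version 2, no padding, no extension, no CSRC entries). This is a sketch, not the terminal's actual implementation: the function name `build_rtp_packet` is hypothetical, the payload is assumed to be already encoded, and payload type 96 is merely a value from the dynamic range picked for illustration.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 96, marker: bool = False) -> bytes:
    """Prefix an encoded payload with a minimal fixed RTP header."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = RTP_VERSION << 6                    # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type   # marker bit + payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,              # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,    # 32-bit media timestamp
                         ssrc & 0xFFFFFFFF)         # 32-bit source identifier
    return header + payload
```

The receiving side can recover the payload by skipping the first 12 bytes when no CSRC entries or header extension are present.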

S509, the second terminal equipment collects and stores call record information.

Specifically, the second terminal device collects all RTP frames sent by the first terminal device as the received corpus, and collects the connection timestamp of the call and the release timestamp (hang-up timestamp) of the call; the second terminal device uses these items as the recording information of the call.

For example, the recording information of the call may be stored in the form of table 1.

TABLE 1

It should be noted that table 1 is only an exemplary format for storing call log information, and should not be construed as a unique limitation of the stored format.

Optionally, the record information of the call may further include priority of the call.

For example, the call log information may also be stored in table 2.

TABLE 2

It should be noted that table 2 is only an exemplary format for storing the recording information of the call, and should not be construed as a unique limitation to the stored format.

For example, the recording information of the call may be saved in the form of a log file.
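One way to picture such a log file is one JSON line per call. The layout below is purely an assumption for illustration: the key names, the base64 encoding of the corpus, and the optional `priority` field are not taken from table 1 or table 2.

```python
import base64
import json
from typing import Optional


def record_to_log_line(call_id: str, corpus: bytes, connect_ts: float,
                       release_ts: float, priority: Optional[int] = None) -> str:
    """Serialize one call's recording information as a single JSON line."""
    entry = {
        "call_id": call_id,
        "corpus_b64": base64.b64encode(corpus).decode("ascii"),
        "connect_ts": connect_ts,
        "release_ts": release_ts,
    }
    if priority is not None:  # priority is optional recording information
        entry["priority"] = priority
    return json.dumps(entry, sort_keys=True)


def log_line_to_record(line: str) -> dict:
    """Parse a log line back into a dict with the corpus as raw bytes."""
    entry = json.loads(line)
    entry["corpus"] = base64.b64decode(entry.pop("corpus_b64"))
    return entry
```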

S510, the second terminal equipment sends call record information to the voice quality evaluation server.

Specifically, the second terminal device may send the recording information of the call to the voice quality assessment server through network transmission or offline copying or other modes.

It can be understood that, if the voice quality of a plurality of calls needs to be evaluated, S503 to S510 are executed multiple times to obtain the record information of the plurality of calls, and then the record information of the plurality of calls is sent to the voice quality evaluation server through network transmission or offline copying, so that the voice quality evaluation server calculates the voice quality values of the plurality of calls.

The voice quality assessment method provided by the application is briefly described below through specific embodiments.

Assume that the voice quality of calls in 31 provinces needs to be tested and that each province has one test team. The 31 test teams each obtain one log file through S503 to S510, giving 31 log files in total; each log file includes the recording information of one call. The 31 test teams then copy the 31 log files to the voice quality assessment server for assessment of voice quality.

Assume that the voice quality evaluation server supports parallel evaluation of the voice quality of 5 calls, that is, the recording information of 5 calls can be input in parallel.

Specifically, the voice quality evaluation server distributes the recording information of the 31 calls in the 31 log files to 5 voice quality evaluation queues. Voice quality evaluation queue 1 includes the recording information of calls 1 to 7; voice quality evaluation queue 2 includes the recording information of calls 8 to 13; voice quality evaluation queue 3 includes the recording information of calls 14 to 19; voice quality evaluation queue 4 includes the recording information of calls 20 to 25; and voice quality evaluation queue 5 includes the recording information of calls 26 to 31.

The voice quality evaluation server inputs the first call of each queue into the MOS algorithm model in parallel: the recording information of call 1 into the input channel corresponding to voice quality evaluation queue 1; the recording information of call 8 into the input channel corresponding to voice quality evaluation queue 2; the recording information of call 14 into the input channel corresponding to voice quality evaluation queue 3; the recording information of call 20 into the input channel corresponding to voice quality evaluation queue 4; and the recording information of call 26 into the input channel corresponding to voice quality evaluation queue 5. The MOS algorithm model calculates in parallel the voice quality values of call 1, call 8, call 14, call 20, and call 26.

Then, the voice quality evaluation server inputs the recording information of the next call in each of the 5 voice quality evaluation queues into the 5 input channels of the MOS algorithm model and calculates in parallel, polling in sequence until the voice quality values of all 31 calls are calculated; further, the average of the voice quality values of the calls across the 31 provinces is output.
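The queue layout and polling rounds of this example can be reproduced with a short sketch. The contiguous split used here is inferred from the call numbers listed in the example; the function names `contiguous_queues` and `polling_rounds` are hypothetical, and no voice quality values are computed.

```python
from typing import List


def contiguous_queues(m: int, n: int) -> List[List[int]]:
    """Split calls 1..m into n contiguous queues; earlier queues take the remainder."""
    base, extra = divmod(m, n)
    queues, next_call = [], 1
    for i in range(n):
        size = base + (1 if i < extra else 0)
        queues.append(list(range(next_call, next_call + size)))
        next_call += size
    return queues


def polling_rounds(queues: List[List[int]]) -> List[List[int]]:
    """Round r holds the r-th call of every queue that still has one."""
    depth = max(len(q) for q in queues)
    return [[q[r] for q in queues if r < len(q)] for r in range(depth)]


queues = contiguous_queues(31, 5)
rounds = polling_rounds(queues)
print([len(q) for q in queues])  # queue sizes
print(rounds[0])                 # calls evaluated in parallel in the first round
print(len(rounds))               # number of polling rounds
```

With M = 31 and N = 5 this yields queue sizes 7, 6, 6, 6, 6, a first round made up of calls 1, 8, 14, 20 and 26, and seven rounds in total, the last of which contains only call 7.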

The scheme provided by the embodiment of the invention is mainly introduced from the perspective of the implementation principle of interaction between the voice quality evaluation server and the terminal equipment in the network. It will be appreciated that the voice quality assessment server, in order to achieve the above-described functions, comprises corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The embodiment of the invention can divide the functional modules of the voice quality evaluation device and the like according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present invention, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.

Fig. 7 shows a voice quality assessment device 70 provided in the embodiment of the present application, in the case where the functional modules are divided according to their corresponding functions, for realizing the functions of the voice quality assessment server in the above embodiment. The voice quality assessment device 70 may be a voice quality assessment server; alternatively, the voice quality assessment device 70 may be deployed at a voice quality assessment server. As shown in fig. 7, the voice quality assessment device 70 may include: a receiving unit 701 and a processing unit 702. The receiving unit 701 is configured to perform S501 in fig. 5 or 6; the processing unit 702 is configured to execute S502 in fig. 5 or 6. All relevant content of each step in the above method embodiment may be cited in the functional description of the corresponding functional module, and is not repeated here.

In the case of using an integrated unit, fig. 8 shows a voice quality assessment server 80 provided in an embodiment of the present application, for implementing the function of the voice quality assessment server in the above method. The speech quality assessment server 80 may comprise at least one processing module 801 for implementing the functionality of the speech quality assessment server in the embodiments of the present application. For example, the processing module 801 may be used to perform the process S502 in fig. 5 or fig. 6, specifically refer to the detailed description in the method example, which is not described herein.

The speech quality assessment server 80 may also include at least one memory module 802 for storing program instructions and/or data. The memory module 802 is coupled to the processing module 801. The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules, which may be in electrical, mechanical, or other forms for information interaction between the devices, units, or modules. The processing module 801 may cooperate with the storage module 802. The processing module 801 may execute program instructions stored in the storage module 802. At least one of the at least one memory module may be included in the processing module.

The speech quality assessment server 80 may also include a communication module 803 for communicating with other devices via a transmission medium, so that the speech quality assessment server 80 can communicate with other devices. Illustratively, the processing module 801 may perform the process S501 of fig. 5 or 6 using the communication module 803.

In actual implementation, the receiving unit 701 and the processing unit 702 may be implemented by the processor 41 shown in fig. 4 calling the program code in the memory 42, or by the processor 41 shown in fig. 4 through the communication interface 43. For the specific implementation procedure, refer to the description of the voice quality assessment method shown in fig. 5 or fig. 6, which is not repeated here.

As described above, the voice quality assessment device 70 or the voice quality assessment server 80 provided in the embodiments of the present application may be used to implement the functions of the voice quality assessment server in the methods of the embodiments of the present application. For convenience of explanation, only the portions relevant to the embodiments of the present application are shown; for specific technical details that are not disclosed, refer to the method embodiments of the present application.

Further embodiments of the present application provide a voice quality assessment system, where a voice quality assessment device and a terminal device may be included in the system, where the voice quality assessment device may implement the function of the voice quality assessment server in the foregoing embodiments, for example, the voice quality assessment device may be the voice quality assessment server described in the embodiments of the present application.

Further embodiments of the present application provide a chip system, which includes a processor and may further include a memory, for implementing the functions of the voice quality assessment server in the embodiments shown in fig. 5 or fig. 6. The chip system may be formed of a chip or may include a chip and other discrete devices.

Still further embodiments of the present application provide a computer-readable storage medium, which may include a computer program that, when executed on a computer, causes the computer to perform the steps performed by the speech quality assessment server in the embodiments of fig. 5 or 6 described above.

Still further embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the steps performed by the speech quality assessment server in the embodiments of fig. 5 or 6 described above.

From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium capable of storing program code.

The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A voice quality assessment method, the method being applied to a voice quality assessment server, the method comprising:

receiving recording information of M calls acquired by terminal equipment; the recording information of the call comprises: the call corpus of the call, the call connection time stamp and the call release time stamp; m is greater than or equal to 1;

inputting the record information of M calls into a mean opinion score MOS algorithm model, and calculating to obtain voice quality values of the M calls; the MOS algorithm model supports parallel calculation of voice quality values of N calls; the N is greater than or equal to 1;

wherein, if the M is greater than the N, the N is greater than 1; inputting the recorded information of the M calls into a MOS algorithm model, and calculating to obtain voice quality values of the M calls, including:

distributing the recording information of the M calls to N voice quality evaluation queues;

and inputting the recording information of the calls included in the N voice quality evaluation queues into the MOS algorithm model sequentially and in parallel, and calculating to obtain the voice quality values of the M calls.

2. The method of claim 1, wherein the step of inputting the recording information of the calls included in the N voice quality evaluation queues into a MOS algorithm model sequentially in parallel, and calculating to obtain the voice quality values of the M calls includes:

for a first voice quality evaluation queue, serially inputting the record information of one or more calls in the first voice quality evaluation queue into a MOS algorithm model according to a preset sequence, and calculating to obtain voice quality values of the one or more calls in the first voice quality evaluation queue; the first speech quality assessment queue is any one of the N speech quality assessment queues.

3. The method of claim 2, wherein the preset sequence comprises: descending order of call priority.

4. A voice quality assessment apparatus, the apparatus deployed at a voice quality assessment server, the apparatus comprising:

a receiving unit, configured to receive the recording information of M calls acquired by terminal equipment; the recording information of the call comprises: the call corpus of the call, the call connection time stamp and the call release time stamp; M is greater than or equal to 1;

a processing unit, configured to input the recording information of the M calls into a mean opinion score MOS algorithm model, and calculate to obtain voice quality values of the M calls; the MOS algorithm model supports parallel calculation of voice quality values of N calls; the N is greater than or equal to 1;

wherein, if the M is greater than the N, the N is greater than 1; the processing unit is specifically configured to:

5. The apparatus of claim 4, wherein the step of inputting the recording information of the calls included in the N voice quality evaluation queues into a MOS algorithm model sequentially in parallel, and calculating to obtain the voice quality values of M calls includes:

6. The apparatus of claim 5, wherein the preset sequence comprises: descending order of call priority.

7. A voice quality assessment server, characterized in that the voice quality assessment server comprises: a processor, a memory; the processor and the memory being coupled, the memory being for storing computer program code comprising computer instructions which, when executed by the speech quality assessment server, cause the speech quality assessment server to perform the speech quality assessment method according to any one of claims 1-3.

8. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the speech quality assessment method according to any one of claims 1-3.

9. A speech quality assessment system, characterized in that the system comprises a speech quality assessment server and a terminal device; wherein the speech quality assessment server is adapted to perform the speech quality assessment method of any of the preceding claims 1-3.