CN117292711A

CN117292711A - Voice equipment testing method, device and equipment

Info

Publication number: CN117292711A
Application number: CN202210690285.0A
Authority: CN
Inventors: 王哲; 李航; 周洪亮
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2022-06-17
Filing date: 2022-06-17
Publication date: 2023-12-26

Abstract

The embodiment of the application provides a test method, a test device and test equipment for voice equipment, and relates to the technical field of artificial intelligence. The method comprises the following steps: the test equipment generates a plurality of text instructions according to at least one instruction template corresponding to the voice equipment, converts the plurality of text instructions into a plurality of voice instructions, plays the plurality of voice instructions to the voice equipment, and acquires execution information corresponding to each voice instruction from the voice equipment or a server corresponding to the voice equipment. The execution information corresponding to each voice instruction comprises at least one of the following: the response result of the voice equipment to the voice instruction, the response time delay of the voice equipment to the voice instruction, and the resource occupation information when the voice equipment executes the voice instruction, wherein the response result is response or non-response. Further, the stability test result of the voice equipment is determined according to the execution information corresponding to the plurality of voice instructions. Through this process, the test equipment automatically performs the stability test on the voice equipment, and the test efficiency is improved.

Description

Voice equipment testing method, device and equipment

Technical Field

The embodiment of the application relates to the technical field of artificial intelligence, in particular to a test method, a test device and test equipment for voice equipment.

Background

With the development of technology, more and more voice devices are applied to people's lives. The user can interact with the voice equipment in a voice mode, so that the user experience is improved.

Before the voice equipment leaves the factory, a stability test needs to be performed on it. The stability test of voice equipment is generally performed manually: a tester issues a voice command to the voice equipment and observes how the voice equipment responds to the voice command. The tester repeats this test process and determines whether the stability test is passed according to the observation results of the multiple tests.

However, in the process of realizing the present application, the inventor found that the above test process is time-consuming and labor-intensive, and that its test efficiency is low.

Disclosure of Invention

The embodiment of the application provides a test method, a test device and test equipment for voice equipment, which are used for improving the test efficiency of the voice equipment.

In a first aspect, an embodiment of the present application provides a method for testing a voice device, which is applied to a testing device, where the method includes:

generating a plurality of text instructions according to at least one instruction template corresponding to the voice equipment;

converting the plurality of text instructions into a plurality of voice instructions, and playing the plurality of voice instructions to the voice equipment;

acquiring execution information corresponding to each of the plurality of voice instructions from the voice equipment or a server corresponding to the voice equipment, wherein the execution information corresponding to each voice instruction comprises at least one of the following: the response result of the voice equipment to the voice instruction, the response time delay of the voice equipment to the voice instruction, and the resource occupation information when the voice equipment executes the voice instruction, wherein the response result is response or non-response;

and determining a stability test result of the voice equipment according to the execution information corresponding to each of the voice instructions.

In one possible implementation manner, the execution information corresponding to each voice instruction includes: the response result of the voice equipment to the voice instruction, the response time delay of the voice equipment to the voice instruction and the resource occupation information when the voice equipment executes the voice instruction;

determining a stability test result of the voice equipment according to the execution information corresponding to each of the voice instructions, wherein the stability test result comprises:

determining, according to the response result of the voice equipment to each voice instruction, a first number of voice instructions whose response result is response; determining the response rate of the voice equipment according to the first number and the total number of the voice instructions;

determining the average response time delay of the voice equipment according to the response time delay of the voice equipment to each voice instruction;

determining the average resource occupancy rate of the voice equipment according to the resource occupancy information when the voice equipment executes each voice instruction;

and determining a stability test result of the voice equipment according to the response rate, the average response time delay and the average resource occupancy rate.

In a possible implementation manner, determining a stability test result of the voice device according to the response rate, the average response delay and the average resource occupancy, includes:

if the response rate is larger than or equal to a preset response rate, the average response time delay is smaller than or equal to a preset response time delay, and the average resource occupancy rate is smaller than or equal to a preset resource occupancy rate, determining that a stability test result of the voice equipment is test passing;

if at least one of the following is satisfied: the response rate is smaller than the preset response rate, the average response time delay is larger than the preset response time delay, or the average resource occupancy rate is larger than the preset resource occupancy rate, determining that the stability test result of the voice equipment is test failure.
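The pass/fail determination described above can be sketched as follows. This is a minimal illustration only: the `ExecutionInfo` type, its field names, the units, and the choice to average the delay over all instructions (including non-responses) are assumptions, not details given in the application.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ExecutionInfo:
    """Execution information for one voice instruction (hypothetical field names)."""
    responded: bool        # response result: response (True) or non-response (False)
    delay_s: float         # response time delay, assumed to be in seconds
    resource_usage: float  # resource occupancy during execution, assumed fraction 0..1


def evaluate_stability(infos, min_response_rate, max_avg_delay_s, max_avg_usage):
    """Return (passed, response_rate, avg_delay, avg_usage) for a non-empty list."""
    # "First number": count of voice instructions whose response result is response.
    first_number = sum(1 for info in infos if info.responded)
    response_rate = first_number / len(infos)
    # Assumption: averages are taken over every instruction in the list.
    avg_delay = mean(info.delay_s for info in infos)
    avg_usage = mean(info.resource_usage for info in infos)
    # The test passes only when all three preset thresholds are met.
    passed = (
        response_rate >= min_response_rate
        and avg_delay <= max_avg_delay_s
        and avg_usage <= max_avg_usage
    )
    return passed, response_rate, avg_delay, avg_usage
```

Failing any single threshold is enough to fail the test, which matches the "at least one of the following" wording of the failure condition.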

In a possible implementation manner, generating a plurality of text instructions according to at least one instruction template corresponding to the voice device includes:

analyzing each instruction template to obtain the identification of a plurality of instruction units, the attributes of the plurality of instruction units and the sequence among the plurality of instruction units; the attributes of each instruction unit include optional or mandatory;

acquiring at least one candidate keyword corresponding to each of the plurality of instruction units according to the identifiers of the plurality of instruction units;

and generating a plurality of text instructions according to the attributes of the plurality of instruction units, the sequence among the plurality of instruction units and at least one candidate keyword corresponding to each of the plurality of instruction units.

In a possible implementation manner, generating a plurality of text instructions according to attributes of the plurality of instruction units, an order among the plurality of instruction units, and at least one candidate keyword corresponding to each of the plurality of instruction units includes:

generating a plurality of different keyword combinations according to at least one candidate keyword corresponding to each of the plurality of instruction units; each keyword combination comprises: one candidate keyword corresponding to each of the plurality of instruction units;

and for each keyword combination, performing splicing processing on the candidate keywords in the keyword combination according to the attributes of the plurality of instruction units and the sequence among the plurality of instruction units, to obtain a plurality of text instructions.

In a possible implementation manner, before playing the plurality of voice instructions to the voice device, the method further includes:

acquiring a preset environmental noise model;

and carrying out noise adding processing on the voice instructions through the environment noise model so as to obtain a plurality of voice instructions containing noise.

In a possible implementation manner, playing the plurality of voice instructions to the voice device includes:

playing an ith voice instruction to the voice equipment;

starting a first timer, wherein the duration of the first timer is a first preset duration or a random duration;

after the first timer expires, returning to the step of playing the i-th voice instruction to the voice equipment;

wherein i takes the values 1, 2, 3, ..., N in sequence, and N is the number of the voice instructions.

In one possible implementation, each voice instruction includes: wake-up statements and command statements; playing an ith voice instruction to the voice equipment, including:

playing the wake-up statement in the ith voice instruction to the voice equipment;

starting a second timer, wherein the duration of the second timer is a second preset duration or a random duration;

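and after the second timer is finished, playing the command statement in the ith voice instruction to the voice equipment.

In a possible implementation manner, playing the plurality of voice instructions to the voice device includes:

controlling a simulated mouth device to play the plurality of voice instructions to the voice equipment.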

In a possible implementation manner, obtaining execution information corresponding to each of the plurality of voice instructions includes:

determining a start playing time and an end playing time corresponding to the voice commands;

transmitting request information to a server, the request information comprising: the starting playing time, the ending playing time and the identification of the voice equipment;

and receiving execution information corresponding to each of the voice instructions from the server.
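As an illustration of the request information described above, the following sketch serializes the three named fields into a JSON string. The JSON encoding, the key names and the ISO 8601 timestamps are assumptions made for this example; the application does not specify a wire format or how the request is transmitted.

```python
import json
from datetime import datetime, timezone


def build_request_info(start_play_time, end_play_time, device_id):
    """Serialize the start playing time, end playing time and device identification.

    The key names and the JSON/ISO 8601 encoding are hypothetical.
    """
    if end_play_time < start_play_time:
        raise ValueError("end playing time precedes start playing time")
    return json.dumps({
        "start_play_time": start_play_time.isoformat(),
        "end_play_time": end_play_time.isoformat(),
        "device_id": device_id,
    })


# Example: a test run that played voice instructions for one hour.
request_info = build_request_info(
    datetime(2022, 6, 17, 8, 0, 0, tzinfo=timezone.utc),
    datetime(2022, 6, 17, 9, 0, 0, tzinfo=timezone.utc),
    "device-001",  # hypothetical device identification
)
```

With the playing window and the device identification, a server could select the execution records that fall inside the test run for that device.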

In a possible implementation manner, after determining the stability test result of the voice device according to the execution information corresponding to each of the plurality of voice instructions, the method further includes:

displaying the stability test result; or,

and sending the stability test result to preset equipment.

In a second aspect, an embodiment of the present application provides a test apparatus for a voice device, deployed in a test device, where the apparatus includes:

the generation module is used for generating a plurality of text instructions according to at least one instruction template corresponding to the voice equipment;

the processing module is used for converting the text instructions into voice instructions;

the playing module is used for playing the voice instructions to the voice equipment;

the acquisition module is configured to acquire execution information corresponding to each of the plurality of voice instructions from the voice device or a server corresponding to the voice device, where the execution information corresponding to each voice instruction includes at least one of the following: the response result of the voice equipment to the voice instruction, the response time delay of the voice equipment to the voice instruction, and the resource occupation information when the voice equipment executes the voice instruction, wherein the response result is response or non-response;

and the determining module is used for determining a stability test result of the voice equipment according to the execution information corresponding to each of the voice instructions.

In a third aspect, an embodiment of the present application provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program for execution by the at least one processor to implement the method of any of the first aspects.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored therein, the computer program being executed by a processor to implement the method according to any of the first aspects.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the first aspects.

The embodiment of the application provides a test method, a test device and test equipment for voice equipment, wherein the test equipment generates a plurality of text instructions according to at least one instruction template corresponding to the voice equipment, converts the plurality of text instructions into a plurality of voice instructions, plays the plurality of voice instructions to the voice equipment, and acquires execution information corresponding to each voice instruction from the voice equipment or a server corresponding to the voice equipment. The execution information corresponding to each voice instruction comprises at least one of the following: the response result of the voice equipment to the voice instruction, the response time delay of the voice equipment to the voice instruction, and the resource occupation information when the voice equipment executes the voice instruction, wherein the response result is response or non-response. Further, the stability test result of the voice equipment is determined according to the execution information corresponding to the plurality of voice instructions. Through this process, the test equipment automatically performs the stability test on the voice equipment, and the test efficiency is improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from these drawings without inventive effort.

Fig. 1 is a schematic diagram of a test scenario provided in an embodiment of the present application;

Fig. 2 is a flow chart of a testing method of a voice device according to an embodiment of the present application;

Fig. 3 is a flow chart of another testing method of a voice device according to an embodiment of the present application;

Fig. 4 is a schematic diagram of another test scenario provided in an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a test device of a voice device according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be capable of operation in sequences other than those illustrated or described herein, for example. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The embodiment of the application provides a test method, a test device and test equipment for voice equipment, which relate to the technical field of test and are used for automatically testing the stability of the voice equipment.

In the embodiment of the present application, the voice device refers to an electronic device supporting a voice interaction function, including but not limited to: smart phones, smart speakers, smart televisions, smart earphones, smart watches, smart glasses, smart refrigerators, smart vehicle-mounted devices, intelligent robots, intelligent voice assistants, etc.

The interaction process between the user and the voice equipment is as follows: the user inputs a voice command to the voice device, and the voice device processes the voice command and outputs a response corresponding to the voice command to the user. The processing of the voice command by the voice equipment comprises the following steps: performing voice recognition processing on the voice command to obtain a text corresponding to the voice command; performing semantic analysis processing on the text to determine the intention of the user; and generating response information according to the intention and playing the response information. For example, taking a smart speaker as the voice device, the user says the voice command "play the children's song Pulling the Radish" to the smart speaker; the smart speaker plays the voice response "OK, playing Pulling the Radish for you" and starts playing the children's song Pulling the Radish.

In some examples, some or all of the processing of voice instructions described above may be implemented by a server, subject to the limitations of the processing power and/or storage power of the voice device. For example, after the voice device receives the voice command, the voice command may be directly sent to the server; the server executes the processing procedure of the voice command to obtain response information; the server sends the response information to the voice device, which plays the response information. For another example, after the voice device receives the voice command, the voice device performs voice recognition processing on the voice command to obtain a text corresponding to the voice command, and sends the text to the server. The server analyzes the text, determines the intention of the user, generates response information according to the intention of the user, and sends the response information to the voice equipment. After receiving the response information, the voice equipment plays the response information.

In order to facilitate understanding of the technical solution of the present application, an application scenario related to an embodiment of the present application is described below with reference to fig. 1.

Fig. 1 is a schematic diagram of a test scenario provided in an embodiment of the present application. As shown in fig. 1, the test scenario includes: test equipment, voice equipment and a server. Wherein the voice device is the object to be tested. Test devices are electronic devices with some computing power, including but not limited to: a mobile phone, a notebook computer, a tablet computer, a desktop computer, a server, etc. The test equipment is used for testing the stability of the voice equipment. The server is used for providing services for the voice equipment, and can be a common server or a cloud server.

Referring to fig. 1, the test device may simulate a user outputting a voice command to the voice device, and the voice device processes the voice command and outputs a response corresponding to the voice command. The process of the voice device processing the voice command may relate to interaction between the voice device and the server, and the specific interaction process may refer to the foregoing related description, which is not repeated herein. In the embodiment of the application, the test device can continuously output a plurality of voice instructions to the voice device so as to realize stability test of the voice device.

Further, in the embodiment of the present application, the test device may further obtain execution information of each voice command by the voice device, and determine a stability test result of the voice device according to the execution information of each voice command by the voice device.

Therefore, in the embodiment of the application, the stability test is automatically performed on the voice equipment through the test equipment, so that the test efficiency can be improved, and the stability test process of the voice equipment does not need to consume a large amount of manpower and time, so that the manpower cost and the time cost are reduced.

The technical scheme of the present application is described in detail below with specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

Fig. 2 is a flow chart of a testing method of a voice device according to an embodiment of the present application. The method of the present embodiment may be performed by the test apparatus of fig. 1. As shown in fig. 2, the method of the present embodiment includes:

S201: generating a plurality of text instructions according to at least one instruction template corresponding to the voice equipment.

In this embodiment, the text instruction refers to an instruction expressed in text form. The text instruction includes a plurality of instruction units, for example, each word in the text instruction may be referred to as an instruction unit. The instruction templates are used for describing format characteristics satisfied by each instruction unit in the text instruction. The instruction templates may be in the form of regular expressions, for example, or may be in other forms.

As one example, an instruction template may be as follows:

[prefix word] (open) <device name> [switch]

The instruction template comprises 5 instruction units, wherein "open" in parentheses and "device name" in angle brackets are mandatory instruction units, and "prefix word" and "switch" in square brackets are optional instruction units.

In one possible implementation, a plurality of text instructions may be generated in the following manner:

(1) Parsing each instruction template to obtain the identifiers of a plurality of instruction units, the attributes of the plurality of instruction units and the order among the plurality of instruction units, wherein the attribute of each instruction unit is optional or mandatory.

For example, the instruction template above is parsed to obtain the identifiers of 5 instruction units, including "prefix word", "open", "device name" and "switch". The attributes of the instruction units "prefix word" and "switch" are optional, and the attributes of the instruction units "open" and "device name" are mandatory. The order among the plurality of instruction units is: "prefix word", "open", "device name", "switch".

(2) Acquiring at least one candidate keyword corresponding to each of the plurality of instruction units according to the identifiers of the plurality of instruction units.

For example, candidate keywords corresponding to the instruction unit "prefix word" include, but are not limited to: please, want, fast, help me, replace me, please me, fast give me, want me want, and the like. The candidate keyword corresponding to the instruction unit "device name" is an identification of the voice device, for example, air conditioner 1, living room air conditioner, and the like. The candidate keywords corresponding to the instruction unit "open" include, but are not limited to: open, turn on, start, etc.

(3) Generating a plurality of text instructions according to the attributes of the plurality of instruction units, the order among the plurality of instruction units and at least one candidate keyword corresponding to each of the plurality of instruction units.

For example, a plurality of different keyword combinations may be generated according to at least one candidate keyword corresponding to each of the plurality of instruction units, where each keyword combination includes one candidate keyword corresponding to each of the plurality of instruction units.

For example, taking the instruction template as an example, candidate keywords corresponding to 5 instruction units are arranged and combined to obtain a plurality of keyword combinations, for example, some of the keyword combinations are as follows:

{ please, open, air conditioner 1, switch }

{ please, turn on, air conditioner 1, switch }

{ please, start, air conditioner 1, switch }

{ want, open, air conditioner 1, switch }

{ want, turn on, air conditioner 1, switch }

{ want, start, air conditioner 1, switch }

{ help me, open, air conditioner 1, switch }

{ help me, turn on, air conditioner 1, switch }

{ help me, start, air conditioner 1, switch }

……

For each keyword combination, splicing processing is performed on the candidate keywords in the keyword combination according to the attributes of the plurality of instruction units and the order among the plurality of instruction units, to obtain a plurality of text instructions.

For example, for the keyword combination { please, open, air conditioner 1, switch }, the following text instructions may be obtained according to the attributes of the plurality of instruction units and the order among the plurality of instruction units: "open air conditioner 1", "please open air conditioner 1", "open the switch of air conditioner 1", and "please open the switch of air conditioner 1".

In this embodiment, a text instruction is generated according to an instruction template corresponding to a voice device, so that the generated text instruction can cover various possible instruction expression modes, the richness of the text instruction for testing the voice device can be improved, and the accuracy of a test result of the voice device is further improved.
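Steps (1) to (3) above can be sketched as follows. This is a simplified illustration under stated assumptions: the template syntax (square brackets for optional units, parentheses or angle brackets for mandatory units), the function names and the space separator are all hypothetical, and the sketch merges "build keyword combinations" and "splice" into one pass by treating the omission of an optional unit as an extra choice. It does not insert connecting words such as "the ... of".

```python
import re
from itertools import product

# Assumed syntax: [...] marks an optional unit, (...) or <...> a mandatory one.
_UNIT_PATTERN = re.compile(r"\[([^\]]+)\]|\(([^)]+)\)|<([^>]+)>")


def parse_template(template):
    """Return the ordered list of (identifier, mandatory) pairs in the template."""
    units = []
    for match in _UNIT_PATTERN.finditer(template):
        optional_id, paren_id, angle_id = match.groups()
        if optional_id is not None:
            units.append((optional_id.strip(), False))
        else:
            units.append(((paren_id or angle_id).strip(), True))
    return units


def generate_text_instructions(template, candidates, separator=" "):
    """Splice candidate keywords in template order, omitting optional units in turn."""
    choices = []
    for identifier, mandatory in parse_template(template):
        options = list(candidates[identifier])
        if not mandatory:
            options.append(None)  # None means "leave this optional unit out"
        choices.append(options)
    instructions = []
    for combination in product(*choices):
        text = separator.join(word for word in combination if word is not None)
        if text not in instructions:
            instructions.append(text)
    return instructions


# Hypothetical candidate keywords, one per instruction unit for brevity.
candidates = {
    "prefix word": ["please"],
    "open": ["open"],
    "device name": ["air conditioner 1"],
    "switch": ["switch"],
}
instructions = generate_text_instructions(
    "[prefix word] (open) <device name> [switch]", candidates
)
```

With one candidate per unit and two optional units, the sketch yields four text instructions; adding candidates multiplies the count, which is how a single template can cover many phrasings.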

In some possible examples, the voice device may need to be woken up with a specified wake-up statement before it can be interacted with. Thus, each text instruction may include: a wake-up statement and a command statement. The wake-up statement is used for waking up the voice device, and the command statement is used for instructing the voice device to execute a certain command.

By way of example, a plurality of text instructions may be as shown in Table 1.

TABLE 1

Sequence number of text instruction	Wake-up statement	Command statement
1	Small A	Open air conditioner 1
2	Small A	Please open air conditioner 1
3	Small A	Open the switch of air conditioner 1
……	……	……

S202: converting the text instructions into voice instructions and playing the voice instructions to voice equipment.

In this embodiment, speech synthesis processing may be performed on each text instruction to obtain the corresponding voice instruction. Further, the voice instruction may be played so that the voice device collects the voice instruction.

In a possible implementation, the test device is provided with audio playback means, such as a loudspeaker or the like, by means of which the test device can play voice instructions to the voice device.

In another possible implementation, the test device may be provided with a simulated mouth device, or the test device may be connected to a simulated mouth device. The simulated mouth device may also be referred to as an artificial mouth. The simulated mouth device can simulate a sound source that approximates a real human mouth. In this embodiment, the test device may control the simulated mouth device to play a voice command to the voice device. By controlling the simulated mouth device to play the voice command, the voice command issued by a real user can be simulated more realistically, so that the interaction between the test equipment and the voice equipment is closer to the interaction between a real user and the voice equipment, and the accuracy of the test result can be improved.

In this embodiment, after converting a plurality of text instructions into a plurality of voice instructions, a preset environmental noise model may be further obtained, and noise adding processing is performed on the plurality of voice instructions through the environmental noise model to obtain a plurality of voice instructions containing noise. Further, the voice instructions containing noise are played to the voice device. For example, environmental noise models corresponding to a plurality of different environments may be obtained in advance through statistics. The environmental noise models corresponding to the different environments are respectively applied to each voice instruction to obtain noise-containing voice instructions under the different environments. By adding environmental noise to the voice command, the noise-added voice command better simulates a voice command in a real scene, thereby improving the accuracy of the test result.
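One simple way to realize noise adding is to mix a recorded noise clip into the synthesized speech at a chosen signal-to-noise ratio. The sketch below assumes that the "environmental noise model" can be represented as such a clip of float samples and that an SNR in decibels is the control parameter; neither assumption comes from the application, and the function name `add_noise` is hypothetical.

```python
import math


def add_noise(speech, noise, snr_db):
    """Mix `noise` into `speech` (lists of float samples) at the given SNR in dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat the noise clip so that it covers the whole speech signal.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in tiled) / len(tiled)
    if noise_power == 0:
        return list(speech)  # silent noise clip: nothing to add
    # Scale the noise so that speech_power / scaled_noise_power == 10 ** (snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, tiled)]


# Example: a 440 Hz tone at 16 kHz mixed with a synthetic noise pattern at 10 dB SNR.
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(400)]
noisy = add_noise(speech, noise, snr_db=10)
```

Applying several noise clips (for example living room, kitchen, car) to the same voice instruction would give one noise-containing variant per environment, as the paragraph above describes.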

In this embodiment of the present application, after the test device obtains a plurality of voice commands, the plurality of voice commands may be sequentially played to the voice device.

In one possible implementation, the following steps (1) to (3) may be performed in a loop to implement playing of a plurality of voice commands.

(1) Playing the ith voice instruction to the voice equipment.

(2) Starting a first timer, where the duration of the first timer is a first preset duration (e.g., 10 seconds, 1 minute, 5 minutes, etc.) or a random duration (e.g., a random duration between 7 seconds and N minutes may be set).

(3) After the first timer expires, returning to step (1).

In the loop execution process, i takes the values 1, 2, 3, ..., N in sequence, where N is the number of the plurality of voice instructions acquired in S201.

In this implementation, when playing a plurality of voice commands, a certain time interval is provided between adjacent voice commands, that is, after playing the ith voice command, waiting for a certain time interval to play the (i+1) th voice command. The waiting time interval may be a specific preset duration or a randomly generated duration. The time interval is controlled by a first timer.

By waiting for a certain time interval when playing the adjacent voice instructions, the playing process of the voice instructions is more in accordance with the dialogue characteristics of the real user and the voice equipment, so that the automatic testing process of the voice equipment can more truly simulate the using process of the real user to the voice equipment, and the accuracy of the testing result is improved.
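The playing loop with the first timer can be sketched as follows. The `play` callback stands in for whatever audio output the test equipment uses, and `play_all`, the default random range and the injectable `sleep` and `rng` parameters are hypothetical names introduced for this example. The sketch also assumes that no wait is needed after the last instruction.

```python
import random
import time


def play_all(voice_instructions, play, interval_s=None, random_range_s=(1.0, 5.0),
             sleep=time.sleep, rng=random):
    """Play each instruction in order, waiting for the first timer between them.

    If `interval_s` is given it is used as the preset duration; otherwise a
    random duration is drawn from `random_range_s`. Returns the waits used.
    """
    waits = []
    for index, instruction in enumerate(voice_instructions):
        play(instruction)
        if index == len(voice_instructions) - 1:
            break  # assumption: no timer after the final instruction
        wait = interval_s if interval_s is not None else rng.uniform(*random_range_s)
        sleep(wait)  # first timer
        waits.append(wait)
    return waits
```

Passing `sleep` and `rng` as parameters keeps the loop testable without real delays, for example `play_all(["a", "b"], print, interval_s=10, sleep=lambda s: None)`.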

In one possible implementation, when the voice command includes a wake-up statement and a command statement, the playing process of each voice command may be implemented as follows, taking the i-th voice instruction as an example:

(1) Playing the wake-up statement in the ith voice instruction to the voice equipment.

(2) Starting a second timer, wherein the duration of the second timer is a second preset duration or a random duration.

(3) After the second timer is finished, playing the command statement in the ith voice instruction to the voice equipment.

In this implementation, when each voice command is played, a certain time interval is also provided between the wake-up statement and the command statement, that is, after the wake-up statement is played, the command statement is played again after waiting for a certain time interval. The waiting time interval may be a specific preset duration or a randomly generated duration. The time interval is controlled by a second timer.

By waiting a certain time interval between the wake-up statement and the command statement in each voice command, the playing process of each voice command is more in accordance with the dialogue characteristics of a real user and voice equipment, so that the automatic testing process of the voice equipment can simulate the using process of the real user to the voice equipment more truly, and the accuracy of the testing result is improved.
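The wake-up/command sequence can also be expressed as a list of playback actions. The following sketch is illustrative only; schedule_instruction, gap_s, and the action tuples are hypothetical and do not appear in the application.

```python
import random


def schedule_instruction(wake, command, gap_s=None, min_s=0.5, max_s=2.0, rng=random):
    """Return the actions for one voice instruction: wake-up, wait, command."""
    # Second timer: a preset gap, or a random gap if none is given.
    gap = gap_s if gap_s is not None else rng.uniform(min_s, max_s)
    return [("play", wake), ("wait", gap), ("play", command)]


actions = schedule_instruction("wake-up statement", "command statement", gap_s=1.5)
```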

S203: Obtain, from the voice device or a server corresponding to the voice device, the execution information corresponding to each of the plurality of voice instructions, where the execution information corresponding to each voice instruction includes at least one of the following: a response result of the voice device to the voice instruction, a response delay of the voice device to the voice instruction, and resource occupation information when the voice device executes the voice instruction, where the response result is responsive or non-responsive.

In this embodiment, after the test device plays the voice command to the voice device, the voice device processes the voice command. Further, the test device may obtain execution information of each voice command by the voice device, where the execution information refers to related information for describing an execution condition of the voice command by the voice device.

The execution information corresponding to each voice instruction may include one or more of the following: the response result of the voice equipment to the voice instruction, the response time delay of the voice equipment to the voice instruction and the resource occupation information when the voice equipment executes the voice instruction.

The response result of the voice device to the voice instruction may be responsive or non-responsive. For example, after the test device plays the voice instruction "play the children's song Pulling the Radish" to the voice device, if the voice device does not output any response information, the response result of the voice device to the voice instruction is non-responsive; if the voice device outputs the response information "OK, playing Pulling the Radish for you", the response result of the voice device to the voice instruction is responsive.

The response delay of the voice device to the voice instruction refers to how long after the test device plays the voice instruction the voice device outputs the response information. For example, after the test device plays the voice instruction "play the children's song Pulling the Radish" to the voice device, if the voice device starts outputting the response information "OK, playing Pulling the Radish for you" after 3 seconds, the response delay of the voice device to the voice instruction is 3 seconds.

The resource occupation information when the voice device executes the voice command refers to occupation situations of computing resources, storage resources, network resources and the like of the voice device in the process of executing the voice command by the voice device. The resource occupancy information includes, but is not limited to: central processing unit (Central Processing Unit, CPU) occupancy, memory occupancy, network transmission rate, etc.
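One possible in-memory representation of the execution information is sketched below. The application does not define a record layout, so the class name ExecutionInfo and all field names and units are assumptions made for illustration; optional fields reflect that the execution information may include only some of the items.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionInfo:
    """Execution information for one voice instruction (hypothetical layout)."""
    instruction_id: int
    responded: Optional[bool] = None           # response result: responsive or not
    response_delay_s: Optional[float] = None   # response delay in seconds
    cpu_percent: Optional[float] = None        # CPU occupancy
    memory_percent: Optional[float] = None     # memory occupancy
    network_kbps: Optional[float] = None       # network transmission rate


# The 3-second example from the description, with an assumed CPU reading.
info = ExecutionInfo(instruction_id=1, responded=True,
                     response_delay_s=3.0, cpu_percent=42.0)
```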

In one example, the test device may be communicatively coupled to a voice device. For example, the test device may be connected to the voice device via a data line, or the test device may be connected to the voice device via a network. After receiving each voice command, the voice device automatically records the execution information of the voice command in a log file, for example, whether the voice device responds to the voice command, the response time length, the CPU occupancy rate, the memory occupancy rate, the network transmission rate and the like.

In this case, the test device may acquire execution information of each voice instruction by the voice device from the voice device. For example, the test device may send a request message to the voice device, and after receiving the request message, the voice device sends its own execution information for each voice instruction to the test device. Or the voice equipment actively transmits the execution information of each voice instruction to the test equipment.

Optionally, after each voice command is played, the test device may obtain the execution information corresponding to the voice command from the voice device. Or, the test device may also obtain, from the voice device, the execution information corresponding to all the voice commands at one time after playing all the voice commands. This embodiment is not limited thereto.

In another example, the voice device may upload the log file to the server in real time or periodically, and the test device may obtain the execution information corresponding to each of the plurality of voice instructions from the server. Specifically, the test apparatus may employ the following means:

(1) The testing equipment determines a starting playing time and an ending playing time corresponding to the voice commands. The start playing time may be the playing time of the 1 st voice command, and the end playing time may be the playing time of the last voice command.

(2) The test device sends a request message to the server, where the request message includes: the start playing time, the end playing time, and the identifier of the voice device. After receiving the request message, the server can obtain the log file corresponding to the voice device according to the identifier of the voice device, and obtain, from the log file, the execution information of the voice device for each voice instruction according to the start playing time and the end playing time. Further, the server sends the execution information of the voice device for each voice instruction to the test device.

(3) The test device receives, from the server, the execution information corresponding to each of the plurality of voice instructions.

In the implementation manner, even if the test equipment cannot be directly connected with the voice equipment, the test equipment can acquire the execution information corresponding to each voice instruction from the server, so that the automatic test of various voice equipment can be realized.
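The server-side selection described in step (2) can be sketched as a filter over log entries. The log entry layout, the request fields, and the function select_execution_info are hypothetical; the application does not specify a log format or a transport protocol.

```python
from datetime import datetime


def select_execution_info(log_entries, device_id, start, end):
    """Return the entries of one device whose timestamp lies in [start, end]."""
    return [
        entry for entry in log_entries
        if entry["device_id"] == device_id and start <= entry["time"] <= end
    ]


log = [
    {"device_id": "dev-1", "time": datetime(2022, 6, 17, 10, 0, 0), "responded": True},
    {"device_id": "dev-2", "time": datetime(2022, 6, 17, 10, 0, 5), "responded": True},
    {"device_id": "dev-1", "time": datetime(2022, 6, 17, 10, 0, 30), "responded": False},
    {"device_id": "dev-1", "time": datetime(2022, 6, 17, 11, 0, 0), "responded": True},
]
request = {
    "device_id": "dev-1",
    "start": datetime(2022, 6, 17, 10, 0, 0),
    "end": datetime(2022, 6, 17, 10, 0, 30),
}
selected = select_execution_info(
    log, request["device_id"], request["start"], request["end"])
```

In practice, a margin after the end playing time may be needed so that the log entry for the last voice instruction, which is written after that instruction is played, is not excluded; the application does not address this detail.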

In the two possible implementations, the test device needs to obtain the execution information corresponding to each voice instruction from the voice device or from the server. In still other possible implementations, the test device may also obtain the execution information corresponding to each voice instruction by monitoring the response information of the voice device by itself. For example, after the test device plays each voice command to the voice device, the test device starts to monitor the response information of the voice device, and if the response information is monitored within the preset duration, the response result of the voice device to the voice command is indicated to be responsive; if the response information is not monitored within the preset time, the response result of the voice equipment to the voice instruction is indicated to be non-response; and under the condition that the response result is that the response exists, the test equipment can record the response time of the voice equipment to the voice instruction. In the implementation mode, the test equipment can acquire the execution information of the voice equipment on the voice instruction without depending on the voice equipment and the server, so that the test flexibility of the test equipment is improved.
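The self-monitoring approach can be sketched as a polling loop with a timeout. The callable heard_response stands in for whatever detection the test device uses (for example, a microphone-based detector) and is hypothetical, as are the other names; a simulated clock is used so that the example runs instantly.

```python
import time


def wait_for_response(heard_response, timeout_s, poll_s=0.1,
                      clock=time.monotonic, sleep=time.sleep):
    """Return (responded, delay_s); delay_s is None when no response is heard."""
    start = clock()
    while True:
        if heard_response():
            return True, clock() - start   # responsive: record the response time
        if clock() - start >= timeout_s:
            return False, None             # preset duration elapsed: non-responsive
        sleep(poll_s)


# Simulated clock: the device "responds" 3 seconds after playback.
now = [0.0]


def fake_sleep(seconds):
    now[0] += seconds


result = wait_for_response(lambda: now[0] >= 3.0, timeout_s=10.0, poll_s=0.5,
                           clock=lambda: now[0], sleep=fake_sleep)
```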

S204: Determine a stability test result of the voice device according to the execution information corresponding to each of the plurality of voice instructions.

In this embodiment, the test device may determine the stability test result of the voice device by analyzing the execution information corresponding to each voice command.

In one possible implementation manner, if the execution information corresponding to each voice instruction includes the response result of the voice device to the voice instruction, the test device may determine, according to the response result of the voice device to each voice instruction, a first number of voice instructions whose response result is responsive, and determine the response rate of the voice device according to the first number and the total number of the plurality of voice instructions. For example, if the total number of voice instructions is 100 and the response result corresponding to 95 of them is responsive, the response rate of the voice device is 95%.

Further, the test device may determine a stability test result of the voice device according to the response rate of the voice device. If the response rate of the voice equipment is greater than or equal to the preset response rate, determining that the stability test result of the voice equipment is that the test is passed; if the response rate of the voice equipment is smaller than the preset response rate, determining that the stability test result of the voice equipment is that the test is not passed.
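The response-rate check can be written in a few lines. The function names and the threshold values are illustrative assumptions; the data reproduce the 95-out-of-100 example above.

```python
def response_rate(results):
    """Fraction of voice instructions whose response result is responsive."""
    if not results:
        raise ValueError("no response results")
    return sum(1 for responded in results if responded) / len(results)


def passes_response_rate(results, preset_rate):
    """Test passes when the response rate reaches the preset response rate."""
    return response_rate(results) >= preset_rate


results = [True] * 95 + [False] * 5   # 95 of 100 instructions got a response
rate = response_rate(results)         # 0.95
```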

In one possible implementation manner, if the execution information corresponding to each voice instruction includes the response delay of the voice device to the voice instruction, the test device may determine the average response delay of the voice device according to the response delay of the voice device to each voice instruction. Further, the test device may determine the stability test result of the voice device according to the average response delay of the voice device. If the average response delay of the voice device is less than or equal to a preset response delay, the stability test result of the voice device is determined to be that the test is passed; if the average response delay of the voice device is greater than the preset response delay, the stability test result of the voice device is determined to be that the test is not passed.

In one possible implementation manner, if the execution information corresponding to each voice instruction includes the resource occupation information when the voice device executes the voice instruction, the test device may determine an average resource occupancy rate of the voice device according to the resource occupation information when the voice device executes each voice instruction. Further, the test device may determine the stability test result of the voice device according to the average resource occupancy rate of the voice device. For example, if the average resource occupancy rate of the voice device is less than or equal to a preset resource occupancy rate, the stability test result of the voice device is determined to be that the test is passed; if the average resource occupancy rate of the voice device is greater than the preset resource occupancy rate, the stability test result of the voice device is determined to be that the test is not passed.

Alternatively, the test device may determine, according to the resource occupation information when the voice device executes each voice instruction, whether a first voice instruction exists among the plurality of voice instructions, where the resource occupancy rate of the voice device when executing the first voice instruction is greater than the preset resource occupancy rate. If such a first voice instruction exists, the stability test result of the voice device is determined to be that the test is not passed; otherwise, the stability test result is determined to be that the test is passed.
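The two resource-occupancy criteria (average occupancy versus a per-instruction check) can give different verdicts on the same data, as the following sketch shows. It assumes that any single voice instruction exceeding the preset resource occupancy rate means the test is not passed; the function names and the sample CPU values are hypothetical.

```python
def passes_average(occupancies, preset):
    """Pass when the average occupancy does not exceed the preset rate."""
    if not occupancies:
        raise ValueError("no occupancy samples")
    return sum(occupancies) / len(occupancies) <= preset


def passes_per_instruction(occupancies, preset):
    """Pass only when no single instruction exceeds the preset rate."""
    return not any(value > preset for value in occupancies)


cpu = [35.0, 40.0, 92.0, 38.0]   # hypothetical CPU occupancy (%) per instruction
by_average = passes_average(cpu, preset=80.0)              # average is 51.25
by_instruction = passes_per_instruction(cpu, preset=80.0)  # 92.0 exceeds 80.0
```

The per-instruction check is the stricter of the two, since a single spike fails the test even when the average stays low.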

In one possible implementation manner, if the execution information corresponding to each voice instruction includes the response result of the voice device to the voice instruction, the response delay of the voice device to the voice instruction, and the resource occupation information when the voice device executes the voice instruction, the test device may determine the stability test result of the voice device in the following manner:

(1) Determine, according to the response result of the voice device to each voice instruction, a first number of voice instructions whose response result is responsive, and determine the response rate of the voice device according to the first number and the total number of the plurality of voice instructions.

(2) Determine the average response delay of the voice device according to the response delay of the voice device to each voice instruction.

(3) Determine the average resource occupancy rate of the voice device according to the resource occupation information when the voice device executes each voice instruction.

(4) Determine the stability test result of the voice device according to the response rate, the average response delay, and the average resource occupancy rate.

For example, if the response rate is greater than or equal to a preset response rate, the average response delay is less than or equal to a preset response delay, and the average resource occupancy rate is less than or equal to a preset resource occupancy rate, the stability test result of the voice device is determined to be that the test is passed;

if at least one of the following is satisfied: the response rate is less than the preset response rate, the average response delay is greater than the preset response delay, or the average resource occupancy rate is greater than the preset resource occupancy rate, the stability test result of the voice device is determined to be that the test is not passed.
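Steps (1) to (4) can be combined into a single function, sketched below under stated assumptions: the record keys are hypothetical, CPU occupancy stands in for the resource occupancy rate, and the average response delay is computed only over voice instructions that received a response (the application does not say how non-responsive instructions are treated).

```python
from statistics import mean


def stability_result(records, min_rate, max_delay_s, max_occupancy):
    """Combine response rate, average delay and average occupancy into a verdict."""
    rate = sum(1 for r in records if r["responded"]) / len(records)
    # Assumption: only responsive instructions contribute to the average delay.
    delays = [r["delay_s"] for r in records if r["responded"]]
    avg_delay = mean(delays) if delays else float("inf")
    avg_occupancy = mean(r["cpu_percent"] for r in records)
    passed = (rate >= min_rate and avg_delay <= max_delay_s
              and avg_occupancy <= max_occupancy)
    return {"response_rate": rate, "avg_delay_s": avg_delay,
            "avg_occupancy": avg_occupancy, "passed": passed}


records = [
    {"responded": True, "delay_s": 2.0, "cpu_percent": 30.0},
    {"responded": True, "delay_s": 4.0, "cpu_percent": 50.0},
    {"responded": False, "delay_s": None, "cpu_percent": 40.0},
    {"responded": True, "delay_s": 3.0, "cpu_percent": 40.0},
]
result = stability_result(records, min_rate=0.7, max_delay_s=3.0, max_occupancy=60.0)
```

With these sample values the response rate is 0.75, the average delay is 3.0 seconds, and the average occupancy is 40.0, so all three conditions hold and the test passes; raising min_rate to 0.9 would make it fail.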

Optionally, in the above various implementations, if each voice instruction includes a wake-up statement and a command statement, when determining the response rate of the voice device, the wake-up response rate and the command response rate of the voice device may be determined respectively.
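A sketch of the separate wake-up and command response rates follows. The application does not define the denominators; this sketch assumes that the wake-up response rate is taken over all voice instructions and that the command response rate is taken over the instructions for which the device woke up. The record keys and function name are hypothetical.

```python
def split_response_rates(records):
    """Return (wake_up_response_rate, command_response_rate)."""
    if not records:
        raise ValueError("no records")
    woke = [r for r in records if r["woke"]]
    wake_rate = len(woke) / len(records)
    # Assumption: a command can only be answered after a successful wake-up.
    command_rate = (sum(1 for r in woke if r["command_responded"]) / len(woke)
                    if woke else 0.0)
    return wake_rate, command_rate


records = (
    [{"woke": True, "command_responded": True}] * 6
    + [{"woke": True, "command_responded": False}] * 2
    + [{"woke": False, "command_responded": False}] * 2
)
wake_rate, command_rate = split_response_rates(records)
```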

In this embodiment, a test script may be deployed in a test apparatus, and the test apparatus implements the test method provided in this embodiment by executing the test script. Optionally, a test duration may also be set in the test script, e.g., 30 minutes, 1 hour, 8 hours, 16 hours, 168 hours, etc. The test method of the present embodiment may be repeatedly executed within the specified test duration until the specified test duration is reached. For example, assuming that the test duration is 10 hours, the test device acquires 100 voice commands, and after the 100 voice commands are executed, the test device may repeatedly execute the test process of the 100 voice commands until the specified test duration is reached. Therefore, different test time lengths can be designated according to test requirements of different scenes, and the test flexibility is improved.
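Repeating the test process until the specified test duration is reached can be sketched as follows. The names are hypothetical, and a simulated clock is used in which one pass over the voice instructions is assumed to take 4 hours, so that the 10-hour example runs instantly.

```python
import time


def run_for_duration(run_once, duration_s, clock=time.monotonic):
    """Repeat run_once() until duration_s has elapsed; return the number of passes."""
    deadline = clock() + duration_s
    passes = 0
    while clock() < deadline:
        run_once()
        passes += 1
    return passes


now = [0.0]


def one_pass():
    # Stand-in for playing all 100 voice instructions once (assumed to take 4 hours).
    now[0] += 4 * 3600


count = run_for_duration(one_pass, duration_s=10 * 3600, clock=lambda: now[0])
```

In this sketch a pass that has already started is allowed to finish, so the third pass ends after the 10-hour mark; whether to stop in the middle of a pass is not specified in the application.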

The test method for the voice device provided by this embodiment includes: the test device generates a plurality of text instructions according to at least one instruction template corresponding to the voice device, converts the plurality of text instructions into a plurality of voice instructions, plays the plurality of voice instructions to the voice device, and obtains the execution information corresponding to each voice instruction from the voice device or a server corresponding to the voice device, where the execution information corresponding to each voice instruction includes at least one of the following: a response result of the voice device to the voice instruction, a response delay of the voice device to the voice instruction, and resource occupation information when the voice device executes the voice instruction, the response result being responsive or non-responsive; further, the stability test result of the voice device is determined according to the execution information corresponding to the plurality of voice instructions. Through this process, the stability test of the voice device is performed automatically by the test device, which improves test efficiency and reduces labor cost and time cost. Further, because the plurality of text instructions are generated from at least one instruction template, the generated text instructions can cover various possible ways of expressing an instruction, which increases the variety of the text instructions used to test the voice device and further improves the accuracy of the test result.

On the basis of the above-described embodiments, the following describes the solution of the present application in connection with a specific example.

Fig. 3 is a flow chart of another testing method of a voice device according to an embodiment of the present application. As shown in fig. 3, the method of the present embodiment includes:

S301: The test device generates a plurality of text instructions according to at least one instruction template corresponding to the voice device, and converts the plurality of text instructions into a plurality of voice instructions.

It should be understood that, the specific implementation of S301 may refer to the related description of the foregoing embodiments, which is not repeated herein.

S302: the test device plays the voice command to the voice device.

S303: the voice device executes the voice instructions.

The above-described S302 to S303 may be repeatedly performed a plurality of times. Assuming that the number of voice instructions obtained in S301 is N, the test device sequentially plays each voice instruction to the voice device so that the voice device executes each voice instruction. The implementation manner of playing the voice command to the voice device by the test device may be referred to the detailed description of the foregoing embodiments, which is not repeated herein.

S304: the voice equipment sends the execution information corresponding to each voice instruction to the server.

After each voice instruction is executed, the voice equipment can send the execution information corresponding to the voice instruction to the server. Or, the voice device may further send execution information corresponding to all voice commands to the server after all voice commands are executed. Or, the voice device may also send the execution information corresponding to the voice command to the server once every certain time interval. This embodiment is not limited thereto.

S305: the test equipment sends a request message to the server.

Wherein the request message includes: the starting playing time, the ending playing time and the identification of the voice equipment corresponding to the voice instructions.

S306: The server sends the execution information corresponding to the plurality of voice instructions to the test device.

S307: The test device determines a stability test result of the voice device according to the execution information corresponding to the plurality of voice instructions.

It should be noted that, the specific implementation manner of S305 to S307 may refer to the detailed description of the foregoing embodiments, which is not repeated here.

S308: The test device displays the stability test result, or sends the stability test result to a preset device.

The preset device may be a terminal device corresponding to a developer/tester of the voice device. By displaying the stability test result or sending the stability test result to the preset equipment, related personnel can know the test result in time.

The test method of the voice equipment provided by the embodiment realizes that the test equipment automatically tests the stability of the voice equipment, and improves the test efficiency. Further, labor cost and time cost are also reduced.

In the embodiments shown in fig. 2 and fig. 3, a single voice device is taken as an example. In practical applications, the test method provided by the embodiments of the present application can also test a plurality of voice devices simultaneously. This is described below with reference to fig. 4.

Fig. 4 is a schematic diagram of another test scenario provided in an embodiment of the present application. As shown in fig. 4, the application scenario may include a plurality of voice devices, where the plurality of voice devices are all to-be-tested voice devices. The test equipment executes the method of the embodiment, so that the simultaneous test of a plurality of voice equipment can be realized.

Specifically, the test device acquires a plurality of voice instructions and plays the plurality of voice instructions. Each time the test device plays a voice instruction, the plurality of voice devices receive the voice instruction and execute it. Each voice device uploads its execution information for the voice instruction to the server. Thus, the server stores the execution information of the plurality of voice devices for the voice instructions. After the test device has played the plurality of voice instructions, the request message it sends to the server may include the identifier of each voice device, and the start playing time and the end playing time of the plurality of voice instructions. In this way, the server can send to the test device the execution information of each voice device for the plurality of voice instructions. The test device determines the stability test result of each voice device according to the execution information of that voice device for the plurality of voice instructions. Through this process, the stability test of a plurality of voice devices is performed at the same time, which further improves test efficiency.
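Determining a result per voice device can be sketched by grouping the execution information returned by the server by device identifier. Only the response-rate criterion is shown; the entry layout, identifiers, and function name are hypothetical.

```python
from collections import defaultdict


def per_device_results(entries, preset_rate):
    """Map each device identifier to True (test passed) or False (not passed)."""
    by_device = defaultdict(list)
    for entry in entries:
        by_device[entry["device_id"]].append(entry["responded"])
    return {
        device: sum(flags) / len(flags) >= preset_rate
        for device, flags in by_device.items()
    }


entries = (
    [{"device_id": "dev-A", "responded": True}] * 4
    + [{"device_id": "dev-B", "responded": True}] * 2
    + [{"device_id": "dev-B", "responded": False}] * 2
)
results = per_device_results(entries, preset_rate=0.9)
```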

Fig. 5 is a schematic structural diagram of a test device for a voice device according to an embodiment of the present application. The apparatus may be in the form of software and/or hardware. The apparatus may be, for example, the test device of fig. 1 or a processor, chip, module, unit, etc. disposed in the test device. As shown in fig. 5, the test apparatus 500 for a voice device provided in this embodiment includes: a generating module 501, a processing module 502, a playing module 503, an obtaining module 504 and a determining module 505.

The generating module 501 is configured to generate a plurality of text instructions according to at least one instruction template corresponding to the voice device;

a processing module 502, configured to convert the plurality of text instructions into a plurality of voice instructions;

a playing module 503, configured to play the plurality of voice instructions to the voice device;

an obtaining module 504, configured to obtain, from the voice device or a server corresponding to the voice device, execution information corresponding to each of the plurality of voice instructions, where the execution information corresponding to each voice instruction includes at least one of: a response result of the voice device to the voice instruction, a response delay of the voice device to the voice instruction, and resource occupation information when the voice device executes the voice instruction, where the response result is responsive or non-responsive;

The determining module 505 is configured to determine a stability test result of the voice device according to the execution information corresponding to each of the plurality of voice commands.

In one possible implementation manner, the execution information corresponding to each voice instruction includes: the response result of the voice equipment to the voice instruction, the response time delay of the voice equipment to the voice instruction and the resource occupation information when the voice equipment executes the voice instruction; the determining module 505 is specifically configured to:

In a possible implementation manner, the determining module 505 is specifically configured to:

In one possible implementation, the generating module 501 is specifically configured to:

In a possible implementation, the processing module 502 is further configured to:

acquiring a preset environmental noise model;

In a possible implementation manner, the playing module 503 is specifically configured to:

playing an ith voice instruction to the voice equipment;

In one possible implementation, each voice instruction includes: wake-up statements and command statements; the playing module 503 is specifically configured to:

In a possible implementation manner, the apparatus further includes:

the display module is used for displaying the stability test result; or,

and the sending module is used for sending the stability test result to preset equipment.

The test device for a voice device provided in this embodiment may be used to execute the test method for a voice device provided in any of the above method embodiments, and its implementation principle and technical effects are similar, and will not be described here again.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may be the test device of fig. 1. As shown in fig. 6, the electronic device 600 provided in this embodiment includes: at least one processor 601 and a memory 602. The processor 601 and the memory 602 may be connected by a bus 603, for example.

The memory 602 is used to store a computer program;

the at least one processor 601 is configured to execute the computer program stored in the memory, so that the electronic device 600 executes the method for testing a voice device according to any one of the above embodiments, and its implementation principle and technical effects are similar, which is not described herein.

The embodiment of the application also provides a computer readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for testing a voice device provided in any one of the method embodiments is implemented, and its implementation principle and technical effect are similar, and are not repeated here.

The embodiment of the application also provides a chip, which comprises: the device comprises a memory and a processor, wherein the memory stores a computer program, and the processor runs the computer program to realize the test method of the voice device provided by any method embodiment, and the implementation principle and the technical effect are similar, and are not repeated here.

The embodiment of the application further provides a computer program product, which comprises a computer program, when the computer program is executed by a processor, the method for testing the voice device provided by any one of the method embodiments is implemented, and the implementation principle and the technical effect are similar, and are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The units formed by the modules can be realized in a form of hardware or a form of hardware and software functional units.

The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to perform some steps of the methods described in the embodiments of the present application.

It is understood that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.

The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory NVM, such as at least one magnetic disk memory, and may also be a U-disk, a removable hard disk, a read-only memory, a magnetic disk or optical disk, etc.

The bus may be an industry standard architecture (Industry Standard Architecture, ISA) bus, a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.

The storage medium may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.

An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuit, ASIC). It is also possible that the processor and the storage medium reside as discrete components in an electronic device or a master device.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A method for testing a voice device, the method comprising:

2. The method of claim 1, wherein the execution information corresponding to each voice instruction comprises: the response result of the voice device to the voice instruction, the response delay of the voice device to the voice instruction, and the resource occupation information when the voice device executes the voice instruction;

determining a first number of voice instructions that received a response according to the response result of the voice device to each voice instruction, and determining the response rate of the voice device according to the first number and the total number of the voice instructions;

3. The method of claim 2, wherein determining the stability test result of the voice device based on the response rate, the average response delay, and the average resource occupancy comprises:

4. The method of any one of claims 1 to 3, wherein generating a plurality of text instructions according to at least one instruction template corresponding to the voice device comprises:

5. The method of claim 4, wherein generating a plurality of text instructions based on the attributes of the plurality of instruction units, the order among the plurality of instruction units, and at least one candidate keyword to which each of the plurality of instruction units corresponds, comprises:

6. The method of any one of claims 1 to 5, further comprising, prior to playing the plurality of voice instructions to the voice device:

acquiring a preset environmental noise model;

7. The method of any one of claims 1 to 6, wherein playing the plurality of voice instructions to the voice device comprises:

playing an i-th voice instruction to the voice device;

8. The method of claim 7, wherein each voice instruction comprises a wake-up statement and a command statement, and playing the i-th voice instruction to the voice device comprises:

9. The method of any one of claims 1 to 8, further comprising, after determining the stability test result of the voice device according to the execution information corresponding to each of the plurality of voice instructions:

displaying the stability test result; or,

sending the stability test result to a preset device.

10. A test apparatus for a voice device, disposed in a test device, the apparatus comprising:

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program that is executed by the at least one processor to implement the method of any one of claims 1 to 9.

12. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 9.

13. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.
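
By way of non-limiting illustration only, the quantities named in claims 2 and 3 (the response rate, the average response delay, and the average resource occupancy) might be computed from the collected execution information as sketched below. The names ExecutionInfo, summarize, delay_ms, and cpu_percent are hypothetical and do not appear in the application. The sketch further assumes that resource occupancy is a single CPU-usage percentage per instruction and that the delay is averaged only over instructions that received a response. How these values are combined into a stability test result is not set out in this section and is therefore not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExecutionInfo:
    """Execution information for one voice instruction (hypothetical field names)."""
    responded: bool      # response result: True = response, False = no response
    delay_ms: float      # response delay; ignored when responded is False
    cpu_percent: float   # resource occupancy, assumed here to be CPU usage in percent


def summarize(records: Sequence[ExecutionInfo]) -> dict:
    """Return the response rate, average delay, and average resource occupancy."""
    if not records:
        raise ValueError("at least one execution record is required")

    # "First number": how many voice instructions received a response.
    responded = [r for r in records if r.responded]
    first_number = len(responded)

    # Assumption: the delay is averaged over responded instructions only.
    avg_delay: Optional[float] = (
        mean([r.delay_ms for r in responded]) if responded else None
    )

    return {
        "response_rate": first_number / len(records),
        "avg_delay_ms": avg_delay,
        "avg_cpu_percent": mean([r.cpu_percent for r in records]),
    }


if __name__ == "__main__":
    sample = [
        ExecutionInfo(True, 100.0, 10.0),
        ExecutionInfo(True, 300.0, 20.0),
        ExecutionInfo(False, 0.0, 30.0),
        ExecutionInfo(True, 200.0, 20.0),
    ]
    print(summarize(sample))
```

In this sketch an instruction without a response lowers the response rate but is excluded from the delay average, so a device that frequently fails to respond does not appear faster than it is. An implementation could equally treat non-responses as a timeout value.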