CN107995624B - Method for outputting sound data based on multi-path data transmission - Google Patents


Info

Publication number
CN107995624B
CN107995624B (application CN201711282364.3A)
Authority
CN
China
Prior art keywords
sound
mobile terminal
output
data
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711282364.3A
Other languages
Chinese (zh)
Other versions
CN107995624A (en)
Inventor
王梅 (Wang Mei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Momo Information Technology Co Ltd
Original Assignee
Beijing Momo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Momo Information Technology Co Ltd filed Critical Beijing Momo Information Technology Co Ltd
Priority to CN202110194313.5A priority Critical patent/CN112804683A/en
Priority to CN201711282364.3A priority patent/CN107995624B/en
Publication of CN107995624A publication Critical patent/CN107995624A/en
Application granted granted Critical
Publication of CN107995624B publication Critical patent/CN107995624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04B - TRANSMISSION
    • H04B11/00 - Transmission systems employing sonic, ultrasonic or infrasonic waves
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W76/00 - Connection management
    • H04W76/10 - Connection setup
    • H04W76/15 - Setup of multiple wireless link connections
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 - Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 - Services making use of location information
    • H04W4/023 - Services making use of location information using mutual or relative location information between multiple location based services [LBS] targets or of distance thresholds

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method and a system for outputting sound data based on multi-path data transmission, wherein the method comprises the following steps: the sound agent device receives a sound data output request from the mobile terminal and determines a target sound output unit; the sound agent device evaluates the network transmission delay received from the mobile terminal and, if the network transmission delay is smaller than a delay time threshold, sends the mobile terminal an indication message instructing it to output sound data through multi-path data transmission; in response to establishment of the wireless communication connection, the mobile terminal enters the multi-path data transmission mode. Each of the other sound output units performs sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data, while the target sound output unit performs sound output based on the second output instruction transmitted by the mobile terminal and the original sound data.

Description

Method for outputting sound data based on multi-path data transmission
Technical Field
The present invention relates to a system and a method for outputting sound data based on multi-path data transmission, with application to fields such as network communication, Internet-of-Things communication, data processing, and voice processing.
Background
Currently, in a teleconference or live conference, a user who wishes to speak typically performs voice input through a voice input device such as a microphone, after which the speech is played back through a voice output device such as a speaker. In practice, however, the number of voice input devices is often insufficient, so some users who wish to speak must wait for other users to hand over a device. Furthermore, when two users repeatedly take turns speaking on the same issue, the voice input device may need to be exchanged between them frequently.
This causes, on the one hand, delays in the user's voice output (the user must wait for a voice input device to become available) and, on the other hand, interruptions in the user's voice output (the voice input device must be handed over mid-discussion).
Furthermore, when a user wishes to output multimedia data simultaneously with speech output, the prior art solutions fail to fulfill this requirement.
Disclosure of Invention
According to an aspect of the present invention, there is provided a method for sound data transmission based on multiple transmission paths, the method including:
a user initiates a sound output request to a sound agent device via a mobile terminal; after receiving a sound wave transmission matching code from a sound output unit through sound wave transmission and receiving a response message from the sound agent device permitting sound output, the mobile terminal calculates a network delay and transmits the network delay to the sound agent device;
according to an instruction from the sound agent device to enter a feedback suppression mode, the mobile terminal enters a multi-path transmission mode based on a multi-path transmission protocol, transmitting sound data input by the user to the sound agent device over a first network connection while simultaneously transmitting the same sound data over a second network connection to the sound output unit closest to the position of the mobile terminal among the plurality of sound output units;
the sound agent device extracts a sound sample, the mobile terminal position, and a sound wave transmission matching code from the sound output request received from the mobile terminal, determines from the sound sample whether the user is allowed to output sound, and, without regard to the result of that determination, instructs the sound output unit closest to the mobile terminal position among the plurality of sound output units to send the sound wave transmission matching code to the mobile terminal by sound wave transmission;
if the user is allowed to output sound according to the sound sample, the sound agent device sends a response message to the mobile terminal and receives the network delay from the mobile terminal; if the network delay is larger than a feedback threshold, it enters a feedback suppression mode for sound output;
in the feedback suppression mode, the sound agent device instructs the mobile terminal to enter the multi-path transmission mode based on the multi-path transmission protocol,
and the sound output units other than the one closest to the position of the mobile terminal perform sound output based on the output instruction and sound data sent by the sound agent device, while the sound output unit closest to the position of the mobile terminal performs sound output according to the output instruction and sound data sent by the mobile terminal.
According to an aspect of the present invention, there is provided a method of outputting sound data based on multipath data transmission, the method including:
a user uses a mobile terminal to send a sound data output request to sound agent equipment through a first wireless network, wherein the sound data output request comprises a sound sample, the current position of the mobile terminal and an initial matching code;
the sound agent device receives the sound data output request from the mobile terminal, determines whether the mobile terminal is allowed to output sound data according to a sound sample in the sound data output request, and sends an output permission message to the mobile terminal if the mobile terminal is determined to be allowed to output sound data;
the sound agent device detects a target sound output unit closest to the current position of the mobile terminal from among a plurality of sound output units based on the current position of the mobile terminal in the sound data output request, determines a first sound wave transmission matching code based on an initial matching code in the sound data output request and a matching random number generated randomly, and transmits the first sound wave transmission matching code to the target sound output unit;
the mobile terminal receives the permission output message, determines a network transmission delay based on a timestamp in the permission output message, and sends the network transmission delay to the sound agent device;
the sound agent device evaluates the network transmission delay received from the mobile terminal and, if the network transmission delay is smaller than a delay time threshold, sends the mobile terminal an indication message instructing it to output sound data through multi-path data transmission;
in response to receiving the indication message, the mobile terminal determines a second sound wave transmission matching code from the initial matching code and a matching random number carried in the indication message, and broadcasts the second sound wave transmission matching code to the plurality of sound output units via sound wave communication;
when the target sound output unit among the plurality of sound output units determines that the received second sound wave transmission matching code is identical to the first sound wave transmission matching code, it establishes a wireless communication connection with the mobile terminal through a second wireless network;
in response to the establishment of the wireless communication connection, the mobile terminal enters the multi-path data transmission mode: original sound data input by the user is acquired through a sound acquisition device of the mobile terminal and noise reduction processing is performed on it to generate noise-reduced sound data; the mobile terminal transmits the noise-reduced sound data to the sound agent device through the first wireless network; the sound agent device transmits the noise-reduced sound data and a first output instruction through a wired network to the sound output units other than the target sound output unit among the plurality of sound output units, while the mobile terminal simultaneously transmits the original sound data and a second output instruction to the target sound output unit through the second wireless network; and
each of the other sound output units performs sound output based on the first output instruction transmitted by the sound agent device and the sound data subjected to the noise reduction processing, and the target sound output unit performs sound output based on the second output instruction transmitted by the mobile terminal and the original sound data.
Before the user uses the mobile terminal to send the sound data output request to the sound agent device through the first wireless network, the method further comprises: acquiring a sound sample input by the user through a sound acquisition device of the mobile terminal. The sound sample is a segment of speech input by the user in the live environment where the sound output is to take place. The sound sample may be a piece of speech conveying a summary of the user's opinion.
The sound sample may alternatively be a piece of speech introducing the user's identity. Before the user uses the mobile terminal to send the sound data output request to the sound agent device through the first wireless network, the method further comprises: acquiring the current position of the mobile terminal using a positioning device of the mobile terminal.
Acquiring the current position of the mobile terminal using the positioning device of the mobile terminal comprises: the positioning device calibrates satellite positioning data using indoor assisted positioning, outdoor assisted positioning, and/or access-point assisted positioning to obtain the current position of the mobile terminal.
Before the user uses the mobile terminal to send the sound data output request to the sound agent device through the first wireless network, the method further comprises: generating the initial matching code based on the Media Access Control (MAC) address of the mobile terminal or based on a hardware address of the mobile terminal.
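As a hedged sketch of this step: the patent only requires that the initial matching code be generated "based on" the MAC or hardware address, without fixing a derivation. Hashing the address down to a short hex string is one plausible realization; the function name and the choice of SHA-256 are assumptions, not the claimed method.

```python
import hashlib

def initial_matching_code(mac_address: str) -> str:
    """Derive a short initial matching code from the terminal's MAC address.

    Normalizing the case first makes the code stable regardless of how
    the platform reports the address. The hash-and-truncate scheme is
    illustrative only.
    """
    digest = hashlib.sha256(mac_address.lower().encode("ascii")).hexdigest()
    return digest[:8]  # short enough to transmit acoustically
```

A short, fixed-length code matters here because the code is later broadcast over low-bandwidth sound-wave transmission.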
Determining whether to allow the mobile terminal to output sound data according to the sound sample in the sound data output request comprises: performing speech recognition on the sound sample to generate text information, and allowing the mobile terminal to output sound data when the text information conforms to the expression habits of the corresponding language.
Whether the text information conforms to the expression habits of the corresponding language is determined based on semantic recognition.
The text information is divided into at least one sentence unit at its sentence-breaking symbols; independent semantic analysis is performed on each sentence unit to determine a semantic score, and the text information is determined to conform to the expression habits of the corresponding language when the weighted sum of the semantic scores of the sentence units is greater than an expression threshold.
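The sentence-splitting and weighted-score check can be sketched as follows. The semantic analyser itself is not specified by the patent, so `score_fn` stands in for it, and the sentence-breaking symbol set, weights, and names are assumptions:

```python
import re

def conforms_to_language_habit(text, score_fn, weights, threshold):
    """Divide the text into sentence units at sentence-breaking symbols,
    score each unit independently, and accept the text when the weighted
    sum of the per-unit semantic scores exceeds the expression threshold.
    """
    # Split on common Western and CJK sentence-breaking symbols.
    units = [u.strip() for u in re.split(r"[.!?;。！？；]", text) if u.strip()]
    weighted_sum = sum(w * score_fn(u) for u, w in zip(units, weights))
    return weighted_sum > threshold
```

In practice `score_fn` would call out to a semantic-analysis model; a constant stub is enough to exercise the thresholding logic.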
The sound agent apparatus acquires and stores a position of each of the plurality of sound output units in advance.
Determining a target sound output unit closest to the current position of the mobile terminal based on a straight-line distance between the current position of the mobile terminal and the position of each of the plurality of sound output units.
Determining a target sound output unit closest to the current location of the mobile terminal based on a sound wave transmission distance between the current location of the mobile terminal and a location of each of the plurality of sound output units.
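The nearest-unit selection above can be sketched as follows; planar (x, y) coordinates and the unit IDs are assumptions for illustration, and the patent equally allows ranking by sound-wave transmission distance instead of straight-line distance:

```python
import math

def nearest_output_unit(terminal_pos, unit_positions):
    """Return the ID of the sound output unit with the smallest
    straight-line distance to the mobile terminal's current position.

    `unit_positions` maps unit IDs to (x, y) positions stored in
    advance by the sound agent device.
    """
    return min(
        unit_positions,
        key=lambda uid: math.hypot(
            terminal_pos[0] - unit_positions[uid][0],
            terminal_pos[1] - unit_positions[uid][1],
        ),
    )
```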
Determining the first sound wave transmission matching code based on the initial matching code in the sound data output request and the randomly generated matching random number comprises: concatenating the initial matching code and the randomly generated matching random number as character strings to generate the first sound wave transmission matching code.
Alternatively, determining the first sound wave transmission matching code comprises: summing the initial matching code and the randomly generated matching random number to generate the first sound wave transmission matching code.
Alternatively, determining the first sound wave transmission matching code comprises: performing a cyclic shift, a bitwise operation, or bitwise splicing on the initial matching code based on the randomly generated matching random number to generate the first sound wave transmission matching code.
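The alternative derivations above can be sketched as follows. The function names and the 32-bit width for the cyclic shift are assumptions, since the patent fixes neither; only the operations themselves come from the text:

```python
def match_code_concat(initial: str, nonce: str) -> str:
    """Variant 1: character-string concatenation of the initial
    matching code and the matching random number."""
    return initial + nonce

def match_code_sum(initial: int, nonce: int) -> int:
    """Variant 2: arithmetic sum of the two values."""
    return initial + nonce

def match_code_rotate(initial: int, nonce: int, width: int = 32) -> int:
    """Variant 3: cyclic left shift of the initial code by the random
    number of bit positions (one reading of 'cyclic shift'; the patent
    also mentions bitwise operations and bitwise splicing)."""
    n = nonce % width
    mask = (1 << width) - 1
    return ((initial << n) | (initial >> (width - n))) & mask
```

The mobile terminal later applies the same derivation to produce the second matching code, so both sides must agree on which variant is in use.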
The sound sample is used to indicate at least one of: the clarity of the user's speech, the type of language involved in the user's speech, and the background noise intensity.
The mobile terminal is determined to be allowed to output sound data when the clarity of the user's speech is greater than a minimum required clarity threshold, the type of language involved in the user's speech can be automatically translated by the speech recognition server, and the background noise intensity is lower than a maximum allowable noise intensity.
Determining the network transmission delay based on the timestamp in the permission output message comprises: the mobile terminal determines the network delay from the timestamp indicating the transmission time and the current time of the mobile terminal.
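A minimal sketch of the delay measurement and the threshold decision that follows it; the function names and millisecond units are assumptions, and the scheme presumes the two clocks run on a roughly synchronized timeline:

```python
def network_transmission_delay_ms(sent_timestamp_ms: int, current_time_ms: int) -> int:
    """Delay = terminal's current time minus the transmission-time
    timestamp carried in the permission output message."""
    return current_time_ms - sent_timestamp_ms

def transmission_mode(delay_ms: int, delay_threshold_ms: int) -> str:
    """The sound agent device indicates multi-path output only when the
    measured delay is strictly below the delay time threshold;
    otherwise single-path output is indicated."""
    return "multi-path" if delay_ms < delay_threshold_ms else "single-path"
```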
If the network transmission delay is determined to be greater than or equal to the delay time threshold, an indication message indicating that the mobile terminal cannot output sound data through multi-path data transmission is sent to the mobile terminal.
In response to receiving the indication message that sound data output through multi-path data transmission is not possible, the mobile terminal performs sound data output through single-path data transmission.
The mobile terminal determining the second sound wave transmission matching code according to the initial matching code and the matching random number in the indication message comprises: concatenating the initial matching code and the matching random number as character strings to generate the second sound wave transmission matching code.
Alternatively, it comprises: summing the initial matching code and the matching random number to generate the second sound wave transmission matching code.
Alternatively, it comprises: performing a cyclic shift, a bitwise operation, or bitwise splicing on the initial matching code based on the matching random number to generate the second sound wave transmission matching code.
The target sound output unit stores the received at least one first acoustic transmission matching code.
Each of the plurality of sound output units stores the received at least one first acoustic transmission match code.
Each of the plurality of sound output units, upon receiving a second acoustic transmission match code, compares the second acoustic transmission match code with the stored at least one first acoustic transmission match code.
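The store-and-compare behaviour of a sound output unit described above might look like this; the class and method names are illustrative, not taken from the patent:

```python
class SoundOutputUnit:
    """Stores first sound wave transmission matching codes received from
    the sound agent device and, on hearing a broadcast second code,
    checks it against the stored set before establishing the second
    wireless connection with the mobile terminal."""

    def __init__(self):
        self._stored_codes = set()

    def store_first_code(self, code: str) -> None:
        """Record a first matching code sent by the sound agent device."""
        self._stored_codes.add(code)

    def matches(self, second_code: str) -> bool:
        """True when the broadcast second code equals a stored first code."""
        return second_code in self._stored_codes
```

Only the target unit will have been given the matching first code, so only it answers the broadcast by opening the second wireless connection.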
In the multi-path data transmission mode, the mobile terminal transmits the sound data through at least two different transmission paths.
Wherein the original sound data is an original sound data stream input by a user through a sound acquisition device of the mobile terminal, and the noise reduction-processed sound data is a noise reduction-processed sound data stream.
Transmitting the noise-reduced sound data to the sound agent device using the first wireless network comprises: transmitting the noise-reduced sound data stream to the sound agent device in real time using the first wireless network.
The mobile terminal transmitting the original sound data and the second output instruction to the target sound output unit using the second wireless network comprises: the mobile terminal transmits the original sound data stream to the target sound output unit in real time using the second wireless network, and transmits the second output instruction to the target sound output unit. The first output instruction comprises a volume value indicating an output volume. The second output instruction likewise comprises a volume value indicating an output volume.
Before the sound agent apparatus transmits the noise reduction-processed sound data and the first output instruction to the other sound output units except the target sound output unit among the plurality of sound output units through the wired network, the method further includes:
determining an original frequency and an original timbre of the noise-reduced sound data; determining a frequency level of the noise-reduced sound data based on the difference between the original frequency and a preset reference frequency value; determining a timbre weighting factor based on the frequency level; determining a spectral curve of the original timbre and determining an initial timbre score of the noise-reduced sound data according to the similarity between the spectral curve and a preset timbre standard line; determining a timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score; and determining an original volume of the noise-reduced sound data, determining a volume weighting factor based on the timbre level, and determining an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz) is determined; the integer part of the result of dividing that difference by an interval value (e.g., 10) is taken as the frequency level of the noise-reduced sound data, and the absolute value of the frequency level is taken as the timbre weighting factor.
The initial timbre score is one of: 25 points, 26 points, 27 points, 28 points, and 29 points.
Determining the timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduced sound data. Determining the volume weighting factor based on the timbre level comprises: taking the percentage value corresponding to the timbre level divided by 100 as the volume weighting factor.
Determining the output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor). The first output instruction includes a volume value indicating the output volume. The second output instruction includes a volume value indicating the original volume.
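The frequency/timbre/volume pipeline of this embodiment can be sketched end-to-end as follows. Truncation toward zero for the "integer part" and the default parameter values (100 Hz reference, interval of 10) follow the example figures above; the function name is an assumption:

```python
def output_volume(original_freq_hz: float,
                  initial_timbre_score: float,
                  original_volume: float,
                  reference_hz: float = 100.0,
                  interval: float = 10.0) -> float:
    """Frequency level -> timbre weighting factor -> timbre level ->
    volume weighting factor -> output volume, per this embodiment."""
    # Integer part of (original frequency - reference) / interval.
    frequency_level = int((original_freq_hz - reference_hz) / interval)
    timbre_weighting_factor = abs(frequency_level)
    timbre_level = initial_timbre_score - timbre_weighting_factor
    volume_weighting_factor = timbre_level / 100.0  # percentage value
    return original_volume * (1 + volume_weighting_factor)
```

For example, a 150 Hz signal with an initial timbre score of 28 points and original volume 50 yields frequency level 5, timbre level 23, volume weighting factor 0.23, and output volume 61.5.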
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data comprises: each of the other sound output units performs sound output based on the volume value in the first output instruction transmitted by the sound agent device and the noise-reduced sound data.
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
In another embodiment, before the sound agent device transmits the noise-reduced sound data and the first output instruction to the other sound output units except the target sound output unit among the plurality of sound output units through the wired network, the method further includes:
determining an original frequency and an original timbre of the noise-reduced sound data; determining a frequency level of the noise-reduced sound data based on the difference between the original frequency and a preset reference frequency value; determining a timbre weighting factor based on the frequency level; determining a spectral curve of the original timbre and determining an initial timbre score of the noise-reduced sound data according to the similarity between the spectral curve and a preset timbre standard line; determining a timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score; and determining an original volume of the noise-reduced sound data, determining a volume weighting factor based on the timbre level, and determining an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz) is determined; the integer part of the result of dividing that difference by an interval value (e.g., 10) is taken as the frequency level of the noise-reduced sound data, and the absolute value of the frequency level is taken as the timbre weighting factor.
The initial timbre score is one of: 25 points, 26 points, 27 points, 28 points, and 29 points.
Determining the timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduced sound data. Determining the volume weighting factor based on the timbre level comprises: taking the percentage value corresponding to the timbre level divided by 100 as the volume weighting factor.
Determining the output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 - volume weighting factor). The first output instruction includes a volume value indicating the output volume. The second output instruction includes a volume value indicating the original volume.
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data comprises: each of the other sound output units performs sound output based on the volume value in the first output instruction transmitted by the sound agent device and the noise-reduced sound data.
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
In a further embodiment, before the sound agent device transmits the noise-reduced sound data and the first output instruction to the other sound output units except the target sound output unit among the plurality of sound output units through the wired network, the method further includes:
determining an original frequency and an original timbre of the noise-reduced sound data; determining a frequency level of the noise-reduced sound data based on the difference between the original frequency and a preset reference frequency value; determining a timbre weighting factor based on the frequency level; determining a spectral curve of the original timbre and determining an initial timbre score of the noise-reduced sound data according to the similarity between the spectral curve and a preset timbre standard line; determining a timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score; and determining an original volume of the noise-reduced sound data, determining a volume weighting factor based on the timbre level, and determining an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz) is determined; the integer part of the result of dividing that difference by an interval value (e.g., 10) is taken as the frequency level of the noise-reduced sound data, and the absolute value of the frequency level is taken as the timbre weighting factor. The initial timbre score is one of: 25 points, 26 points, 27 points, 28 points, and 29 points. Determining the timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduced sound data. Determining the volume weighting factor based on the timbre level comprises: taking the percentage value corresponding to the timbre level divided by 100 as the volume weighting factor.
Determining the output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor). The first output instruction includes: a volume value indicating the output volume and the current position of the mobile terminal. The second output instruction includes a volume value indicating the original volume.
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data comprises: each of the other sound output units determines the straight-line distance to the mobile terminal based on the current position of the mobile terminal in the first output instruction transmitted by the sound agent device, takes the percentage value corresponding to that distance divided by 1000 as a distance weighting factor, calculates an actual volume value based on the distance weighting factor and the volume value in the first output instruction, and performs sound output according to the actual volume value and the noise-reduced sound data; where actual volume value = volume value × (1 + distance weighting factor).
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
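The distance compensation applied by the non-target units in this embodiment can be sketched as follows; the planar coordinate model and the function name are assumptions:

```python
import math

def actual_volume(base_volume: float, unit_pos, terminal_pos) -> float:
    """Each non-target unit scales the instructed volume up with its
    straight-line distance from the mobile terminal:
    actual volume = volume value x (1 + distance / 1000)."""
    distance = math.hypot(unit_pos[0] - terminal_pos[0],
                          unit_pos[1] - terminal_pos[1])
    distance_weighting_factor = distance / 1000.0  # percentage value
    return base_volume * (1 + distance_weighting_factor)
```

The effect is that units farther from the speaker play slightly louder, compensating for the extra acoustic path while the nearby target unit plays the original, unscaled volume.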
Before the sound agent apparatus transmits the noise reduction-processed sound data and the first output instruction to the other sound output units except the target sound output unit among the plurality of sound output units through the wired network, the method further includes:
determining the original frequency and the original timbre of the noise-reduction-processed sound data; determining the frequency level of the noise-reduction-processed sound data based on the difference between the original frequency and a preset reference frequency value; determining the timbre weighting factor based on the frequency level; determining the spectral curve of the original timbre and determining the initial timbre score of the noise-reduction-processed sound data according to the similarity between the spectral curve and a preset timbre standard line; determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score; determining the original volume of the noise-reduction-processed sound data, determining the volume weighting factor based on the timbre level, and determining the output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
Determining the difference of the original frequency (e.g., 60-300 Hz) minus the reference frequency value, calculating the integer part of the result of dividing the difference by an interval value (e.g., 10), determining that integer as the frequency level of the noise-reduction-processed sound data, and determining the absolute value of the frequency level as the timbre weighting factor. The initial timbre score is one of: 25 points, 26 points, 27 points, 28 points, and 29 points. Determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduction-processed sound data, and setting the timbre level. Determining the volume weighting factor based on the timbre level comprises: determining the percentage value corresponding to the timbre level divided by 100 as the volume weighting factor.
Determining the output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 - volume weighting factor). The first output instruction includes a volume value indicating the output volume. The second output instruction includes a volume value indicating the original volume.
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduction-processed sound data comprises: each of the other sound output units determining the straight-line distance to the mobile terminal based on the current location of the mobile terminal in the first output instruction transmitted by the sound agent device, determining the percentage value corresponding to the straight-line distance divided by 1000 as the distance weighting factor, calculating the actual volume value based on the distance weighting factor and the volume value of the first output instruction, and performing sound output according to the actual volume value and the noise-reduction-processed sound data; where actual volume value = volume value × (1 + distance weighting factor).
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performing sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
Before the mobile terminal transmits the original sound data and the second output instruction to the target sound output unit using the second wireless network, the method further comprises: including, in the second output instruction, a volume value indicating the output volume and the network transmission delay.
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit delaying output of the original sound data by the network transmission delay, according to the volume value in the second output instruction transmitted by the mobile terminal, so that the target sound output unit and each of the other sound output units remain time-consistent when performing sound output.
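A minimal sketch of this delay compensation, assuming a sleep-based scheduler and a caller-supplied playback callback (both illustrative, not part of the claims):

```python
import time

def play_with_delay(sound_data, volume_value, network_delay_s, play_fn):
    # Hold the original sound data for the duration of the network transmission
    # delay so the target unit's direct-path output stays time-consistent with
    # the units fed through the sound agent device
    time.sleep(network_delay_s)
    play_fn(sound_data, volume_value)
```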
When the sound agent apparatus determines at least two sound output units closest to the current position of the mobile terminal among a plurality of sound output units based on the current position of the mobile terminal in the sound data output request, one sound output unit is randomly selected from the at least two sound output units as a target sound output unit.
When the sound agent apparatus detects at least two sound output units closest to the current location of the mobile terminal among the plurality of sound output units based on the current location of the mobile terminal in the sound data output request, the sound agent apparatus transmits description information of the at least two sound output units to the mobile terminal and determines the target sound output unit from among the at least two sound output units in response to a selection message from the user.
When the sound agent device detects at least two sound output units closest to the current position of the mobile terminal from among the sound output units based on the current position of the mobile terminal in the sound data output request, the sound output unit farthest from the sound agent device from among the at least two sound output units is determined as a target sound output unit.
When the sound agent device detects at least two sound output units closest to the current position of the mobile terminal from among the sound output units based on the current position of the mobile terminal in the sound data output request, the sound output unit closest to the sound agent device from among the at least two sound output units is determined as a target sound output unit.
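The target-unit selection variants above (nearest unit, with random, farthest-from-agent, or closest-to-agent tie-breaking) can be sketched as follows; representing positions as 2-D coordinates and the function names are illustrative assumptions:

```python
import math
import random

def closest_units(terminal_pos, unit_positions):
    # Straight-line distance from the mobile terminal's current position
    # to each stored sound output unit position
    def dist(pos):
        return math.hypot(pos[0] - terminal_pos[0], pos[1] - terminal_pos[1])
    best = min(dist(pos) for pos in unit_positions.values())
    return [uid for uid, pos in unit_positions.items()
            if math.isclose(dist(pos), best)]

def pick_target(candidates, agent_distances, strategy="random"):
    # Tie-breaking variants described above when two or more units are
    # equally close to the mobile terminal
    if strategy == "random":
        return random.choice(candidates)
    if strategy == "farthest_from_agent":
        return max(candidates, key=lambda uid: agent_distances[uid])
    return min(candidates, key=lambda uid: agent_distances[uid])  # closest to agent
```

The user-selection variant would instead send `candidates` to the mobile terminal and wait for the user's choice.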
According to an aspect of the present invention, there is provided a system for sound data transmission based on multiple transmission paths, the system comprising:
a mobile terminal, initiating a sound output request to the sound agent device, and, after receiving a sound wave transmission matching code from a sound output unit through sound wave transmission and receiving a response message permitting sound output from the sound agent device, calculating a network delay and transmitting the network delay to the sound agent device;
the mobile terminal entering a multi-path transmission mode based on a multi-path transmission protocol in response to an instruction of the sound agent device to enter a feedback suppression mode, transmitting sound data input by a user to the sound agent device using a first network connection, and simultaneously transmitting the sound data input by the user to the sound output unit closest to the position of the mobile terminal among the plurality of sound output units using a second network connection;
a sound agent device, extracting a sound sample, a mobile terminal position, and a sound wave transmission matching code from the sound output request received from the mobile terminal, determining based on the sound sample whether the user is allowed to perform sound output, and, regardless of the result of that determination, instructing the sound output unit closest to the mobile terminal position among the plurality of sound output units to transmit the sound wave transmission matching code to the mobile terminal by sound wave transmission;
the sound agent device, if the user is allowed to perform sound output according to the sound sample, sending a response message to the mobile terminal and receiving the network delay from the mobile terminal, and, if the network delay is greater than a feedback threshold, entering a feedback suppression mode for sound output;
wherein, in the feedback suppression mode, the sound agent device instructs the mobile terminal to enter the multi-path transmission mode based on the multi-path transmission protocol; and
a plurality of sound output units, wherein the sound output units other than the one closest to the position of the mobile terminal perform sound output based on the output instruction and the sound data sent by the sound agent device, and the sound output unit closest to the position of the mobile terminal performs sound output according to the output instruction and the sound data sent by the mobile terminal.
According to an aspect of the present invention, there is provided a system for sound data output based on multi-path data transmission, the system including:
a mobile terminal, transmitting a sound data output request input by a user to a sound agent device through a first wireless network, the sound data output request including a sound sample, a current location of the mobile terminal, and an initial matching code; in response to receiving an indication message, the mobile terminal determining a second sound wave transmission matching code from the initial matching code and a matching random number in the indication message, and broadcasting the second sound wave transmission matching code to a plurality of sound output units based on sound wave communication;
the sound agent device, receiving the sound data output request from the mobile terminal, determining whether the mobile terminal is allowed to output sound data according to the sound sample in the sound data output request, and, if it is determined that the mobile terminal is allowed to output sound data, sending a permission-to-output message to the mobile terminal; detecting, among the plurality of sound output units, a target sound output unit closest to the current location of the mobile terminal based on the current location of the mobile terminal in the sound data output request, determining a first sound wave transmission matching code based on the initial matching code in the sound data output request and a randomly generated matching random number, and transmitting the first sound wave transmission matching code to the target sound output unit;
wherein the mobile terminal receives the permission-to-output message and determines a network transmission delay based on a timestamp in the permission-to-output message, the network transmission delay being sent to the sound agent device; the sound agent device evaluates the network transmission delay received from the mobile terminal and, if the network transmission delay is smaller than a delay time threshold, sends the mobile terminal an indication message indicating that the mobile terminal is to output sound data through multi-path data transmission;
the plurality of sound output units, wherein the target sound output unit among the plurality of sound output units establishes a wireless communication connection with the mobile terminal through a second wireless network when it determines that the received second sound wave transmission matching code is identical to the first sound wave transmission matching code;
wherein, in response to the establishment of the wireless communication connection, the mobile terminal enters a multi-path data transmission mode: acquiring original sound data input by the user through a sound acquisition device of the mobile terminal, performing noise reduction processing on the original sound data to generate noise-reduction-processed sound data, and transmitting the noise-reduction-processed sound data to the sound agent device through the first wireless network; the sound agent device transmitting the noise-reduction-processed sound data and a first output instruction to the sound output units other than the target sound output unit among the plurality of sound output units through a wired network, while the mobile terminal simultaneously transmits the original sound data and a second output instruction to the target sound output unit through the second wireless network; and
each of the other sound output units performs sound output based on the first output instruction transmitted by the sound agent device and the sound data subjected to the noise reduction processing, and the target sound output unit performs sound output based on the second output instruction transmitted by the mobile terminal and the original sound data.
Further comprising: acquiring a sound sample input by the user through a sound acquisition device of the mobile terminal. The sound sample is a segment of speech input by the user in the live environment where sound output is to take place. The sound sample is a segment of speech used to convey a summary of the user's opinion. The sound sample is a segment of speech used to introduce the identity of the user. Further comprising: acquiring the current location of the mobile terminal using a positioning device of the mobile terminal.
The acquiring the current position of the mobile terminal by using the positioning device of the mobile terminal comprises: the positioning device calibrates the satellite positioning data according to indoor auxiliary positioning, outdoor auxiliary positioning and/or access point auxiliary positioning to acquire the current position of the mobile terminal.
Further comprising: generating the initial matching code based on a Media Access Control (MAC) address of the mobile terminal or based on a hardware address of the mobile terminal. The sound agent device performs speech recognition on the sound sample to generate text information, and determines that the mobile terminal is allowed to output sound data when the text information conforms to the expression habits of the corresponding language.
The sound agent device determines whether the text information conforms to the expression habits of the corresponding language based on semantic recognition.

The sound agent device divides the text information into at least one sentence unit according to the punctuation of the text information, performs independent semantic analysis on each sentence unit to determine a semantic score, and determines that the text information conforms to the expression habits of the corresponding language when the weighted sum of the semantic scores of the sentence units is greater than an expression threshold.

The sound agent device acquires and stores the position of each of the plurality of sound output units in advance. The sound agent device determines the target sound output unit closest to the current location of the mobile terminal based on the straight-line distance between the current location of the mobile terminal and the position of each of the plurality of sound output units, or based on the sound wave transmission distance between them.

The sound agent device string-concatenates the initial matching code and the randomly generated matching random number to generate the first sound wave transmission matching code. Alternatively, the sound agent device sums the initial matching code and the randomly generated matching random number to generate the first sound wave transmission matching code, or performs a cyclic shift, a bitwise operation, or bitwise concatenation on the initial matching code based on the randomly generated matching random number to generate the first sound wave transmission matching code.

The sound sample is used to indicate at least one of: the speech intelligibility of the user, the type of language involved in the user's speech, and the background noise intensity.
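The constructions of the first sound wave transmission matching code described above can be sketched as follows; the function names and the 32-bit word width for the cyclic shift are assumptions, not fixed by the text:

```python
def match_code_concat(initial_code: str, random_number: int) -> str:
    # String-concatenate the initial matching code and the matching random number
    return initial_code + str(random_number)

def match_code_sum(initial_code: int, random_number: int) -> int:
    # Sum the initial matching code and the matching random number
    return initial_code + random_number

def match_code_rotate(initial_code: int, random_number: int, width: int = 32) -> int:
    # Cyclically left-shift the initial matching code by the matching random
    # number of bits (modulo an assumed 32-bit word width)
    shift = random_number % width
    mask = (1 << width) - 1
    return ((initial_code << shift) | (initial_code >> (width - shift))) & mask
```

The mobile terminal builds the second sound wave transmission matching code with the same construction and the same matching random number from the indication message, so the target unit's comparison succeeds exactly when both sides started from the same initial matching code and random number.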
When the intelligibility of the user's speech is greater than the minimum required intelligibility threshold, the type of language involved in the user's speech can be automatically translated by the speech recognition server, and the background noise intensity is lower than the maximum allowable noise intensity, the sound agent device determines that the mobile terminal is allowed to output sound data.
The mobile terminal determines the network delay based on a timestamp indicating the transmission time and the current time of the mobile terminal.
If the network transmission delay is determined to be greater than or equal to the delay time threshold, the sound agent device sends the mobile terminal an indication message indicating that the mobile terminal cannot output sound data through multi-path data transmission.
In response to receiving the indication message that sound data cannot be output through multi-path data transmission, the mobile terminal outputs sound data through single-path data transmission.
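The delay measurement and the threshold decision above can be sketched as follows; the 0.2 s threshold value is an assumption, since the text does not fix a delay time threshold:

```python
import time

def network_transmission_delay(sent_timestamp_s, now_s=None):
    # Delay = mobile terminal's current time minus the timestamp carried
    # in the permission-to-output message
    now = time.time() if now_s is None else now_s
    return now - sent_timestamp_s

def transmission_mode(delay_s, threshold_s=0.2):
    # Smaller than the delay time threshold -> multi-path data transmission;
    # greater than or equal to the threshold -> single-path data transmission
    return "multi-path" if delay_s < threshold_s else "single-path"
```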
The mobile terminal string-concatenates the initial matching code and the matching random number to generate the second sound wave transmission matching code. Alternatively, the mobile terminal sums the initial matching code and the matching random number to generate the second sound wave transmission matching code, or performs a cyclic shift, a bitwise operation, or bitwise concatenation on the initial matching code based on the matching random number to generate the second sound wave transmission matching code.

The target sound output unit stores the received at least one first sound wave transmission matching code. Each of the plurality of sound output units stores the received at least one first sound wave transmission matching code and, upon receiving a second sound wave transmission matching code, compares it with the stored at least one first sound wave transmission matching code.

In the multi-path data transmission mode, the mobile terminal transmits the sound data through at least two different transmission paths. The original sound data is an original sound data stream input by the user through the sound acquisition device of the mobile terminal, and the noise-reduction-processed sound data is a noise-reduction-processed sound data stream. The mobile terminal transmits the noise-reduction-processed sound data stream to the sound agent device in real time using the first wireless network.
The mobile terminal transmits an original sound data stream to the target sound output unit in real time using a second wireless network, and transmits the second output instruction to the target sound output unit. Wherein the first output instruction comprises a volume value for indicating an output volume. Wherein the second output instruction comprises: a volume value indicating the output volume.
The method further comprises: the sound agent device determining the original frequency and the original timbre of the noise-reduction-processed sound data; determining the frequency level of the noise-reduction-processed sound data based on the difference between the original frequency and a preset reference frequency value; determining the timbre weighting factor based on the frequency level; determining the spectral curve of the original timbre and determining the initial timbre score of the noise-reduction-processed sound data according to the similarity between the spectral curve and a preset timbre standard line; and determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score. The sound agent device determines the original volume of the noise-reduction-processed sound data, determines the volume weighting factor based on the timbre level, and determines the output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent device determines the difference of the original frequency (e.g., 60-300 Hz) minus the reference frequency value, calculates the integer part of the result of dividing the difference by an interval value (e.g., 10), determines that integer as the frequency level of the noise-reduction-processed sound data, and determines the absolute value of the frequency level as the timbre weighting factor.
The initial timbre score is one of: 25 points, 26 points, 27 points, 28 points, and 29 points.
Determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduction-processed sound data, and setting the timbre level.
Determining the volume weighting factor based on the timbre level comprises: determining the percentage value corresponding to the timbre level divided by 100 as the volume weighting factor.
Determining the output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor). The first output instruction includes a volume value indicating the output volume. The second output instruction includes a volume value indicating the original volume.
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduction-processed sound data comprises: each of the other sound output units performing sound output based on the volume value in the first output instruction transmitted by the sound agent device and the noise-reduction-processed sound data.
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performing sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
The method further comprises: the sound agent device determining the original frequency and the original timbre of the noise-reduction-processed sound data; determining the frequency level of the noise-reduction-processed sound data based on the difference between the original frequency and a preset reference frequency value; determining the timbre weighting factor based on the frequency level; determining the spectral curve of the original timbre and determining the initial timbre score of the noise-reduction-processed sound data according to the similarity between the spectral curve and a preset timbre standard line; and determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score. The sound agent device determines the original volume of the noise-reduction-processed sound data, determines the volume weighting factor based on the timbre level, and determines the output volume based on the volume weighting factor and the original volume.
The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent device determines the difference of the original frequency (e.g., 60-300 Hz) minus the reference frequency value, calculates the integer part of the result of dividing the difference by an interval value (e.g., 10), determines that integer as the frequency level of the noise-reduction-processed sound data, and determines the absolute value of the frequency level as the timbre weighting factor.
The initial timbre score is one of: 25 points, 26 points, 27 points, 28 points, and 29 points.
Determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduction-processed sound data, and setting the timbre level.
Determining the volume weighting factor based on the timbre level comprises: determining the percentage value corresponding to the timbre level divided by 100 as the volume weighting factor.
Determining the output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 - volume weighting factor). The first output instruction includes a volume value indicating the output volume. The second output instruction includes a volume value indicating the original volume.
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduction-processed sound data comprises: each of the other sound output units performing sound output based on the volume value in the first output instruction transmitted by the sound agent device and the noise-reduction-processed sound data.
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performing sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
The method further comprises: the sound agent device determining the original frequency and the original timbre of the noise-reduction-processed sound data; determining the frequency level of the noise-reduction-processed sound data based on the difference between the original frequency and a preset reference frequency value; determining the timbre weighting factor based on the frequency level; determining the spectral curve of the original timbre and determining the initial timbre score of the noise-reduction-processed sound data according to the similarity between the spectral curve and a preset timbre standard line; and determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score. The sound agent device determines the original volume of the noise-reduction-processed sound data, determines the volume weighting factor based on the timbre level, and determines the output volume based on the volume weighting factor and the original volume.
The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent device determines the difference of the original frequency (e.g., 60-300 Hz) minus the reference frequency value, calculates the integer part of the result of dividing the difference by an interval value (e.g., 10), determines that integer as the frequency level of the noise-reduction-processed sound data, and determines the absolute value of the frequency level as the timbre weighting factor.
The initial timbre score is one of: 25 points, 26 points, 27 points, 28 points, and 29 points.
Determining the timbre level of the noise-reduction-processed sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduction-processed sound data, and setting the timbre level.
Determining the volume weighting factor based on the timbre level comprises: determining the percentage value corresponding to the timbre level divided by 100 as the volume weighting factor.
Determining the output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor). The first output instruction includes: a volume value indicating the output volume and the current location of the mobile terminal. The second output instruction includes a volume value indicating the original volume.
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduction-processed sound data comprises: each of the other sound output units determining the straight-line distance to the mobile terminal based on the current location of the mobile terminal in the first output instruction transmitted by the sound agent device, determining the percentage value corresponding to the straight-line distance divided by 1000 as the distance weighting factor, calculating the actual volume value based on the distance weighting factor and the volume value of the first output instruction, and performing sound output according to the actual volume value and the noise-reduction-processed sound data; where actual volume value = volume value × (1 + distance weighting factor).
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performing sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
The method further comprises the steps that the sound agent equipment determines original frequency and original tone of the sound data subjected to the noise reduction processing, determines the frequency grade of the sound data subjected to the noise reduction processing based on the difference value of a preset reference frequency value and the frequency, determines a tone weighting factor based on the frequency grade, determines a frequency spectrum curve of the original tone and determines an initial tone score of the sound data subjected to the noise reduction processing according to the similarity of the frequency spectrum curve and a preset tone standard line, and determines the tone grade of the sound data subjected to the noise reduction processing based on the tone weighting factor and the initial tone score; determining an original volume of the noise-reduced sound data, determining a volume weighting factor based on the tone level, and determining an output volume based on the volume weighting factor and the original volume.
The preset reference frequency value is 100Hz, 120Hz or 150 Hz.
The sound agent apparatus determines a difference value of the original frequency (e.g., 60-300Hz) minus a reference frequency value, calculates an integer in a result of dividing the difference value by an interval value (e.g., 10), determines the integer as a frequency level of the noise reduction-processed sound data, and determines an absolute value of the frequency level as a tone weighting factor.
The initial timbre scores comprise: 25 min, 26 min, 27 min, 28 min and 29 min.
The determining the tone scale of the noise reduction-processed sound data based on the tone weighting factor and the initial tone score comprises: and taking the difference obtained by subtracting the tone weighting factor from the initial tone score as the tone level of the sound data subjected to the noise reduction processing, and setting the tone level.
Determining a volume weighting factor based on the timbre levels comprises: and determining a percentage numerical value corresponding to a numerical value obtained by dividing the tone grade by 100 as a tone weighting factor.
The determining an output volume based on the volume weighting factor and a raw volume comprises: output volume is original volume x (1-volume weighting factor). The first output instruction includes a volume value indicating the output volume. The second output instruction includes a volume value indicating an original volume.
Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduction-processed sound data comprises: each of the other sound output units determines its straight-line distance to the mobile terminal based on the current position of the mobile terminal carried in the first output instruction, determines the percentage value obtained by dividing that distance by 1000 as a distance weighting factor, calculates an actual volume value from the distance weighting factor and the volume value in the first output instruction, and performs sound output at the actual volume value using the noise-reduction-processed sound data;
where actual volume value = volume value × (1 + distance weighting factor).
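A minimal sketch of this distance-based adjustment, assuming distances are measured in the unit the text divides by 1000 (meters, presumably); the function name is hypothetical.

```python
def actual_volume(instruction_volume: float, distance_to_terminal: float) -> float:
    """Distance weighting factor = distance / 1000, as a percentage;
    actual volume = volume x (1 + factor), so farther speakers play louder."""
    distance_factor = distance_to_terminal / 1000.0
    return instruction_volume * (1.0 + distance_factor)

# A speaker 50 units from the terminal plays an instructed volume of 60 at 63.
print(actual_volume(60.0, 50.0))  # 60 x 1.05 = 63.0
```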
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
Further comprising: the mobile terminal includes, in the second output instruction, a volume value indicating the output volume and the network transmission delay.
The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit delays output of the original sound data by the network transmission delay carried in the second output instruction, and plays it at the volume value in that instruction, so that the target sound output unit and each of the other sound output units remain time-consistent when performing sound output.
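The time-alignment idea can be sketched as follows: the target unit, which receives its data directly over the shorter local path, holds playback for the measured network transmission delay so it stays in step with the units fed through the sound agent. The callback-based structure is an assumption for illustration.

```python
import time

def play_aligned(samples, network_delay_s: float, play) -> None:
    """Delay local playback by the network transmission delay, then play.

    samples: audio chunk to output.
    network_delay_s: measured delay of the agent-side path, in seconds.
    play: callable that performs the actual output.
    """
    time.sleep(network_delay_s)
    play(samples)

played = []
play_aligned([0.1, 0.2], 0.01, played.extend)
print(played)  # the chunk is output unchanged, just later
```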
When the sound agent apparatus determines, based on the current position of the mobile terminal in the sound data output request, that at least two sound output units among the plurality of sound output units are closest to the current position of the mobile terminal, one sound output unit is randomly selected from the at least two sound output units as the target sound output unit.
Alternatively, when the sound agent apparatus detects at least two such closest sound output units, it transmits description information of the at least two sound output units to the mobile terminal and determines the target sound output unit from among them in response to a selection message from the user.
Alternatively, the sound output unit farthest from the sound agent device among the at least two sound output units is determined as the target sound output unit.
Alternatively, the sound output unit closest to the sound agent device among the at least two sound output units is determined as the target sound output unit.
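The tie-breaking strategies above can be sketched under the assumption that each unit is described by its ID, its distance to the mobile terminal, and its distance to the sound agent device; the tuple layout and function names are illustrative, not part of the patent.

```python
import random

def candidates(units):
    """Units tied at the minimum distance to the mobile terminal.
    units: list of (unit_id, dist_to_terminal, dist_to_agent) tuples."""
    d_min = min(u[1] for u in units)
    return [u for u in units if u[1] == d_min]

def pick_random(units):
    """Strategy 1: random choice among the tied units."""
    return random.choice(candidates(units))

def pick_farthest_from_agent(units):
    """Strategy 3: tied unit farthest from the sound agent device."""
    return max(candidates(units), key=lambda u: u[2])

def pick_closest_to_agent(units):
    """Strategy 4: tied unit closest to the sound agent device."""
    return min(candidates(units), key=lambda u: u[2])

units = [("102-2", 5.0, 30.0), ("102-3", 5.0, 20.0),
         ("102-6", 5.0, 10.0), ("102-1", 9.0, 40.0)]
print(pick_farthest_from_agent(units)[0])  # 102-2
print(pick_closest_to_agent(units)[0])     # 102-6
```

Strategy 2 (user selection) is interactive and is omitted; it would send the tied units' descriptions to the terminal and wait for the user's reply.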
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
fig. 1a, 1b and 1c are schematic structural diagrams of a system for sound data transmission based on multiple transmission paths according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for transmitting voice data based on multiple transmission paths according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method of determining output volume according to an embodiment of the present invention;
fig. 4 is a flowchart of a method of determining an output volume according to another embodiment of the present invention;
fig. 5 is a flowchart of a method of delaying output of sound data according to an embodiment of the present invention; and
fig. 6 is a flowchart of a method of determining a target sound output unit according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein; these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Fig. 1a, 1b and 1c are schematic structural diagrams of a system 100 for transmitting sound data based on multiple transmission paths according to an embodiment of the present invention. The system 100 includes: a sound agent apparatus 101, sound output units 102-1 through 102-12, and a mobile terminal 103. As shown in fig. 1a, the sound output units 102-1 through 102-12 are arranged within the venue, and each sound output unit performs sound output based on received sound data (e.g., a sound data stream). In general, the number of sound output units may be determined according to the area of the venue, and the position of each sound output unit may be determined according to the layout of the venue. Preferably, each sound output unit is capable of communicating with the sound agent apparatus 101 through a wired network and with user equipment (e.g., the mobile terminal 103) through various types of wireless networks. Each sound output unit is capable of processing sound data or a sound data stream and of outputting sound in accordance with a received output instruction.
As shown in fig. 1b, mobile terminal 103 may be located at any suitable location in the audience area of the venue (i.e., a location where a user of mobile terminal 103 may stand or sit). It should be understood that there may be multiple mobile terminals in a venue; for clarity, the present application is described with reference to the mobile terminal 103 as an example. As shown in fig. 1b, mobile terminal 103 is closest to output unit 102-6 and is farther, to varying degrees, from the other output units.
As shown in fig. 1c, the mobile terminal 103 is equidistant from, and closest to, the output units 102-2, 102-3 and 102-6, and is farther, to varying degrees, from the other output units. It should be understood that the present application takes as an example the case where the mobile terminal 103 is equidistant from three output units; in practice, the number of output units equidistant from mobile terminal 103 may be any reasonable number.
The mobile terminal 103 initiates a sound output request to the sound agent apparatus 101. After receiving an acoustic transmission matching code from the sound output units 102-2, 102-3, and 102-6 via acoustic transmission and receiving a response message from the sound agent apparatus 101 permitting sound output, the mobile terminal 103 calculates a network delay and transmits it to the sound agent apparatus 101. In response to an instruction to enter the feedback suppression mode transmitted by the sound agent apparatus 101, the mobile terminal 103 enters the multi-path transmission mode based on a multi-path transmission protocol. The multi-path transmission mode is a mode in which the mobile terminal transmits sound data, a sound data stream, or the like through at least two paths for sound output. The mobile terminal 103 transmits the sound data input by the user to the sound agent apparatus 101 over a first network connection and, simultaneously, over a second network connection, transmits the same sound data to the sound output unit closest to the mobile terminal's position among the plurality of sound output units (e.g., output unit 102-6 in fig. 1b, or one of output units 102-2, 102-3, and 102-6 in fig. 1c).
The sound agent apparatus 101 extracts a sound sample, the mobile terminal position, and a sound transmission matching code from the sound output request received from the mobile terminal 103, and determines, based on the sound sample, whether to allow the user to perform sound output. Regardless of the result of that determination, it transmits a sound transmission matching code to the mobile terminal 103 by acoustic transmission, indicating the sound output unit closest to the mobile terminal position.
If the sound agent apparatus 101 determines from the sound samples that the user is allowed to make sound output, it sends a response message to the mobile terminal 103 and receives a network delay from the mobile terminal 103, and if the network delay is greater than a feedback threshold, the sound agent apparatus 101 enters a feedback suppression mode for sound output. In the feedback suppression mode, the sound agent apparatus 101 instructs the mobile terminal 103 to enter a multipath transmission mode based on a multipath transmission protocol.
The sound output units other than the sound output unit closest to the mobile terminal position among the sound output units 102-1, 102-2, 102-3, 102-4, 102-5, 102-6, 102-7, 102-8, 102-9, 102-10, 102-11, and 102-12 perform sound output based on the output instruction and the sound data transmitted by the sound agent apparatus 101. The sound output unit closest to the mobile terminal performs sound output based on the output instruction and the sound data transmitted from the mobile terminal 103.
According to an embodiment of the present invention, the system 100 performs sound data output based on multi-path data transmission and includes: a sound agent apparatus 101, sound output units 102-1 through 102-12, and a mobile terminal 103. The mobile terminal 103 transmits a sound data output request input by the user to the sound agent apparatus 101 through a first wireless network (e.g., a wide-area wireless communication network such as 3G, 4G, or 5G). The sound data output request includes a sound sample, the current location of the mobile terminal 103, and an initial matching code. The sound sample input by the user is obtained through a sound acquisition means (e.g., a microphone) of the mobile terminal 103. The sound sample may be a piece of speech input by the user in the live environment where the sound output is to be made, a piece of speech summarizing the user's opinion, a piece of speech introducing the user's identity, etc. The sound sample is an important criterion for determining whether to permit the user to perform sound data output.
The present application utilizes a positioning device of the mobile terminal 103 to obtain the current position of the mobile terminal 103. Specifically, the positioning device calibrates the satellite positioning data according to indoor assisted positioning, outdoor assisted positioning, and/or access point assisted positioning to obtain the current position of the mobile terminal 103. Generally, the location information of the user may be acquired through a GPS chip or a beidou chip of the mobile terminal 103. The present application may then calibrate the location information based on outdoor assisted positioning of the communication network (e.g., the venue is an outdoor venue), indoor assisted positioning (e.g., the venue is an indoor venue), and/or access point assisted positioning (e.g., access point devices within the venue having a wireless network).
The initial matching code is generated based on the MAC address of the mobile terminal 103, or based on the hardware address of the mobile terminal 103. For example, the present application may determine all or part of the content (character string) of the MAC address of the mobile terminal 103 as the initial matching code, or determine all or part of the content (character string) of the hardware address of the mobile terminal 103 as the initial matching code.
The sound agent apparatus 101 receives the sound data output request from the mobile terminal 103 and determines whether to allow sound data output by the mobile terminal 103 according to the sound sample in the request. The sound agent apparatus 101 performs speech recognition on the sound sample to generate text information, and determines to allow sound data output by the mobile terminal 103 when the text information conforms to the expression habits of the corresponding language (e.g., the expressed meaning meets the basic content requirements of the language). The sound agent apparatus 101 determines whether the text information conforms to the expression habits of the corresponding language based on semantic recognition.
The sound agent apparatus 101 divides the text information into at least one sentence unit according to the sentence-break symbols of the text information, performs independent semantic analysis on each sentence unit to determine a semantic score, and determines that the text information conforms to the expression habits of the corresponding language when the weighted sum of the semantic scores of the sentence units is greater than an expression threshold. The number of characters in a sentence unit is used as the weight of that sentence unit. For example, suppose the text information includes sentence units A and B, where sentence unit A includes 5 Chinese characters and sentence unit B includes 10 Chinese characters. The semantic score of sentence unit A is 9 points (out of a maximum of 10 points and a minimum of 0), and the semantic score of sentence unit B is 8 points. The weight of sentence unit A is 5/(5+10) = 1/3, and the weight of sentence unit B is 10/(5+10) = 2/3. The semantic score of the text information is the weighted sum of the semantic scores of sentence units A and B, i.e., 9 × (1/3) + 8 × (2/3) ≈ 8.33. The expression threshold (greater than 0 and less than or equal to 10) may be, for example, any reasonable value such as 7, 7.5, 8, or 8.5.
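The worked example above can be reproduced with a short sketch; the function names and the default threshold are assumptions, while the scores and character-count weights come from the text.

```python
def weighted_semantic_score(units):
    """units: list of (char_count, semantic_score) per sentence unit.
    Each unit's weight is its character count over the total character count."""
    total_chars = sum(c for c, _ in units)
    return sum(score * c / total_chars for c, score in units)

def conforms(score: float, expression_threshold: float = 8.0) -> bool:
    """Text conforms when the weighted score exceeds the expression threshold."""
    return score > expression_threshold

# Sentence unit A: 5 characters, 9 points; sentence unit B: 10 characters, 8 points.
score = weighted_semantic_score([(5, 9.0), (10, 8.0)])  # 9 x 1/3 + 8 x 2/3
print(round(score, 2))  # 8.33
```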
The sound sample is used to indicate at least one of: the clarity of the user's speech, the language type involved in the user's speech, and the background noise intensity. When the clarity of the user's speech is greater than the minimum required clarity threshold, the language type involved in the user's speech can be automatically translated by the speech recognition server, and the background noise intensity is lower than the maximum allowable noise intensity, the sound agent apparatus 101 determines to allow the mobile terminal 103 to perform sound data output.
If it determines that the mobile terminal 103 is allowed to perform sound data output, the sound agent apparatus 101 transmits an output permission message to the mobile terminal 103 through the first wireless network. Further, the sound agent apparatus 101 detects, based on the current position of the mobile terminal 103 in the sound data output request, the target sound output unit closest to that position among the plurality of sound output units (e.g., output unit 102-6 in fig. 1b, or one of output units 102-2, 102-3, and 102-6 in fig. 1c). The sound agent apparatus 101 acquires and stores the position of each of the plurality of sound output units in advance. The sound agent apparatus 101 determines the target sound output unit based on the straight-line distance between the current position of the mobile terminal 103 and the position of each sound output unit. Alternatively, the sound agent apparatus 101 determines the target sound output unit based on the sound wave transmission distance between the current position of the mobile terminal 103 and the position of each sound output unit. In this case, when there is an obstacle (e.g., a pillar) between a particular sound output unit and the mobile terminal 103, the shortest path along which sound waves can travel is taken as the distance between that sound output unit and the mobile terminal 103.
When the sound agent apparatus 101 determines, based on the current position of the mobile terminal 103 in the sound data output request, that at least two sound output units among the plurality of sound output units are closest to the current position of the mobile terminal 103, one sound output unit is randomly selected from the at least two sound output units as the target sound output unit. Alternatively, the sound agent apparatus 101 transmits description information of the at least two sound output units to the mobile terminal 103 and determines the target sound output unit from among them in response to a selection message from the user. Alternatively, the sound output unit farthest from the sound agent apparatus 101 among the at least two sound output units is determined as the target sound output unit. Alternatively, the sound output unit closest to the sound agent apparatus 101 among the at least two sound output units is determined as the target sound output unit.
The sound agent apparatus 101 determines a first acoustic transmission matching code based on the initial matching code in the sound data output request and a randomly generated matching random number, and transmits the first acoustic transmission matching code to the target sound output unit. The target sound output unit stores the received at least one first acoustic transmission matching code. Since there are multiple mobile terminals in the venue, there may be multiple first acoustic transmission matching codes, and each of the plurality of sound output units may receive at least one of them. To this end, each of the plurality of sound output units stores the at least one first acoustic transmission matching code it receives.
The sound agent apparatus 101 string-concatenates the initial matching code and the randomly generated matching random number to generate the first acoustic transmission matching code. For example, if the initial matching code is 406188963D56 and the matching random number is 25, the first acoustic transmission matching code is 406188963D5625. Alternatively, the sound agent apparatus 101 adds the initial matching code and the randomly generated matching random number to generate the first acoustic transmission matching code. For example, if the initial matching code is 406188963D56 and the matching random number is 25, the first acoustic transmission matching code is 406188963D81. Alternatively, the sound agent apparatus 101 performs a cyclic shift, a bitwise operation, or a bitwise interleaving on the initial matching code based on the randomly generated matching random number to generate the first acoustic transmission matching code. For example, if the initial matching code is 100110101101 and the matching random number is 2, the first acoustic transmission matching code may be the initial matching code cyclically right-shifted by 2 bits, i.e., 011001101011. If the initial matching code is 100110101101 and the matching random number is 100110011001, the first acoustic transmission matching code may be their bitwise OR, i.e., 100110111101. If the initial matching code is 1101 and the matching random number is 0010, the first acoustic transmission matching code may be formed by interleaving them bit by bit, i.e., 10100110, where the 1st, 3rd, 5th, and 7th bits of the first acoustic transmission matching code come from the initial matching code and the 2nd, 4th, 6th, and 8th bits come from the matching random number.
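The generation variants above can be sketched as follows, with hypothetical helper names. Note the addition variant's worked example in the text does not state whether the random number is read as decimal or hexadecimal, so no expected value is shown for `add_code`.

```python
def concat_code(initial: str, rand: int) -> str:
    """String concatenation of initial matching code and random number."""
    return initial + str(rand)

def add_code(initial_hex: str, rand: int) -> str:
    """Arithmetic sum of the initial code (as hex) and the random number."""
    return format(int(initial_hex, 16) + rand, "X")

def cyclic_right_shift(bits: str, n: int) -> str:
    """Cyclic right shift of a bit string by n positions."""
    n %= len(bits)
    return bits[-n:] + bits[:-n]

def bitwise_or(bits_a: str, bits_b: str) -> str:
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if a == "1" or b == "1" else "0"
                   for a, b in zip(bits_a, bits_b))

def interleave(bits_a: str, bits_b: str) -> str:
    """Alternate bits: odd positions from bits_a, even positions from bits_b."""
    return "".join(a + b for a, b in zip(bits_a, bits_b))

print(concat_code("406188963D56", 25))             # 406188963D5625
print(cyclic_right_shift("100110101101", 2))       # 011001101011
print(bitwise_or("100110101101", "100110011001"))  # 100110111101
print(interleave("1101", "0010"))                  # 10100110
```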
The mobile terminal 103 receives the output permission message and determines a network transmission delay based on a timestamp in the message, i.e., from the timestamp indicating the transmission time and the current time of the mobile terminal 103. The mobile terminal 103 sends the network transmission delay to the sound agent apparatus 101. The sound agent apparatus 101 evaluates the network transmission delay received from the mobile terminal 103: if the network transmission delay is less than a delay time threshold, it transmits to the mobile terminal 103 an indication message instructing the mobile terminal 103 to perform sound data output through multi-path data transmission; if the network transmission delay is greater than or equal to the delay time threshold, it transmits an indication message stating that the mobile terminal 103 cannot perform sound data output through multi-path data transmission. In response to receiving an indication message that sound data output through multi-path data transmission is not possible, the mobile terminal 103 performs sound data output through single-path data transmission. That is, the mobile terminal 103 communicates with the sound agent apparatus 101 only through the first wireless network, for example, transmitting sound data or a sound data stream to the sound agent apparatus 101.
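The agent's delay check amounts to a simple threshold decision, sketched below; the 100 ms threshold is an assumed value, since the text only names "a delay time threshold".

```python
def transmission_mode(network_delay_ms: float,
                      delay_threshold_ms: float = 100.0) -> str:
    """Below the threshold: multi-path output is allowed; at or above it,
    the terminal falls back to single-path output via the sound agent."""
    return "multi-path" if network_delay_ms < delay_threshold_ms else "single-path"

print(transmission_mode(40.0))   # multi-path
print(transmission_mode(250.0))  # single-path
```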
In response to receiving the indication message, the mobile terminal 103 determines a second acoustic transmission matching code from the initial matching code and the matching random number in the indication message, and broadcasts the second acoustic transmission matching code to the plurality of sound output units over acoustic communication. The mobile terminal 103 string-concatenates the initial matching code and the matching random number to generate the second acoustic transmission matching code. Alternatively, the mobile terminal 103 adds the initial matching code and the matching random number to generate the second acoustic transmission matching code. Alternatively, the mobile terminal 103 performs a cyclic shift, a bitwise operation, or a bitwise interleaving on the initial matching code based on the matching random number to generate the second acoustic transmission matching code. The second acoustic transmission matching code is generated in the same manner as the first acoustic transmission matching code described above, so the details are omitted here.
When a target sound output unit among the plurality of sound output units determines that a received second acoustic transmission matching code is identical to its stored first acoustic transmission matching code, it establishes a wireless communication connection with the mobile terminal 103 through a second wireless network. To that end, each of the plurality of sound output units, upon receiving a second acoustic transmission matching code, compares it with the stored at least one first acoustic transmission matching code.
The mobile terminal 103 enters the multi-path data transmission mode in response to the establishment of the wireless communication connection. The multi-path transmission mode is a mode in which the mobile terminal transmits sound data, a sound data stream, or the like through at least two paths for sound output. That is, in the multi-path data transmission mode, the mobile terminal 103 transmits sound data over at least two different transmission paths. Original sound data input by the user is acquired by the sound acquisition device of the mobile terminal 103 and subjected to noise reduction processing to generate noise-reduction-processed sound data. Here the original sound data is a stream of original sound data input by the user through the sound acquisition device of the mobile terminal 103, and the noise-reduction-processed sound data is likewise a stream.
The noise-reduction-processed sound data is transmitted to the sound agent apparatus 101 over the first wireless network, and the sound agent apparatus 101 forwards the noise-reduction-processed sound data together with a first output instruction, through the wired network, to the sound output units other than the target sound output unit among the plurality of sound output units. At the same time, the mobile terminal 103 transmits the original sound data and a second output instruction to the target sound output unit over the second wireless network. The mobile terminal 103 transmits the noise-reduction-processed sound data stream to the sound agent apparatus 101 in real time over the first wireless network, and transmits the original sound data stream and the second output instruction to the target sound output unit in real time over the second wireless network. The first output instruction comprises a volume value indicating an output volume; the second output instruction likewise comprises a volume value indicating an output volume.
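Conceptually, the two simultaneous paths amount to the following sketch; the callable transports and the stand-in noise-reduction function are assumptions for illustration (a real system would stream continuously over two network connections).

```python
def send_multipath(raw_chunk: bytes, denoise, send_to_agent, send_to_target) -> None:
    """Path 1: noise-reduced data to the sound agent over the first network.
    Path 2: the original data straight to the target unit over the second network."""
    send_to_agent(denoise(raw_chunk))
    send_to_target(raw_chunk)

# Toy stand-ins: "denoising" just truncates, the transports append to lists.
agent_rx, target_rx = [], []
send_multipath(b"\x01\x02", lambda b: b[:1], agent_rx.append, target_rx.append)
print(agent_rx, target_rx)  # agent gets the processed chunk, target the original
```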
Each of the other sound output units performs sound output based on the first output instruction transmitted by the sound agent apparatus 101 and the sound data subjected to the noise reduction processing, and the target sound output unit performs sound output based on the second output instruction transmitted by the mobile terminal 103 and the original sound data.
Preferably, the sound agent apparatus 101 determines an original frequency and an original tone of the noise-reduction-processed sound data, determines a frequency level of the noise-reduction-processed sound data based on the difference between a preset reference frequency value and the original frequency, determines a tone weighting factor based on the frequency level, determines a spectral curve of the original tone, determines an initial tone score of the noise-reduction-processed sound data according to the similarity of the spectral curve to a preset tone standard line, and determines a tone level of the noise-reduction-processed sound data based on the tone weighting factor and the initial tone score. It then determines the original volume of the noise-reduction-processed sound data, determines a volume weighting factor based on the tone level, and determines an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz or 150 Hz.
The sound agent apparatus 101 subtracts the reference frequency value from the original frequency (e.g., 60-300 Hz), takes the integer part of the difference divided by an interval value (e.g., 10) as the frequency level of the noise-reduction-processed sound data, and takes the absolute value of the frequency level as the tone weighting factor. The initial tone score is one of: 25 points, 26 points, 27 points, 28 points, or 29 points. Determining the tone level of the noise-reduction-processed sound data based on the tone weighting factor and the initial tone score comprises: taking the difference obtained by subtracting the tone weighting factor from the initial tone score as the tone level of the noise-reduction-processed sound data. Determining a volume weighting factor based on the tone level comprises: determining the percentage value obtained by dividing the tone level by 100 as the volume weighting factor. Determining an output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor).
The first output instruction includes a volume value indicating the output volume, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent apparatus 101 and the noise-reduction-processed sound data comprises: each of the other sound output units performs sound output according to the volume value in the first output instruction transmitted by the sound agent apparatus 101 and the noise-reduction-processed sound data. The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal 103 comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal 103 and the original sound data.
The sound agent apparatus 101 determines an original frequency and an original tone of the noise reduction-processed sound data, determines a frequency level of the noise reduction-processed sound data based on a difference between a preset reference frequency value and the frequency, determines a tone weighting factor based on the frequency level, determines a spectral curve of the original tone and an initial tone score of the noise reduction-processed sound data according to a similarity of the spectral curve and a preset tone standard line, and determines a tone level of the noise reduction-processed sound data based on the tone weighting factor and the initial tone score; determining an original volume of the noise-reduced sound data, determining a volume weighting factor based on the tone level, and determining an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100Hz, 120Hz or 150 Hz.
The sound agent apparatus 101 determines a difference value of the original frequency (e.g., 60-300Hz) minus a reference frequency value, calculates an integer in a result of dividing the difference value by an interval value (e.g., 10), determines the integer as a frequency level of the noise reduction-processed sound data, and determines an absolute value of the frequency level as a tone weighting factor. The initial timbre scores comprise: 25 min, 26 min, 27 min, 28 min and 29 min. Determining the tone scale of the noise reduction-processed sound data based on the tone weighting factor and the initial tone score comprises: and taking the difference obtained by subtracting the tone weighting factor from the initial tone score as the tone level of the sound data subjected to the noise reduction processing, and setting the tone level. Determining a volume weighting factor based on the timbre levels comprises: and determining a percentage numerical value corresponding to a numerical value obtained by dividing the tone grade by 100 as a tone weighting factor. The determining an output volume based on the volume weighting factor and a raw volume comprises: output volume is original volume x (1-volume weighting factor).
The first output instruction includes a volume value indicating the output volume, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent apparatus 101 and the noise-reduced sound data comprises: each of the other sound output units performing sound output according to the volume value in the first output instruction and the noise-reduced sound data. The target sound output unit performing sound output according to the second output instruction transmitted by the mobile terminal 103 and the original sound data comprises: the target sound output unit performing sound output according to the volume value in the second output instruction and the original sound data.
The sound agent apparatus 101 determines the original frequency and original timbre of the noise-reduced sound data, determines a frequency level of the noise-reduced sound data based on the difference between a preset reference frequency value and the original frequency, and determines a timbre weighting factor based on the frequency level. It then determines a spectral curve of the original timbre, determines an initial timbre score of the noise-reduced sound data according to the similarity between the spectral curve and a preset timbre standard line, and determines a timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score. Finally, it determines the original volume of the noise-reduced sound data, determines a volume weighting factor based on the timbre level, and determines an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent apparatus 101 computes the difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz), divides that difference by an interval value (e.g., 10), takes the integer part of the result as the frequency level of the noise-reduced sound data, and takes the absolute value of the frequency level as the timbre weighting factor. The initial timbre score is one of: 25, 26, 27, 28, or 29 points. Determining the timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduced sound data. Determining a volume weighting factor based on the timbre level comprises: taking the timbre level divided by 100, expressed as a percentage, as the volume weighting factor. Determining an output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor).
The first output instruction includes a volume value indicating the output volume and the current location of the mobile terminal 103, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent apparatus 101 and the noise-reduced sound data comprises: each of the other sound output units determining its straight-line distance to the mobile terminal 103 based on the current location of the mobile terminal 103 in the first output instruction, taking the straight-line distance divided by 1000, expressed as a percentage, as a distance weighting factor, calculating an actual volume value from the distance weighting factor and the volume value in the first output instruction, and performing sound output according to the actual volume value and the noise-reduced sound data, where actual volume value = volume value × (1 + distance weighting factor). The target sound output unit performing sound output according to the second output instruction transmitted by the mobile terminal 103 and the original sound data comprises: the target sound output unit performing sound output according to the volume value in the second output instruction and the original sound data.
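The distance-based adjustment performed by the other sound output units can be sketched as follows. A sketch of the stated formula only, with illustrative names:

```python
def actual_volume(volume_value: float, straight_line_distance_m: float) -> float:
    # Distance weighting factor: straight-line distance / 1000, as a percentage
    # (e.g., 50 units of distance -> 50/1000 = 5% -> factor 0.05).
    distance_weighting_factor = straight_line_distance_m / 1000.0
    # Actual volume value = volume value x (1 + distance weighting factor),
    # so units farther from the mobile terminal play louder.
    return volume_value * (1 + distance_weighting_factor)
```

For example, a unit 50 distance units from the mobile terminal raises a commanded volume of 80 to 84.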
The sound agent apparatus 101 determines the original frequency and original timbre of the noise-reduced sound data, determines a frequency level of the noise-reduced sound data based on the difference between a preset reference frequency value and the original frequency, and determines a timbre weighting factor based on the frequency level. It then determines a spectral curve of the original timbre, determines an initial timbre score of the noise-reduced sound data according to the similarity between the spectral curve and a preset timbre standard line, and determines a timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score. Finally, it determines the original volume of the noise-reduced sound data, determines a volume weighting factor based on the timbre level, and determines an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent apparatus 101 computes the difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz), divides that difference by an interval value (e.g., 10), takes the integer part of the result as the frequency level of the noise-reduced sound data, and takes the absolute value of the frequency level as the timbre weighting factor. The initial timbre score is one of: 25, 26, 27, 28, or 29 points. Determining the timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduced sound data. Determining a volume weighting factor based on the timbre level comprises: taking the timbre level divided by 100, expressed as a percentage, as the volume weighting factor. Determining an output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 - volume weighting factor).
The first output instruction includes a volume value indicating the output volume and the current location of the mobile terminal 103, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent apparatus 101 and the noise-reduced sound data comprises: each of the other sound output units determining its straight-line distance to the mobile terminal 103 based on the current location of the mobile terminal 103 in the first output instruction, taking the straight-line distance divided by 1000, expressed as a percentage, as a distance weighting factor, calculating an actual volume value from the distance weighting factor and the volume value in the first output instruction, and performing sound output according to the actual volume value and the noise-reduced sound data, where actual volume value = volume value × (1 + distance weighting factor). The target sound output unit performing sound output according to the second output instruction transmitted by the mobile terminal 103 and the original sound data comprises: the target sound output unit performing sound output according to the volume value in the second output instruction and the original sound data.
Preferably, the mobile terminal 103 includes a volume value indicating the output volume and the network transmission delay in the second output instruction. The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal 103 comprises: the target sound output unit outputting the original sound data at the volume value in the second output instruction, delayed by the network transmission delay, so that the target sound output unit and each of the other sound output units remain synchronized in time when performing sound output.
Preferably, the aesthetic quality of a timbre is evaluated mainly from its spectral curve, judged against the "Italian best line": a straight line drawn on the spectral curve from the intensity point of the fundamental to the point of the 16th overtone. The closer a timbre's spectral curve lies to this line, the more beautiful, pleasant, and vivid the sound. Overtones also shape the character of a timbre: some overtones are in a completely harmonic relation with the fundamental, some in an incompletely harmonic relation, and some in a dissonant relation. When the harmonic overtones are full, the timbre is stable; when incompletely harmonic overtones are full, the timbre is richly expressive; when there are too many dissonant overtones, the sound becomes strange and unpleasant (a "wolf tone"). On some stringed instruments, a wolf tone is easily produced by an unskilled player.
Fig. 2 is a flow chart of a method 200 for sound data transmission based on multiple transmission paths according to an embodiment of the present invention. The method 200 begins at step 201. In step 201, a user uses a mobile terminal to send a sound data output request to a sound agent device over a first wireless network (e.g., a wide-area wireless communication network such as 3G, 4G, or 5G), the sound data output request including a sound sample, the current location of the mobile terminal, and an initial matching code. The user may enter the sound sample through a sound acquisition device (e.g., a microphone) of the mobile terminal. The sound sample may be a piece of speech input by the user in the live environment where the sound output is to be made, a piece of speech summarizing the user's opinion, a piece of speech introducing the user's identity, etc. The sound sample is an important criterion for deciding whether to permit the user to output sound data.
The current location of the mobile terminal is obtained using its positioning device. Specifically, the positioning device calibrates satellite positioning data with indoor assisted positioning, outdoor assisted positioning, and/or access-point assisted positioning to obtain the current location of the mobile terminal. In general, the user's location can be acquired through a GPS chip or a BeiDou chip of the mobile terminal, and that location can then be calibrated using outdoor assisted positioning over the communication network (e.g., when the venue is outdoors), indoor assisted positioning (e.g., when the venue is indoors), and/or access-point assisted positioning (e.g., using the access point devices of a wireless network within the venue).
The initial matching code is generated based on the media access control (MAC) address of the mobile terminal, or based on a hardware address of the mobile terminal. For example, all or part of the content (character string) of the MAC address, or all or part of the content (character string) of the hardware address, may be taken as the initial matching code.
In step 202, the sound agent device receives the sound data output request from the mobile terminal and determines, from the sound sample in the request, whether to allow the mobile terminal to output sound data. The sound agent device performs speech recognition on the sound sample to generate text information, and when the text information conforms to the expression habits of the corresponding language (i.e., the expressed meaning meets basic language requirements), the sound agent device determines that the mobile terminal is allowed to output sound data. The sound agent device determines whether the text information conforms to the expression habits of the corresponding language based on semantic recognition.
The sound agent device divides the text information into at least one sentence unit according to its punctuation, performs independent semantic analysis on each sentence unit to determine a semantic score, and determines that the text information conforms to the expression habits of the corresponding language when the weighted sum of the sentence units' semantic scores is greater than an expression threshold. The number of characters in a sentence unit is used as that unit's weight. For example, suppose the text information comprises sentence units A and B, where A includes 5 Chinese characters and B includes 10. The semantic score of A is 9 points (out of 10, minimum 0) and that of B is 8 points. The weight of A is 5/(5+10) = 1/3 and the weight of B is 10/(5+10) = 2/3, so the semantic score of the text information is the weighted sum 9 × (1/3) + 8 × (2/3) ≈ 8.33. The expression threshold (greater than 0 and at most 10) may be any reasonable value such as 7, 7.5, 8, or 8.5.
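The weighted scoring in this example can be sketched as follows. Function names are hypothetical, and the semantic analysis that produces per-unit scores is out of scope here:

```python
def text_semantic_score(sentence_units):
    """sentence_units: list of (character_count, semantic_score) pairs.
    A unit's weight is its character count over the total character count;
    the text's score is the weighted sum of the unit scores."""
    total_chars = sum(count for count, _ in sentence_units)
    return sum(score * (count / total_chars) for count, score in sentence_units)

def conforms_to_expression_habits(sentence_units, expression_threshold):
    # The text conforms when its weighted score exceeds the threshold.
    return text_semantic_score(sentence_units) > expression_threshold
```

With units A (5 characters, 9 points) and B (10 characters, 8 points), `text_semantic_score([(5, 9), (10, 8)])` reproduces the 8.33 of the worked example, which passes a threshold of 8 but not 8.5.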
The sound sample is used to indicate at least one of: the clarity of the user's speech, the language the user speaks, and the background noise intensity. When the clarity of the user's speech exceeds a minimum required clarity threshold, the language the user speaks is one the speech recognition server can automatically translate, and the background noise intensity is below a maximum allowable noise intensity, the sound agent device determines that the mobile terminal is allowed to output sound data. If the mobile terminal is allowed to output sound data, an output permission message is sent to the mobile terminal through the first wireless network.
In step 203, the sound agent device detects, among a plurality of sound output units and based on the current location of the mobile terminal in the sound data output request, the target sound output unit closest to the current location of the mobile terminal (e.g., output unit 102-6 in Fig. 1b, or one of output units 102-2, 102-3, and 102-6 in Fig. 1c). The sound agent device acquires and stores the location of each of the plurality of sound output units in advance. The sound agent device determines the target sound output unit either from the straight-line distance between the current location of the mobile terminal and the location of each sound output unit, or from the sound-wave transmission distance between them. In the latter case, when there is an obstacle (e.g., a pillar) between a particular sound output unit and the mobile terminal, the shortest sound-wave transmission path is taken as the distance between that unit and the mobile terminal.
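Selection by straight-line distance might look like the sketch below. This assumes 2-D coordinates for illustration; the sound-wave-transmission variant would substitute an obstacle-aware path-length function for `straight_line`:

```python
import math

def nearest_output_unit(terminal_position, unit_positions):
    """terminal_position: (x, y); unit_positions: {unit_id: (x, y)}.
    Returns the id of the unit with the smallest straight-line distance."""
    def straight_line(p, q):
        # Euclidean distance between two points.
        return math.hypot(p[0] - q[0], p[1] - q[1])
    return min(unit_positions,
               key=lambda uid: straight_line(terminal_position, unit_positions[uid]))
```

When two or more units tie for the minimum, a tie-breaking rule from the next paragraph (random choice, user selection, or distance to the agent) would apply.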
When the sound agent apparatus 101 detects, based on the current location of the mobile terminal 103 in the sound data output request, at least two sound output units equally closest to the current location of the mobile terminal 103, it selects the target sound output unit in one of the following ways: it randomly selects one of the at least two sound output units as the target sound output unit; or it transmits description information of the at least two sound output units to the mobile terminal 103 and determines the target sound output unit from among them in response to a selection message from the user; or it determines the sound output unit among them farthest from the sound agent apparatus 101 as the target sound output unit; or it determines the sound output unit among them closest to the sound agent apparatus 101 as the target sound output unit.
The sound agent device determines a first acoustic transmission matching code based on the initial matching code in the sound data output request and a randomly generated matching random number, and sends the first acoustic transmission matching code to the target sound output unit, which stores it. Since there are multiple mobile terminals in the venue, there may be multiple first acoustic transmission matching codes, and each of the plurality of sound output units may receive at least one of them. Accordingly, each of the plurality of sound output units stores the at least one first acoustic transmission matching code it receives.
The sound agent device concatenates the initial matching code and the randomly generated matching random number as character strings to generate the first acoustic transmission matching code. For example, if the initial matching code is 406188963D56 and the matching random number is 25, the first acoustic transmission matching code is 406188963D5625. Alternatively, the sound agent device sums the initial matching code and the matching random number: if the initial matching code is 406188963D56 and the matching random number is 25, the first acoustic transmission matching code is 406188963D81. Alternatively, the sound agent device circularly shifts, bitwise-operates on, or bitwise-interleaves the initial matching code based on the matching random number. For example, if the initial matching code is 100110101101 and the matching random number is 2, the first acoustic transmission matching code may be the initial matching code circularly right-shifted by 2 bits, i.e., 011001101011. If the initial matching code is 100110101101 and the matching random number is 100110011001, the first acoustic transmission matching code may be their bitwise OR, i.e., 100110111101. If the initial matching code is 1101 and the matching random number is 0010, the first acoustic transmission matching code may be their bitwise interleaving, i.e., 10100110, where bits 1, 3, 5, and 7 of the first acoustic transmission matching code come from the initial matching code and bits 2, 4, 6, and 8 come from the matching random number.
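The matching-code constructions in these examples can be sketched as follows, treating binary codes as bit strings. These are illustrative helpers reproducing the worked examples, not the patented implementation (the summation variant is omitted, as its arithmetic base is not specified):

```python
def concat_code(initial_code: str, random_number: int) -> str:
    # String concatenation of the initial matching code and the random number.
    return initial_code + str(random_number)

def circular_right_shift(bits: str, n: int) -> str:
    # Rotate the bit string right by n positions.
    n %= len(bits)
    return bits[-n:] + bits[:-n] if n else bits

def bitwise_or(a: str, b: str) -> str:
    # Bitwise OR of two equal-length bit strings.
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))

def interleave(a: str, b: str) -> str:
    # Alternate bits: odd positions from a, even positions from b.
    return "".join(x + y for x, y in zip(a, b))
```

Each helper reproduces the corresponding example above: concatenation yields 406188963D5625, the 2-bit rotation yields 011001101011, the OR yields 100110111101, and the interleaving yields 10100110.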
In step 204, the mobile terminal receives the output permission message, determines a network transmission delay based on a timestamp in the message (i.e., the difference between the mobile terminal's current time and the transmission time indicated by the timestamp), and sends the network transmission delay to the sound agent device.
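The delay measurement, together with the threshold check the agent performs in step 205, can be sketched as follows. Names are illustrative, and the sketch assumes the terminal's clock is synchronized with the agent's:

```python
import time

def network_transmission_delay_ms(sent_timestamp_ms, now_ms=None):
    # Delay = terminal's current time minus the timestamp the sound agent
    # stamped into the output permission message.
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - sent_timestamp_ms

def multipath_allowed(delay_ms, delay_threshold_ms):
    # Multi-path output is permitted only when the delay is strictly below
    # the threshold; at or above it, single-path transmission is used.
    return delay_ms < delay_threshold_ms
```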
In step 205, the sound agent device evaluates the network transmission delay received from the mobile terminal. If the delay is less than a delay time threshold, the sound agent device sends the mobile terminal an indication message instructing it to output sound data through multi-path data transmission. If the delay is greater than or equal to the threshold, the sound agent device sends an indication message that sound data cannot be output through multi-path data transmission; in response, the mobile terminal outputs sound data through single-path data transmission, i.e., it communicates with the sound agent device only through the first wireless network, for example by transmitting sound data or a sound data stream to the sound agent device.
In step 206, in response to receiving the indication message, the mobile terminal determines a second acoustic transmission matching code from the initial matching code and the matching random number in the indication message, and broadcasts the second acoustic transmission matching code to the plurality of sound output units via sound-wave communication. The mobile terminal concatenates the initial matching code and the matching random number as character strings, or sums them, or circularly shifts, bitwise-operates on, or bitwise-interleaves the initial matching code based on the matching random number, to generate the second acoustic transmission matching code. The second acoustic transmission matching code is generated in the same manner as the first acoustic transmission matching code described above, so the details are omitted.
In step 207, when the target sound output unit among the plurality of sound output units determines that the received second acoustic transmission matching code is identical to its stored first acoustic transmission matching code, it establishes a wireless communication connection with the mobile terminal through a second wireless network. Specifically, each of the plurality of sound output units, upon receiving a second acoustic transmission matching code, compares it with the at least one first acoustic transmission matching code it has stored.
In step 208, the mobile terminal enters a multi-path data transmission mode in response to the establishment of the wireless communication connection. The multi-path data transmission mode is a mode in which the mobile terminal transmits sound data (or a sound data stream) for output over at least two different transmission paths. Original sound data input by the user is acquired through the sound acquisition device of the mobile terminal, and noise reduction processing is applied to it to generate noise-reduced sound data. The original sound data is the original sound data stream input by the user through the sound acquisition device, and the noise-reduced sound data is the corresponding noise-reduced sound data stream.
The mobile terminal transmits the noise-reduced sound data to the sound agent device as a real-time stream over the first wireless network; the sound agent device transmits the noise-reduced sound data and a first output instruction over a wired network to the sound output units other than the target sound output unit; and at the same time the mobile terminal transmits the original sound data, as a real-time stream, and a second output instruction to the target sound output unit over the second wireless network. The first output instruction includes a volume value indicating an output volume, and the second output instruction likewise includes a volume value indicating an output volume.
In step 209, each of the other sound output units performs sound output based on the first output instruction transmitted by the sound agent device and the sound data subjected to the noise reduction processing, and the target sound output unit performs sound output based on the second output instruction transmitted by the mobile terminal and the original sound data.
Fig. 3 is a flow chart of a method 300 of determining output volume according to an embodiment of the present invention. The method 300 begins at step 301.
In step 301, the original frequency and original timbre of the noise-reduced sound data are determined. In step 302, a frequency level of the noise-reduced sound data is determined based on the difference between a preset reference frequency value and the original frequency, and a timbre weighting factor is determined based on the frequency level. In step 303, a spectral curve of the original timbre is determined, an initial timbre score of the noise-reduced sound data is determined according to the similarity between the spectral curve and a preset timbre standard line, and a timbre level of the noise-reduced sound data is determined based on the timbre weighting factor and the initial timbre score. In step 304, the original volume of the noise-reduced sound data is determined, a volume weighting factor is determined based on the timbre level, and an output volume is determined based on the volume weighting factor and the original volume.
Fig. 4 is a flow chart of a method 400 of determining an output volume according to another embodiment of the present invention. The method 400 begins at step 401. In step 401, each of the other sound output units determines a straight-line distance to the mobile terminal based on the current position of the mobile terminal in the first output instruction transmitted by the sound agent apparatus. In step 402, a percentage value corresponding to a value obtained by dividing the straight-line distance by 1000 is determined as a distance weighting factor. In step 403, an actual volume value is calculated based on the distance weighting factor and the volume value of the first output instruction. In step 404, sound output is performed based on the actual volume value and the sound data subjected to the noise reduction processing.
Fig. 5 is a flowchart of a method 500 of delaying the output of sound data according to an embodiment of the present invention. The method 500 begins at step 501. In step 501, a volume value indicating an output volume and the network transmission delay are included in the second output instruction. In step 502, the mobile terminal transmits the original sound data and the second output instruction to the target sound output unit using the second wireless network. In step 503, the target sound output unit outputs the original sound data at the volume value in the second output instruction, delayed by the network transmission delay, so that the target sound output unit and each of the other sound output units remain synchronized in time when performing sound output.
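The delayed playback of step 503 can be sketched with a timer. This is illustrative only: `play_fn` stands in for the sound output unit's actual audio path, which is not specified here:

```python
import threading

def delayed_sound_output(play_fn, sound_data, delay_seconds):
    """Start playback of sound_data after delay_seconds, so the target unit
    (fed directly over the second wireless network) lines up in time with
    the units fed the noise-reduced stream through the sound agent device."""
    timer = threading.Timer(delay_seconds, play_fn, args=(sound_data,))
    timer.start()
    return timer  # caller may join() or cancel() the pending playback
```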
Fig. 6 is a flow chart of a method 600 of determining a target sound output unit according to an embodiment of the present invention. The method 600 begins at step 601. In step 601, when the sound agent apparatus detects at least two sound output units closest to the current position of the mobile terminal among a plurality of sound output units based on the current position of the mobile terminal in the sound data output request, the sound agent apparatus transmits description information of the at least two sound output units to the mobile terminal. In step 602, a target sound output unit is determined from at least two sound output units in response to a selection message of a user.
In summary, with respect to the methods 300-600: the sound agent device determines the original frequency and original timbre of the noise-reduced sound data, determines a frequency level of the noise-reduced sound data based on the difference between a preset reference frequency value and the original frequency, and determines a timbre weighting factor based on the frequency level. It then determines a spectral curve of the original timbre, determines an initial timbre score of the noise-reduced sound data according to the similarity between the spectral curve and a preset timbre standard line, and determines a timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score. Finally, it determines the original volume of the noise-reduced sound data, determines a volume weighting factor based on the timbre level, and determines an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent device computes the difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz), divides that difference by an interval value (e.g., 10), takes the integer part of the result as the frequency level of the noise-reduced sound data, and takes the absolute value of the frequency level as the timbre weighting factor. The initial timbre score is one of: 25, 26, 27, 28, or 29 points. Determining the timbre level of the noise-reduced sound data based on the timbre weighting factor and the initial timbre score comprises: taking the difference obtained by subtracting the timbre weighting factor from the initial timbre score as the timbre level of the noise-reduced sound data. Determining a volume weighting factor based on the timbre level comprises: taking the timbre level divided by 100, expressed as a percentage, as the volume weighting factor. Determining an output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor).
The first output instruction includes a volume value indicating the output volume, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data comprises: each of the other sound output units performs sound output according to the volume value in the first output instruction transmitted by the sound agent device and the noise-reduced sound data. The target sound output unit performing sound output according to the second output instruction transmitted by the mobile terminal and the original sound data comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
The sound agent device determines an original frequency and an original tone of the noise-reduced sound data, determines a frequency grade of the noise-reduced sound data based on the difference between the original frequency and a preset reference frequency value, determines a tone weighting factor based on the frequency grade, determines a spectrum curve of the original tone, determines an initial tone score of the noise-reduced sound data according to the similarity between the spectrum curve and a preset tone standard line, and determines a tone grade of the noise-reduced sound data based on the tone weighting factor and the initial tone score; it then determines an original volume of the noise-reduced sound data, determines a volume weighting factor based on the tone grade, and determines an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent device computes the difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz), takes the integer part of the result of dividing this difference by an interval value (e.g., 10), determines that integer as the frequency grade of the noise-reduced sound data, and determines the absolute value of the frequency grade as the tone weighting factor. The initial tone score is one of: 25 points, 26 points, 27 points, 28 points, or 29 points. Determining the tone grade of the noise-reduced sound data based on the tone weighting factor and the initial tone score comprises: taking the difference obtained by subtracting the tone weighting factor from the initial tone score as the tone grade of the noise-reduced sound data. Determining a volume weighting factor based on the tone grade comprises: determining the percentage value obtained by dividing the tone grade by 100 as the volume weighting factor. Determining an output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 - volume weighting factor).
The first output instruction includes a volume value indicating the output volume, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data comprises: each of the other sound output units performs sound output according to the volume value in the first output instruction transmitted by the sound agent device and the noise-reduced sound data. The target sound output unit performing sound output according to the second output instruction transmitted by the mobile terminal and the original sound data comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
The sound agent device determines an original frequency and an original tone of the noise-reduced sound data, determines a frequency grade of the noise-reduced sound data based on the difference between the original frequency and a preset reference frequency value, determines a tone weighting factor based on the frequency grade, determines a spectrum curve of the original tone, determines an initial tone score of the noise-reduced sound data according to the similarity between the spectrum curve and a preset tone standard line, and determines a tone grade of the noise-reduced sound data based on the tone weighting factor and the initial tone score; it then determines an original volume of the noise-reduced sound data, determines a volume weighting factor based on the tone grade, and determines an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent device computes the difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz), takes the integer part of the result of dividing this difference by an interval value (e.g., 10), determines that integer as the frequency grade of the noise-reduced sound data, and determines the absolute value of the frequency grade as the tone weighting factor. The initial tone score is one of: 25 points, 26 points, 27 points, 28 points, or 29 points. Determining the tone grade of the noise-reduced sound data based on the tone weighting factor and the initial tone score comprises: taking the difference obtained by subtracting the tone weighting factor from the initial tone score as the tone grade of the noise-reduced sound data. Determining a volume weighting factor based on the tone grade comprises: determining the percentage value obtained by dividing the tone grade by 100 as the volume weighting factor. Determining an output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 + volume weighting factor).
The first output instruction includes a volume value indicating the output volume and the current position of the mobile terminal, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data comprises: each of the other sound output units determines its straight-line distance to the mobile terminal based on the current position of the mobile terminal in the first output instruction transmitted by the sound agent device, determines the percentage value obtained by dividing that distance by 1000 as a distance weighting factor, calculates an actual volume value based on the distance weighting factor and the volume value in the first output instruction, and performs sound output according to the actual volume value and the noise-reduced sound data, where actual volume value = volume value × (1 + distance weighting factor). The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
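The distance-weighted playback for the other sound output units can be sketched like this (an illustrative Python sketch; the coordinate representation and helper name are assumptions):

```python
import math

def actual_volume(volume_value, unit_pos, terminal_pos):
    """Sketch of the distance weighting in this embodiment: the straight-line
    distance to the mobile terminal, divided by 1000 and read as a
    percentage, raises the volume of the farther units."""
    # Straight-line (Euclidean) distance between the output unit and the terminal.
    distance = math.dist(unit_pos, terminal_pos)
    # Distance weighting factor: distance / 1000, as a fraction.
    distance_weighting_factor = distance / 1000
    # Actual volume value = volume value x (1 + distance weighting factor).
    return volume_value * (1 + distance_weighting_factor)
```

For example, a unit 500 units of distance away plays at 1.5 times the instructed volume value.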
The sound agent device determines an original frequency and an original tone of the noise-reduced sound data, determines a frequency grade of the noise-reduced sound data based on the difference between the original frequency and a preset reference frequency value, determines a tone weighting factor based on the frequency grade, determines a spectrum curve of the original tone, determines an initial tone score of the noise-reduced sound data according to the similarity between the spectrum curve and a preset tone standard line, and determines a tone grade of the noise-reduced sound data based on the tone weighting factor and the initial tone score; it then determines an original volume of the noise-reduced sound data, determines a volume weighting factor based on the tone grade, and determines an output volume based on the volume weighting factor and the original volume. The preset reference frequency value is 100 Hz, 120 Hz, or 150 Hz.
The sound agent device computes the difference obtained by subtracting the reference frequency value from the original frequency (e.g., 60-300 Hz), takes the integer part of the result of dividing this difference by an interval value (e.g., 10), determines that integer as the frequency grade of the noise-reduced sound data, and determines the absolute value of the frequency grade as the tone weighting factor. The initial tone score is one of: 25 points, 26 points, 27 points, 28 points, or 29 points. Determining the tone grade of the noise-reduced sound data based on the tone weighting factor and the initial tone score comprises: taking the difference obtained by subtracting the tone weighting factor from the initial tone score as the tone grade of the noise-reduced sound data. Determining a volume weighting factor based on the tone grade comprises: determining the percentage value obtained by dividing the tone grade by 100 as the volume weighting factor. Determining an output volume based on the volume weighting factor and the original volume comprises: output volume = original volume × (1 - volume weighting factor).
The first output instruction includes a volume value indicating the output volume and the current position of the mobile terminal, and the second output instruction includes a volume value indicating the original volume. Each of the other sound output units performing sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data comprises: each of the other sound output units determines its straight-line distance to the mobile terminal based on the current position of the mobile terminal in the first output instruction transmitted by the sound agent device, determines the percentage value obtained by dividing that distance by 1000 as a distance weighting factor, calculates an actual volume value based on the distance weighting factor and the volume value in the first output instruction, and performs sound output according to the actual volume value and the noise-reduced sound data, where actual volume value = volume value × (1 + distance weighting factor). The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit performs sound output according to the volume value in the second output instruction transmitted by the mobile terminal and the original sound data.
Preferably, the mobile terminal includes in the second output instruction a volume value indicating the output volume and the network transmission delay. The target sound output unit performing sound output according to the second output instruction and the original sound data transmitted by the mobile terminal comprises: the target sound output unit outputs the original sound data according to the volume value in the second output instruction transmitted by the mobile terminal, delayed by the network transmission delay time, so that the target sound output unit and each of the other sound output units remain time-consistent when performing sound output.
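A minimal sketch of that delayed output (Python; the `play` callback and the use of seconds are assumptions — the point is only that the direct path waits out the measured network delay of the agent path):

```python
import time

def play_with_delay(sound_data, volume_value, network_delay_s, play):
    """The target unit received its data directly from the mobile terminal,
    so it waits for the network transmission delay before playing; the
    units fed through the sound agent then start at the same moment."""
    time.sleep(network_delay_s)              # wait out the slower agent path
    play(sound_data, volume_value)           # then start playback in sync
```

In a real unit, `play` would hand the buffer to the audio hardware at the given volume.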
In addition, the present application further provides a method for transmitting sound data based on multiple transmission paths, the method comprising:
a user initiates a sound output request to a sound agent device via a mobile terminal; after receiving a sound wave transmission matching code from a sound output unit through sound wave transmission and receiving a response message from the sound agent device permitting sound output, the mobile terminal calculates a network delay and transmits the network delay to the sound agent device;
according to an instruction of the sound agent device to enter a feedback suppression mode, the mobile terminal enters a multi-path transmission mode based on a multi-path transmission protocol, transmits sound data input by the user to the sound agent device using a first network connection, and simultaneously transmits the sound data input by the user, using a second network connection, to the sound output unit closest to the position of the mobile terminal among the plurality of sound output units;
the sound agent device extracts a sound sample, the mobile terminal position, and a sound wave transmission matching code from the sound output request received from the mobile terminal, determines according to the sound sample whether the user is allowed to output sound, and, without waiting for the result of this determination, instructs the sound output unit closest to the mobile terminal position among the plurality of sound output units to send the sound wave transmission matching code to the mobile terminal through sound wave transmission;
if the user is allowed to output sound according to the sound sample, the sound agent device sends a response message to the mobile terminal and receives the network delay from the mobile terminal; if the network delay is larger than a feedback threshold value, sound output proceeds in a feedback suppression mode. In the feedback suppression mode, the sound agent device instructs the mobile terminal to enter the multi-path transmission mode based on the multi-path transmission protocol; each sound output unit other than the one closest to the position of the mobile terminal performs sound output based on the output instruction and sound data transmitted from the sound agent device, and the sound output unit closest to the position of the mobile terminal performs sound output based on the output instruction and sound data transmitted from the mobile terminal.
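The delay measurement and mode decision in this method can be sketched as follows (Python; assumes the message timestamp and the mobile terminal clock share a common time base):

```python
import time

def network_delay_s(sent_timestamp_s):
    """Per the timestamp method: the mobile terminal subtracts the
    timestamp carried in the response message from its current time."""
    return time.time() - sent_timestamp_s

def enters_feedback_suppression(delay_s, feedback_threshold_s):
    # Feedback suppression (and with it multi-path transmission) is
    # engaged only when the measured delay exceeds the threshold.
    return delay_s > feedback_threshold_s
```

A delay at or below the threshold leaves the system in its normal, single-path output mode.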

Claims (10)

1. A method of sound data output based on multi-path data transmission, the method comprising:
a user uses a mobile terminal to send a sound data output request to sound agent equipment through a first wireless network, wherein the sound data output request comprises a sound sample, the current position of the mobile terminal and an initial matching code;
the sound agent device receives the sound data output request from the mobile terminal, determines whether the mobile terminal is allowed to output sound data according to a sound sample in the sound data output request, and sends an output permission message to the mobile terminal if the mobile terminal is determined to be allowed to output sound data;
the sound agent device detects a target sound output unit closest to the current position of the mobile terminal from among a plurality of sound output units based on the current position of the mobile terminal in the sound data output request, determines a first sound wave transmission matching code based on an initial matching code in the sound data output request and a matching random number generated randomly, and transmits the first sound wave transmission matching code to the target sound output unit;
the mobile terminal receives the output permission message, determines a network transmission delay based on a timestamp in the output permission message, and sends the network transmission delay to the sound agent device;
the sound agent device evaluates the network transmission delay received from the mobile terminal, and if the network transmission delay is smaller than a delay time threshold value, sends the mobile terminal an indication message instructing the mobile terminal to output sound data through multi-path data transmission;
in response to receiving the indication message, the mobile terminal determines a second sound wave transmission matching code from the initial matching code and a matching random number in the indication message and broadcasts the second sound wave transmission matching code to the plurality of sound output units through sound wave communication;
when the target sound output unit among the plurality of sound output units determines that the received second sound wave transmission matching code is the same as the first sound wave transmission matching code, the target sound output unit establishes a wireless communication connection with the mobile terminal through a second wireless network;
in response to the establishment of the wireless communication connection, the mobile terminal enters a multi-path data transmission mode: it acquires original sound data input by the user through a sound acquisition device of the mobile terminal, performs noise reduction processing on the original sound data to generate noise-reduced sound data, and transmits the noise-reduced sound data to the sound agent device through the first wireless network; the sound agent device transmits the noise-reduced sound data and a first output instruction through a wired network to the sound output units among the plurality of sound output units other than the target sound output unit, while the mobile terminal simultaneously transmits the original sound data and a second output instruction to the target sound output unit through the second wireless network; and
each of the other sound output units performs sound output based on the first output instruction transmitted by the sound agent device and the noise-reduced sound data, and the target sound output unit performs sound output based on the second output instruction transmitted by the mobile terminal and the original sound data.
2. The method of claim 1, further comprising, before the user sends the sound data output request to the sound agent device through the first wireless network using the mobile terminal: acquiring the sound sample input by the user through a sound acquisition device of the mobile terminal.
3. The method according to claim 1 or 2, further comprising, before the user sends the sound data output request to the sound agent device through the first wireless network using the mobile terminal: acquiring the current position of the mobile terminal using a positioning device of the mobile terminal.
4. The method of any one of claims 1-2, further comprising, before the user uses the mobile terminal to send the sound data output request to the sound agent device over the first wireless network: generating the initial matching code based on a Media Access Control (MAC) address of the mobile terminal or based on a hardware address of the mobile terminal.
5. The method according to any one of claims 1-2, wherein determining whether to allow the mobile terminal to output sound data according to the sound sample in the sound data output request comprises: performing voice recognition on the sound sample to generate text information, and determining to allow the mobile terminal to output sound data when the text information conforms to the expression conventions of the corresponding language.
6. The method of any one of claims 1-2, wherein the sound agent device acquires and stores in advance the location of each of the plurality of sound output units.
7. The method of claim 6, wherein the target sound output unit closest to the current location of the mobile terminal is determined based on the straight-line distance between the current location of the mobile terminal and the location of each of the plurality of sound output units.
8. The method of claim 1, wherein determining the first sound wave transmission matching code based on the initial matching code in the sound data output request and the randomly generated matching random number comprises: concatenating the initial matching code and the randomly generated matching random number as character strings to generate the first sound wave transmission matching code.
9. The method of claim 1, wherein determining the network transmission delay based on the timestamp in the output permission message comprises: the mobile terminal determines the network transmission delay based on the timestamp indicating the transmission time and the current time of the mobile terminal.
10. The method of claim 1, wherein in the multi-path data transmission mode the mobile terminal transmits the sound data via at least two different transmission paths.
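Claim 10's multi-path mode — one copy of the stream to the sound agent over the first network, another directly to the target sound output unit — can be sketched as follows (illustrative Python; the addresses and the plain-UDP framing are assumptions, not part of the claims):

```python
import socket

def send_multipath(sound_chunks, agent_addr, target_unit_addr):
    """Sketch: for each chunk, the noise-reduced data goes to the sound
    agent (first path) while the original data goes straight to the
    target sound output unit (second path)."""
    path1 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # to sound agent
    path2 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # to target unit
    try:
        for original, denoised in sound_chunks:
            path1.sendto(denoised, agent_addr)        # first network path
            path2.sendto(original, target_unit_addr)  # second network path
    finally:
        path1.close()
        path2.close()
```

A real implementation would add sequencing and timing so the two paths can be aligned on playback, as the delayed-output embodiment above describes.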
CN201711282364.3A 2017-12-07 2017-12-07 Method for outputting sound data based on multi-path data transmission Active CN107995624B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110194313.5A CN112804683A (en) 2017-12-07 2017-12-07 System and method for transmitting sound data based on multiple transmission paths
CN201711282364.3A CN107995624B (en) 2017-12-07 2017-12-07 Method for outputting sound data based on multi-path data transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711282364.3A CN107995624B (en) 2017-12-07 2017-12-07 Method for outputting sound data based on multi-path data transmission

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110194313.5A Division CN112804683A (en) 2017-12-07 2017-12-07 System and method for transmitting sound data based on multiple transmission paths

Publications (2)

Publication Number Publication Date
CN107995624A CN107995624A (en) 2018-05-04
CN107995624B true CN107995624B (en) 2021-03-19

Family

ID=62036508

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110194313.5A Pending CN112804683A (en) 2017-12-07 2017-12-07 System and method for transmitting sound data based on multiple transmission paths
CN201711282364.3A Active CN107995624B (en) 2017-12-07 2017-12-07 Method for outputting sound data based on multi-path data transmission

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110194313.5A Pending CN112804683A (en) 2017-12-07 2017-12-07 System and method for transmitting sound data based on multiple transmission paths

Country Status (1)

Country Link
CN (2) CN112804683A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113014483B * 2019-12-19 2023-04-18 Huawei Technologies Co Ltd Multi-path transmission method and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1418340A * 2000-03-16 2003-05-14 Siemens AG Computer comprising multiway acoustic output and input system
CN1452850A * 2000-07-11 2003-10-29 American Technology Corp Dynamic power sharing in multi-channel sound system
WO2009002292A1 * 2005-01-25 2008-12-31 Lau Ronnie C Multiple channel system
CN103002367A * 2011-09-14 2013-03-27 Samsung Electronics Co Ltd Mobile device for multi-channel sound collection and output using common connector, and driving method thereof
KR20140025956A * 2012-08-24 2014-03-05 Hanbat National University Industry-Academic Cooperation Foundation Digital music player with multi channel


Also Published As

Publication number Publication date
CN107995624A (en) 2018-05-04
CN112804683A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
US6173250B1 (en) Apparatus and method for speech-text-transmit communication over data networks
KR100706967B1 (en) Method and System for Providing News Information by Using Three Dimensional Character for Use in Wireless Communication Network
JP2023022150A (en) Bidirectional speech translation system, bidirectional speech translation method and program
KR101703214B1 (en) Method for changing contents of character data into transmitter's voice and outputting the transmiter's voice
US20070050188A1 (en) Tone contour transformation of speech
JPWO2018173293A1 (en) Voice terminal, voice command generation system, and control method of voice command generation system
US20080300852A1 (en) Multi-Lingual Conference Call
KR20050100608A (en) Voice browser dialog enabler for a communication system
JP2009112000A (en) Method and apparatus for creating and distributing real-time interactive media content through wireless communication networks and the internet
TW200813980A (en) Voice recognition system and method thereof
US9728202B2 (en) Method and apparatus for voice modification during a call
TWI638352B (en) Electronic device capable of adjusting output sound and method of adjusting output sound
US9299358B2 (en) Method and apparatus for voice modification during a call
CN110383236A (en) Master device is selected to realize isochronous audio
WO2018214343A1 (en) Method, system and device for implementing simultaneous interpretation
CN107995624B (en) Method for outputting sound data based on multi-path data transmission
KR101376292B1 (en) Method and apparatus for providing emotion analysis service during telephone conversation
KR100450319B1 (en) Apparatus and Method for Communication with Reality in Virtual Environments
WO2019169686A1 (en) Voice translation method and apparatus, and computer device
KR20070030061A (en) Mobile telecommunication device and base station server having function for managing data by feeling recognition and method thereby
JP2005283972A (en) Speech recognition method, and information presentation method and information presentation device using the speech recognition method
JP2009122989A (en) Translation apparatus
JP2005151553A (en) Voice portal
Gray The 1974 origins of VoIP
CN111194545A (en) Method and system for changing original sound during mobile communication equipment call

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210304

Address after: 100025 232005, 20th floor, building 6, yard 1, Futong East Street, Chaoyang District, Beijing

Applicant after: BEIJING MOMO INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 110034 gate 2, 14th floor, unit 1, building 6, No.10 Xianglushan Road, Huanggu District, Shenyang City, Liaoning Province

Applicant before: Wang Mei

GR01 Patent grant