CN104811318A

CN104811318A - Method for controlling voice communication through voice

Info

Publication number: CN104811318A
Application number: CN201510184232.1A
Authority: CN
Inventors: 姚昊萍; 丁兰英; 张静; 史丽萍
Original assignee: Nanjing Agricultural University
Current assignee: Nanjing Agricultural University
Priority date: 2015-04-15
Filing date: 2015-04-15
Publication date: 2015-07-29

Abstract

The invention relates to the field of communications, and in particular to a method by which participants in multi-person voice communication control and coordinate their exchanges through the characteristics of their own voices. The method deliberately ignores the content of the conversation and does not identify or judge the communication partners. The speaker controls how many members are reached through his or her volume, and listeners who answer promptly are kept in an active state. A circuit collects the voice characteristics of the speaker and the listeners. These are mainly the speaker's volume, the response time of replies, and the response frequency. The characteristics are used to control and coordinate smooth remote voice communication and to improve the user experience.

Description

A method for controlling voice communication by sound

Technical field

The present invention relates to the field of communications. More specifically, it relates to a method by which participants in multi-person voice communication use the characteristics of their own voices to control and coordinate their exchanges.

Background art

Remote voice communication between two users is currently provided by ordinary telephone service. Multi-person voice communication relies mainly on two platforms: the conference call service of the telephone system and the intercom (push-to-talk) mode.

With telephone service, the initiator dials a connection and the other party picks up to start the call. For a conference call, the organizer first applies to the telephone operator for the function and then tells each participant to dial a given number to join. The conference call architecture, shown in Figure 1, is a centralized system. Every participant's voice is sent to a central processing unit, which superimposes the voices (possibly with weights) and sends the result back to each participant. Participants cannot talk privately in pairs. The intercom mode is a one-to-many broadcast mode. Only one user can talk at a time, and each utterance requires pressing a button to switch the communication mode.

Each of these three modes has a well-defined application scenario and irreplaceable advantages in its own environment. In some scenarios, however, group members sometimes need to talk privately in pairs and at other times need to talk all together. On a field trip, for example, members often report their own situation to the group and then talk individually, and the individual partner is not fixed. If everyone walks together and the distance is short, talking face to face is easy. If members cannot see each other and must use wireless equipment, the existing schemes are still very inconvenient. Current mobile terminals are highly intelligent, but almost all their applications depend on touch-screen or button operation, and operating a screen or buttons while on the move is dangerous. Some terminals use speech recognition to control operations, but they still do not allow free conversation.

Another scenario involves family members who do not live in the same place and occasionally want to catch up on each other's news. Typically everyone first greets each other and reports recent news, and then each person talks with individual family members. This works much better than calling each person one by one. The management functions of current QQ groups and WeChat groups amount only to a text-based intercom mode, and text exchange is still less convenient and less warm than voice.

When people talk at close range, the speaker generally uses voice volume to control the range of the exchange. Speaking loudly means the speaker wants more people to hear, while speaking softly means the speaker wants only a few nearby people, or just one person, to hear. A listener who replies promptly shows interest in the content and a wish to continue. If a listener does not reply, or replies only rarely, that listener has lost interest and should gradually drop out of the speaker's conversation range.

Mobile communication has now entered the 4G era. Wireless bandwidth keeps growing and current mobile terminals perform very well, so there is no technical obstacle to free, personalized remote communication. Based on the characteristics of close-range conversation, the present invention uses a circuit to collect the voice characteristics of the speaker and the responders. These are mainly the speaker's volume, the response time of replies, and the frequency of replies. The characteristics are used to control and coordinate remote voice communication and to improve the user experience.

Summary of the invention

In the present invention, the voice signals that people exchange are divided into two classes: responses and new topics. A response is a voice signal from another person that starts within a defined period after the current speaker's voice signal ends. A new topic is any exchanged voice signal that is not a response. Identifying a response requires speech endpoint detection to determine the start and end points of the voice signals. Judged from a single terminal, responses fall into the following two situations:

1) The first situation is shown in Figure 2. After the user of the local terminal finishes talking, he or she hears a voice signal sent by another user after a time t. If t < T (a threshold defined by the system), the voice signal that the other user sent to the local user is defined as a response.

2) The second situation is shown in Figure 3. After the local user hears another user's voice signal end, the local user starts talking after a time t' and sends a voice signal to the other users. If t' < T' (a threshold defined by the system), the voice signal that the local user sends to the other users is defined as a response.

The first situation must account for the propagation delay of the voice signal, so when the default thresholds are set, T is generally made somewhat larger than T'.
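Both situations reduce to the same test: measure the gap from a previous end point to a new start point and compare it with a threshold. The Python sketch below is an illustration under stated assumptions, not the patent's implementation. The helper name `classify_signal` is hypothetical, times are in seconds as an endpoint detector would report them, and a signal that starts before the previous one ends (negative gap) is assumed to be a new topic, a case the text does not address. The values T = 4 s and T' = 3 s come from the embodiment.

```python
# Thresholds from the embodiment (seconds); T is larger than T' to absorb propagation delay.
T = 4.0        # situation 1: gap after the local user stops talking
T_PRIME = 3.0  # situation 2: gap after the local user stops hearing another user


def classify_signal(previous_end: float, start: float, threshold: float) -> str:
    """Return "response" if a signal starts within `threshold` seconds of the
    previous signal's end point, otherwise "new topic".

    Hypothetical helper: `previous_end` and `start` are end/start points
    produced by a speech endpoint detector.
    """
    gap = start - previous_end
    # Assumption: overlapping speech (negative gap) is not counted as a response.
    if 0.0 <= gap < threshold:
        return "response"
    return "new topic"


# Situation 1: the local user stopped at 10.0 s; another user's signal is heard at 12.5 s.
print(classify_signal(10.0, 12.5, T))        # response
# Situation 2: the local user heard another user stop at 10.0 s and starts at 13.5 s.
print(classify_signal(10.0, 13.5, T_PRIME))  # new topic
```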

The architecture adopted by the present invention, shown in Figure 4, is a distributed system in which users can communicate with each other directly and in both directions. During multi-person voice communication, the superposition of multiple voice signals is performed locally at each user. As shown in Figure 5, when several voice signals reach one user, they can be superimposed directly. If L voice signals are currently arriving, the output is U_o = K_1 U_1 + K_2 U_2 + ... + K_L U_L with K_1 = K_2 = ... = K_L = 1/L. Unlike in a traditional conference call system, the number of signals superimposed at each user, and the members those signals come from, are not necessarily the same for every user. The superimposed result is output to that user.
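Because each user mixes only the streams that actually reach it, the superposition can be sketched as a per-receiver average. The sketch below assumes each stream arrives as an aligned frame of float samples and truncates to the shortest frame. Both are assumptions made for illustration, and `mix_locally` is a hypothetical name.

```python
from typing import List, Sequence


def mix_locally(streams: Sequence[Sequence[float]]) -> List[float]:
    """Superimpose L incoming voice frames with equal weights K_i = 1/L.

    Hypothetical helper: each element of `streams` is one aligned frame.
    """
    if not streams:
        return []
    weight = 1.0 / len(streams)              # K_1 = K_2 = ... = K_L = 1/L
    length = min(len(s) for s in streams)    # assumption: truncate to the shortest frame
    return [sum(weight * s[i] for s in streams) for i in range(length)]


# Four streams reaching one user: U_o = 0.25U_1 + 0.25U_2 + 0.25U_3 + 0.25U_4.
frames = [[4.0, 8.0], [0.0, 0.0], [4.0, 0.0], [0.0, 0.0]]
print(mix_locally(frames))  # [2.0, 2.0]
```

Equal weights of 1/L keep the mixed sample within the range of the inputs, so the sum cannot clip however many members talk at once.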

Each user in system has a living group table, and this group table defines the maximum magnitude of speech exchange member.The corresponding parameter Hi of each member in group's table except active user, represents the active degree that they are current.A Hi value counter represents, maximum is M, and this member of real-time counting sends the number of times of response language to current end user, i.e. the number of times of the response language of the first situation above-mentioned as shown in Figure 2.Once there be the parameter Hi value of a member to be greater than M/2 in group's table of user, in group's table, the Hi value of all members is divided by 2, and namely the counter of all members moves to right one, high-order benefit 0.Continue counting again.The member not being 0 Hi value in active user group table calls active member.Active member is pressed the sequence of Hi value size.

The amplitude of a voice signal is not stable and varies greatly, so the volume A(n) of a voice signal is defined as the mean absolute amplitude of the signal over a period after it starts, that is:

A(n) = \frac{1}{N} \sum_{j=0}^{N-1} |x_j|

(Formula 1), where x_0, ..., x_{N-1} are the N samples of the n-th averaging window.
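Formula 1 is a windowed mean of absolute sample values. The sketch below computes it over consecutive, non-overlapping windows of N samples. The non-overlapping layout and the dropping of a trailing partial window are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence


def frame_volume(frame: Sequence[float]) -> float:
    """Formula 1: A(n) = (1/N) * sum of |x_j| over the N samples of one window."""
    return sum(abs(x) for x in frame) / len(frame)


def volume_track(samples: Sequence[float], n: int) -> List[float]:
    """Compute A(n) for consecutive, non-overlapping windows of n samples.

    Assumption: a trailing partial window is dropped.
    """
    return [frame_volume(samples[i:i + n])
            for i in range(0, len(samples) - n + 1, n)]


print(volume_track([1, -3, 2, -2, 4, -4, 4, -4], 4))  # [2.0, 4.0]
```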

To suppress disturbances, A(n) can be computed continuously and a suitable value among the results selected to judge the current speaker's volume. The maximum is generally not used directly. The second or third largest value is used instead. Maximum volume differs from person to person, so the system first takes P as the default maximum volume. During use, if a user's maximum volume exceeds P, the user's maximum volume temporarily replaces P. Volume is divided into several levels on a logarithmic scale. When a user talks at the maximum volume level, the voice signal is sent to all members of the group (except the local user). When the user talks at the minimum volume level, the user converses only with the member that currently has the largest H_i. With the intermediate volume levels, the speaker can control how far the voice signal spreads among the active members. One exceptional case requires special handling, as follows.
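The paragraph above states only that the levels follow a logarithmic scale. The sketch below therefore assumes equal decibel steps below P, with a hypothetical 6 dB step and three levels. It picks the second largest A(n), lets a louder user temporarily raise P, and returns grade 0 for the loudest level (send to everyone) up to `levels - 1` for the quietest (send only to the most active member). All names and the step size are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def robust_volume(values: Sequence[float], rank: int = 2) -> float:
    """Return the rank-th largest A(n) (2 = second largest) to skip a single spike."""
    ordered = sorted(values, reverse=True)
    return ordered[min(rank, len(ordered)) - 1]


def volume_grade(values: Sequence[float], p: float,
                 levels: int = 3, step_db: float = 6.0) -> Tuple[int, float]:
    """Map a speaker's A(n) values to (grade, updated P).

    Assumption: grades are equal steps of `step_db` decibels below the
    maximum volume P; grade 0 is the loudest level.
    """
    v = robust_volume(values)
    p = max(p, v)                       # a louder user temporarily replaces P
    if v <= 0:
        return levels - 1, p
    drop_db = 20.0 * math.log10(p / v)  # how far below P, in dB (>= 0)
    return min(levels - 1, int(drop_db // step_db)), p


print(volume_grade([900, 1000, 50], p=1000))  # (0, 1000)
print(volume_grade([400, 380, 10], p=1000))   # (1, 1000)
print(volume_grade([100, 90, 80], p=1000))    # (2, 1000)
```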

When the local user's voice signal is a response of the second situation shown in Figure 3, the member being responded to is placed ahead of the active members and temporarily treated as the most active member, even though that member's current H_i may be 0. Note that if more than one member sends a voice signal to the local user within the threshold T', the local user should choose the member with the smallest t' as the response target.
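The special case can be sketched in two steps: pick the member with the smallest t' inside T', then move that member to the front of the active list. The input format (a map from member to the end point of that member's last signal heard locally) and both function names are hypothetical. T' = 3 s is taken from the embodiment.

```python
from typing import Dict, List, Optional

T_PRIME = 3.0  # seconds, value from the embodiment


def response_target(heard_ends: Dict[str, float], local_start: float,
                    t_prime_max: float = T_PRIME) -> Optional[str]:
    """Return the member with the smallest t' < T', or None for a new topic.

    `heard_ends` maps member -> end point of that member's last signal
    heard locally (hypothetical input format).
    """
    gaps = {m: local_start - end for m, end in heard_ends.items()
            if 0.0 <= local_start - end < t_prime_max}
    if not gaps:
        return None
    return min(gaps, key=gaps.get)  # smallest t'


def ordered_active(h: Dict[str, int], target: Optional[str] = None) -> List[str]:
    """Active members by descending H_i, with the response target placed first."""
    active = sorted((m for m, v in h.items() if v > 0),
                    key=lambda m: h[m], reverse=True)
    if target is None:
        return active
    # The target leads even if its H_i is 0.
    return [target] + [m for m in active if m != target]


target = response_target({"A": 9.0, "B": 10.5, "C": 5.0}, local_start=11.0)
print(target)                                            # B
print(ordered_active({"A": 4, "B": 0, "C": 2}, target))  # ['B', 'A', 'C']
```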

The present invention neatly avoids attending to the content of the conversation and also avoids identifying or judging the communication partners. A circuit collects the voice characteristics of the speakers and responders. These are mainly the speaker's volume, the response time of replies, and the frequency of replies. The characteristics are used to control and coordinate smooth remote voice communication.

Description of the drawings:

Fig. 1 is a schematic diagram of the architecture of a current conference call system;

Fig. 2 is a schematic diagram of the first situation for identifying a response;

Fig. 3 is a schematic diagram of the second situation for identifying a response;

Fig. 4 is a schematic diagram of the architecture adopted by the present invention;

Fig. 5 is a schematic diagram of the superposition of multiple voice signals.

Specific embodiment:

The method of the present invention is illustrated below by a concrete process in which the participants' voice characteristics are used to control and coordinate remote voice communication.

The system sets the thresholds for identifying responses to T = 4 seconds and T' = 3 seconds. Every user in the system has an identical group table. Each member in the group table other than the local user has a counter with maximum value M = 16, so M/2 = 8. The counter counts, in real time, the responses that the member has sent to the user of the local terminal. As soon as any member's counter exceeds 8, the counters of all members in the table (except the local user) are shifted right by one bit, with a 0 filled into the high-order bit. Counting then continues. Members of the local user's group table whose counter value is not 0 are called active members, and they are sorted by counter value. There is one special active member: if the local user's utterance is a response, the member being responded to is placed ahead of the active members and temporarily treated as the most active member.

The system samples the voice signal with a 14-bit A/D converter at 8 kHz and first takes P = 3862 as the default maximum volume. Volume is divided into three levels: 0.53P or more is large volume, between 0.21P and 0.53P is medium volume, and 0.21P or less is small volume. Each A(n) is computed by Formula 1 over 64 samples, i.e. 8 ms. Starting from the start point of the user's voice signal, six consecutive A(n) values are computed before the signal is sent (that is, before the recipients are selected). The six values are sorted in descending order as A(1), A(2), A(3), A(4), A(5), A(6). The steps for judging the user's volume are as follows:

1) If A(2) ≥ 0.53P, the user is judged to be talking at large volume. If A(2) > P as well, set P = A(2), i.e. the user's second-largest volume A(2) temporarily replaces P.

2) If 0.53P > A(2) > 0.21P, the user is judged to be talking at medium volume.

3) If A(2) ≤ 0.21P, the user is judged to be talking at small volume.
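Steps 1) to 3) can be sketched directly, using the embodiment's P = 3862 and the 0.53P / 0.21P thresholds. The function name `judge_volume` is hypothetical. The input is assumed to be the six A(n) values, each computed from 64 samples at 8 kHz, in any order.

```python
from typing import Sequence, Tuple

P_DEFAULT = 3862.0   # default maximum volume from the embodiment
LARGE, SMALL = 0.53, 0.21


def judge_volume(a_values: Sequence[float], p: float = P_DEFAULT) -> Tuple[str, float]:
    """Apply steps 1) to 3) to six A(n) values; return (level, possibly updated P)."""
    ranked = sorted(a_values, reverse=True)  # A(1) >= A(2) >= ... >= A(6)
    a2 = ranked[1]
    if a2 >= LARGE * p:
        if a2 > p:
            p = a2          # step 1: A(2) temporarily replaces P
        return "large", p
    if a2 > SMALL * p:
        return "medium", p  # step 2
    return "small", p       # step 3


print(judge_volume([100, 3000, 50, 200, 2500, 10]))  # ('large', 3862.0)
print(judge_volume([1000] * 6))                      # ('medium', 3862.0)
print(judge_volume([300] * 6))                       # ('small', 3862.0)
print(judge_volume([4500, 4200, 0, 0, 0, 0]))        # ('large', 4200)
```

Using A(2) rather than A(1) means a single 8 ms spike, such as a cough or a knock, cannot by itself push a speaker into the large-volume level.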

When the user talks at large volume, the voice signal is sent to all members of the group (except the local user). When the user talks at medium volume, the voice signal is sent to all current active members of the group. When the user talks at small volume, the voice signal is sent only to the member of the group that is currently most active.
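The three routing rules map a volume level to a recipient list. In the sketch below, `group` is assumed to hold every member except the local user and `active` the active members, most active first, with any response target already at the front. The function name and the empty-list result when no member is active are assumptions, since the text does not cover that case.

```python
from typing import List, Sequence


def recipients(level: str, group: Sequence[str], active: Sequence[str]) -> List[str]:
    """Choose who receives the local user's voice signal.

    `group`: every member except the local user.
    `active`: active members, most active first.
    """
    if level == "large":
        return list(group)   # the whole group
    if level == "medium":
        return list(active)  # all current active members
    # Small volume: only the most active member. Assumption: nobody if no one is active.
    return list(active[:1])


group = ["A", "B", "C", "D"]
active = ["B", "A"]
print(recipients("large", group, active))   # ['A', 'B', 'C', 'D']
print(recipients("medium", group, active))  # ['B', 'A']
print(recipients("small", group, active))   # ['B']
```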

When several voice signals reach a user at the same time, they can be superimposed directly. If 4 voice signals are currently arriving, then U_o = 0.25U_1 + 0.25U_2 + 0.25U_3 + 0.25U_4. The superimposed result is output to the user.

Buttons can also be used in a specific implementation of the invention. The duration of a key press, or the number of consecutive presses, represents the speaker's volume, so the volume of the speaker's voice does not need to be judged. The speaking side can use the button to control the range over which the voice signal is sent. The listening side can likewise use a button to send a simple feedback signal to the speaker. Counting these feedback signals in the same way as responses achieves the same effect.
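The text does not specify how press duration or press count maps to a volume level. The sketch below shows one possible mapping onto the same three levels. The function name and every cut-off value are hypothetical. On the listening side, each feedback press would simply be counted toward the sender's H_i as a response would.

```python
def level_from_button(presses: int = 0, hold_s: float = 0.0) -> str:
    """Hypothetical mapping from button use to a volume level.

    More consecutive presses or a longer hold selects a wider sending range.
    The cut-off values are illustrative assumptions.
    """
    if presses >= 3 or hold_s >= 2.0:
        return "large"
    if presses == 2 or hold_s >= 1.0:
        return "medium"
    return "small"


print(level_from_button(presses=1))   # small
print(level_from_button(presses=2))   # medium
print(level_from_button(hold_s=2.5))  # large
```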

In summary, the method introduced by the present invention builds on existing technology and uses the users' voice characteristics to control and coordinate their exchanges. It enables free, personalized remote voice communication and makes voice communication between people more convenient.

Claims

1. A method for controlling voice communication by sound, characterized in that: a circuit collects the voice characteristics of the speaker and the responders, including the speaker's volume, the response time of replies and the frequency of replies; and these characteristics are used to control and coordinate smooth remote multi-person voice communication.

2. The method for controlling and coordinating voice communication using the participants' voice characteristics according to claim 1, characterized in that: the voice signals that people exchange are divided into two classes, responses and new topics; a response is a voice signal from another person that starts within a defined period after the current speaker's voice signal ends; a new topic is any exchanged voice signal other than a response.

3. The method for controlling and coordinating voice communication using the participants' voice characteristics according to claim 1, characterized in that: during multi-person voice communication, the superposition of multiple voice signals is performed locally at each user.

4. The method for controlling and coordinating voice communication using the participants' voice characteristics according to claim 1, characterized in that: each user in the system maintains a group table; each member in the group table other than the local user has a parameter H_i representing that member's current activity; H_i is implemented as a counter with maximum value M that counts, in real time, the responses the member sends to the local user; as soon as any member's H_i in the user's group table exceeds M/2, the H_i values of all members in the table are divided by 2, i.e. every counter is shifted right by one bit with a 0 filled into the high-order bit, and counting then continues.

5. A method of sorting the active members in the local user's group table by H_i according to claim 4, characterized in that: members of the local user's group table whose H_i is not 0 are called active members and are sorted by H_i; as a special active member, if the local user's utterance is a response, the member being responded to is placed ahead of the active members and temporarily treated as the most active member.

6. A method of determining the response target of the local user's utterance according to claim 5, characterized in that: if more than one member sends a voice signal to the local user within the threshold T', the local user selects the member with the smallest t' as the response target.

7. A method in which the volume of the speaker's voice participates in control according to claim 1, characterized in that: when the user talks at the maximum volume level, the voice signal is sent to all members of the group (except the local user); at the minimum volume level, the user converses only with the member that has the largest current H_i; and the speaker uses volume to control the range over which the voice signal is sent among the group members.

8. A method of judging the volume of the speaker's voice according to claim 7, characterized in that: A(n) values are computed continuously over a period of time and sorted in descending order, and the second or third largest value is chosen to judge the speaker's volume.

9. The method for controlling and coordinating voice communication using the participants' voice characteristics according to claim 1, characterized in that: buttons can also be used in a specific implementation, with the key press duration or the number of consecutive presses representing the speaker's volume, so that the speaking side can use the button to control the range over which the voice signal is sent; the listening side can likewise use a button to send a simple feedback signal to the speaker, and counting these signals in the same way as responses achieves the same effect.