CN104811318A - Method for controlling voice communication through voice - Google Patents

Info

Publication number
CN104811318A
Authority
CN
China
Prior art keywords
voice signal
volume
voice
user
active
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510184232.1A
Other languages
Chinese (zh)
Inventor
姚昊萍
丁兰英
张静
史丽萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Agricultural University
Original Assignee
Nanjing Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-04-15
Filing date
2015-04-15
Publication date
2015-07-29
Application filed by Nanjing Agricultural University
Priority to CN201510184232.1A
Publication of CN104811318A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the field of communications, in particular to a method for controlling and coordinating mutual communication through the participants' respective voice characteristics during multi-person voice communication. The method for controlling voice communication through voice avoids any need to attend to the content of the speech and avoids identifying and judging the communication partner. A speaker controls the range of communication members through volume, and listeners stay in an active state by replying promptly. A circuit collects the voice characteristics of the speaker and the listeners, mainly the speaker's volume, the response time of replies, and the frequency of replies; these characteristics are used to control and coordinate smooth remote voice communication and improve the user experience.

Description

A method for controlling voice communication through voice
Technical field
The present invention relates to the field of communications, and more specifically to a method for controlling and coordinating mutual communication in multi-person voice communication by using each participant's voice characteristics.
Background art
At present, remote voice communication between two users is commonly realized through ordinary telephone service, while multi-person voice communication is realized mainly through the conference call service of the telephone system and through the intercom (walkie-talkie) mode.
Telephone service requires the initiator to dial and the other party to pick up before the call can proceed. A conference call requires the organizer to first apply to the telephone company for the function and then notify each participant to dial a certain telephone number to join. The architecture of a conference call, shown in Fig. 1, is a centralized system structure. The voices of all participants are sent to a central processing unit, superimposed (optionally with weights), and then sent back to each participant. Participants cannot additionally talk with one another in pairs. The intercom mode is a one-to-many broadcast mode: only one user can talk at a time, and each utterance requires pressing a button to switch the communication direction.
Each of the three voice communication modes above has clear application scenarios and irreplaceable advantages in its own environment. In some scenarios, however, group members sometimes need to talk in pairs and sometimes need to talk all together. For example, on a field trip members often need to report their own situation and at other times talk individually, and the partner of an individual conversation is not fixed. If everyone walks together at close range, talking poses no problem. If they cannot see one another and use wireless communication devices, the existing schemes are still very inconvenient. Although current mobile terminals are highly intelligent, almost all applications depend on screen or button operation, and operating a screen or buttons while moving is also dangerous. Some terminals use speech recognition to control operation, but they still have not achieved the effect of free communication.
There are also scenarios such as communication among family members, who may not live in the same place and occasionally check on one another's recent situation. Typically they first greet one another and report recent news to everyone, and then talk individually. This works much better than calling each person one by one. The group functions of QQ or WeChat are at present only equivalent to a text version of the intercom mode, and text communication is still not as convenient or warm as voice.
When people talk at close range, the speaker generally uses the loudness of the voice to control the range of the conversation. A loud voice is meant to let more people hear; a quiet voice is meant to be heard only by a few nearby people or by one person. A listener who replies promptly shows interest in the content and wishes to continue. A listener who does not reply, or who replies rarely and only perfunctorily, shows no interest in the content and should gradually drop out of the range of the conversation.
Mobile communication has now entered the 4G era and wireless bandwidth keeps increasing; together with the excellent performance of current mobile terminals, there is no technical obstacle to realizing remote, personalized, free communication. Based on the characteristics of close-range conversation, the present invention uses a circuit to collect the voice characteristics of the speaker and the responders, mainly the speaker's volume, the response time of replies, and the frequency of replies. These characteristics are used to control and coordinate remote voice communication and improve the user experience.
Summary of the invention
In the present invention, the voice signals that people exchange are divided into two classes: reply utterances and new topics. A reply utterance is a voice signal from another person that starts within a defined period after the current speaker's voice signal ends. A new topic is any other voice signal in the conversation. Identifying a reply utterance requires voice endpoint detection to determine the start point of a voice signal. Judged at a given terminal, reply utterances fall into the following two cases:
1) The first case is shown in Fig. 2: a time t after the user of the current terminal finishes talking, the terminal starts to receive a voice signal sent by another user. If t < T (a system-defined threshold), the voice signal that this other user sends to the current terminal's user is defined as a reply utterance.
2) The second case is shown in Fig. 3: a time t' after the voice signal from another user ends at the current terminal, the current terminal's user starts talking and sends a voice signal to other users. If t' < T' (a system-defined threshold), the voice signal that the current terminal's user sends to that other user is defined as a reply utterance.
Because the first case must allow for the propagation delay of the voice signal, the default thresholds are generally set so that T is slightly larger than T'.
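The two cases can be sketched in a few lines of Python. This is only an illustrative sketch, not the circuit described by the invention: the function names and the use of timestamps in seconds are assumptions, the default thresholds 4 s and 3 s are the values given in the specific embodiment, and the requirement that the interval be non-negative (no overlapping speech) is an added assumption that the text does not address.

```python
T_HEARD = 4.0   # T: threshold for case 1 (Fig. 2), seconds; embodiment value
T_SPOKEN = 3.0  # T': threshold for case 2 (Fig. 3), seconds; embodiment value


def is_reply_heard(own_speech_end, incoming_start, threshold=T_HEARD):
    """Case 1: another user's signal starts t seconds after the local user stops.

    Returns True when 0 <= t < T, i.e. the incoming signal counts as a reply
    utterance addressed to the local user.
    """
    t = incoming_start - own_speech_end
    return 0 <= t < threshold


def is_reply_spoken(incoming_end, own_speech_start, threshold=T_SPOKEN):
    """Case 2: the local user starts talking t' seconds after an incoming signal ends.

    Returns True when 0 <= t' < T', i.e. the local user's speech counts as a
    reply utterance addressed to the sender of that incoming signal.
    """
    t_prime = own_speech_start - incoming_end
    return 0 <= t_prime < threshold
```

Any signal for which both checks fail would be treated as a new topic.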
The architecture adopted by the present invention, shown in Fig. 4, is a distributed system structure in which any two users can communicate directly in both directions. In multi-person voice communication, the superposition (mixing) of multiple voice signals is performed locally at each user. As shown in Fig. 5, when multiple voice signals arrive at a user, they can be superimposed directly. If L voice signals currently arrive, set K_1 = K_2 = ... = K_L = 1/L. Compared with a traditional conference call system, the number of voice signals superimposed at each user, and the members those signals come from, are not necessarily the same; the result of the superposition is output to that user.
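Written out, the local superposition with equal weights is a plain average of the incoming signals. The symbols U_i for the i-th incoming signal and U_o for the output follow the notation used in the specific embodiment; treating each signal as a function of the sample index n is an assumption made here for clarity.

```latex
U_o(n) = \sum_{i=1}^{L} K_i \, U_i(n), \qquad K_1 = K_2 = \dots = K_L = \frac{1}{L}
```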
Each user in system has a living group table, and this group table defines the maximum magnitude of speech exchange member.The corresponding parameter Hi of each member in group's table except active user, represents the active degree that they are current.A Hi value counter represents, maximum is M, and this member of real-time counting sends the number of times of response language to current end user, i.e. the number of times of the response language of the first situation above-mentioned as shown in Figure 2.Once there be the parameter Hi value of a member to be greater than M/2 in group's table of user, in group's table, the Hi value of all members is divided by 2, and namely the counter of all members moves to right one, high-order benefit 0.Continue counting again.The member not being 0 Hi value in active user group table calls active member.Active member is pressed the sequence of Hi value size.
The amplitude of a voice signal is not stable and varies greatly, so the volume A(n) of a voice signal is defined as the mean of the signal's absolute amplitude over a period of time after the signal starts, that is:
$$A(n) = \frac{1}{N} \sum_{j=0}^{N-1} |x_j| \qquad \text{(Formula 1)}$$
To eliminate disturbances, A(n) can be computed continuously and a suitable result selected to judge the current speaker's volume. Generally the maximum is not used directly; instead the second-largest or third-largest value is chosen. The maximum volume differs from user to user. The system first takes P as the default maximum volume; during use, if a user's maximum volume exceeds P, that user's maximum volume temporarily replaces P. Volume is divided into several levels on a logarithmic scale. When a user talks at the maximum level, the voice signal is sent to all members of the group (except the current user); at the minimum level, the user talks only with the member whose current Hi value is largest. With intermediate volume levels the speaker controls the transmission range of the voice signal among the active members. One exceptional case requires the special handling described next.
When the user's voice signal is a reply utterance of the second case shown in Fig. 3, the member being replied to is placed at the front of the active members and temporarily treated as the most active member, even though that member's current Hi value may be 0. Note that if more than one member sends a voice signal to the current user within the threshold T', the current user should select the member with the smallest t' as the reply target.
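The choice of reply target among several recent senders can be sketched as below. The function name and the dictionary of end timestamps are hypothetical, the default threshold of 3 s is the T' value from the specific embodiment, and ignoring signals that have not yet ended is an added assumption.

```python
def choose_reply_target(incoming_ends, own_speech_start, threshold=3.0):
    """Pick the member whose signal ended most recently within T'.

    incoming_ends maps each member to the time (seconds) at which that
    member's last voice signal ended at this terminal. Returns None when
    no signal ended within the threshold, i.e. the speech is a new topic.
    """
    candidates = {}
    for member, end in incoming_ends.items():
        t_prime = own_speech_start - end
        if 0 <= t_prime < threshold:
            candidates[member] = t_prime
    if not candidates:
        return None
    # The smallest t' identifies the member being replied to.
    return min(candidates, key=candidates.get)
```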
The present invention neatly avoids any need to attend to the content of the speech and also avoids identifying and judging the communication partner. A circuit collects the voice characteristics of the speaker and the responders during the conversation, mainly the speaker's volume, the response time of replies, and the frequency of replies. These characteristics are used to control and coordinate the smooth progress of remote voice communication.
Brief description of the drawings:
Fig. 1 is a schematic diagram of the structure of a current conference call system;
Fig. 2 is a schematic diagram of the first case of identifying a reply utterance;
Fig. 3 is a schematic diagram of the second case of identifying a reply utterance;
Fig. 4 is a diagram of the architecture adopted by the present invention;
Fig. 5 is a schematic diagram of the superposition of multiple voice signals;
Specific embodiment:
A concrete process of using the voice characteristics of the participants to control and coordinate remote voice communication is described below to illustrate the method of the present invention.
The system defines the reply-utterance thresholds as T = 4 seconds and T' = 3 seconds. Each user in the system has an identical group table. Every member in the group table except the current user has a counter whose maximum value M is 16, so M/2 = 8. The counter counts in real time the number of reply utterances the member sends to the current terminal's user. As soon as any member's counter exceeds 8, the counters of all members in the table (except the current user) are shifted right by one bit with the high bit filled with 0, and counting then continues. Members whose counter value is not 0 in the current user's group table are called active members, and active members are sorted by counter value. There is one special active member: if the current user's voice signal is a reply utterance, the member being replied to is placed at the front of the active members and temporarily treated as the most active member.
The system samples the voice signal with a 14-bit A/D converter at a sampling frequency of 8 kHz, and first takes P = 3862 as the default maximum volume. Volume is divided into three levels: at or above 0.53P is high volume, between 0.21P and 0.53P is medium volume, and at or below 0.21P is low volume. Each A(n) is computed by Formula 1 over 64 samples, 8 ms in total. From the start point of the user's voice signal until just before the signal is sent (when recipients are selected), 6 values of A(n) are computed consecutively and sorted in descending order: A(1), A(2), A(3), A(4), A(5), A(6). The user's volume is judged as follows:
1) If A(2) ≥ 0.53P, the user is judged to be talking at high volume. If in addition A(2) > P, then set P = A(2), temporarily replacing P with the user's second-largest volume A(2).
2) If 0.53P > A(2) > 0.21P, the user is judged to be talking at medium volume.
3) If A(2) ≤ 0.21P, the user is judged to be talking at low volume.
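The three steps above can be sketched as follows. The function names and the string labels for the levels are hypothetical; the thresholds 0.53P and 0.21P, the default P = 3862, and the use of the second-largest frame volume come from the embodiment. The sketch assumes at least two frame volumes are supplied.

```python
def frame_volume(samples):
    """Mean absolute amplitude of one frame (64 samples in the embodiment)."""
    return sum(abs(x) for x in samples) / len(samples)


def classify_volume(frame_volumes, p=3862.0):
    """Return (level, p) where level is 'high', 'medium' or 'low'.

    frame_volumes holds the A(n) values computed since the start of the
    utterance (6 in the embodiment). The second-largest value A(2) is used
    instead of the maximum to reduce the effect of disturbances.
    """
    ranked = sorted(frame_volumes, reverse=True)
    a2 = ranked[1]
    if a2 >= 0.53 * p:
        level = "high"
        if a2 > p:
            p = a2          # temporarily raise the maximum volume for this user
    elif a2 > 0.21 * p:
        level = "medium"
    else:
        level = "low"
    return level, p
```

Returning the possibly updated P alongside the level lets the caller decide how long the temporary replacement should last, which the text leaves open.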
When the user talks at high volume, the voice signal is sent to all members of the group (except the current user). At medium volume, it is sent to all currently active members of the group. At low volume, it is sent only to the member with the highest current activity level.
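Recipient selection by volume level can be sketched as below. The function name, parameter names, and level labels are hypothetical. The optional `reply_target` argument models the special active member described earlier: when the current speech is a reply utterance, the member being replied to is moved to the front of the active list. Returning an empty list at low volume when nobody is active is an assumption, since the text does not cover that case.

```python
def select_recipients(level, group, active_sorted, current_user, reply_target=None):
    """Choose who receives the current user's voice signal.

    level         -- 'high', 'medium' or 'low'
    group         -- all members in the group table, including the current user
    active_sorted -- active members, most active first
    reply_target  -- member being replied to, if the speech is a reply utterance
    """
    if reply_target is not None:
        # Temporarily treat the reply target as the most active member.
        active_sorted = [reply_target] + [m for m in active_sorted if m != reply_target]
    if level == "high":
        return [m for m in group if m != current_user]
    if level == "medium":
        return list(active_sorted)
    return active_sorted[:1]
```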
When multiple voice signals arrive at a user together, they can be superimposed directly. If 4 voice signals currently arrive, then Uo = 0.25U_1 + 0.25U_2 + 0.25U_3 + 0.25U_4. The result of the superposition is output to that user.
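A minimal sketch of this equal-weight superposition is shown below. The function name and the representation of each signal as a list of samples are assumptions; real terminals would mix fixed-size audio frames and may need clipping protection, which is not covered here.

```python
def mix(signals):
    """Superimpose L sample sequences with equal weights K_i = 1/L.

    zip() stops at the shortest signal, so the output has the length of
    the shortest input.
    """
    if not signals:
        return []
    k = 1.0 / len(signals)
    return [k * sum(frame) for frame in zip(*signals)]
```

With four incoming signals the weight is 0.25 for each, matching the expression above.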
Buttons can also be used in a specific implementation of the invention: the duration of a key press or the number of consecutive presses represents the speaker's volume, so the volume of the speaker's voice need not be judged. The speaker can then control the transmission range of the voice signal with a button; likewise, the listener can use a button to generate a simple reply signal and send it to the speaker, and counting these reply signals achieves the same effect.
In summary, on the basis of existing technology, the method introduced by the present invention of using users' voice characteristics to control and coordinate mutual communication enables remote, personalized, free voice communication and makes voice communication more convenient.

Claims (9)

1. A method for controlling voice communication through voice, characterized in that: a circuit collects the voice characteristics of the speaker and the responders, including the speaker's volume, the response time of replies, and the frequency of replies; these characteristics are used to control and coordinate the smooth progress of remote multi-person voice communication.
2. The method of claim 1 for controlling and coordinating voice communication using the participants' voice characteristics, characterized in that: the voice signals that people exchange are divided into two classes, reply utterances and new topics; a reply utterance is a voice signal from another person that starts within a defined period after the current speaker's voice signal ends; a new topic is any other voice signal in the conversation.
3. The method of claim 1 for controlling and coordinating voice communication using the participants' voice characteristics, characterized in that: in multi-person voice communication, the superposition of multiple voice signals is performed locally at each user.
4. The method of claim 1 for controlling and coordinating voice communication using the participants' voice characteristics, characterized in that: each user in the system has an active group table; every member in the group table except the current user has a parameter Hi that represents that member's current activity level; Hi is held in a counter with maximum value M that counts in real time the number of reply utterances the member sends to the current user; as soon as the Hi value of any member in a user's group table exceeds M/2, the Hi values of all members in the table are divided by 2, i.e. every counter is shifted right by one bit with the high bit filled with 0, and counting then continues.
5. The method of claim 4 for sorting the active members in the current user's group table by Hi value, characterized in that: members whose Hi value is not 0 in the current user's group table are called active members, and active members are sorted by Hi value; one special active member is considered: if the current user's voice signal is a reply utterance, the member being replied to is placed at the front of the active members and temporarily treated as the most active member.
6. The method of claim 5 for determining the reply target of the current user's speech, characterized in that: if more than one member sends a voice signal to the current user within the threshold T', the current user selects the member with the smallest t' as the reply target.
7. The method of claim 1 in which the volume of the speaker's voice takes part in control, characterized in that: when a user talks at the maximum volume level, the voice signal is sent to all members of the group (except the current user); at the minimum volume level, the user talks only with the member whose current Hi value is largest; the speaker uses volume to control the transmission range of the voice signal among the members of the group table.
8. The method of claim 7 for judging the volume of the speaker's voice, characterized in that: A(n) is computed continuously over a period of time, the A(n) values are sorted in descending order, and the second-largest or third-largest value is chosen to judge the speaker's volume.
9. The method of claim 1 for controlling and coordinating voice communication using the participants' voice characteristics, characterized in that: buttons can also be used in a specific implementation, with the duration of a key press or the number of consecutive presses representing the speaker's volume, so that the speaker can control the transmission range of the voice signal with a button; likewise, the listener can use a button to generate a simple reply signal and send it to the speaker, and counting these reply signals achieves the same effect.
CN201510184232.1A 2015-04-15 2015-04-15 Method for controlling voice communication through voice Pending CN104811318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510184232.1A CN104811318A (en) 2015-04-15 2015-04-15 Method for controlling voice communication through voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510184232.1A CN104811318A (en) 2015-04-15 2015-04-15 Method for controlling voice communication through voice

Publications (1)

Publication Number Publication Date
CN104811318A true CN104811318A (en) 2015-07-29

Family

ID=53695833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510184232.1A Pending CN104811318A (en) 2015-04-15 2015-04-15 Method for controlling voice communication through voice

Country Status (1)

Country Link
CN (1) CN104811318A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018094968A1 (en) * 2016-11-23 2018-05-31 中兴通讯股份有限公司 Audio processing method and apparatus, and media server
CN111105782A (en) * 2019-11-27 2020-05-05 深圳追一科技有限公司 Session interaction processing method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101179693A (en) * 2007-09-26 2008-05-14 深圳市丽视视讯科技有限公司 Mixed audio processing method of session television system
US20120253825A1 (en) * 2006-03-03 2012-10-04 At&T Intellectual Property Ii, L.P. Relevancy recognition for contextual question answering
CN103024224A (en) * 2012-11-22 2013-04-03 北京小米科技有限责任公司 Speech control method and device in multi-person speech communication
CN103217167A (en) * 2013-03-25 2013-07-24 深圳市凯立德科技股份有限公司 Method and apparatus for voice-activated navigation
CN103338145A (en) * 2013-06-03 2013-10-02 腾讯科技(深圳)有限公司 Method, device and system for controlling voice data transmission
CN103677582A (en) * 2012-09-18 2014-03-26 联想(北京)有限公司 Method for controlling electronic device, and electronic device
CN103794216A (en) * 2014-02-12 2014-05-14 能力天空科技(北京)有限公司 Voice audio mixing processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150729
