CN107800860A - Speech processing method, apparatus and terminal device - Google Patents
- Publication number: CN107800860A (application CN201610811985.5A)
- Authority: CN (China)
- Prior art keywords: information, voice call, voice, call, acquisition
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/7243—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
- H04M1/72433—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/725—Cordless telephones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Telephone Function (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention provides a speech processing method, apparatus and terminal device. The method includes: obtaining voice call information of an ongoing voice call; obtaining user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information. The invention solves the problem that, in related-art voice call enhancement schemes, speech intelligibility is poor when background noise is severe, and thereby improves voice call quality.
Description
Technical field
The present invention relates to the field of communications, and in particular to a speech processing method, apparatus and terminal device.
Background
When an audio or video call is made on a mobile terminal device and the background noise is severe, speech intelligibility becomes very poor, which seriously hampers voice communication.
At present, voice call enhancement on mobile terminal devices mainly relies on schemes such as dual-microphone background noise reduction. When the ambient noise is loud or the noise environment is complex, dual-microphone noise reduction performs poorly, and it can also noticeably lower the speech loudness heard by the recipient, even to the point where the speech is hard to make out. In addition, this scheme places high demands on component consistency, structural layout and sealing, which also raises product cost.
Therefore, related-art voice call enhancement schemes suffer from poor speech intelligibility when background noise is severe.
Summary of the invention
Embodiments of the present invention provide a speech processing method, apparatus and terminal device, to at least solve the problem that related-art voice call enhancement schemes yield poor speech intelligibility when background noise is severe.
According to one embodiment of the present invention, a speech processing method is provided, including: obtaining voice call information of an ongoing voice call; obtaining user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information.
Optionally, when the facial feature information includes cheek vibration information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining motion information of the front shell of the terminal device used for the voice call; and determining, according to the obtained motion information of the front shell, cheek vibration information describing how the user's cheek vibrates during the voice call.
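The source does not say how front-shell motion becomes cheek vibration information. A minimal sketch, assuming the motion information arrives as a list of acceleration samples from a hypothetical sensor on the front shell: a trailing moving average approximates slow components such as hand movement and gravity, and subtracting it leaves the faster vibration coupled in from the cheek. This is a crude high-pass filter; the function name and the `window` parameter are illustrative assumptions, not the patented method.

```python
from typing import List, Sequence


def cheek_vibration_from_shell_motion(motion: Sequence[float], window: int = 8) -> List[float]:
    """Hypothetical helper: high-pass front-shell motion samples to isolate cheek vibration."""
    if window < 1:
        raise ValueError("window must be at least 1")
    vibration: List[float] = []
    running_sum = 0.0
    for i, sample in enumerate(motion):
        running_sum += sample
        if i >= window:
            # Drop the sample that has just left the trailing window.
            running_sum -= motion[i - window]
        baseline = running_sum / min(i + 1, window)  # slow component (hand motion, gravity)
        vibration.append(sample - baseline)          # fast residual, attributed to the cheek
    return vibration


# Usage: a constant offset of 5.0 with a +/-1.0 oscillation on top.
samples = [6.0 if i % 2 == 0 else 4.0 for i in range(16)]
print(cheek_vibration_from_shell_motion(samples, window=2)[:4])  # [0.0, -1.0, 1.0, -1.0]
```

The first output sample is zero because the average has only one sample to work with; after that the constant offset is removed and only the oscillation remains.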
Optionally, after the cheek vibration information describing how the user's cheek vibrates during the voice call is determined according to the obtained motion information of the front shell of the terminal device, the method further includes: converting the obtained cheek vibration information into corresponding voltage information; and generating, after analog-to-digital (A/D) conversion of the converted voltage information, a message code corresponding to the cheek vibration information.
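The voltage conversion and A/D step can be illustrated with a simple linear model. A minimal sketch, assuming a hypothetical transducer with a fixed sensitivity (volts per unit of vibration) feeding a `bits`-bit ADC whose input range is symmetric, ±`v_ref`; the resulting unsigned integer codes stand in for the "message code". The parameter values are assumptions, not figures from the patent.

```python
from typing import List, Sequence


def vibration_to_message_code(vibration: Sequence[float], sensitivity: float = 0.5,
                              v_ref: float = 1.0, bits: int = 8) -> List[int]:
    """Hypothetical transducer + ADC model: vibration -> voltage -> unsigned integer codes."""
    if v_ref <= 0 or bits < 1:
        raise ValueError("v_ref must be positive and bits at least 1")
    max_code = (1 << bits) - 1
    codes: List[int] = []
    for value in vibration:
        volts = value * sensitivity                    # vibration -> voltage information
        volts = max(-v_ref, min(v_ref, volts))         # clip to the ADC input range
        fraction = (volts + v_ref) / (2 * v_ref)       # map [-v_ref, v_ref] onto [0, 1]
        codes.append(int(fraction * max_code + 0.5))   # round to the nearest code
    return codes


print(vibration_to_message_code([0.0, 2.0, -2.0, 10.0]))  # [128, 255, 0, 255]
```

Clipping before quantisation mirrors what a real ADC does at saturation: the last input (10.0, i.e. 5 V) simply lands on the top code.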
Optionally, simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: determining, according to the obtained cheek vibration information, expressed-meaning information characterizing the content of the conversation and vibration intensity information characterizing the strength of the cheek vibration; and simulating the backup voice call information for the voice call information according to the determined expressed-meaning information and vibration intensity information.
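How the two kinds of information are derived is not specified. A minimal sketch, assuming the cheek vibration signal is cut into fixed-length frames: the RMS of each frame, normalised to the loudest frame, serves as the vibration intensity, and the expressed meaning comes from a caller-supplied `recognize` function, a hypothetical placeholder for whatever recognizer maps vibration to content. `BackupSegment` is likewise an illustrative structure.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class BackupSegment:
    text: str    # expressed-meaning information for this frame
    gain: float  # relative loudness from vibration intensity, 0.0..1.0


def frame_rms(signal: Sequence[float], frame_len: int) -> List[float]:
    """RMS of each complete frame; a trailing partial frame is ignored."""
    if frame_len < 1:
        raise ValueError("frame_len must be at least 1")
    return [
        math.sqrt(sum(x * x for x in signal[s:s + frame_len]) / frame_len)
        for s in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def build_backup_segments(vibration: Sequence[float], frame_len: int,
                          recognize: Callable[[Sequence[float]], str]) -> List[BackupSegment]:
    intensities = frame_rms(vibration, frame_len)
    peak = max(intensities, default=0.0)
    segments = []
    for i, rms in enumerate(intensities):
        frame = vibration[i * frame_len:(i + 1) * frame_len]
        gain = rms / peak if peak > 0 else 0.0  # strongest frame gets gain 1.0
        segments.append(BackupSegment(text=recognize(frame), gain=gain))
    return segments


segs = build_backup_segments([1, -1, 1, -1, 2, -2, 2, -2], 4, lambda frame: "?")
print([s.gain for s in segs])  # [0.5, 1.0]
```

Keeping meaning and intensity separate lets a later synthesis stage render the recognized content at a loudness that follows how strongly the user was speaking.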
Optionally, when the facial feature information includes call mouth-shape information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining, through the image acquisition device of the terminal device used for the voice call, call mouth-shape information describing the user's mouth shape during the voice call.
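The source only states that the mouth shape is captured by the terminal's image acquisition device. One common way to reduce a camera frame to a mouth-shape feature is the mouth aspect ratio computed from lip landmarks; a sequence of these per-frame values over the call would then form the call mouth-shape information. A minimal sketch, assuming a face-landmark detector (not shown, and not specified by the source) already supplies the two mouth corners and the upper and lower lip midpoints in pixel coordinates; the `open_threshold` value is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class MouthShape:
    aspect_ratio: float  # vertical lip opening divided by mouth width
    is_open: bool


def mouth_shape_features(left: Point, right: Point, top: Point, bottom: Point,
                         open_threshold: float = 0.3) -> MouthShape:
    """Per-frame mouth-shape feature from four lip landmarks given as (x, y) pixels."""
    width = math.dist(left, right)    # corner-to-corner distance
    height = math.dist(top, bottom)   # vertical lip opening
    ratio = height / width if width > 0 else 0.0
    return MouthShape(aspect_ratio=ratio, is_open=ratio >= open_threshold)


print(mouth_shape_features((0, 0), (4, 0), (2, 1), (2, -1)))
# MouthShape(aspect_ratio=0.5, is_open=True)
```

Normalising by mouth width makes the feature insensitive to how far the face is from the camera. `math.dist` requires Python 3.8 or later.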
Optionally, simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: converting the obtained call mouth-shape information into first voice information corresponding to the call mouth-shape information; and determining the backup voice call information for the voice call information according to the first voice information.
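The conversion from mouth shapes to "first voice information" is not detailed further. A minimal sketch, assuming each video frame has already been labelled with a coarse mouth-shape class (a viseme): consecutive duplicate labels are collapsed, and each remaining label is mapped to candidate phonemes through a lookup table. Both the labels and the table are hypothetical. Several phonemes share one mouth shape, so a real system would still need a language model to choose among the candidates.

```python
from itertools import groupby
from typing import Dict, List, Sequence

# Hypothetical viseme -> phoneme table; a real system would learn this mapping.
VISEME_TO_PHONEMES: Dict[str, List[str]] = {
    "closed": ["p", "b", "m"],
    "open": ["a"],
    "round": ["o", "u"],
    "spread": ["i", "e"],
}


def visemes_to_candidates(frame_labels: Sequence[str]) -> List[List[str]]:
    """Collapse repeated per-frame labels, then look up phoneme candidates for each."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Unknown labels yield an empty candidate list instead of raising; copies keep the table intact.
    return [list(VISEME_TO_PHONEMES.get(label, [])) for label in collapsed]


print(visemes_to_candidates(["closed", "closed", "open", "open", "round"]))
# [['p', 'b', 'm'], ['a'], ['o', 'u']]
```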
Optionally, after the backup voice call information for the voice call information is simulated according to the obtained user facial feature information, the method further includes at least one of the following: modulating the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information, in a voiceprint feature library, that corresponds to the voice call, to obtain voice modulation information corresponding to the backup voice call information; mixing a preset background sound effect with the backup voice call information to generate mixed audio information; and inserting second voice information, obtained by converting input text information, at a preset position of the backup voice call information to generate third voice information.
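Two of these optional steps, mixing a background sound effect and inserting speech converted from text at a preset position, reduce to simple operations on sample buffers. A minimal sketch, assuming all signals are mono float samples in [-1.0, 1.0] at the same sample rate and that the text-to-speech output is already available as samples; `effect_gain` and both function names are illustrative assumptions.

```python
from typing import List, Sequence


def mix_background(voice: Sequence[float], effect: Sequence[float],
                   effect_gain: float = 0.3) -> List[float]:
    """Add a scaled background effect to the backup voice, clipping to [-1, 1]."""
    length = max(len(voice), len(effect))
    mixed = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0   # pad the shorter signal with silence
        e = effect[i] if i < len(effect) else 0.0
        mixed.append(max(-1.0, min(1.0, v + effect_gain * e)))
    return mixed


def insert_speech(voice: Sequence[float], inserted: Sequence[float], position: int) -> List[float]:
    """Splice `inserted` samples into `voice` at a sample offset clamped to the valid range."""
    position = max(0, min(position, len(voice)))
    return list(voice[:position]) + list(inserted) + list(voice[position:])


print(insert_speech([0.1, 0.2, 0.3], [0.9, 0.9], 1))  # [0.1, 0.9, 0.9, 0.2, 0.3]
```

Hard clipping is the simplest way to keep the mix in range; a production mixer would more likely apply a limiter or lower the effect gain instead.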
Optionally, after the backup voice call information for the voice call information is simulated according to the obtained user facial feature information, the method further includes: sending the backup voice call information to the terminal device that receives the voice call.
Optionally, after the backup voice call information for the voice call information is simulated according to the obtained user facial feature information, the method further includes: playing the backup voice call information.
Optionally, playing the backup voice call information includes: determining, according to the backup voice call information, control information for controlling vibration of the front shell of the terminal device; and controlling the front shell of the terminal device to vibrate according to the determined control information.
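The format of the control information is not defined in the source. One simple reading, sketched here under that assumption: the backup voice is split into frames, and each frame's peak amplitude becomes a drive level from 0 to `max_level` for a hypothetical actuator behind the front shell. A real design would more likely drive the actuator with the audio waveform itself; the envelope version only illustrates the voice-to-control mapping.

```python
from typing import List, Sequence


def shell_drive_levels(voice: Sequence[float], frame_len: int = 160,
                       max_level: int = 100) -> List[int]:
    """Map each frame's peak amplitude to an integer drive level for a front-shell actuator."""
    if frame_len < 1:
        raise ValueError("frame_len must be at least 1")
    levels = []
    for start in range(0, len(voice), frame_len):
        frame = voice[start:start + frame_len]
        peak = min(max(abs(x) for x in frame), 1.0)  # clip amplitudes above full scale
        levels.append(int(peak * max_level + 0.5))    # round to the nearest level
    return levels


print(shell_drive_levels([0.0] * 4 + [0.5, -0.5, 0.25, 0.0] + [1.5, 1.5], frame_len=4))
# [0, 50, 100]
```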
According to another embodiment of the present invention, a speech processing apparatus is provided, including: a first acquisition module, configured to obtain voice call information of an ongoing voice call; a second acquisition module, configured to obtain user facial feature information corresponding to the voice call information; and a simulation module, configured to simulate, when the voice call information cannot be recognized, backup voice call information for the voice call information according to the obtained user facial feature information.
Optionally, the second acquisition module includes: a first acquisition unit, configured to obtain, when the facial feature information includes cheek vibration information, motion information of the front shell of the mobile terminal used for the voice call; and a first determining unit, configured to determine, according to the obtained motion information of the front shell of the mobile terminal, cheek vibration information describing how the user's cheek vibrates during the voice call.
Optionally, the second acquisition module further includes: a converting unit, configured to convert the obtained cheek vibration information into corresponding voltage information; and a generation unit, configured to generate, after analog-to-digital (A/D) conversion of the converted voltage information, a message code corresponding to the cheek vibration information.
Optionally, the simulation module includes: a second determining unit, configured to determine, according to the obtained cheek vibration information, expressed-meaning information characterizing the content of the conversation and vibration intensity information characterizing the strength of the cheek vibration; and a simulation unit, configured to simulate the backup voice call information for the voice call information according to the determined expressed-meaning information and vibration intensity information.
Optionally, the second acquisition module includes: a second acquisition unit, configured to obtain, when the facial feature information includes call mouth-shape information, call mouth-shape information describing the user's mouth shape during the voice call through the image acquisition device of the mobile terminal used for the voice call.
Optionally, the simulation module includes: a conversion module, configured to convert the obtained call mouth-shape information into first voice information corresponding to the call mouth-shape information; and a third determining module, configured to determine the backup voice call information for the voice call information according to the first voice information.
Optionally, the apparatus further includes at least one of the following: a modulation module, configured to modulate the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information, in a voiceprint feature library, that corresponds to the voice call, to obtain voice modulation information corresponding to the backup voice call information; a mixing module, configured to mix a preset background sound effect with the backup voice call information to generate mixed audio information; and a generation module, configured to insert second voice information, obtained by converting input text information, at a preset position of the backup voice call information to generate third voice information.
Optionally, the apparatus further includes: a sending module, configured to send the backup voice call information to the peer device of the voice call.
Optionally, the apparatus further includes: a playing module, configured to play the backup voice call information.
Optionally, the playing module includes: a fourth determining unit, configured to determine, according to the backup voice call information, control information for controlling vibration of the front shell of the mobile terminal; and a control unit, configured to control the front shell of the mobile terminal to vibrate according to the determined control information.
According to still another embodiment of the present invention, a terminal device is further provided, including the apparatus according to any one of the above.
According to still another embodiment of the present invention, a storage medium is further provided. The storage medium is configured to store program code for performing the following steps: obtaining voice call information of an ongoing voice call; obtaining user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information.
Optionally, the storage medium is further configured to store program code for performing the following steps: when the facial feature information includes cheek vibration information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining motion information of the front shell of the terminal device used for the voice call; and determining, according to the obtained motion information of the front shell, cheek vibration information describing how the user's cheek vibrates during the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps: after the cheek vibration information describing how the user's cheek vibrates during the voice call is determined according to the obtained motion information of the front shell of the terminal device, the steps further include: converting the obtained cheek vibration information into corresponding voltage information; and generating, after analog-to-digital (A/D) conversion of the converted voltage information, a message code corresponding to the cheek vibration information.
Optionally, the storage medium is further configured to store program code for performing the following steps: simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: determining, according to the obtained cheek vibration information, expressed-meaning information characterizing the content of the conversation and vibration intensity information characterizing the strength of the cheek vibration; and simulating the backup voice call information for the voice call information according to the determined expressed-meaning information and vibration intensity information.
Optionally, the storage medium is further configured to store program code for performing the following steps: when the facial feature information includes call mouth-shape information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining, through the image acquisition device of the terminal device used for the voice call, call mouth-shape information describing the user's mouth shape during the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps: simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: converting the obtained call mouth-shape information into first voice information corresponding to the call mouth-shape information; and determining the backup voice call information for the voice call information according to the first voice information.
Optionally, the storage medium is further configured to store program code for performing the following steps: after the backup voice call information for the voice call information is simulated according to the obtained user facial feature information, the steps further include at least one of the following: modulating the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information, in a voiceprint feature library, that corresponds to the voice call, to obtain voice modulation information corresponding to the backup voice call information; mixing a preset background sound effect with the backup voice call information to generate mixed audio information; and inserting second voice information, obtained by converting input text information, at a preset position of the backup voice call information to generate third voice information.
Optionally, the storage medium is further configured to store program code for performing the following steps: after the backup voice call information for the voice call information is simulated according to the obtained user facial feature information, the steps further include: sending the backup voice call information to the terminal device that receives the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps: after the backup voice call information for the voice call information is simulated according to the obtained user facial feature information, the steps further include: playing the backup voice call information.
Optionally, the storage medium is further configured to store program code for performing the following steps: playing the backup voice call information includes: determining, according to the backup voice call information, control information for controlling vibration of the front shell of the terminal device; and controlling the front shell of the terminal device to vibrate according to the determined control information.
With the present invention, when the voice call information cannot be recognized, backup voice call information for the user's voice call is simulated from the obtained user facial feature information. Because the voice call information is backed up by simulation from the user facial feature information, the problem that related-art voice call enhancement schemes yield poor speech intelligibility under severe background noise can be solved, and voice call quality is improved.
Brief description of the drawings
The accompanying drawings described herein are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal for a speech processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a speech processing method according to an embodiment of the present invention;
Fig. 3 is a diagram of the usage process of a voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 4 is a diagram of cheek-vibration pickup and conversion according to a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of generating speech synthesis codes according to a preferred embodiment of the present invention;
Fig. 6 is a schematic diagram of the process of generating a voiceprint speech feature library according to a preferred embodiment of the present invention;
Fig. 7 is a schematic diagram of the working process of adding a background sound effect according to a preferred embodiment of the present invention;
Fig. 8 is a diagram of text-to-speech communication according to a preferred embodiment of the present invention;
Fig. 9 is a diagram of the usage process of a voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 10 is a schematic diagram of the working process of a mouth-shape recognition system according to a preferred embodiment of the present invention;
Fig. 11 is a schematic diagram of generating the local voice feature library of the voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 12 is a diagram of text-to-speech communication in the voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 13 is a schematic diagram of a system that inserts speech converted by mouth-shape recognition at a single sending end, according to a preferred embodiment of the present invention;
Fig. 14 is structural block diagram 1 of a speech processing apparatus according to an embodiment of the present invention;
Fig. 15 is structural block diagram 1 of the second acquisition module 144 of the speech processing apparatus according to an embodiment of the present invention;
Fig. 16 is structural block diagram 2 of the second acquisition module 144 of the speech processing apparatus according to an embodiment of the present invention;
Fig. 17 is structural block diagram 1 of the simulation module 146 of the speech processing apparatus according to an embodiment of the present invention;
Fig. 18 is structural block diagram 3 of the second acquisition module 144 of the speech processing apparatus according to an embodiment of the present invention;
Fig. 19 is structural block diagram 2 of the simulation module 146 of the speech processing apparatus according to an embodiment of the present invention;
Fig. 20 is structural block diagram 2 of a speech processing apparatus according to an embodiment of the present invention;
Fig. 21 is structural block diagram 3 of a speech processing apparatus according to an embodiment of the present invention;
Fig. 22 is structural block diagram 4 of a speech processing apparatus according to an embodiment of the present invention;
Fig. 23 is a structural block diagram of the playing module 222 of the speech processing apparatus according to an embodiment of the present invention;
Fig. 24 is a structural block diagram of a terminal device according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with each other.
It should be noted that the terms "first", "second" and the like in the specification, claims and accompanying drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of this application may be executed in a mobile terminal, another kind of terminal, or a similar terminal device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal for a speech processing method according to an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 12 (only one is shown in the figure; the processor 12 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device (FPGA)), a memory 14 for storing data, and a transmission device 16 for communication functions. Those of ordinary skill in the art will understand that the structure shown in Fig. 1 is merely illustrative and does not limit the structure of the above electronic device. For example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 14 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the speech processing method in the embodiments of the present invention. By running the software programs and modules stored in the memory 14, the processor 12 performs various functional applications and data processing, that is, implements the above method. The memory 14 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 14 may further include memory located remotely from the processor 12; such remote memory may be connected to the mobile terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The transmission device 16 is used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 16 includes a network interface controller (NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 16 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a speech processing method running on the above mobile terminal. Fig. 2 is a flowchart of the speech processing method according to an embodiment of the present invention. As shown in Fig. 2, the flow includes the following steps:
Step S202: obtain voice call information of an ongoing voice call;
Step S204: obtain user facial feature information corresponding to the voice call information;
Step S206: when the voice call information cannot be recognized, simulate backup voice call information for the voice call information according to the obtained user facial feature information.
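Steps S202 to S206 amount to a per-frame decision. A minimal sketch, assuming "cannot be recognized" is approximated by an estimated signal-to-noise ratio below a threshold (the patent does not specify the criterion), with the noise power supplied by the caller and the facial-feature-based simulation passed in as a hypothetical `simulate_backup` callable. The 5 dB default threshold is an assumption.

```python
import math
from typing import Any, Callable, List, Sequence


def estimated_snr_db(audio: Sequence[float], noise_power: float) -> float:
    """Crude SNR estimate: (frame power - noise power) / noise power, in dB."""
    if noise_power <= 0:
        return math.inf
    power = sum(x * x for x in audio) / len(audio) if audio else 0.0
    return 10 * math.log10(max(power - noise_power, 1e-12) / noise_power)


def process_call_frame(audio: Sequence[float], face_features: Any, noise_power: float,
                       simulate_backup: Callable[[Any], List[float]],
                       threshold_db: float = 5.0) -> List[float]:
    # S202: `audio` is the voice call information for this frame.
    # S204: `face_features` was captured over the same interval.
    if estimated_snr_db(audio, noise_power) >= threshold_db:
        return list(audio)                 # recognizable: pass through unchanged
    return simulate_backup(face_features)  # S206: fall back to simulated backup voice


def silent_backup(features: Any) -> List[float]:
    return [0.0] * 8  # placeholder for the facial-feature-based simulation


frame = [1.0, -1.0] * 4  # power 1.0
print(process_call_frame(frame, {}, 0.01, silent_backup) == frame)  # True (about 20 dB)
print(process_call_frame(frame, {}, 0.9, silent_backup) == frame)   # False (about -9.5 dB)
```

Passing the simulation in as a callable keeps the decision logic independent of whether the backup comes from cheek vibration or from mouth shapes.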
Through the above steps, when the voice call information cannot be recognized, backup voice call information for the user's voice call is simulated from the obtained user facial feature information. This solves the problem that related-art voice call enhancement schemes yield poor speech intelligibility when background noise is severe, and improves voice call quality.
Optionally, the user facial feature information may be of several kinds, for example cheek vibration information and call mouth-shape information.
Optionally, when the facial feature information includes cheek vibration information, the facial feature information corresponding to the voice call information can be obtained as follows: obtain motion information of the front shell of the terminal device used for the voice call, and determine from the obtained motion information the cheek vibration information produced by the user's cheek during the call. Determining the cheek vibration information thereby also determines the facial feature information.
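One plausible way to turn front-shell motion information into cheek vibration information is to remove the static component and measure per-frame energy. The sketch below assumes the motion information arrives as acceleration samples along the axis normal to the front shell; that sensor model, the framing and the RMS measure are assumptions rather than details from the description.

```python
import math
from typing import List, Sequence


def cheek_vibration_envelope(motion: Sequence[float], frame_len: int) -> List[float]:
    """Per-frame vibration strength from front-shell motion samples.

    Assumes `motion` is a 1-D sequence of acceleration samples (hypothetical
    sensor setup). The mean is subtracted to drop the static (gravity /
    hand-hold) component, and the RMS of each non-overlapping frame is returned.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    mean = sum(motion) / len(motion) if motion else 0.0
    centered = [x - mean for x in motion]
    envelope = []
    # Only complete frames are measured; a trailing partial frame is dropped.
    for start in range(0, len(centered) - frame_len + 1, frame_len):
        frame = centered[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope
```

A real implementation might also band-pass filter the motion signal to the voice band before measuring energy, so that hand movement is not mistaken for speech vibration.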
Optionally, after the cheek vibration information is determined, it can be further processed as follows: convert the obtained cheek vibration information into corresponding voltage information; then pass the converted voltage information through analog/digital (A/D) conversion to generate a message code corresponding to the cheek vibration information.
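The voltage-to-message-code step is ordinary A/D quantization. A minimal uniform quantizer is sketched below; the bipolar input range `v_ref`, the 8-bit default and the function name are assumptions, since the description gives no converter parameters.

```python
from typing import List, Sequence


def adc_message_code(voltages: Sequence[float], v_ref: float, bits: int = 8) -> List[int]:
    """Uniform A/D quantization of transducer voltages into integer codes.

    Assumes a hypothetical bipolar transducer output in [-v_ref, +v_ref];
    values outside the range are clipped. Each sample maps to one of 2**bits
    levels, and the resulting integer sequence plays the role of the
    cheek-vibration "message code".
    """
    levels = 2 ** bits
    codes = []
    for v in voltages:
        v = max(-v_ref, min(v_ref, v))          # clip to the converter's input range
        normalized = (v + v_ref) / (2 * v_ref)  # map to [0, 1]
        codes.append(min(levels - 1, int(normalized * levels)))
    return codes
```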
Optionally, simulating the backup voice call information from the obtained facial feature information may include: determining, from the obtained cheek vibration information, meaning information that characterizes the content of the conversation and intensity information that characterizes how strongly the cheek vibrates; and simulating the backup voice call information according to the determined meaning information and intensity information.
Optionally, when the facial feature information includes mouth-shape information, obtaining the facial feature information corresponding to the voice call information may include: capturing, through the image acquisition device of the terminal device used for the voice call, mouth-shape information describing the shape of the user's mouth while speaking during the call.
Optionally, simulating the backup voice call information from the obtained facial feature information may include: converting the obtained mouth-shape information into corresponding first voice information; and determining the backup voice call information according to the first voice information.
Optionally, after the backup voice call information has been simulated from the obtained facial feature information, its frequency and/or timbre can be modulated according to the user's voiceprint feature information for this voice call stored in a voiceprint feature library, yielding voice modulation information corresponding to the backup voice call information. Modulating the backup voice in this way produces call audio that matches the caller's own voice characteristics and improves the user experience.
Optionally, after the backup voice call information has been simulated from the obtained facial feature information, a preset background sound effect can be mixed with it to generate mixed-audio information.
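Mixing a preset background sound effect with the backup voice amounts to a weighted sample-wise sum. The sketch assumes float samples in [-1.0, 1.0], loops a short effect clip, and uses a hypothetical `bg_gain` attenuation; none of these values come from the description.

```python
from typing import List, Sequence


def mix_background(voice: Sequence[float], background: Sequence[float],
                   bg_gain: float = 0.3) -> List[float]:
    """Mix a preset background sound effect under the backup voice.

    The background is looped if it is shorter than the voice, attenuated by
    `bg_gain` (an assumed default), added sample by sample, and hard-clipped
    to the valid range [-1.0, 1.0].
    """
    if not background:
        return list(voice)
    mixed = []
    for i, v in enumerate(voice):
        s = v + bg_gain * background[i % len(background)]
        mixed.append(max(-1.0, min(1.0, s)))
    return mixed
```

Hard clipping is the simplest overflow guard; lowering the voice gain before mixing would avoid the audible distortion clipping introduces.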
Optionally, after the backup voice call information has been simulated from the obtained facial feature information, second voice information obtained by converting input text can be inserted at a preset position of the backup voice call information to generate third voice information, meeting different user requirements.
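Generating the third voice information by insertion can be sketched as a splice at a sample offset. The text-to-speech conversion itself is outside the sketch, and the assumption that both signals share one sample rate is ours; the function name is hypothetical.

```python
from typing import List, Sequence


def insert_tts_segment(backup: Sequence[float], tts: Sequence[float],
                       position: int) -> List[float]:
    """Splice text-converted speech (second voice information) into the backup
    voice at a sample offset, yielding the third voice information.

    `tts` is assumed to be already-synthesized samples at the same sample rate
    as `backup`.
    """
    if not 0 <= position <= len(backup):
        raise ValueError("position outside the backup voice")
    return list(backup[:position]) + list(tts) + list(backup[position:])
```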
Optionally, after step S206 is executed, the backup voice call information can be sent to the terminal device receiving the voice call, or it can be played locally. Which operation is performed is configured according to the user's actual needs; for example, the simulation may be done at the sending end and the playback at the receiving end, or both the simulation and the playback may be done at the receiving end.
Optionally, playing the backup voice call information on the terminal device includes: determining, from the backup voice call information, control information for vibrating the front shell of the terminal device, and vibrating the front shell according to the determined control information. Converting the voice information into control information that drives front-shell vibration lets the user hear the voice clearly even when ambient noise is loud.
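Determining front-shell control information from the backup voice can be as simple as scaling each sample to an actuator drive level. The sketch below assumes a hypothetical vibration driver that accepts signed integer levels in [-max_level, max_level]; the description does not specify the actuator interface.

```python
from typing import List, Sequence


def shell_drive_levels(voice: Sequence[float], max_level: int = 255) -> List[int]:
    """Map backup-voice samples to front-shell actuator drive commands.

    Samples are assumed to lie in [-1.0, 1.0]; out-of-range values are clipped
    before being scaled to the (hypothetical) driver's signed integer range.
    """
    levels = []
    for s in voice:
        s = max(-1.0, min(1.0, s))
        levels.append(round(s * max_level))
    return levels
```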
Based on the above embodiment and preferred implementations, and to illustrate how the whole scheme works end to end, this preferred embodiment provides a speech processing method. Its flow is described below separately for the case where the facial feature is cheek vibration and the case where it is the mouth shape. Note that the mobile terminal in this method is described taking a mobile phone as an example.
First, the scheme that recognizes cheek vibration to check and correct the speech of a voice call is described.
Because a mobile phone user making a voice call (or in a similar scenario) keeps the phone in close contact with the cheek, this preferred embodiment provides a terminal voice communication assistance system. While the speaker talks, the system simultaneously recognizes the speaker's cheek vibration and uses it to check and correct the speech of the call, so that the speech is identified accurately and the speech synthesized from this information and then sent becomes accurate and clear. Background noise can be eliminated at the same time, achieving digital noise reduction.
The speaker's cheek vibration during the call is usually conducted through the front shell of the mobile phone, which is typically a glass plate. The vibration is further conducted into a kinetic-energy-to-voltage conversion module, which converts it into a voltage; the voltage then undergoes analog/digital (A/D) conversion into a digital code. From this, an information correction code for the cheek vibration is generated (that is, a code used to correct the voice information of the voice call).
The terminal voice communication assistance system of this preferred embodiment can also use a local voiceprint feature library generated for the speaker in a low-ambient-noise environment, in which the frequency curve characteristics of the speaker's voice have been identified and stored. The library is used during speech synthesis so that the synthesized speech that is finally sent retains the speaker's timbre.
The speech processing method of this preferred embodiment is described in detail below.
Fig. 3 is a schematic diagram of how the voice communication assistance system is used according to a preferred embodiment of the present invention. As shown in Fig. 3, after the call starts, if ambient noise is relatively loud, the originating caller (caller A) turns on the voice communication assistance system. When the originator's speech cannot be recognized, the speaker's cheek vibration is recognized and a corresponding message code (the cheek-vibration message code stream) is generated; caller A's voice call collected at the same time also yields a corresponding message code (the voice-call message code stream). The two message code streams are compared to ensure that the information expressed by the call speech in the noisy environment is basically consistent with the information expressed by the cheek vibration.
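The description requires only that the two message code streams be "basically consistent" and does not define a metric. As one possible choice, the sketch scores similarity with Python's `difflib.SequenceMatcher.ratio()`, which returns 2*M/T for M matched elements over T total elements in both sequences, and compares the score with an assumed threshold.

```python
from difflib import SequenceMatcher
from typing import Sequence


def streams_consistent(vibration_codes: Sequence[int], voice_codes: Sequence[int],
                       threshold: float = 0.8) -> bool:
    """Decide whether two message code streams are "basically consistent".

    Assumes both streams are sequences of comparable symbol codes; the
    similarity measure and the default threshold are illustrative choices.
    """
    ratio = SequenceMatcher(None, list(vibration_codes), list(voice_codes)).ratio()
    return ratio >= threshold
```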
If the two are determined to be consistent, the current message code streams (the cheek-vibration stream and the voice-call stream) are synthesized into speech. During synthesis, caller A's local voice feature library is used to modulate the frequency and timbre of the synthesized speech so that it keeps caller A's timbre. The speech synthesized at this point is relatively clean digital speech free of ambient noise, which achieves digital noise reduction.
The local voice feature library of caller A described in Fig. 3, i.e., caller A's voiceprint library, may be generated in real time during the current call, or entered in advance, for example at power-on, or taken from the feature information library of the power-on voiceprint password.
Fig. 4 is a schematic diagram of cheek vibration pickup and conversion according to a preferred embodiment of the present invention. In general, sound vibration propagates better in a glass plate than in air. As shown in Fig. 4, the speaker's cheek vibration during the call is usually conducted through the front-shell glass plate of the phone; the vibration is further conducted into the kinetic-energy-to-voltage conversion module, which converts it into a voltage; A/D conversion then turns the voltage into a digital code; finally, the cheek-vibration message code is generated for later use in correction.
Fig. 5 is a schematic diagram of generating the speech synthesis code according to a preferred embodiment of the present invention. As shown in Fig. 5, the real-time speech and the cheek vibration state produced while caller A speaks are recognized separately; the cheek vibration signal is generally picked up with the phone's front-shell glass and a kinetic-energy-to-voltage conversion component. After pickup, conversion, recognition and similar processing, each of the two recognized states is characterized by two kinds of features: a meaning feature, from which a meaning feature code is generated; and a feature reflecting the loudness of the sound and the strength of the cheek vibration, from which a loudness feature code is generated. The pair of feature codes generated for each state (real-time speech and cheek vibration) is called the message code of that state. The two message codes are then compared; if they are consistent, they can be used for speech synthesis.
The two features above represent two dimensions, for example with M grades and N grades respectively, forming an M × N matrix; encoding uses a unified two-dimensional code. The values of M and N can be chosen by weighing factors such as the required speech quality grade and processing speed.
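A row-major index is one straightforward realization of unified two-dimensional coding over the M × N matrix. The layout below is an assumption, since the description does not fix how cells map to code values; grades are 0-based and the function names are hypothetical.

```python
from typing import Tuple


def encode_2d(meaning_grade: int, loudness_grade: int, m: int, n: int) -> int:
    """Unified code for one cell of the M x N (meaning x loudness) matrix.

    meaning_grade is in [0, m) and loudness_grade is in [0, n).
    """
    if not (0 <= meaning_grade < m and 0 <= loudness_grade < n):
        raise ValueError("grade out of range")
    return meaning_grade * n + loudness_grade


def decode_2d(code: int, m: int, n: int) -> Tuple[int, int]:
    """Inverse of encode_2d: recover (meaning_grade, loudness_grade)."""
    if not 0 <= code < m * n:
        raise ValueError("code out of range")
    return divmod(code, n)
```

Each code needs ⌈log2(M·N)⌉ bits, which makes the trade-off above concrete: for example, M = 16 and N = 8 give 128 cells and 7-bit codes.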
Fig. 6 is a schematic diagram of how the voiceprint voice feature library is generated according to a preferred embodiment of the present invention. As shown in Fig. 6, during voiceprint enrollment, the library generation module of the voice communication assistance system records caller A's voice, extracts the frequency and timbre feature information of caller A's voice, and generates a sound frequency characteristic curve model specific to that user. During the speech synthesis described above, this frequency characteristic curve is used to modulate the generated speech, so that the final speech matches the user's own timbre.
To make the extracted feature information accurate and complete, the generation module collects the owner's voice over multiple iterations, compares it with the voiceprint already in the feature library, and progressively refines the speaker's voiceprint feature library. Generally, the number and duration of the iterations can be determined by the required quality of the generated speech.
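The iterative refinement of the voiceprint library can be sketched as an incremental mean over enrollment samples that stops once successive versions barely change. Representing the voiceprint as a fixed-length feature vector (for instance, energies of the frequency characteristic curve at fixed bands) and the Euclidean stopping rule with tolerance `tol` are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def refine_voiceprint(library: Optional[List[float]], sample: Sequence[float],
                      count: int) -> List[float]:
    """Fold one new enrollment feature vector into the library (incremental mean).

    `count` is the number of samples already averaged into `library`.
    """
    if library is None or count == 0:
        return list(sample)
    return [old + (new - old) / (count + 1) for old, new in zip(library, sample)]


def change(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between successive library versions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enroll(samples: Sequence[Sequence[float]], tol: float) -> Tuple[Optional[List[float]], int]:
    """Iterate until an update moves the library by less than `tol`."""
    library, count = None, 0
    for sample in samples:
        updated = refine_voiceprint(library, sample, count)
        count += 1
        if library is not None and change(library, updated) < tol:
            return updated, count
        library = updated
    return library, count
```

A smaller `tol` demands more enrollment iterations, mirroring the description's point that iteration length depends on the required voice quality.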
The voiceprint voice feature library can be produced in a separate voiceprint recording session, generating the owner's library for use in later calls; as shown in Fig. 5 above, it can also be generated during a live call. A separate recording session is usually preferred, to ensure that the feature library has low noise.
Fig. 7 is a schematic diagram of the working process of adding a background sound effect according to a preferred embodiment of the present invention. As shown in Fig. 7, a background sound effect is added in the cheek vibration recognition system of the voice communication assistance system. Subtracting the output synthesized clean speech from the earlier noisy speech extracts the current environmental background noise; after its loudness is reduced, this background sound is mixed with the generated clean speech, and the final output is clear call speech that matches the current environment, which is then sent to the receiver (caller B's mobile phone). The background sound effect may be the current one or one of several preset background sound effects, such as seashore, square or hall.
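The Fig. 7 procedure (subtract the clean synthesized speech from the noisy speech, attenuate the residual, mix it back) can be sketched sample by sample. The sketch assumes the two signals are already time-aligned and level-matched, which a real system would have to ensure first; `bg_gain` is a hypothetical attenuation factor.

```python
from typing import List, Sequence


def environment_matched_voice(noisy: Sequence[float], clean: Sequence[float],
                              bg_gain: float = 0.2) -> List[float]:
    """Estimate the ambient background as (noisy - clean), attenuate it, and
    mix it back under the clean synthesized speech.

    Samples are assumed to be floats in [-1.0, 1.0]; the output is clipped.
    """
    out = []
    for x, s in zip(noisy, clean):
        background = x - s              # residual treated as ambient noise
        y = s + bg_gain * background    # attenuated background under clean speech
        out.append(max(-1.0, min(1.0, y)))
    return out
```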
Fig. 8 is a schematic diagram of text-to-speech communication according to a preferred embodiment of the present invention. As shown in Fig. 8, during a voice call, when ambient noise at the originating end is loud, the originating user can also enter text at the sending end, where it is synthesized into speech. During synthesis, the speaker's local voice feature library is used to modulate the frequency and timbre of the synthesized speech so that it keeps the speaker's timbre.
With the terminal voice communication assistance system of this preferred embodiment, when a voice call is made on a mobile terminal under severe background noise, so that speech intelligibility at the receiving end is very poor, a terminal using the system recognizes the speaker's cheek vibration while speaking, checks and corrects the speaker's sound information, and outputs synthesized speech. The speech fragments missing from the communication are thereby supplemented, allowing the communication to proceed normally.
The system of this preferred embodiment further uses the local speaker's local voiceprint feature library to modulate the frequency and timbre of the synthesized speech, so that the synthesized speech that is sent keeps the speaker's timbre.
The system of this preferred embodiment also allows speech to be synthesized from text at the sending end during a voice call, with its frequency and timbre modulated using the local speaker's voiceprint feature library, likewise preserving the speaker's timbre.
The system of this preferred embodiment can be started and stopped manually by the caller (or another user), or automatically by the sending end, which monitors conditions and starts or stops the system on its own.
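Automatic start and stop by the sending end could be driven by an ambient-noise monitor. The sketch below is an assumption-laden example: it measures RMS level in dB relative to full scale on frames assumed to contain only ambient noise, and uses two hypothetical thresholds so the system does not toggle rapidly around a single level (hysteresis).

```python
import math
from typing import Sequence


class AutoAssist:
    """Hysteresis switch: start assistance when ambient noise is loud and stop
    it when noise subsides. Thresholds (dBFS) are illustrative assumptions."""

    def __init__(self, on_db: float = -30.0, off_db: float = -40.0):
        if off_db >= on_db:
            raise ValueError("off_db must be below on_db for hysteresis")
        self.on_db, self.off_db = on_db, off_db
        self.active = False

    @staticmethod
    def level_db(frame: Sequence[float]) -> float:
        """RMS level of a float frame in dB relative to full scale (1.0)."""
        rms = math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0
        return 20 * math.log10(rms) if rms > 0 else float("-inf")

    def update(self, noise_frame: Sequence[float]) -> bool:
        """Feed one ambient-noise frame; return whether assistance is on."""
        level = self.level_db(noise_frame)
        if not self.active and level >= self.on_db:
            self.active = True
        elif self.active and level <= self.off_db:
            self.active = False
        return self.active
```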
The system of this preferred embodiment can be used alone or combined with related noise reduction schemes to achieve a better noise reduction effect.
The system of this preferred embodiment can work in bidirectional mode: when ambient noise is loud on both sides, the callers at both ends can turn on the voice communication assistance system.
In the system of this preferred embodiment, the speaker's cheek vibration used for correction is usually conducted through the front-shell glass plate of the phone. The vibration is further conducted into the kinetic-energy-to-voltage conversion module, converted into a voltage, A/D converted into a digital code, and then turned into the cheek-vibration message code used to correct speech in the sending direction. Likewise, in the reverse process, locally received call speech can drive a voltage-to-kinetic-energy conversion module to produce vibration, which is transmitted to the front-shell glass of the phone. Because the listener (caller B) keeps the phone in close contact with the cheek, the local user in a noisy environment hears the speech by bone conduction; and since the sound does not travel through the air, the speech this user receives is not affected by local ambient noise.
In addition, the speaker's local voice feature library described in this preferred embodiment, i.e., the sender's voiceprint feature library, is generated in real time during the current call. It can also be entered in advance, including at power-on, or taken from the feature information library of the power-on voiceprint password.
The system of this preferred embodiment can also be used as a feature under normal noise conditions: the background sound effect can be the current one, one of several preset effects (such as seashore, square or hall), a music scene recorded in advance, or a scene recorded by the user, making calls more entertaining and personalized.
The sound frequency characteristic model described above can also be customized by the user; for example, users may modify their own sound frequency characteristic curve so that the voice they send during a call keeps its own character while sounding more pleasant, achieving a voice-beautifying effect.
Second, the scheme that recognizes the caller's mouth shape to check and correct the speech of a voice call is described.
Existing mouth-shape recognition on mobile terminals is mainly used to convert mouth shapes into text, or into other terminal operations such as taking a photo.
A preferred embodiment of the present invention provides a voice communication assistance system that recognizes the speaker's mouth shape while speaking, converts it into speech, and fills in the speech missing from the communication, so that the communication can proceed normally.
Fig. 9 is a schematic diagram of how the voice communication assistance system is used according to a preferred embodiment of the present invention. As shown in Fig. 9, after the call starts, if ambient noise is relatively loud, the originating caller (caller A) turns on the voice communication assistance system. When the originator's speech cannot be recognized, the originator's video recording system starts and the receiver's mouth-shape recognition system starts; the receiver's mouth-shape recognition system converts the mouth shapes into speech and replaces the originator's unclear speech segments, so the communication can continue instead of being interrupted because the receiver cannot recognize the speech.
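Replacing the originator's unclear speech segments with speech converted from mouth shapes can be sketched as a per-segment selection. The sketch assumes time-aligned segments and a per-segment recognition confidence; the confidence source, the `min_conf` threshold and the function name are hypothetical.

```python
from typing import List, Sequence


def fill_unclear_segments(voice_segments: Sequence[List[float]],
                          confidences: Sequence[float],
                          lip_segments: Sequence[List[float]],
                          min_conf: float = 0.5) -> List[float]:
    """Keep clear voice segments; replace unclear ones with mouth-shape speech.

    All three inputs are assumed to describe the same, time-aligned segments.
    """
    if not (len(voice_segments) == len(confidences) == len(lip_segments)):
        raise ValueError("inputs must describe the same segments")
    out: List[float] = []
    for voice, conf, lip in zip(voice_segments, confidences, lip_segments):
        out.extend(voice if conf >= min_conf else lip)
    return out
```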
Fig. 10 is a schematic diagram of the working process of the mouth-shape recognition system according to a preferred embodiment of the present invention. As shown in Fig. 10, the mouth-shape images may be transmitted as a video call. Alternatively, a data channel may be used for one-way synchronized transmission, with the receiving end receiving the images one-way over the data channel; the receiving end may display the picture, while the originator may show no picture and only transmit the images.
The processing of the voice signal can also be bidirectional: when ambient noise on both sides is loud, the callers at both ends can turn on the voice communication assistance system and use bidirectional mode. Likewise, the video call process described above may also be bidirectional.
Fig. 11 is a schematic diagram of generating the local voice feature library of the voice communication assistance system according to a preferred embodiment of the present invention. As shown in Fig. 11, the local voice feature library can store and identify the sound frequency features of callers from previous calls, so that the user's voice timbre can be used preferentially in the next call.
Fig. 12 is a schematic diagram of text-to-speech communication in the voice communication assistance system according to a preferred embodiment of the present invention. As shown in Fig. 12, during a voice call, when ambient noise at the originating end is loud, the originating user can also enter text at the sending end; the text is sent over a data channel and converted into speech at the receiving end, which likewise uses the timbre from the originating caller's original voice feature library described above.
Fig. 13 is a schematic diagram of inserting speech converted by mouth-shape recognition at the sending end only, according to a preferred embodiment of the present invention. As shown in Fig. 13, during a voice call, when ambient noise at the sending end is loud, the terminal voice communication assistance system is started only at the sending end: the camera system recognizes the mouth-shape images and converts them into speech with the speaker's voice characteristics, which is then inserted into the voice path and sent to the receiving end, so the voice call proceeds normally.
With the terminal voice communication assistance system of this preferred embodiment, when an audio or video call is made on a mobile terminal under severe background noise, so that speech intelligibility at the receiving end is very poor, a terminal using the system recognizes the speaker's mouth shape while speaking, converts it into speech, and fills in the missing speech fragments, allowing the communication to proceed normally.
The system of this preferred embodiment further uses the local voice feature library to store and identify the sound frequency features of callers from previous calls, so that the user's voice timbre can be used preferentially in the next call.
The system of this preferred embodiment also allows text to be entered at the sending end during a voice call; the text is sent over a data channel and converted into speech at the receiving end, which likewise uses the timbre from the originating caller's original voice feature library.
The system of this preferred embodiment can be started and stopped manually, or automatically by the sending end or the receiving end through monitoring.
In addition, the system of this preferred embodiment can perform mouth-shape image recognition at the sending end only, convert the images into speech, and insert the converted speech directly into the sending end's voice.
From the description of the above embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method described in each embodiment of the present invention.
Embodiment 2
This embodiment further provides a speech processing apparatus, which implements the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 14 is structural block diagram 1 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 14, the apparatus includes:
a first acquisition module 142, configured to obtain the voice call information of an ongoing voice call;
a second acquisition module 144, connected to the first acquisition module 142 and configured to obtain the user's facial feature information corresponding to the voice call information;
a simulation module 146, connected to the second acquisition module 144 and configured to, when the voice call information cannot be recognized, simulate backup voice call information for the voice call information from the obtained facial feature information.
Fig. 15 is structural block diagram 1 of the second acquisition module 144 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 15, the second acquisition module 144 includes:
a first acquisition unit 152, configured to, when the facial feature information includes cheek vibration information, obtain motion information of the front shell of the mobile terminal used for the voice call;
a first determining unit 154, connected to the first acquisition unit 152 and configured to determine, from the obtained front-shell motion information of the mobile terminal, the cheek vibration information produced by the user's cheek during the voice call.
Fig. 16 is structural block diagram 2 of the second acquisition module 144 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 16, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a converting unit 162, configured to convert the obtained cheek vibration information into corresponding voltage information;
a generation unit 164, connected to the converting unit 162 and configured to pass the converted voltage information through analog/digital (A/D) conversion and generate a message code corresponding to the cheek vibration information.
Fig. 17 is structural block diagram 1 of the simulation module 146 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 17, the simulation module 146 includes:
a second determining unit 172, configured to determine, from the obtained cheek vibration information, meaning information characterizing the dialog content and intensity information characterizing the strength of the cheek vibration;
a simulation unit 174, connected to the second determining unit 172 and configured to simulate the backup voice call information for the voice call information according to the determined meaning information and intensity information.
Fig. 18 is structural block diagram 3 of the second acquisition module 144 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 18, the second acquisition module 144 includes:
a second acquisition unit 182, configured to, when the facial feature information includes mouth-shape information, capture, through the image acquisition device of the mobile terminal used for the voice call, mouth-shape information describing the user's mouth while speaking during the call.
Fig. 19 is structural block diagram 2 of the simulation module 146 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 19, the simulation module 146 includes:
a conversion unit 192, configured to convert the obtained mouth-shape information into corresponding first voice information;
a third determining unit 194, connected to the conversion unit 192 and configured to determine the backup voice call information for the voice call information according to the first voice information.
Fig. 20 is structural block diagram 2 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 20, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a modulation module 202, configured to modulate the frequency and/or timbre of the backup voice call information according to the user's voiceprint feature information corresponding to the voice call in the voiceprint feature library, obtaining voice modulation information corresponding to the backup voice call information;
a mixing module 204, configured to mix a preset background sound effect with the backup voice call information to generate mixed-audio information;
a generation module 206, configured to insert second voice information, obtained by converting input text, at a preset position of the backup voice call information to generate third voice information.
Fig. 21 is structural block diagram 3 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 21, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a sending module 212, configured to send the backup voice call information to the peer device of the voice call.
Fig. 22 is structural block diagram 4 of the speech processing apparatus according to an embodiment of the present invention. As shown in Fig. 22, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a playing module 222, configured to play the backup voice call information on the mobile terminal.
Figure 23 is a structural block diagram of the playing module 222 of the voice processing apparatus according to an embodiment of the present invention. As shown in Figure 23, the playing module 222 includes:
a fourth determining unit 232, configured to determine, according to the backup voice call information, control information for controlling the front shell of the mobile terminal to vibrate;
a control unit 234, connected to the fourth determining unit 232 and configured to control the front shell of the mobile terminal to vibrate according to the determined control information.
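One plausible way for the fourth determining unit 232 to derive control information from the backup voice is to follow its loudness envelope. The Python sketch below assumes samples in [-1.0, 1.0], frames of 160 samples (20 ms at an assumed 8 kHz sample rate) and an 8-bit vibration drive level; the patent does not state how the control information is computed, so all of these are assumptions.

```python
import math
from typing import List


def vibration_control_info(samples: List[float], frame_len: int = 160,
                           max_level: int = 255) -> List[int]:
    """Derive per-frame front-shell vibration levels from the backup voice.

    Each frame's RMS amplitude (samples assumed in [-1, 1]) is scaled to an
    integer drive level in [0, max_level]; a vibration driver would then
    shake the front shell with that strength for the frame's duration.
    """
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]  # last frame may be shorter
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        levels.append(min(max_level, round(rms * max_level)))
    return levels
```

With `frame_len=4`, four full-scale samples followed by four silent samples give the levels `[255, 0]`: full vibration during the first frame and none during the second.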
It should be noted that each of the above modules may be implemented in software or in hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are distributed, in any combination, across different processors.
Embodiment 3
An embodiment of the present invention further provides a terminal device. Figure 24 is a structural block diagram of the terminal device according to an embodiment of the present invention. As shown in Figure 24, the terminal device includes any of the voice processing apparatuses 242 in the above embodiments.
Embodiment 4
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: acquiring voice call information of an ongoing voice call;
S2: acquiring user facial feature information corresponding to the voice call information;
S3: when the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the acquired user facial feature information.
Optionally, the storage medium is further configured to store program code for performing the following steps:
When the facial feature information includes cheek vibration information, acquiring the user facial feature information corresponding to the voice call information includes:
S1: acquiring motion information of the front shell of the terminal device conducting the voice call;
S2: determining, according to the acquired motion information of the front shell of the terminal device, cheek vibration information describing the vibration of the user's cheek during the voice call.
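A minimal way to separate cheek vibration from the rest of the front-shell motion is to remove the slowly varying part of the signal. The Python sketch below assumes the motion information is a list of acceleration samples along one axis and uses moving-average subtraction as a crude high-pass filter; this filtering technique and the window length are assumptions, not taken from the patent.

```python
from typing import List


def cheek_vibration_from_motion(accel: List[float], window: int = 5) -> List[float]:
    """Estimate cheek vibration from front-shell acceleration samples.

    Slow components (gravity, hand movement) are estimated with a trailing
    moving average and subtracted, leaving the fast oscillation attributed
    to cheek vibration during speech (a crude high-pass filter).
    """
    vibration = []
    for i, a in enumerate(accel):
        start = max(0, i - window + 1)
        baseline = sum(accel[start:i + 1]) / (i + 1 - start)
        vibration.append(a - baseline)
    return vibration
```

A constant reading (for example, gravity alone) produces an output of essentially zero, while an alternating signal passes through largely intact. A production design would more likely use a properly designed band-pass filter tuned to the speech frequency range.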
Optionally, the storage medium is further configured to store program code for performing the following steps:
After the cheek vibration information describing the vibration of the user's cheek during the voice call is determined according to the acquired motion information of the front shell of the terminal device, the steps further include:
S1: converting the acquired cheek vibration information into voltage information corresponding to the cheek vibration information;
S2: subjecting the converted voltage information to analog-to-digital (A/D) conversion to generate a message code corresponding to the cheek vibration information.
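The voltage conversion and A/D steps can be modeled numerically. The Python sketch below assumes a linear transducer centered on a 1.65 V mid-rail offset, an 8-bit converter with a 3.3 V reference, and a message code that is simply the byte sequence of the samples. All of these parameters and names are illustrative assumptions; the patent does not specify the sensor, the converter resolution or the code format.

```python
from typing import List


def vibration_to_voltage(vibration: List[float], sensitivity: float = 0.5,
                         offset: float = 1.65) -> List[float]:
    """Model a transducer that maps vibration amplitude to a voltage around
    a mid-rail offset (sensitivity in volts per unit of vibration)."""
    return [offset + sensitivity * v for v in vibration]


def adc_encode(voltages: List[float], v_ref: float = 3.3, bits: int = 8) -> List[int]:
    """Quantize voltages in [0, v_ref] to unsigned n-bit codes (A/D conversion)."""
    max_code = (1 << bits) - 1
    codes = []
    for v in voltages:
        v = min(max(v, 0.0), v_ref)  # clamp to the converter's input range
        codes.append(round(v / v_ref * max_code))
    return codes


def to_message_code(codes: List[int]) -> bytes:
    """Pack 8-bit sample codes into a message code (one byte per sample)."""
    if any(c < 0 or c > 255 for c in codes):
        raise ValueError("codes must fit in 8 bits")
    return bytes(codes)
```

A usage chain would be `to_message_code(adc_encode(vibration_to_voltage(vibration)))`. Voltages outside the reference range saturate at the lowest or highest code.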
Optionally, the storage medium is further configured to store program code for performing the following steps:
Simulating the backup voice call information of the voice call information according to the acquired user facial feature information includes:
S1: determining, according to the acquired cheek vibration information, meaning information that characterizes the conversation content and vibration intensity information that characterizes the strength of the cheek vibration;
S2: simulating the backup voice call information of the voice call information according to the determined meaning information and vibration intensity information.
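The intensity half of this step can be sketched directly. The following Python sketch assumes that the meaning information (what was said) has already been turned into a synthesized waveform elsewhere, for example by a hypothetical classifier followed by a speech synthesizer, and shows only how the measured vibration strength might set the loudness of that waveform. The RMS thresholds and gain values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical RMS thresholds separating weak / medium / strong vibration.
THRESHOLDS: Tuple[float, float] = (0.1, 0.4)
# Hypothetical loudness gain applied for each intensity class.
GAINS: Dict[str, float] = {"weak": 0.5, "medium": 0.8, "strong": 1.0}


def vibration_intensity(vibration: List[float]) -> str:
    """Classify the cheek-vibration strength from its RMS amplitude."""
    if not vibration:
        return "weak"
    rms = math.sqrt(sum(v * v for v in vibration) / len(vibration))
    if rms < THRESHOLDS[0]:
        return "weak"
    if rms < THRESHOLDS[1]:
        return "medium"
    return "strong"


def simulate_backup_voice(synthesized: List[float], intensity: str) -> List[float]:
    """Scale a waveform synthesized from the meaning information so that its
    loudness follows the measured vibration intensity."""
    gain = GAINS[intensity]
    return [gain * s for s in synthesized]
```

Using three coarse classes rather than a continuous gain keeps the output loudness stable against small fluctuations in the vibration signal. A continuous mapping would also be a reasonable choice.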
Optionally, the storage medium is further configured to store program code for performing the following steps:
When the facial feature information includes call mouth-shape information, acquiring the user facial feature information corresponding to the voice call information includes:
acquiring, through an image acquisition device of the terminal device used during the voice call, call mouth-shape information describing the user's mouth shape while speaking during the voice call.
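Once lip landmarks have been located in a camera frame, a common compact descriptor of the mouth shape is the ratio of mouth opening height to mouth width. The Python sketch below assumes four landmarks (left and right mouth corners, upper and lower lip midpoints) are already available from some face-landmark detector, which is not shown. The classification thresholds and labels are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a lip landmark


def mouth_aspect_ratio(left: Point, right: Point, top: Point, bottom: Point) -> float:
    """Ratio of mouth-opening height to mouth width. Values near 0 mean
    closed lips; larger values mean a wider opening. Because it is a ratio,
    it does not depend on how far the face is from the camera."""
    width = math.dist(left, right)
    height = math.dist(top, bottom)
    return height / width if width > 0 else 0.0


def classify_mouth_shape(mar: float) -> str:
    """Map the aspect ratio to a coarse mouth-shape label (hypothetical thresholds)."""
    if mar < 0.1:
        return "closed"
    if mar < 0.4:
        return "half_open"
    return "open_wide"
```

For a mouth 4 pixels wide and 2 pixels open, the ratio is 0.5, which this sketch labels `"open_wide"`. A sequence of such labels, one per video frame, forms the call mouth-shape information.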
Optionally, the storage medium is further configured to store program code for performing the following steps:
Simulating the backup voice call information of the voice call information according to the acquired user facial feature information includes:
S1: converting the acquired call mouth-shape information into first voice information corresponding to the call mouth-shape information;
S2: determining the backup voice call information of the voice call information according to the first voice information.
Optionally, the storage medium is further configured to store program code for performing the following steps:
After the backup voice call information of the voice call information is simulated according to the acquired user facial feature information, the steps further include at least one of the following:
S1: modulating the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information, stored in the voiceprint feature library, that corresponds to the voice call, to obtain voice modulation information corresponding to the backup voice call information;
S2: mixing a preset background sound effect with the backup voice call information to generate mixed audio information;
S3: inserting second voice information, obtained by converting input text information, into a preset position of the backup voice call information to generate third voice information.
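For the frequency part of step S1, the simplest illustration is to move the pitch of the backup voice from its current fundamental frequency to the user's fundamental frequency, assumed here to be stored in the voiceprint feature library. The Python sketch below does this by naive resampling with linear interpolation. This technique changes the duration along with the pitch, so a real implementation would more likely use a duration-preserving method such as PSOLA or a phase vocoder. Timbre modulation is not covered, and `source_f0` and `target_f0` are assumed inputs.

```python
from typing import List


def shift_pitch_by_resampling(samples: List[float], source_f0: float,
                              target_f0: float) -> List[float]:
    """Scale the pitch of `samples` by target_f0 / source_f0 by reading the
    signal at a different rate (linear interpolation). Played back at the
    original sample rate, every frequency is multiplied by the ratio, but
    the duration is divided by it."""
    ratio = target_f0 / source_f0
    out = []
    pos = 0.0
    while pos <= len(samples) - 1:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1 - frac) + nxt * frac)  # linear interpolation
        pos += ratio
    return out
```

Doubling the fundamental frequency (for example, from 100 Hz to 200 Hz) keeps every second sample, which halves the length. Halving it interpolates a new sample between each original pair.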
Optionally, the storage medium is further configured to store program code for performing the following steps:
After the backup voice call information of the voice call information is simulated according to the acquired user facial feature information, the steps further include:
sending the backup voice call information to the terminal device receiving the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps:
After the backup voice call information of the voice call information is simulated according to the acquired user facial feature information, the steps further include:
playing the backup voice call information.
Optionally, the storage medium is further configured to store program code for performing the following steps:
Playing the backup voice call information includes:
S1: determining, according to the backup voice call information, control information for controlling the front shell of the terminal device to vibrate;
S2: controlling the front shell of the terminal device to vibrate according to the determined control information.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: acquiring voice call information of an ongoing voice call; acquiring user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information of the voice call information according to the acquired user facial feature information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: when the facial feature information includes cheek vibration information, acquiring the user facial feature information corresponding to the voice call information includes: acquiring motion information of the front shell of the terminal device conducting the voice call; and determining, according to the acquired motion information of the front shell of the terminal device, cheek vibration information describing the vibration of the user's cheek during the voice call.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after determining, according to the acquired motion information of the front shell of the terminal device, the cheek vibration information describing the vibration of the user's cheek during the voice call, further converting the acquired cheek vibration information into voltage information corresponding to the cheek vibration information, and subjecting the converted voltage information to analog-to-digital (A/D) conversion to generate a message code corresponding to the cheek vibration information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: simulating the backup voice call information of the voice call information according to the acquired user facial feature information includes: determining, according to the acquired cheek vibration information, meaning information that characterizes the conversation content and vibration intensity information that characterizes the strength of the cheek vibration; and simulating the backup voice call information of the voice call information according to the determined meaning information and vibration intensity information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: when the facial feature information includes call mouth-shape information, acquiring the user facial feature information corresponding to the voice call information includes: acquiring, through the image acquisition device of the terminal device used during the voice call, call mouth-shape information describing the user's mouth shape while speaking during the voice call.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: simulating the backup voice call information of the voice call information according to the acquired user facial feature information includes: converting the acquired call mouth-shape information into first voice information corresponding to the call mouth-shape information; and determining the backup voice call information of the voice call information according to the first voice information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after simulating the backup voice call information of the voice call information according to the acquired user facial feature information, further performing at least one of the following: modulating the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information, stored in the voiceprint feature library, that corresponds to the voice call, to obtain voice modulation information corresponding to the backup voice call information; mixing a preset background sound effect with the backup voice call information to generate mixed audio information; and inserting second voice information, obtained by converting input text information, into a preset position of the backup voice call information to generate third voice information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after simulating the backup voice call information of the voice call information according to the acquired user facial feature information, further sending the backup voice call information to the terminal device receiving the voice call.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after simulating the backup voice call information of the voice call information according to the acquired user facial feature information, further playing the backup voice call information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: playing the backup voice call information on the terminal device includes: determining, according to the backup voice call information, control information for controlling the front shell of the terminal device to vibrate; and controlling the front shell of the terminal device to vibrate according to the determined control information.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here. Alternatively, they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (21)
- 1. A voice processing method, characterized by comprising: acquiring voice call information of an ongoing voice call; acquiring user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information of the voice call information according to the acquired user facial feature information.
- 2. The method according to claim 1, characterized in that, when the facial feature information includes cheek vibration information, acquiring the user facial feature information corresponding to the voice call information comprises: acquiring motion information of a front shell of a terminal device conducting the voice call; and determining, according to the acquired motion information of the front shell of the terminal device, cheek vibration information describing the vibration of the user's cheek during the voice call.
- 3. The method according to claim 2, characterized in that, after determining, according to the acquired motion information of the front shell of the terminal device, the cheek vibration information describing the vibration of the user's cheek during the voice call, the method further comprises: converting the acquired cheek vibration information into voltage information corresponding to the cheek vibration information; and subjecting the converted voltage information to analog-to-digital (A/D) conversion to generate a message code corresponding to the cheek vibration information.
- 4. The method according to claim 2, characterized in that simulating the backup voice call information of the voice call information according to the acquired user facial feature information comprises: determining, according to the acquired cheek vibration information, meaning information for characterizing conversation content and vibration intensity information for characterizing the strength of the cheek vibration; and simulating the backup voice call information of the voice call information according to the determined meaning information and the vibration intensity information.
- 5. The method according to claim 1, characterized in that, when the facial feature information includes call mouth-shape information, acquiring the user facial feature information corresponding to the voice call information comprises: acquiring, through an image acquisition device of the terminal device used during the voice call, call mouth-shape information describing the user's mouth shape while speaking during the voice call.
- 6. The method according to claim 5, characterized in that simulating the backup voice call information of the voice call information according to the acquired user facial feature information comprises: converting the acquired call mouth-shape information into first voice information corresponding to the call mouth-shape information; and determining the backup voice call information of the voice call information according to the first voice information.
- 7. The method according to claim 1, characterized in that, after simulating the backup voice call information of the voice call information according to the acquired user facial feature information, the method further comprises at least one of the following: modulating the frequency and/or timbre of the backup voice call information according to user voiceprint feature information, stored in a voiceprint feature library, that corresponds to the voice call, to obtain voice modulation information corresponding to the backup voice call information; mixing a preset background sound effect with the backup voice call information to generate mixed audio information; and inserting second voice information, obtained by converting input text information, into a preset position of the backup voice call information to generate third voice information.
- 8. The method according to any one of claims 1 to 7, characterized in that, after simulating the backup voice call information of the voice call information according to the acquired user facial feature information, the method further comprises: sending the backup voice call information to a terminal device receiving the voice call.
- 9. The method according to claim 1, characterized in that, after simulating the backup voice call information of the voice call information according to the acquired user facial feature information, the method further comprises: playing the backup voice call information.
- 10. The method according to claim 9, characterized in that playing the backup voice call information comprises: determining, according to the backup voice call information, control information for controlling a front shell of a terminal device to vibrate; and controlling the front shell of the terminal device to vibrate according to the determined control information.
- 11. A voice processing apparatus, characterized by comprising: a first acquisition module, configured to acquire voice call information of an ongoing voice call; a second acquisition module, configured to acquire user facial feature information corresponding to the voice call information; and an analog module, configured to, when the voice call information cannot be recognized, simulate backup voice call information of the voice call information according to the acquired user facial feature information.
- 12. The apparatus according to claim 11, characterized in that the second acquisition module comprises: a first acquisition unit, configured to, when the facial feature information includes cheek vibration information, acquire motion information of a front shell of a mobile terminal conducting the voice call; and a first determining unit, configured to determine, according to the acquired motion information of the front shell of the mobile terminal, cheek vibration information describing the vibration of the user's cheek during the voice call.
- 13. The apparatus according to claim 12, characterized in that the second acquisition module further comprises: a converting unit, configured to convert the acquired cheek vibration information into voltage information corresponding to the cheek vibration information; and a generation unit, configured to subject the converted voltage information to analog-to-digital (A/D) conversion to generate a message code corresponding to the cheek vibration information.
- 14. The apparatus according to claim 12, characterized in that the analog module comprises: a second determining unit, configured to determine, according to the acquired cheek vibration information, meaning information for characterizing conversation content and vibration intensity information for characterizing the strength of the cheek vibration; and an analog unit, configured to simulate the backup voice call information of the voice call information according to the determined meaning information and the vibration intensity information.
- 15. The apparatus according to claim 11, characterized in that the second acquisition module comprises: a second acquisition unit, configured to, when the facial feature information includes call mouth-shape information, acquire, through an image acquisition device of the mobile terminal used during the voice call, call mouth-shape information describing the user's mouth shape while speaking during the voice call.
- 16. The apparatus according to claim 15, characterized in that the analog module comprises: a conversion unit, configured to convert the acquired call mouth-shape information into first voice information corresponding to the call mouth-shape information; and a third determining unit, configured to determine the backup voice call information of the voice call information according to the first voice information.
- 17. The apparatus according to claim 11, characterized by further comprising at least one of the following: a modulation module, configured to modulate the frequency and/or timbre of the backup voice call information according to user voiceprint feature information, stored in a voiceprint feature library, that corresponds to the voice call, to obtain voice modulation information corresponding to the backup voice call information; a mix module, configured to mix a preset background sound effect with the backup voice call information to generate mixed audio information; and a generation module, configured to insert second voice information, obtained by converting input text information, into a preset position of the backup voice call information to generate third voice information.
- 18. The apparatus according to any one of claims 11 to 17, characterized by further comprising: a sending module, configured to send the backup voice call information to a peer device of the voice call.
- 19. The apparatus according to claim 11, characterized by further comprising: a playing module, configured to play the backup voice call information.
- 20. The apparatus according to claim 19, characterized in that the playing module comprises: a fourth determining unit, configured to determine, according to the backup voice call information, control information for controlling a front shell of a mobile terminal to vibrate; and a control unit, configured to control the front shell of the mobile terminal to vibrate according to the determined control information.
- 21. A terminal device, characterized by comprising the apparatus according to any one of claims 11 to 20.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610811985.5A CN107800860A (en) | 2016-09-07 | 2016-09-07 | Method of speech processing, device and terminal device |
PCT/CN2017/071139 WO2018045703A1 (en) | 2016-09-07 | 2017-01-13 | Voice processing method, apparatus and terminal device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610811985.5A CN107800860A (en) | 2016-09-07 | 2016-09-07 | Method of speech processing, device and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107800860A (en) | 2018-03-13 |
Family ID: 61531013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610811985.5A Pending CN107800860A (en) | 2016-09-07 | 2016-09-07 | Method of speech processing, device and terminal device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107800860A (en) |
WO (1) | WO2018045703A1 (en) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102104651A (en) * | 2009-12-22 | 2011-06-22 | 康佳集团股份有限公司 | Method for playing reserved voice in incoming call reception of mobile terminal and mobile terminal |
CN201986001U (en) * | 2010-12-31 | 2011-09-21 | 上海华勤通讯技术有限公司 | Mouth shape identification input mobile terminal |
CN202329640U (en) * | 2011-08-19 | 2012-07-11 | 广东好帮手电子科技股份有限公司 | System for applying auxiliary voice recognition technology by mouth shape in vehicular navigation |
CN102324035A (en) * | 2011-08-19 | 2012-01-18 | 广东好帮手电子科技股份有限公司 | Method and system of applying lip posture assisted speech recognition technique to vehicle navigation |
CN106157956A (en) * | 2015-03-24 | 2016-11-23 | 中兴通讯股份有限公司 | The method and device of speech recognition |
CN106157957A (en) * | 2015-04-28 | 2016-11-23 | 中兴通讯股份有限公司 | Audio recognition method, device and subscriber equipment |
- 2016-09-07: CN application CN201610811985.5A (CN107800860A), active, pending
- 2017-01-13: PCT application PCT/CN2017/071139 (WO2018045703A1), active application filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108965600A (en) * | 2018-07-24 | 2018-12-07 | Oppo(重庆)智能科技有限公司 | Voice pick-up method and Related product |
CN108965600B (en) * | 2018-07-24 | 2021-05-04 | Oppo(重庆)智能科技有限公司 | Voice pickup method and related product |
CN110415698A (en) * | 2018-11-15 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Artificial intelligence data detection method and device and storage medium |
WO2020172828A1 (en) * | 2019-02-27 | 2020-09-03 | 华为技术有限公司 | Sound source separating method, apparatus and device |
CN111447325A (en) * | 2020-04-03 | 2020-07-24 | 上海闻泰电子科技有限公司 | Call auxiliary method, device, terminal and storage medium |
CN112820274A (en) * | 2021-01-08 | 2021-05-18 | 上海仙剑文化传媒股份有限公司 | Voice information recognition correction method and system |
CN112820274B (en) * | 2021-01-08 | 2021-09-28 | 上海仙剑文化传媒股份有限公司 | Voice information recognition correction method and system |
CN113345436A (en) * | 2021-08-05 | 2021-09-03 | 创维电器股份有限公司 | Remote voice recognition control system and method based on multi-system integration high recognition rate |
CN113345436B (en) * | 2021-08-05 | 2021-11-12 | 创维电器股份有限公司 | Remote voice recognition control system and method based on multi-system integration high recognition rate |
Also Published As
Publication number | Publication date |
---|---|
WO2018045703A1 (en) | 2018-03-15 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180313 |