CN107800860A - Method of speech processing, device and terminal device - Google Patents

Method of speech processing, device and terminal device

Info

Publication number
CN107800860A
Authority
CN
China
Prior art keywords
information
voice call
voice
call
acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610811985.5A
Other languages
Chinese (zh)
Inventor
罗雨来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201610811985.5A
Priority to PCT/CN2017/071139 (WO2018045703A1)
Publication of CN107800860A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a speech processing method, a speech processing device, and a terminal device. The method includes: obtaining voice call information of a voice call in progress; obtaining user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information. The invention solves the problem that voice call improvement schemes in the related art give poor voice intelligibility when background noise is severe, and thereby improves voice call quality.

Description

Method of speech processing, device and terminal device
Technical field
The present invention relates to the field of communications, and in particular to a speech processing method, a speech processing device, and a terminal device.
Background
When an audio or video call is made on a mobile terminal device and the background noise is severe, voice intelligibility is very poor, which seriously affects the call.
At present, voice call improvement techniques on mobile terminal devices mainly rely on schemes such as dual-microphone background noise reduction. When the ambient noise is loud or the noise environment is complex, dual-microphone noise reduction performs poorly, and it can also noticeably reduce the loudness of the speech heard by the receiving party, to the point where the speech is hard to make out. In addition, this scheme places high demands on device consistency, structural layout, and sealing, which raises product cost.
Therefore, voice call improvement schemes in the related art suffer from poor voice intelligibility when background noise is severe.
Summary of the invention
The embodiments of the present invention provide a speech processing method, a speech processing device, and a terminal device, to at least solve the problem that voice call improvement schemes in the related art give poor voice intelligibility when background noise is severe.
According to one embodiment of the present invention, a speech processing method is provided, including: obtaining voice call information of a voice call in progress; obtaining user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information.
Optionally, when the facial feature information includes cheek vibration information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining motion information of the front shell of the terminal device on which the voice call is made; and determining, according to the obtained motion information of the front shell of the terminal device, the cheek vibration information of the user's cheek vibrating during the voice call.
Optionally, after determining the cheek vibration information according to the obtained motion information of the front shell of the terminal device, the method further includes: converting the obtained cheek vibration information into voltage information corresponding to the cheek vibration information; and, after analog-to-digital (A/D) conversion of the voltage information, generating an information code corresponding to the cheek vibration information.
Optionally, simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: determining, according to the obtained cheek vibration information, meaning information that characterizes the call content and vibration intensity information that characterizes the strength of the cheek vibration; and simulating the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
Optionally, when the facial feature information includes call mouth shape information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining, through an image acquisition device of the terminal device used during the voice call, call mouth shape information describing the user's mouth shape during the voice call.
Optionally, simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: converting the obtained call mouth shape information into first voice information corresponding to the call mouth shape information; and determining the backup voice call information for the voice call information according to the first voice information.
Optionally, after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the method further includes at least one of the following: modulating the frequency and/or timbre of the backup voice call information according to user voiceprint feature information corresponding to the voice call in a voiceprint feature library, to obtain voice modulation information corresponding to the backup voice call information; mixing a preset background sound effect with the backup voice call information to generate mixed information; inserting second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
Optionally, after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the method further includes: sending the backup voice call information to the terminal device that receives the voice call.
Optionally, after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the method further includes: playing the backup voice call information.
Optionally, playing the backup voice call information includes: determining, according to the backup voice call information, control information for controlling vibration of the front shell of the terminal device; and controlling the front shell of the terminal device to vibrate according to the determined control information.
According to another embodiment of the present invention, a speech processing device is provided, including: a first acquisition module, configured to obtain voice call information of a voice call in progress; a second acquisition module, configured to obtain user facial feature information corresponding to the voice call information; and a simulation module, configured to simulate, when the voice call information cannot be recognized, backup voice call information for the voice call information according to the obtained user facial feature information.
Optionally, the second acquisition module includes: a first acquisition unit, configured to obtain, when the facial feature information includes cheek vibration information, motion information of the front shell of the mobile terminal on which the voice call is made; and a first determining unit, configured to determine, according to the obtained motion information of the front shell of the mobile terminal, the cheek vibration information of the user's cheek vibrating during the voice call.
Optionally, the second acquisition module further includes: a converting unit, configured to convert the obtained cheek vibration information into voltage information corresponding to the cheek vibration information; and a generation unit, configured to generate, after analog-to-digital (A/D) conversion of the voltage information, an information code corresponding to the cheek vibration information.
Optionally, the simulation module includes: a second determining unit, configured to determine, according to the obtained cheek vibration information, meaning information that characterizes the call content and vibration intensity information that characterizes the strength of the cheek vibration; and a simulation unit, configured to simulate the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
Optionally, the second acquisition module includes: a second acquisition unit, configured to obtain, when the facial feature information includes call mouth shape information, the call mouth shape information describing the user's mouth shape during the voice call through an image acquisition device of the mobile terminal used during the voice call.
Optionally, the simulation module includes: a conversion module, configured to convert the obtained call mouth shape information into first voice information corresponding to the call mouth shape information; and a third determining module, configured to determine the backup voice call information for the voice call information according to the first voice information.
Optionally, the device further includes at least one of the following: a modulation module, configured to modulate the frequency and/or timbre of the backup voice call information according to user voiceprint feature information corresponding to the voice call in a voiceprint feature library, to obtain voice modulation information corresponding to the backup voice call information; a mixing module, configured to mix a preset background sound effect with the backup voice call information to generate mixed information; a generation module, configured to insert second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
Optionally, the device further includes: a sending module, configured to send the backup voice call information to the peer device of the voice call.
Optionally, the device further includes: a playing module, configured to play the backup voice call information.
Optionally, the playing module includes: a fourth determining unit, configured to determine, according to the backup voice call information, control information for controlling vibration of the front shell of the mobile terminal; and a control unit, configured to control the front shell of the mobile terminal to vibrate according to the determined control information.
According to still another embodiment of the present invention, a terminal device is also provided. The terminal device includes the device described in any one of the foregoing.
According to still another embodiment of the present invention, a storage medium is also provided. The storage medium is configured to store program code for performing the following steps: obtaining voice call information of a voice call in progress; obtaining user facial feature information corresponding to the voice call information; and, when the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information.
Optionally, the storage medium is further configured to store program code for performing the following steps: when the facial feature information includes cheek vibration information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining motion information of the front shell of the terminal device on which the voice call is made; and determining, according to the obtained motion information of the front shell of the terminal device, the cheek vibration information of the user's cheek vibrating during the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps: after determining the cheek vibration information according to the obtained motion information of the front shell of the terminal device, converting the obtained cheek vibration information into voltage information corresponding to the cheek vibration information; and, after analog-to-digital (A/D) conversion of the voltage information, generating an information code corresponding to the cheek vibration information.
Optionally, the storage medium is further configured to store program code for performing the following steps: simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: determining, according to the obtained cheek vibration information, meaning information that characterizes the call content and vibration intensity information that characterizes the strength of the cheek vibration; and simulating the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
Optionally, the storage medium is further configured to store program code for performing the following steps: when the facial feature information includes call mouth shape information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining, through an image acquisition device of the terminal device used during the voice call, call mouth shape information describing the user's mouth shape during the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps: simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: converting the obtained call mouth shape information into first voice information corresponding to the call mouth shape information; and determining the backup voice call information for the voice call information according to the first voice information.
Optionally, the storage medium is further configured to store program code for performing the following steps: after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, performing at least one of the following: modulating the frequency and/or timbre of the backup voice call information according to user voiceprint feature information corresponding to the voice call in a voiceprint feature library, to obtain voice modulation information corresponding to the backup voice call information; mixing a preset background sound effect with the backup voice call information to generate mixed information; inserting second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
Optionally, the storage medium is further configured to store program code for performing the following steps: after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, sending the backup voice call information to the terminal device that receives the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps: after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, playing the backup voice call information.
Optionally, the storage medium is further configured to store program code for performing the following steps: playing the backup voice call information includes: determining, according to the backup voice call information, control information for controlling vibration of the front shell of the terminal device; and controlling the front shell of the terminal device to vibrate according to the determined control information.
By means of the present invention, when the voice call information cannot be recognized, backup voice call information for the user's voice call information is simulated according to the obtained user facial feature information. Because the voice call information is backed up by simulation from the user facial feature information, the problem that voice call improvement schemes in the related art give poor voice intelligibility under severe background noise is solved, and voice call quality is improved.
Brief description of the drawings
The accompanying drawings described herein are provided for further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal for a speech processing method according to an embodiment of the present invention;
Fig. 2 is a flow chart of a speech processing method according to an embodiment of the present invention;
Fig. 3 is a diagram of the use process of a voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of cheek vibration pickup and conversion according to a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of the generation of a speech synthesis code according to a preferred embodiment of the present invention;
Fig. 6 is a schematic diagram of the generation process of a voiceprint speech feature library according to a preferred embodiment of the present invention;
Fig. 7 is a schematic diagram of the working process of adding a background sound effect according to a preferred embodiment of the present invention;
Fig. 8 is a schematic diagram of text-to-speech communication according to a preferred embodiment of the present invention;
Fig. 9 is a diagram of the use process of a voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 10 is a schematic diagram of the working process of a mouth shape recognition system according to a preferred embodiment of the present invention;
Fig. 11 is a schematic diagram of the generation of the local voice feature library of a voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 12 is a schematic diagram of text-to-speech communication in a voice communication assistance system according to a preferred embodiment of the present invention;
Fig. 13 is a schematic diagram of the converted-voice insertion system of a sending-end-only mouth shape recognition system according to a preferred embodiment of the present invention;
Fig. 14 is structural block diagram 1 of a speech processing device according to an embodiment of the present invention;
Fig. 15 is structural block diagram 1 of the second acquisition module 144 of the speech processing device according to an embodiment of the present invention;
Fig. 16 is structural block diagram 2 of the second acquisition module 144 of the speech processing device according to an embodiment of the present invention;
Fig. 17 is structural block diagram 1 of the simulation module 146 of the speech processing device according to an embodiment of the present invention;
Fig. 18 is structural block diagram 3 of the second acquisition module 144 of the speech processing device according to an embodiment of the present invention;
Fig. 19 is structural block diagram 2 of the simulation module 146 of the speech processing device according to an embodiment of the present invention;
Fig. 20 is structural block diagram 2 of a speech processing device according to an embodiment of the present invention;
Fig. 21 is structural block diagram 3 of a speech processing device according to an embodiment of the present invention;
Fig. 22 is structural block diagram 4 of a speech processing device according to an embodiment of the present invention;
Fig. 23 is a structural block diagram of the playing module 222 of the speech processing device according to an embodiment of the present invention;
Fig. 24 is a structural block diagram of a terminal device according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects, and are not used to describe a specific order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of this application may be executed on a mobile terminal or a similar terminal device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal for a speech processing method according to an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 12 (only one is shown in the figure; the processor 12 may include, but is not limited to, a processing unit such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 14 for storing data, and a transmission device 16 for communication functions. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the above electronic device. For example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 14 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the speech processing method in the embodiments of the present invention. The processor 12 runs the software programs and modules stored in the memory 14 to execute various functional applications and data processing, that is, to implement the above method. The memory 14 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 14 may further include memory located remotely from the processor 12, and such remote memory may be connected to the mobile terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 16 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 16 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 16 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a speech processing method that runs on the above mobile terminal. Fig. 2 is a flow chart of the speech processing method according to an embodiment of the present invention. As shown in Fig. 2, the flow includes the following steps:
Step S202, obtain voice call information of a voice call in progress;
Step S204, obtain user facial feature information corresponding to the voice call information;
Step S206, when the voice call information cannot be recognized, simulate backup voice call information for the voice call information according to the obtained user facial feature information.
Through the above steps, when the voice call information cannot be recognized, backup voice call information for the user's voice call information is simulated according to the obtained user facial feature information. This solves the problem that voice call improvement schemes in the related art give poor voice intelligibility when background noise is severe, and improves voice call quality.
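The control flow of steps S202 to S206 can be sketched as follows. This is a minimal illustration only: the names CallFrame, process_frame, is_recognizable, and simulate_backup are hypothetical, and the source does not say how recognizability is judged or how the simulation is carried out, so both are passed in as callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CallFrame:
    """One slice of a call: audio plus the facial features captured with it."""
    audio: List[float]                       # voice call information (S202)
    facial_features: Dict[str, List[float]]  # user facial feature information (S204)


def process_frame(
    frame: CallFrame,
    is_recognizable: Callable[[List[float]], bool],
    simulate_backup: Callable[[Dict[str, List[float]]], List[float]],
) -> List[float]:
    """S206: use simulated backup speech only when the audio is unrecognizable."""
    if is_recognizable(frame.audio):
        return frame.audio
    return simulate_backup(frame.facial_features)
```

Passing the two decisions in as functions keeps the sketch neutral about the recognizer and the simulator, which the text leaves open at this point.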
Optionally, the user facial feature information may take several forms, for example cheek vibration information or call mouth shape information.
Optionally, when the facial feature information includes cheek vibration information, the user facial feature information corresponding to the voice call information may be obtained as follows: obtain motion information of the front shell of the terminal device on which the voice call is made, and determine, according to the obtained motion information, the cheek vibration information of the user's cheek vibrating during the voice call. Once the cheek vibration information is determined, the user facial feature information is determined as well.
Optionally, after the cheek vibration information is determined, the obtained cheek vibration information may be further processed as follows: convert the obtained cheek vibration information into voltage information corresponding to the cheek vibration information; then, after analog-to-digital (A/D) conversion of the voltage information, generate an information code corresponding to the cheek vibration information.
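The vibration-to-voltage and A/D steps can be illustrated with a small sketch. It assumes a linear sensor (a hypothetical sensitivity factor) and a uniform quantizer with an assumed voltage range and bit depth; the source does not specify the sensor, the range, or the code format.

```python
from typing import List


def vibration_to_voltage(displacements: List[float], sensitivity: float) -> List[float]:
    """Model the pickup as a linear sensor: voltage = sensitivity * displacement."""
    return [sensitivity * d for d in displacements]


def quantize(voltages: List[float], v_min: float = -1.0,
             v_max: float = 1.0, bits: int = 8) -> List[int]:
    """Uniform A/D conversion: clip to [v_min, v_max], then map to 0 .. 2**bits - 1."""
    levels = (1 << bits) - 1
    codes = []
    for v in voltages:
        clipped = min(max(v, v_min), v_max)
        codes.append(round((clipped - v_min) / (v_max - v_min) * levels))
    return codes
```

Voltages outside the assumed range are clipped before coding, so an unusually strong vibration saturates at the top code instead of wrapping around.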
Optionally, simulating the backup voice call information for the voice call information according to the obtained user facial feature information may include: determining, according to the obtained cheek vibration information, meaning information that characterizes the call content and vibration intensity information that characterizes the strength of the cheek vibration; and simulating the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
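One plausible way to obtain a vibration intensity value is a per-frame root-mean-square over the digitized vibration samples. The sketch below covers only that intensity part; the frame length is an assumption, and extracting meaning information is not attempted because the source does not describe how it is derived.

```python
import math
from typing import List


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """Return one RMS intensity value per complete, non-overlapping frame."""
    intensities = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        intensities.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return intensities
```

A trailing partial frame is dropped so that every intensity value is computed over the same number of samples.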
Optionally, when the facial feature information includes call mouth shape information, obtaining the user facial feature information corresponding to the voice call information may include: obtaining, through an image acquisition device of the terminal device used during the voice call, call mouth shape information describing the user's mouth shape during the voice call.
Optionally, simulating the backup voice call information for the voice call information according to the obtained user facial feature information may include: converting the obtained call mouth shape information into first voice information corresponding to the call mouth shape information; and determining the backup voice call information for the voice call information according to the first voice information.
Alternatively, converse and believe in the backup voice that voice call information is simulated according to the user's face characteristic information of acquisition After breath, backup voice can be led to according to user's vocal print feature information corresponding with voice call in voice vocal print feature storehouse The frequency and/or tone color for talking about information are modulated, and voice modulation information corresponding with backup voice call-information are obtained, by right Backup voice information is modulated, and can be generated the call-information for meeting calling user voice characteristic, be improved Consumer's Experience.
Alternatively, converse and believe in the backup voice that voice call information is simulated according to the user's face characteristic information of acquisition After breath, default background sound effect and backup voice call-information can be subjected to audio mixing, generate mixing information.
Alternatively, converse and believe in the backup voice that voice call information is simulated according to the user's face characteristic information of acquisition After breath, the second voice messaging obtained after the text information conversion of input can be inserted into the pre- of backup voice call-information If position, the 3rd voice messaging is generated, meets different user's requests.
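The insertion of text-derived speech at a preset position can be sketched as a simple splice of sample sequences. This is only an illustration: the function name `insert_speech`, the list-of-samples representation, and the clamping of the position are assumptions, not details from the patent.

```python
def insert_speech(backup, inserted, position):
    """Splice `inserted` samples into `backup` at sample index `position`.

    The position is clamped to the valid range, so an index past the end appends.
    Representing speech as a list of samples is an illustrative assumption.
    """
    position = max(0, min(position, len(backup)))
    return backup[:position] + inserted + backup[position:]
```

Here `backup` plays the role of the backup voice call information, `inserted` the second voice information, and the return value the third voice information.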
Optionally, after step S206 is performed, the backup voice call information may be sent to the terminal device receiving the voice call, or it may be played. The choice depends on the user's actual needs: for example, the simulation may be performed at the sending end and the result played at the receiving end, or both the simulation and the playback may be performed at the receiving end.
Optionally, playing the backup voice call information on the terminal device includes: determining, according to the backup voice call information, control information for controlling vibration of the front shell of the terminal device, and controlling the front shell of the terminal device to vibrate according to the determined control information. Converting the voice information into control information that drives the front-shell vibration allows the voice information to be heard clearly even when ambient noise is high.
Based on the above embodiments and preferred implementations, and to illustrate the overall flow of the scheme, this preferred embodiment provides a voice processing method. The flow of the method is described below for two facial features in turn: cheek vibration and mouth shape. Note that in this voice processing method, a mobile phone is taken as the example of the mobile terminal.
First, the scheme that checks and corrects the speech of a voice call by recognizing cheek vibration is described.
Considering that during a voice call or a similar scenario the speaker holds the mobile phone in close contact with the cheek, this preferred embodiment provides a terminal voice communication assistance system. The system recognizes the cheek vibration of the speaker in the voice communication while he or she speaks, uses the recognized cheek vibration to check and correct the speech of the voice communication, and thereby accurately recognizes the speech that was uttered. Speech synthesized from this information and then sent is accurate and clear, and background noise can be eliminated, achieving digital noise reduction.
Generally, the cheek vibration produced when the speaker talks is conducted through the front shell of the mobile phone terminal, which is usually a glass panel. The vibration is then conducted into a kinetic-energy-to-voltage conversion module, which converts kinetic energy into voltage; the voltage then undergoes analog/digital (A/D) conversion into a digital code. From this, the information correction code of the cheek vibration (that is, the code used to correct the voice information of the voice call) is generated.
The terminal voice communication assistance system provided in this preferred embodiment can also use a local voiceprint feature library generated for the speaker in a low-noise environment. The library identifies and stores the sound-frequency curve characteristics of the speaker and is used during speech synthesis, so that the finally synthesized and sent speech keeps the speaker's voice timbre.
The voice processing method of this preferred embodiment is described in detail below.
Fig. 3 is a schematic diagram of the usage flow of the voice communication assistance system according to a preferred embodiment of the present invention. As shown in Fig. 3, after a call starts, if the ambient noise is relatively high, the originating caller (caller A) turns on the voice communication assistance system. When the originating party's speech cannot be recognized, the speaker's cheek vibration is recognized and a corresponding information code is generated (the cheek-vibration code stream). The collected voice of caller A likewise generates a corresponding information code (the voice-call code stream). The two code streams are compared to ensure that the information carried by the call voice in the noisy environment is basically consistent with the information expressed by the cheek vibration.
If the two are determined to be consistent, the current code streams (the cheek-vibration code stream and the voice-call code stream) are synthesized into speech. During synthesis, caller A's local voice feature library is used to modulate the frequency and timbre of the synthesized sound so that the synthesized speech keeps caller A's timbre. The speech synthesized at this point is clean digital speech free of ambient noise, which achieves digital noise reduction.
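The patent does not define how "basically consistent" is measured. One possible reading, sketched below purely as an assumption, is to count position-wise matches between the two code streams and require a minimum agreement ratio; the function name `streams_consistent` and the 0.8 threshold are hypothetical.

```python
def streams_consistent(voice_codes, cheek_codes, min_agreement=0.8):
    """Return True when the two code streams agree on enough positions.

    The agreement-ratio criterion and the default threshold are illustrative
    assumptions; the patent only requires the streams to be basically consistent.
    """
    n = min(len(voice_codes), len(cheek_codes))
    if n == 0:
        return False  # nothing to compare, so consistency cannot be confirmed
    matches = sum(1 for a, b in zip(voice_codes, cheek_codes) if a == b)
    return matches / n >= min_agreement
```

Only when this check passes would the code streams be handed to the synthesis stage.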
The local voice feature library of caller A in Fig. 3, that is, caller A's voiceprint library, may be generated in real time during the current call, or may be recorded in advance, for example at power-on, or retrieved from the feature information library of the power-on voiceprint password.
Fig. 4 is a schematic diagram of cheek vibration pickup and conversion according to a preferred embodiment of the present invention. Generally, sound vibration propagates better in a glass panel than in air. As shown in Fig. 4, the cheek vibration produced when the speaker talks is usually conducted through the front-shell glass panel of the mobile phone terminal; the vibration is conducted into the kinetic-energy-to-voltage conversion module, which converts kinetic energy into voltage; A/D conversion then turns the voltage into a digital code; finally, the information code of the cheek vibration is generated for use in correction.
Fig. 5 is a schematic diagram of the generation of the speech synthesis code according to a preferred embodiment of the present invention. As shown in Fig. 5, the real-time speech uttered by caller A and the cheek vibration state are recognized separately. The cheek vibration signal is generally picked up using the phone's front-shell glass and a kinetic-energy-to-voltage conversion component. After pickup, conversion, and recognition, each of the two states is further resolved into two kinds of features: one is the semantic feature, which yields a semantic feature code; the other is the feature corresponding to the loudness and pitch of the sound and the strength of the cheek vibration, which yields a loudness feature code. The real-time speech and the cheek vibration thus each produce these two feature codes, referred to as the information code of the respective state. The two information codes are then compared; if they are consistent, they can be used to synthesize the speech.
The two features above represent two dimensions, for example with M levels and N levels respectively, forming an M × N matrix; encoding uses a unified two-dimensional code. The values of M and N can be chosen by weighing factors such as the required speech quality level and the processing speed.
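A unified two-dimensional code over an M × N matrix can be sketched as a row-major index, where the row is the semantic level and the column is the loudness level. The function names and the row-major layout are assumptions chosen for illustration; the patent does not specify the coding.

```python
def encode(semantic_level, loudness_level, m_levels, n_levels):
    """Pack a (semantic, loudness) pair into one code in the range 0 .. M*N - 1."""
    if not 0 <= semantic_level < m_levels:
        raise ValueError("semantic level out of range")
    if not 0 <= loudness_level < n_levels:
        raise ValueError("loudness level out of range")
    return semantic_level * n_levels + loudness_level  # row-major position


def decode(code, n_levels):
    """Recover (semantic_level, loudness_level) from a unified code."""
    return divmod(code, n_levels)
```

Larger M and N give finer resolution at the cost of a larger code space, which matches the quality versus processing-speed trade-off mentioned above.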
Fig. 6 is a schematic diagram of the generation process of the voiceprint voice feature library according to a preferred embodiment of the present invention. As shown in Fig. 6, during voiceprint recording, the generation module of the voiceprint voice feature library of the voice communication assistance system records caller A's voice, extracts from it the frequency and timbre feature information of caller A's voice, and generates a sound-frequency characteristic curve model specific to that person. During speech synthesis, this sound-frequency characteristic curve is used to modulate the generated speech, so that the final speech matches the person's timbre.
To make the extracted feature information accurate and complete, the generation module collects the owner's voice information over multiple iterations, compares it with the voiceprint already in the feature library, and iteratively improves the speaker's voiceprint feature library. Generally, the number and duration of iterations can be determined by requirements such as the sound quality of the generated speech.
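The iterative refinement can be pictured as repeatedly blending each newly collected sample into the stored model. The sketch below uses an exponential moving average over a feature vector; this update rule, the function name `refine_voiceprint`, and the `weight` parameter are assumptions for illustration and are not the method stated in the patent.

```python
def refine_voiceprint(model, sample, weight=0.2):
    """Blend a newly collected feature vector into the stored voiceprint model.

    `weight` controls how strongly the new sample moves the model; the moving
    average itself is an illustrative stand-in for the patent's iteration.
    """
    if len(model) != len(sample):
        raise ValueError("feature vectors must have the same length")
    return [(1 - weight) * m + weight * s for m, s in zip(model, sample)]
```

Calling the function once per collected recording moves the model gradually toward the speaker's typical features, and the number of calls corresponds to the number of iterations chosen for the desired sound quality.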
The voiceprint voice feature library can be built by recording voiceprint information separately, generating the owner's voiceprint voice feature library for use in subsequent calls. As shown in Fig. 5 above, it can also be generated during a live call. Separate recording is usually preferred, to ensure that the feature library is built under low-noise conditions.
Fig. 7 is a schematic diagram of the working process of adding a background sound effect according to a preferred embodiment of the present invention. As shown in Fig. 7, a background sound effect is added in the cheek vibration recognition system of the voice communication assistance system. An arithmetic (difference) operation between the synthesized clean speech output and the aforementioned noisy speech extracts the current environmental background noise. After the loudness of this background sound is reduced, it is mixed with the generated clean speech, producing clear call speech that matches the current environment, which is then sent to the receiver (caller B's mobile phone terminal). The background sound effect may also be one of several preset background sound effects, such as a seashore, a square, or a hall.
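The background extraction and mixing described for Fig. 7 can be sketched per sample as follows. The sketch assumes the background is estimated as noisy speech minus clean speech and that both signals are time-aligned lists of equal length; the function name and the `background_gain` value are hypothetical.

```python
def mix_with_background(noisy, clean, background_gain=0.25):
    """Estimate background as (noisy - clean), attenuate it, and mix it back in.

    Assumes time-aligned signals of equal length; the gain is illustrative.
    """
    if len(noisy) != len(clean):
        raise ValueError("signals must have the same length")
    mixed = []
    for n, c in zip(noisy, clean):
        background = n - c  # estimated environmental noise for this sample
        mixed.append(c + background_gain * background)
    return mixed
```

With `background_gain=0.0` the output is the clean speech alone; with `1.0` it reproduces the original noisy speech. Substituting a preset sound effect for the estimated background would give the seashore, square, or hall variants.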
Fig. 8 is a schematic diagram of text-to-speech communication according to a preferred embodiment of the present invention. As shown in Fig. 8, during a voice call, when the ambient noise at the originating end is high, the originating user can also synthesize speech at the sending end from text input. During synthesis, the speaker's local voice feature library is used to modulate the frequency and timbre of the synthesized sound so that the synthesized speech keeps the speaker's timbre.
With the terminal voice communication assistance system provided in this preferred embodiment, when an audio call is made on a mobile terminal device and the background noise is severe, so that speech intelligibility at the receiving end is poor, a terminal using the system recognizes the cheek vibration of the speaker while speaking, uses it to check and correct the speaker's voice information, and outputs synthesized speech. The speech fragments missing from the voice communication are thereby supplemented, allowing the communication to proceed normally.
The terminal voice communication assistance system provided in this preferred embodiment also uses the local speaker's local voiceprint feature library to modulate the frequency and timbre of the synthesized sound, so that the synthesized speech sent keeps the speaker's timbre.
During a voice call, the terminal voice communication assistance system provided in this preferred embodiment also allows speech to be synthesized from text at the sending end, with frequency and timbre modulated by the local speaker's local voiceprint feature library, likewise keeping the speaker's timbre.
The terminal voice communication assistance system provided in this preferred embodiment can be started and stopped manually by the caller (or another user), or started and stopped automatically based on automatic monitoring at the sending end.
The terminal voice communication assistance system provided in this preferred embodiment can be used alone or in combination with related noise reduction schemes to achieve a better noise reduction effect.
The terminal voice communication assistance system provided in this preferred embodiment can operate in two-way mode: when the ambient noise on both sides is high, the callers at both ends can turn on the voice communication assistance system and use two-way mode.
In the terminal voice communication assistance system provided in this preferred embodiment, the cheek vibration of the speaker that is used for correction is usually conducted through the front-shell glass panel of the mobile phone terminal. The vibration is conducted into the kinetic-energy-to-voltage conversion module, which converts kinetic energy into voltage; A/D conversion then produces a digital code, from which the information code of the cheek vibration is generated for voice correction in the sending direction. The reverse process is also possible: locally received call speech can be turned into vibration by a voltage-to-kinetic-energy conversion module, and the vibration is transmitted to the front-shell glass of the mobile phone terminal. Because the user at this end (caller B) holds the phone in close contact with the cheek, the local user in a noisy environment hears the speech by bone conduction; and because the sound does not travel through air, the speech received by the local user is not affected by local ambient noise.
In addition, the speaker's local voice feature library described in this preferred embodiment, that is, the sender's voiceprint feature library, may be generated in real time during the current call. It may also be recorded in advance, for example at power-on, or retrieved from the feature information library of the power-on voiceprint password.
The terminal voice communication assistance system provided in this preferred embodiment can also be used as a special feature under normal noise conditions. In that case the background sound effect may be one of several preset background sound effects, such as a seashore, a square, or a hall, a music scene recorded in advance, or a scene recorded by the user, providing an entertaining and personalized call.
The aforementioned sound-frequency characteristic model can also be customized by the user. For example, a user may modify his or her own sound-frequency characteristic curve so that the voice sent to party B during a call keeps the user's own vocal characteristics yet sounds more pleasant, beautifying the user's voice.
Next, the scheme that checks and corrects the speech of a voice call by recognizing the caller's mouth shape is described.
Existing mouth-shape recognition on mobile terminal devices is mainly used to convert mouth shapes into text, or into operations performed by the terminal, such as taking a photo.
This preferred embodiment of the present invention provides a voice communication assistance system that recognizes the mouth shape of the speaker while speaking, converts it into speech, and fills in the speech missing from the communication, so that the communication can proceed normally.
Fig. 9 is a schematic diagram of the usage flow of the voice communication assistance system according to a preferred embodiment of the present invention. As shown in Fig. 9, after a call starts, if the ambient noise is relatively high, the originating caller (caller A) turns on the voice communication assistance system. When the originating party's speech cannot be recognized, the video recording system at the originating end starts and the mouth-shape recognition system at the receiving end starts. The receiving-end mouth-shape recognition system converts the mouth shape into speech and replaces the unclear speech segments from the originating end, so the call can continue and is not interrupted because the receiving end cannot recognize the speech.
Fig. 10 is a schematic diagram of the working process of the mouth-shape recognition system according to a preferred embodiment of the present invention. As shown in Fig. 10, the mouth-shape images may be transmitted as a video call. Alternatively, a data channel may be opened for one-way synchronous transmission to the receiving end, with the receiving end receiving one-way over the data channel and displaying the picture. The originating end may also transmit the images only, without a picture being displayed at the receiving end.
The processing of voice signals can also be two-way: when the ambient noise on both sides is high, the callers at both ends can turn on the voice communication assistance system and use two-way mode. The same applies to the video call process described above, which may also be two-way.
Fig. 11 is a schematic diagram of generating the local voice feature library of the voice communication assistance system according to a preferred embodiment of the present invention. As shown in Fig. 11, the local voice feature library can store and identify the sound-frequency features of callers from previous voice calls, so that in the next call the voice timbre of that user can be used preferentially.
Fig. 12 is a schematic diagram of text-to-speech communication in the voice communication assistance system according to a preferred embodiment of the present invention. As shown in Fig. 12, during a voice call, when the ambient noise at the originating end is high, the originating user can also use text input at the sending end; the text is sent over a data channel and converted into speech locally at the receiving end, which likewise uses the timbre from the originating caller's original voice feature library.
Fig. 13 is a schematic diagram of a system in which mouth-shape recognition, voice conversion, and voice insertion are performed at the sending end only, according to a preferred embodiment of the present invention. As shown in Fig. 13, during a voice call, when the ambient noise at the sending end is high, the terminal voice communication assistance system is started at the sending end only. The camera system performs mouth-shape image recognition, the images are converted into speech with the speaker's voice characteristics, and the speech is inserted into the voice path and sent to the receiving end, so the voice call proceeds normally.
With the terminal voice communication assistance system provided in this preferred embodiment, when an audio or video call is made on a mobile terminal device and the background noise is severe, so that speech intelligibility at the receiving end is poor, a terminal using the system recognizes the mouth shape of the speaker while speaking and converts it into speech. The speech fragments missing from the communication are thereby filled in, allowing the communication to proceed normally.
In the terminal voice communication assistance system provided in this preferred embodiment, the local voice feature library can also store and identify the sound-frequency features of callers from previous voice calls, so that in the next call the voice timbre of that user can be used preferentially.
During a voice call, the terminal voice communication assistance system provided in this preferred embodiment also allows text input at the sending end; the text is sent over a data channel and converted into speech locally at the receiving end, which likewise uses the timbre from the originating caller's original voice feature library.
The terminal voice communication assistance system provided in this preferred embodiment can be started and stopped manually, or started and stopped automatically based on automatic monitoring at the sending end or the receiving end.
In addition, the terminal voice communication assistance system provided in this preferred embodiment can also perform mouth-shape image recognition at the sending end only, convert the images into speech, and insert the converted speech directly into the voice at the sending end.
From the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
This embodiment further provides a voice processing apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 14 is a first structural block diagram of a voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 14, the apparatus includes:
a first acquisition module 142, configured to obtain the voice call information of a voice call;
a second acquisition module 144, connected to the first acquisition module 142 and configured to obtain user facial feature information corresponding to the voice call information;
a simulation module 146, connected to the second acquisition module 144 and configured to simulate, when the voice call information cannot be recognized, the backup voice call information of the voice call information according to the obtained user facial feature information.
Fig. 15 is a first structural block diagram of the second acquisition module 144 of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 15, the second acquisition module 144 includes:
a first acquisition unit 152, configured to obtain, where the facial feature information includes cheek vibration information, motion information of the front shell of the mobile terminal conducting the voice call;
a first determining unit 154, connected to the first acquisition unit 152 and configured to determine, according to the obtained motion information of the front shell of the mobile terminal, the cheek vibration information produced by the user's cheek vibrating during the voice call.
Fig. 16 is a second structural block diagram of the second acquisition module 144 of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 16, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a converting unit 162, configured to convert the obtained cheek vibration information into voltage information corresponding to the cheek vibration information;
a generation unit 164, connected to the converting unit 162 and configured to generate, after analog/digital (A/D) conversion of the converted voltage information, an information code corresponding to the cheek vibration information.
Fig. 17 is a first structural block diagram of the simulation module 146 of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 17, the simulation module 146 includes:
a second determining unit 172, configured to determine, according to the obtained cheek vibration information, semantic information characterizing the call content and vibration intensity information characterizing the strength of the cheek vibration;
a simulation unit 174, connected to the second determining unit 172 and configured to simulate the backup voice call information of the voice call information according to the determined semantic information and vibration intensity information.
Fig. 18 is a third structural block diagram of the second acquisition module 144 of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 18, the second acquisition module 144 includes:
a second acquisition unit 182, configured to obtain, where the facial feature information includes call mouth-shape information, the call mouth-shape information of the user's mouth shape during the voice call through the image acquisition device of the mobile terminal used for the voice call.
Fig. 19 is a second structural block diagram of the simulation module 146 of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 19, the simulation module 146 includes:
a conversion unit 192, configured to convert the obtained call mouth-shape information into first voice information corresponding to the call mouth-shape information;
a third determining unit 194, connected to the conversion unit 192 and configured to determine the backup voice call information of the voice call information according to the first voice information.
Fig. 20 is a second structural block diagram of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 20, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a modulation module 202, configured to modulate the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information corresponding to the voice call in the voiceprint feature library, obtaining voice modulation information corresponding to the backup voice call information;
a mixing module 204, configured to mix a preset background sound effect with the backup voice call information to generate audio mixing information;
a generation module 206, configured to insert second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
Fig. 21 is a third structural block diagram of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 21, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a sending module 212, configured to send the backup voice call information to the peer device of the voice call.
Fig. 22 is a fourth structural block diagram of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 22, in addition to all the modules shown in Fig. 15, the apparatus further includes:
a playing module 222, configured to play the backup voice call information on the mobile terminal.
Fig. 23 is a structural block diagram of the playing module 222 of the voice processing apparatus according to an embodiment of the present invention. As shown in Fig. 23, the playing module 222 includes:
a fourth determining unit 232, configured to determine, according to the backup voice call information, control information for controlling vibration of the front shell of the mobile terminal;
a control unit 234, connected to the fourth determining unit 232 and configured to control the front shell of the mobile terminal to vibrate according to the determined control information.
It should be noted that the above modules can be implemented by software or hardware. For the latter, this can be done in the following ways, among others: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Embodiment 3
An embodiment of the present invention further provides a terminal device. Fig. 24 is a structural block diagram of a terminal device according to an embodiment of the present invention. As shown in Fig. 24, the terminal device includes the voice processing apparatus 242 of any of the above embodiments.
Embodiment 4
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1, obtaining voice call information of a voice call in progress;
S2, obtaining user facial feature information corresponding to the voice call information;
S3, in the case where the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information.
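Steps S1 to S3 can be sketched as a simple fallback. This is a minimal illustration, not the implementation described in the application: `CallFrame`, `process_frame`, and the two callbacks `recognize` and `simulate_backup` are hypothetical names, and how recognition or simulation is actually performed is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallFrame:
    """One captured slice of a call (hypothetical container)."""
    audio: bytes          # voice call information obtained in S1
    face_features: dict   # user facial feature information obtained in S2


def process_frame(
    frame: CallFrame,
    recognize: Callable[[bytes], Optional[str]],
    simulate_backup: Callable[[dict], bytes],
) -> bytes:
    """Return the original audio, or simulated backup audio (S3)
    when the recognizer cannot make sense of the original."""
    if recognize(frame.audio) is not None:
        # The voice call information is recognizable: pass it through.
        return frame.audio
    # Unrecognizable: fall back to audio simulated from facial features.
    return simulate_backup(frame.face_features)
```

Passing the recognizer and the simulator in as callbacks keeps the sketch independent of whether the facial features are cheek vibrations or mouth shapes.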
Optionally, the storage medium is further configured to store program code for performing the following steps:
In the case where the facial feature information includes cheek vibration information, obtaining the user facial feature information corresponding to the voice call information includes:
S1, obtaining motion information of the front shell of the terminal device on which the voice call is conducted;
S2, determining, according to the obtained motion information of the front shell of the terminal device, cheek vibration information describing the vibration of the user's cheek during the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps:
After determining, according to the obtained motion information of the front shell of the terminal device, the cheek vibration information describing the vibration of the user's cheek during the voice call, the steps further include:
S1, converting the obtained cheek vibration information into voltage information corresponding to the cheek vibration information;
S2, subjecting the converted voltage information to analog-to-digital (A/D) conversion to generate an information code corresponding to the cheek vibration information.
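The two conversions above (vibration to voltage, then A/D conversion to a code) can be sketched numerically. The text does not give the sensor sensitivity, reference voltage, or bit depth, so the values below are assumptions chosen only for illustration, and both function names are hypothetical.

```python
def vibration_to_voltage(displacements_um, sensitivity_v_per_um=0.01):
    """Model the sensor as linear: voltage = displacement * sensitivity.
    The sensitivity value is an assumed placeholder."""
    return [d * sensitivity_v_per_um for d in displacements_um]


def adc(voltages, v_ref=1.0, bits=8):
    """Quantize voltages in [-v_ref, +v_ref] to unsigned integer codes.
    Out-of-range inputs are clamped, as a saturating converter would do."""
    levels = (1 << bits) - 1
    codes = []
    for v in voltages:
        clamped = max(-v_ref, min(v_ref, v))
        codes.append(round((clamped + v_ref) / (2 * v_ref) * levels))
    return codes
```

For example, `adc(vibration_to_voltage([-100, 100]))` maps the two extreme displacements to the lowest and highest 8-bit codes.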
Optionally, the storage medium is further configured to store program code for performing the following steps:
Simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes:
S1, determining, according to the obtained cheek vibration information, meaning information characterizing the call content and vibration intensity information characterizing the strength of the cheek vibration;
S2, simulating the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
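One way to read the vibration intensity information is as a loudness cue for the simulated speech. The sketch below assumes the cheek vibration has already been digitized into unsigned 8-bit codes and takes the root-mean-square deviation from mid-scale as the intensity. Both the RMS measure and its use as a gain are assumptions made for illustration; the text does not specify how intensity is computed or applied.

```python
import math


def vibration_intensity(codes, bits=8):
    """RMS deviation of the codes from mid-scale, normalized to [0, 1]."""
    if not codes:
        return 0.0
    mid = ((1 << bits) - 1) / 2
    rms = math.sqrt(sum((c - mid) ** 2 for c in codes) / len(codes))
    return min(1.0, rms / mid)


def apply_intensity(samples, intensity):
    """Scale a simulated waveform by the intensity (used as a gain)."""
    return [s * intensity for s in samples]
```

A stronger cheek vibration then yields a louder backup waveform, while the meaning information determines what is said.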
Optionally, the storage medium is further configured to store program code for performing the following steps:
In the case where the facial feature information includes call mouth shape information, obtaining the user facial feature information corresponding to the voice call information includes:
Obtaining, through the image acquisition device of the terminal device used for the voice call, the call mouth shape information describing the user's mouth shape during the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps:
Simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes:
S1, converting the obtained call mouth shape information into first voice information corresponding to the call mouth shape information;
S2, determining the backup voice call information for the voice call information according to the first voice information.
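The conversion from mouth shape to first voice information can be pictured as a lookup from a recognized mouth shape class to a sound unit. The table below is a toy assumption, not taken from the application: real mouth shapes are ambiguous (several sounds share one shape), so a practical system would need a trained model and context rather than a fixed mapping.

```python
# Hypothetical mapping from a coarse mouth-shape label to a sound unit.
MOUTH_SHAPE_TO_SOUND = {
    "open": "a",
    "rounded": "o",
    "spread": "i",
    "closed": "m",
}


def mouth_shapes_to_sounds(shapes):
    """Translate a sequence of mouth-shape labels into sound units,
    skipping labels the table does not cover."""
    return [MOUTH_SHAPE_TO_SOUND[s] for s in shapes if s in MOUTH_SHAPE_TO_SOUND]
```

The resulting sequence of sound units stands in for the first voice information from which the backup voice call information is then determined.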
Optionally, the storage medium is further configured to store program code for performing the following steps:
After simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the steps further include at least one of the following:
S1, modulating the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information corresponding to the voice call in a voiceprint feature library, to obtain voice modulation information corresponding to the backup voice call information;
S2, mixing a preset background sound effect with the backup voice call information to generate mixed audio information;
S3, inserting second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
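Steps S2 and S3 above are ordinary sample-level operations and can be sketched directly. The sketch assumes audio is a list of floating-point samples in [-1, 1]; the function names, the background gain, and the clipping behavior are illustrative assumptions.

```python
def mix(voice, background, background_gain=0.3):
    """Mix a background effect into the voice, sample by sample.
    The background loops if it is shorter than the voice, and the
    result is clipped to [-1, 1]."""
    if not background:
        return list(voice)
    mixed = []
    for i, v in enumerate(voice):
        s = v + background_gain * background[i % len(background)]
        mixed.append(max(-1.0, min(1.0, s)))
    return mixed


def insert_segment(voice, segment, position):
    """Insert a segment (for example, speech synthesized from text)
    at a sample position; the position is clamped to the voice length."""
    position = max(0, min(len(voice), position))
    return voice[:position] + segment + voice[position:]
```

Mixing keeps the length of the voice signal unchanged, whereas insertion lengthens it by the length of the inserted segment.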
Optionally, the storage medium is further configured to store program code for performing the following steps:
After simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the steps further include:
Sending the backup voice call information to the terminal device that receives the voice call.
Optionally, the storage medium is further configured to store program code for performing the following steps:
After simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the steps further include:
Playing the backup voice call information.
Optionally, the storage medium is further configured to store program code for performing the following steps:
Playing the backup voice call information includes:
S1, determining, according to the backup voice call information, control information for controlling the front shell of the terminal device to vibrate;
S2, controlling the front shell of the terminal device to vibrate according to the determined control information.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: obtaining voice call information of a voice call in progress; obtaining user facial feature information corresponding to the voice call information; and, in the case where the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: in the case where the facial feature information includes cheek vibration information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining motion information of the front shell of the terminal device on which the voice call is conducted; and determining, according to the obtained motion information of the front shell of the terminal device, cheek vibration information describing the vibration of the user's cheek during the voice call.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after determining, according to the obtained motion information of the front shell of the terminal device, the cheek vibration information describing the vibration of the user's cheek during the voice call, the steps further include: converting the obtained cheek vibration information into voltage information corresponding to the cheek vibration information; and subjecting the converted voltage information to analog-to-digital (A/D) conversion to generate an information code corresponding to the cheek vibration information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: determining, according to the obtained cheek vibration information, meaning information characterizing the call content and vibration intensity information characterizing the strength of the cheek vibration; and simulating the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: in the case where the facial feature information includes call mouth shape information, obtaining the user facial feature information corresponding to the voice call information includes: obtaining, through the image acquisition device of the terminal device used for the voice call, the call mouth shape information describing the user's mouth shape during the voice call.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: simulating the backup voice call information for the voice call information according to the obtained user facial feature information includes: converting the obtained call mouth shape information into first voice information corresponding to the call mouth shape information; and determining the backup voice call information for the voice call information according to the first voice information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the steps further include at least one of the following: modulating the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information corresponding to the voice call in a voiceprint feature library, to obtain voice modulation information corresponding to the backup voice call information; mixing a preset background sound effect with the backup voice call information to generate mixed audio information; and inserting second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the steps further include: sending the backup voice call information to the terminal device that receives the voice call.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the steps further include: playing the backup voice call information.
Optionally, in this embodiment, the processor performs the following according to the program code stored in the storage medium: playing the backup voice call information on the terminal device includes: determining, according to the backup voice call information, control information for controlling the front shell of the terminal device to vibrate; and controlling the front shell of the terminal device to vibrate according to the determined control information.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here. Alternatively, they may each be fabricated as an individual integrated circuit module, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (21)

  1. A speech processing method, characterized by comprising:
    obtaining voice call information of a voice call in progress;
    obtaining user facial feature information corresponding to the voice call information;
    in the case where the voice call information cannot be recognized, simulating backup voice call information for the voice call information according to the obtained user facial feature information.
  2. The method according to claim 1, characterized in that, in the case where the facial feature information includes cheek vibration information, obtaining the user facial feature information corresponding to the voice call information comprises:
    obtaining motion information of the front shell of the terminal device on which the voice call is conducted;
    determining, according to the obtained motion information of the front shell of the terminal device, the cheek vibration information describing the vibration of the user's cheek during the voice call.
  3. The method according to claim 2, characterized in that, after determining, according to the obtained motion information of the front shell of the terminal device, the cheek vibration information describing the vibration of the user's cheek during the voice call, the method further comprises:
    converting the obtained cheek vibration information into voltage information corresponding to the cheek vibration information;
    subjecting the converted voltage information to analog-to-digital (A/D) conversion to generate an information code corresponding to the cheek vibration information.
  4. The method according to claim 2, characterized in that simulating the backup voice call information for the voice call information according to the obtained user facial feature information comprises:
    determining, according to the obtained cheek vibration information, meaning information characterizing the call content and vibration intensity information characterizing the strength of the cheek vibration;
    simulating the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
  5. The method according to claim 1, characterized in that, in the case where the facial feature information includes call mouth shape information, obtaining the user facial feature information corresponding to the voice call information comprises:
    obtaining, through the image acquisition device of the terminal device used for the voice call, the call mouth shape information describing the user's mouth shape during the voice call.
  6. The method according to claim 5, characterized in that simulating the backup voice call information for the voice call information according to the obtained user facial feature information comprises:
    converting the obtained call mouth shape information into first voice information corresponding to the call mouth shape information;
    determining the backup voice call information for the voice call information according to the first voice information.
  7. The method according to claim 1, characterized in that, after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the method further comprises at least one of the following:
    modulating the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information corresponding to the voice call in a voiceprint feature library, to obtain voice modulation information corresponding to the backup voice call information;
    mixing a preset background sound effect with the backup voice call information to generate mixed audio information;
    inserting second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
  8. The method according to any one of claims 1 to 7, characterized in that, after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the method further comprises:
    sending the backup voice call information to the terminal device that receives the voice call.
  9. The method according to claim 1, characterized in that, after simulating the backup voice call information for the voice call information according to the obtained user facial feature information, the method further comprises:
    playing the backup voice call information.
  10. The method according to claim 9, characterized in that playing the backup voice call information comprises:
    determining, according to the backup voice call information, control information for controlling the front shell of the terminal device to vibrate;
    controlling the front shell of the terminal device to vibrate according to the determined control information.
  11. A speech processing apparatus, characterized by comprising:
    a first acquisition module, configured to obtain voice call information of a voice call in progress;
    a second acquisition module, configured to obtain user facial feature information corresponding to the voice call information;
    a simulation module, configured to, in the case where the voice call information cannot be recognized, simulate backup voice call information for the voice call information according to the obtained user facial feature information.
  12. The apparatus according to claim 11, characterized in that the second acquisition module comprises:
    a first acquisition unit, configured to, in the case where the facial feature information includes cheek vibration information, obtain motion information of the front shell of the mobile terminal on which the voice call is conducted;
    a first determining unit, configured to determine, according to the obtained motion information of the front shell of the mobile terminal, the cheek vibration information describing the vibration of the user's cheek during the voice call.
  13. The apparatus according to claim 12, characterized in that the second acquisition module further comprises:
    a converting unit, configured to convert the obtained cheek vibration information into voltage information corresponding to the cheek vibration information;
    a generation unit, configured to subject the converted voltage information to analog-to-digital (A/D) conversion to generate an information code corresponding to the cheek vibration information.
  14. The apparatus according to claim 12, characterized in that the simulation module comprises:
    a second determining unit, configured to determine, according to the obtained cheek vibration information, meaning information characterizing the call content and vibration intensity information characterizing the strength of the cheek vibration;
    a simulation unit, configured to simulate the backup voice call information for the voice call information according to the determined meaning information and the vibration intensity information.
  15. The apparatus according to claim 11, characterized in that the second acquisition module comprises:
    a second acquisition unit, configured to, in the case where the facial feature information includes call mouth shape information, obtain, through the image acquisition device of the mobile terminal used for the voice call, the call mouth shape information describing the user's mouth shape during the voice call.
  16. The apparatus according to claim 15, characterized in that the simulation module comprises:
    a conversion unit, configured to convert the obtained call mouth shape information into first voice information corresponding to the call mouth shape information;
    a third determining unit, configured to determine the backup voice call information for the voice call information according to the first voice information.
  17. The apparatus according to claim 11, characterized by further comprising at least one of the following:
    a modulation module, configured to modulate the frequency and/or timbre of the backup voice call information according to the user voiceprint feature information corresponding to the voice call in a voiceprint feature library, to obtain voice modulation information corresponding to the backup voice call information;
    an audio mixing module, configured to mix a preset background sound effect with the backup voice call information to generate mixed audio information;
    a generation module, configured to insert second voice information, obtained by converting input text information, at a preset position in the backup voice call information to generate third voice information.
  18. The apparatus according to any one of claims 11 to 17, characterized by further comprising:
    a sending module, configured to send the backup voice call information to the peer device of the voice call.
  19. The apparatus according to claim 11, characterized by further comprising:
    a playing module, configured to play the backup voice call information.
  20. The apparatus according to claim 19, characterized in that the playing module comprises:
    a fourth determining unit, configured to determine, according to the backup voice call information, control information for controlling the front shell of the mobile terminal to vibrate;
    a control unit, configured to control the front shell of the mobile terminal to vibrate according to the determined control information.
  21. A terminal device, characterized by comprising the apparatus according to any one of claims 11 to 20.
CN201610811985.5A 2016-09-07 2016-09-07 Method of speech processing, device and terminal device Pending CN107800860A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610811985.5A CN107800860A (en) 2016-09-07 2016-09-07 Method of speech processing, device and terminal device
PCT/CN2017/071139 WO2018045703A1 (en) 2016-09-07 2017-01-13 Voice processing method, apparatus and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610811985.5A CN107800860A (en) 2016-09-07 2016-09-07 Method of speech processing, device and terminal device

Publications (1)

Publication Number Publication Date
CN107800860A true CN107800860A (en) 2018-03-13

Family

ID=61531013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610811985.5A Pending CN107800860A (en) 2016-09-07 2016-09-07 Method of speech processing, device and terminal device

Country Status (2)

Country Link
CN (1) CN107800860A (en)
WO (1) WO2018045703A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965600A (en) * 2018-07-24 2018-12-07 Oppo(重庆)智能科技有限公司 Voice pick-up method and Related product
CN110415698A (en) * 2018-11-15 2019-11-05 腾讯科技(深圳)有限公司 Artificial intelligence data detection method and device and storage medium
CN111447325A (en) * 2020-04-03 2020-07-24 上海闻泰电子科技有限公司 Call auxiliary method, device, terminal and storage medium
WO2020172828A1 (en) * 2019-02-27 2020-09-03 华为技术有限公司 Sound source separating method, apparatus and device
CN112820274A (en) * 2021-01-08 2021-05-18 上海仙剑文化传媒股份有限公司 Voice information recognition correction method and system
CN113345436A (en) * 2021-08-05 2021-09-03 创维电器股份有限公司 Remote voice recognition control system and method based on multi-system integration high recognition rate

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102104651A (en) * 2009-12-22 2011-06-22 康佳集团股份有限公司 Method for playing reserved voice in incoming call reception of mobile terminal and mobile terminal
CN201986001U (en) * 2010-12-31 2011-09-21 上海华勤通讯技术有限公司 Mouth shape identification input mobile terminal
CN202329640U (en) * 2011-08-19 2012-07-11 广东好帮手电子科技股份有限公司 System for applying auxiliary voice recognition technology by mouth shape in vehicular navigation
CN102324035A (en) * 2011-08-19 2012-01-18 广东好帮手电子科技股份有限公司 Method and system of applying lip posture assisted speech recognition technique to vehicle navigation
CN106157956A (en) * 2015-03-24 2016-11-23 中兴通讯股份有限公司 The method and device of speech recognition
CN106157957A (en) * 2015-04-28 2016-11-23 中兴通讯股份有限公司 Audio recognition method, device and subscriber equipment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965600A (en) * 2018-07-24 2018-12-07 Oppo(重庆)智能科技有限公司 Voice pick-up method and Related product
CN108965600B (en) * 2018-07-24 2021-05-04 Oppo(重庆)智能科技有限公司 Voice pickup method and related product
CN110415698A (en) * 2018-11-15 2019-11-05 腾讯科技(深圳)有限公司 Artificial intelligence data detection method and device and storage medium
WO2020172828A1 (en) * 2019-02-27 2020-09-03 华为技术有限公司 Sound source separating method, apparatus and device
CN111447325A (en) * 2020-04-03 2020-07-24 上海闻泰电子科技有限公司 Call auxiliary method, device, terminal and storage medium
CN112820274A (en) * 2021-01-08 2021-05-18 上海仙剑文化传媒股份有限公司 Voice information recognition correction method and system
CN112820274B (en) * 2021-01-08 2021-09-28 上海仙剑文化传媒股份有限公司 Voice information recognition correction method and system
CN113345436A (en) * 2021-08-05 2021-09-03 创维电器股份有限公司 Remote voice recognition control system and method based on multi-system integration high recognition rate
CN113345436B (en) * 2021-08-05 2021-11-12 创维电器股份有限公司 Remote voice recognition control system and method based on multi-system integration high recognition rate

Also Published As

Publication number Publication date
WO2018045703A1 (en) 2018-03-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180313