CN110364170A - Voice transmission method, apparatus, computer device and storage medium
- Publication number
- CN110364170A (application number CN201910459488.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0001—Systems modifying transmission characteristics according to link quality, e.g. power backoff
- H04L1/0002—Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission rate
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention provides a voice transmission method, comprising: receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted; obtaining the transmission rate at which the voice information to be transmitted is transmitted; determining whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result comprising text information corresponding to the voice information to be transmitted; performing voice coding on the text information comprised in the speech recognition result to obtain target voice information; and transmitting the target voice information to the second terminal. The invention also discloses a voice transmission apparatus, a computer device and a computer-readable storage medium. The present invention can improve the quality of voice calls.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a voice transmission method, apparatus, computer device and storage medium.
Background art
With the development of computer technology and the spread of mobile terminals, voice call products are increasingly numerous. These products deliver good call quality when network conditions are good. When network conditions are poor, however, discontinuous transmission may cause the sound to stutter during voice transmission, which lowers the quality of the voice call and harms the user experience.
Summary of the invention
In view of the foregoing, it is necessary to provide a voice transmission method, apparatus, computer device and storage medium that can improve the quality of voice calls.
The present invention provides a voice transmission method, the method comprising:
receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted;
obtaining the transmission rate at which the voice information to be transmitted is transmitted;
determining whether the transmission rate is lower than a preset transmission rate;
if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result comprising text information corresponding to the voice information to be transmitted;
performing voice coding on the text information comprised in the speech recognition result to obtain target voice information;
transmitting the target voice information to the second terminal.
In an optional implementation of the present invention, the speech recognition result further comprises a voice feature of the voice information to be transmitted, the voice feature comprising a fundamental frequency;
performing voice coding on the text information comprised in the speech recognition result comprises:
performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice feature of the voice information to be transmitted.
In an optional implementation of the present invention, after determining whether the transmission rate is lower than the preset transmission rate, the method further comprises:
if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted according to the GIA coding standard and transmitting it;
if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted according to the GSM coding standard and transmitting it;
if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted according to the G.728 coding standard and transmitting it;
if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted according to the G.721 coding standard and transmitting it;
if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted according to the G.722 coding standard and transmitting it;
if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted according to the MPE coding standard and transmitting it.
In an optional implementation of the present invention, the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
In an optional embodiment of the present invention, performing speech recognition on the voice information to be transmitted comprises:
extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements comprised in the phoneme information, the elements comprising a word sequence composed of characters or words;
decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
In an optional embodiment of the present invention, the method further comprises:
if the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for enhancing network signal strength, or sending to the second terminal a reminder message indicating that there is a voice transmission.
In an optional embodiment of the present invention, the suggestion message comprises a recommended network to connect to or a recommended movement route.
The present invention also provides a voice transmission apparatus, the apparatus comprising:
a receiving module, configured to receive a voice call transmission instruction sent by a first terminal, and to obtain, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted;
an obtaining module, configured to obtain the transmission rate at which the voice information to be transmitted is transmitted;
a judging module, configured to determine whether the transmission rate is lower than a preset transmission rate;
a recognition module, configured to perform speech recognition on the voice information to be transmitted if the transmission rate is lower than the preset transmission rate, obtaining a speech recognition result that comprises text information corresponding to the voice information to be transmitted;
a coding module, configured to perform voice coding on the text information comprised in the speech recognition result to obtain target voice information;
a first transmission module, configured to transmit the target voice information to the second terminal.
In an optional embodiment of the present invention, the speech recognition result further comprises a voice feature of the voice information to be transmitted, the voice feature comprising a fundamental frequency;
the coding module performing voice coding on the text information comprised in the speech recognition result comprises:
performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice feature of the voice information to be transmitted.
In an optional embodiment of the present invention, the apparatus further comprises a second transmission module, the second transmission module being configured to:
after it is determined whether the transmission rate is lower than the preset transmission rate, if the transmission rate is higher than the preset transmission rate, determine whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encode the voice information to be transmitted according to the GIA coding standard and transmit it;
if the transmission rate is higher than the first transmission rate, determine whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encode the voice information to be transmitted according to the GSM coding standard and transmit it;
if the transmission rate is higher than the second transmission rate, determine whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encode the voice information to be transmitted according to the G.728 coding standard and transmit it;
if the transmission rate is higher than the third transmission rate, determine whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encode the voice information to be transmitted according to the G.721 coding standard and transmit it;
if the transmission rate is higher than the fourth transmission rate, determine whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encode the voice information to be transmitted according to the G.722 coding standard and transmit it;
if the transmission rate is higher than the fifth transmission rate, encode the voice information to be transmitted according to the MPE coding standard and transmit it.
In an optional embodiment of the present invention, the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
In an optional embodiment of the present invention, the recognition module performing speech recognition on the voice information to be transmitted comprises:
extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements comprised in the phoneme information, the elements comprising a word sequence composed of characters or words;
decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
In an optional embodiment of the present invention, the apparatus further comprises:
a prompting module, configured to, if the transmission rate is lower than the preset transmission rate, send to the first terminal or the second terminal a suggestion message for enhancing network signal strength, or send to the second terminal a reminder message indicating that there is a voice transmission.
In an optional embodiment of the present invention, the suggestion message comprises a recommended network to connect to or a recommended movement route.
The present invention also provides a computer device comprising a memory and a processor, the memory being configured to store at least one instruction and the processor being configured to execute the at least one instruction to implement the voice transmission method of any embodiment.
The present invention also provides a computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the voice transmission method of any embodiment.
As can be seen from the above technical solution, the present invention receives a voice call transmission instruction sent by a first terminal and obtains, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive it; obtains the transmission rate at which the voice information to be transmitted is transmitted; determines whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, performs speech recognition on the voice information to be transmitted to obtain a speech recognition result comprising the corresponding text information; performs voice coding on the text information comprised in the speech recognition result to obtain target voice information; and transmits the target voice information to the second terminal. Because it is the text information in the speech recognition result that is encoded when the transmission rate is lower than the preset transmission rate, the spoken content of the voice information to be transmitted is retained while the amount of information to be encoded is reduced. This helps the voice call proceed smoothly, avoids stuttering or dropped calls, and thereby improves the quality of the voice call.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a voice transmission method provided by an embodiment of the present invention;
Fig. 2 is a functional block diagram of a voice transmission apparatus provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device of a preferred embodiment of the present invention that implements the voice transmission method.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, Fig. 1 is a flow chart of a voice transmission method provided by an embodiment of the present invention. Depending on requirements, the order of the steps in the flow chart may be changed and some steps may be omitted.
S11: receive a voice call transmission instruction sent by a first terminal, and obtain, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted.
In this embodiment, the first terminal and the second terminal may be electronic devices of the same kind or of different kinds. For example, the first terminal and the second terminal are both mobile phones, or the first terminal is a mobile phone and the second terminal is a computer.
The voice call transmission instruction is an instruction for sending voice information between two terminals.
In this embodiment, the first terminal is the sender of the voice information, i.e. the calling party, and the second terminal is the recipient of the voice information, i.e. the called party.
In one possible embodiment, obtaining the voice information to be transmitted and the second terminal according to the voice call transmission instruction comprises: obtaining the voice information to be transmitted, and the second terminal that is to receive it, as indicated by the voice call transmission instruction.
For example, the voice call transmission instruction contains the voice information to be transmitted and its recipient, namely the second terminal that receives the voice information.
S12: obtain the transmission rate at which the voice information to be transmitted is transmitted.
The transmission rate, i.e. the network transmission speed, is the rate at which a host on a computer network transmits data over a digital channel. For example, a transmission rate of 16 bit/s means that 16 bits of data are transmitted per second.
In this embodiment, obtaining the transmission rate comprises: obtaining the sending rate of the first terminal, or the receiving rate of the second terminal, while the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice is transmitted through communication software, the sending rate of the calling party is obtained; this rate reflects the data rate at which the calling party sends voice information to the base station or server. Alternatively, when voice is transmitted through communication software, the receiving rate of the called party is obtained; this rate reflects the data rate at which the called party receives voice information.
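The rate measurement in S12 can be illustrated with a minimal sketch. The function name and the idea of averaging over a single measurement window are assumptions made for illustration; the text does not say how the rate is measured.

```python
def transmission_rate_bps(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Return the average transmission rate in bit/s over one measurement window.

    Hypothetical helper: the source does not specify a measurement procedure.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_transferred * 8 / elapsed_seconds


# Two bytes moved in one second corresponds to the 16 bit/s example in the text.
rate = transmission_rate_bps(bytes_transferred=2, elapsed_seconds=1.0)
```

The same function could serve either side of the call: fed with bytes sent by the calling party or bytes received by the called party.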
S13: determine whether the transmission rate is lower than a preset transmission rate.
In this embodiment, determining whether the transmission rate is lower than the preset transmission rate establishes whether the two communicating parties are in a poor network environment during voice transmission, and hence whether call quality will be affected.
The specific value of the preset transmission rate can be set in advance as needed.
Optionally, the preset transmission rate is 8 kbit/s.
Optionally, in another embodiment of the present invention, after determining whether the transmission rate is lower than the preset transmission rate, the method further comprises:
if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted according to the GIA coding standard and transmitting it;
if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted according to the GSM coding standard and transmitting it;
if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted according to the G.728 coding standard and transmitting it;
if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted according to the G.721 coding standard and transmitting it;
if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted according to the G.722 coding standard and transmitting it;
if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted according to the MPE coding standard and transmitting it.
Coding is the process of representing information with codes. In digital audio coding, the frequency values of the sound at sampling points and the energy at those frequencies are extracted and digitally quantized. Relative to the natural signal, any digital audio coding scheme is lossy. The highest-fidelity coding currently in use is PCM, which can approximate the original sound arbitrarily closely, but PCM data is bulky and ill-suited to transmission. During audio transmission the audio is therefore encoded in other forms to compress it and make transmission smoother.
In this embodiment, the different coding standards encode the voice information with different coding algorithms. For example, coding under the G.722 standard is implemented by the SB-ADPCM algorithm, under G.721 by the ADPCM algorithm, under G.728 by the LD-CELP algorithm, under GSM by the RPE-LTP algorithm, and under GIA by the VSELPC algorithm.
In this embodiment, a different coding standard is used at each transmission rate, so that at every rate as much of the voice information as possible is retained during transmission and the sound quality is improved.
Optionally, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
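The threshold cascade described above amounts to a lookup over ordered rate bands. The following is a minimal sketch under stated assumptions: the names `THRESHOLDS_KBIT_S`, `SCHEMES` and `select_scheme` are hypothetical, the label "text" stands for the speech-recognition path of step S14, and a rate exactly equal to a threshold is treated as "not lower than" it because the text leaves equality unspecified.

```python
from bisect import bisect_right

# Upper bounds (kbit/s) of each band, taken from the optional values in the text.
THRESHOLDS_KBIT_S = [8, 13.2, 16, 32, 64, 128]
# Scheme used in each band; index 0 is the speech-recognition (text) path.
SCHEMES = ["text", "GIA", "GSM", "G.728", "G.721", "G.722", "MPE"]


def select_scheme(rate_kbit_s: float) -> str:
    """Pick the coding scheme for a measured transmission rate.

    Assumption: a rate equal to a threshold falls into the higher band.
    """
    return SCHEMES[bisect_right(THRESHOLDS_KBIT_S, rate_kbit_s)]
```

A binary search over the sorted thresholds gives the same result as walking the nested comparisons one by one, and keeps the band table in one place if the optional values change.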
S14: if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result comprising text information corresponding to the voice information to be transmitted.
In this embodiment, speech recognition means converting a voice signal into the corresponding text information.
Specifically, speech recognition is performed on the voice information to be transmitted using speech recognition technology.
Optionally, in another embodiment of the present invention, performing speech recognition on the voice information to be transmitted comprises:
extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements comprised in the phoneme information, the elements comprising a word sequence composed of characters or words;
decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected as needed.
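The four recognition stages can be sketched as a chain of pluggable functions. This is only an illustration of the data flow: `recognize` and every `toy_*` stand-in below are hypothetical, and real acoustic and language models would replace the toy functions.

```python
from typing import Callable, Mapping, Sequence


def recognize(
    audio: Sequence[float],
    extract_features: Callable[[Sequence[float]], Sequence[float]],
    acoustic_model: Callable[[Sequence[float]], Sequence[str]],
    language_model: Callable[[Sequence[str]], Sequence[str]],
    dictionary: Mapping[str, str],
) -> str:
    """Chain the four stages: features -> phonemes -> word sequence -> text."""
    features = extract_features(audio)
    phonemes = acoustic_model(features)
    words = language_model(phonemes)
    # Decode each word token through the dictionary; unknown tokens pass through.
    return "".join(dictionary.get(word, word) for word in words)


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_features(audio):
    return [abs(sample) for sample in audio]


def toy_acoustic_model(features):
    return ["n", "i", "h", "ao"]


def toy_language_model(phonemes):
    # Join phonemes pairwise into syllable-like tokens.
    return [phonemes[i] + phonemes[i + 1] for i in range(0, len(phonemes) - 1, 2)]


toy_dictionary = {"ni": "你", "hao": "好"}

text = recognize([0.1, -0.2], toy_features, toy_acoustic_model,
                 toy_language_model, toy_dictionary)
```

Passing the models in as parameters mirrors the statement that the acoustic and language models can be selected as needed.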
S15: perform voice coding on the text information comprised in the speech recognition result to obtain target voice information.
In this embodiment, performing voice coding on the text information comprised in the speech recognition result means encoding the text information itself. Unlike traditional coding of voice samples, this greatly reduces the amount of data to be transmitted.
Traditional coding samples and encodes the frequency and amplitude of the sound. The data volume transmitted under traditional coding is calculated as follows:
Data volume (bytes/s) = sample rate (Hz) × sample size (bit) × number of channels / 8
Taking 16 kHz mono sampling with 16-bit samples as an example, one second of voice data occupies 16000 × 16 × 1 / 8 = 32000 bytes, i.e. about 32 KB.
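The sampled-audio data volume formula can be checked with a few lines of code. The function name is hypothetical; the arithmetic follows the formula and the 16 kHz, 16-bit, mono example above.

```python
def pcm_bytes_per_second(sample_rate_hz: int, sample_size_bits: int, channels: int) -> int:
    """Uncompressed sampled-audio data volume in bytes per second."""
    return sample_rate_hz * sample_size_bits * channels // 8


# 16 kHz, 16-bit, mono: the worked example from the text.
mono_16k = pcm_bytes_per_second(sample_rate_hz=16000, sample_size_bits=16, channels=1)
```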
In this embodiment, the data volume of the encoded target voice information transmitted per second is: data volume (bytes/s) = characters spoken per second × character code size (bytes). Here the number of characters spoken per second is the number of characters per second in the voice information obtained by speech recognition. Different characters (for example Chinese characters) have corresponding character code sizes, which can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information as an example, an ordinary person can speak at most about 10 Chinese characters per second, and each Chinese character is encoded in 2 bytes, so the data volume for one second is 10 × 2 = 20 bytes. It can be seen that when this embodiment transmits voice information, the data volume transmitted per second is greatly reduced.
Optionally, in another embodiment of the present invention, the speech recognition result further comprises a voice feature of the voice information to be transmitted, the voice feature comprising a fundamental frequency;
performing voice coding on the text information comprised in the speech recognition result comprises:
performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice feature of the voice information to be transmitted.
A phonetic feature is information that reflects the characteristics of the voice, for example its sound intensity, loudness or pitch.
Usually, when a person produces a voiced sound, the airflow through the glottis sets the vocal cords into relaxation-oscillation vibration and produces a quasi-periodic train of air pulses. This airflow excites the vocal tract and produces the voiced sound, also called voiced speech, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the fundamental frequency.
The fundamental frequency is related to the length, thickness, toughness and stiffness of the vocal cords and to pronunciation habits, and it largely reflects the characteristics of the individual speaker. Therefore, encoding in combination with the fundamental frequency in this embodiment retains the characteristics of the voice as far as possible while ensuring that the content is transmitted accurately.
In this embodiment, the fundamental frequency of the voice information can be obtained by the cepstrum method.
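As an illustration of the cepstrum method mentioned above, the following is a minimal sketch rather than the embodiment's actual implementation. It assumes one voiced frame of samples, uses a naive DFT so that it has no dependencies (a real system would use an FFT), and the function name, the search range of 60 to 400 Hz and the test signal are all assumptions.

```python
import cmath
import math


def cepstrum_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame via the real cepstrum."""
    n = len(frame)
    # A Hamming window reduces spectral leakage between harmonics.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Naive O(n^2) DFT of the windowed frame, then the log-magnitude spectrum.
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    log_mag = []
    for k in range(n):
        acc = sum(x * twiddle[(k * i) % n] for i, x in enumerate(windowed))
        log_mag.append(math.log(abs(acc) + 1e-12))
    # Search only quefrencies that correspond to plausible pitch periods.
    q_min = int(sample_rate / f_max)
    q_max = min(int(sample_rate / f_min), n // 2)
    best_q, best_val = q_min, float("-inf")
    for q in range(q_min, q_max + 1):
        # Inverse DFT of the real, symmetric log spectrum evaluated at quefrency q.
        val = sum(m * math.cos(2 * math.pi * k * q / n)
                  for k, m in enumerate(log_mag)) / n
        if val > best_val:
            best_q, best_val = q, val
    return sample_rate / best_q


# Synthetic voiced frame: a pulse train with a 40-sample period at 8 kHz.
frame = [1.0 if i % 40 == 0 else 0.0 for i in range(512)]
print(cepstrum_f0(frame, 8000))
```

The peak of the cepstrum inside the search range marks the pitch period in samples, and dividing the sample rate by that period gives the fundamental frequency.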
In this embodiment, performing voice coding on the text information corresponding to the voice information to be transmitted together with the phonetic features of that voice information means encoding the text information in combination with the phonetic features. This also differs from traditional coding of voice samples and can greatly reduce the data volume during transmission.
In this embodiment, the amount of voice data transmitted per second for the encoded target voice information is: data volume transmitted under this voice coding (byte per second) = number of characters spoken per second * corresponding character code size (bit) + phonetic features (depending on the phonetic features extracted, for example 10bit/s), where the number of characters spoken per second is the number of characters per second that speech recognition finds in the voice information. Different characters (such as Chinese characters) have corresponding character code sizes, and the corresponding character code size can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information to be transmitted as an example, an ordinary person can speak 10 or fewer Chinese characters per second, and Chinese characters are encoded with a code size of 2 per Chinese character, so the data volume for 1 s is: 10*2+10=30bit. It can be seen that when this embodiment transmits voice information, the data volume transmitted per second is greatly reduced.
S16: transmit the target voice information to the second terminal.
In an optional embodiment, after the second terminal receives the target voice information, it decodes the target voice information, that is, it restores the received text information (or the text information and phonetic features) to voice.
In an optional embodiment, if part of the restored voice has no content, that part is filled with white noise. White noise is a sound; specifically, it is noise whose power spectral density is uniformly distributed over the entire frequency domain.
Filling the restored sound with white noise prevents the user, when listening to the restored voice on the second terminal, from mistaking silence for an interrupted call and performing an erroneous operation (such as exiting).
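The white-noise filling described above can be sketched as follows. This is a minimal illustration under assumptions: the restored voice is taken to be a list of frames in which an empty frame is either `None` or all zeros, and the function name, the noise level and the use of independent Gaussian samples (whose expected power spectrum is flat) are not taken from the source.

```python
import random


def fill_empty_frames(frames, frame_len, noise_std=0.01, seed=None):
    """Replace frames without content by low-level Gaussian white noise."""
    rng = random.Random(seed)
    filled = []
    for frame in frames:
        if frame is None or not any(frame):
            # Independent zero-mean samples: flat power spectral density on average.
            frame = [rng.gauss(0.0, noise_std) for _ in range(frame_len)]
        filled.append(frame)
    return filled


restored = [[0.1, -0.2, 0.3], None, [0.0, 0.0, 0.0]]
out = fill_empty_frames(restored, frame_len=3, seed=1)
print(out)
```

Frames that already carry content pass through unchanged, so only the gaps become audible as faint noise.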
With this embodiment, although features such as audio and volume are lost in the encoding process, the voice content can still be largely retained when the network is very poor, which avoids intermittent voice, lost voice content or even a call that cannot proceed.
In another embodiment of the invention, the method further includes:
If the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for enhancing the network signal strength, or sending to the second terminal a reminder message that a voice transmission exists.
In this embodiment, when the transmission rate is lower than the preset transmission rate, a suggestion message for enhancing the network signal strength is sent to the first terminal or the second terminal. Specifically, the message may suggest how the first terminal or the second terminal can enhance its network signal, which helps raise the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal and thereby improves the quality of the voice call.
Optionally, the suggestion message includes a recommended connection network or a recommended movement route.
In this embodiment, the recommended connection network is another connectable network recommended to the first terminal or the second terminal. The recommended movement route indicates the position to which the first terminal or the second terminal can move so that its network signal is enhanced.
Further, in another embodiment of the invention, the recommended connection network can be obtained in the following manner, and the method further includes:
Obtaining the connectable networks around the first terminal or the second terminal, and taking a network among them whose network signal strength is greater than a network signal strength threshold as the recommended connection network; or
Obtaining the connectable networks around the first terminal or the second terminal, and taking the network among them with the strongest network signal strength as the recommended connection network; or
Obtaining the connectable networks around the first terminal or the second terminal, obtaining the secure networks among them, and taking a secure network whose network signal strength is greater than the network signal strength threshold as the recommended network; or
Obtaining the connectable networks around the first terminal or the second terminal, obtaining the history connection networks among them, and taking a history connection network whose network signal strength is greater than the network signal strength threshold as the recommended network.
Here, a history connection network is a network to which the first terminal or the second terminal was previously connected.
By taking as the recommended network a secure network whose network signal strength is greater than the network signal strength threshold, a secure network is obtained and the first terminal or the second terminal connects to a secure network, which avoids network security problems.
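The selection rules above can be sketched as one filter-then-pick function. This is only an illustration: the `Network` type, its field names and the -70 dBm default threshold are assumptions, and the sketch combines the alternatives by returning the strongest network among those that pass the chosen filters.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Network:
    ssid: str
    signal_dbm: int
    secure: bool = False
    previously_connected: bool = False


def recommend_network(networks: Iterable[Network],
                      threshold_dbm: int = -70,
                      require_secure: bool = False,
                      require_history: bool = False) -> Optional[Network]:
    """Return the strongest network above the threshold that passes the filters."""
    candidates = [
        n for n in networks
        if n.signal_dbm > threshold_dbm
        and (n.secure or not require_secure)
        and (n.previously_connected or not require_history)
    ]
    # No candidate means there is nothing to recommend.
    return max(candidates, key=lambda n: n.signal_dbm, default=None)


nets = [
    Network("cafe", -60),
    Network("home", -65, secure=True, previously_connected=True),
    Network("far", -85, secure=True),
]
print(recommend_network(nets, require_secure=True))
```

Setting `require_secure` or `require_history` corresponds to the secure-network and history-connection-network variants, and leaving both off corresponds to the plain threshold and strongest-signal variants.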
Further, in another embodiment of the invention, the recommended movement route can be obtained in the following manner, and the method further includes:
Obtaining the first position at which the first terminal or the second terminal is located and the available connection networks around the first terminal or the second terminal;
Obtaining the second position of the available connection network;
Taking the first position as the starting position and the second position as the final position, and obtaining the movement route between the starting position and the final position as the recommended movement route.
The closer the first terminal or the second terminal is to a connectable network, the better the network signal strength it can obtain. For example, the closer it is to the router, the better the network signal strength.
In this embodiment, obtaining the recommended movement route helps the first terminal or the second terminal move so that it has better network signal strength, which raises the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal and thereby improves the quality of the voice call.
The voice transmission method provided by the invention receives a voice call transmission instruction sent by a first terminal and, according to that instruction, obtains the voice information to be transmitted and the second terminal that is to receive it; obtains the transmission rate at which the voice information to be transmitted is transmitted; judges whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, performs speech recognition on the voice information to be transmitted to obtain a speech recognition result that includes the text information corresponding to the voice information to be transmitted; performs voice coding on the text information included in the speech recognition result to obtain target voice information; and transmits the target voice information to the second terminal. Because the text information included in the speech recognition result is encoded when the transmission rate is lower than the preset transmission rate, the voice content of the voice information to be transmitted is retained while the amount of information encoded is reduced. This helps the voice call proceed smoothly, improves the quality of the voice call, and avoids stuttering or dropped calls.
As shown in Fig. 2, Fig. 2 is a functional block diagram of a voice transmission device provided by an embodiment of the invention. The voice transmission device includes a receiving module 210, an obtaining module 220, a judging module 230, an identification module 240, a coding module 250 and a first transmission module 260. A module in the present invention is a series of computer program segments that can be executed by a processor, performs a fixed function, and is stored in the memory of a computer device. In this embodiment, the function of each module is described in detail in the following embodiments.
The receiving module 210 is configured to receive the voice call transmission instruction sent by the first terminal and, according to that instruction, obtain the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted.
In this embodiment, the first terminal and the second terminal may be the same kind of electronic device or different kinds of electronic device. For example, the first terminal and the second terminal are both mobile phones, or the first terminal is a mobile phone and the second terminal is a computer.
The voice call transmission instruction is an instruction for sending voice information between two terminals.
In this embodiment, the first terminal is the sender of the voice information, i.e. the calling party, and the second terminal is the recipient of the voice information, i.e. the called party.
In one possible embodiment, obtaining the voice information to be transmitted and the second terminal according to the voice call transmission instruction includes: obtaining the voice information to be transmitted and the receiving second terminal that the voice call transmission instruction indicates.
For example, the voice call transmission instruction contains the voice information to be transmitted and the recipient of that voice information, i.e. the second terminal that receives the transmitted voice information.
The obtaining module 220 is configured to obtain the transmission rate at which the voice information to be transmitted is transmitted.
The transmission rate, that is, the network transmission speed, is the rate at which a host on a computer network transmits data over a digital channel. For example, a transmission rate of 16bit/s means that 16 bits of data are transmitted per second.
In this embodiment, obtaining the transmission rate at which the voice information to be transmitted is transmitted includes: obtaining the sending rate of the first terminal or the receiving rate of the second terminal when the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice is transmitted through communication software, the transmission rate of the calling party is obtained, which reflects the data transmission rate at which the calling party sends voice information to the base station/server; alternatively, when voice is transmitted through communication software, the transmission rate of the called party is obtained, which reflects the data transmission rate at which the called party receives voice information.
The judging module 230 is configured to judge whether the transmission rate is lower than the preset transmission rate.
In this embodiment, judging whether the transmission rate is lower than the preset transmission rate determines whether the two communicating parties are in a poor network environment during voice transmission and whether the call quality will be affected.
The specific value of the preset transmission rate can be set in advance as needed.
Optionally, the preset transmission rate is 8kbit/s.
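The rate measurement and the comparison against the preset transmission rate can be sketched as follows. The function names are hypothetical, and averaging over a single measurement window is an assumption; the source does not say how the rate is sampled.

```python
PRESET_RATE_BPS = 8_000  # the optional 8 kbit/s preset transmission rate


def transmission_rate_bps(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Average transmission rate in bit/s over one measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_transferred * 8 / elapsed_seconds


def is_below_preset(rate_bps: float, preset_bps: float = PRESET_RATE_BPS) -> bool:
    """True when the link is slow enough to switch to text-based coding."""
    return rate_bps < preset_bps


rate = transmission_rate_bps(bytes_transferred=500, elapsed_seconds=1.0)
print(rate, is_below_preset(rate))
```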
The identification module 240 is configured to, if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, where the speech recognition result includes the text information corresponding to the voice information to be transmitted.
In this embodiment, speech recognition means converting a voice signal into the corresponding text information.
Specifically, speech recognition is performed on the voice information to be transmitted by means of speech recognition technology.
Optionally, in another embodiment of the invention, the identification module 240 performing speech recognition on the voice information to be transmitted includes:
Extracting the features of the voice information to be transmitted to obtain a feature vector that represents the voice information to be transmitted;
Inputting the feature vector into a preset acoustic model to obtain the phoneme information corresponding to the feature vector;
Inputting the phoneme information into a preset language model to obtain the elements that the phoneme information contains, the elements including a word sequence composed of characters or words;
Decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected as needed.
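The four recognition steps above form a simple pipeline, which the following sketch wires together. Every component here is a toy stand-in chosen only to show the data flow (features, then phonemes, then a word sequence, then text); none of the names or lookup tables come from the source, and real acoustic and language models would replace them.

```python
from typing import Callable, Dict, List, Sequence


def recognize(audio: Sequence[float],
              extract_features: Callable[[Sequence[float]], List[float]],
              acoustic_model: Callable[[List[float]], List[str]],
              language_model: Callable[[List[str]], List[str]],
              dictionary: Dict[str, str]) -> str:
    """Run feature extraction, acoustic model, language model and dictionary decoding."""
    features = extract_features(audio)
    phonemes = acoustic_model(features)
    words = language_model(phonemes)
    # Dictionary decoding: map each unit of the word sequence to its text form.
    return "".join(dictionary.get(w, w) for w in words)


def toy_features(audio: Sequence[float]) -> List[float]:
    return [round(x, 1) for x in audio]


def toy_acoustic_model(features: List[float]) -> List[str]:
    # Pretends every non-empty input was the same four phonemes.
    return ["n", "i", "h", "ao"] if features else []


def toy_language_model(phonemes: List[str]) -> List[str]:
    # Groups phonemes pairwise into syllables.
    return ["".join(phonemes[i:i + 2]) for i in range(0, len(phonemes), 2)]


toy_dictionary = {"ni": "你", "hao": "好"}
text = recognize([0.12, 0.5], toy_features, toy_acoustic_model,
                 toy_language_model, toy_dictionary)
print(text)
```

Passing the stages in as callables mirrors the statement that the acoustic model and language model can be selected as needed.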
The coding module 250 is configured to perform voice coding on the text information included in the speech recognition result to obtain the target voice information.
In this embodiment, performing voice coding on the text information included in the speech recognition result means encoding the text information, which differs from traditional coding of voice samples and can greatly reduce the data volume during transmission.
The traditional coding mode samples and encodes the frequency and amplitude of the sound, and the data volume transmitted under traditional coding is calculated as follows:
Data volume (bytes per second) = sample rate (Hz) * sample size (bit) * number of channels / 8
Taking 16 kHz mono audio with 16-bit samples as an example, the voice data size for 1 s is: 16000*16*1/8 = 32000 bytes (about 32 KB).
In this embodiment, the amount of voice data transmitted per second for the encoded target voice information is: data volume transmitted under this voice coding (byte per second) = number of characters spoken per second * corresponding character code size (bit), where the number of characters spoken per second is the number of characters per second that speech recognition finds in the voice information. Different characters (such as Chinese characters) have corresponding character code sizes, and the corresponding character code size can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information to be transmitted as an example, an ordinary person can speak 10 or fewer Chinese characters per second, and Chinese characters are encoded with a code size of 2 per Chinese character, so the data volume for 1 s is: 10*2=20bit. It can be seen that when this embodiment transmits voice information, the data volume transmitted per second is greatly reduced.
Optionally, in another embodiment of the invention, the speech recognition result further includes phonetic features of the voice information to be transmitted, and the phonetic features include the fundamental frequency;
The coding module 250 performing voice coding on the text information included in the speech recognition result includes:
Performing voice coding on the text information corresponding to the voice information to be transmitted together with the phonetic features of the voice information to be transmitted.
A phonetic feature is information that reflects the characteristics of the voice, for example its sound intensity, loudness or pitch.
Usually, when a person produces a voiced sound, the airflow through the glottis sets the vocal cords into relaxation-oscillation vibration and produces a quasi-periodic train of air pulses. This airflow excites the vocal tract and produces the voiced sound, also called voiced speech, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the fundamental frequency.
The fundamental frequency is related to the length, thickness, toughness and stiffness of the vocal cords and to pronunciation habits, and it largely reflects the characteristics of the individual speaker. Therefore, encoding in combination with the fundamental frequency in this embodiment retains the characteristics of the voice as far as possible while ensuring that the content is transmitted accurately.
In this embodiment, the fundamental frequency of the voice information can be obtained by the cepstrum method.
In this embodiment, performing voice coding on the text information corresponding to the voice information to be transmitted together with the phonetic features of that voice information means encoding the text information in combination with the phonetic features. This also differs from traditional coding of voice samples and can greatly reduce the data volume during transmission.
In this embodiment, the amount of voice data transmitted per second for the encoded target voice information is: data volume transmitted under this voice coding (byte per second) = number of characters spoken per second * corresponding character code size (bit) + phonetic features (depending on the phonetic features extracted, for example 10bit/s), where the number of characters spoken per second is the number of characters per second that speech recognition finds in the voice information. Different characters (such as Chinese characters) have corresponding character code sizes, and the corresponding character code size can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information to be transmitted as an example, an ordinary person can speak 10 or fewer Chinese characters per second, and Chinese characters are encoded with a code size of 2 per Chinese character, so the data volume for 1 s is: 10*2+10=30bit. It can be seen that when this embodiment transmits voice information, the data volume transmitted per second is greatly reduced.
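One way to picture coding the text together with the fundamental frequency is a small binary payload, sketched below together with the matching decoder for the receiving terminal. The payload layout (a 4-byte header followed by the text), the function names and the choice of the GB2312 character set, which uses 2 bytes per Chinese character, are assumptions for illustration and are not specified by the source.

```python
import struct
from typing import Tuple


def encode_target_voice(text: str, f0_hz: float) -> bytes:
    """Pack recognized text and its fundamental frequency into one payload."""
    body = text.encode("gb2312")  # assumed character set: 2 bytes per Chinese character
    # Header: 2-byte text length and 2-byte fundamental frequency in Hz, big-endian.
    return struct.pack(">HH", len(body), round(f0_hz)) + body


def decode_target_voice(payload: bytes) -> Tuple[str, float]:
    """Recover the text and fundamental frequency on the receiving terminal."""
    length, f0 = struct.unpack(">HH", payload[:4])
    return payload[4:4 + length].decode("gb2312"), float(f0)


payload = encode_target_voice("你好", 199.6)
print(len(payload), decode_target_voice(payload))
```

The decoded text and fundamental frequency would then be handed to a speech synthesizer to restore the voice; that synthesis step is outside this sketch.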
The first transmission module 260 is configured to transmit the target voice information to the second terminal.
In an optional embodiment, after the second terminal receives the target voice information, it decodes the target voice information, that is, it restores the received text information (or the text information and phonetic features) to voice.
In an optional embodiment, if part of the restored voice has no content, that part is filled with white noise. White noise is a sound; specifically, it is noise whose power spectral density is uniformly distributed over the entire frequency domain.
Filling the restored sound with white noise prevents the user, when listening to the restored voice on the second terminal, from mistaking silence for an interrupted call and performing an erroneous operation (such as exiting).
With this embodiment, although features such as audio and volume are lost in the encoding process, the voice content can still be largely retained when the network is very poor, which avoids intermittent voice, lost voice content or even a call that cannot proceed.
In another embodiment of the invention, the device further includes:
A reminding module, configured to, if the transmission rate is lower than the preset transmission rate, send to the first terminal or the second terminal a suggestion message for enhancing the network signal strength, or send to the second terminal a reminder message that a voice transmission exists.
In this embodiment, when the transmission rate is lower than the preset transmission rate, a suggestion message for enhancing the network signal strength is sent to the first terminal or the second terminal. Specifically, the message may suggest how the first terminal or the second terminal can enhance its network signal, which helps raise the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal and thereby improves the quality of the voice call.
Optionally, the suggestion message includes a recommended connection network or a recommended movement route.
In this embodiment, the recommended connection network is another connectable network recommended to the first terminal or the second terminal. The recommended movement route indicates the position to which the first terminal or the second terminal can move so that its network signal is enhanced.
Further, in another embodiment of the invention, the recommended connection network can be obtained by a recommending module, and the recommending module is configured to:
Obtain the connectable networks around the first terminal or the second terminal, and take a network among them whose network signal strength is greater than a network signal strength threshold as the recommended connection network; or
Obtain the connectable networks around the first terminal or the second terminal, and take the network among them with the strongest network signal strength as the recommended connection network; or
Obtain the connectable networks around the first terminal or the second terminal, obtain the secure networks among them, and take a secure network whose network signal strength is greater than the network signal strength threshold as the recommended network; or
Obtain the connectable networks around the first terminal or the second terminal, obtain the history connection networks among them, and take a history connection network whose network signal strength is greater than the network signal strength threshold as the recommended network.
Here, a history connection network is a network to which the first terminal or the second terminal was previously connected.
By taking as the recommended network a secure network whose network signal strength is greater than the network signal strength threshold, a secure network is obtained and the first terminal or the second terminal connects to a secure network, which avoids network security problems.
Further, in another embodiment of the invention, the recommended movement route can also be obtained by the recommending module, and the recommending module is further configured to:
Obtain the first position at which the first terminal or the second terminal is located and the available connection networks around the first terminal or the second terminal;
Obtain the second position of the available connection network;
Take the first position as the starting position and the second position as the final position, and obtain the movement route between the starting position and the final position as the recommended movement route.
The closer the first terminal or the second terminal is to a connectable network, the better the network signal strength it can obtain. For example, the closer it is to the router, the better the network signal strength.
In this embodiment, obtaining the recommended movement route helps the first terminal or the second terminal move so that it has better network signal strength, which raises the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal and thereby improves the quality of the voice call.
Optionally, in another embodiment of the invention, the device further includes a second transmission module, and the second transmission module is configured to:
After it is judged whether the transmission rate is lower than the preset transmission rate, if the transmission rate is higher than the preset transmission rate, judge whether the transmission rate is lower than a first transmission rate;
If the transmission rate is lower than the first transmission rate, encode the voice information to be transmitted by the GIA coding standard and transmit it;
If the transmission rate is higher than the first transmission rate, judge whether the transmission rate is lower than a second transmission rate;
If the transmission rate is lower than the second transmission rate, encode the voice information to be transmitted by the GSM coding standard and transmit it;
If the transmission rate is higher than the second transmission rate, judge whether the transmission rate is lower than a third transmission rate;
If the transmission rate is lower than the third transmission rate, encode the voice information to be transmitted by the G.728 coding standard and transmit it;
If the transmission rate is higher than the third transmission rate, judge whether the transmission rate is lower than a fourth transmission rate;
If the transmission rate is lower than the fourth transmission rate, encode the voice information to be transmitted by the G.721 coding standard and transmit it;
If the transmission rate is higher than the fourth transmission rate, judge whether the transmission rate is lower than a fifth transmission rate;
If the transmission rate is lower than the fifth transmission rate, encode the voice information to be transmitted by the G.722 coding standard and transmit it;
If the transmission rate is higher than the fifth transmission rate, encode the voice information to be transmitted by the MPE coding standard and transmit it.
Coding is the process of representing information with codes. In digital coding, the frequency value of the sound at a given point and the energy value at that frequency are extracted and digitally quantized. Relative to natural signals, any digital audio coding scheme is lossy. The coding mode with the highest fidelity at present is PCM coding, which can approximate the original sound very closely, but PCM data is bulky and unfavorable for transmission. Therefore, during audio transmission, other forms of coding can be applied to compress the audio and improve the fluency of transmission.
In this embodiment, the voice information is encoded with different coding algorithms according to different coding standards.
For example, coding under the G.722 coding standard is implemented by the SB-ADPCM algorithm, coding under the G.721 coding standard by the ADPCM algorithm, coding under the G.728 coding standard by the LD-CELP algorithm, coding under the GSM coding standard by the RPE-LTP algorithm, and coding under the GIA coding standard by the VSELPC algorithm.
In this embodiment, different coding standards are used at different transmission rates, so that at each transmission rate the voice information is retained as fully as possible during transmission and the quality of the sound is improved.
Optionally, the first transmission rate is 13.2kbit/s, the second transmission rate is 16kbit/s, the third transmission rate is 32kbit/s, the fourth transmission rate is 64kbit/s, and the fifth transmission rate is 128kbit/s.
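The tiered choice of coding standard can be sketched as a table lookup over the optional rate values given above. The function name is hypothetical, the standard names are copied from the text as written, and treating a rate exactly equal to a boundary as belonging to the next higher tier is an assumption, since the text only covers the lower-than and higher-than cases.

```python
PRESET_RATE_KBPS = 8.0

# (upper bound in kbit/s, coding standard used below that bound)
TIERS = [
    (13.2, "GIA"),
    (16.0, "GSM"),
    (32.0, "G.728"),
    (64.0, "G.721"),
    (128.0, "G.722"),
]


def select_coding(rate_kbps: float) -> str:
    """Pick the coding approach for the measured transmission rate."""
    if rate_kbps < PRESET_RATE_KBPS:
        return "text"  # speech recognition plus text coding
    for upper_bound, standard in TIERS:
        if rate_kbps < upper_bound:
            return standard
    return "MPE"


for rate in (5, 10, 14, 20, 40, 100, 200):
    print(rate, select_coding(rate))
```

Keeping the thresholds in one ordered table makes it straightforward to preset different values for the five transmission rates.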
In the voice transmission device provided by the invention, the receiving module receives the voice call transmission instruction sent by the first terminal and, according to that instruction, obtains the voice information to be transmitted and the second terminal that is to receive it; the obtaining module obtains the transmission rate at which the voice information to be transmitted is transmitted; the judging module judges whether the transmission rate is lower than the preset transmission rate; if the transmission rate is lower than the preset transmission rate, the identification module performs speech recognition on the voice information to be transmitted to obtain a speech recognition result that includes the text information corresponding to the voice information to be transmitted; the coding module performs voice coding on the text information included in the speech recognition result to obtain the target voice information; and the first transmission module transmits the target voice information to the second terminal. Because the text information included in the speech recognition result is encoded when the transmission rate is lower than the preset transmission rate, the voice content of the voice information to be transmitted is retained while the amount of information encoded is reduced. This helps the voice call proceed smoothly, improves the quality of the voice call, and avoids stuttering or dropped calls.
The above integrated unit, when implemented in the form of a software function module, can be stored in a computer-readable storage medium. The software function module is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute some of the steps of the methods described in the embodiments of the invention.
As shown in Fig. 3, Fig. 3 is a structural diagram of a computer device in a preferred embodiment of the invention that implements the voice transmission method. The computer device includes at least one sending device 31, at least one memory 32, at least one processor 33, at least one receiving device 34 and at least one communication bus. The communication bus is used for connection and communication between these components.
The computer device is a device that can automatically perform numerical calculation and/or information processing according to instructions set or stored in advance. Its hardware includes but is not limited to a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device and the like. The computer device may also include a network device and/or user equipment. The network device includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) and composed of a large number of hosts or network servers, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
The computer device may be, but is not limited to, any electronic product capable of human-computer interaction with a user by means of a keyboard, a touchpad, a voice-control device, or the like, for example a terminal such as a tablet computer, a smartphone, or a monitoring device.
The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
The receiving device 34 and the sending device 31 may be wired ports or wireless devices, for example including an antenna device, and are used for data communication with other devices.
The memory 32 is used to store program code. The memory 32 may be a circuit with a storage function that has no physical form within an integrated circuit, such as RAM (random-access memory) or FIFO (first in, first out) memory. Alternatively, the memory 32 may be a memory with a physical form, such as a memory module, a TF card (TransFlash card), a smart media card, a secure digital card, a flash card, or another storage device.
The processor 33 may include one or more microprocessors or digital processors. The processor 33 can call the program code stored in the memory 32 to execute the relevant functions. For example, the modules described in Fig. 2 are program code stored in the memory 32 and executed by the processor 33 to implement a voice transmission method. The processor 33, also called a central processing unit (CPU), is a very-large-scale integrated circuit that serves as the computing core (Core) and the control unit (Control Unit).
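As an illustrative sketch only, and not the applicant's implementation, the low-rate branch executed by the processor 33 could be arranged as shown below. The names `handle_voice`, `recognize`, `encode_text` and `transmit`, the `RecognitionResult` type, and the pass-through behaviour when the rate is not below the preset rate are assumptions introduced for illustration. The 8 kbit/s preset rate is the value given in claim 4, and a rate exactly equal to it is treated here as not lower.

```python
from dataclasses import dataclass
from typing import Callable

PRESET_RATE_KBIT_S = 8.0  # preset transmission rate stated in claim 4


@dataclass
class RecognitionResult:
    text: str               # text corresponding to the voice information
    fundamental_hz: float   # voice feature (fundamental frequency) named in claim 2


def handle_voice(
    voice: bytes,
    rate_kbit_s: float,
    recognize: Callable[[bytes], RecognitionResult],
    encode_text: Callable[[RecognitionResult], bytes],
    transmit: Callable[[bytes], None],
) -> str:
    """Run the judgment, recognition, coding and transmission steps in order.

    All three callables are hypothetical stand-ins for the modules of Fig. 2.
    """
    if rate_kbit_s < PRESET_RATE_KBIT_S:
        result = recognize(voice)        # speech recognition -> text (+ features)
        target = encode_text(result)     # voice coding of the recognized text
        transmit(target)                 # send target voice information
        return "recognized"
    # Placeholder for the other branch: the voice is passed through unchanged.
    transmit(voice)
    return "direct"
```

With stub callables, a 5 kbit/s link takes the recognition branch and only the encoded text is handed to `transmit`, while a faster link bypasses recognition.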
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, from every point of view, the embodiments are to be regarded as illustrative and not restrictive, and the scope of the invention is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced in the invention. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by one unit or device through software or hardware. Words such as "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.
Claims (10)
1. A voice transmission method, characterized in that the method comprises:
receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted;
obtaining a transmission rate at which the voice information to be transmitted is transmitted;
judging whether the transmission rate is lower than a preset transmission rate;
if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
performing voice coding on the text information included in the speech recognition result to obtain target voice information; and
transmitting the target voice information to the second terminal.
2. The method according to claim 1, characterized in that the speech recognition result further includes a voice feature of the voice information to be transmitted, the voice feature including a fundamental frequency;
performing voice coding on the text information included in the speech recognition result comprises:
performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice feature of the voice information to be transmitted.
3. The method according to claim 2, characterized in that, after judging whether the transmission rate is lower than the preset transmission rate, the method further comprises:
if the transmission rate is higher than the preset transmission rate, judging whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted by the GIA coding standard and transmitting it;
if the transmission rate is higher than the first transmission rate, judging whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted by the GSM coding standard and transmitting it;
if the transmission rate is higher than the second transmission rate, judging whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted by the G.728 coding standard and transmitting it;
if the transmission rate is higher than the third transmission rate, judging whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted by the G.721 coding standard and transmitting it;
if the transmission rate is higher than the fourth transmission rate, judging whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted by the G.722 coding standard and transmitting it;
if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted by the MPE coding standard and transmitting it.
4. The method according to claim 3, characterized in that the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
5. The method according to any one of claims 1 to 4, characterized in that performing speech recognition on the voice information to be transmitted comprises:
extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
if the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for enhancing network signal strength, or sending to the second terminal a reminder message indicating that a voice transmission exists.
7. The method according to claim 6, characterized in that the suggestion message includes a recommended network to connect to or a recommended route to move along.
8. A voice transmission device, characterized in that the device comprises:
a receiving module, configured to receive a voice call transmission instruction sent by a first terminal, and to obtain, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted;
an obtaining module, configured to obtain a transmission rate at which the voice information to be transmitted is transmitted;
a judgment module, configured to judge whether the transmission rate is lower than a preset transmission rate;
a recognition module, configured to, if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
a coding module, configured to perform voice coding on the text information included in the speech recognition result to obtain target voice information;
a first transmission module, configured to transmit the target voice information to the second terminal.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory being used to store at least one instruction, and the processor being used to execute the at least one instruction to implement the voice transmission method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the voice transmission method according to any one of claims 1 to 7.
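The rate ladder recited in claims 1, 3 and 4 can be illustrated with a small lookup. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `select_scheme` is hypothetical, the codec labels are copied as they appear in claim 3, and a rate exactly equal to a threshold (a case the claims do not address) is assigned to the higher tier.

```python
from bisect import bisect_right

# Thresholds in kbit/s taken from claim 4: preset rate, then first to fifth rates.
THRESHOLDS_KBIT_S = [8.0, 13.2, 16.0, 32.0, 64.0, 128.0]

# Scheme used below each threshold, plus one scheme above the last threshold.
# Labels are reproduced verbatim from claims 1 and 3.
SCHEMES = [
    "speech recognition + text coding",  # below the preset rate (claim 1)
    "GIA",    # preset rate up to the first rate
    "GSM",    # first rate up to the second rate
    "G.728",  # second rate up to the third rate
    "G.721",  # third rate up to the fourth rate
    "G.722",  # fourth rate up to the fifth rate
    "MPE",    # above the fifth rate
]


def select_scheme(rate_kbit_s: float) -> str:
    """Return the scheme for a measured transmission rate.

    bisect_right counts how many thresholds are <= the rate, which is
    exactly the index of the tier the rate falls into.
    """
    return SCHEMES[bisect_right(THRESHOLDS_KBIT_S, rate_kbit_s)]
```

For example, `select_scheme(5.0)` falls below the preset rate and selects the recognition branch, while `select_scheme(20.0)` lies between the second and third rates and selects G.728. A sorted threshold list with binary search keeps the six nested judgments of claim 3 in one table, so changing a threshold does not require restructuring the branches.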
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910459488.7A CN110364170B (en) | 2019-05-29 | 2019-05-29 | Voice transmission method, voice transmission device, computer device and storage medium |
PCT/CN2019/118022 WO2020238058A1 (en) | 2019-05-29 | 2019-11-13 | Voice transmission method and apparatus, computer device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910459488.7A CN110364170B (en) | 2019-05-29 | 2019-05-29 | Voice transmission method, voice transmission device, computer device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110364170A (en) | 2019-10-22 |
CN110364170B (en) | 2024-01-30 |
Family
ID=68215394
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910459488.7A Active CN110364170B (en) | 2019-05-29 | 2019-05-29 | Voice transmission method, voice transmission device, computer device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110364170B (en) |
WO (1) | WO2020238058A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199747A (en) * | 2020-03-05 | 2020-05-26 | 北京花兰德科技咨询服务有限公司 | Artificial intelligence communication system and communication method |
CN111245868A (en) * | 2020-03-10 | 2020-06-05 | 诺领科技(南京)有限公司 | Narrowband Internet of things voice message communication method and system |
CN111785293A (en) * | 2020-06-04 | 2020-10-16 | 杭州海康威视***技术有限公司 | Voice transmission method, device and equipment and storage medium |
WO2020238058A1 (en) * | 2019-05-29 | 2020-12-03 | 平安科技(深圳)有限公司 | Voice transmission method and apparatus, computer device and storage medium |
CN112202803A (en) * | 2020-10-10 | 2021-01-08 | 北京字节跳动网络技术有限公司 | Audio processing method, device, terminal and storage medium |
CN112822297A (en) * | 2021-04-01 | 2021-05-18 | 深圳市顺易通信息科技有限公司 | Parking lot service data transmission method and related equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08116385A (en) * | 1994-10-14 | 1996-05-07 | Hitachi Ltd | Individual information terminal equipment and voice response system |
CN103714823A (en) * | 2013-12-19 | 2014-04-09 | 同济大学 | Integrated speech coding-based adaptive underwater communication method |
WO2016119560A1 (en) * | 2015-01-29 | 2016-08-04 | ***通信集团公司 | Self-adaptive audio transmission method and device |
CN106850615A (en) * | 2017-01-24 | 2017-06-13 | 华为技术有限公司 | A kind of method of code rate control, relevant apparatus and system |
CN107066477A (en) * | 2016-12-13 | 2017-08-18 | 合网络技术(北京)有限公司 | A kind of method and device of intelligent recommendation video |
CN107770387A (en) * | 2017-10-31 | 2018-03-06 | 珠海市魅族科技有限公司 | Communication control method, device, computer installation and computer-readable recording medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080162150A1 (en) * | 2006-12-28 | 2008-07-03 | Vianix Delaware, Llc | System and Method for a High Performance Audio Codec |
CN102790997B (en) * | 2011-05-19 | 2017-05-10 | 中兴通讯股份有限公司 | Method and device for transmission of adaptive multi-rate (AMR) voice data |
CN109712631B (en) * | 2019-03-28 | 2019-06-28 | 南昌黑鲨科技有限公司 | Audio data transfer control method, device, system and readable storage medium storing program for executing |
CN110364170B (en) * | 2019-05-29 | 2024-01-30 | 平安科技(深圳)有限公司 | Voice transmission method, voice transmission device, computer device and storage medium |
2019
- 2019-05-29 CN CN201910459488.7A patent/CN110364170B/en active Active
- 2019-11-13 WO PCT/CN2019/118022 patent/WO2020238058A1/en active Application Filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020238058A1 (en) * | 2019-05-29 | 2020-12-03 | 平安科技(深圳)有限公司 | Voice transmission method and apparatus, computer device and storage medium |
CN111199747A (en) * | 2020-03-05 | 2020-05-26 | 北京花兰德科技咨询服务有限公司 | Artificial intelligence communication system and communication method |
CN111245868A (en) * | 2020-03-10 | 2020-06-05 | 诺领科技(南京)有限公司 | Narrowband Internet of things voice message communication method and system |
CN111245868B (en) * | 2020-03-10 | 2021-04-13 | 诺领科技(南京)有限公司 | Narrowband Internet of things voice message communication method and system |
CN111785293A (en) * | 2020-06-04 | 2020-10-16 | 杭州海康威视***技术有限公司 | Voice transmission method, device and equipment and storage medium |
CN111785293B (en) * | 2020-06-04 | 2023-04-25 | 杭州海康威视***技术有限公司 | Voice transmission method, device and equipment and storage medium |
CN112202803A (en) * | 2020-10-10 | 2021-01-08 | 北京字节跳动网络技术有限公司 | Audio processing method, device, terminal and storage medium |
CN112822297A (en) * | 2021-04-01 | 2021-05-18 | 深圳市顺易通信息科技有限公司 | Parking lot service data transmission method and related equipment |
Also Published As
Publication number | Publication date |
---|---|
CN110364170B (en) | 2024-01-30 |
WO2020238058A1 (en) | 2020-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110364170A (en) | Voice transmission method, device, computer installation and storage medium | |
CN110782882B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN103853703B (en) | A kind of information processing method and electronic equipment | |
CN1983909B (en) | Method and device for hiding throw-away frame | |
CN104781879B (en) | Method and apparatus for being encoded to audio signal | |
CN106504742B (en) | Synthesize transmission method, cloud server and the terminal device of voice | |
CN109599092B (en) | Audio synthesis method and device | |
CN106409283A (en) | Audio frequency-based man-machine mixed interaction system and method | |
CN109473104B (en) | Voice recognition network delay optimization method and device | |
EP4012705A1 (en) | Speech transmission method, system, and apparatus, computer readable storage medium, and device | |
CN113724683B (en) | Audio generation method, computer device and computer readable storage medium | |
CN103915097B (en) | Voice signal processing method, device and system | |
CN114338623B (en) | Audio processing method, device, equipment and medium | |
CN110119514A (en) | The instant translation method of information, device and system | |
CN107731232A (en) | Voice translation method and device | |
CN110797004B (en) | Data transmission method and device | |
CN114333862B (en) | Audio encoding method, decoding method, device, equipment, storage medium and product | |
CN112712793A (en) | ASR (error correction) method based on pre-training model under voice interaction and related equipment | |
RU2005127871A (en) | QUANTIZING CLASSES FOR DISTRIBUTED SPEECH RECOGNITION | |
JP4437011B2 (en) | Speech encoding device | |
CN111862967B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN115713939A (en) | Voice recognition method and device and electronic equipment | |
CN113658581B (en) | Acoustic model training method, acoustic model processing method, acoustic model training device, acoustic model processing equipment and storage medium | |
CN114842857A (en) | Voice processing method, device, system, equipment and storage medium | |
CN116935851A (en) | Method and device for voice conversion, voice conversion system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||