CN110364170A - Voice transmission method, device, computer installation and storage medium - Google Patents

Info

Publication number
CN110364170A
Authority
CN
China
Prior art keywords
transmission rate
transmitted
voice
voice messaging
transmission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910459488.7A
Other languages
Chinese (zh)
Other versions
CN110364170B (en
Inventor
邹昆伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910459488.7A priority Critical patent/CN110364170B/en
Publication of CN110364170A publication Critical patent/CN110364170A/en
Priority to PCT/CN2019/118022 priority patent/WO2020238058A1/en
Application granted granted Critical
Publication of CN110364170B publication Critical patent/CN110364170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0002 Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a voice transmission method, comprising: receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information; obtaining the transmission rate at which the voice information is transmitted; determining whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to obtain a speech recognition result, the speech recognition result including the text information corresponding to the voice information; performing voice coding on the text information included in the speech recognition result to obtain target voice information; and transmitting the target voice information to the second terminal. The invention also discloses a voice transmission apparatus, a computer device and a computer-readable storage medium. The present invention can improve the quality of voice calls.

Description

Voice transmission method, apparatus, computer device and storage medium
Technical field
The present invention relates to the field of communication technology, and in particular to a voice transmission method, apparatus, computer device and storage medium.
Background
With the development of computer technology and the spread of mobile terminals, voice call products have become increasingly common. When network conditions are good, these products deliver good call quality. When network conditions are poor, however, discontinuous transmission can cause the audio to stutter, which lowers the quality of the voice call and degrades the user experience.
Summary of the invention
In view of the foregoing, it is necessary to provide a voice transmission method, apparatus, computer device and storage medium that can improve the quality of voice calls.
The present invention provides a voice transmission method, the method comprising:
Receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
Obtaining the transmission rate at which the voice information to be transmitted is transmitted;
Determining whether the transmission rate is lower than a preset transmission rate;
If the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including the text information corresponding to the voice information to be transmitted;
Performing voice coding on the text information included in the speech recognition result to obtain target voice information;
Transmitting the target voice information to the second terminal.
In an optional implementation of the present invention, the speech recognition result further includes voice features of the voice information to be transmitted, the voice features including the fundamental frequency;
Performing voice coding on the text information included in the speech recognition result comprises:
Performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
In an optional implementation of the present invention, after determining whether the transmission rate is lower than the preset transmission rate, the method further comprises:
If the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
If the transmission rate is lower than the first transmission rate, encoding and transmitting the voice information to be transmitted using the GIA coding standard;
If the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
If the transmission rate is lower than the second transmission rate, encoding and transmitting the voice information to be transmitted using the GSM coding standard;
If the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
If the transmission rate is lower than the third transmission rate, encoding and transmitting the voice information to be transmitted using the G.728 coding standard;
If the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
If the transmission rate is lower than the fourth transmission rate, encoding and transmitting the voice information to be transmitted using the G.721 coding standard;
If the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
If the transmission rate is lower than the fifth transmission rate, encoding and transmitting the voice information to be transmitted using the G.722 coding standard;
If the transmission rate is higher than the fifth transmission rate, encoding and transmitting the voice information to be transmitted using the MPE coding standard.
In an optional implementation of the present invention, the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
In an optional embodiment of the present invention, performing speech recognition on the voice information to be transmitted comprises:
Extracting features of the voice information to be transmitted to obtain feature vectors representing the voice information to be transmitted;
Inputting the feature vectors into a preset acoustic model to obtain the phoneme information corresponding to the feature vectors;
Inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
Decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
In an optional embodiment of the present invention, the method further comprises:
If the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for enhancing network signal strength, or sending to the second terminal a reminder message indicating that a voice transmission is present.
In an optional embodiment of the present invention, the suggestion message includes a recommended network to connect to or a recommended movement route.
The present invention also provides a voice transmission apparatus, the apparatus comprising:
A receiving module, configured to receive a voice call transmission instruction sent by a first terminal and to obtain, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
An obtaining module, configured to obtain the transmission rate at which the voice information to be transmitted is transmitted;
A judgment module, configured to determine whether the transmission rate is lower than a preset transmission rate;
An identification module, configured to perform speech recognition on the voice information to be transmitted if the transmission rate is lower than the preset transmission rate, to obtain a speech recognition result, the speech recognition result including the text information corresponding to the voice information to be transmitted;
A coding module, configured to perform voice coding on the text information included in the speech recognition result to obtain target voice information;
A first transmission module, configured to transmit the target voice information to the second terminal.
In an optional embodiment of the present invention, the speech recognition result further includes voice features of the voice information to be transmitted, the voice features including the fundamental frequency;
The coding module performing voice coding on the text information included in the speech recognition result comprises:
Performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
In an optional embodiment of the present invention, the apparatus further includes a second transmission module, the second transmission module being configured to:
After it is determined whether the transmission rate is lower than the preset transmission rate, if the transmission rate is higher than the preset transmission rate, determine whether the transmission rate is lower than a first transmission rate;
If the transmission rate is lower than the first transmission rate, encode and transmit the voice information to be transmitted using the GIA coding standard;
If the transmission rate is higher than the first transmission rate, determine whether the transmission rate is lower than a second transmission rate;
If the transmission rate is lower than the second transmission rate, encode and transmit the voice information to be transmitted using the GSM coding standard;
If the transmission rate is higher than the second transmission rate, determine whether the transmission rate is lower than a third transmission rate;
If the transmission rate is lower than the third transmission rate, encode and transmit the voice information to be transmitted using the G.728 coding standard;
If the transmission rate is higher than the third transmission rate, determine whether the transmission rate is lower than a fourth transmission rate;
If the transmission rate is lower than the fourth transmission rate, encode and transmit the voice information to be transmitted using the G.721 coding standard;
If the transmission rate is higher than the fourth transmission rate, determine whether the transmission rate is lower than a fifth transmission rate;
If the transmission rate is lower than the fifth transmission rate, encode and transmit the voice information to be transmitted using the G.722 coding standard;
If the transmission rate is higher than the fifth transmission rate, encode and transmit the voice information to be transmitted using the MPE coding standard.
In an optional embodiment of the present invention, the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
In an optional embodiment of the present invention, the identification module performing speech recognition on the voice information to be transmitted comprises:
Extracting features of the voice information to be transmitted to obtain feature vectors representing the voice information to be transmitted;
Inputting the feature vectors into a preset acoustic model to obtain the phoneme information corresponding to the feature vectors;
Inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
Decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
In an optional embodiment of the present invention, the apparatus further comprises:
A reminding module, configured to, if the transmission rate is lower than the preset transmission rate, send to the first terminal or the second terminal a suggestion message for enhancing network signal strength, or send to the second terminal a reminder message indicating that a voice transmission is present.
In an optional embodiment of the present invention, the suggestion message includes a recommended network to connect to or a recommended movement route.
The present invention also provides a computer device. The computer device includes a memory and a processor; the memory is configured to store at least one instruction, and the processor is configured to execute the at least one instruction to implement the voice transmission method described in any embodiment.
The present invention also provides a computer-readable storage medium. The computer-readable storage medium stores at least one instruction which, when executed by a processor, implements the voice transmission method described in any embodiment.
As can be seen from the above technical solutions, the present invention receives a voice call transmission instruction sent by a first terminal and, according to that instruction, obtains the voice information to be transmitted and the second terminal that is to receive it; obtains the transmission rate at which the voice information is transmitted; determines whether the transmission rate is lower than a preset transmission rate; if so, performs speech recognition on the voice information to obtain a speech recognition result that includes the corresponding text information; performs voice coding on that text information to obtain target voice information; and transmits the target voice information to the second terminal. Because it is the text information in the speech recognition result that is encoded when the transmission rate is lower than the preset transmission rate, the voice content of the voice information is preserved while the amount of information to be encoded is reduced. This helps the call proceed smoothly, avoids stuttering or interruption during the call, and thereby improves the quality of the voice call.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a voice transmission method provided by an embodiment of the present invention;
Fig. 2 is a functional block diagram of a voice transmission apparatus provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device of a preferred embodiment of the present invention for implementing the voice transmission method.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To make the above objects, features and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of a voice transmission method provided by an embodiment of the present invention. Depending on requirements, the order of the steps in the flowchart may be changed and certain steps may be omitted.
S11: receive a voice call transmission instruction sent by a first terminal, and obtain, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted.
In this embodiment, the first terminal and the second terminal may be the same kind of electronic device or different kinds of electronic device. For example, the first terminal and the second terminal are both mobile phones, or the first terminal is a mobile phone and the second terminal is a computer.
The voice call transmission instruction is an instruction for sending voice information between two terminals.
In this embodiment, the first terminal is the sender of the voice information, that is, the calling party, and the second terminal is the recipient of the voice information, that is, the called party.
In one possible embodiment, obtaining the voice information to be transmitted and the second terminal according to the voice call transmission instruction includes: obtaining the voice information to be transmitted and the receiving second terminal that are indicated by the voice call transmission instruction.
For example, the voice call transmission instruction contains the voice information to be transmitted and the recipient of that voice information, that is, the second terminal that receives the transmitted voice information.
S12: obtain the transmission rate at which the voice information to be transmitted is transmitted.
The transmission rate, that is, the network transmission speed, is the rate at which a host on a computer network transmits data over a digital channel. For example, a transmission rate of 16 bit/s means that 16 bits of data are transmitted per second.
In this embodiment, obtaining the transmission rate at which the voice information to be transmitted is transmitted includes: obtaining the sending rate of the first terminal, or the receiving rate of the second terminal, when the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice is transmitted through communication software, the transmission rate of the calling party is obtained; this rate reflects the data transmission rate at which the calling party sends voice information to the base station or server. Alternatively, when voice is transmitted through communication software, the transmission rate of the called party is obtained; this rate reflects the data transmission rate at which the called party receives voice information.
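As an illustration of step S12, the following minimal sketch computes an average transmission rate from the number of bytes transferred over an observation window. The function name and the idea of averaging over a window are assumptions made for illustration; the patent does not specify how the rate is measured.

```python
def transmission_rate_kbit_s(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Average transmission rate in kbit/s over one observation window.

    Hypothetical helper: the source does not say how the rate is sampled.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_transferred * 8 / 1000 / elapsed_seconds


# 2,000 bytes sent in 2 s corresponds to 8 kbit/s.
print(transmission_rate_kbit_s(2000, 2.0))
```

The same helper could be fed either the sender-side or the receiver-side byte counts, matching the two options described above.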
S13: determine whether the transmission rate is lower than the preset transmission rate.
In this embodiment, determining whether the transmission rate is lower than the preset transmission rate establishes whether the two communicating parties are in a poor network environment during voice transmission, and therefore whether call quality will be affected.
The specific value of the preset transmission rate can be set in advance as needed.
Optionally, the preset transmission rate is 8 kbit/s.
Optionally, in another embodiment of the present invention, after determining whether the transmission rate is lower than the preset transmission rate, the method further comprises:
If the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
If the transmission rate is lower than the first transmission rate, encoding and transmitting the voice information to be transmitted using the GIA coding standard;
If the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
If the transmission rate is lower than the second transmission rate, encoding and transmitting the voice information to be transmitted using the GSM coding standard;
If the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
If the transmission rate is lower than the third transmission rate, encoding and transmitting the voice information to be transmitted using the G.728 coding standard;
If the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
If the transmission rate is lower than the fourth transmission rate, encoding and transmitting the voice information to be transmitted using the G.721 coding standard;
If the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
If the transmission rate is lower than the fifth transmission rate, encoding and transmitting the voice information to be transmitted using the G.722 coding standard;
If the transmission rate is higher than the fifth transmission rate, encoding and transmitting the voice information to be transmitted using the MPE coding standard.
Coding is the process of representing information with a code. In digital encoding, the frequency values of the sound at given points and the energy at those frequencies are extracted and quantized digitally. Relative to the natural signal, any digital audio coding scheme is lossy. The highest-fidelity coding currently available is PCM coding, which can approximate the original sound very closely, but PCM data is bulky and ill-suited to transmission. During audio transmission the audio can therefore be encoded in other forms to compress it and improve the smoothness of transmission.
In this embodiment, the voice information is encoded with different coding algorithms according to the different coding standards.
For example, the SB-ADPCM algorithm implements coding under the G.722 standard, the ADPCM algorithm implements coding under the G.721 standard, the LD-CELP algorithm implements coding under the G.728 standard, the RPE-LTP algorithm implements coding under the GSM standard, and the VSELPC algorithm implements coding under the GIA standard.
In this embodiment, a different coding standard is used at each transmission rate, so that as much of the voice information as possible is retained during transmission at that rate, improving sound quality.
Optionally, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
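The threshold ladder described above can be sketched as a simple lookup. This is a minimal illustration using the optional rate values from the text; the names `select_coding`, `CODEC_LADDER` and the label for the speech-recognition path are hypothetical. The source only covers rates strictly lower or strictly higher than each threshold, so treating a rate exactly equal to a threshold as "not lower" is an assumption.

```python
PRESET_RATE_KBIT_S = 8.0  # optional preset transmission rate from the text

# (upper bound in kbit/s, coding standard), in ascending order of rate.
CODEC_LADDER = [
    (13.2, "GIA"),     # first transmission rate
    (16.0, "GSM"),     # second transmission rate
    (32.0, "G.728"),   # third transmission rate
    (64.0, "G.721"),   # fourth transmission rate
    (128.0, "G.722"),  # fifth transmission rate
]
FALLBACK_STANDARD = "MPE"          # used above the fifth transmission rate
TEXT_PATH = "speech-recognition"   # hypothetical label for steps S14 to S16


def select_coding(rate_kbit_s: float) -> str:
    """Pick the coding path for a measured transmission rate.

    Assumption: a rate exactly equal to a threshold is treated as not lower
    than it, because the source leaves the equality case unspecified.
    """
    if rate_kbit_s < PRESET_RATE_KBIT_S:
        return TEXT_PATH
    for upper_bound, standard in CODEC_LADDER:
        if rate_kbit_s < upper_bound:
            return standard
    return FALLBACK_STANDARD


for rate in (6, 10, 14, 20, 48, 100, 256):
    print(rate, select_coding(rate))
```

Keeping the thresholds in a table rather than in nested conditionals makes it straightforward to change the rate values, which the text describes as optional.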
S14: if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including the text information corresponding to the voice information to be transmitted.
In this embodiment, speech recognition means converting a voice signal into the corresponding text information.
Specifically, speech recognition is performed on the voice information to be transmitted using speech recognition technology.
Optionally, in another embodiment of the present invention, performing speech recognition on the voice information to be transmitted comprises:
Extracting features of the voice information to be transmitted to obtain feature vectors representing the voice information to be transmitted;
Inputting the feature vectors into a preset acoustic model to obtain the phoneme information corresponding to the feature vectors;
Inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
Decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected as needed.
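The data flow of the four recognition steps can be sketched as follows. This is only a structural illustration: the `recognize` function and all of the `toy_*` stand-ins are hypothetical, and a real system would substitute trained acoustic and language models, which the patent leaves open.

```python
from typing import Callable, Dict, List, Sequence

FeatureVector = Sequence[float]


def recognize(
    samples: Sequence[float],
    extract_features: Callable[[Sequence[float]], List[FeatureVector]],
    acoustic_model: Callable[[FeatureVector], str],
    language_model: Callable[[List[str]], List[str]],
    dictionary: Dict[str, str],
) -> str:
    """Feature vectors -> phonemes -> word sequence -> text."""
    features = extract_features(samples)                    # step 1
    phonemes = [acoustic_model(vec) for vec in features]    # step 2
    word_sequence = language_model(phonemes)                # step 3
    return "".join(dictionary[w] for w in word_sequence)    # step 4


# Toy stand-ins so the sketch runs end to end; they are not real models.
def toy_features(samples: Sequence[float]) -> List[FeatureVector]:
    # One 1-dimensional "feature vector" per pair of samples (their mean).
    return [(sum(samples[i:i + 2]) / 2,) for i in range(0, len(samples), 2)]


def toy_acoustic_model(vec: FeatureVector) -> str:
    return "ni" if vec[0] < 0.5 else "hao"


def toy_language_model(phonemes: List[str]) -> List[str]:
    # Groups all phonemes into a single dictionary key.
    return [" ".join(phonemes)]


toy_dictionary = {"ni hao": "你好"}

print(recognize([0.1, 0.3, 0.8, 0.9], toy_features, toy_acoustic_model,
                toy_language_model, toy_dictionary))
```

Passing the models in as callables mirrors the statement that the acoustic model and language model can be selected as needed.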
S15: perform voice coding on the text information included in the speech recognition result to obtain target voice information.
In this embodiment, performing voice coding on the text information included in the speech recognition result means encoding the text information itself. Unlike traditional coding of voice samples, this can greatly reduce the amount of data transmitted.
Traditional coding samples and encodes the frequency and amplitude of the sound. The amount of data transmitted with traditional coding is calculated as follows:
Data volume (bytes per second) = sample rate (Hz) × sample size (bits) × number of channels / 8
Taking 16 kHz mono sampling with 16-bit samples as an example, the size of 1 s of voice data is 16000 × 16 × 1 / 8 = 32000 bytes, about 32 KB.
In this embodiment, the amount of data transmitted per second for the encoded target voice information is: data volume (bytes per second) = characters spoken per second × character code size (bytes). Here, the characters spoken per second is the number of characters per second in the voice information as determined by speech recognition. Different kinds of characters (such as Chinese characters) have corresponding character code sizes, which can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information as an example, an ordinary person can speak at most about 10 Chinese characters per second, and each Chinese character is encoded in 2 bytes, so the data volume for 1 s is 10 × 2 = 20 bytes. It can be seen that when this embodiment transmits voice information, the amount of data transmitted per second is greatly reduced.
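The comparison can be checked with a short worked calculation. The sketch below assumes 16-bit samples for the PCM case and 2 bytes per Chinese character for the text case (as in a double-byte encoding such as GB2312); the function names are illustrative only.

```python
def pcm_bytes_per_second(sample_rate_hz: int, sample_size_bits: int, channels: int) -> float:
    """Sample rate x sample size x channels / 8, in bytes per second."""
    return sample_rate_hz * sample_size_bits * channels / 8


def text_bytes_per_second(chars_per_second: int, bytes_per_char: int) -> int:
    """Characters spoken per second x bytes per character."""
    return chars_per_second * bytes_per_char


pcm = pcm_bytes_per_second(16000, 16, 1)  # 16 kHz, 16-bit, mono
text = text_bytes_per_second(10, 2)       # 10 characters/s, 2 bytes each (assumed)
print(pcm, text, pcm / text)
```

Under these assumptions the text representation is 1600 times smaller than the raw PCM stream.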
Optionally, in another embodiment of the present invention, the speech recognition result further includes voice features of the voice information to be transmitted, the voice features including the fundamental frequency;
Performing voice coding on the text information included in the speech recognition result comprises:
Performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
Voice features are information that reflects the characteristics of the voice, for example its sound intensity, loudness or pitch.
When a person produces a voiced sound, the airflow through the glottis makes the vocal cords vibrate in a relaxation-oscillation manner, producing a quasi-periodic train of air pulses. This airflow excites the vocal tract and produces the voiced sound, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the fundamental frequency.
The fundamental frequency depends on the length, thickness, toughness and stiffness of the vocal cords and on pronunciation habits, and to a large extent reflects the characteristics of the individual. Encoding with the fundamental frequency in this embodiment therefore retains the characteristics of the voice as far as possible while ensuring that the content is transmitted accurately.
In this embodiment, the fundamental frequency of the voice information can be obtained by the cepstrum method.
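A minimal sketch of cepstrum-based fundamental frequency estimation on a single frame follows. It uses only the standard library and a direct DFT for clarity, so it is slow on long frames; the function name, the search range of 60 to 400 Hz, the window choice and the log floor are all assumptions, not details from the patent.

```python
import cmath
import math
from typing import Sequence


def estimate_f0_cepstrum(frame: Sequence[float], sample_rate: int,
                         fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency (Hz) of one frame via the real cepstrum."""
    n = len(frame)
    # Hamming window (periodic form) to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    # Log-magnitude spectrum via a direct DFT, O(n^2); fine for a short frame.
    log_mag = []
    for k in range(n):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        log_mag.append(math.log(abs(acc) + 1e-3))  # small floor avoids log(0)
    # Real cepstrum = inverse DFT of the log-magnitude spectrum. Only the
    # quefrencies that correspond to plausible pitch periods are evaluated.
    q_min = int(sample_rate / fmax)
    q_max = min(int(sample_rate / fmin), n // 2)
    best_q, best_value = q_min, float("-inf")
    for q in range(q_min, q_max + 1):
        value = sum(log_mag[k] * math.cos(2 * math.pi * k * q / n)
                    for k in range(n)) / n
        if value > best_value:
            best_q, best_value = q, value
    return sample_rate / best_q


# Synthetic test frame: a pulse train with a 40-sample period at 8 kHz (200 Hz).
frame = [1.0 if i % 40 == 0 else 0.0 for i in range(400)]
print(estimate_f0_cepstrum(frame, 8000))
```

In practice an FFT would replace the direct DFT, and the estimate would be smoothed over consecutive voiced frames.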
In this embodiment, performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice feature of the voice information to be transmitted means encoding the text information combined with the voice feature. This also differs from traditional coding of voice samples and can greatly reduce the amount of data transmitted.
In this embodiment, the voice data transmitted per second for the encoded target voice information is: data volume transmitted under voice coding (bytes per second) = number of characters spoken per second * corresponding character code size (bit) + voice feature (depending on the voice features extracted, for example 10 bit/s). Here, the number of characters spoken per second is the number of characters per second that speech recognition finds in the voice information. Different characters (for example Chinese characters) have corresponding character code sizes, which can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information to be transmitted as an example, an ordinary person speaks no more than 10 Chinese characters per second, and each Chinese character is encoded as 2 characters, so the data volume for 1 s is 10*2+10 = 30 bit. It can be seen that when this embodiment transmits voice information, the amount of data transmitted per second is greatly reduced.
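The arithmetic of this formula can be checked with a few lines of code. This is only a sketch: the function and parameter names are hypothetical, and the units are left exactly as the text states them.

```python
def text_coding_data_volume(chars_per_second, char_code_size, feature_size=0):
    """Data volume per second = characters per second * code size + feature size.

    Units follow the text as written; the function only performs the arithmetic.
    """
    return chars_per_second * char_code_size + feature_size


text_only = text_coding_data_volume(10, 2)         # text information only
with_feature = text_coding_data_volume(10, 2, 10)  # text plus voice feature
print(text_only, with_feature)
```

With the example figures from the text, the two calls give 20 and 30 respectively.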
S16: transmitting the target voice information to the second terminal.
In an optional embodiment, after receiving the target voice information, the second terminal decodes it, that is, restores the received text information (or the text information and the voice feature) to voice.
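One way to picture the encode and decode steps is a small payload format that carries the text and, optionally, the fundamental frequency. The sketch below is an assumption throughout: the patent does not define a byte layout, the GBK encoding is chosen only because it stores a common Chinese character in 2 bytes, and the function names are hypothetical. Turning the recovered text back into audible speech (speech synthesis) is outside the sketch.

```python
import struct


def encode_target_voice(text, f0_hz=None):
    """Pack text (and an optional fundamental frequency) into bytes.

    Hypothetical layout: 1 flag byte, an optional 2-byte f0 in Hz, then GBK text.
    """
    body = text.encode("gbk")
    if f0_hz is None:
        return struct.pack("!B", 0) + body
    return struct.pack("!BH", 1, int(round(f0_hz))) + body


def decode_target_voice(payload):
    """Recover (text, f0_hz) from a payload produced by encode_target_voice."""
    if payload[0] == 1:
        (f0_hz,) = struct.unpack("!H", payload[1:3])
        return payload[3:].decode("gbk"), f0_hz
    return payload[1:].decode("gbk"), None
```

The receiving side would pass the recovered text, and the fundamental frequency when present, to a speech synthesizer to restore the voice.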
In an optional embodiment, if part of the restored voice has no content, it is filled with white noise. White noise is a sound whose power spectral density is uniformly distributed over the entire frequency domain.
Filling the restored sound with white noise prevents a user who is listening to the restored voice on the second terminal from mistaking silence for an interrupted call and performing an erroneous operation (such as exiting).
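The white-noise filling step can be sketched as follows. The frame size, silence threshold, and noise level are illustrative assumptions, not values from the patent, and the function name is hypothetical.

```python
import random


def fill_silence_with_white_noise(samples, frame_size=160, silence_threshold=1e-4,
                                  noise_level=0.01, seed=None):
    """Return a copy of samples in which silent frames are replaced by white noise.

    A frame counts as silent when its peak absolute amplitude is below
    silence_threshold. Mutually independent Gaussian samples have a flat
    power spectral density, which is what makes the filler white noise.
    """
    rng = random.Random(seed)
    out = list(samples)
    for start in range(0, len(out), frame_size):
        frame = out[start:start + frame_size]
        if max(abs(s) for s in frame) < silence_threshold:
            out[start:start + len(frame)] = [rng.gauss(0.0, noise_level) for _ in frame]
    return out
```

Frames that already contain speech are left untouched, so only the gaps receive low-level noise.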
With this embodiment, although features such as audio and volume are lost during encoding, the voice content can still be largely retained when the network is very poor. This avoids intermittent voice, lost voice content, or even a complete inability to converse during a voice call.
In another embodiment of the invention, the method further includes:
If the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a prompt message for enhancing network signal strength, or sending to the second terminal a reminder message that a voice transmission exists.
In this embodiment, when the transmission rate is lower than the preset transmission rate, a prompt message for enhancing network signal strength is sent to the first terminal or the second terminal. Specifically, the message may be a suggestion on how the first terminal or the second terminal can enhance its network signal. This helps raise the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal, and thus improves the quality of the voice call.
Optionally, the prompt message includes a recommended connection network or a recommended movement route.
In this embodiment, the recommended connection network is another connectable network recommended to the first terminal or the second terminal. The recommended movement route indicates the position to which the first terminal or the second terminal can move so that its network signal is enhanced.
Further, in another embodiment of the invention, the recommended connection network can be obtained in one of the following ways, and the method further includes:
Obtaining the connectable networks around the first terminal or the second terminal, and taking a network among them whose network signal strength is greater than a network signal strength threshold as the recommended connection network; or
Obtaining the connectable networks around the first terminal or the second terminal, and taking the network among them with the strongest network signal strength as the recommended connection network; or
Obtaining the connectable networks around the first terminal or the second terminal, obtaining the secure networks among them, and taking a secure network whose network signal strength is greater than the network signal strength threshold as the recommended connection network; or
Obtaining the connectable networks around the first terminal or the second terminal, obtaining the history connection networks among them, and taking a history connection network whose network signal strength is greater than the network signal strength threshold as the recommended connection network.
Here, a history connection network is a network to which the first terminal or the second terminal was once connected.
By taking a secure network whose network signal strength is greater than the network signal strength threshold as the recommended connection network, a safe network is obtained, so that the first terminal or the second terminal connects to a safe network and network security problems are avoided.
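The four selection strategies listed above can be sketched in one function. The `Network` record, the strategy names, and the rule of returning the strongest candidate when several qualify are assumptions for illustration; the text does not say how ties are resolved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Network:
    name: str
    signal_strength: float            # larger means stronger (for example dBm)
    secure: bool = False
    previously_connected: bool = False


def recommend_network(networks: List[Network], threshold: float,
                      strategy: str = "threshold") -> Optional[Network]:
    """Pick a recommended connection network using one of the four strategies."""
    if strategy == "strongest":
        candidates = list(networks)
    elif strategy == "threshold":
        candidates = [n for n in networks if n.signal_strength > threshold]
    elif strategy == "secure":
        candidates = [n for n in networks
                      if n.secure and n.signal_strength > threshold]
    elif strategy == "history":
        candidates = [n for n in networks
                      if n.previously_connected and n.signal_strength > threshold]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Assumption: when several networks qualify, return the strongest one.
    return max(candidates, key=lambda n: n.signal_strength, default=None)
```

The function returns `None` when no network satisfies the chosen strategy, in which case no recommended connection network would be included in the prompt message.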
Further, in another embodiment of the invention, the recommended movement route can be obtained in the following way, and the method further includes:
Obtaining a first position at which the first terminal or the second terminal is located, and an available connection network around the first terminal or the second terminal;
Obtaining a second position of the available connection network;
Taking the first position as the starting position and the second position as the end position, and obtaining the movement route between the starting position and the end position as the recommended movement route.
The closer the first terminal and the second terminal are to a connectable network, the better the network signal strength they can obtain. For example, the closer a terminal is to a router, the better its network signal strength.
In this embodiment, obtaining a recommended movement route helps the first terminal or the second terminal move so that it has better network signal strength. This helps raise the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal, and thus improves the quality of the voice call.
The voice transmission method provided by the invention receives a voice call transmission instruction sent by a first terminal and, according to the instruction, obtains the voice information to be transmitted and the second terminal that is to receive it; obtains the transmission rate at which the voice information to be transmitted is transmitted; judges whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, performs speech recognition on the voice information to be transmitted to obtain a speech recognition result that includes the corresponding text information; performs voice coding on the text information included in the speech recognition result to obtain target voice information; and transmits the target voice information to the second terminal. Because the text information included in the speech recognition result is encoded when the transmission rate is lower than the preset transmission rate, the voice content of the voice information to be transmitted is retained while the amount of information encoded is reduced. This supports a smooth conversation, improves the quality of the voice call, and avoids stuttering or dropped calls.
As shown in Fig. 2, Fig. 2 is a functional block diagram of a voice transmission device provided by an embodiment of the invention. The voice transmission device includes a receiving module 210, an obtaining module 220, a judging module 230, a recognition module 240, a coding module 250, and a first transmission module 260. A module in the sense of the invention is a series of computer program segments that can be executed by a processor to perform a fixed function and that are stored in the memory of the computer device. In this embodiment, the function of each module is described in detail in the following embodiments.
The receiving module 210 is configured to receive the voice call transmission instruction sent by the first terminal and, according to the voice call transmission instruction, obtain the voice information to be transmitted and the second terminal that receives the voice information to be transmitted.
In this embodiment, the first terminal and the second terminal may be the same kind of electronic device or different kinds of electronic devices. For example, both are mobile phones, or the first terminal is a mobile phone and the second terminal is a computer.
The voice call transmission instruction is an instruction for sending voice information between two terminals.
In this embodiment, the first terminal is the sender of the voice information, i.e. the calling party, and the second terminal is the recipient of the voice information, i.e. the called party.
In one possible embodiment, obtaining the voice information to be transmitted and the second terminal that receives it according to the voice call transmission instruction includes: obtaining the voice information to be transmitted and the second terminal that receives it, as indicated by the voice call transmission instruction.
For example, the voice call transmission instruction contains the voice information to be transmitted and the recipient of the voice information to be transmitted, i.e. the second terminal that receives the transmitted voice information.
The obtaining module 220 is configured to obtain the transmission rate at which the voice information to be transmitted is transmitted.
The transmission rate, i.e. the network transmission speed, is the rate at which a host on a computer network transmits data over a digital channel. For example, a transmission rate of 16 bit/s means that 16 bits of data are transmitted per second.
In this embodiment, obtaining the transmission rate at which the voice information to be transmitted is transmitted includes: obtaining the sending rate of the first terminal or the receiving rate of the second terminal when the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice is transmitted through communication software, the transmission rate of the calling party is obtained; this rate reflects the data transmission rate at which the calling party sends voice information to the base station/server. Alternatively, when voice is transmitted through communication software, the transmission rate of the called party is obtained; this rate reflects the data transmission rate at which the called party receives voice information.
The judging module 230 is configured to judge whether the transmission rate is lower than the preset transmission rate.
In this embodiment, judging whether the transmission rate is lower than the preset transmission rate serves to determine whether the two communicating parties are in a poor network environment during voice transmission and whether call quality will be affected.
The specific value of the preset transmission rate can be set in advance as needed.
Optionally, the preset transmission rate is 8 kbit/s.
The recognition module 240 is configured to, if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including the text information corresponding to the voice information to be transmitted.
In this embodiment, speech recognition means converting a voice signal into the corresponding text information.
Specifically, speech recognition is performed on the voice information to be transmitted by means of speech recognition technology.
Optionally, in another embodiment of the invention, the recognition module 240 performing speech recognition on the voice information to be transmitted includes:
Extracting features of the voice information to be transmitted to obtain feature vectors that represent the voice information to be transmitted;
Inputting the feature vectors into a preset acoustic model to obtain the phoneme information corresponding to the feature vectors;
Inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
Decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected as needed.
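The four recognition steps above form a simple chain, which the following sketch makes explicit. It shows only how the stages connect: the function name, the callable signatures, and the dictionary lookup are hypothetical, and real acoustic and language models would be supplied by whatever speech recognition toolkit is chosen.

```python
from typing import Callable, Dict, List, Sequence

FeatureVector = List[float]


def recognize_speech(
    samples: Sequence[float],
    extract_features: Callable[[Sequence[float]], List[FeatureVector]],
    acoustic_model: Callable[[List[FeatureVector]], List[str]],
    language_model: Callable[[List[str]], List[str]],
    dictionary: Dict[str, str],
) -> str:
    """Chain the four stages: features -> phonemes -> word sequence -> text."""
    features = extract_features(samples)   # feature vectors of the voice information
    phonemes = acoustic_model(features)    # phoneme information
    words = language_model(phonemes)       # word sequence of characters or words
    # Decode each element with the preset dictionary; unknown elements pass through.
    return "".join(dictionary.get(word, word) for word in words)
```

Passing the stages in as callables keeps the chain independent of any particular model, which matches the statement that the acoustic model and language model can be selected as needed.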
The coding module 250 is configured to perform voice coding on the text information included in the speech recognition result to obtain the target voice information.
In this embodiment, performing voice coding on the text information included in the speech recognition result means encoding the text information. This differs from traditional coding of voice samples and can greatly reduce the amount of data transmitted.
The traditional coding method samples and encodes the frequency and amplitude of the sound. The data volume transmitted under traditional coding is calculated as follows:
Data volume (bytes per second) = sample rate (Hz) * sample size (bit) * number of channels / 8
Taking mono audio with a 16 kHz sample rate and a 16-bit sample size as an example, the voice data size for 1 s is 16000*16*1/8 = 32000 bytes, i.e. about 32 KB.
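The traditional-coding formula can be written as a one-line function to check the example. The function name is hypothetical; the formula is the one stated above.

```python
def pcm_bytes_per_second(sample_rate_hz, sample_size_bits, channels):
    """Data volume of uncompressed sampled audio in bytes per second."""
    return sample_rate_hz * sample_size_bits * channels // 8


# 16 kHz sample rate, 16-bit samples, one channel (mono)
print(pcm_bytes_per_second(16000, 16, 1))
```

For the mono 16 kHz, 16-bit example this evaluates to 32000 bytes per second.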
In this embodiment, the voice data transmitted per second for the encoded target voice information is: data volume transmitted under voice coding (bytes per second) = number of characters spoken per second * corresponding character code size (bit). Here, the number of characters spoken per second is the number of characters per second that speech recognition finds in the voice information. Different characters (for example Chinese characters) have corresponding character code sizes, which can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information to be transmitted as an example, an ordinary person speaks no more than 10 Chinese characters per second, and each Chinese character is encoded as 2 characters, so the data volume for 1 s is 10*2 = 20 bit. It can be seen that when this embodiment transmits voice information, the amount of data transmitted per second is greatly reduced.
Optionally, in another embodiment of the invention, the speech recognition result further includes a voice feature of the voice information to be transmitted, and the voice feature includes the fundamental frequency;
The coding module 250 performing voice coding on the text information included in the speech recognition result includes:
Performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice feature of the voice information to be transmitted.
A voice feature is information that reflects the characteristics of the voice, for example its sound intensity, loudness, or pitch.
Usually, when a person produces a voiced sound, the airflow passing through the glottis makes the vocal cords vibrate in a relaxation-oscillation manner, producing a quasi-periodic pulse of air. This airflow excites the vocal tract and produces voiced sound, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the fundamental frequency.
The fundamental frequency is related to the length, thickness, toughness, and stiffness of the vocal cords and to pronunciation habits, and largely reflects the characteristics of the individual speaker. Therefore, encoding in combination with the fundamental frequency in this embodiment ensures accurate transmission of the content while retaining the characteristics of the voice as far as possible.
In this embodiment, the fundamental frequency of the voice information can be obtained by the cepstrum method.
In this embodiment, performing voice coding on the text information corresponding to the voice information to be transmitted together with the voice feature of the voice information to be transmitted means encoding the text information combined with the voice feature. This also differs from traditional coding of voice samples and can greatly reduce the amount of data transmitted.
In this embodiment, the voice data transmitted per second for the encoded target voice information is: data volume transmitted under voice coding (bytes per second) = number of characters spoken per second * corresponding character code size (bit) + voice feature (depending on the voice features extracted, for example 10 bit/s). Here, the number of characters spoken per second is the number of characters per second that speech recognition finds in the voice information. Different characters (for example Chinese characters) have corresponding character code sizes, which can be determined from a preset correspondence between characters and character code sizes.
Taking Chinese voice information to be transmitted as an example, an ordinary person speaks no more than 10 Chinese characters per second, and each Chinese character is encoded as 2 characters, so the data volume for 1 s is 10*2+10 = 30 bit. It can be seen that when this embodiment transmits voice information, the amount of data transmitted per second is greatly reduced.
The first transmission module 260 is configured to transmit the target voice information to the second terminal.
In an optional embodiment, after receiving the target voice information, the second terminal decodes it, that is, restores the received text information (or the text information and the voice feature) to voice.
In an optional embodiment, if part of the restored voice has no content, it is filled with white noise. White noise is a sound whose power spectral density is uniformly distributed over the entire frequency domain.
Filling the restored sound with white noise prevents a user who is listening to the restored voice on the second terminal from mistaking silence for an interrupted call and performing an erroneous operation (such as exiting).
With this embodiment, although features such as audio and volume are lost during encoding, the voice content can still be largely retained when the network is very poor. This avoids intermittent voice, lost voice content, or even a complete inability to converse during a voice call.
In another embodiment of the invention, the device further includes:
A prompting module, configured to, if the transmission rate is lower than the preset transmission rate, send to the first terminal or the second terminal a prompt message for enhancing network signal strength, or send to the second terminal a reminder message that a voice transmission exists.
In this embodiment, when the transmission rate is lower than the preset transmission rate, a prompt message for enhancing network signal strength is sent to the first terminal or the second terminal. Specifically, the message may be a suggestion on how the first terminal or the second terminal can enhance its network signal. This helps raise the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal, and thus improves the quality of the voice call.
Optionally, the prompt message includes a recommended connection network or a recommended movement route.
In this embodiment, the recommended connection network is another connectable network recommended to the first terminal or the second terminal. The recommended movement route indicates the position to which the first terminal or the second terminal can move so that its network signal is enhanced.
Further, in another embodiment of the invention, the recommended connection network can be obtained by a recommending module, and the recommending module is configured to:
Obtain the connectable networks around the first terminal or the second terminal, and take a network among them whose network signal strength is greater than a network signal strength threshold as the recommended connection network; or
Obtain the connectable networks around the first terminal or the second terminal, and take the network among them with the strongest network signal strength as the recommended connection network; or
Obtain the connectable networks around the first terminal or the second terminal, obtain the secure networks among them, and take a secure network whose network signal strength is greater than the network signal strength threshold as the recommended connection network; or
Obtain the connectable networks around the first terminal or the second terminal, obtain the history connection networks among them, and take a history connection network whose network signal strength is greater than the network signal strength threshold as the recommended connection network.
Here, a history connection network is a network to which the first terminal or the second terminal was once connected.
By taking a secure network whose network signal strength is greater than the network signal strength threshold as the recommended connection network, a safe network is obtained, so that the first terminal or the second terminal connects to a safe network and network security problems are avoided.
Further, in another embodiment of the invention, the recommended movement route can also be obtained by the recommending module, and the recommending module is further configured to:
Obtain a first position at which the first terminal or the second terminal is located, and an available connection network around the first terminal or the second terminal;
Obtain a second position of the available connection network;
Take the first position as the starting position and the second position as the end position, and obtain the movement route between the starting position and the end position as the recommended movement route.
The closer the first terminal and the second terminal are to a connectable network, the better the network signal strength they can obtain. For example, the closer a terminal is to a router, the better its network signal strength.
In this embodiment, obtaining a recommended movement route helps the first terminal or the second terminal move so that it has better network signal strength. This helps raise the transmission rate at which the voice information to be transmitted travels between the first terminal and the second terminal, and thus improves the quality of the voice call.
Optionally, in another embodiment of the invention, the device further includes a second transmission module, and the second transmission module is configured to:
After judging whether the transmission rate is lower than the preset transmission rate, if the transmission rate is higher than the preset transmission rate, judge whether the transmission rate is lower than a first transmission rate;
If the transmission rate is lower than the first transmission rate, encode the voice information to be transmitted using the GIA coding standard and transmit it;
If the transmission rate is higher than the first transmission rate, judge whether the transmission rate is lower than a second transmission rate;
If the transmission rate is lower than the second transmission rate, encode the voice information to be transmitted using the GSM coding standard and transmit it;
If the transmission rate is higher than the second transmission rate, judge whether the transmission rate is lower than a third transmission rate;
If the transmission rate is lower than the third transmission rate, encode the voice information to be transmitted using the G.728 coding standard and transmit it;
If the transmission rate is higher than the third transmission rate, judge whether the transmission rate is lower than a fourth transmission rate;
If the transmission rate is lower than the fourth transmission rate, encode the voice information to be transmitted using the G.721 coding standard and transmit it;
If the transmission rate is higher than the fourth transmission rate, judge whether the transmission rate is lower than a fifth transmission rate;
If the transmission rate is lower than the fifth transmission rate, encode the voice information to be transmitted using the G.722 coding standard and transmit it;
If the transmission rate is higher than the fifth transmission rate, encode the voice information to be transmitted using the MPE coding standard and transmit it.
Coding is the process of representing information with codes. In digital coding, the frequency value of the sound at a given point and the energy value at that frequency are extracted and digitally quantized. Relative to the natural signal, any digital audio coding scheme is lossy. The coding with the highest fidelity at present is PCM coding, which can approach the original sound arbitrarily closely, but PCM data is bulky and ill-suited to transmission. Therefore, during audio transmission, the audio can be coded in other forms to compress it and improve the fluency of transmission.
In this embodiment, the voice information is encoded with different coding algorithms according to the different coding standards.
For example, coding based on the G.722 coding standard is implemented with the SB-ADPCM algorithm, coding based on the G.721 coding standard with the ADPCM algorithm, coding based on the G.728 coding standard with the LD-CELP algorithm, coding based on the GSM coding standard with the RPE-LTP algorithm, and coding based on the GIA coding standard with the VSELPC algorithm.
In this embodiment, different coding standards are used at different transmission rates, so that at each transmission rate the voice information is retained as completely as possible during transmission and the sound quality is improved.
Optionally, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
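The rate-dependent choice of coding described for the second transmission module, together with the text-based coding used below the preset transmission rate, amounts to a threshold ladder. The sketch below uses the optional values from the text (8 kbit/s preset rate and the five transmission rates) and the standard names as they appear in the text. The function name is hypothetical, and treating a rate exactly equal to a threshold as "higher" is an assumption, since the text leaves equality unspecified.

```python
# Thresholds in kbit/s, paired with the coding standard used below each one.
PRESET_RATE = 8.0
CODEC_LADDER = [
    (13.2, "GIA"),
    (16.0, "GSM"),
    (32.0, "G.728"),
    (64.0, "G.721"),
    (128.0, "G.722"),
]


def select_coding(rate_kbps):
    """Return the coding to use for a measured transmission rate."""
    if rate_kbps < PRESET_RATE:
        # Below the preset rate: recognize speech and encode the text instead.
        return "text"
    for upper_bound, standard in CODEC_LADDER:
        if rate_kbps < upper_bound:
            return standard
    return "MPE"  # above the fifth transmission rate
```

Keeping the thresholds in a table rather than in nested conditionals makes it easy to change the optional rate values without touching the selection logic.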
In the voice transmission device provided by the invention, the receiving module receives a voice call transmission instruction sent by a first terminal and, according to the instruction, obtains the voice information to be transmitted and the second terminal that is to receive it; the obtaining module obtains the transmission rate at which the voice information to be transmitted is transmitted; the judging module judges whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, the recognition module performs speech recognition on the voice information to be transmitted to obtain a speech recognition result that includes the corresponding text information; the coding module performs voice coding on the text information included in the speech recognition result to obtain target voice information; and the first transmission module transmits the target voice information to the second terminal. Because the text information included in the speech recognition result is encoded when the transmission rate is lower than the preset transmission rate, the voice content of the voice information to be transmitted is retained while the amount of information encoded is reduced. This supports a smooth conversation, improves the quality of the voice call, and avoids stuttering or dropped calls.
The integrated unit described above, when implemented in the form of a software function module, can be stored in a computer-readable storage medium. The software function module is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the invention.
As shown in Fig. 3, Fig. 3 is a schematic structural diagram of a computer device in a preferred embodiment of the invention that implements the voice transmission method. The computer device includes at least one sending device 31, at least one memory 32, at least one processor 33, at least one receiving device 34, and at least one communication bus. The communication bus is used to implement connection and communication between these components.
The computer device is a device that can automatically perform numerical calculation and/or information processing according to instructions that are set or stored in advance. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The computer device may also include network equipment and/or user equipment. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
The computer device may be, but is not limited to, any electronic product that can interact with a user by means of a keyboard, a touchpad, a voice-control device, or the like, for example a terminal such as a tablet computer, a smartphone, or a monitoring device.
The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
The receiving device 34 and the sending device 31 may be wired ports or wireless devices, for example devices that include an antenna, and are used for data communication with other devices.
The memory 32 is for storing program code.The memory 32, which can be, does not have physical form in integrated circuit The circuit with store function, such as RAM (Random-Access Memory, random access memory), FIFO (First In First Out, push-up storage) etc..Alternatively, the memory 32 is also possible to the memory with physical form, such as Memory bar, TF card (Trans-flash Card), smart media card (smart media card), safe digital card (secure Digital card), storage facilities such as flash memory cards (flash card) etc..
The processor 33 may include one or more microprocessors or digital processors. The processor 33 may call the program code stored in the memory 32 to perform the related functions. For example, the modules described in Fig. 2 are program code stored in the memory 32 and executed by the processor 33 to implement a voice transmission method. The processor 33, also called a central processing unit (CPU), is a very-large-scale integrated circuit that serves as the computing core (Core) and the control core (Control Unit).
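As an illustration only, the way the processor might chain the stored modules can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the class name `VoiceTransmitter`, its field names, and the returned labels are hypothetical, each module is stood in for by a plain callable, and the 8 kbit/s default is the preset transmission rate given in claim 4. Only the path for rates below the preset rate is modelled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTransmitter:
    """Hypothetical composition of the functional modules as plain callables."""

    get_rate: Callable[[], float]        # obtaining module: current rate in kbit/s
    recognize: Callable[[bytes], str]    # identification module: voice -> text
    encode_text: Callable[[str], bytes]  # coding module: text -> target voice information
    send: Callable[[bytes], None]        # transmission module: deliver to the second terminal
    preset_rate: float = 8.0             # preset transmission rate in kbit/s (claim 4)

    def transmit(self, voice: bytes) -> str:
        """Use the text path when the measured rate is below the preset rate."""
        if self.get_rate() < self.preset_rate:  # judgment module
            text = self.recognize(voice)
            self.send(self.encode_text(text))
            return "text-path"
        # Higher rates are encoded with a speech coding standard instead;
        # that selection is outside this sketch, so nothing is sent here.
        return "codec-path"
```

Passing the modules in as callables mirrors the statement that the module division is only a logical one: any recognizer, encoder, or transport with the same shape could be substituted.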
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in every respect as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; all changes falling within the meaning and scope of equivalents of the claims are therefore intended to be included in the present invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from its spirit and scope.

Claims (10)

1. A voice transmission method, characterized in that the method comprises:
receiving a voice communication transmission instruction sent by a first terminal, and obtaining, according to the voice communication transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted;
obtaining the transmission rate at which the voice information to be transmitted is transmitted;
determining whether the transmission rate is lower than a preset transmission rate;
if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
performing speech encoding on the text information included in the speech recognition result to obtain target voice information;
transmitting the target voice information to the second terminal.
2. The method according to claim 1, characterized in that the speech recognition result further includes a speech feature of the voice information to be transmitted, the speech feature including a fundamental frequency;
the performing speech encoding on the text information included in the speech recognition result comprises:
performing speech encoding on the text information corresponding to the voice information to be transmitted together with the speech feature of the voice information to be transmitted.
3. The method according to claim 2, characterized in that, after the determining whether the transmission rate is lower than the preset transmission rate, the method further comprises:
if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted according to the GIA coding standard and transmitting it;
if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted according to the GSM coding standard and transmitting it;
if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted according to the G.728 coding standard and transmitting it;
if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted according to the G.721 coding standard and transmitting it;
if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted according to the G.722 coding standard and transmitting it;
if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted according to the MPE coding standard and transmitting it.
4. The method according to claim 3, characterized in that the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
5. The method according to any one of claims 1 to 4, characterized in that the performing speech recognition on the voice information to be transmitted comprises:
extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
if the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for enhancing network signal strength, or sending to the second terminal a reminder message indicating that there is a voice transmission.
7. The method according to claim 6, characterized in that the suggestion message includes a recommended network to connect to or a recommended route to move along.
8. A voice transmission device, characterized in that the device comprises:
a receiving module, configured to receive a voice communication transmission instruction sent by a first terminal, and to obtain, according to the voice communication transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted;
an obtaining module, configured to obtain the transmission rate at which the voice information to be transmitted is transmitted;
a judgment module, configured to determine whether the transmission rate is lower than a preset transmission rate;
an identification module, configured to, if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
a coding module, configured to perform speech encoding on the text information included in the speech recognition result to obtain target voice information;
a first transmission module, configured to transmit the target voice information to the second terminal.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory being used to store at least one instruction, and the processor being used to execute the at least one instruction to implement the voice transmission method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the voice transmission method according to any one of claims 1 to 7.
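
The rate comparisons recited in claims 1, 3 and 4 amount to a lookup over ascending thresholds, which can be sketched as below. This is a hedged illustration rather than the claimed implementation: the function name `select_encoding` and the label `speech-recognition` are hypothetical, the coding-standard names are copied as printed in claim 3, and the thresholds are those of claim 4. The claims speak only of rates "lower than" and "higher than" each threshold, so treating a rate exactly equal to a threshold as belonging to the next band up is an assumption of this sketch.

```python
from typing import List, Tuple

# Preset transmission rate in kbit/s (claim 4); below it, the text path of claim 1 applies.
PRESET_RATE = 8.0

# (upper bound in kbit/s, coding standard as printed in claim 3), in ascending order.
CODEC_LADDER: List[Tuple[float, str]] = [
    (13.2, "GIA"),     # below the first transmission rate
    (16.0, "GSM"),     # below the second transmission rate
    (32.0, "G.728"),   # below the third transmission rate
    (64.0, "G.721"),   # below the fourth transmission rate
    (128.0, "G.722"),  # below the fifth transmission rate
]

# Standard used above the fifth transmission rate, as printed in claim 3.
TOP_CODEC = "MPE"

# Hypothetical label for the recognize-then-encode path of claim 1.
TEXT_PATH = "speech-recognition"


def select_encoding(rate_kbps: float) -> str:
    """Return the path or coding standard selected for a measured rate in kbit/s."""
    if rate_kbps < PRESET_RATE:
        return TEXT_PATH
    for upper_bound, codec in CODEC_LADDER:
        # Assumption: a rate equal to a bound is not "lower than" it,
        # so it falls through to the next band.
        if rate_kbps < upper_bound:
            return codec
    return TOP_CODEC
```

For example, `select_encoding(5.0)` selects the speech-recognition path, while `select_encoding(20.0)` selects G.728 because 20 kbit/s lies between the second and third transmission rates.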
CN201910459488.7A 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium Active CN110364170B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910459488.7A CN110364170B (en) 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium
PCT/CN2019/118022 WO2020238058A1 (en) 2019-05-29 2019-11-13 Voice transmission method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910459488.7A CN110364170B (en) 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN110364170A true CN110364170A (en) 2019-10-22
CN110364170B CN110364170B (en) 2024-01-30

Family

ID=68215394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910459488.7A Active CN110364170B (en) 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110364170B (en)
WO (1) WO2020238058A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199747A (en) * 2020-03-05 2020-05-26 北京花兰德科技咨询服务有限公司 Artificial intelligence communication system and communication method
CN111245868A (en) * 2020-03-10 2020-06-05 诺领科技(南京)有限公司 Narrowband Internet of things voice message communication method and system
CN111785293A (en) * 2020-06-04 2020-10-16 杭州海康威视***技术有限公司 Voice transmission method, device and equipment and storage medium
WO2020238058A1 (en) * 2019-05-29 2020-12-03 平安科技(深圳)有限公司 Voice transmission method and apparatus, computer device and storage medium
CN112202803A (en) * 2020-10-10 2021-01-08 北京字节跳动网络技术有限公司 Audio processing method, device, terminal and storage medium
CN112822297A (en) * 2021-04-01 2021-05-18 深圳市顺易通信息科技有限公司 Parking lot service data transmission method and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08116385A (en) * 1994-10-14 1996-05-07 Hitachi Ltd Individual information terminal equipment and voice response system
CN103714823A (en) * 2013-12-19 2014-04-09 同济大学 Integrated speech coding-based adaptive underwater communication method
WO2016119560A1 (en) * 2015-01-29 2016-08-04 ***通信集团公司 Self-adaptive audio transmission method and device
CN106850615A (en) * 2017-01-24 2017-06-13 华为技术有限公司 A kind of method of code rate control, relevant apparatus and system
CN107066477A (en) * 2016-12-13 2017-08-18 合网络技术(北京)有限公司 A kind of method and device of intelligent recommendation video
CN107770387A (en) * 2017-10-31 2018-03-06 珠海市魅族科技有限公司 Communication control method, device, computer installation and computer-readable recording medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162150A1 (en) * 2006-12-28 2008-07-03 Vianix Delaware, Llc System and Method for a High Performance Audio Codec
CN102790997B (en) * 2011-05-19 2017-05-10 中兴通讯股份有限公司 Method and device for transmission of adaptive multi-rate (AMR) voice data
CN109712631B (en) * 2019-03-28 2019-06-28 南昌黑鲨科技有限公司 Audio data transfer control method, device, system and readable storage medium storing program for executing
CN110364170B (en) * 2019-05-29 2024-01-30 平安科技(深圳)有限公司 Voice transmission method, voice transmission device, computer device and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020238058A1 (en) * 2019-05-29 2020-12-03 平安科技(深圳)有限公司 Voice transmission method and apparatus, computer device and storage medium
CN111199747A (en) * 2020-03-05 2020-05-26 北京花兰德科技咨询服务有限公司 Artificial intelligence communication system and communication method
CN111245868A (en) * 2020-03-10 2020-06-05 诺领科技(南京)有限公司 Narrowband Internet of things voice message communication method and system
CN111245868B (en) * 2020-03-10 2021-04-13 诺领科技(南京)有限公司 Narrowband Internet of things voice message communication method and system
CN111785293A (en) * 2020-06-04 2020-10-16 杭州海康威视***技术有限公司 Voice transmission method, device and equipment and storage medium
CN111785293B (en) * 2020-06-04 2023-04-25 杭州海康威视***技术有限公司 Voice transmission method, device and equipment and storage medium
CN112202803A (en) * 2020-10-10 2021-01-08 北京字节跳动网络技术有限公司 Audio processing method, device, terminal and storage medium
CN112822297A (en) * 2021-04-01 2021-05-18 深圳市顺易通信息科技有限公司 Parking lot service data transmission method and related equipment

Also Published As

Publication number Publication date
CN110364170B (en) 2024-01-30
WO2020238058A1 (en) 2020-12-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant