CN106101094A - Audio processing method, sending-end device, receiving-end device, and audio processing system - Google Patents
Audio processing method, sending-end device, receiving-end device, and audio processing system Download PDF Info
- Publication number
- CN106101094A CN106101094A CN201610404998.0A CN201610404998A CN106101094A CN 106101094 A CN106101094 A CN 106101094A CN 201610404998 A CN201610404998 A CN 201610404998A CN 106101094 A CN106101094 A CN 106101094A
- Authority
- CN
- China
- Prior art keywords
- sentence
- voice
- voice sentence
- audio
- labelling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/764—Media network packet handling at the destination
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
Abstract
Disclosed are an audio processing method, a sending-end device, a receiving-end device, and an audio processing system. When audio is recorded at the sending-end device, the method marks each recorded speech sentence so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information they contain and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Description
Technical field
The invention belongs to the field of audio signal processing, and in particular relates to an audio processing method, a sending-end device, a receiving-end device, and an audio processing system.
Background Art
At present, applications that use network tools for voice calls, such as holding voice conferences over a network, are increasingly widespread.
In such application scenarios, a user's speech audio is often affected by a complex network environment and by various uncertain factors (such as wireless signal strength, firewalls, and system performance), so that network audio transmission becomes discontinuous. Discontinuous transmission ultimately causes the speech sentences played back in real time to the listener to be broken up, i.e. complete speech sentences are not played continuously. For example, a user may hear the first half of a first sentence, then after a delay hear the second half of the first sentence together with the first half of a second sentence, and after a further delay hear the second half of the second sentence, and so on. This severely degrades communication efficiency and makes it difficult for the user to understand the received audio information quickly and accurately.
Summary of the invention
In view of this, an object of the present invention is to provide an audio processing method, a sending-end device, a receiving-end device, and an audio processing system, aiming to solve the problem of low communication efficiency in network calls caused by discontinuous network audio transmission.
To this end, the present invention discloses the following technical solutions:
An audio processing method, applied to a sending-end device, the method including:
performing a preset speech-sentence marking process on a current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence; and
sending the target audio to a receiving-end device.
Preferably, in the above method, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and performing the preset marking process on the current speech sentence to obtain the target audio includes:
generating, based on a preset sentence-start judgment condition, the start marker required for the current speech sentence being recorded, the start marker indicating the start position of the current speech sentence; and
generating, based on a preset sentence-end judgment condition, the end marker required for the current speech sentence, the end marker indicating the end position of the current speech sentence.
Preferably, in the above method, sending the target audio to the receiving-end device includes:
encapsulating the target audio into a corresponding number of audio data packets, and sending each audio data packet in turn to the receiving-end device.
An audio processing method, applied to a receiving-end device, the method including:
receiving target audio containing speech-sentence marker information used to mark complete speech sentences;
extracting complete speech sentences from the target audio based on the marker information; and
playing the complete speech sentences.
Preferably, in the above method, receiving the target audio includes:
receiving each audio data packet from the sending-end device in turn.
Preferably, in the above method, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and extracting complete speech sentences from the target audio based on the marker information includes:
locating the start position of a speech sentence based on a start marker in the received audio data packets;
locating the end position of the speech sentence based on the end marker that is adjacent to and paired with that start marker; and
splicing the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete speech sentence.
A sending-end device, including:
a marking processing module, configured to perform a preset speech-sentence marking process on a current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence; and
a sending module, configured to send the target audio to a receiving-end device.
Preferably, in the above sending-end device, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and the marking processing module includes:
a start-marker generating unit, configured to generate, based on a preset sentence-start judgment condition, the start marker required for the current speech sentence being recorded, the start marker indicating the start position of the current speech sentence; and
an end-marker generating unit, configured to generate, based on a preset sentence-end judgment condition, the end marker required for the current speech sentence, the end marker indicating the end position of the current speech sentence.
Preferably, in the above sending-end device, the sending module includes:
a data encapsulating and sending unit, configured to encapsulate the target audio into a corresponding number of audio data packets, and to send each audio data packet in turn to the receiving-end device.
A receiving-end device, including:
a receiving module, configured to receive target audio containing speech-sentence marker information used to mark complete speech sentences;
an extraction module, configured to extract complete speech sentences from the target audio based on the marker information; and
a playing module, configured to play the complete speech sentences.
Preferably, in the above receiving-end device, the receiving module includes:
a packet receiving unit, configured to receive each audio data packet from the sending-end device in turn.
Preferably, in the above receiving-end device, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and the extraction module includes:
a first positioning unit, configured to locate the start position of a speech sentence based on a start marker in the received audio data packets;
a second positioning unit, configured to locate the end position of the speech sentence based on the end marker that is adjacent to and paired with that start marker; and
a splicing unit, configured to splice the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete speech sentence.
An audio processing system, including the sending-end device and the receiving-end device described above.
As can be seen from the above solutions, in the audio processing method disclosed in the present application, each recorded speech sentence is marked when audio is recorded at the sending-end device, so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of Embodiment One of an audio processing method provided by the present application;
Fig. 2 is a flow chart of Embodiment Two of an audio processing method provided by the present application;
Fig. 3 is a structural diagram of a sending-end device according to Embodiment Three of the present application;
Fig. 4 is a structural diagram of a receiving-end device according to Embodiment Four of the present application.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment One
Referring to Fig. 1, a flow chart of Embodiment One of an audio processing method provided by the present application, the method of this embodiment is applied to a sending-end device, for example a sending-end call tool used for network voice calls (the two parties of a call can each act as both sender and receiver). As shown in Fig. 1, the audio processing method may include the following steps:
S101: Perform a preset speech-sentence marking process on the current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence.
The speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker.
Specifically, based on a preset sentence-start judgment condition and a preset sentence-end judgment condition, corresponding sentence-start detection logic and sentence-end detection logic can be added to the sending-end device, e.g. to the sending-end call tool, so that start detection and end detection are carried out for each recorded speech sentence. On this basis, a start marker and an end marker can be generated and added for each speech sentence at its detected start position and end position respectively.
In a real call, a person usually utters a complete speech sentence more or less continuously, while pausing slightly between different sentences; that is, there is normally a certain delay between sentences. Consequently, during recording, the audio within one speech sentence is highly continuous in time, whereas the continuity between the audio of different sentences is lower. Based on this characteristic, the discontinuity in the generation/recording of audio data can serve as the basis for detecting sentence boundaries: if there is a sufficient delay between the current speech sentence and the previous one, i.e. the generation time of the current sentence (taken as the moment its audio is collected) is at least a predetermined duration after the end time of the previous sentence, the current sentence is considered to have started; correspondingly, during recording, if a pause in audio input longer than the predetermined duration is detected, the current sentence is considered to have ended.
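The pause-based boundary rule above can be sketched in code. This is an illustrative sketch only, not the patent's implementation: the class name, the event strings, and the 0.3-second threshold are all assumptions made for the example.

```python
PAUSE_THRESHOLD = 0.3  # assumed predetermined duration, in seconds


class SentenceBoundaryDetector:
    """Emit sentence start/end events from the timing gaps between audio frames."""

    def __init__(self, threshold=PAUSE_THRESHOLD):
        self.threshold = threshold
        self.last_frame_time = None
        self.in_sentence = False

    def on_audio_frame(self, frame_time):
        """Return 'start', 'end+start', or None for the frame collected at frame_time."""
        events = []
        # A gap of at least the threshold since the last frame ends the pending sentence.
        if self.in_sentence and self.last_frame_time is not None:
            if frame_time - self.last_frame_time >= self.threshold:
                events.append("end")
                self.in_sentence = False
        # The first frame after a pause (or ever) starts a new sentence.
        if not self.in_sentence:
            events.append("start")
            self.in_sentence = True
        self.last_frame_time = frame_time
        return "+".join(events) if events else None

    def flush(self):
        """Close the last sentence when recording stops."""
        if self.in_sentence:
            self.in_sentence = False
            return "end"
        return None
```

A frame arriving 0.9 s after the previous one thus produces both an end marker for the old sentence and a start marker for the new one.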
On the basis of this start and end detection, the start and end markers can be inserted directly at the corresponding positions in the recorded audio data, for example adding the start marker at the head of a sentence's audio data and the end marker at its tail. Alternatively, a time axis can be maintained in the voice call tool in advance, with corresponding time information maintained on that axis for each recorded sentence; then, on the basis of the start and end detection, a pair consisting of a start marker and an end marker can be added for each speech sentence on the time axis. Each such marker pair on the time axis can indirectly indicate the start and end positions of the corresponding speech sentence through the correspondence between the recorded audio data and the time information.
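The first alternative above, writing the markers directly into the recorded byte stream, might look like the following sketch. The two-byte marker values are invented for illustration; a real implementation would have to reserve values or escape sequences that cannot collide with payload bytes.

```python
# Illustrative in-band markers (assumed values, not from the patent).
START_MARK = b"\xff\x01"
END_MARK = b"\xff\x02"


def mark_sentence(sentence_audio: bytes) -> bytes:
    """Wrap one complete sentence's audio data with in-band boundary markers."""
    return START_MARK + sentence_audio + END_MARK
```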
Each adjacent <start marker, end marker> pair marks one complete speech sentence; subsequently, by extracting the speech content between the start marker and end marker of such an adjacent pair from the audio data, one complete speech sentence is obtained.
S102: Send the target audio to the receiving-end device.
After the recorded audio data has been marked, the audio data carrying the marker information can be encapsulated into data packets, and after encapsulation each resulting audio data packet is sent in turn to the receiving-end device over the network. When packing the audio data, the number of packets can be determined according to the data volume of each speech sentence; after packing, the audio of one speech sentence may correspond to one or more audio data packets.
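The packing step can be sketched as follows, under assumed details not given in the patent: a fixed 160-byte payload per packet, and dict-based packets whose flags carry the sentence markers. A short sentence then maps to a single packet and a longer one to several, as described above.

```python
PAYLOAD_SIZE = 160  # assumed bytes of audio per packet


def packetize(sentence_audio: bytes, seq_start: int = 0):
    """Split one marked sentence's audio into sequenced packets carrying marker flags."""
    packets = []
    for i in range(0, len(sentence_audio), PAYLOAD_SIZE):
        packets.append({
            "seq": seq_start + len(packets),
            # The first packet of the sentence carries the start marker,
            # the last carries the end marker (both, for a one-packet sentence).
            "start_mark": i == 0,
            "end_mark": i + PAYLOAD_SIZE >= len(sentence_audio),
            "payload": sentence_audio[i:i + PAYLOAD_SIZE],
        })
    return packets
```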
Embodiment Two
Referring to Fig. 2, a flow chart of Embodiment Two of an audio processing method provided by the present application, the method of this embodiment is applied to a receiving-end device, for example a receiving-end call tool used for network voice calls. As shown in Fig. 2, the audio processing method may include the following steps:
S201: Receive target audio containing speech-sentence marker information used to mark complete speech sentences.
Specifically, the receiving-end call tool receives the required audio data by receiving, in turn, each audio data packet from the sending-end device over the network. The received audio data carries the start markers and end markers of speech sentences.
S202: Extract complete speech sentences from the target audio based on the speech-sentence marker information.
After receiving the audio data packets from the sending-end device in turn, the receiving-end call tool unpacks and parses each packet, extracts the speech-sentence start markers and end markers, and pairs adjacent start and end markers according to the order of reception. Based on the pairing of start and end markers, the start and end positions of complete speech sentences can be located; the audio content in the corresponding packets can then be extracted and spliced according to those positions to obtain the complete speech sentences.
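The unpack-pair-splice logic of S202 can be sketched as follows, assuming the same illustrative dict-based packet layout as above (sequence number, marker flags, payload); this is a sketch of the idea, not the patent's implementation.

```python
def extract_sentences(packets):
    """Splice payloads between each adjacent <start marker, end marker> pair."""
    sentences = []
    buffer, collecting = [], False
    # Process packets in sequence order, approximating the order of reception.
    for pkt in sorted(packets, key=lambda p: p["seq"]):
        if pkt.get("start_mark"):
            buffer, collecting = [], True   # a new sentence begins here
        if collecting:
            buffer.append(pkt["payload"])
        if collecting and pkt.get("end_mark"):
            sentences.append(b"".join(buffer))  # the pair is complete: splice
            buffer, collecting = [], False
    return sentences
```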
S203: Play the complete speech sentences.
After the complete speech sentences have been extracted from the received audio data based on the start and end markers, the receiving-end call tool can play the extracted sentences back continuously.
During the receiving and parsing of audio data packets, if the pending packet or packets received so far contain only a speech-sentence start marker and no matching end marker has been found, the receiver must keep waiting until an end marker appears in a received packet; only then can a complete speech sentence be extracted and played according to the pairing of the start marker and end marker.
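The wait-for-end-marker behaviour can be sketched as a playback loop that releases audio to the player only once a complete sentence has been assembled. The queue interface, packet fields, and the None end-of-stream sentinel are illustrative assumptions for this sketch.

```python
import queue


def playback_loop(packet_queue, play):
    """Buffer packets and call play() only for complete, marker-delimited sentences."""
    buffer, collecting = [], False
    while True:
        pkt = packet_queue.get()
        if pkt is None:                      # assumed end-of-stream sentinel
            break
        if pkt.get("start_mark"):
            buffer, collecting = [], True    # a new sentence begins
        if collecting:
            buffer.append(pkt["payload"])
        if collecting and pkt.get("end_mark"):
            play(b"".join(buffer))           # only complete sentences reach playback
            buffer, collecting = [], False
```

A sentence whose end marker has not yet arrived stays in the buffer, so partial sentences are never handed to the player.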
As can be seen from the above solutions, in the audio processing method disclosed in the present application, each recorded speech sentence is marked when audio is recorded at the sending-end device, so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Embodiment Three
Referring to Fig. 3, a structural diagram of a sending-end device according to Embodiment Three of the present application, the sending-end device may specifically be used in a sending-end call tool for network voice calls (the two parties of a call can each act as both sender and receiver). As shown in Fig. 3, the sending-end device may include a marking processing module 301 and a sending module 302.
The marking processing module 301 is configured to perform a preset speech-sentence marking process on the current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence.
The speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and the marking processing module 301 includes a start-marker generating unit and an end-marker generating unit.
The start-marker generating unit is configured to generate, based on a preset sentence-start judgment condition, the start marker required for the current speech sentence being recorded, the start marker indicating the start position of the current speech sentence.
The end-marker generating unit is configured to generate, based on a preset sentence-end judgment condition, the end marker required for the current speech sentence, the end marker indicating the end position of the current speech sentence.
Specifically, based on a preset sentence-start judgment condition and a preset sentence-end judgment condition, corresponding sentence-start detection logic and sentence-end detection logic can be added to the sending-end device, e.g. to the sending-end call tool, so that start detection and end detection are carried out for each recorded speech sentence. On this basis, a start marker and an end marker can be generated and added for each speech sentence at its detected start position and end position respectively.
In a real call, a person usually utters a complete speech sentence more or less continuously, while pausing slightly between different sentences; that is, there is normally a certain delay between sentences. Consequently, during recording, the audio within one speech sentence is highly continuous in time, whereas the continuity between the audio of different sentences is lower. Based on this characteristic, the discontinuity in the generation/recording of audio data can serve as the basis for detecting sentence boundaries: if there is a sufficient delay between the current speech sentence and the previous one, i.e. the generation time of the current sentence (taken as the moment its audio is collected) is at least a predetermined duration after the end time of the previous sentence, the current sentence is considered to have started; correspondingly, during recording, if a pause in audio input longer than the predetermined duration is detected, the current sentence is considered to have ended.
On the basis of this start and end detection, the start and end markers can be inserted directly at the corresponding positions in the recorded audio data, for example adding the start marker at the head of a sentence's audio data and the end marker at its tail. Alternatively, a time axis can be maintained in the voice call tool in advance, with corresponding time information maintained on that axis for each recorded sentence; then, on the basis of the start and end detection, a pair consisting of a start marker and an end marker can be added for each speech sentence on the time axis. Each such marker pair on the time axis can indirectly indicate the start and end positions of the corresponding speech sentence through the correspondence between the recorded audio data and the time information.
Each adjacent <start marker, end marker> pair marks one complete speech sentence; subsequently, by extracting the speech content between the start marker and end marker of such an adjacent pair from the audio data, one complete speech sentence is obtained.
The sending module 302 is configured to send the target audio to the receiving-end device.
The sending module 302 includes a data encapsulating and sending unit, configured to encapsulate the target audio into a corresponding number of audio data packets and to send each audio data packet in turn to the receiving-end device.
After the recorded audio data has been marked, the audio data carrying the marker information can be encapsulated into data packets, and after encapsulation each resulting audio data packet is sent in turn to the receiving-end device over the network. When packing the audio data, the number of packets can be determined according to the data volume of each speech sentence; after packing, the audio of one speech sentence may correspond to one or more audio data packets.
Embodiment Four
Referring to Fig. 4, a structural diagram of a receiving-end device according to Embodiment Four of the present application, the receiving-end device may specifically be used in a receiving-end call tool for network voice calls. As shown in Fig. 4, the receiving-end device may include a receiving module 401, an extraction module 402, and a playing module 403.
The receiving module 401 is configured to receive target audio containing speech-sentence marker information used to mark complete speech sentences.
The receiving module 401 includes a packet receiving unit, configured to receive each audio data packet from the sending-end device in turn.
Specifically, the receiving-end call tool receives the required audio data by receiving, in turn, each audio data packet from the sending-end device over the network. The received audio data carries the start markers and end markers of speech sentences.
The extraction module 402 is configured to extract complete speech sentences from the target audio based on the speech-sentence marker information.
The extraction module 402 includes a first positioning unit, a second positioning unit, and a splicing unit.
The first positioning unit is configured to locate the start position of a speech sentence based on a start marker in the received audio data packets.
The second positioning unit is configured to locate the end position of the speech sentence based on the end marker that is adjacent to and paired with that start marker.
The splicing unit is configured to splice the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete speech sentence.
After receiving the audio data packets from the sending-end device in turn, the receiving-end call tool unpacks and parses each packet, extracts the speech-sentence start markers and end markers, and pairs adjacent start and end markers according to the order of reception. Based on the pairing of start and end markers, the start and end positions of complete speech sentences can be located; the audio content in the corresponding packets can then be extracted and spliced according to those positions to obtain the complete speech sentences.
The playing module 403 is configured to play the complete speech sentences.
After the complete speech sentences have been extracted from the received audio data based on the start and end markers, the receiving-end call tool can play the extracted sentences back continuously.
During the receiving and parsing of audio data packets, if the pending packet or packets received so far contain only a speech-sentence start marker and no matching end marker has been found, the receiver must keep waiting until an end marker appears in a received packet; only then can a complete speech sentence be extracted and played according to the pairing of the start marker and end marker.
As can be seen from the above solutions, the present application marks each recorded speech sentence when audio is recorded at the sending-end device, so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Embodiment Five
This Embodiment Five discloses an audio processing system, the system including the sending-end device disclosed in Embodiment Three and the receiving-end device disclosed in Embodiment Four.
The sending-end device and the receiving-end device may be, respectively, the sending-end and receiving-end call tools used for network voice calls. In a real network voice call, the two parties are each other's sender and receiver of audio data; therefore, in general, the call tool used by each party serves both as the sending-end device and as the receiving-end device during the call.
As can be seen from the above scheme, the present application marks the recorded voice sentences while the sending-end device records audio, so that the generated audio to be transmitted contains label information marking complete voice sentences. After the receiving-end device subsequently receives the audio data from the sending-end device, it can extract complete voice sentences according to the label information contained therein, and on that basis play the complete voice sentences continuously. This helps the user understand the received audio information quickly and accurately, so applying the present application can effectively improve Internet call efficiency in complex network environments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts among the embodiments, reference may be made to one another.
For convenience of description, the system or device above is described as divided into various modules or units by function. Of course, when implementing the present application, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general hardware platform. Based on such an understanding, the essence of the technical scheme of the present application, or the part of it that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present application or in parts thereof.
Finally, it should also be noted that, in this document, relational terms such as first, second, third, and fourth are used merely to distinguish one entity or operation from another, and do not necessarily require or imply that any such actual relation or order exists between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element limited by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
The above are only preferred embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (13)
1. An audio processing method, characterized in that it is applied to a sending-end device, the method comprising:
performing preset voice sentence marking on the current voice sentence recorded in real time, to obtain a target audio containing corresponding voice sentence label information, the voice sentence label information being used to mark one complete voice sentence;
sending the target audio to a receiving-end device.
2. The method according to claim 1, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the performing of preset voice sentence marking on the current voice sentence recorded in real time, to obtain a target audio containing corresponding voice sentence label information, includes:
generating, based on a preset voice sentence start judgment condition, the voice sentence start mark needed for the current voice sentence recorded in real time; wherein the voice sentence start mark is used to mark the start position of the current voice sentence;
generating, based on a preset voice sentence end judgment condition, the voice sentence end mark needed for the current voice sentence; wherein the voice sentence end mark is used to mark the end position of the current voice sentence.
3. The method according to claim 1, characterized in that the sending of the target audio to a receiving-end device includes:
encapsulating the target audio into a corresponding number of audio data packets, and sending each audio data packet in turn to the receiving-end device.
4. An audio processing method, characterized in that it is applied to a receiving-end device, the method comprising:
receiving a target audio, the target audio containing voice sentence label information, the voice sentence label information being used to mark a complete voice sentence;
extracting the complete voice sentence from the target audio based on the voice sentence label information;
playing the complete voice sentence.
5. The method according to claim 4, characterized in that the receiving of a target audio includes:
receiving each audio data packet from a sending-end device in turn.
6. The method according to claim 5, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the extracting of the complete voice sentence from the target audio based on the voice sentence label information includes:
locating the start position of a voice sentence based on the voice sentence start mark in the received audio data packets;
locating the end position of the voice sentence based on the voice sentence end mark that pairs adjacently with the voice sentence start mark in the received audio data packets;
splicing the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete voice sentence.
7. A sending-end device, characterized in that it includes:
a marking processing module, configured to perform preset voice sentence marking on the current voice sentence recorded in real time, to obtain a target audio containing corresponding voice sentence label information, the voice sentence label information being used to mark one complete voice sentence;
a sending module, configured to send the target audio to a receiving-end device.
8. The sending-end device according to claim 7, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the marking processing module includes:
a start mark generating unit, configured to generate, based on a preset voice sentence start judgment condition, the voice sentence start mark needed for the current voice sentence recorded in real time; wherein the voice sentence start mark is used to mark the start position of the current voice sentence;
an end mark generating unit, configured to generate, based on a preset voice sentence end judgment condition, the voice sentence end mark needed for the current voice sentence; wherein the voice sentence end mark is used to mark the end position of the current voice sentence.
9. The sending-end device according to claim 7, characterized in that the sending module includes:
a data encapsulation and sending unit, configured to encapsulate the target audio into a corresponding number of audio data packets, and to send each audio data packet in turn to the receiving-end device.
10. A receiving-end device, characterized in that it includes:
a receiving module, configured to receive a target audio, the target audio containing voice sentence label information, the voice sentence label information being used to mark a complete voice sentence;
an extraction module, configured to extract the complete voice sentence from the target audio based on the voice sentence label information;
a playing module, configured to play the complete voice sentence.
11. The receiving-end device according to claim 10, characterized in that the receiving module includes:
a data packet receiving unit, configured to receive each audio data packet from a sending-end device in turn.
12. The receiving-end device according to claim 11, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the extraction module includes:
a first positioning unit, configured to locate the start position of a voice sentence based on the voice sentence start mark in the received audio data packets;
a second positioning unit, configured to locate the end position of the voice sentence based on the voice sentence end mark that pairs adjacently with the voice sentence start mark in the received audio data packets;
a splicing unit, configured to splice the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete voice sentence.
13. An audio processing system, characterized in that it includes the sending-end device according to any one of claims 7-9, and the receiving-end device according to any one of claims 10-12.
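The sender-side steps in claims 1-3 (judging where a sentence starts and ends, inserting the marks, and encapsulating the marked stream into packets) can be sketched as below. This is a hedged illustration under stated assumptions: the marks, the silence-run threshold standing in for the "start/end judgment conditions", and the frame representation are all invented for the example, not taken from the patent.

```python
# Sketch of sender-side processing: insert a start mark when speech
# resumes after silence, insert an end mark when trailing silence
# exceeds a threshold (standing in for the preset start/end judgment
# conditions), then encapsulate the marked stream into packets.
# Marks, threshold, and frame format are illustrative assumptions.

START, END = "<s>", "</s>"
SILENCE = None  # a frame below the voice-activity energy threshold

def mark_sentences(frames, end_after=2):
    """Wrap runs of voiced frames in START/END marks; `end_after`
    consecutive silent frames close the current sentence."""
    out, in_sentence, silent_run = [], False, 0
    for f in frames:
        if f is SILENCE:
            silent_run += 1
            if in_sentence and silent_run >= end_after:
                out.append(END)          # end judgment condition met
                in_sentence = False
        else:
            if not in_sentence:
                out.append(START)        # start judgment condition met
                in_sentence = True
            silent_run = 0
            out.append(f)
    if in_sentence:
        out.append(END)                  # close a trailing open sentence
    return out

def packetize(stream, size=4):
    """Encapsulate the marked stream into packets of at most `size` items."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

frames = ["a", "b", SILENCE, SILENCE, "c", SILENCE]
marked = mark_sentences(frames)
print(marked)             # -> ['<s>', 'a', 'b', '</s>', '<s>', 'c', '</s>']
print(packetize(marked))  # packets sent in turn to the receiving-end device
```

Because the marks travel in-band with the audio, the receiving end can pair each start mark with its adjacent end mark across packet boundaries, which is what makes the extraction and splicing in claims 6 and 12 possible.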
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610404998.0A CN106101094A (en) | 2016-06-08 | 2016-06-08 | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610404998.0A CN106101094A (en) | 2016-06-08 | 2016-06-08 | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106101094A true CN106101094A (en) | 2016-11-09 |
Family
ID=57228391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610404998.0A Pending CN106101094A (en) | 2016-06-08 | 2016-06-08 | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106101094A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108305628A (en) * | 2017-06-27 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN109377998A (en) * | 2018-12-11 | 2019-02-22 | 科大讯飞股份有限公司 | A kind of voice interactive method and device |
CN113192519A (en) * | 2021-04-29 | 2021-07-30 | 北京达佳互联信息技术有限公司 | Audio encoding method and apparatus, and audio decoding method and apparatus |
CN114242120A (en) * | 2021-11-25 | 2022-03-25 | 广东电力信息科技有限公司 | Audio editing method and audio marking method based on DTMF technology |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1262570A (en) * | 1999-01-22 | 2000-08-09 | 摩托罗拉公司 | Communication apparatus and method for breakpoint to speaching mode |
CN101599269A (en) * | 2009-07-02 | 2009-12-09 | 中国农业大学 | Sound end detecting method and device |
CN101834964A (en) * | 2010-03-31 | 2010-09-15 | 耿直 | Voice data transmission processing method and voice data transmission processor |
CN103680500A (en) * | 2012-08-29 | 2014-03-26 | 北京百度网讯科技有限公司 | Speech recognition method and device |
US20140163986A1 (en) * | 2012-12-12 | 2014-06-12 | Electronics And Telecommunications Research Institute | Voice-based captcha method and apparatus |
CN104780263A (en) * | 2015-03-10 | 2015-07-15 | 广东小天才科技有限公司 | Method and device for voice breakpoint extension judgment |
2016-06-08: CN patent application CN201610404998.0A filed (published as CN106101094A), status Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1262570A (en) * | 1999-01-22 | 2000-08-09 | 摩托罗拉公司 | Communication apparatus and method for breakpoint to speaching mode |
CN101599269A (en) * | 2009-07-02 | 2009-12-09 | 中国农业大学 | Sound end detecting method and device |
CN101834964A (en) * | 2010-03-31 | 2010-09-15 | 耿直 | Voice data transmission processing method and voice data transmission processor |
CN103680500A (en) * | 2012-08-29 | 2014-03-26 | 北京百度网讯科技有限公司 | Speech recognition method and device |
US20140163986A1 (en) * | 2012-12-12 | 2014-06-12 | Electronics And Telecommunications Research Institute | Voice-based captcha method and apparatus |
CN104780263A (en) * | 2015-03-10 | 2015-07-15 | 广东小天才科技有限公司 | Method and device for voice breakpoint extension judgment |
Non-Patent Citations (1)
Title |
---|
陈宇京 (Chen Yujing): 《语感与乐感-汉语声乐语言人声阐释研究》 (Sense of Language and Sense of Music: A Study on the Vocal Interpretation of Chinese Vocal Language), 30 September 2008 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108305628A (en) * | 2017-06-27 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN108305628B (en) * | 2017-06-27 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Speech recognition method, speech recognition device, computer equipment and storage medium |
CN109377998A (en) * | 2018-12-11 | 2019-02-22 | 科大讯飞股份有限公司 | A kind of voice interactive method and device |
CN113192519A (en) * | 2021-04-29 | 2021-07-30 | 北京达佳互联信息技术有限公司 | Audio encoding method and apparatus, and audio decoding method and apparatus |
CN114242120A (en) * | 2021-11-25 | 2022-03-25 | 广东电力信息科技有限公司 | Audio editing method and audio marking method based on DTMF technology |
CN114242120B (en) * | 2021-11-25 | 2023-11-10 | 广东电力信息科技有限公司 | Audio editing method and audio marking method based on DTMF technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106101094A (en) | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system | |
US11211076B2 (en) | Key phrase detection with audio watermarking | |
JP7094485B2 (en) | Business data processing method, equipment and related equipment | |
CN110268469A (en) | Server side hot word | |
CN106486126B (en) | Speech recognition error correction method and device | |
CN104144108B (en) | A kind of message responding method, apparatus and system | |
CN108184135A (en) | Method for generating captions and device, storage medium and electric terminal | |
CN106782551A (en) | A kind of speech recognition system and method | |
CN102708865A (en) | Method, device and system for voice recognition | |
CN106570100A (en) | Information search method and device | |
CN105551480B (en) | Dialect conversion method and device | |
CN108924583B (en) | Video file generation method, device, system and storage medium thereof | |
CN111627463B (en) | Voice VAD tail point determination method and device, electronic equipment and computer readable medium | |
CN103646654B (en) | A kind of recording data sharing method and terminal | |
EP3613041B1 (en) | Handling of poor audio quality in a terminal device | |
CN103327021B (en) | Method, devices and system of multi-device interaction | |
CN107274882A (en) | Data transmission method and device | |
CN101808167B (en) | Method for procedure tracking, device and system | |
CN111698552A (en) | Video resource generation method and device | |
CN110491389A (en) | A kind of method for recognizing sound-groove of telephone traffic system | |
CN106911926A (en) | A kind of video code rate recognition methods and device | |
EP2913822B1 (en) | Speaker recognition | |
JP5479223B2 (en) | Homepage guidance method and system using acoustic communication method | |
CN112712793A (en) | ASR (error correction) method based on pre-training model under voice interaction and related equipment | |
CN104883625A (en) | Information display method, terminal device, server, and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20161109 |