CN106101094A - Audio processing method, sending-end device, receiving-end device, and audio processing system - Google Patents
Audio processing method, sending-end device, receiving-end device, and audio processing system Download PDF Info
- Publication number
- CN106101094A CN106101094A CN201610404998.0A CN201610404998A CN106101094A CN 106101094 A CN106101094 A CN 106101094A CN 201610404998 A CN201610404998 A CN 201610404998A CN 106101094 A CN106101094 A CN 106101094A
- Authority
- CN
- China
- Prior art keywords
- sentence
- voice
- voice sentence
- audio
- labelling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/764—Media network packet handling at the destination
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
Abstract
Disclosed are an audio processing method, a sending-end device, a receiving-end device, and an audio processing system. When audio is recorded at the sending-end device, the method marks each recorded speech sentence so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information they contain and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Description
Technical field
The invention belongs to the field of audio signal processing, and in particular relates to an audio processing method, a sending-end device, a receiving-end device, and an audio processing system.
Background Art
At present, applications that use network tools for voice calls, such as holding voice conferences over a network, are increasingly widespread.
In such application scenarios, a user's speech audio is often affected by a complex network environment and by various uncertain factors (such as wireless signal strength, firewalls, and system performance), so that network audio transmission becomes discontinuous. Discontinuous transmission ultimately causes the speech sentences played back in real time to the listener to be broken up, i.e. complete speech sentences are not played continuously. For example, a user may hear the first half of a first sentence, then after a delay hear the second half of the first sentence together with the first half of a second sentence, and after a further delay hear the second half of the second sentence, and so on. This severely degrades communication efficiency and makes it difficult for the user to understand the received audio information quickly and accurately.
Summary of the invention
In view of this, an object of the present invention is to provide an audio processing method, a sending-end device, a receiving-end device, and an audio processing system, aiming to solve the problem of low communication efficiency in network calls caused by discontinuous network audio transmission.
To this end, the present invention discloses the following technical solutions:
An audio processing method, applied to a sending-end device, the method including:
performing a preset speech-sentence marking process on a current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence; and
sending the target audio to a receiving-end device.
Preferably, in the above method, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and performing the preset marking process on the current speech sentence to obtain the target audio includes:
generating, based on a preset sentence-start judgment condition, the start marker required for the current speech sentence being recorded, the start marker indicating the start position of the current speech sentence; and
generating, based on a preset sentence-end judgment condition, the end marker required for the current speech sentence, the end marker indicating the end position of the current speech sentence.
Preferably, in the above method, sending the target audio to the receiving-end device includes:
encapsulating the target audio into a corresponding number of audio data packets, and sending each audio data packet in turn to the receiving-end device.
An audio processing method, applied to a receiving-end device, the method including:
receiving target audio containing speech-sentence marker information used to mark complete speech sentences;
extracting complete speech sentences from the target audio based on the marker information; and
playing the complete speech sentences.
Preferably, in the above method, receiving the target audio includes:
receiving each audio data packet from the sending-end device in turn.
Preferably, in the above method, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and extracting complete speech sentences from the target audio based on the marker information includes:
locating the start position of a speech sentence based on a start marker in the received audio data packets;
locating the end position of the speech sentence based on the end marker that is adjacent to and paired with that start marker; and
splicing the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete speech sentence.
A sending-end device, including:
a marking processing module, configured to perform a preset speech-sentence marking process on a current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence; and
a sending module, configured to send the target audio to a receiving-end device.
Preferably, in the above sending-end device, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and the marking processing module includes:
a start-marker generating unit, configured to generate, based on a preset sentence-start judgment condition, the start marker required for the current speech sentence being recorded, the start marker indicating the start position of the current speech sentence; and
an end-marker generating unit, configured to generate, based on a preset sentence-end judgment condition, the end marker required for the current speech sentence, the end marker indicating the end position of the current speech sentence.
Preferably, in the above sending-end device, the sending module includes:
a data encapsulating and sending unit, configured to encapsulate the target audio into a corresponding number of audio data packets, and to send each audio data packet in turn to the receiving-end device.
A receiving-end device, including:
a receiving module, configured to receive target audio containing speech-sentence marker information used to mark complete speech sentences;
an extraction module, configured to extract complete speech sentences from the target audio based on the marker information; and
a playing module, configured to play the complete speech sentences.
Preferably, in the above receiving-end device, the receiving module includes:
a packet receiving unit, configured to receive each audio data packet from the sending-end device in turn.
Preferably, in the above receiving-end device, the speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and the extraction module includes:
a first positioning unit, configured to locate the start position of a speech sentence based on a start marker in the received audio data packets;
a second positioning unit, configured to locate the end position of the speech sentence based on the end marker that is adjacent to and paired with that start marker; and
a splicing unit, configured to splice the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete speech sentence.
An audio processing system, including the sending-end device and the receiving-end device described above.
As can be seen from the above solutions, in the audio processing method disclosed in the present application, each recorded speech sentence is marked when audio is recorded at the sending-end device, so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of Embodiment One of an audio processing method provided by the present application;
Fig. 2 is a flow chart of Embodiment Two of an audio processing method provided by the present application;
Fig. 3 is a structural diagram of a sending-end device according to Embodiment Three of the present application;
Fig. 4 is a structural diagram of a receiving-end device according to Embodiment Four of the present application.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment One
Referring to Fig. 1, a flow chart of Embodiment One of an audio processing method provided by the present application, the method of this embodiment is applied to a sending-end device, for example a sending-end call tool used for network voice calls (the two parties of a call can each act as both sender and receiver). As shown in Fig. 1, the audio processing method may include the following steps:
S101: Perform a preset speech-sentence marking process on the current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence.
The speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker.
Specifically, based on a preset sentence-start judgment condition and a preset sentence-end judgment condition, corresponding sentence-start detection logic and sentence-end detection logic can be added to the sending-end device, e.g. to the sending-end call tool, so that start detection and end detection are carried out for each recorded speech sentence. On this basis, a start marker and an end marker can be generated and added for each speech sentence at its detected start position and end position respectively.
In a real call, a person usually utters a complete speech sentence more or less continuously, while pausing slightly between different sentences; that is, there is normally a certain delay between sentences. Consequently, during recording, the audio within one speech sentence is highly continuous in time, whereas the continuity between the audio of different sentences is lower. Based on this characteristic, the discontinuity in the generation/recording of audio data can serve as the basis for detecting sentence boundaries: if there is a sufficient delay between the current speech sentence and the previous one, i.e. the generation time of the current sentence (taken as the moment its audio is collected) is at least a predetermined duration after the end time of the previous sentence, the current sentence is considered to have started; correspondingly, during recording, if a pause in audio input longer than the predetermined duration is detected, the current sentence is considered to have ended.
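The pause-based boundary rule above can be sketched in code. This is an illustrative sketch only, not the patent's implementation: the class name, the event strings, and the 0.3-second threshold are all assumptions made for the example.

```python
PAUSE_THRESHOLD = 0.3  # assumed predetermined duration, in seconds


class SentenceBoundaryDetector:
    """Emit sentence start/end events from the timing gaps between audio frames."""

    def __init__(self, threshold=PAUSE_THRESHOLD):
        self.threshold = threshold
        self.last_frame_time = None
        self.in_sentence = False

    def on_audio_frame(self, frame_time):
        """Return 'start', 'end+start', or None for the frame collected at frame_time."""
        events = []
        # A gap of at least the threshold since the last frame ends the pending sentence.
        if self.in_sentence and self.last_frame_time is not None:
            if frame_time - self.last_frame_time >= self.threshold:
                events.append("end")
                self.in_sentence = False
        # The first frame after a pause (or ever) starts a new sentence.
        if not self.in_sentence:
            events.append("start")
            self.in_sentence = True
        self.last_frame_time = frame_time
        return "+".join(events) if events else None

    def flush(self):
        """Close the last sentence when recording stops."""
        if self.in_sentence:
            self.in_sentence = False
            return "end"
        return None
```

A frame arriving 0.9 s after the previous one thus produces both an end marker for the old sentence and a start marker for the new one.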
On the basis of this start and end detection, the start and end markers can be inserted directly at the corresponding positions in the recorded audio data, for example adding the start marker at the head of a sentence's audio data and the end marker at its tail. Alternatively, a time axis can be maintained in the voice call tool in advance, with corresponding time information maintained on that axis for each recorded sentence; then, on the basis of the start and end detection, a pair consisting of a start marker and an end marker can be added for each speech sentence on the time axis. Each such marker pair on the time axis can indirectly indicate the start and end positions of the corresponding speech sentence through the correspondence between the recorded audio data and the time information.
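The first alternative above, writing the markers directly into the recorded byte stream, might look like the following sketch. The two-byte marker values are invented for illustration; a real implementation would have to reserve values or escape sequences that cannot collide with payload bytes.

```python
# Illustrative in-band markers (assumed values, not from the patent).
START_MARK = b"\xff\x01"
END_MARK = b"\xff\x02"


def mark_sentence(sentence_audio: bytes) -> bytes:
    """Wrap one complete sentence's audio data with in-band boundary markers."""
    return START_MARK + sentence_audio + END_MARK
```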
Each adjacent <start marker, end marker> pair marks one complete speech sentence; subsequently, by extracting the speech content between the start marker and end marker of such an adjacent pair from the audio data, one complete speech sentence is obtained.
S102: Send the target audio to the receiving-end device.
After the recorded audio data has been marked, the audio data carrying the marker information can be encapsulated into data packets, and after encapsulation each resulting audio data packet is sent in turn to the receiving-end device over the network. When packing the audio data, the number of packets can be determined according to the data volume of each speech sentence; after packing, the audio of one speech sentence may correspond to one or more audio data packets.
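The packing step can be sketched as follows, under assumed details not given in the patent: a fixed 160-byte payload per packet, and dict-based packets whose flags carry the sentence markers. A short sentence then maps to a single packet and a longer one to several, as described above.

```python
PAYLOAD_SIZE = 160  # assumed bytes of audio per packet


def packetize(sentence_audio: bytes, seq_start: int = 0):
    """Split one marked sentence's audio into sequenced packets carrying marker flags."""
    packets = []
    for i in range(0, len(sentence_audio), PAYLOAD_SIZE):
        packets.append({
            "seq": seq_start + len(packets),
            # The first packet of the sentence carries the start marker,
            # the last carries the end marker (both, for a one-packet sentence).
            "start_mark": i == 0,
            "end_mark": i + PAYLOAD_SIZE >= len(sentence_audio),
            "payload": sentence_audio[i:i + PAYLOAD_SIZE],
        })
    return packets
```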
Embodiment Two
Referring to Fig. 2, a flow chart of Embodiment Two of an audio processing method provided by the present application, the method of this embodiment is applied to a receiving-end device, for example a receiving-end call tool used for network voice calls. As shown in Fig. 2, the audio processing method may include the following steps:
S201: Receive target audio containing speech-sentence marker information used to mark complete speech sentences.
Specifically, the receiving-end call tool receives the required audio data by receiving, in turn, each audio data packet from the sending-end device over the network. The received audio data carries the start markers and end markers of speech sentences.
S202: Extract complete speech sentences from the target audio based on the speech-sentence marker information.
After receiving the audio data packets from the sending-end device in turn, the receiving-end call tool unpacks and parses each packet, extracts the speech-sentence start markers and end markers, and pairs adjacent start and end markers according to the order of reception. Based on the pairing of start and end markers, the start and end positions of complete speech sentences can be located; the audio content in the corresponding packets can then be extracted and spliced according to those positions to obtain the complete speech sentences.
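The unpack-pair-splice logic of S202 can be sketched as follows, assuming the same illustrative dict-based packet layout as above (sequence number, marker flags, payload); this is a sketch of the idea, not the patent's implementation.

```python
def extract_sentences(packets):
    """Splice payloads between each adjacent <start marker, end marker> pair."""
    sentences = []
    buffer, collecting = [], False
    # Process packets in sequence order, approximating the order of reception.
    for pkt in sorted(packets, key=lambda p: p["seq"]):
        if pkt.get("start_mark"):
            buffer, collecting = [], True   # a new sentence begins here
        if collecting:
            buffer.append(pkt["payload"])
        if collecting and pkt.get("end_mark"):
            sentences.append(b"".join(buffer))  # the pair is complete: splice
            buffer, collecting = [], False
    return sentences
```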
S203: Play the complete speech sentences.
After the complete speech sentences have been extracted from the received audio data based on the start and end markers, the receiving-end call tool can play the extracted sentences back continuously.
During the receiving and parsing of audio data packets, if the pending packet or packets received so far contain only a speech-sentence start marker and no matching end marker has been found, the receiver must keep waiting until an end marker appears in a received packet; only then can a complete speech sentence be extracted and played according to the pairing of the start marker and end marker.
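The wait-for-end-marker behaviour can be sketched as a playback loop that releases audio to the player only once a complete sentence has been assembled. The queue interface, packet fields, and the None end-of-stream sentinel are illustrative assumptions for this sketch.

```python
import queue


def playback_loop(packet_queue, play):
    """Buffer packets and call play() only for complete, marker-delimited sentences."""
    buffer, collecting = [], False
    while True:
        pkt = packet_queue.get()
        if pkt is None:                      # assumed end-of-stream sentinel
            break
        if pkt.get("start_mark"):
            buffer, collecting = [], True    # a new sentence begins
        if collecting:
            buffer.append(pkt["payload"])
        if collecting and pkt.get("end_mark"):
            play(b"".join(buffer))           # only complete sentences reach playback
            buffer, collecting = [], False
```

A sentence whose end marker has not yet arrived stays in the buffer, so partial sentences are never handed to the player.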
As can be seen from the above solutions, in the audio processing method disclosed in the present application, each recorded speech sentence is marked when audio is recorded at the sending-end device, so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Embodiment Three
Referring to Fig. 3, a structural diagram of a sending-end device according to Embodiment Three of the present application, the sending-end device may specifically be used in a sending-end call tool for network voice calls (the two parties of a call can each act as both sender and receiver). As shown in Fig. 3, the sending-end device may include a marking processing module 301 and a sending module 302.
The marking processing module 301 is configured to perform a preset speech-sentence marking process on the current speech sentence being recorded in real time, to obtain target audio containing corresponding speech-sentence marker information, the marker information being used to mark one complete speech sentence.
The speech-sentence marker information includes a speech-sentence start marker and a speech-sentence end marker, and the marking processing module 301 includes a start-marker generating unit and an end-marker generating unit.
The start-marker generating unit is configured to generate, based on a preset sentence-start judgment condition, the start marker required for the current speech sentence being recorded, the start marker indicating the start position of the current speech sentence.
The end-marker generating unit is configured to generate, based on a preset sentence-end judgment condition, the end marker required for the current speech sentence, the end marker indicating the end position of the current speech sentence.
Specifically, based on a preset sentence-start judgment condition and a preset sentence-end judgment condition, corresponding sentence-start detection logic and sentence-end detection logic can be added to the sending-end device, e.g. to the sending-end call tool, so that start detection and end detection are carried out for each recorded speech sentence. On this basis, a start marker and an end marker can be generated and added for each speech sentence at its detected start position and end position respectively.
In a real call, a person usually utters a complete speech sentence more or less continuously, while pausing slightly between different sentences; that is, there is normally a certain delay between sentences. Consequently, during recording, the audio within one speech sentence is highly continuous in time, whereas the continuity between the audio of different sentences is lower. Based on this characteristic, the discontinuity in the generation/recording of audio data can serve as the basis for detecting sentence boundaries: if there is a sufficient delay between the current speech sentence and the previous one, i.e. the generation time of the current sentence (taken as the moment its audio is collected) is at least a predetermined duration after the end time of the previous sentence, the current sentence is considered to have started; correspondingly, during recording, if a pause in audio input longer than the predetermined duration is detected, the current sentence is considered to have ended.
On the basis of this start and end detection, the start and end markers can be inserted directly at the corresponding positions in the recorded audio data, for example adding the start marker at the head of a sentence's audio data and the end marker at its tail. Alternatively, a time axis can be maintained in the voice call tool in advance, with corresponding time information maintained on that axis for each recorded sentence; then, on the basis of the start and end detection, a pair consisting of a start marker and an end marker can be added for each speech sentence on the time axis. Each such marker pair on the time axis can indirectly indicate the start and end positions of the corresponding speech sentence through the correspondence between the recorded audio data and the time information.
Each adjacent <start marker, end marker> pair marks one complete speech sentence; subsequently, by extracting the speech content between the start marker and end marker of such an adjacent pair from the audio data, one complete speech sentence is obtained.
The sending module 302 is configured to send the target audio to the receiving-end device.
The sending module 302 includes a data encapsulating and sending unit, configured to encapsulate the target audio into a corresponding number of audio data packets and to send each audio data packet in turn to the receiving-end device.
After the recorded audio data has been marked, the audio data carrying the marker information can be encapsulated into data packets, and after encapsulation each resulting audio data packet is sent in turn to the receiving-end device over the network. When packing the audio data, the number of packets can be determined according to the data volume of each speech sentence; after packing, the audio of one speech sentence may correspond to one or more audio data packets.
Embodiment Four
Referring to Fig. 4, a structural diagram of a receiving-end device according to Embodiment Four of the present application, the receiving-end device may specifically be used in a receiving-end call tool for network voice calls. As shown in Fig. 4, the receiving-end device may include a receiving module 401, an extraction module 402, and a playing module 403.
The receiving module 401 is configured to receive target audio containing speech-sentence marker information used to mark complete speech sentences.
The receiving module 401 includes a packet receiving unit, configured to receive each audio data packet from the sending-end device in turn.
Specifically, the receiving-end call tool receives the required audio data by receiving, in turn, each audio data packet from the sending-end device over the network. The received audio data carries the start markers and end markers of speech sentences.
The extraction module 402 is configured to extract complete speech sentences from the target audio based on the speech-sentence marker information.
The extraction module 402 includes a first positioning unit, a second positioning unit, and a splicing unit.
The first positioning unit is configured to locate the start position of a speech sentence based on a start marker in the received audio data packets.
The second positioning unit is configured to locate the end position of the speech sentence based on the end marker that is adjacent to and paired with that start marker.
The splicing unit is configured to splice the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete speech sentence.
After receiving the audio data packets from the sending-end device in turn, the receiving-end call tool unpacks and parses each packet, extracts the speech-sentence start markers and end markers, and pairs adjacent start and end markers according to the order of reception. Based on the pairing of start and end markers, the start and end positions of complete speech sentences can be located; the audio content in the corresponding packets can then be extracted and spliced according to those positions to obtain the complete speech sentences.
The playing module 403 is configured to play the complete speech sentences.
After the complete speech sentences have been extracted from the received audio data based on the start and end markers, the receiving-end call tool can play the extracted sentences back continuously.
During the receiving and parsing of audio data packets, if the pending packet or packets received so far contain only a speech-sentence start marker and no matching end marker has been found, the receiver must keep waiting until an end marker appears in a received packet; only then can a complete speech sentence be extracted and played according to the pairing of the start marker and end marker.
As can be seen from the above solutions, the present application marks each recorded speech sentence when audio is recorded at the sending-end device, so that the generated audio to be transmitted contains marker information identifying complete speech sentences. After the receiving-end device receives the audio data from the sending-end device, it can extract complete speech sentences according to the marker information and play them back continuously, which helps the user understand the received audio information quickly and accurately. The present application can therefore effectively improve the efficiency of network calls under complex network conditions.
Embodiment Five
This Embodiment Five discloses an audio processing system, the system including the sending-end device disclosed in Embodiment Three and the receiving-end device disclosed in Embodiment Four.
The sending-end device and the receiving-end device may be, respectively, the sending-end and receiving-end call tools used for network voice calls. In a real network voice call, the two parties are each other's sender and receiver of audio data; therefore, in general, the call tool used by each party serves both as the sending-end device and as the receiving-end device during the call.
As can be seen from the above scheme, the present application marks the recorded voice sentences while the sending-end device records audio, so that the generated audio to be transmitted contains label information marking complete voice sentences. After the receiving-end device subsequently receives the audio data from the sending-end device, it can extract complete voice sentences according to the label information contained therein, and on that basis play the complete voice sentences continuously. This helps the user understand the received audio information quickly and accurately, so applying the present application can effectively improve Internet call efficiency in complex network environments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts among the embodiments, reference may be made to one another.
For convenience of description, the system or device above is described as divided into various modules or units by function. Of course, when implementing the present application, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general hardware platform. Based on such an understanding, the essence of the technical scheme of the present application, or the part of it that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present application or in parts thereof.
Finally, it should also be noted that, in this document, relational terms such as first, second, third, and fourth are used merely to distinguish one entity or operation from another, and do not necessarily require or imply that any such actual relation or order exists between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element limited by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
The above are only preferred embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (13)
1. An audio processing method, characterized in that it is applied to a sending-end device, the method comprising:
performing preset voice sentence marking on the current voice sentence recorded in real time, to obtain a target audio containing corresponding voice sentence label information, the voice sentence label information being used to mark one complete voice sentence;
sending the target audio to a receiving-end device.
2. The method according to claim 1, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the performing of preset voice sentence marking on the current voice sentence recorded in real time, to obtain a target audio containing corresponding voice sentence label information, includes:
generating, based on a preset voice sentence start judgment condition, the voice sentence start mark needed for the current voice sentence recorded in real time; wherein the voice sentence start mark is used to mark the start position of the current voice sentence;
generating, based on a preset voice sentence end judgment condition, the voice sentence end mark needed for the current voice sentence; wherein the voice sentence end mark is used to mark the end position of the current voice sentence.
3. The method according to claim 1, characterized in that the sending of the target audio to a receiving-end device includes:
encapsulating the target audio into a corresponding number of audio data packets, and sending each audio data packet in turn to the receiving-end device.
4. An audio processing method, characterized in that it is applied to a receiving-end device, the method comprising:
receiving a target audio, the target audio containing voice sentence label information, the voice sentence label information being used to mark a complete voice sentence;
extracting the complete voice sentence from the target audio based on the voice sentence label information;
playing the complete voice sentence.
5. The method according to claim 4, characterized in that the receiving of a target audio includes:
receiving each audio data packet from a sending-end device in turn.
6. The method according to claim 5, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the extracting of the complete voice sentence from the target audio based on the voice sentence label information includes:
locating the start position of a voice sentence based on the voice sentence start mark in the received audio data packets;
locating the end position of the voice sentence based on the voice sentence end mark that pairs adjacently with the voice sentence start mark in the received audio data packets;
splicing the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete voice sentence.
7. A sending-end device, characterized in that it includes:
a marking processing module, configured to perform preset voice sentence marking on the current voice sentence recorded in real time, to obtain a target audio containing corresponding voice sentence label information, the voice sentence label information being used to mark one complete voice sentence;
a sending module, configured to send the target audio to a receiving-end device.
8. The sending-end device according to claim 7, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the marking processing module includes:
a start mark generating unit, configured to generate, based on a preset voice sentence start judgment condition, the voice sentence start mark needed for the current voice sentence recorded in real time; wherein the voice sentence start mark is used to mark the start position of the current voice sentence;
an end mark generating unit, configured to generate, based on a preset voice sentence end judgment condition, the voice sentence end mark needed for the current voice sentence; wherein the voice sentence end mark is used to mark the end position of the current voice sentence.
9. The sending-end device according to claim 7, characterized in that the sending module includes:
a data encapsulation and sending unit, configured to encapsulate the target audio into a corresponding number of audio data packets, and to send each audio data packet in turn to the receiving-end device.
10. A receiving-end device, characterized in that it includes:
a receiving module, configured to receive a target audio, the target audio containing voice sentence label information, the voice sentence label information being used to mark a complete voice sentence;
an extraction module, configured to extract the complete voice sentence from the target audio based on the voice sentence label information;
a playing module, configured to play the complete voice sentence.
11. The receiving-end device according to claim 10, characterized in that the receiving module includes:
a data packet receiving unit, configured to receive each audio data packet from a sending-end device in turn.
12. The receiving-end device according to claim 11, characterized in that the voice sentence label information includes a voice sentence start mark and a voice sentence end mark, and the extraction module includes:
a first positioning unit, configured to locate the start position of a voice sentence based on the voice sentence start mark in the received audio data packets;
a second positioning unit, configured to locate the end position of the voice sentence based on the voice sentence end mark that pairs adjacently with the voice sentence start mark in the received audio data packets;
a splicing unit, configured to splice the audio fragments in the corresponding audio data packets according to the start position and the end position, to obtain the complete voice sentence.
13. An audio processing system, characterized in that it includes the sending-end device according to any one of claims 7-9, and the receiving-end device according to any one of claims 10-12.
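The sender-side steps in claims 1-3 (judging where a sentence starts and ends, inserting the marks, and encapsulating the marked stream into packets) can be sketched as below. This is a hedged illustration under stated assumptions: the marks, the silence-run threshold standing in for the "start/end judgment conditions", and the frame representation are all invented for the example, not taken from the patent.

```python
# Sketch of sender-side processing: insert a start mark when speech
# resumes after silence, insert an end mark when trailing silence
# exceeds a threshold (standing in for the preset start/end judgment
# conditions), then encapsulate the marked stream into packets.
# Marks, threshold, and frame format are illustrative assumptions.

START, END = "<s>", "</s>"
SILENCE = None  # a frame below the voice-activity energy threshold

def mark_sentences(frames, end_after=2):
    """Wrap runs of voiced frames in START/END marks; `end_after`
    consecutive silent frames close the current sentence."""
    out, in_sentence, silent_run = [], False, 0
    for f in frames:
        if f is SILENCE:
            silent_run += 1
            if in_sentence and silent_run >= end_after:
                out.append(END)          # end judgment condition met
                in_sentence = False
        else:
            if not in_sentence:
                out.append(START)        # start judgment condition met
                in_sentence = True
            silent_run = 0
            out.append(f)
    if in_sentence:
        out.append(END)                  # close a trailing open sentence
    return out

def packetize(stream, size=4):
    """Encapsulate the marked stream into packets of at most `size` items."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

frames = ["a", "b", SILENCE, SILENCE, "c", SILENCE]
marked = mark_sentences(frames)
print(marked)             # -> ['<s>', 'a', 'b', '</s>', '<s>', 'c', '</s>']
print(packetize(marked))  # packets sent in turn to the receiving-end device
```

Because the marks travel in-band with the audio, the receiving end can pair each start mark with its adjacent end mark across packet boundaries, which is what makes the extraction and splicing in claims 6 and 12 possible.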
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610404998.0A CN106101094A (en) | 2016-06-08 | 2016-06-08 | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610404998.0A CN106101094A (en) | 2016-06-08 | 2016-06-08 | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106101094A true CN106101094A (en) | 2016-11-09 |
Family
ID=57228391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610404998.0A Pending CN106101094A (en) | 2016-06-08 | 2016-06-08 | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106101094A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108305628A (en) * | 2017-06-27 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN109377998A (en) * | 2018-12-11 | 2019-02-22 | 科大讯飞股份有限公司 | A kind of voice interactive method and device |
CN113192519A (en) * | 2021-04-29 | 2021-07-30 | 北京达佳互联信息技术有限公司 | Audio encoding method and apparatus, and audio decoding method and apparatus |
CN114242120A (en) * | 2021-11-25 | 2022-03-25 | 广东电力信息科技有限公司 | Audio editing method and audio marking method based on DTMF technology |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1262570A (en) * | 1999-01-22 | 2000-08-09 | 摩托罗拉公司 | Communication apparatus and method for breakpoint to speaching mode |
CN101599269A (en) * | 2009-07-02 | 2009-12-09 | 中国农业大学 | Sound end detecting method and device |
CN101834964A (en) * | 2010-03-31 | 2010-09-15 | 耿直 | Voice data transmission processing method and voice data transmission processor |
CN103680500A (en) * | 2012-08-29 | 2014-03-26 | 北京百度网讯科技有限公司 | Speech recognition method and device |
US20140163986A1 (en) * | 2012-12-12 | 2014-06-12 | Electronics And Telecommunications Research Institute | Voice-based captcha method and apparatus |
CN104780263A (en) * | 2015-03-10 | 2015-07-15 | 广东小天才科技有限公司 | Method and device for voice breakpoint extension judgment |
2016-06-08: CN patent application CN201610404998.0A filed (published as CN106101094A), status Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1262570A (en) * | 1999-01-22 | 2000-08-09 | 摩托罗拉公司 | Communication apparatus and method for breakpoint to speaching mode |
CN101599269A (en) * | 2009-07-02 | 2009-12-09 | 中国农业大学 | Sound end detecting method and device |
CN101834964A (en) * | 2010-03-31 | 2010-09-15 | 耿直 | Voice data transmission processing method and voice data transmission processor |
CN103680500A (en) * | 2012-08-29 | 2014-03-26 | 北京百度网讯科技有限公司 | Speech recognition method and device |
US20140163986A1 (en) * | 2012-12-12 | 2014-06-12 | Electronics And Telecommunications Research Institute | Voice-based captcha method and apparatus |
CN104780263A (en) * | 2015-03-10 | 2015-07-15 | 广东小天才科技有限公司 | Method and device for voice breakpoint extension judgment |
Non-Patent Citations (1)
Title |
---|
陈宇京 (Chen Yujing): 《语感与乐感-汉语声乐语言人声阐释研究》 (Sense of Language and Sense of Music: A Study on the Vocal Interpretation of Chinese Vocal Language), 30 September 2008 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108305628A (en) * | 2017-06-27 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN108305628B (en) * | 2017-06-27 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Speech recognition method, speech recognition device, computer equipment and storage medium |
CN109377998A (en) * | 2018-12-11 | 2019-02-22 | 科大讯飞股份有限公司 | A kind of voice interactive method and device |
CN113192519A (en) * | 2021-04-29 | 2021-07-30 | 北京达佳互联信息技术有限公司 | Audio encoding method and apparatus, and audio decoding method and apparatus |
CN114242120A (en) * | 2021-11-25 | 2022-03-25 | 广东电力信息科技有限公司 | Audio editing method and audio marking method based on DTMF technology |
CN114242120B (en) * | 2021-11-25 | 2023-11-10 | 广东电力信息科技有限公司 | Audio editing method and audio marking method based on DTMF technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106101094A (en) | Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system | |
US11211076B2 (en) | Key phrase detection with audio watermarking | |
JP7094485B2 (en) | Business data processing method, equipment and related equipment | |
CN110268469A (en) | Server side hot word | |
CN106486126B (en) | Speech recognition error correction method and device | |
CN104144108B (en) | A kind of message responding method, apparatus and system | |
CN108184135A (en) | Method for generating captions and device, storage medium and electric terminal | |
CN106782551A (en) | A kind of speech recognition system and method | |
CN102708865A (en) | Method, device and system for voice recognition | |
CN106570100A (en) | Information search method and device | |
CN105551480B (en) | Dialect conversion method and device | |
CN108924583B (en) | Video file generation method, device, system and storage medium thereof | |
CN111627463B (en) | Voice VAD tail point determination method and device, electronic equipment and computer readable medium | |
CN103646654B (en) | A kind of recording data sharing method and terminal | |
EP3613041B1 (en) | Handling of poor audio quality in a terminal device | |
CN103327021B (en) | Method, devices and system of multi-device interaction | |
CN107274882A (en) | Data transmission method and device | |
CN101808167B (en) | Method for procedure tracking, device and system | |
CN111698552A (en) | Video resource generation method and device | |
CN110491389A (en) | A kind of method for recognizing sound-groove of telephone traffic system | |
CN106911926A (en) | A kind of video code rate recognition methods and device | |
EP2913822B1 (en) | Speaker recognition | |
JP5479223B2 (en) | Homepage guidance method and system using acoustic communication method | |
CN112712793A (en) | ASR (error correction) method based on pre-training model under voice interaction and related equipment | |
CN104883625A (en) | Information display method, terminal device, server, and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20161109 |