CN105336329A - Speech processing method and system - Google Patents

Speech processing method and system

Info

Publication number
CN105336329A
Authority
CN
China
Prior art keywords
audio
frequency information
data resource
speech
word message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510622499.4A
Other languages
Chinese (zh)
Other versions
CN105336329B (en)
Inventor
王少敏
雷闪耀
王洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201510622499.4A priority Critical patent/CN105336329B/en
Publication of CN105336329A publication Critical patent/CN105336329A/en
Application granted granted Critical
Publication of CN105336329B publication Critical patent/CN105336329B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a speech processing method, comprising: collecting initial speech information with a voice collector to form first audio information; performing speech recognition on the first audio information and converting it into corresponding text information; determining a speech conversion attribute; extracting, from a preset audio data resource, audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information; and saving or playing the second audio information. The method enables diversified processing of a user's speech and improves the user's experience with voice changer software. The invention also discloses a speech processing system.

Description

Speech processing method and system
Technical field
The present invention relates to the technical field of speech processing in electronic devices, and in particular to a speech processing method and system.
Background art
With the progress of electronic technology, the application software running on electronic devices has become increasingly rich. For example, various voice changer applications are currently popular: they convert collected sound into a specific voice, such as a "talking tom cat" effect, and play it. Audio is collected in real time by a voice collector, and the collected audio is then subjected to simple audio processing such as pitch shifting and speed changing, generating and playing voice in a fixed form. However, the conversion forms of existing voice changer software are rather limited; the user's voice cannot be converted in diversified ways, so the user's experience with voice changer software is poor.
Summary of the invention
The present invention provides a speech processing method that enables diversified processing of a user's voice and improves the user's experience with voice changer software.
The present invention provides a speech processing method, the method comprising:
collecting initial speech information with a voice collector to form first audio information;
performing speech recognition on the first audio information and converting it into corresponding text information;
determining a speech conversion attribute;
extracting, from a preset audio data resource, audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information;
saving or playing the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, comprises:
dividing the text information into N parts according to words or phrases;
searching the preset audio data resource for an audio segment corresponding to each of the N parts, to obtain N pieces of audio information;
splicing the N pieces of audio information to obtain the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, comprises:
reading attributes of the preset audio data resources;
finding the preset audio data resource whose attribute matches the speech conversion attribute;
extracting, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, comprises:
finding, in the preset audio data resource, audio information that matches M of the N parts into which the text information is divided according to words or phrases;
analyzing the audio information of the M parts to obtain an audio processing rule; processing the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
combining the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, further comprises:
obtaining third audio information that has an association relationship with the extracted audio information corresponding to the text information; constructing the second audio information based on the third audio information and the audio information corresponding to the text information.
Preferably, the preset audio data resource comprises:
an audio data resource;
and/or
a video data resource containing an audio data resource.
Preferably, building the preset audio data resource comprises:
extracting audio information from the video data resource to obtain an audio data resource containing the audio information.
The present invention further provides a speech processing system, comprising:
a voice collecting unit, configured to collect initial speech information and form first audio information;
a speech recognition unit, configured to perform speech recognition on the first audio information and convert it into corresponding text information;
a determining unit, configured to determine a speech conversion attribute;
an extraction unit, configured to extract, from a preset audio data resource, audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information;
a playing unit, configured to play the second audio information;
and/or
a storage unit, configured to save the second audio information.
Preferably, the extraction unit comprises:
a dividing unit, configured to divide the text information into N parts according to words or phrases;
a first searching unit, configured to search the preset audio data resource for the audio segment corresponding to each of the N parts, to obtain N pieces of audio information;
a splicing unit, configured to splice the N pieces of audio information to obtain the second audio information.
Preferably, the extraction unit comprises:
a reading unit, configured to read the attributes of the preset audio data resources;
a second searching unit, configured to find the preset audio data resource whose attribute matches the speech conversion attribute;
a composing unit, configured to extract, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
Preferably, the extraction unit comprises:
a third searching unit, configured to find, in the preset audio data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases;
an analyzing unit, configured to analyze the audio information of the M parts to obtain an audio processing rule;
a processing unit, configured to process the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
a combining unit, configured to combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
Preferably, the system further comprises:
an acquiring unit, configured to obtain third audio information that has an association relationship with the extracted audio information corresponding to the text information;
a constructing unit, configured to construct the second audio information based on the third audio information and the audio information corresponding to the text information.
Preferably, the preset audio data resource comprises: an audio data resource;
and/or
a video data resource containing an audio data resource.
Preferably, building the preset audio data resource comprises:
extracting audio information from the video data resource to obtain an audio data resource containing the audio information.
It can be seen from the above solutions that, in the speech processing method provided by the present invention, initial speech information is collected, the collected initial speech information is converted into corresponding text information through speech recognition, corresponding audio information is extracted from a preset audio data resource according to the text information and the determined speech conversion attribute, and the extracted audio information is saved or played. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Brief description of the drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a speech processing method disclosed in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a speech processing method disclosed in Embodiment 2 of the present invention;
Fig. 3 is a flowchart of a speech processing method disclosed in Embodiment 3 of the present invention;
Fig. 4 is a flowchart of a speech processing method disclosed in Embodiment 4 of the present invention;
Fig. 5 is a flowchart of a speech processing method disclosed in Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of a speech processing system disclosed in Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of a speech processing system disclosed in Embodiment 7 of the present invention;
Fig. 8 is a schematic structural diagram of a speech processing system disclosed in Embodiment 8 of the present invention;
Fig. 9 is a schematic structural diagram of a speech processing system disclosed in Embodiment 9 of the present invention;
Fig. 10 is a schematic structural diagram of a speech processing system disclosed in Embodiment 10 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a speech processing method disclosed in Embodiment 1 of the present invention. In this embodiment, the method may comprise the following steps.
Step 101: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 102: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 103: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 104: extract, from a preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the audio information corresponding to the text information is extracted from the preset audio data resource to form the second audio information.
Step 105: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, in the above embodiment, initial speech information is collected, the collected initial speech information is converted into corresponding text information through speech recognition, corresponding audio information is extracted from a preset audio data resource according to the text information and the determined speech conversion attribute, and the extracted audio information is saved or played. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
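By way of illustration, a minimal Python sketch of how the five steps of this embodiment could be wired together is given below. The collector, recognizer, resource and sink objects and their method names are hypothetical placeholders assumed for the sketch, not components defined by this disclosure.

```python
# Minimal sketch of the Embodiment 1 flow; all objects and method names are assumed.

def process_speech(collector, recognizer, resource, conversion_attribute, sink):
    # Step 101: collect initial speech information to form the first audio information
    first_audio = collector.record()

    # Step 102: speech recognition converts the first audio information into text information
    text_information = recognizer.transcribe(first_audio)

    # Step 103: the speech conversion attribute names the target voice type the user wants,
    # e.g. the name of a particular performer
    attribute = conversion_attribute

    # Step 104: extract matching audio from the preset audio data resource
    second_audio = resource.extract(text_information, attribute)

    # Step 105: save or play the second audio information
    sink.save_or_play(second_audio)
    return second_audio
```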
Fig. 2 is a flowchart of a speech processing method disclosed in Embodiment 2 of the present invention. In this embodiment, the method may comprise the following steps.
Step 201: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 202: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 203: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 204: divide the text information into N parts according to words or phrases.
The corresponding text information obtained after speech recognition is split according to words or phrases, yielding the words or phrases of N parts.
Step 205: search the preset audio data resource for the audio segment corresponding to each of the N parts, to obtain N pieces of audio information.
Then, in the preset audio data resource, the audio segment for the word or phrase of each of the N parts is looked up; that is, the audio segment corresponding to each word or phrase is found.
Step 206: splice the N pieces of audio information to obtain the second audio information.
The audio segments found for the words or phrases of the N parts are then spliced in the order of the text information, forming the second audio information corresponding to the text information.
Step 207: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 1, this embodiment splits the text information obtained by speech recognition into words or phrases, finds in the preset audio data resource the audio segments corresponding to the words or phrases obtained after splitting, splices these audio segments to obtain the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
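A minimal sketch of the split, look-up and splice flow of this embodiment is given below, assuming the preset audio data resource is a simple mapping from word or phrase to a recorded segment file and using pydub for splicing; splitting on whitespace is a simplification made for illustration.

```python
# Hypothetical sketch of Embodiment 2; the resource layout and file paths are assumptions.
from pydub import AudioSegment

preset_resource = {                     # word/phrase -> recorded segment (assumed layout)
    "hello": "voice/hello.wav",
    "world": "voice/world.wav",
}

def build_second_audio(text_information: str) -> AudioSegment:
    parts = text_information.split()                   # Step 204: divide into N parts
    segments = []
    for part in parts:                                  # Step 205: look up each part
        segments.append(AudioSegment.from_file(preset_resource[part]))
    second_audio = sum(segments[1:], segments[0])       # Step 206: splice in text order
    return second_audio

# build_second_audio("hello world").export("second.wav", format="wav")  # Step 207
```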
Fig. 3 is a flowchart of a speech processing method disclosed in Embodiment 3 of the present invention. In this embodiment, the method may comprise the following steps.
Step 301: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 302: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 303: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 304: read the attributes of the preset audio data resources.
The attributes of the preset audio data resources are read, where the attributes are assigned in advance. For example, when a preset audio data resource is built, a corresponding label is assigned to it, the label containing a brief description of the resource; or a filename is set for the resource, the filename indicating the kind of the resource, the name of the performer, and so on. The attributes of the preset audio data resources can thus be obtained by reading the preset labels or filenames, where a preset label or filename may be generated automatically from the audio data resource or added manually by the user.
Step 305: find the preset audio data resource whose attribute matches the speech conversion attribute.
After the attributes of the preset audio data resources are read, the preset audio data resource whose attribute matches the speech conversion attribute determined from the user's demand for diversified processing is found. For example, if the user wants the speech to be converted into the voice of Xiao Shenyang, the audio data resource whose attribute is Xiao Shenyang is found among the preset audio data resources.
Step 306: extract, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
The audio information corresponding to the text information is extracted from the audio data resource found.
Step 307: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 1, this embodiment reads the attributes of the preset audio data resources, finds among them the resource whose attribute matches the speech conversion attribute, extracts from the found resource the audio information corresponding to the text information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
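The attribute matching of this embodiment can be pictured with the small sketch below, in which each preset resource carries a label that briefly describes it; the metadata layout (a label plus a storage path) and the example labels are assumptions made for illustration.

```python
# Hypothetical sketch of Embodiment 3; labels and paths are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PresetResource:
    label: str   # brief description, e.g. the name of the performer
    path: str    # location of the audio (or video) data

preset_resources = [
    PresetResource(label="Xiao Shenyang", path="resources/xiao_shenyang/"),
    PresetResource(label="cartoon cat", path="resources/cartoon_cat/"),
]

def find_resource(conversion_attribute: str) -> PresetResource:
    # Step 305: match the speech conversion attribute against the preset labels
    for resource in preset_resources:
        if resource.label == conversion_attribute:
            return resource
    raise LookupError(f"no preset resource matches {conversion_attribute!r}")

# resource = find_resource("Xiao Shenyang")
# Step 306 would then extract the audio corresponding to the text information from resource.path
```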
Fig. 4 is a flowchart of a speech processing method disclosed in Embodiment 4 of the present invention. In this embodiment, the method may comprise the following steps.
Step 401: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 402: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 403: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 404: find, in the preset audio data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases.
The corresponding text information obtained after speech recognition is split according to words or phrases, yielding the words or phrases of N parts. The audio information matching M of these parts is then found in the preset audio data resource; that is, of the words or phrases of the N parts obtained by splitting, M parts have matching audio information in the preset audio data resource. For example, if the corresponding text information obtained after speech recognition is "I see the aunt wearing a skirt drift", matching audio information can be found in the preset audio data resource for "drift".
Step 405: analyze the audio information of the M parts to obtain an audio processing rule.
The audio information of the M parts found is analyzed, for example by analyzing the voice-line patterns of the audio, to obtain the audio processing rule of the audio information of the M parts.
Step 406: process the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information.
The N-M parts for which no corresponding audio information is found in the preset audio data resource are processed according to the audio processing rule of the audio information of the M parts, yielding N-M pieces of audio information that follow the same audio processing rule as the M parts. For example, the words or phrases obtained by splitting "I see the aunt wearing a skirt" are processed according to the audio processing rule of the audio information found for "drift" in the preset audio data resource, so that the processed audio information for "I see the aunt wearing a skirt" and the audio information for "drift" follow the same audio processing rule.
Step 407: combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
The M pieces of audio information and the N-M pieces of processed audio information are then combined in the order corresponding to the text information, forming the second audio information.
Step 408: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 1, this embodiment splits the corresponding text information obtained after speech recognition according to words or phrases, finds in the preset audio data resource the audio information matching some of the words or phrases, analyzes the matched audio information to obtain a processing rule, processes the unmatched words or phrases according to the corresponding processing rule to obtain corresponding audio information, finally splices the resulting audio information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
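As a rough illustration of deriving a processing rule from the matched audio and applying it to the unmatched parts, the sketch below treats the rule as nothing more than a loudness level and a playback-rate factor; this simplification is an assumption for illustration, not the analysis method of the disclosure.

```python
# Hypothetical sketch of Embodiment 4 using NumPy arrays of audio samples.
import numpy as np

def derive_rule(matched: np.ndarray) -> dict:
    # Step 405: analyze the matched audio to obtain an (assumed, simplified) processing rule
    return {"rms": float(np.sqrt(np.mean(matched ** 2))), "rate": 1.25}

def apply_rule(unmatched: np.ndarray, rule: dict) -> np.ndarray:
    # Step 406: bring an unmatched part to the same loudness and playback rate
    rms = float(np.sqrt(np.mean(unmatched ** 2))) or 1.0
    scaled = unmatched * (rule["rms"] / rms)
    positions = np.arange(0, len(scaled), rule["rate"])        # naive resampling
    return np.interp(positions, np.arange(len(scaled)), scaled)

def combine(matched_parts: dict, processed_parts: dict, order: list) -> np.ndarray:
    # Step 407: combine all pieces in the order of the text information
    pieces = {**matched_parts, **processed_parts}
    return np.concatenate([pieces[key] for key in order])
```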
Fig. 5 is a flowchart of a speech processing method disclosed in Embodiment 5 of the present invention. In this embodiment, the method may comprise the following steps.
Step 501: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 502: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 503: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 504: extract, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the audio information corresponding to the text information is extracted from the preset audio data resource.
Step 505: obtain third audio information that has an association relationship with the extracted audio information corresponding to the text information.
Timbre recognition is performed on the extracted audio information corresponding to the text information to determine the speaker of that audio information, and audio information characteristic of that speaker is obtained from the audio data resource or the video data resource as the third audio information. That is, the obtained third audio information has the same speaker as the extracted audio information corresponding to the text information. Alternatively, the typical playback scene of the extracted audio information corresponding to the text information is analyzed, and the background sound effect of that playback scene is taken as the third audio information.
Step 506: construct the second audio information based on the third audio information and the audio information corresponding to the text information.
Step 507: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
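One possible reading of steps 505 and 506, sketched below, is to pick a background sound effect associated with the playback scene of the extracted audio and mix it underneath; the scene-tag metadata, the effect files and the use of pydub's overlay are assumptions made for illustration.

```python
# Hypothetical sketch of Embodiment 5; scene tags and effect files are assumed.
from pydub import AudioSegment

background_effects = {            # scene tag -> background sound effect (assumed layout)
    "stage": "effects/applause.wav",
    "street": "effects/traffic.wav",
}

def build_second_audio_with_background(extracted: AudioSegment, scene_tag: str) -> AudioSegment:
    # Step 505: obtain the third audio information associated with the extracted audio
    third_audio = AudioSegment.from_file(background_effects[scene_tag])
    # Step 506: construct the second audio from the extracted audio and the third audio,
    # attenuating the background by 12 dB so the speech stays in front
    return extracted.overlay(third_audio - 12)
```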
Specifically, in the above embodiments, the preset audio data resource comprises an audio data resource and/or a video data resource containing an audio data resource. The audio information may be audio information extracted from the video data resource.
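Where the preset resource is a video, its audio track has to be pulled out first; a small sketch is given below, assuming the ffmpeg command-line tool is available and choosing an arbitrary codec and sample rate.

```python
# Hypothetical sketch: extract the audio track of a video data resource with ffmpeg.
import subprocess

def extract_audio_from_video(video_path: str, audio_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vn",                    # drop the video stream
         "-acodec", "pcm_s16le",   # plain 16-bit PCM (assumed choice)
         "-ar", "16000",           # 16 kHz sample rate (assumed choice)
         audio_path],
        check=True,
    )

# extract_audio_from_video("resources/clip.mp4", "resources/clip.wav")
```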
Fig. 6 is a schematic structural diagram of a speech processing system disclosed in Embodiment 6 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 601, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 601. The voice collecting unit 601 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 602, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 602 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 603, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 603 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
An extraction unit 604, configured to extract, from a preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the extraction unit 604 extracts the audio information corresponding to the text information from the preset audio data resource to form the second audio information.
A playing unit 605, configured to play the second audio information;
and/or
A storage unit 606, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, in the above embodiment, initial speech information is collected, the collected initial speech information is converted into corresponding text information through speech recognition, corresponding audio information is extracted from a preset audio data resource according to the text information and the determined speech conversion attribute, and the extracted audio information is saved or played. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 7 is a schematic structural diagram of a speech processing system disclosed in Embodiment 7 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 701, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 701. The voice collecting unit 701 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 702, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 702 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 703, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 703 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
A dividing unit 704, configured to divide the text information into N parts according to words or phrases.
The dividing unit 704 splits the corresponding text information obtained after speech recognition according to words or phrases, yielding the words or phrases of N parts.
A first searching unit 705, configured to search the preset audio data resource for the audio segment corresponding to each of the N parts, to obtain N pieces of audio information.
The first searching unit 705 then looks up, in the preset audio data resource, the audio segment for the word or phrase of each of the N parts; that is, the audio segment corresponding to each word or phrase is found.
A splicing unit 706, configured to splice the N pieces of audio information to obtain the second audio information.
The splicing unit 706 then splices the audio segments found for the words or phrases of the N parts in the order of the text information, forming the second audio information corresponding to the text information.
A playing unit 707, configured to play the second audio information;
and/or
A storage unit 708, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 6, this embodiment splits the text information obtained by speech recognition into words or phrases, finds in the preset audio data resource the audio segments corresponding to the words or phrases obtained after splitting, splices these audio segments to obtain the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 8 is a schematic structural diagram of a speech processing system disclosed in Embodiment 8 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 801, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 801. The voice collecting unit 801 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 802, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 802 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 803, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 803 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
A reading unit 804, configured to read the attributes of the preset audio data resources.
The reading unit 804 reads the attributes of the preset audio data resources, where the attributes are assigned in advance. For example, when a preset audio data resource is built, a corresponding label is assigned to it, the label containing a brief description of the resource; or a filename is set for the resource, the filename indicating the kind of the resource, the name of the performer, and so on. The attributes of the preset audio data resources can thus be obtained by reading the preset labels or filenames, where a preset label or filename may be generated automatically from the audio data resource or added manually by the user.
A second searching unit 805, configured to find the preset audio data resource whose attribute matches the speech conversion attribute.
After the attributes of the preset audio data resources are read, the second searching unit 805 finds the preset audio data resource whose attribute matches the speech conversion attribute determined from the user's demand for diversified processing. For example, if the user wants the speech to be converted into the voice of Xiao Shenyang, the audio data resource whose attribute is Xiao Shenyang is found among the preset audio data resources.
A composing unit 806, configured to extract, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
The composing unit 806 extracts the audio information corresponding to the text information from the audio data resource found, forming the second audio information.
A playing unit 807, configured to play the second audio information;
and/or
A storage unit 808, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 6, this embodiment reads the attributes of the preset audio data resources, finds among them the resource whose attribute matches the speech conversion attribute, extracts from the found resource the audio information corresponding to the text information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 9 is a schematic structural diagram of a speech processing system disclosed in Embodiment 9 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 901, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 901. The voice collecting unit 901 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 902, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 902 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 903, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 903 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
A third searching unit 904, configured to find, in the preset audio data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases.
The corresponding text information obtained after speech recognition is split according to words or phrases, yielding the words or phrases of N parts. The third searching unit 904 then finds, in the preset audio data resource, the audio information matching M of these parts; that is, of the words or phrases of the N parts obtained by splitting, M parts have matching audio information in the preset audio data resource. For example, if the corresponding text information obtained after speech recognition is "I see the aunt wearing a skirt drift", matching audio information can be found in the preset audio data resource for "drift".
An analyzing unit 905, configured to analyze the audio information of the M parts to obtain an audio processing rule.
The analyzing unit 905 analyzes the audio information of the M parts found, for example by analyzing the voice-line patterns of the audio, to obtain the audio processing rule of the audio information of the M parts.
A processing unit 906, configured to process the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information.
The processing unit 906 processes the N-M parts for which no corresponding audio information is found in the preset audio data resource according to the audio processing rule of the audio information of the M parts, yielding N-M pieces of audio information that follow the same audio processing rule as the M parts. For example, the words or phrases obtained by splitting "I see the aunt wearing a skirt" are processed according to the audio processing rule of the audio information found for "drift" in the preset audio data resource, so that the processed audio information for "I see the aunt wearing a skirt" and the audio information for "drift" follow the same audio processing rule.
A combining unit 907, configured to combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
The combining unit 907 then combines the M pieces of audio information and the N-M pieces of processed audio information in the order corresponding to the text information, forming the second audio information.
A playing unit 908, configured to play the second audio information;
and/or
A storage unit 909, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 6, this embodiment splits the corresponding text information obtained after speech recognition according to words or phrases, finds in the preset audio data resource the audio information matching some of the words or phrases, analyzes the matched audio information to obtain a processing rule, processes the unmatched words or phrases according to the corresponding processing rule to obtain corresponding audio information, finally splices the resulting audio information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 10 is a schematic structural diagram of a speech processing system disclosed in Embodiment 10 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 1001, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 1001. The voice collecting unit 1001 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 1002, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 1002 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 1003, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 1003 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
An extraction unit 1004, configured to extract, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the extraction unit 1004 extracts the audio information corresponding to the text information from the preset audio data resource.
An acquiring unit 1005, configured to obtain third audio information that has an association relationship with the extracted audio information corresponding to the text information.
Timbre recognition is performed on the extracted audio information corresponding to the text information to determine the speaker of that audio information, and audio information characteristic of that speaker is obtained from the audio data resource or the video data resource as the third audio information. That is, the obtained third audio information has the same speaker as the extracted audio information corresponding to the text information. Alternatively, the typical playback scene of the extracted audio information corresponding to the text information is analyzed, and the background sound effect of that playback scene is taken as the third audio information.
A constructing unit 1006, configured to construct the second audio information based on the third audio information and the audio information corresponding to the text information.
A playing unit 1007, configured to play the second audio information;
and/or
A storage unit 1008, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
Specifically, in the above embodiments, the preset audio data resource comprises an audio data resource and/or a video data resource containing an audio data resource. The audio information may be audio information extracted from the video data resource.
If the function described in the present embodiment method using the form of SFU software functional unit realize and as independently production marketing or use time, can be stored in a computing equipment read/write memory medium.Based on such understanding, the part of the part that the embodiment of the present invention contributes to prior art or this technical scheme can embody with the form of software product, this software product is stored in a storage medium, comprising some instructions in order to make a computing equipment (can be personal computer, server, mobile computing device or the network equipment etc.) perform all or part of step of method described in each embodiment of the present invention.And aforesaid storage medium comprises: USB flash disk, portable hard drive, ROM (read-only memory) (ROM, Read-OnlyMemory), random access memory (RAM, RandomAccessMemory), magnetic disc or CD etc. various can be program code stored medium.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar between embodiments, reference may be made to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A speech processing method, characterized in that the method comprises:
collecting initial speech information with a voice collector to form first audio information;
performing speech recognition on the first audio information and converting it into corresponding text information;
determining an attribute of the speech conversion;
extracting, from a preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion, to form second audio information;
saving or playing the second audio information.
2. The method according to claim 1, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information comprises:
dividing the text information into N parts according to words or phrases;
searching the preset speech data resource for an audio information segment corresponding to each of the N parts, to obtain N pieces of audio information;
splicing the N pieces of audio information to obtain the second audio information.
3. The method according to claim 1, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information comprises:
reading attributes of the preset speech data resource;
searching for the preset speech data resource whose attribute meets the attribute of the speech conversion;
extracting, from the preset speech data resource thus found, the audio information corresponding to the text information, to form the second audio information.
4. The method according to claim 1 or 2, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information comprises:
finding, in the preset speech data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases;
analyzing the audio information of the M parts to obtain an audio processing rule; performing audio processing on the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
combining the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
5. The method according to claim 1, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information further comprises:
obtaining third audio information having an association relationship with the extracted audio information corresponding to the text information; and constructing the second audio information based on the third audio information and the audio information corresponding to the text information.
6. The method according to claim 1, characterized in that the preset speech data resource comprises:
an audio data resource;
and/or
a video data resource that includes an audio data resource.
7. The method according to claim 6, characterized in that the preset speech data resource is composed by:
extracting audio information from the video data resource to obtain an audio data resource that includes the audio information.
8. A speech processing system, characterized by comprising:
a voice collecting unit, configured to collect initial speech information to form first audio information;
a speech recognition unit, configured to perform speech recognition on the first audio information and convert it into corresponding text information;
a determining unit, configured to determine an attribute of the speech conversion;
an extraction unit, configured to extract, from a preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion, to form second audio information;
a playing unit, configured to play the second audio information;
and/or
a storage unit, configured to save the second audio information.
9. The system according to claim 8, characterized in that the extraction unit comprises:
a division unit, configured to divide the text information into N parts according to words or phrases;
a first search unit, configured to search the preset speech data resource for an audio information segment corresponding to each of the N parts, to obtain N pieces of audio information;
a splicing unit, configured to splice the N pieces of audio information to obtain the second audio information.
10. The system according to claim 8, characterized in that the extraction unit comprises:
a reading unit, configured to read attributes of the preset speech data resource;
a second search unit, configured to search for the preset speech data resource whose attribute meets the attribute of the speech conversion;
a composing unit, configured to extract, from the preset speech data resource thus found, the audio information corresponding to the text information, to form the second audio information.
11. The system according to claim 8 or 9, characterized in that the extraction unit comprises:
a third search unit, configured to find, in the preset speech data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases;
an analysis unit, configured to analyze the audio information of the M parts to obtain an audio processing rule;
a processing unit, configured to perform audio processing on the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
a combining unit, configured to combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
12. The system according to claim 8, characterized by further comprising:
an acquiring unit, configured to obtain third audio information having an association relationship with the extracted audio information corresponding to the text information;
a construction unit, configured to construct the second audio information based on the third audio information and the audio information corresponding to the text information.
13. The system according to claim 8, characterized in that the preset speech data resource comprises: an audio data resource;
and/or
a video data resource that includes an audio data resource.
14. The system according to claim 13, characterized in that the preset speech data resource is composed by:
extracting audio information from the video data resource to obtain an audio data resource that includes the audio information.
CN201510622499.4A 2015-09-25 2015-09-25 Voice processing method and system Active CN105336329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510622499.4A CN105336329B (en) 2015-09-25 2015-09-25 Voice processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510622499.4A CN105336329B (en) 2015-09-25 2015-09-25 Voice processing method and system

Publications (2)

Publication Number Publication Date
CN105336329A true CN105336329A (en) 2016-02-17
CN105336329B CN105336329B (en) 2021-07-16

Family

ID=55286816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510622499.4A Active CN105336329B (en) 2015-09-25 2015-09-25 Voice processing method and system

Country Status (1)

Country Link
CN (1) CN105336329B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107086040A (en) * 2017-06-23 2017-08-22 歌尔股份有限公司 Speech recognition capabilities method of testing and device
CN107484016A (en) * 2017-09-05 2017-12-15 深圳Tcl新技术有限公司 Video dubs switching method, television set and computer-readable recording medium
WO2018023520A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control-based music local playback device
WO2018023519A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control method for music local playback
CN107786427A (en) * 2017-09-28 2018-03-09 努比亚技术有限公司 A kind of information interacting method, terminal and computer-readable recording medium
CN107818794A (en) * 2017-10-25 2018-03-20 北京奇虎科技有限公司 audio conversion method and device based on rhythm
CN108877767A (en) * 2018-06-12 2018-11-23 浙江吉利控股集团有限公司 A kind of intelligent voice prompt system and method
CN109215681A (en) * 2018-10-24 2019-01-15 北京疯景科技有限公司 Applied to the voice interactive method and system in intelligent doorbell
CN109257490A (en) * 2018-08-30 2019-01-22 Oppo广东移动通信有限公司 Audio-frequency processing method, device, wearable device and storage medium
CN111599341A (en) * 2020-05-13 2020-08-28 北京百度网讯科技有限公司 Method and apparatus for generating speech
CN113823281A (en) * 2020-11-24 2021-12-21 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6334104B1 (en) * 1998-09-04 2001-12-25 Nec Corporation Sound effects affixing system and sound effects affixing method
CN1737901A (en) * 2004-08-16 2006-02-22 华为技术有限公司 System for realizing voice service to syncretize background music and its method
CN101604521A (en) * 2008-06-12 2009-12-16 Lg电子株式会社 Portable terminal and the method that is used to discern its voice
CN101694772A (en) * 2009-10-21 2010-04-14 北京中星微电子有限公司 Method for converting text into rap music and device thereof
CN101896803A (en) * 2007-12-12 2010-11-24 诺基亚公司 Methods, apparatuses, and computer program products for semantic media conversion from source data to audio/video data
CN101923861A (en) * 2009-06-12 2010-12-22 傅可庭 Audio synthesizer capable of converting voices to songs
CN102402982A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Loud reading system with selectable background sounds and realization method of system
CN103686050A (en) * 2012-09-18 2014-03-26 联想(北京)有限公司 Method and electronic equipment for simulating call scenes,
CN104008750A (en) * 2013-02-26 2014-08-27 霍尼韦尔国际公司 System and method for correcting accent induced speech transmission problems
CN104123932A (en) * 2014-07-29 2014-10-29 科大讯飞股份有限公司 Voice conversion system and method
US20150149178A1 (en) * 2013-11-22 2015-05-28 At&T Intellectual Property I, L.P. System and method for data-driven intonation generation
CN104866275A (en) * 2015-03-25 2015-08-26 百度在线网络技术(北京)有限公司 Image information acquisition method and device

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6334104B1 (en) * 1998-09-04 2001-12-25 Nec Corporation Sound effects affixing system and sound effects affixing method
CN1737901A (en) * 2004-08-16 2006-02-22 华为技术有限公司 System for realizing voice service to syncretize background music and its method
CN101896803A (en) * 2007-12-12 2010-11-24 诺基亚公司 Methods, apparatuses, and computer program products for semantic media conversion from source data to audio/video data
CN101604521A (en) * 2008-06-12 2009-12-16 Lg电子株式会社 Portable terminal and the method that is used to discern its voice
CN101923861A (en) * 2009-06-12 2010-12-22 傅可庭 Audio synthesizer capable of converting voices to songs
CN101694772A (en) * 2009-10-21 2010-04-14 北京中星微电子有限公司 Method for converting text into rap music and device thereof
CN102402982A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Loud reading system with selectable background sounds and realization method of system
CN103686050A (en) * 2012-09-18 2014-03-26 联想(北京)有限公司 Method and electronic equipment for simulating call scenes,
CN104008750A (en) * 2013-02-26 2014-08-27 霍尼韦尔国际公司 System and method for correcting accent induced speech transmission problems
US20150149178A1 (en) * 2013-11-22 2015-05-28 At&T Intellectual Property I, L.P. System and method for data-driven intonation generation
CN104123932A (en) * 2014-07-29 2014-10-29 科大讯飞股份有限公司 Voice conversion system and method
CN104866275A (en) * 2015-03-25 2015-08-26 百度在线网络技术(北京)有限公司 Image information acquisition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
雷学平 (Lei Xueping): "Design and Implementation of an Image-Based Real-Scene Virtual Campus ***", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018023520A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control-based music local playback device
WO2018023519A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control method for music local playback
CN107086040A (en) * 2017-06-23 2017-08-22 歌尔股份有限公司 Speech recognition capabilities method of testing and device
CN107484016A (en) * 2017-09-05 2017-12-15 深圳Tcl新技术有限公司 Video dubs switching method, television set and computer-readable recording medium
CN107786427B (en) * 2017-09-28 2021-07-16 努比亚技术有限公司 Information interaction method, terminal and computer readable storage medium
CN107786427A (en) * 2017-09-28 2018-03-09 努比亚技术有限公司 A kind of information interacting method, terminal and computer-readable recording medium
CN107818794A (en) * 2017-10-25 2018-03-20 北京奇虎科技有限公司 audio conversion method and device based on rhythm
CN108877767A (en) * 2018-06-12 2018-11-23 浙江吉利控股集团有限公司 A kind of intelligent voice prompt system and method
CN109257490A (en) * 2018-08-30 2019-01-22 Oppo广东移动通信有限公司 Audio-frequency processing method, device, wearable device and storage medium
CN109257490B (en) * 2018-08-30 2021-07-09 Oppo广东移动通信有限公司 Audio processing method and device, wearable device and storage medium
CN109215681A (en) * 2018-10-24 2019-01-15 北京疯景科技有限公司 Applied to the voice interactive method and system in intelligent doorbell
CN111599341A (en) * 2020-05-13 2020-08-28 北京百度网讯科技有限公司 Method and apparatus for generating speech
CN111599341B (en) * 2020-05-13 2023-06-20 北京百度网讯科技有限公司 Method and device for generating voice
CN113823281A (en) * 2020-11-24 2021-12-21 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment
CN113823281B (en) * 2020-11-24 2024-04-05 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment

Also Published As

Publication number Publication date
CN105336329B (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN105336329A (en) Speech processing method and system
CN106486128B (en) Method and device for processing double-sound-source audio data
CN102222103B (en) Method and device for processing matching relationship of video content
CN112015949A (en) Video generation method and device, storage medium and electronic equipment
CN108520046B (en) Method and device for searching chat records
CN108259971A (en) Subtitle adding method, device, server and storage medium
CN104168433B (en) A kind of media content processing method and system
CN107577773A (en) Audio matching method and device and electronic equipment
CN107133256A (en) The content consumption interface of customization
CN105975568B (en) Audio processing method and device
CN108182945A (en) Voiceprint feature-based multi-person voice separation method and device
CN109005451B (en) Video strip splitting method based on deep learning
CN108615532B (en) Classification method and device applied to sound scene
CN112053692B (en) Speech recognition processing method, device and storage medium
CN105788610A (en) Audio processing method and device
CN111770359B (en) Event video clipping method, system and computer readable storage medium
CN105845158A (en) Information processing method and client
CN112231748A (en) Desensitization processing method and apparatus, storage medium, and electronic apparatus
CN106550268B (en) Video processing method and video processing device
CN113782026A (en) Information processing method, device, medium and equipment
CN110019923A (en) The lookup method and device of speech message
CN106653029A (en) Audio batch segmentation method and device
CN105550308A (en) Information processing method, retrieval method and electronic device
CN111429200B (en) Content association method and device, storage medium and computer equipment
CN112543371A (en) Video data processing method, device, terminal and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant