CN105336329A - Speech processing method and system - Google Patents

Speech processing method and system

Info

Publication number
CN105336329A
Authority
CN
China
Prior art keywords
audio
frequency information
data resource
speech
word message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510622499.4A
Other languages
Chinese (zh)
Other versions
CN105336329B (en)
Inventor
王少敏
雷闪耀
王洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201510622499.4A priority Critical patent/CN105336329B/en
Publication of CN105336329A publication Critical patent/CN105336329A/en
Application granted granted Critical
Publication of CN105336329B publication Critical patent/CN105336329B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a speech processing method, comprising: collecting initial speech information with a voice collector to form first audio information; performing speech recognition on the first audio information and converting it into corresponding text information; determining a speech conversion attribute; extracting, from a preset audio data resource, audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information; and saving or playing the second audio information. The method enables diversified processing of a user's speech and improves the user's experience with voice changer software. The invention also discloses a speech processing system.

Description

Speech processing method and system
Technical field
The present invention relates to the technical field of speech processing in electronic devices, and in particular to a speech processing method and system.
Background art
With the progress of electronic technology, the application software running on electronic devices has become increasingly rich. For example, various voice changer applications are currently popular: they convert collected sound into a specific voice, such as a "talking tom cat" effect, and play it. Audio is collected in real time by a voice collector, and the collected audio is then subjected to simple audio processing such as pitch shifting and speed changing, generating and playing voice in a fixed form. However, the conversion forms of existing voice changer software are rather limited; the user's voice cannot be converted in diversified ways, so the user's experience with voice changer software is poor.
Summary of the invention
The present invention provides a speech processing method that enables diversified processing of a user's voice and improves the user's experience with voice changer software.
The present invention provides a speech processing method, the method comprising:
collecting initial speech information with a voice collector to form first audio information;
performing speech recognition on the first audio information and converting it into corresponding text information;
determining a speech conversion attribute;
extracting, from a preset audio data resource, audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information;
saving or playing the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, comprises:
dividing the text information into N parts according to words or phrases;
searching the preset audio data resource for an audio segment corresponding to each of the N parts, to obtain N pieces of audio information;
splicing the N pieces of audio information to obtain the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, comprises:
reading attributes of the preset audio data resources;
finding the preset audio data resource whose attribute matches the speech conversion attribute;
extracting, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, comprises:
finding, in the preset audio data resource, audio information that matches M of the N parts into which the text information is divided according to words or phrases;
analyzing the audio information of the M parts to obtain an audio processing rule; processing the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
combining the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
Preferably, extracting, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form the second audio information, further comprises:
obtaining third audio information that has an association relationship with the extracted audio information corresponding to the text information; constructing the second audio information based on the third audio information and the audio information corresponding to the text information.
Preferably, the preset audio data resource comprises:
an audio data resource;
and/or
a video data resource containing an audio data resource.
Preferably, building the preset audio data resource comprises:
extracting audio information from the video data resource to obtain an audio data resource containing the audio information.
The present invention further provides a speech processing system, comprising:
a voice collecting unit, configured to collect initial speech information and form first audio information;
a speech recognition unit, configured to perform speech recognition on the first audio information and convert it into corresponding text information;
a determining unit, configured to determine a speech conversion attribute;
an extraction unit, configured to extract, from a preset audio data resource, audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information;
a playing unit, configured to play the second audio information;
and/or
a storage unit, configured to save the second audio information.
Preferably, the extraction unit comprises:
a dividing unit, configured to divide the text information into N parts according to words or phrases;
a first searching unit, configured to search the preset audio data resource for the audio segment corresponding to each of the N parts, to obtain N pieces of audio information;
a splicing unit, configured to splice the N pieces of audio information to obtain the second audio information.
Preferably, the extraction unit comprises:
a reading unit, configured to read the attributes of the preset audio data resources;
a second searching unit, configured to find the preset audio data resource whose attribute matches the speech conversion attribute;
a composing unit, configured to extract, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
Preferably, the extraction unit comprises:
a third searching unit, configured to find, in the preset audio data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases;
an analyzing unit, configured to analyze the audio information of the M parts to obtain an audio processing rule;
a processing unit, configured to process the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
a combining unit, configured to combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
Preferably, the system further comprises:
an acquiring unit, configured to obtain third audio information that has an association relationship with the extracted audio information corresponding to the text information;
a constructing unit, configured to construct the second audio information based on the third audio information and the audio information corresponding to the text information.
Preferably, the preset audio data resource comprises: an audio data resource;
and/or
a video data resource containing an audio data resource.
Preferably, building the preset audio data resource comprises:
extracting audio information from the video data resource to obtain an audio data resource containing the audio information.
It can be seen from the above solutions that, in the speech processing method provided by the present invention, initial speech information is collected, the collected initial speech information is converted into corresponding text information through speech recognition, corresponding audio information is extracted from a preset audio data resource according to the text information and the determined speech conversion attribute, and the extracted audio information is saved or played. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Brief description of the drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a speech processing method disclosed in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a speech processing method disclosed in Embodiment 2 of the present invention;
Fig. 3 is a flowchart of a speech processing method disclosed in Embodiment 3 of the present invention;
Fig. 4 is a flowchart of a speech processing method disclosed in Embodiment 4 of the present invention;
Fig. 5 is a flowchart of a speech processing method disclosed in Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of a speech processing system disclosed in Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of a speech processing system disclosed in Embodiment 7 of the present invention;
Fig. 8 is a schematic structural diagram of a speech processing system disclosed in Embodiment 8 of the present invention;
Fig. 9 is a schematic structural diagram of a speech processing system disclosed in Embodiment 9 of the present invention;
Fig. 10 is a schematic structural diagram of a speech processing system disclosed in Embodiment 10 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a speech processing method disclosed in Embodiment 1 of the present invention. In this embodiment, the method may comprise the following steps.
Step 101: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 102: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 103: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 104: extract, from a preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the audio information corresponding to the text information is extracted from the preset audio data resource to form the second audio information.
Step 105: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, in the above embodiment, initial speech information is collected, the collected initial speech information is converted into corresponding text information through speech recognition, corresponding audio information is extracted from a preset audio data resource according to the text information and the determined speech conversion attribute, and the extracted audio information is saved or played. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
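By way of illustration, a minimal Python sketch of how the five steps of this embodiment could be wired together is given below. The collector, recognizer, resource and sink objects and their method names are hypothetical placeholders assumed for the sketch, not components defined by this disclosure.

```python
# Minimal sketch of the Embodiment 1 flow; all objects and method names are assumed.

def process_speech(collector, recognizer, resource, conversion_attribute, sink):
    # Step 101: collect initial speech information to form the first audio information
    first_audio = collector.record()

    # Step 102: speech recognition converts the first audio information into text information
    text_information = recognizer.transcribe(first_audio)

    # Step 103: the speech conversion attribute names the target voice type the user wants,
    # e.g. the name of a particular performer
    attribute = conversion_attribute

    # Step 104: extract matching audio from the preset audio data resource
    second_audio = resource.extract(text_information, attribute)

    # Step 105: save or play the second audio information
    sink.save_or_play(second_audio)
    return second_audio
```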
Fig. 2 is a flowchart of a speech processing method disclosed in Embodiment 2 of the present invention. In this embodiment, the method may comprise the following steps.
Step 201: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 202: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 203: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 204: divide the text information into N parts according to words or phrases.
The corresponding text information obtained after speech recognition is split according to words or phrases, yielding the words or phrases of N parts.
Step 205: search the preset audio data resource for the audio segment corresponding to each of the N parts, to obtain N pieces of audio information.
Then, in the preset audio data resource, the audio segment for the word or phrase of each of the N parts is looked up; that is, the audio segment corresponding to each word or phrase is found.
Step 206: splice the N pieces of audio information to obtain the second audio information.
The audio segments found for the words or phrases of the N parts are then spliced in the order of the text information, forming the second audio information corresponding to the text information.
Step 207: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 1, this embodiment splits the text information obtained by speech recognition into words or phrases, finds in the preset audio data resource the audio segments corresponding to the words or phrases obtained after splitting, splices these audio segments to obtain the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
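A minimal sketch of the split, look-up and splice flow of this embodiment is given below, assuming the preset audio data resource is a simple mapping from word or phrase to a recorded segment file and using pydub for splicing; splitting on whitespace is a simplification made for illustration.

```python
# Hypothetical sketch of Embodiment 2; the resource layout and file paths are assumptions.
from pydub import AudioSegment

preset_resource = {                     # word/phrase -> recorded segment (assumed layout)
    "hello": "voice/hello.wav",
    "world": "voice/world.wav",
}

def build_second_audio(text_information: str) -> AudioSegment:
    parts = text_information.split()                   # Step 204: divide into N parts
    segments = []
    for part in parts:                                  # Step 205: look up each part
        segments.append(AudioSegment.from_file(preset_resource[part]))
    second_audio = sum(segments[1:], segments[0])       # Step 206: splice in text order
    return second_audio

# build_second_audio("hello world").export("second.wav", format="wav")  # Step 207
```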
Fig. 3 is a flowchart of a speech processing method disclosed in Embodiment 3 of the present invention. In this embodiment, the method may comprise the following steps.
Step 301: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 302: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 303: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 304: read the attributes of the preset audio data resources.
The attributes of the preset audio data resources are read, where the attributes are assigned in advance. For example, when a preset audio data resource is built, a corresponding label is assigned to it, the label containing a brief description of the resource; or a filename is set for the resource, the filename indicating the kind of the resource, the name of the performer, and so on. The attributes of the preset audio data resources can thus be obtained by reading the preset labels or filenames, where a preset label or filename may be generated automatically from the audio data resource or added manually by the user.
Step 305: find the preset audio data resource whose attribute matches the speech conversion attribute.
After the attributes of the preset audio data resources are read, the preset audio data resource whose attribute matches the speech conversion attribute determined from the user's demand for diversified processing is found. For example, if the user wants the speech to be converted into the voice of Xiao Shenyang, the audio data resource whose attribute is Xiao Shenyang is found among the preset audio data resources.
Step 306: extract, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
The audio information corresponding to the text information is extracted from the audio data resource found.
Step 307: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 1, this embodiment reads the attributes of the preset audio data resources, finds among them the resource whose attribute matches the speech conversion attribute, extracts from the found resource the audio information corresponding to the text information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
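The attribute matching of this embodiment can be pictured with the small sketch below, in which each preset resource carries a label that briefly describes it; the metadata layout (a label plus a storage path) and the example labels are assumptions made for illustration.

```python
# Hypothetical sketch of Embodiment 3; labels and paths are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PresetResource:
    label: str   # brief description, e.g. the name of the performer
    path: str    # location of the audio (or video) data

preset_resources = [
    PresetResource(label="Xiao Shenyang", path="resources/xiao_shenyang/"),
    PresetResource(label="cartoon cat", path="resources/cartoon_cat/"),
]

def find_resource(conversion_attribute: str) -> PresetResource:
    # Step 305: match the speech conversion attribute against the preset labels
    for resource in preset_resources:
        if resource.label == conversion_attribute:
            return resource
    raise LookupError(f"no preset resource matches {conversion_attribute!r}")

# resource = find_resource("Xiao Shenyang")
# Step 306 would then extract the audio corresponding to the text information from resource.path
```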
Fig. 4 is a flowchart of a speech processing method disclosed in Embodiment 4 of the present invention. In this embodiment, the method may comprise the following steps.
Step 401: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 402: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 403: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 404: find, in the preset audio data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases.
The corresponding text information obtained after speech recognition is split according to words or phrases, yielding the words or phrases of N parts. The audio information matching M of these parts is then found in the preset audio data resource; that is, of the words or phrases of the N parts obtained by splitting, M parts have matching audio information in the preset audio data resource. For example, if the corresponding text information obtained after speech recognition is "I see the aunt wearing a skirt drift", matching audio information can be found in the preset audio data resource for "drift".
Step 405: analyze the audio information of the M parts to obtain an audio processing rule.
The audio information of the M parts found is analyzed, for example by analyzing the voice-line patterns of the audio, to obtain the audio processing rule of the audio information of the M parts.
Step 406: process the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information.
The N-M parts for which no corresponding audio information is found in the preset audio data resource are processed according to the audio processing rule of the audio information of the M parts, yielding N-M pieces of audio information that follow the same audio processing rule as the M parts. For example, the words or phrases obtained by splitting "I see the aunt wearing a skirt" are processed according to the audio processing rule of the audio information found for "drift" in the preset audio data resource, so that the processed audio information for "I see the aunt wearing a skirt" and the audio information for "drift" follow the same audio processing rule.
Step 407: combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
The M pieces of audio information and the N-M pieces of processed audio information are then combined in the order corresponding to the text information, forming the second audio information.
Step 408: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 1, this embodiment splits the corresponding text information obtained after speech recognition according to words or phrases, finds in the preset audio data resource the audio information matching some of the words or phrases, analyzes the matched audio information to obtain a processing rule, processes the unmatched words or phrases according to the corresponding processing rule to obtain corresponding audio information, finally splices the resulting audio information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
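As a rough illustration of deriving a processing rule from the matched audio and applying it to the unmatched parts, the sketch below treats the rule as nothing more than a loudness level and a playback-rate factor; this simplification is an assumption for illustration, not the analysis method of the disclosure.

```python
# Hypothetical sketch of Embodiment 4 using NumPy arrays of audio samples.
import numpy as np

def derive_rule(matched: np.ndarray) -> dict:
    # Step 405: analyze the matched audio to obtain an (assumed, simplified) processing rule
    return {"rms": float(np.sqrt(np.mean(matched ** 2))), "rate": 1.25}

def apply_rule(unmatched: np.ndarray, rule: dict) -> np.ndarray:
    # Step 406: bring an unmatched part to the same loudness and playback rate
    rms = float(np.sqrt(np.mean(unmatched ** 2))) or 1.0
    scaled = unmatched * (rule["rms"] / rms)
    positions = np.arange(0, len(scaled), rule["rate"])        # naive resampling
    return np.interp(positions, np.arange(len(scaled)), scaled)

def combine(matched_parts: dict, processed_parts: dict, order: list) -> np.ndarray:
    # Step 407: combine all pieces in the order of the text information
    pieces = {**matched_parts, **processed_parts}
    return np.concatenate([pieces[key] for key in order])
```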
Fig. 5 is a flowchart of a speech processing method disclosed in Embodiment 5 of the present invention. In this embodiment, the method may comprise the following steps.
Step 501: collect initial speech information with a voice collector to form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collector. The voice collector may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
Step 502: perform speech recognition on the first audio information and convert it into corresponding text information.
Speech recognition is performed on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
Step 503: determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the speech conversion attribute is determined according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
Step 504: extract, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the audio information corresponding to the text information is extracted from the preset audio data resource.
Step 505: obtain third audio information that has an association relationship with the extracted audio information corresponding to the text information.
Timbre recognition is performed on the extracted audio information corresponding to the text information to determine the speaker of that audio information, and audio information characteristic of that speaker is obtained from the audio data resource or the video data resource as the third audio information. That is, the obtained third audio information has the same speaker as the extracted audio information corresponding to the text information. Alternatively, the typical playback scene of the extracted audio information corresponding to the text information is analyzed, and the background sound effect of that playback scene is taken as the third audio information.
Step 506: construct the second audio information based on the third audio information and the audio information corresponding to the text information.
Step 507: save or play the second audio information.
The extracted second audio information can be saved or played by the user.
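One possible reading of steps 505 and 506, sketched below, is to pick a background sound effect associated with the playback scene of the extracted audio and mix it underneath; the scene-tag metadata, the effect files and the use of pydub's overlay are assumptions made for illustration.

```python
# Hypothetical sketch of Embodiment 5; scene tags and effect files are assumed.
from pydub import AudioSegment

background_effects = {            # scene tag -> background sound effect (assumed layout)
    "stage": "effects/applause.wav",
    "street": "effects/traffic.wav",
}

def build_second_audio_with_background(extracted: AudioSegment, scene_tag: str) -> AudioSegment:
    # Step 505: obtain the third audio information associated with the extracted audio
    third_audio = AudioSegment.from_file(background_effects[scene_tag])
    # Step 506: construct the second audio from the extracted audio and the third audio,
    # attenuating the background by 12 dB so the speech stays in front
    return extracted.overlay(third_audio - 12)
```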
Specifically, in the above embodiments, the preset audio data resource comprises an audio data resource and/or a video data resource containing an audio data resource. The audio information may be audio information extracted from the video data resource.
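Where the preset resource is a video, its audio track has to be pulled out first; a small sketch is given below, assuming the ffmpeg command-line tool is available and choosing an arbitrary codec and sample rate.

```python
# Hypothetical sketch: extract the audio track of a video data resource with ffmpeg.
import subprocess

def extract_audio_from_video(video_path: str, audio_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vn",                    # drop the video stream
         "-acodec", "pcm_s16le",   # plain 16-bit PCM (assumed choice)
         "-ar", "16000",           # 16 kHz sample rate (assumed choice)
         audio_path],
        check=True,
    )

# extract_audio_from_video("resources/clip.mp4", "resources/clip.wav")
```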
Fig. 6 is a schematic structural diagram of a speech processing system disclosed in Embodiment 6 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 601, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 601. The voice collecting unit 601 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 602, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 602 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 603, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 603 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
An extraction unit 604, configured to extract, from a preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute, to form second audio information.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the extraction unit 604 extracts the audio information corresponding to the text information from the preset audio data resource to form the second audio information.
A playing unit 605, configured to play the second audio information;
and/or
A storage unit 606, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, in the above embodiment, initial speech information is collected, the collected initial speech information is converted into corresponding text information through speech recognition, corresponding audio information is extracted from a preset audio data resource according to the text information and the determined speech conversion attribute, and the extracted audio information is saved or played. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 7 is a schematic structural diagram of a speech processing system disclosed in Embodiment 7 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 701, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 701. The voice collecting unit 701 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 702, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 702 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 703, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 703 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
A dividing unit 704, configured to divide the text information into N parts according to words or phrases.
The dividing unit 704 splits the corresponding text information obtained after speech recognition according to words or phrases, yielding the words or phrases of N parts.
A first searching unit 705, configured to search the preset audio data resource for the audio segment corresponding to each of the N parts, to obtain N pieces of audio information.
The first searching unit 705 then looks up, in the preset audio data resource, the audio segment for the word or phrase of each of the N parts; that is, the audio segment corresponding to each word or phrase is found.
A splicing unit 706, configured to splice the N pieces of audio information to obtain the second audio information.
The splicing unit 706 then splices the audio segments found for the words or phrases of the N parts in the order of the text information, forming the second audio information corresponding to the text information.
A playing unit 707, configured to play the second audio information;
and/or
A storage unit 708, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 6, this embodiment splits the text information obtained by speech recognition into words or phrases, finds in the preset audio data resource the audio segments corresponding to the words or phrases obtained after splitting, splices these audio segments to obtain the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 8 is a schematic structural diagram of a speech processing system disclosed in Embodiment 8 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 801, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 801. The voice collecting unit 801 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 802, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 802 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 803, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 803 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
A reading unit 804, configured to read the attributes of the preset audio data resources.
The reading unit 804 reads the attributes of the preset audio data resources, where the attributes are assigned in advance. For example, when a preset audio data resource is built, a corresponding label is assigned to it, the label containing a brief description of the resource; or a filename is set for the resource, the filename indicating the kind of the resource, the name of the performer, and so on. The attributes of the preset audio data resources can thus be obtained by reading the preset labels or filenames, where a preset label or filename may be generated automatically from the audio data resource or added manually by the user.
A second searching unit 805, configured to find the preset audio data resource whose attribute matches the speech conversion attribute.
After the attributes of the preset audio data resources are read, the second searching unit 805 finds the preset audio data resource whose attribute matches the speech conversion attribute determined from the user's demand for diversified processing. For example, if the user wants the speech to be converted into the voice of Xiao Shenyang, the audio data resource whose attribute is Xiao Shenyang is found among the preset audio data resources.
A composing unit 806, configured to extract, from the preset audio data resource found, the audio information corresponding to the text information, to form the second audio information.
The composing unit 806 extracts the audio information corresponding to the text information from the audio data resource found, forming the second audio information.
A playing unit 807, configured to play the second audio information;
and/or
A storage unit 808, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 6, this embodiment reads the attributes of the preset audio data resources, finds among them the resource whose attribute matches the speech conversion attribute, extracts from the found resource the audio information corresponding to the text information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 9 is a schematic structural diagram of a speech processing system disclosed in Embodiment 9 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 901, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 901. The voice collecting unit 901 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 902, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 902 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 903, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 903 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
A third searching unit 904, configured to find, in the preset audio data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases.
The corresponding text information obtained after speech recognition is split according to words or phrases, yielding the words or phrases of N parts. The third searching unit 904 then finds, in the preset audio data resource, the audio information matching M of these parts; that is, of the words or phrases of the N parts obtained by splitting, M parts have matching audio information in the preset audio data resource. For example, if the corresponding text information obtained after speech recognition is "I see the aunt wearing a skirt drift", matching audio information can be found in the preset audio data resource for "drift".
An analyzing unit 905, configured to analyze the audio information of the M parts to obtain an audio processing rule.
The analyzing unit 905 analyzes the audio information of the M parts found, for example by analyzing the voice-line patterns of the audio, to obtain the audio processing rule of the audio information of the M parts.
A processing unit 906, configured to process the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information.
The processing unit 906 processes the N-M parts for which no corresponding audio information is found in the preset audio data resource according to the audio processing rule of the audio information of the M parts, yielding N-M pieces of audio information that follow the same audio processing rule as the M parts. For example, the words or phrases obtained by splitting "I see the aunt wearing a skirt" are processed according to the audio processing rule of the audio information found for "drift" in the preset audio data resource, so that the processed audio information for "I see the aunt wearing a skirt" and the audio information for "drift" follow the same audio processing rule.
A combining unit 907, configured to combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
The combining unit 907 then combines the M pieces of audio information and the N-M pieces of processed audio information in the order corresponding to the text information, forming the second audio information.
A playing unit 908, configured to play the second audio information;
and/or
A storage unit 909, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
In summary, on the basis of Embodiment 6, this embodiment splits the corresponding text information obtained after speech recognition according to words or phrases, finds in the preset audio data resource the audio information matching some of the words or phrases, analyzes the matched audio information to obtain a processing rule, processes the unmatched words or phrases according to the corresponding processing rule to obtain corresponding audio information, finally splices the resulting audio information to form the second audio information, and then saves or plays the second audio information. The initial speech information can thus be converted into different audio information according to the different preferences of the user, which achieves diversified conversion of the initial speech and improves the user's experience.
Fig. 10 is a schematic structural diagram of a speech processing system disclosed in Embodiment 10 of the present invention. In this embodiment, the system may comprise:
A voice collecting unit 1001, configured to collect initial speech information and form first audio information.
When speech needs to be processed in a diversified manner, initial speech information is first collected by the voice collecting unit 1001. The voice collecting unit 1001 may be any device capable of capturing voice information, such as a sound recorder. The initial speech refers to the speech that the user wants to process in a diversified manner; it may be a sound made by the user or another sound in the environment. The collected initial speech information forms the first audio information.
A speech recognition unit 1002, configured to perform speech recognition on the first audio information and convert it into corresponding text information.
The speech recognition unit 1002 performs speech recognition on the formed first audio information by means of speech recognition technology, converting the first audio information into corresponding text information.
A determining unit 1003, configured to determine the speech conversion attribute.
After the obtained first audio information is converted into corresponding text information, the determining unit 1003 determines the speech conversion attribute according to the user's demand for diversified speech processing; that is, it is determined into what type of voice the user needs the speech to be converted.
An extraction unit 1004, configured to extract, from the preset audio data resource, the audio information corresponding to the text information according to the text information and the speech conversion attribute.
According to the text information obtained by speech recognition of the first audio information and the speech conversion attribute determined from the user's demand for diversified processing, the extraction unit 1004 extracts the audio information corresponding to the text information from the preset audio data resource.
An acquiring unit 1005, configured to obtain third audio information that has an association relationship with the extracted audio information corresponding to the text information.
Timbre recognition is performed on the extracted audio information corresponding to the text information to determine the speaker of that audio information, and audio information characteristic of that speaker is obtained from the audio data resource or the video data resource as the third audio information. That is, the obtained third audio information has the same speaker as the extracted audio information corresponding to the text information. Alternatively, the typical playback scene of the extracted audio information corresponding to the text information is analyzed, and the background sound effect of that playback scene is taken as the third audio information.
A constructing unit 1006, configured to construct the second audio information based on the third audio information and the audio information corresponding to the text information.
A playing unit 1007, configured to play the second audio information;
and/or
A storage unit 1008, configured to save the second audio information.
The extracted second audio information can be saved or played by the user.
Specifically, in the above embodiments, the preset audio data resource comprises an audio data resource and/or a video data resource containing an audio data resource. The audio information may be audio information extracted from the video data resource.
If the function described in the present embodiment method using the form of SFU software functional unit realize and as independently production marketing or use time, can be stored in a computing equipment read/write memory medium.Based on such understanding, the part of the part that the embodiment of the present invention contributes to prior art or this technical scheme can embody with the form of software product, this software product is stored in a storage medium, comprising some instructions in order to make a computing equipment (can be personal computer, server, mobile computing device or the network equipment etc.) perform all or part of step of method described in each embodiment of the present invention.And aforesaid storage medium comprises: USB flash disk, portable hard drive, ROM (read-only memory) (ROM, Read-OnlyMemory), random access memory (RAM, RandomAccessMemory), magnetic disc or CD etc. various can be program code stored medium.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar between embodiments, reference may be made to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A speech processing method, characterized in that the method comprises:
collecting initial speech information with a voice collector to form first audio information;
performing speech recognition on the first audio information and converting it into corresponding text information;
determining an attribute of the speech conversion;
extracting, from a preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion, to form second audio information;
saving or playing the second audio information.
2. The method according to claim 1, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information comprises:
dividing the text information into N parts according to words or phrases;
searching the preset speech data resource for an audio information segment corresponding to each of the N parts, to obtain N pieces of audio information;
splicing the N pieces of audio information to obtain the second audio information.
3. The method according to claim 1, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information comprises:
reading attributes of the preset speech data resource;
searching for the preset speech data resource whose attribute meets the attribute of the speech conversion;
extracting, from the preset speech data resource thus found, the audio information corresponding to the text information, to form the second audio information.
4. The method according to claim 1 or 2, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information comprises:
finding, in the preset speech data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases;
analyzing the audio information of the M parts to obtain an audio processing rule; performing audio processing on the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
combining the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
5. The method according to claim 1, characterized in that extracting, from the preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion to form the second audio information further comprises:
obtaining third audio information having an association relationship with the extracted audio information corresponding to the text information; and constructing the second audio information based on the third audio information and the audio information corresponding to the text information.
6. The method according to claim 1, characterized in that the preset speech data resource comprises:
an audio data resource;
and/or
a video data resource that includes an audio data resource.
7. The method according to claim 6, characterized in that the preset speech data resource is composed by:
extracting audio information from the video data resource to obtain an audio data resource that includes the audio information.
8. A speech processing system, characterized by comprising:
a voice collecting unit, configured to collect initial speech information to form first audio information;
a speech recognition unit, configured to perform speech recognition on the first audio information and convert it into corresponding text information;
a determining unit, configured to determine an attribute of the speech conversion;
an extraction unit, configured to extract, from a preset speech data resource, the audio information corresponding to the text information according to the text information and the attribute of the speech conversion, to form second audio information;
a playing unit, configured to play the second audio information;
and/or
a storage unit, configured to save the second audio information.
9. The system according to claim 8, characterized in that the extraction unit comprises:
a division unit, configured to divide the text information into N parts according to words or phrases;
a first search unit, configured to search the preset speech data resource for an audio information segment corresponding to each of the N parts, to obtain N pieces of audio information;
a splicing unit, configured to splice the N pieces of audio information to obtain the second audio information.
10. The system according to claim 8, characterized in that the extraction unit comprises:
a reading unit, configured to read attributes of the preset speech data resource;
a second search unit, configured to search for the preset speech data resource whose attribute meets the attribute of the speech conversion;
a composing unit, configured to extract, from the preset speech data resource thus found, the audio information corresponding to the text information, to form the second audio information.
11. The system according to claim 8 or 9, characterized in that the extraction unit comprises:
a third search unit, configured to find, in the preset speech data resource, the audio information that matches M of the N parts into which the text information is divided according to words or phrases;
an analysis unit, configured to analyze the audio information of the M parts to obtain an audio processing rule;
a processing unit, configured to perform audio processing on the remaining N-M parts according to the audio processing rule, to obtain N-M pieces of processed audio information;
a combining unit, configured to combine the M pieces of audio information with the N-M pieces of processed audio information to form the second audio information.
12. The system according to claim 8, characterized by further comprising:
an acquiring unit, configured to obtain third audio information having an association relationship with the extracted audio information corresponding to the text information;
a construction unit, configured to construct the second audio information based on the third audio information and the audio information corresponding to the text information.
13. The system according to claim 8, characterized in that the preset speech data resource comprises: an audio data resource;
and/or
a video data resource that includes an audio data resource.
14. The system according to claim 13, characterized in that the preset speech data resource is composed by:
extracting audio information from the video data resource to obtain an audio data resource that includes the audio information.
CN201510622499.4A 2015-09-25 2015-09-25 Voice processing method and system Active CN105336329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510622499.4A CN105336329B (en) 2015-09-25 2015-09-25 Voice processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510622499.4A CN105336329B (en) 2015-09-25 2015-09-25 Voice processing method and system

Publications (2)

Publication Number Publication Date
CN105336329A true CN105336329A (en) 2016-02-17
CN105336329B CN105336329B (en) 2021-07-16

Family

ID=55286816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510622499.4A Active CN105336329B (en) 2015-09-25 2015-09-25 Voice processing method and system

Country Status (1)

Country Link
CN (1) CN105336329B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107086040A (en) * 2017-06-23 2017-08-22 歌尔股份有限公司 Speech recognition capabilities method of testing and device
CN107484016A (en) * 2017-09-05 2017-12-15 深圳Tcl新技术有限公司 Video dubs switching method, television set and computer-readable recording medium
WO2018023520A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control-based music local playback device
WO2018023519A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control method for music local playback
CN107786427A (en) * 2017-09-28 2018-03-09 努比亚技术有限公司 A kind of information interacting method, terminal and computer-readable recording medium
CN107818794A (en) * 2017-10-25 2018-03-20 北京奇虎科技有限公司 audio conversion method and device based on rhythm
CN108877767A (en) * 2018-06-12 2018-11-23 浙江吉利控股集团有限公司 A kind of intelligent voice prompt system and method
CN109215681A (en) * 2018-10-24 2019-01-15 北京疯景科技有限公司 Applied to the voice interactive method and system in intelligent doorbell
CN109257490A (en) * 2018-08-30 2019-01-22 Oppo广东移动通信有限公司 Audio-frequency processing method, device, wearable device and storage medium
CN111599341A (en) * 2020-05-13 2020-08-28 北京百度网讯科技有限公司 Method and apparatus for generating speech
CN113823281A (en) * 2020-11-24 2021-12-21 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6334104B1 (en) * 1998-09-04 2001-12-25 Nec Corporation Sound effects affixing system and sound effects affixing method
CN1737901A (en) * 2004-08-16 2006-02-22 华为技术有限公司 System for realizing voice service to syncretize background music and its method
CN101604521A (en) * 2008-06-12 2009-12-16 Lg电子株式会社 Portable terminal and the method that is used to discern its voice
CN101694772A (en) * 2009-10-21 2010-04-14 北京中星微电子有限公司 Method for converting text into rap music and device thereof
CN101896803A (en) * 2007-12-12 2010-11-24 诺基亚公司 Methods, apparatuses, and computer program products for semantic media conversion from source data to audio/video data
CN101923861A (en) * 2009-06-12 2010-12-22 傅可庭 Audio synthesizer capable of converting voices to songs
CN102402982A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Loud reading system with selectable background sounds and realization method of system
CN103686050A (en) * 2012-09-18 2014-03-26 联想(北京)有限公司 Method and electronic equipment for simulating call scenes,
CN104008750A (en) * 2013-02-26 2014-08-27 霍尼韦尔国际公司 System and method for correcting accent induced speech transmission problems
CN104123932A (en) * 2014-07-29 2014-10-29 科大讯飞股份有限公司 Voice conversion system and method
US20150149178A1 (en) * 2013-11-22 2015-05-28 At&T Intellectual Property I, L.P. System and method for data-driven intonation generation
CN104866275A (en) * 2015-03-25 2015-08-26 百度在线网络技术(北京)有限公司 Image information acquisition method and device

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6334104B1 (en) * 1998-09-04 2001-12-25 Nec Corporation Sound effects affixing system and sound effects affixing method
CN1737901A (en) * 2004-08-16 2006-02-22 华为技术有限公司 System for realizing voice service to syncretize background music and its method
CN101896803A (en) * 2007-12-12 2010-11-24 诺基亚公司 Methods, apparatuses, and computer program products for semantic media conversion from source data to audio/video data
CN101604521A (en) * 2008-06-12 2009-12-16 Lg电子株式会社 Portable terminal and the method that is used to discern its voice
CN101923861A (en) * 2009-06-12 2010-12-22 傅可庭 Audio synthesizer capable of converting voices to songs
CN101694772A (en) * 2009-10-21 2010-04-14 北京中星微电子有限公司 Method for converting text into rap music and device thereof
CN102402982A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Loud reading system with selectable background sounds and realization method of system
CN103686050A (en) * 2012-09-18 2014-03-26 联想(北京)有限公司 Method and electronic equipment for simulating call scenes,
CN104008750A (en) * 2013-02-26 2014-08-27 霍尼韦尔国际公司 System and method for correcting accent induced speech transmission problems
US20150149178A1 (en) * 2013-11-22 2015-05-28 At&T Intellectual Property I, L.P. System and method for data-driven intonation generation
CN104123932A (en) * 2014-07-29 2014-10-29 科大讯飞股份有限公司 Voice conversion system and method
CN104866275A (en) * 2015-03-25 2015-08-26 百度在线网络技术(北京)有限公司 Image information acquisition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
雷学平 (Lei Xueping): "Design and Implementation of an Image-Based Real-Scene Virtual Campus ***", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018023520A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control-based music local playback device
WO2018023519A1 (en) * 2016-08-04 2018-02-08 易晓阳 Voice control method for music local playback
CN107086040A (en) * 2017-06-23 2017-08-22 歌尔股份有限公司 Speech recognition capabilities method of testing and device
CN107484016A (en) * 2017-09-05 2017-12-15 深圳Tcl新技术有限公司 Video dubs switching method, television set and computer-readable recording medium
CN107786427B (en) * 2017-09-28 2021-07-16 努比亚技术有限公司 Information interaction method, terminal and computer readable storage medium
CN107786427A (en) * 2017-09-28 2018-03-09 努比亚技术有限公司 A kind of information interacting method, terminal and computer-readable recording medium
CN107818794A (en) * 2017-10-25 2018-03-20 北京奇虎科技有限公司 audio conversion method and device based on rhythm
CN108877767A (en) * 2018-06-12 2018-11-23 浙江吉利控股集团有限公司 A kind of intelligent voice prompt system and method
CN109257490A (en) * 2018-08-30 2019-01-22 Oppo广东移动通信有限公司 Audio-frequency processing method, device, wearable device and storage medium
CN109257490B (en) * 2018-08-30 2021-07-09 Oppo广东移动通信有限公司 Audio processing method and device, wearable device and storage medium
CN109215681A (en) * 2018-10-24 2019-01-15 北京疯景科技有限公司 Applied to the voice interactive method and system in intelligent doorbell
CN111599341A (en) * 2020-05-13 2020-08-28 北京百度网讯科技有限公司 Method and apparatus for generating speech
CN111599341B (en) * 2020-05-13 2023-06-20 北京百度网讯科技有限公司 Method and device for generating voice
CN113823281A (en) * 2020-11-24 2021-12-21 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment
CN113823281B (en) * 2020-11-24 2024-04-05 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment

Also Published As

Publication number Publication date
CN105336329B (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN105336329A (en) Speech processing method and system
CN106486128B (en) Method and device for processing double-sound-source audio data
CN102222103B (en) Method and device for processing matching relationship of video content
CN112015949A (en) Video generation method and device, storage medium and electronic equipment
CN108520046B (en) Method and device for searching chat records
CN108259971A (en) Subtitle adding method, device, server and storage medium
CN104168433B (en) A kind of media content processing method and system
CN107577773A (en) Audio matching method and device and electronic equipment
CN107133256A (en) The content consumption interface of customization
CN105975568B (en) Audio processing method and device
CN108182945A (en) Voiceprint feature-based multi-person voice separation method and device
CN109005451B (en) Video strip splitting method based on deep learning
CN108615532B (en) Classification method and device applied to sound scene
CN112053692B (en) Speech recognition processing method, device and storage medium
CN105788610A (en) Audio processing method and device
CN111770359B (en) Event video clipping method, system and computer readable storage medium
CN105845158A (en) Information processing method and client
CN112231748A (en) Desensitization processing method and apparatus, storage medium, and electronic apparatus
CN106550268B (en) Video processing method and video processing device
CN113782026A (en) Information processing method, device, medium and equipment
CN110019923A (en) The lookup method and device of speech message
CN106653029A (en) Audio batch segmentation method and device
CN105550308A (en) Information processing method, retrieval method and electronic device
CN111429200B (en) Content association method and device, storage medium and computer equipment
CN112543371A (en) Video data processing method, device, terminal and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant