CN106935240A - Voice translation method, device, terminal device and cloud server based on artificial intelligence - Google Patents
- Publication number
- CN106935240A (application number CN201710183965.2A)
- Authority
- CN
- China
- Prior art keywords
- languages
- voice
- target language
- terminal device
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
This application proposes an artificial-intelligence-based voice translation method, device, terminal device, and cloud server. The voice translation method includes: receiving the source-language voice that a user inputs through a terminal device; sending the source-language voice to a cloud server; receiving the target-language audio file sent back by the cloud server; and playing the target-language audio file. The application enables real-time voice translation with high translation accuracy, meeting the translation needs of overseas travel scenarios.
Description
Technical field
The present application relates to the field of speech processing technology, and in particular to an artificial-intelligence-based voice translation method, device, terminal device, and cloud server.
Background technology
In the current overseas travel market, translation software runs mainly on mobile phones. Although such software can handle language exchange in some scenarios, its translation accuracy is low. Moreover, while traveling abroad, users rely heavily on map and camera applications (Application; hereinafter: APP), so invoking a translation APP requires switching applications, which makes real-time use unsatisfactory. Meanwhile, the growing population of middle-aged and elderly travelers faces a high learning cost for phone software and has a strong demand for a "press-and-translate" dedicated translator.
However, existing translation hardware products are essentially modified electronic dictionaries, offering mostly text lookup; products supporting real-time voice translation are rare and their accuracy is low. In addition, most existing translation hardware addresses language-learning needs, providing limited support and low translation accuracy for overseas travel scenarios.
Summary of the invention
The application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, the first object of the application is to propose an artificial-intelligence-based voice translation method that can realize real-time voice translation, meet the translation needs of overseas travel scenarios, and achieve high translation accuracy.
The second object of the application is to propose an artificial-intelligence-based speech translation device.
The third object of the application is to propose a terminal device.
The fourth object of the application is to propose a cloud server.
The fifth object of the application is to propose a storage medium containing computer-executable instructions.
To achieve these objects, the voice translation method based on artificial intelligence of the first-aspect embodiment of the application includes: receiving the source-language voice that a user inputs through a terminal device; sending the source-language voice to a cloud server; receiving the target-language audio file sent by the cloud server, where the audio file is obtained after the cloud server performs speech recognition on the source-language voice, determines that the voice is to be translated into at least one target language other than the source language among at least two target languages, translates the recognized text into the text of the determined target language, and performs speech synthesis on the translated target-language text; and playing the target-language audio file.
In the voice translation method of this embodiment, after the source-language voice input by the user through the terminal device is received, it is sent to the cloud server; the target-language audio file sent by the cloud server is then received and played. Real-time voice translation is thus realized, meeting the translation needs of overseas travel scenarios with high translation accuracy.
To achieve these objects, the voice translation method based on artificial intelligence of the second-aspect embodiment of the application includes: receiving the source-language voice sent by a terminal device; performing speech recognition on the source-language voice to convert it into source-language text; determining that the source-language voice is to be translated into at least one target language other than the source language among at least two target languages; translating the source-language text into the text of the determined target language and performing speech synthesis on the translated text to obtain a target-language audio file; and sending the target-language audio file to the terminal device for playback.
In the voice translation method of this embodiment, after receiving the source-language voice sent by the terminal device, the cloud server recognizes it into source-language text, determines the target language(s) other than the source language, translates the text into the determined target language, synthesizes the translated text into a target-language audio file, and finally sends the audio file to the terminal device for playback. Real-time voice translation is thus realized, meeting the translation needs of overseas travel scenarios with high translation accuracy.
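The server-side pipeline summarized above (speech recognition, then translation into each non-source target language, then speech synthesis) can be sketched as follows. This is an illustrative sketch only: the function names and signatures are assumptions, and the recognizer, translator, and synthesizer are toy stand-ins rather than real ASR/MT/TTS services.

```python
def server_translate(source_audio, source_lang, target_langs,
                     recognize, translate, synthesize):
    """Cloud-server pipeline: ASR -> MT -> TTS (stand-in callables)."""
    # Speech recognition: source-language audio -> source-language text.
    source_text = recognize(source_audio, source_lang)
    results = {}
    # Translate into every configured language except the source itself.
    for lang in target_langs:
        if lang == source_lang:
            continue
        target_text = translate(source_text, source_lang, lang)
        # Speech synthesis (TTS): target-language text -> audio bytes.
        results[lang] = synthesize(target_text, lang)
    return results

# Toy stand-ins so the pipeline can be exercised end to end.
def fake_recognize(audio, lang):
    return audio.decode()              # pretend the audio bytes are the transcript

def fake_translate(text, src, dst):
    return f"[{src}->{dst}] {text}"    # tag the text instead of translating

def fake_synthesize(text, lang):
    return text.encode()               # pretend the text bytes are PCM audio

out = server_translate(b"nearest subway station", "zh", ["zh", "en"],
                       fake_recognize, fake_translate, fake_synthesize)
```

Injecting the three services as parameters mirrors the modular split (recognition, determination/translation, synthesis) that the device embodiments below formalize as separate modules.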
To achieve these objects, the speech translation device based on artificial intelligence of the third-aspect embodiment of the application is arranged on a terminal device and includes: a receiving module for receiving the source-language voice input by a user through the terminal device; a sending module for sending the source-language voice to a cloud server; the receiving module further being used to receive the target-language audio file sent by the cloud server, where the audio file is obtained after the cloud server performs speech recognition on the source-language voice, determines that the voice is to be translated into at least one target language other than the source language among at least two target languages, translates the recognized text into the text of the determined target language, and performs speech synthesis on the translated text; and a playback module for playing the target-language audio file.
In the speech translation device of this embodiment, after the receiving module receives the source-language voice input by the user through the terminal device, the sending module sends it to the cloud server; the receiving module then receives the target-language audio file sent by the cloud server, and the playback module plays it. Real-time voice translation is thus realized, meeting the translation needs of overseas travel scenarios with high translation accuracy.
To achieve these objects, the speech translation device based on artificial intelligence of the fourth-aspect embodiment of the application is arranged on a cloud server and includes: a receiving module for receiving the source-language voice sent by a terminal device; a speech recognition module for performing speech recognition on the source-language voice to convert it into source-language text; a determining module for determining that the source-language voice is to be translated into at least one target language other than the source language among at least two target languages; a translation module for translating the source-language text into the text of the target language determined by the determining module; a speech synthesis module for performing speech synthesis on the translated target-language text to obtain a target-language audio file; and a sending module for sending the target-language audio file obtained by the speech synthesis module to the terminal device for playback.
In the speech translation device of this embodiment, after the receiving module receives the source-language voice sent by the terminal device, the speech recognition module converts it into source-language text; after the determining module determines the target language(s) other than the source language, the translation module translates the text into the determined target language, the speech synthesis module synthesizes the translated text into a target-language audio file, and the sending module sends the audio file to the terminal device for playback. Real-time voice translation is thus realized, meeting the translation needs of overseas travel scenarios with high translation accuracy.
To achieve these objects, the terminal device of the fifth-aspect embodiment of the application includes: one or more processors; a memory for storing one or more programs; a receiver for receiving the source-language voice input by a user through the terminal device and, after the voice is sent to a cloud server by a transmitter, for receiving the target-language audio file sent by the cloud server, where the audio file is obtained after the cloud server performs speech recognition on the source-language voice, determines that the voice is to be translated into at least one target language other than the source language among at least two target languages, translates the recognized text into the text of the determined target language, and performs speech synthesis on the translated text; and the transmitter, for sending the source-language voice to the cloud server. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.
To achieve these objects, the sixth-aspect embodiment of the application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method described above.
To achieve these objects, the cloud server of the seventh-aspect embodiment of the application includes: one or more processors; a memory for storing one or more programs; a receiver for receiving the source-language voice sent by a terminal device; and a transmitter for sending the target-language audio file to the terminal device for playback. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.
To achieve these objects, the eighth-aspect embodiment of the application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method described above.
Additional aspects and advantages of the application will be set forth in part in the following description, will partly become apparent from it, or will be learned through practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the application will become apparent and readily understood from the following description of embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of one embodiment of the artificial-intelligence-based voice translation method of the application;
Fig. 2 is a flow chart of another embodiment of the voice translation method;
Fig. 3 is a schematic diagram of one embodiment of the terminal device in the voice translation method;
Fig. 4 is a flow chart of a further embodiment of the voice translation method;
Fig. 5 is a flow chart of a further embodiment of the voice translation method;
Fig. 6 is a flow chart of a further embodiment of the voice translation method;
Fig. 7 is a flow chart of a further embodiment of the voice translation method;
Fig. 8 is a flow chart of a further embodiment of the voice translation method;
Fig. 9 is a flow chart of a further embodiment of the voice translation method;
Fig. 10 is a structural schematic diagram of one embodiment of the artificial-intelligence-based speech translation device of the application;
Fig. 11 is a structural schematic diagram of another embodiment of the speech translation device;
Fig. 12 is a structural schematic diagram of a further embodiment of the speech translation device;
Fig. 13 is a structural schematic diagram of a further embodiment of the speech translation device;
Fig. 14 is a structural schematic diagram of one embodiment of the terminal device of the application;
Fig. 15 is a structural schematic diagram of one embodiment of the cloud server of the application.
Detailed description of the embodiments
Embodiments of the application are described in detail below, with examples shown in the accompanying drawings, in which the same or similar reference numbers throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, intended only to explain the application, and are not to be construed as limiting it. On the contrary, the embodiments of the application include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
Artificial intelligence (Artificial Intelligence; hereinafter: AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. As a branch of computer science, it attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence; research in the field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
Fig. 1 is a flow chart of one embodiment of the artificial-intelligence-based voice translation method of the application. As shown in Fig. 1, the voice translation method may include:
Step 101: receive the source-language voice input by the user through the terminal device.
Step 102: send the source-language voice to the cloud server.
Specifically, the terminal device may upload the source-language voice to the cloud server in pulse code modulation (Pulse Code Modulation; hereinafter: PCM) format.
Step 103: receive the target-language audio file sent by the cloud server, where the audio file is obtained after the cloud server performs speech recognition on the source-language voice, determines that the voice is to be translated into at least one target language other than the source language among at least two target languages, translates the recognized text into the text of the determined target language, and performs speech synthesis on the translated text.
Specifically, the target-language audio file sent by the cloud server to the terminal device is also a PCM-format file, and when performing speech synthesis on the translated target-language text, the cloud server may use a text-to-speech (Text To Speech; hereinafter: TTS) service.
Step 104: play the target-language audio file.
In the above voice translation method, after the source-language voice input by the user through the terminal device is received, it is sent to the cloud server; the target-language audio file sent by the cloud server is then received and played. Real-time voice translation is thus realized, meeting the translation needs of overseas travel scenarios with high translation accuracy.
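The terminal-side flow of steps 101 through 104 can be sketched as follows. The transport and playback objects are illustrative stand-ins for the PCM upload channel and the device's audio output, not an actual device API.

```python
class TranslationTerminal:
    """Terminal-side flow of Fig. 1: upload the source-language voice as
    PCM, receive the synthesized target-language audio, and play it."""

    def __init__(self, transport, play):
        self.transport = transport   # stand-in for the link to the cloud server
        self.play = play             # stand-in for the audio playback routine

    def translate_utterance(self, pcm_voice: bytes) -> bytes:
        self.transport.send(pcm_voice)           # step 102: send to the cloud
        target_audio = self.transport.receive()  # step 103: receive PCM result
        self.play(target_audio)                  # step 104: play it back
        return target_audio


class FakeTransport:
    """Stand-in cloud link that 'translates' by upper-casing the bytes."""
    def send(self, pcm):
        self._pending = pcm.upper()

    def receive(self):
        return self._pending


played = []
terminal = TranslationTerminal(FakeTransport(), played.append)
result = terminal.translate_utterance(b"ni hao")
```

Note that the terminal itself performs no recognition, translation, or synthesis; per the description, all three stages run on the cloud server, and both directions of the exchange use PCM audio.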
Fig. 2 is a flow chart of another embodiment of the artificial-intelligence-based voice translation method of the application. As shown in Fig. 2, step 101 in the embodiment of Fig. 1 may be:
Step 201: after the user triggers the translation button of the terminal device, receive the source-language voice input by the user through the microphone of the terminal device.
In this embodiment, the translation button of the terminal device may be a mechanical key or a virtual key provided on the terminal device; this embodiment does not limit the form of the translation button, but is illustrated with the translation button being a mechanical key on the terminal device.
The translation button may be triggered by a long press, a single click, a double click, or the like; this embodiment does not limit the triggering mode, but is illustrated with triggering by a long press.
It should be noted that in this embodiment the terminal device has a single translation button, as shown in Fig. 3, which is a schematic diagram of one embodiment of the terminal device in the voice translation method of the application.
That is, in the hardware design of the terminal device of this embodiment, speech recognition plus translation can be triggered with one mechanical key. In use, the user simply holds down the translation button and speaks the voice to be translated into the microphone, for example saying in Chinese "I want to go to the nearest subway station", then releases the translation button; the terminal device then plays the spoken result "I want to go to the nearest subway station", thereby realizing real-time "one-key" voice translation.
Fig. 4 is a flow chart of a further embodiment of the artificial-intelligence-based voice translation method of the application. As shown in Fig. 4, in the embodiment of Fig. 1, before step 103 the method may further include:
Step 401: obtain the target languages set by the user and upload them to the cloud server, so that the cloud server stores the identifier of the terminal device in correspondence with the target languages, where the target languages include at least two languages and the at least two languages include the source language.
Specifically, after the user sets the target languages on the terminal device, the terminal device obtains them and uploads them to the cloud server, which stores the identifier of the terminal device in correspondence with the target languages. The identifier of the terminal device may be any information that uniquely identifies it, for example its device number; this embodiment does not limit the form of the identifier.
The target languages may include at least two languages, among them the source language; that is, in this embodiment the terminal device can realize mutual voice translation among the at least two target languages set by the user. For example, suppose the user sets the target languages to "Chinese and English". If the user holds down the single translation button of the terminal device, says in Chinese "I want to go to the nearest subway station", and releases the button, then by the voice translation method provided by the application the terminal device plays the spoken English result "I want to go to the nearest subway station". Conversely, if the user holds down the button, says in English "I want to go to the nearest subway station", and releases the button, the terminal device plays the corresponding spoken Chinese result.
Similarly, the target languages may be "Chinese, English and Japanese", in which case the terminal device realizes mutual voice translation among Chinese, English, and Japanese: if the user inputs a Chinese sentence, the terminal device plays its Japanese translation and English translation in turn; if the user inputs an English sentence, the terminal device plays its Chinese translation and Japanese translation in turn; and so on.
As can be seen from the above, in this embodiment, whether Chinese is translated into English or English into Chinese, the source-language voice input is triggered by the same translation button, which improves the ease of use of the terminal device and is convenient for the user.
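The language-selection behavior described above — translate into every configured language except the detected source, in the configured order — can be sketched as a small helper. The function name and signature are illustrative assumptions.

```python
def playback_order(configured_langs, detected_source):
    """Given the user-configured language set (e.g. Chinese/English/Japanese)
    and the language detected for the input voice, return the languages to
    translate into and play, in configured order, excluding the source."""
    if detected_source not in configured_langs:
        raise ValueError("source language is not in the configured set")
    return [lang for lang in configured_langs if lang != detected_source]
```

For the "Chinese, English and Japanese" example, a Chinese input yields English and Japanese playback (in some order), an English input yields Chinese and Japanese, and so on; the source language is always excluded from its own output.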
Fig. 5 is the flow chart of the voice translation method further embodiment that the application is based on artificial intelligence, in the present embodiment,
Above-mentioned user includes first user and second user, and above-mentioned target language includes the first languages and the second languages;As shown in figure 5,
The above-mentioned voice translation method based on artificial intelligence can include:
Step 501: receive the source-language voice input by the first user through the terminal device.
Step 502: send the source-language voice to the cloud server.
Step 503: receive the audio file of the second language sent by the cloud server. The audio file of the second language is obtained by the cloud server performing speech recognition and voiceprint recognition on the source-language voice, determining that the source-language voice is first-language voice input by the first user through the terminal device, determining that the first-language voice is to be translated into the second language, translating the text obtained by speech recognition into second-language text, and performing speech synthesis on the translated second-language text.
Step 504: play the audio file of the second language.
Step 505: receive the voice of another source language input by the second user through the terminal device.
Step 506: send the voice of the other source language to the cloud server.
Step 507: receive the audio file of the first language sent by the cloud server. The audio file of the first language is obtained by the cloud server performing speech recognition and voiceprint recognition on the voice of the other source language, determining that it is second-language voice input by the second user through the terminal device, determining that the second-language voice is to be translated into the first language, translating the text obtained by speech recognition into first-language text, and performing speech synthesis on the translated first-language text.
Step 508: play the audio file of the first language.
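The terminal-side round trip of steps 501-508 can be sketched as follows. This is an illustrative toy, not the disclosed implementation: `cloud_translate`, the dict-based voice representation, and the language tags are assumptions standing in for the cloud server's recognition, translation and synthesis services.

```python
# Sketch of the terminal-side flow of steps 501-508 (illustrative only).
# cloud_translate stands in for the cloud server described in the text:
# it recognizes which configured language was spoken, translates into the
# other target language, and returns a synthesized audio file.

def cloud_translate(voice, target_languages):
    """Toy stand-in: reads the language tag carried with the voice and
    returns audio in the other configured target language."""
    source = voice["lang"]
    other = [lang for lang in target_languages if lang != source][0]
    return {"lang": other, "audio": f"tts({voice['text']} -> {other})"}

def round_trip(voice, target_languages=("zh", "en")):
    # Steps 501-502 / 505-506: receive the voice and send it to the cloud.
    # Steps 503-504 / 507-508: receive and play the translated audio file.
    return cloud_translate(voice, target_languages)

# First user speaks Chinese (steps 501-504)...
reply1 = round_trip({"lang": "zh", "text": "nearest subway station?"})
# ...second user answers in English (steps 505-508).
reply2 = round_trip({"lang": "en", "text": "go straight, then left"})
print(reply1["lang"], reply2["lang"])  # en zh
```

Note how each round is symmetric: the same entry point serves both users, and the translation direction is decided entirely on the server side.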
As described above, this embodiment can realize multiple rounds of mutual voice translation. Still taking Chinese and English as the target languages as an example, in scenarios such as passing through entry/exit customs, ordering and paying in a restaurant, bargaining while shopping, and/or checking into or out of a hotel, the first user can long-press the translation button of the terminal device and input a section of Chinese speech to the terminal device through its microphone. According to the artificial-intelligence-based voice translation method provided by the present application, the terminal device then obtains the English-translation voice corresponding to the Chinese speech and plays it back. After listening to this section of English-translation voice, the second user can likewise long-press the translation button of the terminal device and input a section of English speech through the microphone; according to the artificial-intelligence-based voice translation method provided by the present application, the terminal device obtains the Chinese-translation voice corresponding to the English speech and plays it back. In this way, the first user and the second user can communicate smoothly through the terminal device, which fully meets the translation needs of scenarios such as overseas travel.
Further, in the artificial-intelligence-based voice translation method provided by the present application, the terminal device can also supply its wireless communication signal to another terminal device, so that the other terminal device can connect to the internet. Specifically, the wireless communication signal of the terminal device can be a Wireless Fidelity (hereinafter: WiFi) signal. That is, in this embodiment the terminal device also has a WiFi function: a user can search for and connect to the WiFi provided by the terminal device over the wireless network, meeting the internet-access needs of at least one electronic device such as a mobile phone and/or a computer, at a lower cost and with a more stable signal than overseas roaming on a mobile phone's cellular network.
The terminal device in the above embodiment combines a real-time voice translation function with a portable-WiFi function: the user can both enjoy the network freely and, when needed, invoke real-time voice translation of 26 languages with a single key. In scenarios such as business communication, multilingual study, entry/exit tourism and/or scenic-spot guidance, it can efficiently meet the user's internet-access and translation needs and improve the user experience.
Fig. 6 is a flow chart of a further embodiment of the artificial-intelligence-based voice translation method of the present application. As shown in Fig. 6, the artificial-intelligence-based voice translation method may include:
Step 601: receive the source-language voice sent by the terminal device.
Specifically, the source-language voice is in PCM format.
Step 602: perform speech recognition on the source-language voice and convert it into source-language text.
Step 603: determine that the source-language voice is to be translated into at least one target language other than the source language among the at least two target languages.
Step 604: translate the source-language text into text of the determined target language, and perform speech synthesis on the translated target-language text to obtain the audio file of the target language.
Specifically, speech synthesis can be performed on the translated target-language text through a TTS service to obtain the audio file of the target language.
Step 605: send the audio file of the target language to the terminal device for the terminal device to play.
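The cloud-side pipeline of steps 601-605 can be sketched as below. All names (`asr`, `mt`, `tts`) are illustrative toy stand-ins under the assumption that each stage behaves as the text describes; this is not the disclosed implementation.

```python
# Sketch of the cloud-side pipeline of steps 601-605 (illustrative only).
# asr, mt and tts are toy stand-ins for the speech-recognition,
# machine-translation and TTS services named in the text.

def asr(pcm_voice):
    # Step 602: convert the PCM voice into source-language text.
    return pcm_voice["text"], pcm_voice["lang"]

def mt(text, source, target):
    # Step 604 (first half): translate the source text into the target language.
    return f"[{source}->{target}] {text}"

def tts(text, lang):
    # Step 604 (second half): synthesize the translated text into audio.
    return {"lang": lang, "audio": f"pcm({text})"}

def handle_voice(pcm_voice, target_languages):
    text, source = asr(pcm_voice)                           # step 602
    targets = [t for t in target_languages if t != source]  # step 603
    # Steps 604-605: one audio file per target language, returned to the terminal.
    return [tts(mt(text, source, t), t) for t in targets]

files = handle_voice({"lang": "zh", "text": "hello"}, ("zh", "en"))
```

With two configured target languages this yields exactly one audio file, in the language the speaker did not use.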
In the above artificial-intelligence-based voice translation method, after the source-language voice sent by the terminal device is received, speech recognition is performed on it to convert it into source-language text; after it is determined that the source-language voice is to be translated into at least one target language other than the source language among the at least two target languages, the source-language text is translated into text of the determined target language and speech synthesis is performed on the translated text to obtain the target-language audio file, which is finally sent to the terminal device for playback. Real-time voice translation can thus be realized, meeting the translation needs of overseas-travel scenarios with high translation accuracy.
Fig. 7 is a flow chart of a further embodiment of the artificial-intelligence-based voice translation method of the present application. In this embodiment, the target languages may include a first language and a second language. As shown in Fig. 7, in the embodiment shown in Fig. 6 of the present application, step 603 may include:
Step 701: perform voiceprint recognition on the source-language voice, and determine that it is first-language voice input by the first user through the terminal device.
Step 702: according to the pre-saved target languages corresponding to the identifier of the terminal device, determine that the first-language voice is to be translated into an audio file of the second language.
That is, in this embodiment, after the cloud server first determines through voiceprint recognition that the source-language voice is first-language voice input by the first user, it finds, according to the identifier of the terminal device, that the target languages corresponding to that identifier include the first language and the second language; since the source language is the first language, the cloud server can determine that the first-language voice needs to be translated into an audio file of the second language.
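The direction-selection logic of steps 701-702 can be sketched as follows. The enrolled-voiceprint table, the matcher, and the device-identifier mapping are all assumptions for illustration; a real voiceprint recognizer would compare acoustic features, not look up a speaker id.

```python
# Sketch of steps 701-702 (illustrative): voiceprint recognition decides
# which user spoke, and the device's saved target-language pair then fixes
# the translation direction.

DEVICE_TARGETS = {"device-42": ("zh", "en")}  # saved per device identifier

def voiceprint_language(voice, enrolled):
    # Toy matcher: look the speaker up in an enrolled-voiceprint table.
    return enrolled[voice["speaker"]]

def pick_direction(voice, device_id, enrolled):
    first, second = DEVICE_TARGETS[device_id]
    spoken = voiceprint_language(voice, enrolled)  # step 701
    # Step 702: if the first language was spoken, translate into the second,
    # and vice versa.
    return (first, second) if spoken == first else (second, first)

enrolled = {"user-1": "zh", "user-2": "en"}
direction = pick_direction({"speaker": "user-1"}, "device-42", enrolled)
```

The same lookup, applied to the second user's voice, yields the reverse direction, which is what makes the multi-round mutual translation of Fig. 8 work.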
Fig. 8 is a flow chart of a further embodiment of the artificial-intelligence-based voice translation method of the present application. As shown in Fig. 8, the method may include:
Step 801: receive the source-language voice sent by the terminal device.
Step 802: perform speech recognition on the source-language voice and convert it into source-language text.
Step 803: perform voiceprint recognition on the source-language voice, and determine that it is first-language voice input by the first user through the terminal device.
Step 804: according to the pre-saved target languages corresponding to the identifier of the terminal device, determine that the first-language voice is to be translated into an audio file of the second language.
Step 805: translate the source-language text into second-language text, and perform speech synthesis on the translated second-language text to obtain the audio file of the second language.
Step 806: send the audio file of the second language to the terminal device for the terminal device to play.
Step 807: receive the voice of another source language sent by the terminal device.
Step 808: perform voiceprint recognition on the voice of the other source language, and determine that it is second-language voice input by the second user through the terminal device.
Step 809: perform speech recognition on the second-language voice and convert it into second-language text.
Step 810: according to the pre-saved target languages corresponding to the identifier of the terminal device, determine that the second-language voice is to be translated into an audio file of the first language.
Step 811: translate the second-language text into first-language text, and perform speech synthesis on the first-language text to obtain the audio file of the first language.
Step 812: send the audio file of the first language to the terminal device for the terminal device to play.
As described above, this embodiment can realize multiple rounds of mutual voice translation. Still taking Chinese and English as the target languages as an example, in scenarios such as passing through entry/exit customs, ordering and paying in a restaurant, bargaining while shopping, and/or checking into or out of a hotel, the first user can long-press the translation button of the terminal device and input a section of Chinese speech through its microphone. The terminal device then sends this Chinese speech to the cloud server, which, according to the artificial-intelligence-based voice translation method provided by the present application, translates it into an English audio file and sends the translated English audio file to the terminal device, where it is played back. After listening to this section of English voice, the second user can likewise long-press the translation button and input a section of English speech through the microphone; the terminal device sends it to the cloud server, which translates the English voice into a Chinese audio file and sends the translated Chinese audio file to the terminal device, where it is played back. In this way, the first user and the second user can communicate smoothly through the terminal device, which fully meets the translation needs of scenarios such as overseas travel.
Fig. 9 is a flow chart of a further embodiment of the artificial-intelligence-based voice translation method of the present application. As shown in Fig. 9, in the embodiment shown in Fig. 6 of the present application, before step 604 the method may also include:
Step 901: receive the target languages uploaded by the terminal device, and save the identifier of the terminal device in correspondence with the target languages. The target languages include at least two languages, and the at least two languages include the source language.
In this case, step 604 can be:
Step 902: according to the identifier of the terminal device, call the corpus of the target languages corresponding to that identifier, translate the source-language text into text of the determined target language, and perform speech synthesis on the translated target-language text to obtain the audio file of the target language.
In this embodiment, after the terminal device obtains the target languages set by the user, it can upload them to the cloud server, and the cloud server saves the identifier of the terminal device in correspondence with the target languages. The identifier of the terminal device is information that can uniquely identify it, such as its device number; this embodiment does not limit the form of the identifier of the terminal device.
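The correspondence kept in step 901 can be sketched as a small store keyed by the device identifier. The class and method names are illustrative assumptions, not part of the disclosure.

```python
# Sketch of step 901 (illustrative): the cloud server keeps a mapping from
# each terminal device's unique identifier (e.g. its device number) to the
# target languages the user configured on that device.

class TargetLanguageStore:
    def __init__(self):
        self._by_device = {}

    def save(self, device_id, target_languages):
        # Called when the terminal uploads the user's target-language setting.
        self._by_device[device_id] = tuple(target_languages)

    def lookup(self, device_id):
        # Called in step 902 before choosing the translation corpus.
        return self._by_device[device_id]

store = TargetLanguageStore()
store.save("SN-0001", ["zh", "en"])
```

At translation time the server only needs the device identifier carried with the request to recover the configured language pair.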
The target languages can include at least two languages, and the at least two languages include the source language; that is, this embodiment can realize mutual voice translation among the at least two target languages set by the user. As an example, assume the target languages set by the user are "Chinese and English". If, while holding down the single translation button of the terminal device, the user says to it in Chinese "I want to go to the nearest subway station" and then releases the button, then through the artificial-intelligence-based voice translation method provided by the present application the terminal device will play the voice result "I want to go to the nearest subway station" in English. Conversely, if, while holding down the single translation button, the user says in English "I want to go to the nearest subway station" and then releases the button, through the artificial-intelligence-based voice translation method provided by the present application the terminal device will play the corresponding Chinese voice result.
Similarly, the target languages can also be "Chinese, English and Japanese", in which case this embodiment realizes mutual voice translation among Chinese, English and Japanese. If the user inputs a Chinese sentence, then through the artificial-intelligence-based voice translation method provided by the present application the terminal device will play the Japanese translation and the English translation of that sentence in turn; if the user inputs an English sentence, the terminal device will play its Chinese translation and Japanese translation in turn, and so on, which will not be repeated here.
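The three-language case generalizes the two-language one: the source text is translated into every configured target language except the one that was spoken, and the results are played in order. The sketch below is illustrative; `translate` is a toy stand-in for the cloud translation service.

```python
# Sketch of the multi-language fan-out (illustrative): translate the spoken
# sentence into each configured target language other than the source, in
# the configured order, ready to be played one after another.

def translate(text, source, target):
    # Toy stand-in for the cloud machine-translation service.
    return f"{target}:{text}"

def fan_out(text, source, target_languages=("zh", "en", "ja")):
    # One translation per non-source language, preserving configured order.
    return [translate(text, source, t)
            for t in target_languages if t != source]

outputs = fan_out("hello", source="en")  # played as Chinese, then Japanese
```

With only two configured languages the same function degenerates to the single-translation behavior of the earlier embodiments.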
From the above description it can be seen that in this embodiment, whether Chinese needs to be translated into English or English into Chinese, the input of the source-language voice is triggered by the same translation button. In the present application this improves the ease of use of the terminal device and is convenient for users.
Figure 10 is a structural schematic diagram of an embodiment of the artificial-intelligence-based speech translation apparatus of the present application. The artificial-intelligence-based speech translation apparatus in this embodiment can be provided on a terminal device to realize the methods provided by the embodiments shown in Figs. 1-5 of the present application. The terminal device can be a translation device with an integrated WiFi function; this embodiment does not limit the form of the terminal device.
As shown in Figure 10, the artificial-intelligence-based speech translation apparatus can include: a receiver module 1001, a sending module 1002 and a playing module 1003.
The receiver module 1001 is used to receive the source-language voice input by the user through the terminal device.
The sending module 1002 is used to send the source-language voice to the cloud server; specifically, the sending module 1002 can upload the source-language voice to the cloud server in PCM format.
The receiver module 1001 is also used to receive the audio file of the target language sent by the cloud server. The audio file of the target language is obtained by the cloud server performing speech recognition on the source-language voice, determining that it is to be translated into at least one target language other than the source language among the at least two target languages, translating the text obtained by speech recognition into text of the determined target language, and performing speech synthesis on the translated target-language text. Specifically, the target-language audio file sent by the cloud server to the terminal device is also a PCM-format file, and the cloud server performs the speech synthesis on the translated target-language text using a TTS service.
The playing module 1003 is used to play the audio file of the target language.
In the above artificial-intelligence-based speech translation apparatus, after the receiver module 1001 receives the source-language voice input by the user through the terminal device, the sending module 1002 sends it to the cloud server; the receiver module 1001 then receives the audio file of the target language sent by the cloud server, and finally the playing module 1003 plays it. Real-time voice translation can thus be realized, meeting the translation needs of overseas-travel scenarios with high translation accuracy.
Figure 11 is a structural schematic diagram of another embodiment of the artificial-intelligence-based speech translation apparatus of the present application. In this embodiment, the receiver module 1001 is specifically used to receive, after the user triggers the translation button of the terminal device, the source-language voice input through the microphone of the terminal device. In this embodiment, the translation button of the terminal device can be a mechanical key provided on the terminal device, or a virtual key provided on the terminal device; this embodiment does not limit the form of the translation button, but takes a mechanical key provided on the terminal device as an example for illustration.
The translation button can be triggered by a long press, a single click, a double click, and so on; this embodiment does not limit the manner of triggering the translation button, but takes a long press as an example for illustration.
It should be noted that in this embodiment the terminal device has a single translation button, as shown in Fig. 3. That is, in its hardware design, the terminal device of this embodiment can trigger speech recognition plus translation with one mechanical key. In use, the user only needs to hold down the translation button, speak the voice to be translated into the microphone, for example "I want to go to the nearest subway station" in Chinese, and then release the button; the terminal device will play the voice result "I want to go to the nearest subway station" in English, thereby realizing real-time "one key" translation of voice.
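The press-speak-release interaction above can be sketched as a small state machine. The event names, the frame-based recorder, and the submit callback are all assumptions for illustration, not the disclosed hardware design.

```python
# Sketch of the "one key" interaction (illustrative): recording starts when
# the single translation button is pressed, microphone frames are buffered
# while it is held, and the captured voice is submitted for recognition and
# translation when it is released.

class OneKeyTranslator:
    def __init__(self, submit):
        self._submit = submit  # callback that sends the voice to the cloud
        self._buffer = None    # None means "not recording"

    def on_press(self):
        self._buffer = []      # start capturing microphone frames

    def on_frame(self, frame):
        if self._buffer is not None:
            self._buffer.append(frame)

    def on_release(self):
        voice, self._buffer = self._buffer, None
        return self._submit(voice)  # one key: recognition + translation

captured = []
t = OneKeyTranslator(submit=lambda v: captured.append(v) or len(v))
t.on_press()
t.on_frame("frame-1")
t.on_frame("frame-2")
n = t.on_release()
```

Frames arriving while the button is up are simply discarded, which matches the described push-to-talk behavior.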
Further, the artificial-intelligence-based speech translation apparatus can also include: an obtaining module 1004.
The obtaining module 1004 is used to obtain the target languages set by the user before the receiver module 1001 receives the audio file of the target language sent by the cloud server.
The sending module 1002 is also used to upload the target languages set by the user to the cloud server, so that the cloud server saves the identifier of the terminal device in correspondence with the target languages; the target languages include at least two languages, and the at least two languages include the source language.
Specifically, after the user sets the target languages on the terminal device, the obtaining module 1004 obtains them, and the sending module 1002 then uploads them to the cloud server, which saves the identifier of the terminal device in correspondence with the target languages. The identifier of the terminal device is information that can uniquely identify it, such as its device number; this embodiment does not limit the form of the identifier of the terminal device.
The target languages can include at least two languages, and the at least two languages include the source language; that is, this embodiment can realize mutual voice translation among the at least two target languages set by the user. As an example, assume the target languages set by the user are "Chinese and English". If, while holding down the single translation button of the terminal device, the user says to it in Chinese "I want to go to the nearest subway station" and then releases the button, the playing module 1003 will play the voice result "I want to go to the nearest subway station" in English; and if, while holding down the single translation button, the user says in English "I want to go to the nearest subway station" and then releases the button, the playing module 1003 will play the corresponding Chinese voice result.
Similarly, the target languages can also be "Chinese, English and Japanese", in which case the terminal device realizes mutual voice translation among Chinese, English and Japanese. If the user inputs a Chinese sentence, the playing module 1003 will play its Japanese translation and English translation in turn; if the user inputs an English sentence, the playing module 1003 will play its Chinese translation and Japanese translation in turn, and so on, which will not be repeated here.
From the above description it can be seen that in this embodiment, whether Chinese needs to be translated into English or English into Chinese, the input of the source-language voice is triggered by the same translation button. In the present application this improves the ease of use of the terminal device and is convenient for users.
In this embodiment, the user includes a first user and a second user, and the target languages include a first language and a second language.
The audio file of the target language includes the audio file of the second language. The audio file of the second language is obtained by the cloud server performing speech recognition and voiceprint recognition on the source-language voice, determining that it is first-language voice input by the first user through the terminal device, determining that the first-language voice is to be translated into the second language, translating the text obtained by speech recognition into second-language text, and performing speech synthesis on the translated second-language text.
The receiver module 1001 is also used to receive, after the playing module 1003 plays the audio file of the target language, the voice of another source language input by the second user through the terminal device.
The sending module 1002 is also used to send the voice of the other source language to the cloud server.
The receiver module 1001 is also used to receive the audio file of the first language sent by the cloud server. The audio file of the first language is obtained by the cloud server performing speech recognition and voiceprint recognition on the voice of the other source language, determining that it is second-language voice input by the second user through the terminal device, determining that the second-language voice is to be translated into the first language, translating the text obtained by speech recognition into first-language text, and performing speech synthesis on the translated first-language text.
The playing module 1003 is also used to play the audio file of the first language.
As described above, the artificial-intelligence-based speech translation apparatus of this embodiment can realize multiple rounds of mutual voice translation. Still taking Chinese and English as the target languages as an example, in scenarios such as passing through entry/exit customs, ordering and paying in a restaurant, bargaining while shopping, and/or checking into or out of a hotel, the first user can long-press the translation button of the terminal device and input a section of Chinese speech through its microphone; the artificial-intelligence-based speech translation apparatus then obtains the English-translation voice corresponding to the Chinese speech and plays it back. After listening to this section of English-translation voice, the second user can likewise long-press the translation button and input a section of English speech through the microphone; the apparatus obtains the Chinese-translation voice corresponding to the English speech and plays it back. In this way, the first user and the second user can communicate smoothly through the artificial-intelligence-based speech translation apparatus, which fully meets the translation needs of scenarios such as overseas travel.
Further, the artificial-intelligence-based speech translation apparatus can also include:
a wireless signal providing module 1005, used to supply the wireless communication signal of the terminal device to another terminal device, so that the other terminal device can connect to the internet.
Specifically, the wireless communication signal of the terminal device can be a WiFi signal. That is, in this embodiment the artificial-intelligence-based speech translation apparatus also has a WiFi function: a user can search for and connect to the WiFi it provides over the wireless network, meeting the internet-access needs of at least one electronic device such as a mobile phone and/or a computer, at a lower cost and with a more stable signal than overseas roaming on a mobile phone's cellular network.
The artificial-intelligence-based speech translation apparatus in the above embodiment combines a real-time voice translation function with a portable-WiFi function: the user can both enjoy the network freely and, when needed, invoke real-time voice translation of 26 languages with a single key. In scenarios such as business communication, multilingual study, entry/exit tourism and/or scenic-spot guidance, it can efficiently meet the user's internet-access and translation needs and improve the user experience.
Figure 12 is a structural schematic diagram of a further embodiment of the artificial-intelligence-based speech translation apparatus of the present application. The artificial-intelligence-based speech translation apparatus in this embodiment can serve as the cloud server, or as part of the cloud server, to realize the artificial-intelligence-based voice translation methods shown in the embodiments of Figs. 6-9 of the present application.
As shown in Figure 12, the artificial-intelligence-based speech translation apparatus can include: a receiver module 1201, a speech recognition module 1202, a determining module 1203, a translation module 1204, a speech synthesis module 1205 and a sending module 1206.
The receiver module 1201 is used to receive the source-language voice sent by the terminal device; specifically, the source-language voice is in PCM format.
The speech recognition module 1202 is used to perform speech recognition on the source-language voice received by the receiver module 1201 and convert it into source-language text.
The determining module 1203 is used to determine that the source-language voice is to be translated into at least one target language other than the source language among the at least two target languages.
The translation module 1204 is used to translate the source-language text into text of the target language determined by the determining module 1203.
The speech synthesis module 1205 is used to perform speech synthesis on the target-language text translated by the translation module 1204 to obtain the audio file of the target language; specifically, the speech synthesis module 1205 can perform the speech synthesis on the translated target-language text through a TTS service.
The sending module 1206 is used to send the audio file of the target language obtained by the speech synthesis module 1205 to the terminal device for the terminal device to play.
In the above artificial-intelligence-based speech translation apparatus, after the receiver module 1201 receives the source-language voice sent by the terminal device, the speech recognition module 1202 performs speech recognition on it and converts it into source-language text; after the determining module 1203 determines that the source-language voice is to be translated into at least one target language other than the source language among the at least two target languages, the translation module 1204 translates the source-language text into text of the determined target language, the speech synthesis module 1205 performs speech synthesis on the translated target-language text to obtain the target-language audio file, and finally the sending module 1206 sends it to the terminal device for playback. Real-time voice translation can thus be realized, meeting the translation needs of overseas-travel scenarios with high translation accuracy.
Figure 13 is a schematic structural diagram of a further embodiment of the speech translation apparatus based on artificial intelligence of the present application. In this embodiment, the target languages include a first language and a second language.
The determining module 1203 is specifically configured to perform voiceprint recognition on the speech in the source language, determine that the speech in the source language is speech in the first language input by a first user through the terminal device, and, according to the pre-saved target languages corresponding to the identifier of the terminal device, determine that the speech in the first language is to be translated into an audio file in the second language.
That is, in this embodiment, after the determining module 1203 determines through voiceprint recognition that the speech in the source language is speech in the first language input by the first user, it looks up, according to the identifier of the terminal device, the target languages corresponding to that identifier, which include the first language and the second language. Since the source language is the first language, the determining module 1203 can determine that the speech in the first language needs to be translated into an audio file in the second language.
In this embodiment, the audio file in the target language includes an audio file in the second language.
The receiving module 1201 is further configured to receive speech in another source language sent by the terminal device after the sending module 1206 has sent the audio file in the target language to the terminal device for playback.
The determining module 1203 is further configured to perform voiceprint recognition on the speech in the other source language and determine that it is speech in the second language input by a second user through the terminal device.
The speech recognition module 1202 is further configured to perform speech recognition on the speech in the second language and convert it into text in the second language.
The determining module 1203 is further configured to determine, according to the pre-saved target languages corresponding to the identifier of the terminal device, that the speech in the second language is to be translated into an audio file in the first language.
The translation module 1204 is further configured to translate the text in the second language into text in the first language.
The speech synthesis module 1205 is further configured to perform speech synthesis on the text in the first language to obtain an audio file in the first language.
The sending module 1206 is further configured to send the audio file in the first language to the terminal device for the terminal device to play.
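The direction-selection role of the determining module 1203 in this two-user embodiment can be sketched as follows. The identify_speaker function is a hypothetical placeholder for a real voiceprint-recognition model, and the language codes are assumptions for illustration.

```python
def identify_speaker(pcm_audio: bytes) -> str:
    """Stand-in for voiceprint recognition; returns 'first' or 'second'."""
    return "first" if pcm_audio[:1] == b"\x01" else "second"

def pick_direction(pcm_audio: bytes, saved_langs: tuple[str, str]) -> tuple[str, str]:
    """Return (source_lang, target_lang) for this utterance.

    saved_langs is the (first_language, second_language) pair stored against
    the terminal device's identifier, e.g. ('zh', 'en').
    """
    first_lang, second_lang = saved_langs
    if identify_speaker(pcm_audio) == "first":
        return first_lang, second_lang   # first user spoke: translate 1st -> 2nd
    return second_lang, first_lang       # second user spoke: translate 2nd -> 1st
```

Each round of the conversation thus reverses the translation direction automatically, without either user having to select a language on the device.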
As described above, this embodiment can realize multiple rounds of mutual speech translation. Still taking Chinese and English as the target languages, in scenarios such as passing through customs, paying a restaurant bill, bargaining while shopping, or checking into or out of a hotel, the first user can press and hold the translation button of the terminal device and input a segment of Chinese speech through the terminal device's microphone. The terminal device then sends this Chinese speech to the speech translation apparatus based on artificial intelligence, which, according to the speech translation method based on artificial intelligence provided by the present application, translates the Chinese speech into an English audio file, sends the translated English audio file back to the terminal device, and has it played by the terminal device. After hearing this English speech, the second user can likewise press and hold the translation button of the terminal device and input a segment of English speech through the microphone; the terminal device sends this English speech to the speech translation apparatus based on artificial intelligence, which translates it into a Chinese audio file, sends the translated Chinese audio file back to the terminal device, and has it played by the terminal device. In this way, the first user and the second user can communicate smoothly through the terminal device, fully meeting the translation needs of scenarios such as overseas travel.
Further, the above speech translation apparatus based on artificial intelligence may also include a preserving module 1207.
The receiving module 1201 is further configured to receive the target languages uploaded by the terminal device before the translation module 1204 translates the text in the source language into text in the determined target language.
The preserving module 1207 is configured to save, in correspondence, the identifier of the terminal device and the target languages, where the target languages include at least two languages and the at least two languages include the source language.
In this embodiment, the translation module 1204 is specifically configured to call, according to the identifier of the terminal device, the corpus of the target languages corresponding to that identifier, and translate the text in the source language into text in the determined target language.
In this embodiment, after the terminal device obtains the target languages set by the user, it can upload them to the speech translation apparatus based on artificial intelligence, and the preserving module 1207 saves the identifier of the terminal device in correspondence with the target languages. The identifier of the terminal device may be any information that uniquely identifies the terminal device, for example its device number; this embodiment does not limit the form of the identifier of the terminal device.
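The correspondence that the preserving module 1207 maintains between a device identifier and its target languages can be sketched as a small in-memory store; the class and method names below are illustrative, not taken from the specification.

```python
class TargetLanguageStore:
    """Toy sketch of the preserving module 1207's device-to-languages mapping."""

    def __init__(self) -> None:
        self._by_device: dict[str, list[str]] = {}

    def save(self, device_id: str, target_langs: list[str]) -> None:
        # The specification requires at least two target languages per device.
        if len(target_langs) < 2:
            raise ValueError("at least two target languages are required")
        self._by_device[device_id] = list(target_langs)

    def lookup(self, device_id: str) -> list[str]:
        # Used by the translation module to pick the corpora for this device.
        return self._by_device[device_id]
```

A real implementation would persist this mapping server-side and key it by whatever unique identifier the device reports, such as its device number.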
The target languages may include at least two languages, and the at least two languages include the source language; that is, this embodiment can realize mutual speech translation among the at least two target languages set by the user. As an example, suppose the target languages set by the user are Chinese and English. After pressing and holding the unique translation button of the terminal device, the user says the Chinese sentence "我想去最近的地铁站" ("I want to go to the nearest subway station") to the terminal device and then releases the translation button; the terminal device will play the speech result "I want to go to the nearest subway station". Conversely, if the user presses and holds the unique translation button of the terminal device, says the English sentence "I want to go to the nearest subway station" to the terminal device, and then releases the translation button, the terminal device will play the speech result "我想去最近的地铁站".
Similarly, the target languages may also be Chinese, English, and Japanese, in which case this embodiment realizes mutual speech translation among Chinese, English, and Japanese. If the user inputs a Chinese sentence, the speech translation apparatus based on artificial intelligence provided by the present application translates the sentence into Japanese and English in turn, and the terminal device plays the Japanese translation and the English translation of the sentence in sequence; if the user inputs an English sentence, the apparatus translates it into Chinese and Japanese in turn, and the terminal device plays the Chinese translation and the Japanese translation in sequence, and so on. Details are not repeated here.
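The three-language behavior described above (translating an utterance into every configured target language except its own, in a fixed order) can be sketched as follows. The PHRASES lookup is a toy stand-in for the per-language corpora that the translation module 1204 would actually call.

```python
PHRASES = {  # toy lookup standing in for real translation corpora
    ("zh", "en"): "I want to go to the nearest subway station",
    ("zh", "ja"): "一番近い地下鉄の駅に行きたいです",
}

def translate(text: str, src: str, dst: str) -> str:
    """Placeholder translator; falls back to a tagged echo when no phrase is known."""
    return PHRASES.get((src, dst), f"[{dst}] {text}")

def translate_round(text: str, src: str, targets: list[str]) -> list[tuple[str, str]]:
    """Translate into every target language except the source, in list order."""
    return [(dst, translate(text, src, dst)) for dst in targets if dst != src]
```

With targets ["zh", "en", "ja"], a Chinese input yields the English and Japanese renderings in that order, matching the sequential playback described in the embodiment.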
Figure 14 is a schematic structural diagram of an embodiment of the terminal device of the present application. The terminal device in this embodiment can implement the methods provided by the embodiments shown in Figures 1 to 5 of the present application. The terminal device may include: one or more processors; a memory for storing one or more programs; a receiver for receiving the speech in the source language input by a user through the terminal device; and a transmitter for sending the speech in the source language to the cloud server. After the transmitter sends the speech in the source language to the cloud server, the receiver receives the audio file in the target language sent by the cloud server, where the audio file in the target language is obtained by the cloud server performing speech recognition on the speech in the source language, determining that the speech in the source language is to be translated into at least one of at least two target languages other than the source language, translating the text obtained by speech recognition into text in the determined target language, and performing speech synthesis on the translated target-language text. When the one or more programs are executed by the one or more processors, the one or more processors implement the methods provided by the embodiments shown in Figures 1 to 5 of the present application.
Figure 14 shows a block diagram of an exemplary terminal device 12 suitable for implementing the embodiments of the present application. The terminal device 12 shown in Figure 14 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Figure 14, the terminal device 12 takes the form of a general-purpose computing device. The components of the terminal device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (hereinafter, ISA) bus, the Micro Channel Architecture (hereinafter, MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (hereinafter, VESA) local bus, and the Peripheral Component Interconnect (hereinafter, PCI) bus.
The terminal device 12 typically includes a variety of computer-system-readable media. These media can be any available media accessible by the terminal device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (hereinafter, RAM) 30 and/or cache memory 32. The terminal device 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 14, commonly referred to as a "hard disk drive"). Although not shown in Figure 14, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a Compact Disc Read-Only Memory (hereinafter, CD-ROM), a Digital Versatile Disc Read-Only Memory (hereinafter, DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The terminal device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, and the like), with one or more devices that enable a user to interact with the terminal device 12, and/or with any device (such as a network card or a modem) that enables the terminal device 12 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 22. Moreover, the terminal device 12 may also communicate through a network adapter 20 with one or more networks, such as a local area network (hereinafter, LAN), a wide area network (hereinafter, WAN), and/or a public network such as the Internet. As shown in Figure 14, the network adapter 20 communicates with the other modules of the terminal device 12 through the bus 18. It should be understood that, although not shown in Figure 14, other hardware and/or software modules may be used in conjunction with the terminal device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the speech translation method based on artificial intelligence provided by the embodiments shown in Figures 1 to 5 of the present application.
The present application also provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the speech translation method based on artificial intelligence provided by the embodiments shown in Figures 1 to 5 of the present application.
The above storage medium containing computer-executable instructions may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (hereinafter, ROM), an erasable programmable read-only memory (hereinafter, EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wired, optical cable, RF, and the like, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Figure 15 is a schematic structural diagram of an embodiment of the cloud server of the present application. The cloud server in this embodiment can implement the flows of the embodiments shown in Figures 6 to 9 of the present application. The cloud server may include: one or more processors; a memory for storing one or more programs; a receiver for receiving the speech in the source language sent by the terminal device; and a transmitter for sending the audio file in the target language to the terminal device for the terminal device to play. When the one or more programs are executed by the one or more processors, the one or more processors implement the speech translation method based on artificial intelligence provided by the embodiments shown in Figures 6 to 9 of the present application.
Figure 15 shows a block diagram of an exemplary cloud server 10 suitable for implementing the embodiments of the present application. The cloud server 10 shown in Figure 15 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Figure 15, the cloud server 10 takes the form of a general-purpose computing device. The components of the cloud server 10 may include, but are not limited to: one or more processors or processing units 160, a system memory 280, and a bus 180 connecting different system components (including the system memory 280 and the processing unit 160).
The bus 180 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The cloud server 10 typically includes a variety of computer-system-readable media. These media can be any available media accessible by the cloud server 10, including volatile and non-volatile media, and removable and non-removable media.
The system memory 280 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 300 and/or cache memory 320. The cloud server 10 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 340 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 15, commonly referred to as a "hard disk drive"). Although not shown in Figure 15, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc Read-Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 180 through one or more data media interfaces. The memory 280 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 400 having a set of (at least one) program modules 420 may be stored, for example, in the memory 280. Such program modules 420 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 420 generally perform the functions and/or methods of the embodiments described in the present application.
The cloud server 10 may also communicate with one or more external devices 140 (such as a keyboard, a pointing device, a display 240, and the like), with one or more devices that enable a user to interact with the cloud server 10, and/or with any device (such as a network card or a modem) that enables the cloud server 10 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 220. Moreover, the cloud server 10 may also communicate through a network adapter 200 with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet. As shown in Figure 15, the network adapter 200 communicates with the other modules of the cloud server 10 through the bus 180. It should be understood that, although not shown in Figure 15, other hardware and/or software modules may be used in conjunction with the cloud server 10, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 160 executes various functional applications and data processing by running programs stored in the system memory 280, for example, implementing the speech translation method based on artificial intelligence provided by the embodiments shown in Figures 6 to 9 of the present application.
The present application also provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, are used to perform the speech translation method based on artificial intelligence provided by the embodiments shown in Figures 6 to 9 of the present application.
The above storage medium containing computer-executable instructions may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wired, optical cable, RF, and the like, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that in the description of the present application, term " first ", " second " etc. are only used for describing purpose, without
It is understood that to indicate or implying relative importance.Additionally, in the description of the present application, unless otherwise indicated, the implication of " multiple "
It is two or more.
Any process described otherwise above or method description in flow chart or herein is construed as, and expression includes
It is one or more for realizing specific logical function or process the step of the module of code of executable instruction, fragment or portion
Point, and the scope of the preferred embodiment of the application includes other realization, wherein can not press shown or discussion suitable
Sequence, including function involved by basis by it is basic simultaneously in the way of or in the opposite order, carry out perform function, this should be by the application
Embodiment person of ordinary skill in the field understood.
It should be appreciated that each several part of the application can be realized with hardware, software, firmware or combinations thereof.Above-mentioned
In implementation method, the software that multiple steps or method can in memory and by suitable instruction execution system be performed with storage
Or firmware is realized.If for example, realized with hardware, and in another embodiment, can be with well known in the art
Any one of row technology or their combination are realized:With the logic gates for realizing logic function to data-signal
Discrete logic, the application specific integrated circuit with suitable combinational logic gate circuit, programmable gate array
(Programmable Gate Array;Hereinafter referred to as:PGA), field programmable gate array (Field Programmable
Gate Array;Hereinafter referred to as:FPGA) etc..
Those skilled in the art are appreciated that to realize all or part of step that above-described embodiment method is carried
The rapid hardware that can be by program to instruct correlation is completed, and described program can be stored in a kind of computer-readable storage medium
In matter, the program upon execution, including one or a combination set of the step of embodiment of the method.
Additionally, each functional module in the application each embodiment can be integrated in a processing module, or
Modules are individually physically present, it is also possible to which two or more modules are integrated in a module.Above-mentioned integrated module
Both can be realized in the form of hardware, it would however also be possible to employ the form of software function module is realized.If the integrated module
Realized in the form of using software function module and as independent production marketing or when using, it is also possible to which storage can in a computer
In reading storage medium.
Storage medium mentioned above can be read-only storage, disk or CD etc..
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present application.
Claims (24)
1. A voice translation method based on artificial intelligence, comprising:
receiving speech in a source language input by a user through a terminal device;
sending the speech in the source language to a cloud server;
receiving an audio file in a target language sent by the cloud server, wherein the audio file in the target language is obtained by the cloud server performing speech recognition on the speech in the source language, determining that the speech in the source language is to be translated into at least one target language, other than the source language, among at least two target languages, translating the text obtained by speech recognition into text in the determined target language, and performing speech synthesis on the translated text in the target language; and
playing the audio file in the target language.
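The claim-1 flow on the terminal side is a simple round trip: capture speech, upload it, receive the translated audio, play it. A minimal Python sketch of that loop, with the cloud call stubbed out (the function names `cloud_translate` and `translate_and_play` are illustrative, not from the patent):

```python
# Hypothetical client-side round trip from claim 1. `cloud_translate` stands in
# for the real cloud server, which would run speech recognition, translation,
# and speech synthesis before returning a target-language audio file.

def cloud_translate(source_audio: bytes) -> bytes:
    """Stub cloud server: pretend the returned bytes are synthesized speech."""
    return b"target-audio:" + source_audio

def translate_and_play(source_audio: bytes, play) -> bytes:
    target_audio = cloud_translate(source_audio)  # send speech, receive audio file
    play(target_audio)                            # play the target-language audio
    return target_audio

played = []
result = translate_and_play(b"ni hao", played.append)
```

A real implementation would replace the stub with an HTTP or socket upload and hand the returned file to the device's audio player.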
2. The method according to claim 1, wherein receiving the speech in the source language input by the user through the terminal device comprises:
receiving the speech in the source language input by the user through a microphone of the terminal device after the user triggers a translation button of the terminal device.
3. The method according to claim 1, further comprising, before receiving the audio file in the target language sent by the cloud server:
obtaining the target languages set by the user, and uploading the target languages set by the user to the cloud server, so that the cloud server stores an identifier of the terminal device in correspondence with the target languages, wherein the target languages comprise at least two languages, the at least two languages including the source language.
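The setup step in claim 3 amounts to the server keeping a mapping from terminal identifier to the user's configured target-language set. A toy sketch of that registry, assuming invented names throughout (the validation mirrors the claim's "at least two languages" requirement):

```python
# Device-id -> target-language-set registry, as described in claim 3.
registry: dict = {}

def upload_target_languages(device_id: str, languages: set) -> None:
    if len(languages) < 2:
        # Claim 3 requires at least two languages, one of which is the
        # source language, so a single-language upload is rejected here.
        raise ValueError("at least two languages required")
    registry[device_id] = set(languages)

upload_target_languages("device-42", {"zh", "en"})
```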
4. The method according to claim 1, wherein the user comprises a first user and a second user, and the target languages comprise a first language and a second language;
the audio file in the target language comprises an audio file in the second language, obtained by the cloud server performing speech recognition and voiceprint recognition on the speech in the source language, determining that the speech in the source language is speech in the first language input by the first user through the terminal device, determining that the speech in the first language is to be translated into the second language, translating the text obtained by speech recognition into text in the second language, and performing speech synthesis on the translated text in the second language; and
after playing the audio file in the target language, the method further comprises:
receiving speech in another source language input by the second user through the terminal device;
sending the speech in the other source language to the cloud server;
receiving an audio file in the first language sent by the cloud server, obtained by the cloud server performing speech recognition and voiceprint recognition on the speech in the other source language, determining that the speech in the other source language is speech in the second language input by the second user through the terminal device, determining that the speech in the second language is to be translated into the first language, translating the text obtained by speech recognition into text in the first language, and performing speech synthesis on the translated text in the first language; and
playing the audio file in the first language.
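The two-user exchange in claim 4 hinges on voiceprint recognition: identifying the speaker determines the source language, and therefore the direction of translation. A minimal sketch with voiceprints faked as speaker labels (the enrollment data and names are invented for illustration):

```python
# Speaker -> language mapping that voiceprint recognition would produce.
SPEAKER_LANGUAGE = {"first_user": "zh", "second_user": "en"}
# The two target languages of claim 4; translation always flips direction.
OTHER = {"zh": "en", "en": "zh"}

def route(voiceprint: str):
    """Return (source_language, target_language) for one utterance."""
    source = SPEAKER_LANGUAGE[voiceprint]
    return source, OTHER[source]
```

With this routing, the same terminal can serve an alternating conversation: each utterance is translated toward whichever of the two languages the identified speaker did not use.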
5. The method according to any one of claims 1-4, further comprising:
providing a wireless communication signal of the terminal device to another terminal device, so that the other terminal device can connect to the Internet.
6. A voice translation method based on artificial intelligence, comprising:
receiving speech in a source language sent by a terminal device;
performing speech recognition on the speech in the source language to convert it into text in the source language;
determining that the speech in the source language is to be translated into at least one target language, other than the source language, among at least two target languages;
translating the text in the source language into text in the determined target language, and performing speech synthesis on the translated text in the target language to obtain an audio file in the target language; and
sending the audio file in the target language to the terminal device for playback.
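Server side, claim 6 is a three-stage pipeline: recognition, translation, synthesis, fanned out to every configured target language except the source. A toy sketch over canned lookup tables (a real server would call ASR, MT, and TTS engines; every table and name here is invented):

```python
# Canned stand-ins for the three engines.
ASR = {b"voice:ni hao": ("zh", "ni hao")}   # audio -> (detected language, text)
MT = {("zh", "en", "ni hao"): "hello"}      # (source, target, text) -> text

def serve(audio: bytes, target_languages: set) -> dict:
    source, text = ASR[audio]                        # speech recognition
    audio_files = {}
    for target in target_languages - {source}:       # every target but the source
        translated = MT[(source, target, text)]      # text translation
        audio_files[target] = b"tts:" + translated.encode()  # speech synthesis
    return audio_files

out = serve(b"voice:ni hao", {"zh", "en"})
```

Excluding the source language from the fan-out matches the claim's "at least one target language other than the source language".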
7. The method according to claim 6, wherein the target languages comprise a first language and a second language, and determining that the speech in the source language is to be translated into at least one target language, other than the source language, among the at least two target languages comprises:
performing voiceprint recognition on the speech in the source language to determine that it is speech in the first language input by a first user through the terminal device; and
determining, according to the target languages corresponding to a pre-stored identifier of the terminal device, that the speech in the first language is to be translated into an audio file in the second language.
8. The method according to claim 7, wherein the audio file in the target language comprises an audio file in the second language, and after sending the audio file in the target language to the terminal device for playback, the method further comprises:
receiving speech in another source language sent by the terminal device;
performing voiceprint recognition on the speech in the other source language to determine that it is speech in the second language input by a second user through the terminal device;
performing speech recognition on the speech in the second language to convert it into text in the second language;
determining, according to the target languages corresponding to the pre-stored identifier of the terminal device, that the speech in the second language is to be translated into an audio file in the first language;
translating the text in the second language into text in the first language, and performing speech synthesis on the text in the first language to obtain an audio file in the first language; and
sending the audio file in the first language to the terminal device for playback.
9. The method according to any one of claims 6-8, further comprising, before translating the text in the source language into the text in the determined target language:
receiving the target languages uploaded by the terminal device, and storing an identifier of the terminal device in correspondence with the target languages, wherein the target languages comprise at least two languages, the at least two languages including the source language.
10. The method according to claim 9, wherein translating the text in the source language into the text in the determined target language comprises:
calling, according to the identifier of the terminal device, a corpus of the target language corresponding to the identifier of the terminal device, and translating the text in the source language into the text in the determined target language.
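Claim 10 keys the translation corpus off the terminal's identifier, so different devices can carry different language-pair resources. A sketch with a phrase table standing in for the corpus (all identifiers and data here are invented; a real system would load a translation model):

```python
# Per-device corpora: device id -> (source, target) -> phrase table.
CORPORA = {
    "device-42": {("zh", "en"): {"xie xie": "thank you"}},
}

def translate_text(device_id: str, source: str, target: str, text: str) -> str:
    corpus = CORPORA[device_id][(source, target)]  # corpus selected by device id
    return corpus[text]

translation = translate_text("device-42", "zh", "en", "xie xie")
```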
11. A voice translation apparatus based on artificial intelligence, provided on a terminal device, the apparatus comprising:
a receiving module, configured to receive speech in a source language input by a user through the terminal device;
a sending module, configured to send the speech in the source language to a cloud server;
the receiving module being further configured to receive an audio file in a target language sent by the cloud server, wherein the audio file in the target language is obtained by the cloud server performing speech recognition on the speech in the source language, determining that the speech in the source language is to be translated into at least one target language, other than the source language, among at least two target languages, translating the text obtained by speech recognition into text in the determined target language, and performing speech synthesis on the translated text in the target language; and
a playing module, configured to play the audio file in the target language.
12. The apparatus according to claim 11, wherein the receiving module is specifically configured to receive, after the user triggers a translation button of the terminal device, the speech in the source language input by the user through a microphone of the terminal device.
13. The apparatus according to claim 11, further comprising an obtaining module;
the obtaining module being configured to obtain the target languages set by the user before the receiving module receives the audio file in the target language sent by the cloud server; and
the sending module being further configured to upload the target languages set by the user to the cloud server, so that the cloud server stores an identifier of the terminal device in correspondence with the target languages, wherein the target languages comprise at least two languages, the at least two languages including the source language.
14. The apparatus according to claim 11, wherein the user comprises a first user and a second user, and the target languages comprise a first language and a second language;
the audio file in the target language comprises an audio file in the second language, obtained by the cloud server performing speech recognition and voiceprint recognition on the speech in the source language, determining that the speech in the source language is speech in the first language input by the first user through the terminal device, determining that the speech in the first language is to be translated into the second language, translating the text obtained by speech recognition into text in the second language, and performing speech synthesis on the translated text in the second language;
the receiving module is further configured to receive, after the playing module plays the audio file in the target language, speech in another source language input by the second user through the terminal device;
the sending module is further configured to send the speech in the other source language to the cloud server;
the receiving module is further configured to receive an audio file in the first language sent by the cloud server, obtained by the cloud server performing speech recognition and voiceprint recognition on the speech in the other source language, determining that the speech in the other source language is speech in the second language input by the second user through the terminal device, determining that the speech in the second language is to be translated into the first language, translating the text obtained by speech recognition into text in the first language, and performing speech synthesis on the translated text in the first language; and
the playing module is further configured to play the audio file in the first language.
15. The apparatus according to any one of claims 11-14, further comprising:
a wireless signal providing module, configured to provide a wireless communication signal of the terminal device to another terminal device, so that the other terminal device can connect to the Internet.
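Claims 11-15 restate the method as cooperating modules on the device. One way to read that decomposition is as an object with its collaborators injected; a structural sketch only (the module roles follow the claims, everything else is assumed):

```python
class DeviceTranslator:
    """Mirrors the receiving/sending/playing modules of claims 11-15."""

    def __init__(self, send_to_cloud, play_audio):
        self._send = send_to_cloud   # sending module: uploads audio to the cloud
        self._play = play_audio      # playing module: renders returned audio

    def handle_utterance(self, source_audio: bytes) -> None:
        # Receiving module: the cloud's reply is the target-language audio file.
        target_audio = self._send(source_audio)
        self._play(target_audio)

played = []
device = DeviceTranslator(lambda audio: b"echo:" + audio, played.append)
device.handle_utterance(b"hi")
```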
16. A voice translation apparatus based on artificial intelligence, provided on a cloud server, the apparatus comprising:
a receiving module, configured to receive speech in a source language sent by a terminal device;
a speech recognition module, configured to perform speech recognition on the speech in the source language to convert it into text in the source language;
a determining module, configured to determine that the speech in the source language is to be translated into at least one target language, other than the source language, among at least two target languages;
a translation module, configured to translate the text in the source language into text in the target language determined by the determining module;
a speech synthesis module, configured to perform speech synthesis on the text in the target language translated by the translation module to obtain an audio file in the target language; and
a sending module, configured to send the audio file in the target language obtained by the speech synthesis module to the terminal device for playback.
17. The apparatus according to claim 16, wherein the target languages comprise a first language and a second language; and
the determining module is specifically configured to perform voiceprint recognition on the speech in the source language to determine that it is speech in the first language input by a first user through the terminal device, and to determine, according to the target languages corresponding to a pre-stored identifier of the terminal device, that the speech in the first language is to be translated into an audio file in the second language.
18. The apparatus according to claim 17, wherein the audio file in the target language comprises an audio file in the second language;
the receiving module is further configured to receive, after the sending module sends the audio file in the target language to the terminal device for playback, speech in another source language sent by the terminal device;
the determining module is further configured to perform voiceprint recognition on the speech in the other source language to determine that it is speech in the second language input by a second user through the terminal device;
the speech recognition module is further configured to perform speech recognition on the speech in the second language to convert it into text in the second language;
the determining module is further configured to determine, according to the target languages corresponding to the pre-stored identifier of the terminal device, that the speech in the second language is to be translated into an audio file in the first language;
the translation module is further configured to translate the text in the second language into text in the first language;
the speech synthesis module is further configured to perform speech synthesis on the text in the first language to obtain an audio file in the first language; and
the sending module is further configured to send the audio file in the first language to the terminal device for playback.
19. The apparatus according to any one of claims 16-18, further comprising a storing module;
the receiving module being further configured to receive the target languages uploaded by the terminal device before the translation module translates the text in the source language into the text in the determined target language; and
the storing module being configured to store an identifier of the terminal device in correspondence with the target languages, wherein the target languages comprise at least two languages, the at least two languages including the source language.
20. The apparatus according to claim 19, wherein the translation module is specifically configured to call, according to the identifier of the terminal device, a corpus of the target language corresponding to the identifier of the terminal device, and to translate the text in the source language into the text in the determined target language.
21. A terminal device, comprising:
one or more processors;
a memory, configured to store one or more programs;
a receiver, configured to receive speech in a source language input by a user through the terminal device, and, after a transmitter sends the speech in the source language to a cloud server, to receive an audio file in a target language sent by the cloud server, wherein the audio file in the target language is obtained by the cloud server performing speech recognition on the speech in the source language, determining that the speech in the source language is to be translated into at least one target language, other than the source language, among at least two target languages, translating the text obtained by speech recognition into text in the determined target language, and performing speech synthesis on the translated text in the target language; and
the transmitter, configured to send the speech in the source language to the cloud server;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-5.
22. A storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, perform the method according to any one of claims 1-5.
23. A cloud server, comprising:
one or more processors;
a memory, configured to store one or more programs;
a receiver, configured to receive speech in a source language sent by a terminal device; and
a transmitter, configured to send an audio file in a target language to the terminal device for playback;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 6-10.
24. A storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, perform the method according to any one of claims 6-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710183965.2A CN106935240A (en) | 2017-03-24 | 2017-03-24 | Voice translation method, device, terminal device and cloud server based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106935240A true CN106935240A (en) | 2017-07-07 |
Family
ID=59426343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710183965.2A Pending CN106935240A (en) | 2017-03-24 | 2017-03-24 | Voice translation method, device, terminal device and cloud server based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106935240A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1602483A (en) * | 2001-12-17 | 2005-03-30 | 内维尼·加雅拉特尼 | Real time translator and method of performing real time translation of a plurality of spoken word languages |
US20110238407A1 (en) * | 2009-08-31 | 2011-09-29 | O3 Technologies, Llc | Systems and methods for speech-to-speech translation |
CN105117391A (en) * | 2010-08-05 | 2015-12-02 | 谷歌公司 | Translating languages |
CN103838714A (en) * | 2012-11-22 | 2014-06-04 | 北大方正集团有限公司 | Method and device for converting voice information |
CN104462069A (en) * | 2013-09-18 | 2015-03-25 | 株式会社东芝 | Speech translation apparatus and speech translation method |
CN105512113A (en) * | 2015-12-04 | 2016-04-20 | 青岛冠一科技有限公司 | Communication type voice translation system and translation method |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107749296A (en) * | 2017-10-12 | 2018-03-02 | 深圳市沃特沃德股份有限公司 | Voice translation method and device |
CN108010519A (en) * | 2017-11-10 | 2018-05-08 | 上海爱优威软件开发有限公司 | A kind of information search method and system |
CN107967264A (en) * | 2017-12-29 | 2018-04-27 | 深圳市译家智能科技有限公司 | Translator, translation system and interpretation method |
CN108415904A (en) * | 2018-01-12 | 2018-08-17 | 广东思派康电子科技有限公司 | A kind of binary channels real time translating method |
CN108319590A (en) * | 2018-01-25 | 2018-07-24 | 芜湖应天光电科技有限责任公司 | A kind of adaptive translator based on cloud service |
CN108595443A (en) * | 2018-03-30 | 2018-09-28 | 浙江吉利控股集团有限公司 | Simultaneous interpreting method, device, intelligent vehicle mounted terminal and storage medium |
CN108710615A (en) * | 2018-05-03 | 2018-10-26 | Oppo广东移动通信有限公司 | Interpretation method and relevant device |
CN108710615B (en) * | 2018-05-03 | 2020-03-03 | Oppo广东移动通信有限公司 | Translation method and related equipment |
CN110728976A (en) * | 2018-06-30 | 2020-01-24 | 华为技术有限公司 | Method, device and system for voice recognition |
CN110728976B (en) * | 2018-06-30 | 2022-05-06 | 华为技术有限公司 | Method, device and system for voice recognition |
CN109036451A (en) * | 2018-07-13 | 2018-12-18 | 深圳市小瑞科技股份有限公司 | A kind of simultaneous interpretation terminal and its simultaneous interpretation system based on artificial intelligence |
CN109272983A (en) * | 2018-10-12 | 2019-01-25 | 武汉辽疆科技有限公司 | Bilingual switching device for child-parent education |
CN109359307A (en) * | 2018-10-17 | 2019-02-19 | 深圳市沃特沃德股份有限公司 | Interpretation method, device and the equipment of automatic identification languages |
CN109522564A (en) * | 2018-12-17 | 2019-03-26 | 北京百度网讯科技有限公司 | Voice translation method and device |
CN109522564B (en) * | 2018-12-17 | 2022-05-31 | 北京百度网讯科技有限公司 | Voice translation method and device |
CN109979431A (en) * | 2019-03-20 | 2019-07-05 | 邱洵 | A kind of multifunctional intellectual language translation system |
CN110970014A (en) * | 2019-10-31 | 2020-04-07 | 阿里巴巴集团控股有限公司 | Voice conversion, file generation, broadcast, voice processing method, device and medium |
CN110970014B (en) * | 2019-10-31 | 2023-12-15 | 阿里巴巴集团控股有限公司 | Voice conversion, file generation, broadcasting and voice processing method, equipment and medium |
CN112995873A (en) * | 2019-12-13 | 2021-06-18 | 西万拓私人有限公司 | Method for operating a hearing system and hearing system |
CN111614781A (en) * | 2020-05-29 | 2020-09-01 | 王浩 | Audio processing method, terminal device and system based on cloud server |
CN113360127A (en) * | 2021-05-31 | 2021-09-07 | 富途网络科技(深圳)有限公司 | Audio playing method and electronic equipment |
CN114267358A (en) * | 2021-12-17 | 2022-04-01 | 北京百度网讯科技有限公司 | Audio processing method, device, apparatus, storage medium, and program |
CN114267358B (en) * | 2021-12-17 | 2023-12-12 | 北京百度网讯科技有限公司 | Audio processing method, device, equipment and storage medium |
CN115051991A (en) * | 2022-07-08 | 2022-09-13 | 北京有竹居网络技术有限公司 | Audio processing method and device, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106935240A (en) | Voice translation method, device, terminal device and cloud server based on artificial intelligence | |
CN110288077B (en) | Method and related device for synthesizing speaking expression based on artificial intelligence | |
US20200294505A1 (en) | View-based voice interaction method, apparatus, server, terminal and medium | |
CN110490213B (en) | Image recognition method, device and storage medium | |
Shawai et al. | Malay language mobile learning system (MLMLS) using NFC technology | |
CN109036396A (en) | A kind of exchange method and system of third-party application | |
US20200184948A1 (en) | Speech playing method, an intelligent device, and computer readable storage medium | |
CN113592985B (en) | Method and device for outputting mixed deformation value, storage medium and electronic device | |
CN107562850A (en) | Music recommends method, apparatus, equipment and storage medium | |
CN108012173A (en) | A kind of content identification method, device, equipment and computer-readable storage medium | |
CN108133707A (en) | A kind of content share method and system | |
JP2019211747A (en) | Voice concatenative synthesis processing method and apparatus, computer equipment and readable medium | |
CN106537496A (en) | Terminal device, information provision system, information presentation method, and information provision method | |
CN109257659A (en) | Subtitle adding method, device, electronic equipment and computer readable storage medium | |
CN108564966A (en) | The method and its equipment of tone testing, the device with store function | |
CN107590216A (en) | Answer preparation method, device and computer equipment | |
CN104598443B (en) | Language service providing method, apparatus and system | |
KR20190005103A (en) | Electronic device-awakening method and apparatus, device and computer-readable storage medium | |
CN108073572A (en) | Information processing method and its device, simultaneous interpretation system | |
CN113257218B (en) | Speech synthesis method, device, electronic equipment and storage medium | |
KR20200027331A (en) | Voice synthesis device | |
CN108122555A (en) | The means of communication, speech recognition apparatus and terminal device | |
CN112272170A (en) | Voice communication method and device, electronic equipment and storage medium | |
CN110010127A (en) | Method for changing scenes, device, equipment and storage medium | |
CN108769891A (en) | A kind of audio frequency transmission method and mobile translation equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170707 |