CN105427855A

CN105427855A - Voice broadcast system and voice broadcast method of intelligent software

Info

Publication number: CN105427855A
Application number: CN201510757022.7A
Authority: CN
Inventors: 王程程; 刘青松
Original assignee: SHANGHAI YUZHIYI INFORMATION TECHNOLOGY Co Ltd
Current assignee: Unisound Shanghai Intelligent Technology Co Ltd
Priority date: 2015-11-09
Filing date: 2015-11-09
Publication date: 2016-03-23

Abstract

The invention discloses a voice broadcast system and a voice broadcast method for intelligent software. The voice broadcast system comprises: a character information acquisition module for collecting character information; a text front-end processing module, connected with the character information acquisition module, for converting the character information into text information with a specific pronunciation; a model storage module for building and storing a sound model; a voice synthesis module connected with the text front-end processing module and the model storage module; and a voice broadcast module, connected with the voice synthesis module, for playing voice files. The voice synthesis module calls the sound model, obtains the acoustic parameters corresponding to the text information according to the sound model and decision tree prediction, performs voice synthesis on the acoustic parameters, and outputs the synthesized voice file. By combining text processing, parametric modeling, voice synthesis and related technologies, the invention provides a text broadcast function on smartphones and tablets and achieves text broadcast in a specific timbre.

Description

Voice broadcast system and voice broadcast method for intelligent software

Technical field

The present invention relates to the field of voice broadcasting, and in particular to a voice broadcast system and a voice broadcast method for intelligent software.

Background art

With rising health standards and longer life expectancy, the elderly account for a growing share of the population, and the idea of healthy aging in China is receiving increasing attention from the international community. The United Nations has proposed healthy aging as a global goal for addressing the problem of population aging. In China, a nation of etiquette, respect and care for the elderly are deeply rooted in people's hearts. According to China's sixth national census in 2010, people aged 60 and above account for 13.26% of the population. As the aging of China's population accelerates, how to create a comfortable old age for the elderly, together with aspects such as their health and cultural life, is drawing more and more attention, and product design and marketing will develop in line with this trend.

According to a specialized survey of middle-aged and elderly people, 85.59% of those under 60 in this group own a mobile phone, but 62.38% have used their phone for more than two years. According to another survey, smartphones and tablets make up the majority of the gifts that children give to elderly parents, because the large display of a tablet effectively solves the difficulty the elderly have in reading text.

Starting from the physiology and living habits of the elderly, and through a large number of in-depth interviews and surveys, we found that the cultural life of a large share of elderly people is related to stocks. After retirement, many of them have enough time to watch the market on trading days and are more focused than some young people; in addition, trading gives them a sense of accomplishment and of remaining useful and provided for in old age. The greatest difficulty the elderly face in stock trading, compared with young people, is that densely packed numbers and trading codes are a great challenge for those with poor eyesight. This may lead to a "fat-finger" error or a mistyped amount that misses the current trading price, which very easily causes economic loss and emotional distress.

In summary, it is very necessary to build stock voice broadcast software for the elderly based on the Android or iOS platform.

Summary of the invention

The technical problem to be solved by the present invention is to provide a voice broadcast system and a voice broadcast method for intelligent software that can be applied to stock software based on the Android or iOS platform. To address the problem that the elderly cannot see stock numbers clearly, the invention broadcasts a voice prompt and confirmation for each operation and can broadcast the current stock market overview in real time.

To achieve the above technical effect, the present invention discloses a voice broadcast system for intelligent software, comprising:

a character information acquisition module, for collecting character information in the intelligent software;

a text front-end processing module, connected with the character information acquisition module, for converting the collected character information into text information with a specific pronunciation;

a model storage module, for building and storing a sound model;

a voice synthesis module, connected with the text front-end processing module and the model storage module, for calling the sound model stored by the model storage module, obtaining, according to the sound model and decision tree prediction, the acoustic parameters corresponding to the text information sent by the text front-end processing module, performing voice synthesis on the acoustic parameters, and outputting the synthesized voice file; and

a voice playing module, connected with the voice synthesis module, for playing the voice file.

In a further improvement of the voice broadcast system, the character information acquisition module is communicatively connected with an intelligent broadcast client, the intelligent broadcast client being a plug-in embedded in the intelligent software to collect the character information.

In a further improvement of the voice broadcast system, the text front-end processing module comprises:

a regularization rule setting unit, connected with the character information acquisition module, for regularizing the collected character information according to specific rules; and

a text conversion labeling unit, connected with the regularization rule setting unit, for labeling the regularized character information and thereby converting it into text information with a specific pronunciation.

In a further improvement of the voice broadcast system, the model storage module comprises:

a voice annotation front-end processing unit, for collecting a sound data source and performing voice annotation front-end processing on the collected sound data source to obtain text annotation information;

a feature extraction unit, connected with the voice annotation front-end processing unit, for extracting the acoustic features, namely fundamental frequency and spectrum, of the text annotation information;

a training unit, connected with the feature extraction unit, for forming the sound model of the acoustic features through parameter clustering and training based on a hidden Markov model; and

a model storage unit, connected with the training unit, for storing the sound model.

In a further improvement of the voice broadcast system, the voice synthesis module comprises:

an annotation storage unit, connected with the text front-end processing module, for performing part-of-speech analysis and prosody prediction on the text information sent by the text front-end processing module;

a parameter prediction unit, connected with the annotation storage unit and the model storage module, for calling the sound model stored by the model storage module and obtaining, according to the sound model and decision tree prediction, the acoustic parameters corresponding to the text information that has undergone part-of-speech analysis and prosody prediction; and

a synthesizer voice synthesis unit, connected with the parameter prediction unit, for feeding the acoustic parameters into a parametric synthesizer for voice synthesis and outputting the synthesized voice file.

The invention also discloses a voice broadcast method for intelligent software, comprising:

collecting character information in the intelligent software;

converting the collected character information into text information with a specific pronunciation;

building and storing a sound model;

calling the stored sound model, obtaining the acoustic parameters corresponding to the text information according to the sound model and decision tree prediction, performing voice synthesis on the acoustic parameters, and outputting the synthesized voice file; and

playing the voice file.

In a further improvement of the voice broadcast method, collecting the character information comprises: embedding in the intelligent software an intelligent broadcast client for collecting the character information.

In a further improvement of the voice broadcast method, converting the collected character information into text information with a specific pronunciation comprises:

regularizing the collected character information according to specific rules; and

labeling the regularized character information, thereby converting it into text information with a specific pronunciation.

In a further improvement of the voice broadcast method, building and storing the sound model comprises:

collecting a sound data source and performing voice annotation front-end processing on the collected sound data source to obtain text annotation information;

extracting the acoustic features, namely fundamental frequency and spectrum, of the text annotation information;

forming the sound model of the acoustic features through parameter clustering and training based on a hidden Markov model; and

storing the sound model.

In a further improvement of the voice broadcast method, calling the stored sound model, obtaining the acoustic parameters corresponding to the text information according to the sound model and decision tree prediction, performing voice synthesis on the acoustic parameters, and outputting the synthesized voice file comprises:

performing part-of-speech analysis and prosody prediction on the text information;

calling the stored sound model and obtaining, according to the sound model and decision tree prediction, the acoustic parameters corresponding to the text information that has undergone part-of-speech analysis and prosody prediction; and

feeding the acoustic parameters into a parametric synthesizer for voice synthesis and outputting the synthesized voice file.

By adopting the above technical scheme, the present invention has the following beneficial effects:

By combining text processing, parametric modeling, voice synthesis and related technologies, the invention provides a comprehensive voice broadcast system for intelligent software. The intelligent broadcast client embedded in the intelligent software collects the character information that the user needs broadcast. The text front-end processing module then applies special processing to the text rules of different fields, yielding text information with a specific pronunciation suited to each field. The model storage module builds and stores sound models with a specific timbre for the voice synthesis module to call. The voice synthesis module then calls a sound model of a specific timbre and synthesizes the text information in that timbre, producing a text broadcast in the specific timbre. The user can thus listen to a broadcast instead of merely reading, and operate only after hearing the broadcast information, which avoids misoperation and is both accurate and convenient. In addition, the sound model in the model storage module can be replaced at any time, so that both the broadcast text and the timbre can be adjusted at any time: when a new alert scenario requires updated broadcast text, or when the user wants to switch to the voice of the latest internet celebrity, the change can be made at once, which is very convenient, saves cost and makes listening more enjoyable.

Brief description of the drawings

Fig. 1 is a functional block diagram of the voice broadcast system for intelligent software of the present invention.

Fig. 2 is a flowchart of the voice broadcast method for intelligent software of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings and specific embodiments.

Referring to Fig. 1, the voice broadcast system for intelligent software of the present invention mainly consists of a character information acquisition module 11, a text front-end processing module 12, a model storage module 13, a voice synthesis module 14 and a voice playing module 15.

The character information acquisition module 11 collects character information. It is communicatively connected with an intelligent broadcast client 111. The intelligent broadcast client 111 is typically a plug-in embedded in intelligent software based on the Android or iOS platform, for example stock software (such as brokerage clients, Tonghuashun or Dazhihui), where it collects character information and provides a text broadcast function on smartphones and tablets. When voice broadcast is needed, the user starts the intelligent broadcast client 111, which collects the character information the user needs broadcast, for example stock-related text. To address the problem that the elderly cannot see stock numbers clearly, it can broadcast a voice prompt and confirmation for each operation and can broadcast the current stock market overview in real time. Because the intelligent broadcast client 111 is placed in the stock software as a plug-in, the user can freely choose whether to broadcast by tapping a switch, which is practical and unobtrusive.
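The switch behaviour described above can be illustrated with a minimal sketch. The class name `BroadcastClient`, the hook `on_text_shown` and the sample string are hypothetical and do not come from the patent; the sketch only assumes that the host application notifies the plug-in whenever text is displayed.

```python
from dataclasses import dataclass, field


@dataclass
class BroadcastClient:
    """Hypothetical plug-in that collects on-screen text only while switched on."""
    enabled: bool = False
    collected: list = field(default_factory=list)

    def toggle(self) -> bool:
        """Flip the broadcast switch and return the new state."""
        self.enabled = not self.enabled
        return self.enabled

    def on_text_shown(self, text: str) -> None:
        """Assumed hook: called by the host app whenever text is displayed."""
        if self.enabled:
            self.collected.append(text)


client = BroadcastClient()
client.on_text_shown("Index 3542 +1.2%")   # ignored: the switch is off
client.toggle()
client.on_text_shown("Index 3542 +1.2%")   # collected: the switch is on
print(client.collected)
```

With the switch off the plug-in stays silent, which corresponds to the unobtrusive behaviour the description asks for.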

The text front-end processing module 12 is connected with the character information acquisition module 11 and converts the character information collected by the character information acquisition module 11 into text information with a specific pronunciation. For example, stock text requires special processing: in the stock context, "+" must be read as "rise", "-" must be read as "fall", the index "3542" must be read as "three thousand five hundred and forty-two points", and so on. The collected character information must therefore be specially processed so that it takes the pronunciation specific to stocks, that is, the semantics of the stock text are parsed. Specifically, the text front-end processing module 12 comprises a regularization rule setting unit 121 and a text conversion labeling unit 122. The regularization rule setting unit 121 is connected with the character information acquisition module 11 and regularizes the character information collected by the character information acquisition module 11 according to specific rules. For example, based on rules such as "." being read as "point" and "%" being read as "percent", "1.2%" is regularized to "one point two percent", and the regularized character information, here "one point two percent", is output. The text conversion labeling unit 122 is connected with the regularization rule setting unit 121. It receives the regularized character information output by the regularization rule setting unit 121 and labels it: for example, "one point two percent" is labeled with the pinyin "baifenzhiyidianer" and further labeled at the phone level with part of speech and prosody. The labeled result is the text information with a specific pronunciation, which is passed to the next unit.
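The rule-based regularization step can be sketched as follows. This is a simplified illustration, not the patent's rule set: the function names are hypothetical, the readings are English stand-ins for the Chinese readings in the description, "-" is spoken as "down", and numbers are read digit by digit rather than as full cardinals such as "three thousand five hundred and forty-two".

```python
import re

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def read_number(token: str) -> str:
    """Read a decimal number digit by digit, with '.' spoken as 'point'."""
    return " ".join("point" if ch == "." else DIGITS[int(ch)] for ch in token)


def normalize(text: str) -> str:
    """Rewrite each signed number or percentage into its spoken form."""
    def repl(m: re.Match) -> str:
        sign, number, percent = m.group(1), m.group(2), m.group(3)
        words = []
        if sign == "+":
            words.append("up")        # "+" is read as a rise
        elif sign == "-":
            words.append("down")      # "-" is read as a fall
        words.append(read_number(number))
        if percent:
            words.append("percent")   # "%" is read as "percent"
        return " ".join(words)

    return re.sub(r"([+-]?)(\d+(?:\.\d+)?)(%?)", repl, text)


print(normalize("+1.2%"))
print(normalize("-0.35%"))
```

A production front end would also need a cardinal-number reader and domain rules for codes, dates and units; the sketch covers only the sign, decimal point and percent rules named in the description.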

The model storage module 13, which builds and stores sound models, is a vital part of the present invention. Through the model storage module 13, sound models of announcers with different timbres can be built and stored (the announcer may have the timbre of the robot WALL-E, the timbre of a cartoon character such as RNB or Chibi Maruko Chan, or the voice of a celebrity currently popular online). This provides pre-trained speaker sound models for subsequent voice synthesis, which the voice synthesis module 14 can call at any time to achieve text broadcast in a specific timbre. Specifically, the model storage module 13 comprises a voice annotation front-end processing unit 131, a feature extraction unit 132, a training unit 133 and a model storage unit 134. The voice annotation front-end processing unit 131 collects 2 to 3 hours of speech from one or more announcers as the sound data source and performs voice annotation front-end processing on the collected sound data source to obtain its text annotation information. The feature extraction unit 132 is connected with the voice annotation front-end processing unit 131 and extracts the acoustic features, namely fundamental frequency and spectrum, of the text annotation information. The training unit 133 is connected with the feature extraction unit 132 and forms the sound model of the extracted acoustic features through parameter clustering and training based on a hidden Markov model (Hidden Markov Model, HMM for short). The model storage unit 134 is connected with the training unit 133 and stores, offline, the sound models of announcers with various timbres. Once the model storage module 13 has built and stored the sound models of the announcers with different timbres, the sound model of the relevant announcer can be called when a synthesis request arrives, and voice synthesis is performed, thereby achieving voice broadcast.
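The patent does not say how the fundamental frequency is extracted. As one possible illustration, the sketch below estimates the fundamental frequency of a single frame by plain autocorrelation; this is a swapped-in textbook technique, and the function name, the search range of 50 to 400 Hz and the synthetic 200 Hz test tone are all assumptions made for the example.

```python
import math


def estimate_f0(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation."""
    lag_min = int(sample_rate / f_max)   # shortest period considered
    lag_max = int(sample_rate / f_min)   # longest period considered
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


sample_rate = 8000
# Synthetic test frame: a pure 200 Hz tone, 0.1 s long.
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
print(estimate_f0(frame, sample_rate))
```

Real recordings would additionally need framing, voiced/unvoiced decisions and spectral analysis before any HMM training could take place.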

The voice synthesis module 14 is the core technology of the present invention and is also the module that runs through the whole system. The voice synthesis module 14 is connected with both the text front-end processing module 12 and the model storage module 13. It calls the sound model stored by the model storage module 13, obtains, according to this sound model and decision tree prediction, the acoustic parameters corresponding to the text information sent by the text front-end processing module 12, performs voice synthesis on these acoustic parameters, and outputs the synthesized voice file. Voice synthesis, also known as text-to-speech (Text to Speech) technology, can convert any character information into standard, fluent speech in real time, which is equivalent to giving the machine an artificial mouth. It involves several disciplines, including acoustics, linguistics, digital signal processing and computer science, and is a cutting-edge technology in the field of Chinese information processing. The main problem it solves is how to convert character information into audible sound information, that is, how to make a machine speak like a person.

The voice synthesis module 14 specifically comprises an annotation storage unit 141, a parameter prediction unit 142 and a synthesizer voice synthesis unit 143. The annotation storage unit 141 is connected with the text conversion labeling unit 122 of the text front-end processing module 12 and performs part-of-speech analysis and prosody prediction on the text information sent by the text conversion labeling unit 122, for example "the broad market rose thirty-five point six points today". The parameter prediction unit 142 is connected with the annotation storage unit 141 and with the model storage unit 134 of the model storage module 13. It sends a synthesis request to the model storage unit 134 and calls the pre-trained sound model of an announcer stored in the model storage unit 134, which may be the sound model of an announcer with the timbre of the robot WALL-E, the timbre of a cartoon character such as RNB or Chibi Maruko Chan, or the voice of a celebrity currently popular online. It then obtains, according to this sound model and decision tree prediction, the acoustic parameters corresponding to the text information that has undergone part-of-speech analysis and prosody prediction. A decision tree (Decision Tree) is a decision analysis method that, given the known probabilities of various situations, builds a tree to obtain the probability that the expected net present value is greater than or equal to zero, so as to evaluate project risk and judge feasibility; it is an intuitive graphical method that applies probability analysis. The synthesizer voice synthesis unit 143 is connected with the parameter prediction unit 142. It feeds the acoustic parameters predicted by the parameter prediction unit 142 into a parametric synthesizer for voice synthesis and outputs the synthesized voice file, for example the spoken sentence "the broad market rose 35.6 points today".
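The patent does not detail how the decision tree yields acoustic parameters. In HMM-based parametric synthesis, a decision tree is commonly walked with yes/no questions about the linguistic context until a leaf holding model parameters is reached; the sketch below assumes that reading. The `Node` type, the context keys `tone` and `phrase_final`, and every number in the tree are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """A tree node: either a yes/no question or a leaf holding parameters."""
    question: Optional[Callable[[dict], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    params: Optional[dict] = None  # leaf only: illustrative acoustic parameters


def predict(node: Node, context: dict) -> dict:
    """Walk from the root to a leaf by answering context questions."""
    while node.params is None:
        node = node.yes if node.question(context) else node.no
    return node.params


# Made-up tree and values, for illustration only.
tree = Node(
    question=lambda c: c["tone"] == 4,
    yes=Node(params={"f0_hz": 180.0, "duration_ms": 90}),
    no=Node(
        question=lambda c: c["phrase_final"],
        yes=Node(params={"f0_hz": 140.0, "duration_ms": 160}),
        no=Node(params={"f0_hz": 160.0, "duration_ms": 110}),
    ),
)

print(predict(tree, {"tone": 1, "phrase_final": True}))
```

In a trained system the questions and leaf values would come from the clustering performed during model training rather than being written by hand.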

The voice playing module 15 is connected with the synthesizer voice synthesis unit 143 of the voice synthesis module 14 and plays the synthesized voice file, here the spoken sentence "the broad market rose 35.6 points today". This completes the whole process of text broadcast in a specific timbre.

By combining text processing, parametric modeling, voice synthesis and related technologies, the present invention provides the elderly with a comprehensive stock broadcast solution. The intelligent broadcast client embedded in the stock software collects the character information that the user needs broadcast. The text front-end processing module then applies special processing to the stock text, yielding text information with the pronunciation specific to stocks. The model storage module builds and stores sound models with a specific timbre for the voice synthesis module to call. The voice synthesis module then calls a sound model of a specific timbre and synthesizes the text information in that timbre, producing a text broadcast in the specific timbre. The user can thus listen to a broadcast instead of merely reading, and operate only after hearing the broadcast information, which avoids misoperation and is both accurate and convenient. In addition, the sound model in the model storage module can be replaced at any time, so that both the broadcast text and the timbre can be adjusted at any time: when a new alert scenario requires updated broadcast text, or when the user wants to switch to the voice of the latest internet celebrity, the change can be made at once, which is very convenient, saves cost and makes listening more enjoyable.

Referring to Fig. 2, voice broadcasting with the voice broadcast system of the present invention mainly comprises the following steps:

S001: collecting character information in the intelligent software;

S002: converting the collected character information into text information with a specific pronunciation;

S003: building and storing a sound model;

S004: calling the stored sound model, obtaining the acoustic parameters corresponding to the text information according to the sound model and decision tree prediction, performing voice synthesis on the acoustic parameters, and outputting the synthesized voice file; and

S005: playing the voice file.
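The five steps above can be sketched as a chain of functions. This is only a structural illustration under stated assumptions: every function name is hypothetical, the "sound model" is a stand-in callable, and the synthesis and playback bodies are placeholders rather than real signal processing.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a "sound model" maps text to acoustic parameters.
SoundModel = Callable[[str], List[float]]
MODEL_STORE: Dict[str, SoundModel] = {}


def collect_text(screen_lines: List[str]) -> str:           # S001
    return " ".join(screen_lines)


def to_pronounceable(text: str) -> str:                      # S002
    return text.replace("+", "up ").replace("%", " percent")


def store_model(name: str, model: SoundModel) -> None:       # S003
    MODEL_STORE[name] = model


def synthesize(text: str, voice: str) -> bytes:              # S004
    params = MODEL_STORE[voice](text)                        # call the stored model
    return bytes(int(p) % 256 for p in params)               # placeholder "voice file"


def play(voice_file: bytes) -> int:                          # S005
    return len(voice_file)                                   # placeholder playback


store_model("robot", lambda text: [float(ord(ch)) for ch in text])
audio = synthesize(to_pronounceable(collect_text(["Index", "+1.2%"])), "robot")
print(play(audio))
```

Keeping the model store separate from the synthesis call mirrors the description's point that the timbre can be swapped without touching the rest of the chain.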

Step S001, collecting the character information, comprises: embedding in the intelligent software an intelligent broadcast client for collecting the character information.

The intelligent broadcast client is typically a plug-in embedded in intelligent software based on the Android or iOS platform, for example stock software (such as brokerage clients, Tonghuashun or Dazhihui), where it collects character information and provides a text broadcast function on smartphones and tablets. When voice broadcast is needed, the user starts the intelligent broadcast client, which collects the character information the user needs broadcast, for example stock-related text. To address the problem that the elderly cannot see stock numbers clearly, the present invention can broadcast a voice prompt and confirmation for each operation and can broadcast the current stock market overview in real time. Because the intelligent broadcast client is placed in the stock software as a plug-in, the user can freely choose whether to broadcast by tapping a switch, which is practical and unobtrusive.

Step S002: converting the collected character information into text information with a specific pronunciation. For example, stock text requires special processing: in the stock context, "+" must be read as "rise", "-" must be read as "fall", the index "3542" must be read as "three thousand five hundred and forty-two points", and so on. The collected character information must therefore be specially processed so that it takes the pronunciation specific to stocks, that is, the semantics of the stock text are parsed. This step specifically comprises:

First, the collected character information is regularized according to specific rules. For example, based on rules such as "." being read as "point" and "%" being read as "percent", "1.2%" is regularized to "one point two percent", and the regularized character information, here "one point two percent", is output;

Then, the regularized character information is labeled. For example, "one point two percent" is labeled with the pinyin "baifenzhiyidianer" and further labeled at the phone level with part of speech and prosody, and is thereby converted into text information with a specific pronunciation.

Step S003: building and storing a sound model, comprising:

First, 2 to 3 hours of speech from an announcer are collected as the sound data source, and voice annotation front-end processing is performed on the collected sound data source to obtain text annotation information;

Second, the acoustic features, namely fundamental frequency and spectrum, of the text annotation information are extracted;

Then, the sound model of the acoustic features is formed through parameter clustering and training based on an HMM;

Finally, the sound model is stored.

By building and storing sound models of announcers with different timbres (the announcer may have the timbre of the robot WALL-E, the timbre of a cartoon character such as RNB or Chibi Maruko Chan, or the voice of a celebrity currently popular online), pre-trained speaker sound models are made available for subsequent voice synthesis and can be called at any time, achieving text broadcast in a specific timbre and making voice broadcast more enjoyable.

Step S004: calling the stored sound model, obtaining the acoustic parameters corresponding to the text information according to the sound model and decision tree prediction, performing voice synthesis on the acoustic parameters, and outputting the synthesized voice file, comprising:

First, part-of-speech analysis and prosody prediction are performed on the incoming text information, for example "the broad market rose thirty-five point six points today";

Next, a synthesis request is sent and the stored sound model of a trained announcer is called; according to the called sound model and decision tree prediction, the acoustic parameters corresponding to the text information that has undergone part-of-speech analysis and prosody prediction are obtained;

Finally, the predicted acoustic parameters are fed into a parametric synthesizer for voice synthesis, and the synthesized voice file is output, for example the spoken sentence "the broad market rose 35.6 points today". This completes the whole process of text broadcast in a specific timbre.

With the voice broadcast system and voice broadcast method of the present invention, an elderly user can view a stock in the stock software, and the broadcast plug-in appears on that page. When the switch is tapped, the basic information on the page is broadcast, for example: stock code: 600001, stock name: Pudong Development Bank, current price: fifteen point four zero yuan. If the user needs to buy or sell, then with the broadcast plug-in switched on, the user's operation is broadcast for confirmation before the order is placed, which prevents misoperation. For example: buy stock code 600001, stock name Pudong Development Bank, 1000 shares, order price sixteen yuan even. After hearing the broadcast information and confirming that it is correct, the user can place the order, which is both accurate and convenient.
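The confirm-before-ordering flow described above can be sketched as follows. The function names and the English wording of the broadcast sentence are hypothetical; the stock code, name, quantity and price are the example values from the description, and the confirmation callback stands in for whatever confirmation control the host application provides.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spell_code(code: str) -> str:
    """Read a stock code digit by digit so that no digit is misheard."""
    return " ".join(DIGITS[int(ch)] for ch in code)


def confirmation_text(side: str, code: str, name: str,
                      shares: int, price: float) -> str:
    """Build the sentence to be broadcast before an order is placed."""
    return (f"{side} stock code {spell_code(code)}, name {name}, "
            f"{shares} shares, order price {price:.2f} yuan. Confirm?")


def place_order_if_confirmed(text: str, user_confirms) -> bool:
    """Submit only after the user has heard the text and confirmed it."""
    return bool(user_confirms(text))


msg = confirmation_text("Buy", "600001", "Pudong Development Bank", 1000, 16.0)
print(msg)
print(place_order_if_confirmed(msg, lambda heard: True))
```

The order is submitted only when the callback returns a true value, so a user who hears a wrong code or price can simply decline.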

The present invention has been described in detail above with reference to the drawings and embodiments, and those of ordinary skill in the art can make many variations based on the above description. Therefore, certain details of the embodiments should not be construed as limiting the invention, whose scope of protection is defined by the appended claims.

Claims

1. A voice broadcast system for intelligent software, characterized by comprising:

a model storage module, for building and storing a sound model;

2. The voice broadcast system for intelligent software according to claim 1, characterized in that the character information acquisition module is communicatively connected with an intelligent broadcast client, the intelligent broadcast client being a plug-in embedded in the intelligent software to collect the character information.

3. The voice broadcast system for intelligent software according to claim 1, characterized in that the text front-end processing module comprises:

4. The voice broadcast system for intelligent software according to claim 1, characterized in that the model storage module comprises:

5. The voice broadcast system for intelligent software according to claim 1, characterized in that the voice synthesis module comprises:

6. A voice broadcast method for intelligent software, characterized by comprising:

collecting character information in the intelligent software;

building and storing a sound model;

playing the voice file.

7. The voice broadcast method for intelligent software according to claim 6, characterized in that collecting the character information in the intelligent software comprises: embedding in the intelligent software an intelligent broadcast client for collecting the character information.

8. The voice broadcast method for intelligent software according to claim 6, characterized in that converting the collected character information into text information with a specific pronunciation comprises:

9. The voice broadcast method for intelligent software according to claim 6, characterized in that building and storing the sound model comprises:

storing the sound model.

10. The voice broadcast method for intelligent software according to claim 6, characterized in that calling the stored sound model, obtaining the acoustic parameters corresponding to the text information according to the sound model and decision tree prediction, performing voice synthesis on the acoustic parameters, and outputting the synthesized voice file comprises:

performing part-of-speech analysis and prosody prediction on the text information;