CN106507321A - Uyghur-Chinese bilingual GSM short message voice conversion and broadcasting system
- Publication number
- CN106507321A (application number CN201611044873.8A)
- Authority
- CN
- China
- Prior art keywords
- chinese
- voice
- note
- syllable
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
- H04W4/14—Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/18—Information format or content conversion, e.g. adaptation by the network of the transmitted or received information for the purpose of wireless delivery to users or terminals
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a Uyghur-Chinese bilingual GSM short message voice conversion and broadcasting system consisting of a software system and a hardware system. The software system comprises four parts: a short message receiving module, a text preprocessing module, a corpus construction module, and a waveform concatenation synthesis module. The hardware system uses an ARM processor with a Cortex-M3 core as the control unit and a SIM900A GSM module as the short message receiving and voice transmission unit. The invention converts Uyghur and Chinese short messages to speech and processes only messages from authorized phone numbers; an automatic callback mechanism and DTMF password verification ensure the legitimacy of the message sender. The system is secure, stable, and highly integrated. Working with a PC, it converts emergency short messages that pass the security authentication into speech and broadcasts them to a designated area, which improves flexibility while substantially reducing the deployment cost of emergency broadcasting.
Description
Technical field
The present invention relates to the technical field of information-to-speech conversion and broadcasting systems, and specifically to a Uyghur-Chinese bilingual GSM short message voice conversion and broadcasting system.
Background
Communication is one of the most valuable electronic information technologies to emerge in the twentieth century, and the short message service (SMS) was among the earliest major ways people communicated by mobile phone. Converting short messages to speech and broadcasting them has considerable practical value in daily life, because it provides an inexpensive and convenient communication service.
The short message voice broadcasting offered by some existing mobile phones does not meet users' needs well. Some solutions must connect to a server, which sends the synthesized speech back to the phone; this restricts where they can be used and adds cost. Others synthesize speech from syllables stored locally, and the resulting audio quality is very poor.
Uyghur-Chinese bilingual short message voice conversion and broadcasting has further problems of its own; for example, emergency short messages cannot be securely converted to speech and broadcast to a designated area. On the synthesis side, the two common approaches are parametric synthesis and waveform concatenation. Parametric synthesis depends heavily on speech parameter extraction, but research on speech production models is not yet mature, so the clarity and naturalness of parametric synthesis have not reached a practical level. Waveform concatenation, by contrast, splices recorded speech waveforms and outputs the combined speech data. It replaces parameters with natural speech waveforms taken from words, phrases, and sentences, which carry the natural prosody of the original speech, so the synthesized speech is clear and natural and its quality is generally higher than that of parametric synthesis. However, if waveforms are simply spliced directly, two problems appear at the splice points: first, an audible harsh noise occurs at the splice point; second, if the two sounds differ in pitch, the pitch of the spliced utterance jumps abruptly between high and low.
In view of these problems, a more thorough and refined design and development of an application platform for a Uyghur-Chinese bilingual short message voice conversion and broadcasting system would have very broad practical value.
Summary of the invention
The technical problem solved by the invention is to provide a low-power, low-cost Uyghur-Chinese bilingual GSM short message voice conversion and broadcasting system that addresses the inability to securely convert emergency short messages to speech and broadcast them to a designated area.
The technical solution of the invention is a Uyghur-Chinese bilingual GSM short message voice conversion and broadcasting system consisting of a software system and a hardware system.
The software system comprises four parts: a short message receiving module, a text preprocessing module, a corpus construction module, and a waveform concatenation synthesis module.
Short message receiving module: responsible for receiving short messages and obtaining the newest message text in real time. When a short message is received, the system calls the sender back, decodes the DTMF tones of the sender's keypad input with an MT8870 chip to obtain the entered password, and verifies it; the terminal then controls real-time processing of the pushed information. This implements information reception based on SMS push. Short message reception is an open system in which the message source is not restricted, so in a real deployment the system will inevitably receive short messages that are not push requests. A verification step is therefore added so that push requests are authenticated at the receiving terminal: the system calls back, waits for the user to enter a password, and verifies it. The main difficulty in this process is obtaining the password that the caller enters on the dial keypad during the callback.
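The keypad-capture step can be illustrated with a minimal sketch. It assumes the controller reads the MT8870's 4-bit output once per detected tone and that the code-to-key mapping follows the usual MT8870 decode table; the `decode_password` helper and the use of `#` as an end-of-entry key are hypothetical and are not stated in the patent.

```python
# Assumed MT8870 4-bit output coding (Q4..Q1) for each DTMF key.
MT8870_KEYS = {
    0b0001: "1", 0b0010: "2", 0b0011: "3", 0b0100: "4",
    0b0101: "5", 0b0110: "6", 0b0111: "7", 0b1000: "8",
    0b1001: "9", 0b1010: "0", 0b1011: "*", 0b1100: "#",
    0b1101: "A", 0b1110: "B", 0b1111: "C", 0b0000: "D",
}


def decode_password(nibbles, terminator="#"):
    """Turn a sequence of 4-bit decoder outputs into the typed key string.

    Collection stops at the terminator key (a hypothetical convention).
    """
    keys = []
    for value in nibbles:
        key = MT8870_KEYS[value & 0x0F]  # keep only the low four bits
        if key == terminator:
            break
        keys.append(key)
    return "".join(keys)


if __name__ == "__main__":
    # Caller presses 1, 2, 3, 4 and then '#'.
    print(decode_password([0b0001, 0b0010, 0b0011, 0b0100, 0b1100]))
```

The decoded string would then be compared with the stored push password before the message is processed further.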
Text preprocessing module: first performs language identification to distinguish Chinese text from Uyghur text; then normalizes each language's text according to its own normalization rules; then segments Chinese into words using an existing dictionary and splits Uyghur into syllables using an existing syllable inventory, yielding the pronunciation units of the text.
Corpus construction module: builds the Chinese word and single-character corpus and the Uyghur syllable corpus.
Waveform concatenation synthesis module: for Chinese, selects and splices the audio files corresponding to words; for Uyghur, selects and splices the audio files corresponding to syllables.
The hardware system is as follows. An ARM processor with a Cortex-M3 core serves as the control unit. A SIM900A GSM module serves as the short message receiving and voice transmission unit. A VS1003 audio decoder chip decodes MP3 audio files. An SD card is driven over SDIO and holds a FAT32 file system that stores the configuration file and the audio files. A 3-inch 400x240 TFT color LCD displays each running state of the system. An MX3232 serves as the driver chip of the RS232 interface circuit and provides the communication link between the system and a PC. An RTC clock circuit provides an accurate real-time clock. An AMS1117 linear regulator steps the power adapter voltage down to the operating voltage of the ARM processor and the other chips. The embedded short message broadcasting system can receive short messages from designated phones, actively call back the authorized phone number, and recognize the authorized phone's password by dual-tone multi-frequency (DTMF) signaling; once verification passes, the short message to be announced is sent to the PC for speech synthesis and broadcasting. The SIM900A is a compact, feature-rich GSM/GPRS wireless communication module in an SMT package, which makes it easy for customers to design with flexibly.
Further, in the above scheme, the text preprocessing module identifies the language from the position of each character in Unicode, distinguishing Chinese from Uyghur content, and then applies rule-based normalization to the Chinese and Uyghur parts from an engineering standpoint. Chinese is segmented into words with the forward maximum matching algorithm using an existing dictionary, and Uyghur is likewise split with forward maximum matching using an existing syllable inventory. Language identification covers both spoken language identification, which distinguishes languages from audio files, and text language identification, which distinguishes languages from text.
Encoding identification is a precondition of language identification. The invention uses UTF-8 uniformly, so the encoding identification step is unnecessary and the work concentrates on identifying the minority language, which is done by regular-expression matching on the specific positions of each language's characters in Unicode.
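Matching on Unicode positions can be sketched as follows. The code-point ranges are assumptions chosen for illustration (the CJK Unified Ideographs block for Chinese; the Arabic block and the Arabic presentation forms for Uyghur); the patent does not list the exact ranges it uses, and `split_by_language` is a hypothetical helper name.

```python
import re

# Assumed ranges, not taken from the patent.
ZH = r"[\u4e00-\u9fff]"                              # CJK Unified Ideographs
UG = r"[\u0600-\u06ff\ufb50-\ufdff\ufe70-\ufeff]"    # Arabic-script blocks

# One alternation that labels every run of text as zh, ug, or other.
TOKEN = re.compile(
    f"(?P<zh>{ZH}+)|(?P<ug>{UG}+)|(?P<other>(?:(?!{ZH}|{UG}).)+)",
    re.S,
)


def split_by_language(text):
    """Return (label, run) pairs in reading order."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]


if __name__ == "__main__":
    # Chinese greeting, a space, then an Arabic-script word given as escapes.
    sample = "\u4f60\u597d \u0633\u0627\u0644\u0627\u0645"
    for label, run in split_by_language(sample):
        print(label, repr(run))
```

Each labelled run can then be passed to the normalization rules and segmenter for its own language.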
In real text, often include substantial amounts of non-standard word character string, such as " in January, 2016 ", therein
" 2016 " and " 1 ";" ten thousand metres ", " 10000 " therein;" 1000 " therein;" 15 " therein.These non-standard alphabetic characters, mainly have Arabic numerals, English character, various symbols
Number composition.During phonetic synthesis, the date of these non-standard alphabetic character compositions, phone, numerical value etc. are needed to carry out spy
Different process, its process is exactly text regularization.
By taking the regularization of Chinese as an example.In order to distinguish the pronunciation rule of numeral, first, the pronunciation type of numeral is set, when
For phone type when, i.e., " 101 " reading " the one 0 one " when, then it is assumed that numeric type is P;When for character string type when, i.e., " 145 " read
When " one four five ", then it is assumed that numeric type is S;When for numeric type when, i.e., " 165 " reading " 165 " when, then it is assumed that number
Word type is N.With this form express time " on January 1st, 2016 " and when, then be denoted as " S S days month S ";Represent " 1,000,000 " when,
Then be denoted as " N ten thousand ";Represent " 2016/1/1 ", be then denoted as " S/S/S ";When representing " 010-25124585 ", then it is denoted as " P-P ".Logical
Dynamic construction primitive formula is crossed, so as to complete the regularization of numeral.The pronunciation of special symbol is considered, in order to send out symbol
Sound is expressed, and is proposed a kind of phonetic representation formula, is exactly directly to be described the pronunciation of special symbol.Such as " 12.25% ",
It is " N.N% " with above primitive formula, its phonetic representation is " percent N.N ".By dynamic construction phonetic representation formula, so as to
Complete the regularization of special symbol.
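The P, S, and N reading types and the primitive formulas can be sketched as a small rule table. This is an illustration under stated assumptions, not the patent's implementation: the three rules, the helper names, the limit of N to values below 10000, and reading "1" as "幺" in telephone style are all choices made here for the example.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_string(num):
    """Type S: read digit by digit."""
    return "".join(DIGITS[int(d)] for d in num)


def read_phone(num):
    """Type P: digit by digit, with 1 read as 幺 (assumed convention)."""
    return read_string(num).replace("一", "幺")


def read_number(num):
    """Type N: read as a value; this sketch covers 0..9999 only."""
    s = str(int(num))
    if len(s) > 4:
        raise ValueError("sketch handles at most four digits")
    if s == "0":
        return "零"
    units = ["", "十", "百", "千"]
    out, zero_pending = "", False
    for i, ch in enumerate(s):
        d = int(ch)
        if d == 0:
            zero_pending = True          # emit one 零 before the next digit
            continue
        if zero_pending and out:
            out += "零"
        zero_pending = False
        out += DIGITS[d] + units[len(s) - 1 - i]
    return out[1:] if out.startswith("一十") else out


READERS = {"S": read_string, "P": read_phone, "N": read_number}

# (pattern, reading type per group, spoken template) - illustrative rules.
RULES = [
    (re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日"), "SSS", "{}年{}月{}日"),
    (re.compile(r"(\d+)-(\d+)"), "PP", "{}{}"),
    (re.compile(r"(\d{1,4})万"), "N", "{}万"),
]


def regularize(text):
    """Replace each matched non-standard string with its spoken form."""
    for pattern, types, template in RULES:
        def spoken(m, types=types, template=template):
            parts = [READERS[t](g) for t, g in zip(types, m.groups())]
            return template.format(*parts)
        text = pattern.sub(spoken, text)
    return text


if __name__ == "__main__":
    for sample in ("2016年1月1日", "010-25124585", "100万"):
        print(sample, "->", regularize(sample))
```

Keeping the rules in a table makes it straightforward to add further primitive formulas without touching the reading functions.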
Chinese speech synthesis here is based on a large-scale corpus. The advantage of large-corpus waveform concatenation is that it preserves the linguistic characteristics of the original speech as far as possible; for out-of-vocabulary words, the syllable is used as the basic concatenation unit. By selecting the longest possible words and phrases from the corpus, the number of splice points in a synthesized sentence is reduced. This has two benefits. First, a large share of prosodic words and prosodic phrases use the original recordings in the corpus, which keeps their naturalness, and fewer splice points help the overall naturalness of the synthesized speech. Second, the synthesis of out-of-vocabulary words is handled well.
Further, in the above scheme, building a dedicated Uyghur syllable corpus means that the pronunciation rules inside a syllable need not be modeled during synthesis, which makes the speech within each syllable more natural. Including longer joined-syllable units also avoids, to some extent, artifacts from joining syllables directly and makes the transitions between syllables more natural. The Uyghur corpus contains about 6000 entries: just over 2000 common syllables, with the remainder being the longer joined-syllable units. The raw speech in the corpus is about 0.72 GB.
Further, in the above scheme, the Chinese word and single-character corpus is built by collecting Internet dictionaries, association-analysis dictionaries, input-method dictionaries, and popular general-purpose lexicons from the web, then merging and deduplicating them into a single dictionary with no repeated entries. The result is more than 50,000 entries, mostly Chinese words plus a moderate number of longer phrases, and more than 7000 single characters. Each dictionary entry has one corresponding recording. The recordings use a female voice, a sampling rate of 8000 Hz, and 16-bit quantization, and are stored in WAV format, about 1.2 GB in total.
Further, in the above scheme, the corpus construction module addresses the speech boundary problem in the corpus recordings by performing speech endpoint detection: a general-purpose endpoint detection technique is used to mark where speech begins and ends in each file.
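The patent does not name the general-purpose endpoint detection technique. A minimal sketch, assuming a short-time energy threshold (one common choice) and 20 ms frames at the corpus sampling rate of 8000 Hz, might look like this; `detect_endpoints`, `frame_len`, and `threshold_ratio` are hypothetical names and values.

```python
def detect_endpoints(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of the speech span, or None.

    A frame counts as speech when its mean energy reaches
    threshold_ratio times the loudest frame's energy (assumed rule).
    """
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    if not energies or max(energies) == 0:
        return None  # too short, or pure silence
    threshold = threshold_ratio * max(energies)
    voiced = [k for k, e in enumerate(energies) if e >= threshold]
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len


if __name__ == "__main__":
    # 40 ms of silence, 60 ms of signal, 40 ms of silence at 8000 Hz.
    clip = [0] * 320 + [1000] * 480 + [0] * 320
    print(detect_endpoints(clip))
```

The returned indices can be stored alongside each recording so that leading and trailing silence is skipped at splice time.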
Further, in the above scheme, analysis of the completed speech corpus showed that the recordings contain a considerable amount of non-speech data, so speech endpoint detection combining Mel-frequency cepstral coefficients (MFCC) with a kNN classifier was then applied to the speech data.
Further, in the above scheme, the waveform concatenation synthesis module applies a smoothing algorithm during splicing: the amplitude of each syllable or word waveform is shaped to fade in and fade out, which prevents the clicking noise caused by mismatch at the splice points. Strategies such as a prosody model and duration control are then used to optimize the synthesized speech.
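The fade-in and fade-out smoothing can be sketched on plain lists of PCM samples. The linear ramp and the 80-sample fade length (10 ms at 8000 Hz) are assumptions for illustration; the patent does not specify the ramp shape or length, and `fade` and `splice` are hypothetical helper names.

```python
def fade(samples, fade_len):
    """Apply a linear fade-in and fade-out to one unit's samples."""
    n = min(fade_len, len(samples) // 2)  # never let the ramps overlap
    out = list(samples)
    for i in range(n):
        gain = i / n                      # rises from 0 toward 1
        out[i] = int(out[i] * gain)
        out[-1 - i] = int(out[-1 - i] * gain)
    return out


def splice(units, fade_len=80):
    """Concatenate faded units so each joint starts and ends near zero."""
    result = []
    for unit in units:
        result.extend(fade(unit, fade_len))
    return result


if __name__ == "__main__":
    print(fade([1000] * 10, 4))
```

Because every unit begins and ends at zero amplitude, adjacent units meet without a step in the waveform, which is what removes the click.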
The prosody model works mainly by choosing among different syllable combinations to obtain more natural speech while keeping syllable groups intact as far as possible. Within a word, the largest matching syllable combination is looked up first; if a recording exists for two syllables pronounced together, that two-syllable recording is used. This largest-unit-first principle preserves the natural prosody of the speech.
For Chinese synthesis, splice points occur mainly between character and character, character and word, word and character, word and word, word and phrase, and sentence and sentence. The gaps fall into three classes: gaps between sentences, which mainly cover the clauses and full sentences delimited by commas, full stops, and the like; gaps between words, which cover word-word, word-phrase, and character-word boundaries; and gaps between characters. The gap between characters is shorter than the gap between words, which in turn is shorter than the gap between sentences. The gap for each kind of splice was obtained by repeated adjustment during splicing.
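The three gap classes can be sketched as silence inserted between units. The durations below are placeholders chosen only to respect the stated ordering (character gap shorter than word gap, shorter than sentence gap); they are not the values the patent arrived at, and `GAP_MS`, `silence`, and `join_with_gaps` are hypothetical names.

```python
SAMPLE_RATE = 8000  # Hz, the corpus sampling rate

# Placeholder durations in milliseconds, NOT taken from the patent.
GAP_MS = {"char": 20, "word": 60, "sentence": 300}


def silence(kind):
    """Return a run of zero samples for the given gap class."""
    return [0] * (SAMPLE_RATE * GAP_MS[kind] // 1000)


def join_with_gaps(units):
    """units is a list of (samples, gap_after), gap_after being a
    key of GAP_MS or None for the final unit."""
    out = []
    for samples, gap_after in units:
        out.extend(samples)
        if gap_after is not None:
            out.extend(silence(gap_after))
    return out


if __name__ == "__main__":
    audio = join_with_gaps([([1, 2], "char"), ([3, 4], "sentence"), ([5], None)])
    print(len(audio))
```

Tuning then amounts to editing the three table entries and listening to the result.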
The working method of the Uyghur-Chinese bilingual GSM short message voice conversion and broadcasting system is as follows. When the SIM900A receives a new short message, it sends an AT notification frame over the UART to the STM32 controller to indicate that a new message has arrived. The STM32 controller then sends a read command to the SIM900A, reads the unread message, and extracts the receive time, the sender's phone number, and the message content. It also reads the configuration file on the SD card and checks whether the sender's number is an authorized number; if it is not, the controller returns to the state of waiting for a new message. If it is, the controller uses AT commands to make the SIM900A call the sender back and prompt for a password. After answering the call, the sender follows the voice prompt and enters the push password for the message, and the STM32 controller judges whether the password is correct from the DTMF characters decoded by the SIM900A. If the password is verified, the STM32 controller sends the previously received message to the PC over the RS232 interface in a specific frame format, and the PC converts the message content to speech and broadcasts it.
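The decision logic of this working method can be sketched independently of the hardware. The `Sms` record, the `handle_sms` function, and the `call_back_and_collect_digits` callable are hypothetical stand-ins for the firmware's modem and DTMF routines; the real system runs on an STM32 controller, so this only models the control flow.

```python
from dataclasses import dataclass


@dataclass
class Sms:
    sender: str       # sender's phone number
    received_at: str  # receive time as reported by the modem
    text: str         # message content


def handle_sms(sms, authorized_numbers, expected_password,
               call_back_and_collect_digits):
    """Return the text to forward to the PC, or None if rejected."""
    # Step 1: only numbers listed in the configuration file are accepted.
    if sms.sender not in authorized_numbers:
        return None
    # Step 2: call the sender back and collect the DTMF digits typed.
    digits = call_back_and_collect_digits(sms.sender)
    # Step 3: forward the message only if the password matches.
    if digits != expected_password:
        return None
    return sms.text


if __name__ == "__main__":
    msg = Sms("13800000000", "16/11/24,10:00:00", "test broadcast")
    result = handle_sms(msg, {"13800000000"}, "1234", lambda number: "1234")
    print(result)
```

Passing the callback in as a parameter keeps the authorization logic testable without a modem attached.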
The beneficial effects of the invention are as follows. By implementing short message reception, text preprocessing, corpus construction, and waveform concatenation, the invention converts Uyghur and Chinese short messages to speech. Only messages from authorized phone numbers are processed, and the automatic callback and DTMF password verification ensure the legitimacy of the message sender. The system is secure, stable, and highly integrated; working with a PC, it converts emergency short messages that pass the security authentication into speech and broadcasts them to a designated area, improving flexibility while substantially reducing the deployment cost of emergency broadcasting.
Brief description of the drawings
Fig. 1 is a flow chart of short message data reception;
Fig. 2 is a flow chart of push verification;
Fig. 3 is a flow chart of the forward maximum matching algorithm;
Fig. 4 is a schematic diagram of speech endpoint detection;
Fig. 5 is a functional block diagram of the hardware system;
Fig. 6 is a schematic diagram of the housing structure of the conversion and broadcasting system of the invention.
Reference numerals: 1 - base plate; 2 - PCB protection plate; 3 - short side plate without interfaces; 4 - copper pillar one; 5 - PCB; 6 - long side plate one; 7 - display screen; 8 - copper pillar two; 9 - screen protection plate; 10 - signal transmitting plate; 11 - RS232 connector; 12 - panel; 13 - short side plate with interfaces; 14 - RS232 connector locking plate; 15 - long side plate two; 16 - SD memory card.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings.
A Uyghur-Chinese bilingual GSM short message voice conversion and broadcasting system consists of a software system and a hardware system.
The software system comprises four parts: a short message receiving module, a text preprocessing module, a corpus construction module, and a waveform concatenation synthesis module.
Short message receiving module: responsible for receiving short messages (the reception flow is shown in Fig. 1) and obtaining the newest message text in real time. When a short message is received, the system calls the sender back, decodes the DTMF tones of the sender's keypad input with an MT8870 chip to obtain the entered password, and verifies it (the verification flow is shown in Fig. 2); the terminal then controls real-time processing of the pushed information. This implements information reception based on SMS push. Short message reception is an open system in which the message source is not restricted, so in a real deployment the system will inevitably receive short messages that are not push requests. A verification step is therefore added so that push requests are authenticated at the receiving terminal: the system calls back, waits for the user to enter a password, and verifies it. The main difficulty in this process is obtaining the password that the caller enters on the dial keypad during the callback.
Text preprocessing module: first performs language identification to distinguish Chinese text from Uyghur text; then normalizes each language's text according to its own normalization rules; then segments Chinese into words using an existing dictionary and splits Uyghur into syllables using an existing syllable inventory, yielding the pronunciation units of the text.
Corpus construction module: builds the Chinese word and single-character corpus and the Uyghur syllable corpus.
Waveform concatenation synthesis module: for Chinese, selects and splices the audio files corresponding to words; for Uyghur, selects and splices the audio files corresponding to syllables.
The preprocessing module identifies the language from the position of each character in Unicode, distinguishing Chinese from Uyghur content, and then applies rule-based normalization to the Chinese and Uyghur text from an engineering standpoint. Chinese is segmented into words with the forward maximum matching algorithm (Fig. 3) using an existing dictionary, and Uyghur is likewise split with forward maximum matching (Fig. 3) using an existing syllable inventory. Language identification covers both spoken language identification, which distinguishes languages from audio files, and text language identification, which distinguishes languages from text.
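Forward maximum matching can be sketched in a few lines. The lexicon contents and the fallback of emitting a single character when nothing matches are assumptions for illustration; the patent's dictionary and syllable inventory are not reproduced here, and `forward_max_match` is a hypothetical name.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Split text left to right, always taking the longest lexicon entry.

    An unmatched position falls back to a single character so the
    scan always advances.
    """
    if max_len is None:
        max_len = max((len(entry) for entry in lexicon), default=1)
    max_len = max(1, max_len)
    units = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink by one each time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                units.append(candidate)
                i += size
                break
    return units


if __name__ == "__main__":
    words = {"紧急", "广播", "紧急广播", "系统"}
    print(forward_max_match("紧急广播系统", words))
    syllables = {"mek", "tep"}          # Latin-script stand-in syllables
    print(forward_max_match("mektep", syllables))
```

The same function serves both languages: it is called with the word dictionary for Chinese and with the syllable inventory for Uyghur.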
Encoding identification is a precondition of language identification. The invention uses UTF-8 uniformly, so the encoding identification step is unnecessary and the work concentrates on identifying the minority language, which is done by regular-expression matching on the specific positions of each language's characters in Unicode.
Chinese speech synthesis here is based on a large-scale corpus. The advantage of large-corpus waveform concatenation is that it preserves the linguistic characteristics of the original speech as far as possible; for out-of-vocabulary words, the syllable is used as the basic concatenation unit. By selecting the longest possible words and phrases from the corpus, the number of splice points in a synthesized sentence is reduced. This has two benefits. First, a large share of prosodic words and prosodic phrases use the original recordings in the corpus, which keeps their naturalness, and fewer splice points help the overall naturalness of the synthesized speech. Second, the synthesis of out-of-vocabulary words is handled well.
Building a dedicated Uyghur syllable corpus means that the pronunciation rules inside a syllable need not be modeled during synthesis, which makes the speech within each syllable more natural. Including longer joined-syllable units also avoids, to some extent, artifacts from joining syllables directly and makes the transitions between syllables more natural. The Uyghur corpus contains about 6000 entries: just over 2000 common syllables, with the remainder being the longer joined-syllable units. The raw speech in the corpus is about 0.72 GB.
The Chinese word and single-character corpus is built by collecting Internet dictionaries, association-analysis dictionaries, input-method dictionaries, and popular general-purpose lexicons from the web, then merging and deduplicating them into a single dictionary with no repeated entries. The result is more than 50,000 entries, mostly Chinese words plus a moderate number of longer phrases, and more than 7000 single characters. Each dictionary entry has one corresponding recording. The recordings use a female voice, a sampling rate of 8000 Hz, and 16-bit quantization, and are stored in WAV format, about 1.2 GB in total.
The corpus construction module addresses the speech boundary problem in the corpus recordings by performing speech endpoint detection: a general-purpose endpoint detection technique is used to mark where speech begins and ends in each file. Analysis of the completed speech corpus showed that the recordings contain a considerable amount of non-speech data, so speech endpoint detection combining Mel-frequency cepstral coefficients (MFCC) with a kNN classifier was then applied to the speech data (the cleaning principle is shown in Fig. 4).
During splicing, the waveform concatenation synthesis module applies a smoothing algorithm: the amplitude of each syllable or word waveform is shaped to fade in and fade out, which prevents the clicking noise caused by mismatch at the splice points. Strategies such as a prosody model and duration control are then used to optimize the synthesized speech.
The prosody model works mainly by choosing among different syllable combinations to obtain more natural speech while keeping syllable groups intact as far as possible. Within a word, the largest matching syllable combination is looked up first; if a recording exists for two syllables pronounced together, that two-syllable recording is used. This largest-unit-first principle preserves the natural prosody of the speech.
For Chinese synthesis, splice points occur mainly between character and character, character and word, word and character, word and word, word and phrase, and sentence and sentence. The gaps fall into three classes: gaps between sentences, which mainly cover the clauses and full sentences delimited by commas, full stops, and the like; gaps between words, which cover word-word, word-phrase, and character-word boundaries; and gaps between characters. The gap between characters is shorter than the gap between words, which in turn is shorter than the gap between sentences. The gap for each kind of splice was obtained by repeated adjustment during splicing and was finally set as in Table 1.
Table 1: Gap settings
The hardware system (functional block diagram shown in Fig. 5) uses an ARM processor with a Cortex-M3 core as the control unit and a SIM900A GSM module as the SMS reception and voice transmission unit. A VS1003 audio decoder chip decodes the MP3 audio files. An SD card is driven over SDIO and carries a FAT32 file system that stores the configuration files and audio files. A 3-inch 400×240 TFT color LCD displays the running states of the system. An MX3232 serves as the driver chip of the RS232 interface circuit that connects the system to the PC. An RTC clock circuit provides the system with an accurate real-time clock, and an AMS1117 linear voltage regulator steps the voltage of the power adapter down to the operating voltage of the ARM processor and the other chips. The embedded SMS broadcast system can receive SMS messages from designated mobile phones and can actively call back the authorized mobile phone number. It identifies the password of the authorized phone by dual-tone multi-frequency (DTMF) recognition and, once verification passes, sends the SMS to be announced to the PC for speech synthesis and broadcast. The SIM900A is a compact yet powerful GSM/GPRS wireless communication module in an SMT package, which makes it easy for customers to integrate into flexible designs.
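To make the DTMF step concrete, the sketch below shows how a key can be recovered in software from its two tones using the Goertzel recurrence. The row and column frequencies are the standard DTMF assignments. Everything else, including the 8000 Hz sampling rate, the 50 ms tone, and the function names, is an assumption for illustration; in the described system the decoding is done in hardware, not by this code.

```python
import math

LOW = (697, 770, 852, 941)         # row frequencies in Hz
HIGH = (1209, 1336, 1477, 1633)    # column frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate):
    """Signal power at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, rate=8000):
    """Pick the strongest row tone and column tone and map them to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW[i], rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH[i], rate))
    return KEYS[row][col]


def make_tone(key, rate=8000, duration=0.05):
    """Synthesize the two-tone signal for `key` (test helper)."""
    row = next(r for r, line in enumerate(KEYS) if key in line)
    f_low, f_high = LOW[row], HIGH[KEYS[row].index(key)]
    count = int(rate * duration)
    return [math.sin(2 * math.pi * f_low * t / rate)
            + math.sin(2 * math.pi * f_high * t / rate) for t in range(count)]


password = "".join(detect_key(make_tone(k)) for k in "1234")
```

A password check then reduces to comparing the decoded digit string with the stored one.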
The housing of the conversion and broadcast system is shown schematically in Fig. 6. It comprises a base plate 1, a PCB protection plate 2, a short side plate without interfaces 3, first copper pillars 4, a PCB board 5, a first long side plate 6, a display screen 7, second copper pillars 8, a screen protection plate 9, a signal emitting plate 10, an RS232 connector 11, a panel 12, a short side plate with interfaces 13, an RS232 connector lock plate 14, a second long side plate 15, and an SD memory card 16. The base plate 1, the panel 12, the short side plate without interfaces 3, the short side plate with interfaces 13, the first long side plate 6, and the second long side plate 15 form the six faces that enclose a cuboid. The PCB protection plate 2, the PCB board 5, the display screen 7, the signal emitting plate 10, the RS232 connector 11, and the SD memory card 16 are all located inside the cuboid. The PCB protection plate 2 lies on the inner side of the base plate 1, and the signal emitting plate 10 lies on the surface of the PCB protection plate 2. The PCB board 5 is mounted on the signal emitting plate 10 by the first copper pillars 4, and the display screen 7 is mounted on the PCB board 5. There are four second copper pillars 8, one at each corner of the signal emitting plate 10, with one end fixed to the signal emitting plate 10 and the other end bearing against the panel 12. The connector lock plate 14 is located on the short side plate with interfaces 13, and the screen protection plate 9 is set into the panel 12. One end of the RS232 connector 11 is at the side of the PCB board 5 and the other end is at the short side plate with interfaces 13. The SD memory card 16 is located on the signal emitting plate 10.
The working method of the Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system is as follows. When the SIM900A receives a new SMS, it sends a frame of AT-command data to the STM32 controller over UART to indicate that a new message is waiting. The STM32 controller then sends an SMS read command to the SIM900A, reads the unread message, and extracts information such as the time of receipt, the sender's mobile phone number, and the message content. At the same time it reads the configuration file on the SD card and checks whether the sender's number is an authorized number; if it is not, the system returns to the state of waiting for a new message. Next, the controller uses AT commands to make the SIM900A call back the sender of the SMS and prompt for a password. After answering the call, the sender enters the SMS push password as instructed by the voice prompt, and the STM32 controller judges whether the password is correct from the DTMF characters parsed by the SIM900A. If the password is verified, the STM32 controller sends the previously received SMS to the PC over the RS232 communication interface in a specific frame format, and the PC converts the message content into speech and broadcasts it.
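The control flow above can be sketched as a small decision function. The patent does not name the AT commands it uses; the sketch assumes the standard GSM commands `+CMTI` (new message indication), `AT+CMGR` (read message), and `ATD` (dial). The function names, the action strings `WAIT` and `SEND_TO_PC`, and the behavior on a wrong password are hypothetical.

```python
import re


def parse_new_sms_indication(line):
    """Extract the storage index from a '+CMTI: "SM",<index>' notification."""
    match = re.match(r'\+CMTI:\s*"(\w+)",\s*(\d+)', line.strip())
    return int(match.group(2)) if match else None


def handle_sms(indication, sender, authorized, entered_password, expected_password):
    """Return the ordered list of actions the controller would take."""
    index = parse_new_sms_indication(indication)
    if index is None:
        return ["WAIT"]                       # not a new-message notification
    actions = [f"AT+CMGR={index}"]            # read the unread message
    if sender not in authorized:
        return actions + ["WAIT"]             # unauthorized number: wait again
    actions.append(f"ATD{sender};")           # call the sender back (voice call)
    if entered_password != expected_password:
        return actions + ["WAIT"]             # assumption: wrong password ends the attempt
    return actions + ["SEND_TO_PC"]           # forward the SMS over RS232


authorized_numbers = {"13800000000"}
ok = handle_sms('+CMTI: "SM",3', "13800000000", authorized_numbers, "1234", "1234")
rejected = handle_sms('+CMTI: "SM",4', "13900000000", authorized_numbers, "1234", "1234")
```

In firmware the sender number would come from the `AT+CMGR` response and the entered password from the DTMF decoder; both are passed in directly here to keep the sketch self-contained.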
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. A Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system, characterized in that it is divided into two components, a software system and a hardware system.
The software system is functionally divided into four modules: an SMS receiving module, a text preprocessing module, a corpus construction module, and a waveform concatenation synthesis module;
SMS receiving module: responsible for receiving SMS messages and obtaining the latest SMS text in real time; when an SMS is received, a callback is made, the keypad input is DTMF-decoded by an MT8870 chip to obtain the entered password, the password is verified, and the pushed information is then processed in real time through terminal control;
Text preprocessing module: first performs language identification to distinguish Chinese text from Uyghur text; then, based on the normalization rules of Chinese and Uyghur, regularizes the Chinese and Uyghur text separately; segments the Chinese text into words according to a dictionary and splits the Uyghur text according to an existing syllable inventory, yielding the pronunciation units of the text;
Corpus construction module: builds a corpus of Chinese words and single characters, and builds a corpus of Uyghur syllables;
Waveform concatenation synthesis module: for Chinese, selects the audio files corresponding to words and splices them; for Uyghur, selects the audio files corresponding to syllables and splices them;
The hardware system is as follows: an ARM processor with a Cortex-M3 core is used as the control unit; a SIM900A GSM module is used as the SMS reception and voice transmission unit; a VS1003 audio decoder chip decodes the MP3 audio files; an SD card is driven over SDIO and carries a FAT32 file system that stores the configuration files and audio files; a 3-inch 400×240 TFT color LCD displays the running states of the system; an MX3232 is used as the driver chip of the RS232 interface circuit that connects the system to the PC; an RTC clock circuit provides the system with an accurate real-time clock; an AMS1117 linear voltage regulator steps the voltage of the power adapter down to the operating voltage of the ARM processor and the other chips; the embedded SMS broadcast system can receive SMS messages from designated mobile phones and can actively call back the authorized mobile phone number, identifies the password of the authorized phone by dual-tone multi-frequency (DTMF) recognition, and, once verification passes, sends the SMS to be announced to the PC for speech synthesis and broadcast.
2. The text preprocessing module according to claim 1, characterized in that the text preprocessing module identifies the language from the position of each character in Unicode, distinguishing Chinese information from Uyghur information; from an engineering and technical standpoint, it regularizes Chinese and Uyghur text on the basis of rules; it segments Chinese text against an existing dictionary using the forward maximum matching algorithm, and likewise splits Uyghur text against an existing syllable inventory using the forward maximum matching algorithm; language identification includes spoken language identification, that is, distinguishing the language from a speech file, and text language identification, that is, distinguishing the language from text.
3. The Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system according to claim 1, characterized in that the constructed Uyghur syllable corpus, in speech synthesis, can mask the pronunciation rules inside a syllable and make the speech inside a syllable more natural; by including long syllables, direct co-articulation between syllables can be avoided to a certain extent, which increases the naturalness of the speech between syllables.
4. The Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system according to claim 1, characterized in that the corpus of Chinese words and single characters is built by collecting input-method lexicons from Internet lexicons and association-analysis lexicons, together with popular comprehensive lexicons available online, and by deduplicating and merging all of the lexicons to obtain a lexicon without repeated entries.
5. The Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system according to claim 1, characterized in that the corpus construction module, to address the speech boundary problem present in the speech files of the corpus, performs speech endpoint detection and labels the speech endpoints.
6. The Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system according to claim 1, characterized in that the corpus construction module performs speech endpoint detection on the speech data by fusing mel-frequency cepstral coefficients (Mel-scale Frequency Cepstral Coefficients) with a kNN classification algorithm.
7. The Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system according to claim 1, characterized in that, during splicing, the waveform concatenation synthesis module applies a smoothing algorithm that shapes the amplitude of each syllable or word audio waveform into a fade-in and fade-out, and then optimizes the synthesized speech with strategies such as a prosody model and duration control.
8. The Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system according to claims 1 to 7, characterized in that the working method is: when the SIM900A receives a new SMS, it sends a frame of AT-command data to the STM32 controller over UART to indicate that a new message is waiting; the STM32 controller then sends an SMS read command to the SIM900A, reads the unread message, and extracts information such as the time of receipt, the sender's mobile phone number, and the message content, while reading the configuration file on the SD card to check whether the sender's number is an authorized number, and otherwise returning to the state of waiting for a new message; next, the controller uses AT commands to make the SIM900A call back the sender of the SMS and prompt for a password; after answering the call, the sender enters the SMS push password as instructed by the voice prompt, and the STM32 controller judges whether the password is correct from the DTMF characters parsed by the SIM900A; if the password is verified, the STM32 controller sends the previously received SMS to the PC over the RS232 communication interface in a specific frame format, and the PC converts the message content into speech and broadcasts it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611044873.8A CN106507321A (en) | 2016-11-22 | 2016-11-22 | Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106507321A (en) | 2017-03-15 |
Family ID: 58328489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611044873.8A Pending CN106507321A (en) | 2016-11-22 | 2016-11-22 | Uyghur-Chinese bilingual GSM short message voice conversion and broadcast system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106507321A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101082908A (en) * | 2007-06-26 | 2007-12-05 | 腾讯科技(深圳)有限公司 | Method and system for dividing Chinese sentences |
CN103165126A (en) * | 2011-12-15 | 2013-06-19 | 无锡中星微电子有限公司 | Method for voice playing of mobile phone text short messages |
CN103164396A (en) * | 2011-12-19 | 2013-06-19 | 新疆新能信息通信有限责任公司 | Chinese-Uygur language-Kazakh-Kirgiz language electronic dictionary and automatic translating Chinese-Uygur language-Kazakh-Kirgiz language method thereof |
CN103164398A (en) * | 2011-12-19 | 2013-06-19 | 新疆新能信息通信有限责任公司 | Chinese-Uygur language electronic dictionary and automatic translating Chinese-Uygur language method thereof |
Non-Patent Citations (1)
Title |
---|
白涛等: "基于词典和全切分的中文农业网页分词算法的研究", 《新疆农业大学学报》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107680579A (en) * | 2017-09-29 | 2018-02-09 | 百度在线网络技术(北京)有限公司 | Text regularization model training method and device, text regularization method and device |
CN107680579B (en) * | 2017-09-29 | 2020-08-14 | 百度在线网络技术(北京)有限公司 | Text regularization model training method and device, and text regularization method and device |
CN109031474A (en) * | 2018-08-31 | 2018-12-18 | 成都润联科技开发有限公司 | A kind of weather information hiding Chinese phonetic broadcasting terminals and its working method based on Beidou satellite communication |
WO2021134591A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市优必选科技股份有限公司 | Speech synthesis method, speech synthesis apparatus, smart terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170315 |