CN2424513Y

CN2424513Y - Multifunctional speech identifying notebook and large capacity digital recording integrated machine

Info

Publication number: CN2424513Y
Application number: CN00233477U
Authority: CN
Inventors: 刘加; 刘润生; 薛晓光
Original assignee: ANKE'ER COMMUNICATION TECHN Co Ltd BEIJING; Tsinghua University
Current assignee: ANKE'ER COMMUNICATION TECHN Co Ltd BEIJING; Tsinghua University
Priority date: 2000-05-22
Filing date: 2000-05-22
Publication date: 2001-03-21
Anticipated expiration: 2010-05-22

Abstract

The utility model belongs to the field of speech technology. It comprises a digital signal processor, a speech sampling codec, a microcontroller, flash memory, a power supply manager, a microphone and a loudspeaker, and the digital signal processor stores speech processing and management programs in firmware. The utility model provides speaker-dependent speech recognition, voice prompts, speech playback, high-quality digital speech recording of up to 8 hours, a calculator, timed wake-up and other functions. Large amounts of business information can be stored by keypad entry or by voice, and the device can also make long recordings of meetings, conversations or classroom teaching. Stored information can be conveniently retrieved through speech recognition.

Description

Multifunction speech-recognition notepad and large-capacity digital recorder all-in-one

The present utility model belongs to the field of voice technology, and relates in particular to the design of a multifunction voice notepad or voice personal digital assistant that uses a single-chip digital signal processor or microcontroller to perform speech recognition, speech synthesis, and speech encoding and decoding.

Voice technologies, including speech coding, speech synthesis and especially speech recognition, matured progressively during the 1990s, and voice products began to appear on the market. Many such products are now available worldwide, particularly in the American and European markets, for example the voice memo pad produced by Toshiba (U.S.), digital voice recorders produced by Toshiba Corp. and by Samsung of Korea, and some mobile handsets with speech recognition. These digital voice recorders and voice memo pads generally comprise a dedicated speech recognition or speech compression chip, A/D and D/A converters, a microcontroller, external memory, a microphone and a loudspeaker, with the speech recognition, speech synthesis or speech coding programs residing in the dedicated chip. In products of this class, voice memo pads usually have no digital recording function, or can record only a very short clip (1 to 2 minutes). Moreover, the speech recognition they use performs poorly on Chinese: the recognition rate for easily confused names with similar pronunciations, such as "Li Ning", "Li Ping" and "Li Xing", is very low, and performance on the spoken Chinese digits "0, 1, 2, 3, 4, 5, 6, 7, 8, 9" is even worse. Some domestic companies have also undertaken voice product development, such as the speech electronic notepad produced by an electronics technology company in Jiangmen, Guangdong, but both its speech recognition performance and its compressed speech quality are unsatisfactory. It also offers no time-based search of recorded digital speech, which makes it inconvenient to use. In addition, the voice functions of these products are often separate from one another rather than integrated on a single chip.

The purpose of this utility model is to overcome the shortcomings of the prior art by storing the applicant's own speaker-dependent speech recognition, speech compression coding and speech synthesis programs together on a single digital signal processing chip. The device can recognize easily confused Chinese speech as well as spoken Chinese digits, with a recognition rate of 99%, which is sufficient for practical use. It also offers high integration, many functions, good overall performance, low cost, small size, light weight and low power consumption.

The multifunction speech-recognition notepad and large-capacity digital recorder all-in-one proposed by this utility model (hereinafter the "voice memo pad") comprises a digital signal processor, a speech sampling codec, a microcontroller, flash memory, a power manager, a microphone and a loudspeaker. It is characterized in that: the digital signal processor is connected to the sampling codec and the microcontroller by data lines and control lines, to the flash memory by data lines, address lines and control lines, and to the power manager by control lines; the sampling codec is connected to the microphone, the loudspeaker and the signal processor, and performs speech acquisition together with analog-to-digital and digital-to-analog conversion; the microcontroller is connected to the signal processor and the display control circuit by control lines and data lines, and manages the keyboard, the display and commands; and the digital signal processor stores in firmware the speech recognition, speech synthesis and speech coding programs, the system control program, and a management program for time-based retrieval of stored speech.

The flash memory may consist of two flash chips: one stores the compressed speech data, and the other stores the speech recognition codebooks and any speech processing programs that must be held in external memory. The power manager is connected to each of the two flash chips by control lines and performs voltage conversion and power saving.

The microcontroller may also store in firmware a program implementing calculator and counting functions.

The utility model uses a general-purpose device whose firmware holds the speech processing and management programs developed by the applicant, combining the voice memo pad and large-capacity digital recording in one unit. The resulting all-in-one multifunction device has the following features:

1. Speaker-dependent recognition of the spoken Chinese digits "0" to "9".

2. Speech recognition of 200 to 400 key items (such as personal names and organization names), with a very high recognition rate for easily confused names.

3. High-performance speech compression coding and decoding combined with voice activation provide large-capacity digital recording of up to 8 hours. Speech is time-stamped before it is stored, so recordings can be retrieved precisely: the user can query the speech recorded at a given second of a given minute, hour, day, month and year.

4. Recording of business information.

5. Voice prompts and audio playback: the user operates the device by following voice prompts, and speech stored by the user can be played back.

6. Speech-based information retrieval: using speech recognition, the user speaks a key item, and the related information is output on the screen or by voice playback.

7. The utility model is a speech recognition, speech coding and speech synthesis module built around a digital signal processor. It can be used wherever spoken commands are needed in place of manual operation, for example in industrial control systems. Built into a mobile phone, it gives the phone the functions of a voice memo pad. Because the module is small, light, low-power and low-cost, it is very convenient for users, and it has considerable application value in communications, industrial control, household appliances and intelligent toys.
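As a rough consistency check on feature 3, the sketch below computes how much flash an 8-hour recording would need at the 5.3 kbit/s CELP rate given in the embodiment. It assumes continuous speech and ignores time-tag and header overhead, so it is an upper-bound estimate of my own, not a figure from the patent.

```python
BIT_RATE_BPS = 5300  # CELP rate from the embodiment (5.3 kbit/s)
MAX_HOURS = 8        # maximum recording length stated in feature 3


def storage_bytes(bit_rate_bps: int, seconds: int) -> int:
    """Flash bytes for `seconds` of continuous coded speech (tags and headers ignored)."""
    return bit_rate_bps * seconds // 8


needed = storage_bytes(BIT_RATE_BPS, MAX_HOURS * 3600)
print(f"{needed} bytes = {needed / 2**20:.1f} MiB")  # 19080000 bytes = 18.2 MiB
```

Because voice activation stops the coder during silence, an actual 8-hour meeting or class would usually need less than this bound.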

Brief Description Of Drawings:

Fig. 1 is a schematic diagram of the overall structure of an embodiment of the utility model.

Fig. 2 is a circuit schematic of the speech processing module of this embodiment (part 1).

Fig. 3 is a circuit schematic of the speech processing module of this embodiment (part 2).

Fig. 4 is a flow block diagram of the speech processing in this embodiment.

An embodiment of the multifunction speech-recognition notepad and large-capacity digital recorder all-in-one designed according to the utility model is described below with reference to the accompanying drawings.

The overall structure of this embodiment is shown in Fig. 1. It consists of: U1, the sampling codec (CODEC, TCM320AC37); U2, the digital signal processor (DSP, ADSP-218X); U3, the microcontroller unit (MCU, KS57C0400); U4, the liquid crystal display (LCD); U5, the keyboard; U6 and U7, flash memories; and U8, the power management unit. The interconnections of these devices are shown in Fig. 2 and Fig. 3. The circuit connections and functions are as follows:

1. Speech enters the U1 sampling codec through the microphone; the codec acts as the A/D and D/A converter, performing analog-to-digital and digital-to-analog conversion of the voice signal.

2. The speech signal digitized by U1 is sent over a serial interface to the U2 signal processor, where it is processed by speech recognition or speech compression coding. The results are then stored in flash memory (U6 or U7), converted back by U1 into an analog signal for voice output, or sent through the U3 microcontroller for display on the U4 LCD screen.

3. During recognition and compression, the U2 signal processor reads and writes programs and data in the U6 and U7 flash memories, so U2 has bidirectional data, address and control lines to U6 and U7.

4. During system operation, the U3 microcontroller acts as master controller of the U2 signal processor, controls the U4 LCD screen and the U5 keyboard, and keeps time while the system is in power-saving mode. U3 is therefore connected to U2 by data and control lines, and to U7 and U8 by control lines.

The system control and speech processing programs of this embodiment are described as follows:

1. The microcontroller U3 is the master controller of the whole system:

(1) It accepts keyboard commands and sets the working mode of the digital signal processor U2.

(2) It receives recognition results from the digital signal processor and outputs them to the display screen.

(3) It keeps time while the system is in power-saving mode, so that the system clock continues to run correctly.
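The three duties of U3 can be pictured as a small event-driven controller. The sketch below is hypothetical: the key names, the `DspMode` values and the method names are illustrative and are not taken from the patent, and real firmware for the KS57C0400 would be written for that chip rather than in Python.

```python
from enum import Enum, auto


class DspMode(Enum):
    IDLE = auto()
    RECOGNIZE = auto()
    RECORD = auto()
    PLAYBACK = auto()


# Hypothetical key-to-mode table; the patent does not name the keys.
KEY_TO_MODE = {"VOICE": DspMode.RECOGNIZE, "REC": DspMode.RECORD,
               "PLAY": DspMode.PLAYBACK, "STOP": DspMode.IDLE}


class Mcu:
    def __init__(self) -> None:
        self.dsp_mode = DspMode.IDLE
        self.power_save = False
        self.seconds = 0   # system clock maintained by U3
        self.display = ""  # text currently shown on the LCD (U4)

    def on_key(self, key: str) -> None:
        """(1) A keyboard command selects the DSP working mode and wakes the system."""
        if key in KEY_TO_MODE:
            self.power_save = False
            self.dsp_mode = KEY_TO_MODE[key]

    def on_recognition_result(self, text: str) -> None:
        """(2) A recognition result from the DSP is shown on the display."""
        self.display = text

    def enter_power_save(self) -> None:
        self.dsp_mode = DspMode.IDLE
        self.power_save = True

    def on_tick(self) -> None:
        """(3) Called once per second; keeps running in power-saving mode."""
        self.seconds += 1
```

Keeping the clock in the MCU rather than the DSP matches the text: the DSP can be idled in power-saving mode while U3 alone keeps the time needed for the recording time tags.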

2. The digital signal processor U2 controls the operation of the sampling codec U1.

After the voice signal enters U1 from the microphone, U1 processes it as follows:

(1) Filtering, with a passband of 300 to 3400 Hz.

(2) Sampling at 8 kHz (8000 samples per second).

(3) A/D conversion into a 13-bit linear PCM digital speech signal, which is then passed to the digital signal processor U2.
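The codec's A/D path can be imitated in software to show what reaches U2: samples taken 8000 times per second and quantized to 13-bit two's-complement codes in the range -4096 to 4095. This is a simplified sketch that assumes an ideal analog input scaled to [-1, 1]; it does not model the 300 to 3400 Hz input filter or the actual TCM320AC37 serial interface.

```python
import math

SAMPLE_RATE_HZ = 8000                   # 8 kHz sampling, as in the embodiment
PCM_BITS = 13                           # 13-bit linear PCM
PCM_MIN = -(1 << (PCM_BITS - 1))        # -4096
PCM_MAX = (1 << (PCM_BITS - 1)) - 1     # 4095


def quantize_13bit(x: float) -> int:
    """Map an ideal analog sample in [-1.0, 1.0] to a clipped 13-bit code."""
    code = round(x * PCM_MAX)
    return max(PCM_MIN, min(PCM_MAX, code))


def sample_tone(freq_hz: float, duration_s: float, amplitude: float = 0.5) -> list:
    """Simulate the A/D step on a pure tone inside the telephone band."""
    n = int(SAMPLE_RATE_HZ * duration_s)
    return [quantize_13bit(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
            for i in range(n)]


pcm = sample_tone(1000, 0.01)  # 10 ms of a 1 kHz tone
print(len(pcm), min(pcm), max(pcm))
```

Each 10 ms of speech therefore becomes 80 integer samples; this is the raw material for both the recognition and the compression paths described next.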

When synthesized or decoded speech is output to U1, U1 processes it as follows:

(1) Filtering with a passband of 300 to 3400 Hz to remove high-frequency noise.

(2) D/A conversion of the digital speech into analog speech, which is output to the loudspeaker.

3. After the digital speech signal enters the U2 signal processor, different processing is applied depending on whether speech recognition or speech coding is required, as shown in Fig. 4 and described in detail below:

A. Speech recognition processing:

(1) The speech signal is first divided into windowed frames and spectrally shaped, and speech feature parameters are then extracted for recognition.

(2) Speech endpoint detection is performed to remove irrelevant sound and noise.

(3) During recognition template training, the extracted feature parameters are converted by learning into a recognition pattern codebook, which is stored in the U6 flash memory (flash 1). At the same time the speech waveform is compressed and stored in the U7 flash memory (flash 2) for playback as audio confirmation.

(4) During recognition, the input feature parameters are pattern-matched against the stored recognition templates to find the best result. The text of the result is passed to the U3 microcontroller for display on the U4 screen, and the stored confirmation speech is sent to the U1 sampling codec for conversion into analog voice output.
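Steps (1) to (4) can be illustrated end to end with a toy recognizer. This is a sketch under several assumptions the patent does not state: "spectral shaping" is read as first-order pre-emphasis; frames are 25 ms with a 10 ms shift; the feature vector is only log energy plus zero-crossing rate (practical systems use richer features such as cepstral coefficients); endpoint detection is a plain energy threshold; and the codebook is trained with k-means and matched by minimum average quantization distortion. Synthetic tones stand in for recorded words, and all function names are hypothetical.

```python
import math
import random

FRAME_LEN = 200  # 25 ms at 8 kHz (assumed)
HOP = 80         # 10 ms frame shift (assumed)


def pre_emphasis(x, a=0.95):
    """(1) Assumed form of spectral shaping: y[n] = x[n] - a*x[n-1]."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def frames(x):
    """(1) Split into overlapping frames, each multiplied by a Hamming window."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (FRAME_LEN - 1)) for n in range(FRAME_LEN)]
    return [[s * wn for s, wn in zip(x[i:i + FRAME_LEN], w)]
            for i in range(0, len(x) - FRAME_LEN + 1, HOP)]


def features(frame):
    """(1) Toy 2-D feature vector: log energy and zero-crossing rate."""
    energy = sum(s * s for s in frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / len(frame)
    return (math.log(energy + 1e-9), zcr)


def endpoint(feats, margin=3.0):
    """(2) Keep frames from the first to the last within `margin` of the peak log energy."""
    peak = max(f[0] for f in feats)
    active = [i for i, f in enumerate(feats) if f[0] > peak - margin]
    return feats[active[0]:active[-1] + 1]


def extract(signal):
    return endpoint([features(f) for f in frames(pre_emphasis(signal))])


def dist2(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))


def train_codebook(vectors, k=4, iters=10, seed=0):
    """(3) k-means codebook for one word; on the device it would be stored in flash U6."""
    code = random.Random(seed).sample(vectors, min(k, len(vectors)))
    for _ in range(iters):
        groups = [[] for _ in code]
        for v in vectors:
            groups[min(range(len(code)), key=lambda j: dist2(v, code[j]))].append(v)
        code = [tuple(sum(c) / len(g) for c in zip(*g)) if g else code[j]
                for j, g in enumerate(groups)]
    return code


def avg_distortion(vectors, code):
    return sum(min(dist2(v, c) for c in code) for v in vectors) / len(vectors)


def recognize(signal, codebooks):
    """(4) Pick the word whose codebook quantizes the input with least distortion."""
    feats = extract(signal)
    return min(codebooks, key=lambda word: avg_distortion(feats, codebooks[word]))


def tone(freq_hz, amp, n=4000):
    """Synthetic stand-in for an utterance: silence, a tone, silence (8 kHz)."""
    silence = [0.0] * 800
    body = [amp * math.sin(2 * math.pi * freq_hz * i / 8000) for i in range(n)]
    return silence + body + silence


books = {"word_low": train_codebook(extract(tone(300, 1000))),
         "word_high": train_codebook(extract(tone(2000, 1000)))}
print(recognize(tone(300, 900), books))  # expected: word_low
```

A per-word codebook keeps training cheap (one pass of clustering per enrolled word), which suits a speaker-dependent device where each user records the names they want to look up.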

B. Speech compression coding and decoding:

(1) After the input speech signal has been divided into windowed frames and spectrally shaped, a speech/non-speech decision (voice activation) is made from the short-time energy and zero-crossing rate. If a speech signal is present, the speech coding program is activated and the speech is encoded; if there is no speech input, the speech compression coder stops.

(2) Speech compression coding uses the code-excited linear prediction (CELP) algorithm. Using the code-excited speech coding model, the linear prediction parameters and the excitation vectors from the codebook are extracted, the parameters of the coding model are vector-quantized, and the compressed digital speech is stored in flash memory U7 together with date and time tags. The compressed bit rate is 5.3 kbit/s. The stored compressed speech must include tag information, such as date and time, for later retrieval; these tags make speech information retrieval convenient.

(3) During speech decoding, recordings made at different times can be located and extracted using the stored time information, with precision down to the second, for example the speech recorded at a given second of a given minute, hour, day, month and year.
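Voice activation and time tagging together decide what is written to U7 and how it is found again. The sketch below is hypothetical in its details: the 20 ms frame, the two energy thresholds and the zero-crossing threshold are assumed values; the double-threshold rule (loud frames, or quieter frames with a high zero-crossing rate such as unvoiced consonants) is one common way to combine the two measures, not necessarily the patent's; and `encode` stands in for the CELP coder, which is not implemented here. Retrieval uses binary search over the sorted time tags.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime, timedelta

FRAME_S = 0.02  # hypothetical 20 ms frame (160 samples at 8 kHz)


def is_speech(frame, e_high=1e5, e_low=1e3, zcr_thr=0.3):
    """Voice activation from short-time energy and zero-crossing rate (assumed thresholds)."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / len(frame)
    return energy > e_high or (energy > e_low and zcr > zcr_thr)


@dataclass
class Recording:
    times: list = field(default_factory=list)     # time tags, in ascending order
    payloads: list = field(default_factory=list)  # coded speech for each tagged frame

    def append(self, tag: datetime, coded: bytes) -> None:
        self.times.append(tag)  # frames arrive in time order, so the list stays sorted
        self.payloads.append(coded)

    def at(self, when: datetime):
        """Return (tag, payload) for the last frame recorded at or before `when`."""
        i = bisect.bisect_right(self.times, when) - 1
        return None if i < 0 else (self.times[i], self.payloads[i])


def record(frames, t0: datetime, store: Recording, encode) -> None:
    """Run the coder only on active frames and tag each with its capture time."""
    for n, frame in enumerate(frames):
        if is_speech(frame):
            store.append(t0 + timedelta(seconds=n * FRAME_S), encode(frame))


silence = [0.0] * 160
loud = [1000.0 if i % 8 < 4 else -1000.0 for i in range(160)]
t0 = datetime(2000, 5, 22, 9, 0, 0)
store = Recording()
record([silence, loud, loud, silence], t0, store, encode=lambda f: bytes(4))
print(store.at(t0 + timedelta(seconds=1))[0])  # 2000-05-22 09:00:00.040000
```

Because silent frames never reach the coder, the tag list also tells the user where speech actually occurred, which is what makes second-level retrieval from a long recording practical.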

The microcontroller of this embodiment also stores in firmware a program implementing calculator and counting functions, and a timed wake-up program. These programs can be implemented by conventional methods and are not described in detail here.

This embodiment is used as follows:

1. Use as a notepad

(1) Input: first, the spoken key item used for speech recognition (such as a personal name, organization or place name) is entered, and the system learns its key feature parameters for later recognition. Numeric information such as telephone numbers, postcodes and ID card numbers is then entered from the keyboard; up to 4 telephone numbers can be stored per person. Other business information, such as address, e-mail address and job title, is then entered by voice.

(2) Retrieval: previously stored information is found by speaking the key item or manually with the keys. It includes numeric information such as telephone numbers, postcodes and ID card numbers, and voice information such as name, address and professional title.
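The notepad record described above, one spoken key item linked to typed numbers and spoken notes, maps naturally onto a small data structure. The sketch below is hypothetical: the field names, the `Entry` and `Notepad` classes and the dictionary lookup are illustrative, and only the limit of 4 telephone numbers per person comes from the text. The lookup key would be the label returned by the recognizer, or one chosen with the keys.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_PHONES = 4  # "up to 4 telephone numbers can be stored per person"


@dataclass
class Entry:
    key_item: str                                    # label of the trained spoken key
    phones: List[str] = field(default_factory=list)  # typed on the keyboard
    postcode: str = ""
    id_number: str = ""
    voice_notes: Dict[str, bytes] = field(default_factory=dict)  # e.g. "address" -> coded speech

    def add_phone(self, number: str) -> None:
        if len(self.phones) >= MAX_PHONES:
            raise ValueError("at most 4 telephone numbers per person")
        self.phones.append(number)


class Notepad:
    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        self._entries[entry.key_item] = entry

    def lookup(self, key_item: str) -> Optional[Entry]:
        """Retrieve by recognized key item; returns None if nothing is stored."""
        return self._entries.get(key_item)


pad = Notepad()
e = Entry("Li Ning")
for number in ["555-0101", "555-0102", "555-0103", "555-0104"]:  # made-up numbers
    e.add_phone(number)
pad.add(e)
print(pad.lookup("Li Ning").phones)
```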

2. Use as a recorder

(1) Recording starts automatically at the press of a button.

(2) For retrieval, key information such as date and time is entered from the keyboard, and the speech recorded at that time is extracted; recordings can also be searched sequentially with the buttons.

Claims

1. A multifunction speech-recognition notepad and large-capacity digital recorder all-in-one, comprising a digital signal processor, a speech sampling codec, a microcontroller, flash memory, a power manager, a microphone and a loudspeaker, characterized in that: the digital signal processor is connected to the sampling codec and the microcontroller by data lines and control lines, to the flash memory by data lines, address lines and control lines, and to the power manager by control lines; the sampling codec is connected to the microphone, the loudspeaker and the signal processor; the microcontroller is connected to the signal processor and the display control circuit by control lines and data lines; and the digital signal processor stores in firmware the speech recognition, speech synthesis and speech coding programs, the system control program, and a management program for time-based retrieval of stored speech.

2. The multifunction speech-recognition notepad and large-capacity digital recorder all-in-one according to claim 1, characterized in that the flash memory may comprise two flash memories, one storing compressed speech data and the other storing the speech recognition codebooks and any speech processing programs that must be held in external memory, and in that the power manager is connected to each of the two flash memories by control lines.

3. The multifunction speech-recognition notepad and large-capacity digital recorder all-in-one according to claim 1, characterized in that the microcontroller further stores in firmware a program implementing calculator and counting functions and a timed wake-up program.