CN110047480A - Auxiliary management robot head device and control method for community hospital department inquiry - Google Patents

Auxiliary management robot head device and control method for community hospital department inquiry

Info

Publication number
CN110047480A
Publication CN110047480A; Application CN201910321974.2A
Authority
CN
China
Prior art keywords
voice
frame
energy
servo
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910321974.2A
Other languages
Chinese (zh)
Inventor
王鹏 (Wang Peng)
罗鹏 (Luo Peng)
刘然 (Liu Ran)
黎晓强 (Li Xiaoqiang)
宋春宵 (Song Chunxiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology
Priority to CN201910321974.2A
Publication of CN110047480A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L 15/142: Hidden Markov Models [HMMs]
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225: Feedback of the input speech
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10: Transforming into visible information
    • G10L 2021/105: Synthesis of the lips movements from speech, e.g. for talking heads
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03: characterised by the type of extracted parameters
    • G10L 25/24: the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Manipulator (AREA)

Abstract

The present invention relates to an auxiliary management robot head device for community hospital department inquiry and its control. It addresses the shortage of professional guide staff in community hospitals caused by the high turnover of medical personnel. The head device comprises a facial device, a neck device and a control system; the facial device comprises a face support, a voice module and a mouth device; the neck device comprises a neck support and a neck motion device. To cope with the complex background noise of a hospital, the control system performs speech processing on a DSP chip, and its preprocessing stage uses an improved double-threshold endpoint detection algorithm that raises speech recognition accuracy in low-SNR environments. While answering a patient's spoken question, a servo control board drives the facial device and the neck device to perform highly human-like dialogue motions.

Description

Auxiliary management robot head device and control method for community hospital department inquiry
Technical field
The invention belongs to the field of intelligent service robots, and more particularly to an auxiliary management robot head device for community hospital department inquiry and its control.
Background technique
Robotics has long been at the core of high technology. As society progresses and people's expectations for quality of life keep rising, intelligent robots are gradually entering daily life; they are already applied in reception, guided tours, public sanitation and many other fields.
The present invention is applied to department inquiry in community hospitals. China is currently expanding its community medical service system so that minor illnesses can be treated close to home, greatly relieving the pressure on large hospitals. At present, however, community hospitals suffer from a high turnover of medical personnel and a lack of professional guide staff, which leaves the triage and guidance process incomplete, the environment noisy, and the patient experience poor. The acoustic environment of a hospital admission hall is complex: footsteps, crying children, the clatter of registration windows, and occasional bursts of noise. Under such low-SNR conditions a general speech recognition method has a high error rate, so voice answering fails, the hospital's normal work is disturbed, patient treatment is delayed, and serious consequences may even follow.
Summary of the invention
The present invention solves the above problems by providing an auxiliary management robot head device for community hospital department inquiry and its control. It uses an improved double-threshold endpoint detection algorithm that raises speech recognition accuracy in low-SNR environments; on a correct recognition it quickly returns the location of the department the patient asked about, while a servo control board drives the facial device and the neck device to complete highly human-like interactive motions, improving the auxiliary management robot's speech recognition ability and its level of human-computer interaction.
To this end, the first object of the present invention is to provide an auxiliary management robot head device for community hospital department inquiry; the second object is to provide a control method for that head device.
The first technical solution adopted by the present invention is:
An auxiliary management robot head device for community hospital department inquiry comprises a facial device, a neck device and a control system, the facial device being arranged above the neck device;
the facial device comprises a face support, a voice module and a mouth device;
the neck device comprises a neck support and a neck motion device;
the control system is based on a DSP chip and comprises a main control board and a servo control board; the main control board and the servo control board actuate the facial device and the neck device through serial communication, completing highly human-like dialogue motions.
Further, the face support comprises a first face support and a second face support; at the positions imitating the human eyes the second face support is machined with two circular through-holes, and at the position imitating the nose with one circular through-hole; the first face support is arranged perpendicular to the second face support;
Further, the voice module comprises an electret condenser microphone, a voice acquisition module and a loudspeaker; the electret condenser microphone is fitted in the nose-shaped circular through-hole of the second face support; the voice acquisition module and the main control board are both fixed on the upper surface of the first face support, and the loudspeaker is fixed at the rear edge of the first face support;
Further, the mouth device comprises a first fixing frame, a first servo, a first servo horn, a first swing rod, a mouth connector and a mouth component; the output shaft of the first servo is in driving connection with the first servo horn, and the first servo horn is fixedly connected to one end of the first swing rod to realize the opening-and-closing motion of the mouth; the other end of the first swing rod is fixed to the mouth connector; the front face of the mouth connector abuts the rear face of the mouth component, and the first servo is fixed to the lower surface of the first face support through the first fixing frame 1-3A;
Further, the neck support comprises a neck pillar, a first neck support and a second neck support; the neck pillar connects the first face support with the first neck support, and the first neck support and the second neck support are arranged horizontally;
Further, the neck motion device comprises a second servo bracket, a second servo horn, a pan-tilt bracket, a third servo horn, a second servo and a third servo; the second servo is connected to the first neck support through the second servo bracket, and its output shaft is in driving connection with the second servo horn; the second servo horn is nested in the driving groove at the upper end of the pan-tilt bracket to transmit the power needed for the robot's vertical pitching motion; the lower end of the third servo is fixed to the pan-tilt bracket, its output shaft passes through the lower face of the pan-tilt bracket and is in driving connection with the third servo horn, and the third servo horn is nested in the driving groove at the upper end of the second neck support to provide the power needed for the robot's neck rotation;
Further, the control system is based on a DSP chip and comprises a main control board and a servo control board connected by serial communication. On the main control board, the SCLKX1 pin of the DSP chip connects to the SCLK pin of the voice acquisition module and supplies a 12 MHz clock; the BFSX1 pin of the DSP chip connects to the CS pin of the voice acquisition module for chip select; the BDX1 pin connects to the SDIN pin for the control interface; BCLKX0 and BCLKR0 of the DSP chip both connect to the BCLK pin of the voice acquisition module as the clock for synchronous data transmission; the BFSX0 and BFSR0 pins connect to the LRCIN and LRCOUT pins of the voice acquisition module as the input and output frame-synchronization signals of I2S-format data; the BDX0 pin of the DSP chip connects to the DIN pin of the voice acquisition module as the stereo DAC input; the BDR0 pin connects to the DOUT pin as the I2S-format data output; the MICIN of the voice acquisition module connects to the electret condenser microphone for monaural sound pickup; the LOUT of the voice acquisition module connects to the loudspeaker for sound output. The EMIFA peripheral of the DSP chip on the main control board assigns the CE1 external memory space to SDRAM and the CE2 external memory space to FLASH; the UART1_RX and UART1_DX pins of the DSP chip connect to the TX and RX pins of the servo control board respectively, and the S1, S2 and S3 pins of the servo control board connect to the digital control terminals of the first, second and third servos respectively; all three servos are powered by a 5 V supply, and the on-board power is a 3.3 V supply.
The second technical solution adopted by the present invention is:
A control method based on the auxiliary management robot head device for community hospital department inquiry, comprising the following steps:
Step a: the electret condenser microphone picks up the patient's sound wave (monaural) in real time; the voice acquisition module A/D-converts and decodes the sound wave and outputs an electric signal;
Step b: the decoded electric signal from the voice acquisition module is fed into the digital signal processor (DSP chip) for data processing, and the recognition network outputs the recognized text;
Step c: the recognized text is compared against the keyword recognition list LD_AsrAddFixed(); on a successful match the corresponding hexadecimal number is output;
Step d: through its UART serial peripheral the DSP chip sends the hexadecimal number corresponding to the live utterance to a servo control board of model LSC-16-V1.1, which drives the neck motion mechanism and the mouth opening-and-closing motor; meanwhile the answer audio MP3 data corresponding to the recognized utterance is written byte by byte into the FIFO register (nMp3Pos++) and passed to the voice acquisition module, and the answer voice is played through the loudspeaker, as sketched below;
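As an illustration of the step-c/step-d control flow, the following Python sketch mirrors the comparison-then-dispatch logic; on the device itself this runs in C on the DSP. The department-to-code table here is hypothetical (the actual values are those of Fig. 7), and the serial port name and baud rate are assumptions:

import serial  # pyserial, standing in for the DSP's UART peripheral

# Hypothetical excerpt of the keyword list registered via LD_AsrAddFixed();
# the real entries and hexadecimal numbers are those given in Fig. 7.
DEPARTMENT_CODES = {
    "内科": 0x01,  # internal medicine
    "外科": 0x02,  # surgery
    "儿科": 0x03,  # pediatrics
}

def dispatch(recognized_text, port="/dev/ttyS1"):
    """Compare the recognized text against the keyword list; on a match, send
    the corresponding hexadecimal number to the LSC-16-V1.1 servo board."""
    for keyword, code in DEPARTMENT_CODES.items():
        if keyword in recognized_text:
            with serial.Serial(port, baudrate=9600, timeout=1) as uart:
                uart.write(bytes([code]))  # the board decodes this into a motion group
            return code
    return None  # no match: no motion and no answer audio are produced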
Further, the data processing performed by the DSP chip in step b specifically comprises:
Step b1: preprocessing of the digital speech signal, namely pre-emphasis, windowed framing and speech enhancement. Pre-emphasis boosts the high-frequency part of the spectrum and suitably suppresses the low-frequency part, removing the effect of lip radiation, which flattens the spectrum and makes the speech clearer. It is realized with a first-order FIR high-pass digital filter whose transfer function is $H(z) = 1 - az^{-1}$, $0.9 < a < 1.0$, where $a$ is the pre-emphasis coefficient; the present invention selects $a = 0.98$. With $x(n)$ the speech sample at time $n$, the pre-emphasized result is $y(n) = x(n) - a\,x(n-1)$, realized in code as:
data, time, framerate, nframes = wavread(filename)[:4]
pre_emphasis = 0.98
emphasized_signal = numpy.append(data[0], data[1:] - pre_emphasis * data[:-1])
Step b2: because speech is only short-term stationary, the signal must be framed with windowing. Each frame is 20 ms long at a sampling frequency of 8000 Hz, giving a frame size of nw = 160 samples; since the ratio of frame shift inc to frame length nw is generally taken between 0 and 0.5, the present invention takes inc = 40 samples. In practice a window function of fixed width and variable position on the time axis is used to section the speech; the present invention uses a Hamming window, winfunc = signal.hamming(nw). The frame count is nf = int(numpy.ceil((1.0 * signal_length - nw + inc) / inc)); the total non-overlapping length is pad_length = int((nf - 1) * inc + nw); the original signal is zero-padded to pad_length with zeros = numpy.zeros((pad_length - signal_length)) and merged into one array pad_signal = numpy.concatenate((signal, zeros)); the frame index matrix is indices = numpy.tile(numpy.arange(0, nw), (nf, 1)) + numpy.tile(numpy.arange(0, nf*inc, inc), (nw, 1)).T; the signal is then framed, frames = pad_signal[indices], and finally each frame is windowed, Win = numpy.tile(winfunc, (nf, 1)); a consolidated sketch of this step follows;
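Gathering the fragments of step b2 into one place, this minimal sketch uses the names from the description (numpy.hamming stands in for the description's signal.hamming):

import numpy

def enframe(data, nw=160, inc=40):
    # 20 ms frames (nw = 160 samples at 8000 Hz) with a 40-sample frame shift
    signal_length = len(data)
    nf = int(numpy.ceil((1.0 * signal_length - nw + inc) / inc))  # frame count
    pad_length = int((nf - 1) * inc + nw)                         # padded length
    zeros = numpy.zeros((pad_length - signal_length,))
    pad_signal = numpy.concatenate((data, zeros))                 # zero-pad to whole frames
    indices = (numpy.tile(numpy.arange(0, nw), (nf, 1)) +
               numpy.tile(numpy.arange(0, nf * inc, inc), (nw, 1)).T)
    frames = pad_signal[indices]                                  # nf x nw frame matrix
    win = numpy.tile(numpy.hamming(nw), (nf, 1))                  # Hamming window per frame
    return frames * win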
Step b3: from the windowed, framed speech, the speech segments must be separated from the non-speech segments, which reduces the later speech-coding load on the DSP chip and the device's energy consumption. Compared with a general energy-based endpoint detection, the short-time energy of unvoiced sounds is small, and losing the unvoiced parts leaves the speech fragments incomplete and the recognition inaccurate. The present invention therefore uses an improved double-threshold endpoint detection algorithm: short-time energy detection first separates silent segments from voiced segments, and short-time average zero-crossing-rate detection then distinguishes the unvoiced parts from noise and extracts them from the silent segments (the unvoiced part of a syllable generally precedes the voiced part). A general double-threshold algorithm gives good results only at high SNR; under the complex channel conditions of a hospital its recognition rate drops sharply, and bursts of noise often drive the short-time energy or the short-time average zero-crossing rate very high, increasing the workload and error rate of later recognition. The present invention identifies unvoiced speech with a double detection of short-time energy and short-time average zero-crossing rate, enhancing recognition accuracy in low-SNR environments;
The condition for judging a frame to be reliably unvoiced is that its short-time average zero-crossing rate Z exceed the zero-crossing-rate threshold ZH, to which a criterion on an unvoiced energy threshold is added; that is, unvoiced parts are double-checked by short-time energy and short-time average zero-crossing rate. The short-time average zero-crossing rate is $Z = \sum_{t=1}^{T-1} \operatorname{sgn}\{s_t s_{t-1} < 0\}$, where $s$ is the sample value, $T$ the frame length, and $\operatorname{sgn}\{s_t s_{t-1} < 0\}$ equals 1 when $s_t s_{t-1} < 0$ holds and 0 otherwise. The windowed, framed speech frameData = Frame is fed into the function ZCR() that computes the short-time average zero-crossing rate: it iterates over all frames, for i in range(frameNum), where frameNum is the total frame count; each frame is multiplied by a one-sample shift of itself, temp = singleFrame[:frameSize-1] * singleFrame[1:frameSize]; the sign is taken, temp = numpy.sign(temp); and the short-time average zero-crossing rate is zcr[i] = numpy.sum(temp < 0);
With $x_n(m)$ the n-th frame of the speech signal, its short-time energy $E_n$ is given by $E_n = \sum_{m=0}^{T-1} x_n^2(m)$. The windowed, framed speech frameData = Frame is fed into the energy() function, which iterates over all frames, for i in range(frameNum), where frameNum is the total frame count, and multiplies each frame with itself to obtain the short-time energy ener[i] = sum(singleframe * singleframe);
The speech is first segmented with the higher energy threshold MH: when the short-time energy exceeds the threshold, energy[i] > MH, this frame is recorded, A.append(i); when the short-time energy later falls below the threshold, that frame is recorded as well, A.append(i). If the elapsed span is shorter than the basic duration of an utterance, 500 ms (the frame loop here acts as the time flow, so 49.5 frames correspond to 500 ms), it is not recorded as a speech segment but judged to be burst noise. A sketch of this step follows:
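The published text announces implementation code here but none appears; the following is a minimal reconstruction from the prose alone (energy, MH, A and the 49.5-frames-per-500-ms figure are the description's own names and numbers; the loop structure is an assumption):

def coarse_segments(energy, MH, min_frames=49.5):
    # First pass of the improved double-threshold detection: mark spans whose
    # short-time energy exceeds the high threshold MH, discarding spans shorter
    # than ~500 ms (49.5 frames per the description) as burst noise.
    A = []
    inside, start = False, 0
    for i in range(len(energy)):
        if energy[i] > MH and not inside:
            A.append(i)               # record span start
            inside, start = True, i
        elif energy[i] <= MH and inside:
            if i - start >= min_frames:
                A.append(i)           # record span end: a genuine speech segment
            else:
                A.pop()               # too short: burst noise, drop the start mark
            inside = False
    return A                          # alternating start/end frame indices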
This yields the preliminary speech-segment matrix A[], which is then refined with the lower energy threshold ML, handling segment start and end points separately by the parity of the index, if j % 2 == 1. If the energy of a segment's starting frame exceeds the lower threshold, while i < len(energy) and energy[i] > ML, the boundary frame is shifted to extend the segment toward the front (i = i + 1 in the description's loop) until the energy is at or below ML; likewise, if the energy of a segment's end frame exceeds the lower threshold, while i > 0 and energy[i] > ML, the segment is extended backward (i = i - 1). The extended segments form the matrix B[];
Finally the segments B[] obtained from the energy detection are extended over the unvoiced parts by short-time average zero-crossing-rate detection: for j in range(len(B)), the start and end points are again handled separately by parity, if j % 2 == 1; the frame index in B[] is taken, i = B[j], and, while i remains below the total frame count len(zeroCrossingRate), whenever the short-time zero-crossing rate at the index exceeds 3 times the average zero-crossing rate Z, the speech boundary is extended (i = i + 1 forward, i = i - 1 backward in the description's loop); the result is collected with C.append(i) into the array C[]; a sketch of this pass follows;
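A companion sketch of this final pass, under the assumption that even indices in B[] mark segment starts and odd indices mark segment ends (the published parity and direction conventions are ambiguous, so both are assumptions):

def expand_by_zcr(B, zeroCrossingRate, Z):
    # Extend each energy-detected boundary over adjacent unvoiced frames whose
    # short-time zero-crossing rate exceeds 3 * Z (the average rate).
    C = []
    n = len(zeroCrossingRate)
    for j in range(len(B)):
        i = B[j]
        if j % 2 == 0:    # assumed: segment start, move it earlier
            while i > 0 and zeroCrossingRate[i] > 3 * Z:
                i -= 1
        else:             # assumed: segment end, move it later
            while i < n - 1 and zeroCrossingRate[i] > 3 * Z:
                i += 1
        C.append(i)
    return C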
Step b4: the present invention analyzes the speech signal with Mel-frequency cepstral coefficient (MFCC) feature extraction, in order to obtain a set of data that characterizes this stretch of speech;
Step b5: with the MFCCs as the characteristic parameters of the speech signal, a Hidden Markov Model (HMM) performs the matching of the speech parameters;
Step b6: a recurrent neural network (RNN) language model then computes the perplexity of each candidate result, and the candidate recognition result with the minimum perplexity is taken as the target result; the RNN language model is trained on the Chinese Wikipedia corpus, and the sound wave is finally recognized as a text sentence; a rough illustration of step b5 is sketched below;
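The description gives no code for steps b5 and b6; as a rough illustration only, the sketch below scores an MFCC sequence against per-keyword Gaussian HMMs using the hmmlearn library, which the patent does not mention and which stands in for whatever HMM implementation the device actually uses (RNN-language-model rescoring by perplexity would then run over the candidate list):

def best_word(mfcc_frames, word_models):
    # word_models: dict mapping each keyword to a trained
    # hmmlearn.hmm.GaussianHMM; score() returns the log-likelihood of the
    # observation sequence, so the best-matching keyword maximizes it.
    scores = {w: m.score(mfcc_frames) for w, m in word_models.items()}
    return max(scores, key=scores.get)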
Further, the selection of the Mel-frequency cepstral coefficients (MFCC) in step b4 proceeds as follows:
Step b401: the speech fragment extracted by endpoint detection is treated frame by frame as short-time features in the time domain, and a fast Fourier transform (FFT) converts these time-domain features into frequency-domain features, fft_signal = numpy.fft.fft(Frame.T);
Step b402: next comes the frequency conversion, namely filtering with a mel filterbank, in practice a set of simple triangular filters in the frequency domain. First the frequency in Hz is converted to mel frequency, because the loudness the human ear perceives is not linearly proportional to frequency; the mel scale is divided linearly, the mel frequencies are converted back to Hz, the corresponding Hz positions are located, and then the corresponding FFT bin positions are found, from which the filter expressions are finally built. The number of filters is filters_num = 20 and the FFT size NFFT defaults to 512; each filter is 0 at its first and third frequencies and 1 at its second. The mel triangular filterbank is computed as fb = get_filter_banks(filters_num=20, NFFT=512, samplerate=8000, low_freq=0, high_freq=None); the energy spectrum of each frame of this stretch of audio is then computed and summed, energy = numpy.sum(spec_power, 1); the filterbank and the energy spectrum are dot-multiplied, feat = numpy.dot(spec_power, fb.T); the function returns the two values feat and energy, and the logarithm of feat is taken, feat = numpy.log(feat);
Step b403: a discrete cosine transform (DCT) is applied to the log spectrum feat, keeping only the first 13 coefficients, feat = dct(feat, type=2, axis=1, norm='ortho')[:, :cep_num] with cep_num = 13; feat is then lifted in the cepstral domain, feat = lifter(feat, cep_lifter), finally yielding the MFCCs; the assembled pipeline is sketched below;
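Steps b401 to b403 assemble into the following sketch; get_filter_banks is condensed to its essentials here, and the mel conversion constants are the usual 2595/700 formula, which the description implies but does not state:

import numpy
from scipy.fftpack import dct

def hz2mel(hz):
    return 2595 * numpy.log10(1 + hz / 700.0)

def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)

def get_filter_banks(filters_num=20, NFFT=512, samplerate=8000,
                     low_freq=0, high_freq=None):
    # Triangular mel filters: 0 at the first and third frequency, 1 at the second.
    high_freq = high_freq or samplerate / 2
    mels = numpy.linspace(hz2mel(low_freq), hz2mel(high_freq), filters_num + 2)
    bins = numpy.floor((NFFT + 1) * mel2hz(mels) / samplerate).astype(int)
    fb = numpy.zeros((filters_num, NFFT // 2 + 1))
    for m in range(1, filters_num + 1):
        fb[m - 1, bins[m - 1]:bins[m]] = (
            (numpy.arange(bins[m - 1], bins[m]) - bins[m - 1]) / (bins[m] - bins[m - 1]))
        fb[m - 1, bins[m]:bins[m + 1]] = (
            (bins[m + 1] - numpy.arange(bins[m], bins[m + 1])) / (bins[m + 1] - bins[m]))
    return fb

def mfcc(frames, NFFT=512, cep_num=13):
    spec_power = numpy.abs(numpy.fft.rfft(frames, NFFT)) ** 2 / NFFT  # per-frame power spectrum
    feat = numpy.log(numpy.dot(spec_power, get_filter_banks().T))     # log mel energies
    feat = dct(feat, type=2, axis=1, norm='ortho')[:, :cep_num]       # keep the first 13 coefficients
    return feat  # the description then applies cepstral liftering, lifter(feat, cep_lifter)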
Further, in step d the present invention uses a servo control board of model LSC-16-V1.1 to drive the servos and realize the facial expressions and neck motions. The UART serial port of the servo control board receives the hexadecimal number and performs the corresponding expression and neck motion: the PWM signal on pin 1 drives the first servo to perform the opening-and-closing motion of the robot's chin, the PWM signal on pin 2 drives the second servo to perform the pitching motion of the robot's neck, and the PWM signal on pin 3 drives the third servo to perform the rotation of the robot's neck.
Compared with the prior art, the present invention has the following advantages:
1. The present invention provides an auxiliary management robot head device for community hospital department inquiry and its control, using an improved double-threshold endpoint detection algorithm. A general double-threshold algorithm gives good results only at high SNR, and its recognition efficiency drops sharply under complex channel conditions; in a hospital scene burst noise is frequent and very likely to be misjudged. Targeting the starting point of recognition, namely the detection of the unvoiced segments, the present invention uses a double detection of short-time energy and short-time average zero-crossing rate, realizes accurate identification of the speech segments, and enhances speech recognition accuracy in low-SNR environments.
2. While answering the patient's question by voice, the present invention can perform the corresponding facial expression and neck motions, with a high degree of human likeness and a strong sense of technology. Patients can ask the robot about the location of any department of the hospital, and the internal keyword list maps many sentence patterns to the same answer, so that different phrasings ("How do I get to department XX?" / "I want to go to department XX" / "Which floor is department XX on?") are recognized quickly and yield the same result, achieving natural interaction with fast response.
3. The head device of the present invention has a concise profile, a reasonable internal mechanism layout, low noise, natural expressions, accurate speech recognition and fast mechanism response. It solves the community hospital's lack of professional guide staff caused by the high turnover of medical personnel, lets patients be seen promptly while also giving them emotional comfort, and thus benefits their recovery.
Detailed description of the invention
Fig. 1 is a front view of the device of the present invention;
Fig. 2 is a rear structural view of the device of the present invention;
Fig. 3 is a partial view of the mouth device and neck device of the present invention;
Fig. 4 is the circuit diagram of the voice acquisition module of the present invention;
Fig. 5 is the circuit diagram of the DSP chip of the control system of the present invention;
Fig. 6 is the circuit diagram connecting the servo control board with the first, second and third servos;
Fig. 7 is the table of hexadecimal numbers corresponding to each department after the control system matches the recognition list;
Fig. 8 compares endpoint detection by the improved double-threshold algorithm of the present invention with that of a general algorithm;
Fig. 9 is the overall workflow diagram of the present invention;
Fig. 10 is a schematic diagram of the improved double-threshold endpoint detection algorithm of the present invention.
In the figures: facial device 1, neck device 2, control system 3, face support 1-1, voice module 1-2, mouth device 1-3, neck support 2-1, neck motion device 2-2, main control board 3-1, servo control board 3-2, first face support 1-1A, second face support 1-1B, electret condenser microphone 1-2A, voice acquisition module 1-2B, loudspeaker 1-2C, first fixing frame 1-3A, first servo 1-3B, first servo horn 1-3C, first swing rod 1-3D, mouth connector 1-3E, mouth component 1-3F, neck pillar 2-1A, first neck support 2-1B, second neck support 2-1C, second servo bracket 2-2A, second servo horn 2-2B, pan-tilt bracket 2-2C, third servo horn 2-2D, second servo 2-2E, third servo 2-2F.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings.
Specific embodiment one
An auxiliary management robot head device for community hospital department inquiry, as shown in Fig. 1 and Fig. 2, comprises a facial device 1, a neck device 2 and a control system 3, the facial device 1 being arranged above the neck device 2;
The facial device 1 comprises a face support 1-1, a voice module 1-2 and a mouth device 1-3;
The neck device 2 comprises a neck support 2-1 and a neck motion device 2-2;
The control system 3 is based on a DSP chip and comprises a main control board 3-1 and a servo control board 3-2; the main control board 3-1 and the servo control board 3-2 actuate the facial device 1 and the neck device 2 through serial communication, completing highly human-like dialogue motions.
Specific embodiment two
As shown in Fig. 1, Fig. 2 and Fig. 3, on the basis of specific embodiment one, the face support 1-1 comprises a first face support 1-1A and a second face support 1-1B; at the positions imitating the human eyes the second face support is machined with two circular through-holes, and at the position imitating the nose with one circular through-hole; the first face support 1-1A is arranged perpendicular to the second face support 1-1B. The voice module 1-2 comprises an electret condenser microphone 1-2A, a voice acquisition module 1-2B and a loudspeaker 1-2C; the electret condenser microphone 1-2A is fitted in the nose-shaped circular through-hole of the second face support 1-1B; the voice acquisition module 1-2B and the main control board 3-1 are both fixed on the upper surface of the first face support 1-1A, and the loudspeaker 1-2C is fixed at the rear edge of the first face support 1-1A;
The mouth device 1-3 comprises a first fixing frame 1-3A, a first servo 1-3B, a first servo horn 1-3C, a first swing rod 1-3D, a mouth connector 1-3E and a mouth component 1-3F; the output shaft of the first servo 1-3B is in driving connection with the first servo horn 1-3C, and the first servo horn 1-3C is fixedly connected to one end of the first swing rod 1-3D to realize the opening-and-closing motion of the mouth; the other end of the first swing rod 1-3D is fixed to the mouth connector 1-3E; the front face of the mouth connector 1-3E abuts the rear face of the mouth component 1-3F, and the first servo 1-3B is fixed to the lower surface of the first face support 1-1A through the first fixing frame 1-3A;
In this embodiment the electret condenser microphone 1-2A is an existing commercial part, model DGHSEM1465U.
In this embodiment the voice acquisition module 1-2B is an existing commercial part, model TLC320AC01.
As shown in Fig. 1 and Fig. 2, the neck support 2-1 comprises a neck pillar 2-1A, a first neck support 2-1B and a second neck support 2-1C; the neck pillar 2-1A connects the first face support 1-1A with the first neck support 2-1B, and the first neck support 2-1B and the second neck support 2-1C are arranged horizontally;
The neck motion device 2-2 comprises a second servo bracket 2-2A, a second servo horn 2-2B, a pan-tilt bracket 2-2C, a third servo horn 2-2D, a second servo 2-2E and a third servo 2-2F; the second servo 2-2E is connected to the first neck support 2-1B through the second servo bracket 2-2A, and its output shaft is in driving connection with the second servo horn 2-2B; the second servo horn 2-2B is nested in the driving groove at the upper end of the pan-tilt bracket 2-2C to transmit the power needed for the robot's vertical pitching motion; the lower end of the third servo 2-2F is fixed to the pan-tilt bracket 2-2C, its output shaft passes through the lower face of the pan-tilt bracket 2-2C and is in driving connection with the third servo horn 2-2D, and the third servo horn 2-2D is nested in the driving groove at the upper end of the second neck support 2-1C to provide the power needed for the robot's neck rotation;
In this embodiment the first servo 1-3B, the second servo 2-2E and the third servo 2-2F are existing commercial parts, all of model RDS3115.
As shown in Fig. 4, Fig. 5 and Fig. 6, the control system 3 is based on a DSP chip and comprises a main control board 3-1 and a servo control board 3-2 connected by serial communication. On the main control board, the SCLKX1 pin of the DSP chip connects to the SCLK pin of the voice acquisition module 1-2B and supplies a 12 MHz clock; the BFSX1 pin of the DSP chip connects to the CS pin of the voice acquisition module 1-2B for chip select; the BDX1 pin of the DSP chip connects to the SDIN pin of the voice acquisition module 1-2B for the control interface; BCLKX0 and BCLKR0 of the DSP chip both connect to the BCLK pin of the voice acquisition module 1-2B as the clock for synchronous data transmission; the BFSX0 and BFSR0 pins of the DSP chip connect to the LRCIN and LRCOUT pins of the voice acquisition module 1-2B as the input and output frame-synchronization signals of I2S-format data; the BDX0 pin of the DSP chip connects to the DIN pin of the voice acquisition module 1-2B as the stereo DAC input; the BDR0 pin of the DSP chip connects to the DOUT pin of the voice acquisition module 1-2B as the I2S-format data output; the MICIN of the voice acquisition module 1-2B connects to the electret condenser microphone 1-2A for monaural sound pickup; the LOUT of the voice acquisition module 1-2B connects to the loudspeaker 1-2C for sound output. The EMIFA peripheral of the DSP chip on the main control board assigns the CE1 external memory space to SDRAM and the CE2 external memory space to FLASH; the UART1_RX and UART1_DX pins of the DSP chip connect to the TX and RX pins of the servo control board 3-2 respectively, and the S1, S2 and S3 pins of the servo control board 3-2 connect to the digital control terminals of the first servo 1-3B, the second servo 2-2E and the third servo 2-2F respectively; all three servos are powered by a 5 V supply, and the on-board power is a 3.3 V supply.
In this embodiment the DSP chip is an existing commercial part, model TMS320C6455.
Working process:
The electret condenser microphone 1-2A picks up the patient's speech sound wave, which the voice acquisition module 1-2B converts into an electric signal; after decoding by the voice acquisition module the signal is fed into the main control board for speech signal processing, specifically pre-emphasis, windowing, framing, endpoint detection (the improved double-threshold endpoint detection algorithm of the present invention), feature extraction and decoding, finally outputting text. The recognized text is compared against the keyword recognition list LD_AsrAddFixed(); on a successful match the corresponding hexadecimal number is output. The DSP chip converts the hexadecimal number into a character string and sends it through the UART serial port to the servo control board 3-2, whose on-board controller converts the string back into a hexadecimal number and executes the servo motion it encodes; meanwhile the DSP chip passes the answer audio MP3 data corresponding to the recognized utterance to the voice acquisition module 1-2B, and the answer voice is played through the loudspeaker 1-2C.
Specific embodiment three
As shown in Fig. 7, Fig. 8 and Fig. 9, the control method realized by the auxiliary management robot head device for community hospital department inquiry comprises the following steps:
Step a: the electret condenser microphone 1-2A picks up the patient's sound wave (monaural) in real time; the voice acquisition module 1-2B A/D-converts and decodes the sound wave and outputs an electric signal;
Step b: the decoded electric signal from the voice acquisition module 1-2B is fed into the digital signal processor (DSP chip) for data processing, and the recognition network outputs the recognized text;
Step c: the recognized text is compared against the keyword recognition list LD_AsrAddFixed(); on a successful match the corresponding hexadecimal number is output;
Step d: through its UART serial peripheral the DSP chip sends the hexadecimal number corresponding to the live utterance to the servo control board 3-2 of model LSC-16-V1.1, which drives the neck motion mechanism and the mouth opening-and-closing motor; meanwhile the answer audio MP3 data corresponding to the recognized utterance is written byte by byte into the FIFO register (nMp3Pos++) and passed to the voice acquisition module 1-2B, and the answer voice is played through the loudspeaker 1-2C;
Further, the data processing performed by the DSP chip in step b specifically comprises:
Step b1: preprocessing of the digital speech signal, namely pre-emphasis, windowed framing and speech enhancement. Pre-emphasis boosts the high-frequency part of the spectrum and suitably suppresses the low-frequency part, removing the effect of lip radiation, which flattens the spectrum and makes the speech clearer. It is realized with a first-order FIR high-pass digital filter whose transfer function is $H(z) = 1 - az^{-1}$, $0.9 < a < 1.0$, where $a$ is the pre-emphasis coefficient and the present invention selects $a = 0.98$. With $x(n)$ the speech sample at time $n$, the pre-emphasized result is described by the formula $y(n) = x(n) - a\,x(n-1)$ and in code as emphasized_signal = numpy.append(data[0], data[1:] - pre_emphasis * data[:-1]), where emphasized_signal is the preprocessed audio data, data the sample values, and pre_emphasis the pre-emphasis coefficient 0.98;
Step b2: because speech is only short-term stationary, the signal must be framed with windowing. Each frame is 20 ms long at a sampling frequency of 8000 Hz, giving a frame size of nw = 160 samples; since the ratio of frame shift inc to frame length nw is generally taken between 0 and 0.5, the present invention takes inc = 40 samples. In practice a window function of fixed width and variable position on the time axis is used to section the speech; the present invention uses a Hamming window, winfunc = signal.hamming(nw). The frame count is nf = int(numpy.ceil((1.0 * signal_length - nw + inc) / inc)); the total non-overlapping length is pad_length = int((nf - 1) * inc + nw); the original signal is zero-padded to pad_length with zeros = numpy.zeros((pad_length - signal_length)) and merged into one array pad_signal = numpy.concatenate((signal, zeros)); the frame index matrix is indices = numpy.tile(numpy.arange(0, nw), (nf, 1)) + numpy.tile(numpy.arange(0, nf*inc, inc), (nw, 1)).T; the signal is then framed, frames = pad_signal[indices], and finally each frame is windowed, Win = numpy.tile(winfunc, (nf, 1));
Step b3: from the windowed, framed speech, the speech segments must be separated from the non-speech segments, which reduces the later speech-coding load on the DSP chip and the device's energy consumption. Compared with a general energy-based endpoint detection, the short-time energy of unvoiced sounds is small, and losing the unvoiced parts leaves the speech fragments incomplete and the recognition inaccurate. The present invention therefore uses an improved double-threshold endpoint detection algorithm: short-time energy detection first separates silent segments from voiced segments, and short-time average zero-crossing-rate detection then distinguishes the unvoiced parts from noise and extracts them from the silent segments (the unvoiced part of a syllable generally precedes the voiced part). A general double-threshold algorithm gives good results only at high SNR; under the complex channel conditions of a hospital its recognition rate drops sharply, and bursts of noise often drive the short-time energy or the short-time average zero-crossing rate very high, increasing the workload and error rate of later recognition. The present invention identifies unvoiced speech with a double detection of short-time energy and short-time average zero-crossing rate, enhancing recognition accuracy in low-SNR environments;
The condition for judging a frame to be reliably unvoiced is that its short-time average zero-crossing rate Z exceed the zero-crossing-rate threshold ZH, to which a criterion on an unvoiced energy threshold is added; that is, unvoiced parts are double-checked by short-time energy and short-time average zero-crossing rate. The short-time average zero-crossing rate is $Z = \sum_{t=1}^{T-1} \operatorname{sgn}\{s_t s_{t-1} < 0\}$, where $s$ is the sample value, $T$ the frame length, and $\operatorname{sgn}\{s_t s_{t-1} < 0\}$ equals 1 when $s_t s_{t-1} < 0$ holds and 0 otherwise. The windowed, framed speech frameData = Frame is fed into the function ZCR() that computes the short-time average zero-crossing rate: it iterates over all frames, for i in range(frameNum), where frameNum is the total frame count; each frame is multiplied by a one-sample shift of itself, temp = singleFrame[:frameSize-1] * singleFrame[1:frameSize]; the sign is taken, temp = numpy.sign(temp); and the short-time average zero-crossing rate is zcr[i] = numpy.sum(temp < 0);
With $x_n(m)$ the n-th frame of the speech signal, its short-time energy $E_n$ is given by $E_n = \sum_{m=0}^{T-1} x_n^2(m)$. The windowed, framed speech frameData = Frame is fed into the energy() function, which iterates over all frames, for i in range(frameNum), where frameNum is the total frame count, and multiplies each frame with itself to obtain the short-time energy ener[i] = sum(singleframe * singleframe);
The speech is first segmented with the higher energy threshold MH: when the short-time energy exceeds the threshold, energy[i] > MH, this frame is recorded, A.append(i); when the short-time energy later falls below the threshold, that frame is recorded as well, A.append(i). If the elapsed span is shorter than the basic duration of an utterance, 500 ms (the frame loop here acts as the time flow, so 49.5 frames correspond to 500 ms), it is not recorded as a speech segment but judged to be burst noise (see the segmentation sketch given in step b3 of the summary above);
This yields the preliminary speech-segment matrix A[], which is then refined with the lower energy threshold ML, handling segment start and end points separately by the parity of the index, if j % 2 == 1. If the energy of a segment's starting frame exceeds the lower threshold, while i < len(energy) and energy[i] > ML, the boundary frame is shifted to extend the segment toward the front (i = i + 1 in the description's loop) until the energy is at or below ML; likewise, if the energy of a segment's end frame exceeds the lower threshold, while i > 0 and energy[i] > ML, the segment is extended backward (i = i - 1). The extended segments form the matrix B[];
Finally the segments B[] obtained from the energy detection are extended over the unvoiced parts by short-time average zero-crossing-rate detection: for j in range(len(B)), the start and end points are again handled separately by parity, if j % 2 == 1; the frame index in B[] is taken, i = B[j], and, while i remains below the total frame count len(zeroCrossingRate), whenever the short-time zero-crossing rate at the index exceeds 3 times the average zero-crossing rate Z, the speech boundary is extended (i = i + 1 forward, i = i - 1 backward in the description's loop); the result is collected with C.append(i) into the array C[] (see the expansion sketch above);
Step b4: the present invention analyzes the speech signal with Mel-frequency cepstral coefficient (MFCC) feature extraction, in order to obtain a set of data that characterizes this stretch of speech;
Step b5: with the MFCCs as the characteristic parameters of the speech signal, a Hidden Markov Model (HMM) performs the matching of the speech parameters;
Step b6: a recurrent neural network (RNN) language model then computes the perplexity of each candidate result, and the candidate recognition result with the minimum perplexity is taken as the target result; the RNN language model is trained on the Chinese Wikipedia corpus, and the sound wave is finally recognized as a text sentence;
Further, the selection of the Mel-frequency cepstral coefficients (MFCC) in step b4 proceeds as follows:
Step b401: the speech fragment extracted by endpoint detection is treated frame by frame as short-time features in the time domain, and a fast Fourier transform (FFT) converts these time-domain features into frequency-domain features, fft_signal = numpy.fft.fft(Frame.T);
Step b402: next comes the frequency conversion, namely filtering with a mel filterbank, in practice a set of simple triangular filters in the frequency domain. First the frequency in Hz is converted to mel frequency, because the loudness the human ear perceives is not linearly proportional to frequency; the mel scale is divided linearly, the mel frequencies are converted back to Hz, the corresponding Hz positions are located, and then the corresponding FFT bin positions are found, from which the filter expressions are finally built. The number of filters is filters_num = 20 and the FFT size NFFT defaults to 512; each filter is 0 at its first and third frequencies and 1 at its second. The mel triangular filterbank is computed as fb = get_filter_banks(filters_num=20, NFFT=512, samplerate=8000, low_freq=0, high_freq=None); the energy spectrum of each frame of this stretch of audio is then computed and summed, energy = numpy.sum(spec_power, 1); the filterbank and the energy spectrum are dot-multiplied, feat = numpy.dot(spec_power, fb.T); the function returns the two values feat and energy, and the logarithm of feat is taken, feat = numpy.log(feat);
Step b403: a discrete cosine transform (DCT) is applied to the log spectrum feat, keeping only the first 13 coefficients, feat = dct(feat, type=2, axis=1, norm='ortho')[:, :cep_num] with cep_num = 13; feat is then lifted in the cepstral domain, feat = lifter(feat, cep_lifter), finally yielding the MFCCs;
Further, in step d the present invention uses the servo control board 3-2 of model LSC-16-V1.1 to drive the servos and realize the facial expressions and neck motions; the UART serial port of the servo control board 3-2 receives the hexadecimal number and performs the corresponding expression and neck motion: the PWM signal on pin 1 drives the first servo to perform the opening-and-closing motion of the robot's chin, the PWM signal on pin 2 drives the second servo to perform the pitching motion of the robot's neck, and the PWM signal on pin 3 drives the third servo to perform the rotation of the robot's neck.

Claims (8)

1. An auxiliary management robot head device for community hospital department inquiry, characterized by comprising a facial device (1), a neck device (2) and a control system (3), the facial device (1) being arranged above the neck device (2);
the facial device (1) comprises a face support (1-1), a voice module (1-2) and a mouth device (1-3);
the neck device (2) comprises a neck support (2-1) and a neck motion device (2-2);
the control system (3) is based on a DSP chip and comprises a main control board (3-1) and a servo control board (3-2); the main control board (3-1) and the servo control board (3-2) actuate the facial device (1) and the neck device (2) through serial communication, completing highly human-like dialogue motions.
2. The auxiliary management robot head device for community hospital department inquiry according to claim 1, characterized in that the face support (1-1) comprises a first face support (1-1A) and a second face support (1-1B); at the positions imitating the human eyes the second face support is machined with two circular through-holes, and at the position imitating the nose with one circular through-hole; the first face support (1-1A) is arranged perpendicular to the second face support (1-1B).
3. The auxiliary management robot head device for community hospital department inquiry according to claim 1, characterized in that the voice module (1-2) comprises an electret condenser microphone (1-2A), a voice acquisition module (1-2B) and a loudspeaker (1-2C); the electret condenser microphone (1-2A) is fitted in the nose-shaped circular through-hole of the second face support (1-1B); the voice acquisition module (1-2B) and the main control board (3-1) are both fixed on the upper surface of the first face support (1-1A), and the loudspeaker (1-2C) is fixed at the rear edge of the first face support (1-1A).
4. The auxiliary management robot head device for community hospital department inquiry according to claim 1, characterized in that the mouth device (1-3) comprises a first fixing frame (1-3A), a first servo (1-3B), a first servo horn (1-3C), a first swing rod (1-3D), a mouth connector (1-3E) and a mouth component (1-3F); the output shaft of the first servo (1-3B) is in driving connection with the first servo horn (1-3C), and the first servo horn (1-3C) is fixedly connected to one end of the first swing rod (1-3D) to realize the opening-and-closing motion of the mouth; the other end of the first swing rod (1-3D) is fixed to the mouth connector (1-3E); the front face of the mouth connector (1-3E) abuts the rear face of the mouth component (1-3F), and the first servo (1-3B) is fixed to the lower surface of the first face support (1-1A) through the first fixing frame (1-3A).
5. The auxiliary management robot head device for community hospital department inquiry according to claim 1, characterized in that the neck support (2-1) comprises a neck pillar (2-1A), a first neck support (2-1B) and a second neck support (2-1C); the neck pillar (2-1A) connects the first face support (1-1A) with the first neck support (2-1B), and the first neck support (2-1B) and the second neck support (2-1C) are arranged horizontally.
6. The auxiliary management robot head device for community hospital department inquiry according to claim 1, characterized in that the neck motion device (2-2) comprises a second servo bracket (2-2A), a second servo horn (2-2B), a pan-tilt bracket (2-2C), a third servo horn (2-2D), a second servo (2-2E) and a third servo (2-2F); the second servo (2-2E) is connected to the first neck support (2-1B) through the second servo bracket (2-2A), and its output shaft is in driving connection with the second servo horn (2-2B); the second servo horn (2-2B) is nested in the driving groove at the upper end of the pan-tilt bracket (2-2C) to transmit the power needed for the robot's vertical pitching motion; the lower end of the third servo (2-2F) is fixed to the pan-tilt bracket (2-2C), its output shaft passes through the lower face of the pan-tilt bracket (2-2C) and is in driving connection with the third servo horn (2-2D), and the third servo horn (2-2D) is nested in the driving groove at the upper end of the second neck support (2-1C) to provide the power needed for the robot's neck rotation.
7. The auxiliary management robot head device for community hospital department inquiry according to claim 1, characterized in that the control system (3) is based on a DSP chip and comprises a main control board (3-1) and a servo control board (3-2) connected by serial communication; on the main control board, the SCLKX1 pin of the DSP chip connects to the SCLK pin of the voice acquisition module (1-2B) and supplies a 12 MHz clock; the BFSX1 pin of the DSP chip connects to the CS pin of the voice acquisition module (1-2B) for chip select; the BDX1 pin of the DSP chip connects to the SDIN pin of the voice acquisition module (1-2B) for the control interface; BCLKX0 and BCLKR0 of the DSP chip both connect to the BCLK pin of the voice acquisition module (1-2B) as the clock for synchronous data transmission; the BFSX0 and BFSR0 pins of the DSP chip connect to the LRCIN and LRCOUT pins of the voice acquisition module (1-2B) as the input and output frame-synchronization signals of I2S-format data; the BDX0 pin of the DSP chip connects to the DIN pin of the voice acquisition module (1-2B) as the stereo DAC input; the BDR0 pin of the DSP chip connects to the DOUT pin of the voice acquisition module (1-2B) as the I2S-format data output; the MICIN of the voice acquisition module (1-2B) connects to the electret condenser microphone (1-2A) for monaural sound pickup; the LOUT of the voice acquisition module (1-2B) connects to the loudspeaker (1-2C) for sound output; the EMIFA peripheral of the DSP chip on the main control board assigns the CE1 external memory space to SDRAM and the CE2 external memory space to FLASH; the UART1_RX and UART1_DX pins of the DSP chip connect to the TX and RX pins of the servo control board (3-2) respectively, and the S1, S2 and S3 pins of the servo control board (3-2) connect to the digital control terminals of the first servo (1-3B), the second servo (2-2E) and the third servo (2-2F) respectively; all three servos are powered by a 5 V supply, and the on-board power is a 3.3 V supply.
8. A control method implemented with the auxiliary management robot head device for community hospital department inquiry according to any one of claims 1 to 7, characterized by comprising the following steps:
Step a: the electret condenser microphone (1-2A) acquires the client's sound wave (monophonic) in real time, and the voice acquisition module (1-2B) performs A/D conversion and decoding on the sound wave and finally outputs an electrical signal;
Step b: the decoded electrical signal from the voice acquisition module (1-2B) is input to the digital signal processor (DSP chip) for data processing, and the recognition network outputs the recognized text information;
Step c: the recognized text information is compared against the keyword text recognition list LD_AsrAddFixed(); on a successful match, the corresponding hexadecimal number is output;
Step d: the DSP chip outputs the hexadecimal number corresponding to the real-time voice through the UART serial peripheral to the steering engine control board (3-2) of model LSC-16-V1.1, which controls the cervical motion mechanism and the mouth opening-and-closing motor; the audio MP3 data corresponding to the recognized voice information is written into the FIFO register one byte at a time (nMp3Pos++) and passed to the voice acquisition module (1-2B), which outputs the answer voice through the loudspeaker (1-2C);
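Steps c and d together amount to a table lookup followed by two output actions. The following Python sketch illustrates that flow in outline only; the keyword table and the send_hex and play_mp3 callbacks are hypothetical stand-ins, since the patent executes this on the DSP chip itself rather than on a host:

def dispatch(recognized_text, keyword_table, send_hex, play_mp3):
    # keyword_table: recognized text -> (hex command, answer MP3 bytes); entries are hypothetical
    entry = keyword_table.get(recognized_text)
    if entry is None:
        return False              # no keyword matched, nothing to actuate
    code, mp3_data = entry
    send_hex(code)                # UART command to the LSC-16-V1.1 servo board (step d)
    for b in mp3_data:            # byte-wise FIFO write, mirroring nMp3Pos++
        play_mp3(b)
    return True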
Further, the data processing performed by the DSP chip in step b specifically comprises:
Step b1: pre-processing of the speech digital signal, specifically comprising pre-emphasis, windowed framing of the voice signal, and speech enhancement. Pre-emphasis raises the spectral magnitude of the high-frequency part and suitably suppresses the spectral amplitude of the low-frequency part, removing the influence of lip radiation and flattening the spectrum so that the speech sounds more melodious. Pre-emphasis is realized with a first-order FIR high-pass digital filter whose transfer function is H(z) = 1 - μz^(-1), where μ is the pre-emphasis coefficient; the present invention selects μ = 0.98. Letting s(n) be the speech sample value at time n, the pre-emphasis result is y(n) = s(n) - μ·s(n-1). In code: emphasized_signal = numpy.append(data[0], data[1:] - pre_emphasis * data[:-1]), where emphasized_signal is the pre-processed audio data, data holds the sampled values, and pre_emphasis is the pre-emphasis coefficient 0.98;
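A self-contained version of the pre-emphasis one-liner above, as a minimal sketch in which a synthetic sine wave stands in for real microphone samples:

import numpy

fs = 8000                                                      # sampling rate used throughout
data = numpy.sin(2 * numpy.pi * 440 * numpy.arange(fs) / fs)   # stand-in for 1 s of audio
pre_emphasis = 0.98
emphasized_signal = numpy.append(data[0], data[1:] - pre_emphasis * data[:-1])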
Step b2: because a voice signal is only short-time stationary, it must be split into windowed frames. Each frame is 20 ms long; at a sampling frequency of 8000 Hz this gives a frame size of nw = 160 samples. The ratio of the frame shift inc to the frame length nw is generally taken between 0 and 0.5; the present invention takes a frame shift of inc = 40 samples. In practice, a window function of fixed width is slid along the time axis to intercept the signal segment by segment; the present invention uses a Hamming window, winfunc = signal.hamming(nw). The frame count is computed as nf = int(numpy.ceil((1.0 * signal_length - nw + inc) / inc)) and the total non-overlapping length as pad_length = int((nf - 1) * inc + nw); the original signal is zero-padded to pad_length, with zeros = numpy.zeros((pad_length - signal_length,)) merged into one array pad_signal = numpy.concatenate((signal, zeros)). The frame index matrix is obtained as indices = numpy.tile(numpy.arange(0, nw), (nf, 1)) + numpy.tile(numpy.arange(0, nf*inc, inc), (nw, 1)).T, the voice signal is then framed with frames = pad_signal[indices], and finally the framed signal is windowed with win = numpy.tile(winfunc, (nf, 1));
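The framing fragments above assemble into one function; this is a sketch, with numpy.hamming used in place of signal.hamming so the snippet needs only numpy:

import numpy

def enframe(signal, nw=160, inc=40):
    # split a 1-D signal into overlapping frames and apply a Hamming window
    signal_length = len(signal)
    nf = int(numpy.ceil((1.0 * signal_length - nw + inc) / inc))
    pad_length = int((nf - 1) * inc + nw)
    zeros = numpy.zeros((pad_length - signal_length,))
    pad_signal = numpy.concatenate((signal, zeros))
    indices = numpy.tile(numpy.arange(0, nw), (nf, 1)) + \
              numpy.tile(numpy.arange(0, nf * inc, inc), (nw, 1)).T
    frames = pad_signal[indices]                   # (nf, nw) matrix, one frame per row
    win = numpy.tile(numpy.hamming(nw), (nf, 1))
    return frames * win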
Step b3: from the windowed, framed voice information, voice segments must be separated from non-speech segments, so as to reduce the DSP chip's later speech-coding load and the mobile device's energy consumption. Compared with ordinary energy-based endpoint detection, the short-time energy of unvoiced parts is small, and missing the unvoiced parts leaves the speech fragments incomplete and the recognition inaccurate. The present invention therefore adopts an improved double-threshold endpoint detection algorithm: short-time energy detection first separates silent segments from voiced segments, and short-time average zero-crossing-rate detection then distinguishes the unvoiced parts from noise and extracts them from the silent segments; unvoiced parts generally precede voiced ones. The ordinary double-threshold comparison algorithm gives good results only at high signal-to-noise ratios, and its recognition rate drops sharply under the complex channel conditions of a hospital, where sudden noise often produces very high short-time energy or short-time average zero-crossing rate and increases the workload and error rate of later speech recognition. The present invention uses the dual detection of short-time energy and short-time average zero-crossing rate to identify unvoiced speech segments, enhancing recognition accuracy in low-SNR environments;
The criterion for judging a frame as reliably unvoiced is that its short-time average zero-crossing rate Z_n exceeds the zero-crossing-rate threshold; an unvoiced energy-threshold criterion is added, so that short-time energy and short-time average zero-crossing rate jointly verify the unvoiced part. The short-time average zero-crossing rate is computed as Z = Σ_{t=1}^{T-1} Π{s_t · s_{t-1} < 0}, where s is the sampled value, T is the frame length, and the function Π{X} is 1 when X is true and 0 otherwise. The framed, windowed voice information frameData = Frame is fed into the function ZCR() that computes the short-time average zero-crossing rate, iterating over all frames of the voice signal with for i in range(frameNum), where frameNum is the total frame count: each frame is multiplied with a one-sample shift of itself, temp = singleFrame[:frameSize-1] * singleFrame[1:frameSize]; the sign is taken, temp = numpy.sign(temp); and the short-time average zero-crossing rate is then zcr[i] = numpy.sum(temp < 0);
If the n-th frame of the voice signal is x_n(m) and its short-time energy is denoted E_n, the calculation formula is E_n = Σ_{m=0}^{T-1} x_n(m)². The framed, windowed voice information frameData = Frame is fed into the energy() function, which iterates over all frames of the voice signal with for i in range(frameNum), where frameNum is the total frame count, and multiplies each frame's values by themselves to obtain the short-time energy: ener[i] = sum(singleframe * singleframe);
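The two per-frame measures combine naturally into one pass over the frame matrix; a sketch, assuming frames comes from the enframe() sketch above:

import numpy

def short_time_features(frames):
    frameNum, frameSize = frames.shape
    energy = numpy.zeros(frameNum)
    zcr = numpy.zeros(frameNum)
    for i in range(frameNum):
        singleFrame = frames[i]
        energy[i] = numpy.sum(singleFrame * singleFrame)             # short-time energy
        temp = singleFrame[:frameSize - 1] * singleFrame[1:frameSize]
        zcr[i] = numpy.sum(numpy.sign(temp) < 0)                     # sign changes between neighbours
    return energy, zcr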
First, candidate voice segments are delimited with the higher energy threshold MH. When the short-time energy of frame i exceeds MH and frame i lies more than 49.5 frames after the last recorded boundary, the frame is recorded as a boundary; if it exceeds MH within 49.5 frames of the last boundary, the duration falls below the basic time of a speech utterance (frame indices advance with time, and 49.5 frames correspond to 500 ms here), so the segment is not recorded as a voice segment but judged to be burst noise and the last boundary is discarded. The implementation code is as follows:
if energy[i] > MH and (not A or i - 49.5 > A[-1]):
    A.append(i)          # far enough from the last boundary: record a new one
elif energy[i] > MH and A and i - 49.5 <= A[-1]:
    A = A[:-1]           # too close: treat as burst noise and drop the last boundary
This yields the preliminary voice-segment matrix A[], which is further refined with the lower energy threshold ML. A[] is traversed and, by the parity of the index j (per if j % 2 == ..., even entries being segment start frames and odd entries segment end frames), start and end points are processed separately: for a start frame whose energy exceeds the lower threshold ML, while i < len(energy) and energy[i] > ML the frame pointer is advanced (i = i + 1) until the energy falls to ML or below, extending the voice segment forward; similarly, for an end frame whose energy exceeds ML, while i > 0 and energy[i] > ML the pointer is moved the other way (i = i - 1), extending the voice segment backward; the extended segments are stored in matrix B[];
Finally, the voice segments B[] obtained from energy detection are further extended over unvoiced parts by short-time average zero-crossing-rate detection. Iterating over the segments with for j in range(len(B)), start and end points are again processed separately by the parity of j (if j % 2 == 1); the frame index i = B[j] is taken and, provided i is less than the total frame count len(zeroCrossingRate), the speech frame is extended forward (i = i + 1) while the short-time average zero-crossing rate at the index point exceeds 3 times the zero-crossing-rate average Zs, and backward (i = i - 1) otherwise; the results are recorded in the C[] array with C.append(i);
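The two refinement passes can be sketched as below. The translated text is ambiguous about the expansion directions, so this follows one consistent reading (start frames walk backward, end frames walk forward) and should be taken as an illustration rather than the patent's exact procedure:

def refine_segments(energy, zcr, A, ML, Zs):
    # A: alternating start/end frame indices found with the high threshold MH
    B = []
    for j in range(len(A)):
        i = A[j]
        if j % 2 == 0:                                   # segment start
            while i > 0 and energy[i] > ML:
                i -= 1                                   # grow outwards while energy > ML
        else:                                            # segment end
            while i < len(energy) - 1 and energy[i] > ML:
                i += 1
        B.append(i)
    C = []
    for j in range(len(B)):
        i = B[j]
        if j % 2 == 0:                                   # absorb leading unvoiced frames
            while i > 0 and zcr[i] > 3 * Zs:
                i -= 1
        else:                                            # absorb trailing unvoiced frames
            while i < len(zcr) - 1 and zcr[i] > 3 * Zs:
                i += 1
        C.append(i)
    return C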
Step b4: the present invention performs parametric analysis of the voice signal using Mel-frequency cepstral coefficient (MFCC) feature extraction, in order to obtain a set of data that can characterize this segment of the voice signal;
Step b5: with MFCC as the characteristic parameter of the voice signal, voice-signal parameter matching is performed using a Hidden Markov Model (HMM);
Step b6: the perplexity of each candidate result is then computed by a Recurrent Neural Network (RNN) language model, and the candidate recognition result with the minimum perplexity is determined as the target result; the RNN language model is obtained by training on the Chinese Wikipedia corpus, and the sound wave is finally recognized as a text sentence;
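Neither the RNN itself nor the Wikipedia training is reproduced in the text; assuming the language model exposes per-token log-probabilities, the perplexity-based rescoring reduces to the following sketch:

import math

def perplexity(log_probs):
    # log_probs: natural-log probability of each token under the RNN language model
    return math.exp(-sum(log_probs) / len(log_probs))

def pick_best(candidates):
    # candidates: list of (sentence, per-token log-prob list) pairs
    return min(candidates, key=lambda c: perplexity(c[1]))[0]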
Further, the specific steps of the Mel-frequency cepstral coefficient (MFCC) selection of the present invention described in step b4 are as follows:
Step b401: the speech fragments extracted after endpoint detection are treated, frame by frame, as short-time features in the time domain; a Fast Fourier Transform (FFT) then converts the time-domain features into frequency-domain features: fft_signal = numpy.fft.fft(Frame.T);
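The spec_power array used in step b402 is not defined anywhere in the text; a definition consistent with its later use (an assumption, not the patent's own code) is the per-frame power spectrum:

import numpy

NFFT = 512
spec = numpy.fft.rfft(frames, NFFT)            # frames: (nf, nw) windowed frames from step b2
spec_power = (numpy.abs(spec) ** 2) / NFFT     # power spectrum, shape (nf, NFFT/2 + 1)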
Step b402: next comes frequency warping, implemented by filtering with a Mel filter bank, which in practice is a set of simple triangular filters in the frequency domain. First, frequencies in Hz are converted to Mel frequencies, because the human ear's resolution of sound is not linearly proportional to frequency; the Mel frequencies are spaced linearly and then converted back to Hz, the corresponding Hz positions are found, and from them the corresponding positions in the FFT are located. Finally the filter expressions are established: each filter is 0 at its first and third frequencies and 1 at its second frequency. The number of filters defaults to filters_num = 20 and the FFT size NFFT to 512, and the triangular filter bank is computed as fb = get_filter_banks(filters_num=20, NFFT=512, samplerate=8000, low_freq=0, high_freq=None). The energy spectrum of each frame of the audio segment is then computed and summed, energy = numpy.sum(spec_power, 1); the filters and the power spectrum are combined by a dot product, feat = numpy.dot(spec_power, fb.T); the function returns the two values feat and energy, and feat is put through a logarithm operation, feat = numpy.log(feat);
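The patent calls get_filter_banks() without reproducing it; the following is a standard construction matching the description (filters_num + 2 points equally spaced on the Mel scale, each triangular filter 0 at its outer frequencies and 1 at its centre), offered as a sketch:

import numpy

def hz2mel(hz):
    return 2595 * numpy.log10(1 + hz / 700.0)

def mel2hz(mel):
    return 700 * (10 ** (mel / 2595.0) - 1)

def get_filter_banks(filters_num=20, NFFT=512, samplerate=8000, low_freq=0, high_freq=None):
    high_freq = high_freq or samplerate / 2
    mel_points = numpy.linspace(hz2mel(low_freq), hz2mel(high_freq), filters_num + 2)
    bins = numpy.floor((NFFT + 1) * mel2hz(mel_points) / samplerate).astype(int)
    fb = numpy.zeros((filters_num, NFFT // 2 + 1))
    for m in range(1, filters_num + 1):
        for k in range(bins[m - 1], bins[m]):            # rising edge of triangle m
            fb[m - 1, k] = (k - bins[m - 1]) / (bins[m] - bins[m - 1])
        for k in range(bins[m], bins[m + 1]):            # falling edge of triangle m
            fb[m - 1, k] = (bins[m + 1] - k) / (bins[m + 1] - bins[m])
    return fb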
Step b403: a discrete cosine transform (DCT) is applied to the above log spectrum feat, keeping only the first 13 coefficients: feat = dct(feat, type=2, axis=1, norm='ortho')[:, :cep_num], where cep_num = 13; feat is then passed through cepstral liftering, feat = lifter(feat, cep_lifter), finally yielding the MFCC;
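The lifter() function is likewise used without being defined; a common sinusoidal lifter (an assumption, with the conventional cep_lifter = 22) completes the step:

import numpy
from scipy.fftpack import dct

def lifter(cepstra, cep_lifter=22):
    # sinusoidal liftering de-emphasizes the higher-order cepstral coefficients
    n = numpy.arange(cepstra.shape[1])
    lift = 1 + (cep_lifter / 2.0) * numpy.sin(numpy.pi * n / cep_lifter)
    return cepstra * lift

def mfcc_from_log_fbank(feat, cep_num=13):
    # feat: (nf, filters_num) log filter-bank energies from step b402
    feat = dct(feat, type=2, axis=1, norm='ortho')[:, :cep_num]
    return lifter(feat)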
Further, in step d the present invention controls servo rotation with the steering engine control board (3-2) of model LSC-16-V1.1 to realize facial expressions and cervical motion. The UART serial port of the steering engine control board (3-2) receives hexadecimal numbers and completes the corresponding facial actions and cervical motions: the PWM signal output on pin 1 drives the rotation of the first steering engine to perform the robot's chin opening-and-closing action; the PWM signal output on pin 2 drives the rotation of the second steering engine to perform the robot's neck turning motion; and the PWM signal output on pin 3 drives the rotation of the third steering engine to perform the robot's neck pitching motion.
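On the DSP this is a register-level UART write; purely as an illustration, the same command flow issued from a host computer might look like the sketch below. The port name, baud rate, and command bytes are placeholders: the real LSC-16-V1.1 frame format comes from the board's manual and is not given in the patent text.

import serial  # pyserial

ACTION_CODES = {               # placeholder hexadecimal commands, not the board's real protocol
    "mouth_open_close": 0x01,  # pin 1 servo: chin opening and closing
    "neck_turn": 0x02,         # pin 2 servo: neck turning
    "neck_pitch": 0x03,        # pin 3 servo: neck pitching
}

def send_action(action, port="/dev/ttyUSB0", baud=9600):
    with serial.Serial(port, baud, timeout=1) as ser:
        ser.write(bytes([ACTION_CODES[action]]))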
CN201910321974.2A 2019-04-22 2019-04-22 Added Management robot head device and control for the inquiry of department, community hospital Pending CN110047480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910321974.2A CN110047480A (en) 2019-04-22 2019-04-22 Added Management robot head device and control for the inquiry of department, community hospital

Publications (1)

Publication Number Publication Date
CN110047480A true CN110047480A (en) 2019-07-23

Family

ID=67278151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910321974.2A Pending CN110047480A (en) 2019-04-22 2019-04-22 Added Management robot head device and control for the inquiry of department, community hospital

Country Status (1)

Country Link
CN (1) CN110047480A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160835A (en) * 2021-04-23 2021-07-23 河南牧原智能科技有限公司 Pig voice extraction method, device, equipment and readable storage medium
CN113305867A (en) * 2021-05-20 2021-08-27 上海纳深机器人有限公司 Robot control circuit and control system supporting various AI (Artificial Intelligence) programming

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004033624A (en) * 2002-07-05 2004-02-05 Nti:Kk Remote controller by pet type robot
CN102184732A (en) * 2011-04-28 2011-09-14 重庆邮电大学 Fractal-feature-based intelligent wheelchair voice identification control method and system
CN202677367U (en) * 2011-12-30 2013-01-16 南阳首控光电有限公司 Digital signal processor (DSP) speech recognition applied to laser large screen split joint control system
CN106056207A (en) * 2016-05-09 2016-10-26 武汉科技大学 Natural language-based robot deep interacting and reasoning method and device
CN106328122A (en) * 2016-08-19 2017-01-11 深圳市唯特视科技有限公司 Voice identification method using long-short term memory model recurrent neural network
CN106782550A (en) * 2016-11-28 2017-05-31 黑龙江八农垦大学 A kind of automatic speech recognition system based on dsp chip
KR20180115602A (en) * 2017-04-13 2018-10-23 인하대학교 산학협력단 Imaging Element and Apparatus for Recognition Speech Production and Intention Using Derencephalus Action
CN206961514U (en) * 2017-06-13 2018-02-02 云南天罡北斗信息科技有限公司 A kind of voice inverter of hearing disfluency disabled person
CN108942973A (en) * 2018-09-29 2018-12-07 哈尔滨理工大学 Science and technology center's guest-greeting machine department of human head and neck device with temperature and humidity casting function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dai Guoqiang et al.: "Science and Technology Big Data" (《科技大数据》), Scientific and Technical Documentation Press (科学技术文献出版社), 31 August 2018 *


Similar Documents

Publication Publication Date Title
Chapaneri Spoken digits recognition using weighted MFCC and improved features for dynamic time warping
CN102543073B (en) Shanghai dialect phonetic recognition information processing method
CN107610715B (en) Similarity calculation method based on multiple sound characteristics
Rabiner et al. A comparative performance study of several pitch detection algorithms
CN108369813A (en) Specific sound recognition methods, equipment and storage medium
CN105206271A (en) Intelligent equipment voice wake-up method and system for realizing method
CN109215665A (en) A kind of method for recognizing sound-groove based on 3D convolutional neural networks
Shanthi et al. Review of feature extraction techniques in automatic speech recognition
CN100521708C (en) Voice recognition and voice tag recoding and regulating method of mobile information terminal
CN102509547A (en) Method and system for voiceprint recognition based on vector quantization based
WO2014153800A1 (en) Voice recognition system
CN108847234B (en) Lip language synthesis method and device, electronic equipment and storage medium
CN109215634A (en) A kind of method and its system of more word voice control on-off systems
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
CN109452932A (en) A kind of Constitution Identification method and apparatus based on sound
CN109065073A (en) Speech-emotion recognition method based on depth S VM network model
CN110047480A (en) Added Management robot head device and control for the inquiry of department, community hospital
CN111798846A (en) Voice command word recognition method and device, conference terminal and conference terminal system
CN114283822A (en) Many-to-one voice conversion method based on gamma pass frequency cepstrum coefficient
Ghai et al. A Study on the Effect of Pitch on LPCC and PLPC Features for Children's ASR in Comparison to MFCC.
WO2001029822A1 (en) Method and apparatus for determining pitch synchronous frames
Mini et al. Feature vector selection of fusion of MFCC and SMRT coefficients for SVM classifier based speech recognition system
CN111833869B (en) Voice interaction method and system applied to urban brain
Dumpala et al. Robust Vowel Landmark Detection Using Epoch-Based Features.
CN114724589A (en) Voice quality inspection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190723