CN108109633A

CN108109633A - System and method for unattended cloud speech corpus acquisition and smart product testing

Info

Publication number: CN108109633A
Application number: CN201711384472.1A
Authority: CN
Inventors: 靳源; 冯大航; 陈孝良; 苏少炜; 常乐
Original assignee: BEIJING WISDOM TECHNOLOGY Co Ltd
Current assignee: BEIJING WISDOM TECHNOLOGY Co Ltd; Beijing SoundAI Technology Co Ltd
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2018-06-01

Abstract

The present disclosure provides a system for unattended cloud speech corpus acquisition and smart product testing, comprising: a corpus data acquisition and storage unit, which captures original audio data and stores it in the cloud, and which includes a recording device, a self-service acquisition module, and a cloud server; and a test data generation and use unit, which batch-generates test audio signals of a specified specification from the original audio data in the cloud to test a smart product under test, and automatically aligns and annotates the device audio data that is captured and returned, and which includes a processing module and a playback device. The disclosure raises the proportion of valid captured data and the acquisition efficiency; the speaker can complete the acquisition process independently, so that acquisition is unattended; data is uploaded to the cloud in real time, which avoids accidental data loss; and test data in a specified format can be generated automatically in batches, with the device audio data aligned to the test data.

Description

System and method for unattended cloud speech corpus acquisition and smart product testing

Technical field

The present disclosure relates to the field of speech data acquisition, and in particular to a system and method for unattended cloud speech corpus acquisition and smart product testing.

Background technology

After a long period of development and accumulation, speech recognition technology has reached the level of large-scale commercial use in recent years, setting off a wave of research and development in smart homes, intelligent in-vehicle systems, and a range of speech recognition software. Achieving intelligent, natural, and effective human-machine interaction, and building an efficient and natural environment for human-machine communication, has become an urgent need in the application and development of information technology. Deep neural networks are an important research direction in speech recognition today; they need massive amounts of training speech data to train more accurate acoustic models and improve recognition accuracy. Building a large-scale, finely detailed, highly natural, and highly accurate speech corpus is likewise important to the stability of speech synthesis systems. As speech recognition products multiply, testers also need large amounts of speech data to verify product quality, and processing that speech data costs them considerable effort. In short, acquiring speech data efficiently, with high quality, and in bulk to build a speech corpus, and processing that data, have become particularly important.

First, traditional corpus acquisition requires recording staff to guide the speaker through recording the corpus material in a dedicated recording environment. This approach relies on a great deal of manual work: for example, recording staff must operate the software and configure the recording sound card, and lengthy editing and annotation follow, such as manually fixing misrecorded passages and balancing the volume of each audio segment. Acquisition efficiency and quality both suffer greatly as a result. Second, this approach usually stores the data on the acquisition device and uploads it to a cloud server as a whole afterwards, which carries many risks: an emergency during recording, such as a sudden power failure or sudden equipment damage, can leave the captured data unsaved, and accidental deletion during manual organization can likewise cause data loss. Finally, the traditional test method requires testers to splice the audio files of the corpus into a playback source and make a long recording on the smart product under test. Because of hardware problems in the device or problems in its internal audio processing algorithms, the recorded data is often misaligned with the original speech data, which significantly affects the accuracy of speech recognition rate testing, wake-up rate testing, and machine learning model training for smart products. Moreover, because different products have different amounts of memory, and therefore different limits on recording length, the size of the playback audio file must be adjusted for each product, which increases the workload of testers and developers.

Disclosure

(1) Technical problem to be solved

The present disclosure provides a system and method for unattended cloud speech corpus acquisition and smart product testing, to at least partly solve the technical problems set out above.

(2) Technical solution

According to one aspect of the present disclosure, a system for unattended cloud speech corpus acquisition and smart product testing is provided, comprising: a corpus data acquisition and storage unit, for capturing original audio data and storing it in the cloud, which includes a recording device for capturing the speaker's audio, a self-service acquisition module that obtains the audio captured by the recording device through a sound card and matches it with the corpus text to generate the original audio data, and a cloud server connected to the self-service acquisition module for storing the original audio data in the cloud; and a test data generation and use unit, for batch-generating test audio signals of a specified specification from the original audio data in the cloud to test the smart product under test, which includes a processing module connected to the cloud server for obtaining the original audio data from the cloud and generating the test audio signal, and a playback device connected to the processing module for playing the test audio signal under the control of the processing module, for testing the smart product under test.

In some embodiments of the present disclosure, the processing module is further configured to automatically align and annotate the device audio data that is captured and returned, including: obtaining the device audio data that the smart product under test generates by capturing the test audio signal; multiplying every time coordinate in the time annotation file of the original audio data by a ratio α to obtain new time coordinates; and generating the time annotation file of the device audio data, where the ratio α is the ratio of the duration of the device audio data to the duration of the original audio data.

In some embodiments of the present disclosure, the self-service acquisition module is further configured to display the number of texts already read and the number remaining, and to judge whether the speaker has misread; and to pause and resume recording during the recording process according to the speaker's operation.

According to another aspect of the present disclosure, a method for unattended cloud speech corpus acquisition and smart product testing is provided, comprising:

Step S1: the speaker completes corpus acquisition in self-service fashion using the recording device and the self-service acquisition module, and the audio data is uploaded to the cloud server in real time;

Step S2: the processing module retrieves the original audio data from the cloud, generates a test audio signal, and plays it through the playback device;

Step S3: the smart product under test captures the audio played by the playback device, generates device audio data, and returns it to the processing module; the processing module computes and generates the time annotation file of the device audio data and outputs the test result.

In some embodiments of the present disclosure, step S2 further comprises:

Step S21: configure the default test data duration and the duration of silence to insert between audio segments, and initialize the buffer;

Step S22: randomly select an audio segment from the corpus and splice it onto the preceding audio, cumulatively splicing silence and each audio segment in a loop, and compute the total audio length;

Step S23: compute and record the duration of each audio segment in the loop as the time annotation T_k, and generate the annotation text file;

Step S24: judge whether the total audio length exceeds the set length; if it does, go to step S25; if it does not, judge whether there is a new audio file; if there is, return to step S22, and if there is not, finish generating the test audio signal;

Step S25: insert chirp signals at both ends of the total signal. The chirp signal expression uses the following parameters: f_l is the start frequency of the swept-frequency signal, f_h is its end frequency, φ₀ is its phase, T is the duration, and A is the amplitude. Save the test audio, initialize the buffer, and go to step S22.
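Steps S21 to S25 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `build_test_signals` and `wrap_with_markers` are hypothetical, audio is modelled as plain lists of samples, time annotations are kept as sample indices, and a trailing batch shorter than the set length is dropped when the corpus runs out (the text does not say what happens to it).

```python
import random

def wrap_with_markers(body, labels, marker):
    """Put the marker (e.g. a chirp) at both ends and shift the labels to match."""
    shift = len(marker)
    signal = list(marker) + body + list(marker)
    shifted = [(name, start + shift, end + shift) for name, start, end in labels]
    return signal, shifted

def build_test_signals(clips, target_len, silence_len, marker, seed=0):
    """clips: list of (name, samples); all lengths are in samples.
    Returns a list of (signal, labels), where labels are (name, start, end)."""
    rng = random.Random(seed)
    pending = list(clips)
    rng.shuffle(pending)                          # S22: random selection from the corpus
    body, labels, outputs = [], [], []            # S21: initialise the buffer
    for name, samples in pending:
        start = len(body)
        body.extend(samples)                      # S22: splice onto the previous audio
        labels.append((name, start, len(body)))   # S23: record the time annotation
        body.extend([0.0] * silence_len)          # S22: silence between segments
        if len(body) > target_len:                # S24: set length exceeded
            outputs.append(wrap_with_markers(body, labels, marker))  # S25
            body, labels = [], []                 # S25: reset the buffer, back to S22
    return outputs                                # assumption: a short remainder is dropped
```

Dividing the recorded sample indices by the sampling rate gives the time coordinates in seconds that the annotation text file would hold.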

In some embodiments of the present disclosure, step S3 further comprises:

Step S31: the smart product under test captures the audio played by the playback device, generates device audio data, and returns it to the processing module; the processing module reads the generated original audio data and the device audio data;

Step S32: the processing module detects the head and tail endpoints of the chirp signals in the audio;

Step S33: use the time coordinates to compute the ratio of the duration of the device-captured audio data to the duration of the original test audio data:

α = (T_yend − T_ybeg) / (T_xend − T_xbeg)

where α is the ratio of the sampling rate of the device-captured audio to that of the test audio; T_ybeg is the start time of the device audio and T_yend is its end time; T_xbeg is the start time of the original test audio and T_xend is its end time.

Step S34: multiply every time coordinate in the original time annotation file by α to obtain new time coordinates, and generate the time annotation file of the device audio data.
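Steps S33 and S34 amount to one division and one scaling pass. The sketch below assumes annotations are held as (text, start, end) tuples in seconds; the function names are hypothetical. It applies only the scaling described in the text and does not model any constant offset between the two recordings.

```python
def duration_ratio(y_beg, y_end, x_beg, x_end):
    """alpha = device audio duration / original test audio duration (step S33)."""
    if x_end <= x_beg:
        raise ValueError("original test audio must have a positive duration")
    return (y_end - y_beg) / (x_end - x_beg)

def rescale_annotations(annotations, alpha):
    """Multiply every time coordinate by alpha (step S34)."""
    return [(text, start * alpha, end * alpha) for text, start, end in annotations]

# Example: chirp endpoints found at 1.0 s / 11.5 s in the device audio
# and at 0.5 s / 10.5 s in the original test audio.
alpha = duration_ratio(1.0, 11.5, 0.5, 10.5)
device_annotations = rescale_annotations([("wake word", 2.0, 4.0)], alpha)
```

A ratio other than 1 indicates that the device recording is stretched or compressed relative to the source, which is what the scaling corrects in the annotation file.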

In some embodiments of the present disclosure, step S32 further comprises:

Sub-step S321: generate a chirp signal identical to the one in the test audio signal, and reverse it in the time domain to obtain the matched filter h(t) = x(T − t);

Sub-step S322: convolve the first few tens of seconds of the device-captured audio data y(t) and of the original audio data x(t) with the matched filter, obtaining the matched filter outputs r₁(t) = h(t) * y(t) and r₂(t) = h(t) * x(t);

Sub-step S323: find the time coordinate of the maximum point of each matched filter output r₁(t), r₂(t) and take it as the time coordinate of the signal start point; the time coordinate of the signal end point is detected in the same way.
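Sub-steps S321 to S323 can be illustrated with a small pure-Python matched filter. This is a sketch under stated assumptions: the chirp is a sine-based linear sweep (the exact expression is not reproduced in this text), the function names are hypothetical, and a low sampling rate with short signals keeps the direct convolution cheap. The peak of a full convolution lies len(template) − 1 samples after the chirp onset; the sketch subtracts that constant, which cancels anyway when only differences between detected endpoints are used.

```python
import math

def linear_chirp(f_lo, f_hi, duration, fs, amp=1.0, phase=0.0):
    """Sine-based linear sweep from f_lo to f_hi (assumed form)."""
    n = int(round(duration * fs))
    rate = (f_hi - f_lo) / duration
    return [amp * math.sin(2 * math.pi * (f_lo * (i / fs) + 0.5 * rate * (i / fs) ** 2) + phase)
            for i in range(n)]

def matched_filter_peak(signal, template):
    """Convolve signal with the time-reversed template; return the index of the maximum."""
    h = template[::-1]                       # S321: h(t) = x(T - t)
    size = len(h)
    best_idx, best_val = 0, float("-inf")
    for n in range(len(signal) + size - 1):  # S322: full convolution r = h * y
        lo = max(0, n - (len(signal) - 1))
        hi = min(size - 1, n)
        acc = 0.0
        for k in range(lo, hi + 1):
            acc += h[k] * signal[n - k]
        if acc > best_val:                   # S323: track the maximum point
            best_idx, best_val = n, acc
    return best_idx

fs = 1000
chirp = linear_chirp(50.0, 200.0, 0.1, fs)
recording = [0.0] * 120 + chirp + [0.0] * 280
onset = matched_filter_peak(recording, chirp) - (len(chirp) - 1)
```

For recordings of realistic length an FFT-based convolution would be the usual choice; the direct loop is kept here only so the sketch needs no third-party library.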

In some embodiments of the present disclosure, step S1 further comprises:

Step S11: read the corpus text file information;

Step S12: judge whether recording is finished; if it is, complete the recording; if it is not, go to step S13;

Step S13: alternately display the wake-up word and the corpus text for the speaker to record, automatically computing the recording duration of each text segment from its length;

Step S14: each time an audio segment is captured, compute its time-domain average energy, take the difference from the set normalized energy value to obtain the amplification factor a, and upload the final normalized audio y_n = a·x_n to the cloud server for storage, where N is the total number of samples in the captured audio, x_n is the captured audio waveform sequence, Y_rms is the set average energy after normalization, and y_n is the audio waveform sequence after normalization;

Step S15: during recording, display in real time the number of texts already read and the number remaining;

Step S16: judge whether the speaker has misread; on a recording error, re-record and overwrite the previous data, then return to step S12.
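The volume normalization of step S14 can be sketched as below. The energy and gain formulas are not reproduced in this text, so the sketch assumes the common root-mean-square definition and a gain equal to the target level divided by the measured level; both are assumptions, as are the function names.

```python
import math

def rms(samples):
    """Root-mean-square level over all N samples (assumed energy measure)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def normalize_rms(samples, target_rms):
    """Return y_n = a * x_n with a chosen so that the output level equals target_rms."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)        # pure silence: nothing to scale
    a = target_rms / level          # amplification factor
    return [a * x for x in samples]

normalized = normalize_rms([0.5, -0.5, 0.5, -0.5], 0.1)
```

Normalizing each segment as it is captured means every uploaded file already sits at the same level, so no separate volume-balancing pass is needed later.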

In some embodiments of the present disclosure, before the corpus text is read, the method further comprises:

Step S10: collect the speaker's name information, which is used to name the saved recording files; set the wake-up word and configure the default recording parameters, including the recording sampling frequency and quantization precision.

In some embodiments of the present disclosure, the speaker controls pausing and resuming of the recording through the self-service acquisition module during the recording process.

(3) Advantageous effects

As the above technical solution shows, the system and method of the present disclosure for unattended cloud speech corpus acquisition and smart product testing have at least one of the following advantages:

1) During acquisition, each newly captured item is saved automatically: the program captures the text segment by segment, normalizes the audio volume, stores the result, and uploads the saved signal over WIFI to a designated cloud server on the same local area network. This arrangement solves the problem of captured data going unsaved when acquisition is interrupted unexpectedly, achieving upload-while-recording;

2) After acquisition, developers can directly generate and download from the cloud server speech data of custom duration with head and tail marker signals added, together with the corresponding time annotation text; after this speech data is used as the playback source to record on the smart product under test, a new annotation file can be generated automatically;

3) Because a recording device is used, no one needs to communicate with the speaker in real time; the speaker completes the acquisition work alone, which makes acquisition unattended. When the speaker misreads, a re-recording can be triggered on the spot, which improves acquisition efficiency and quality.

Description of the drawings

Fig. 1 is a schematic structural diagram of the system for unattended cloud speech corpus acquisition and smart product testing according to an embodiment of the present disclosure.

Fig. 2 is a flow chart of the method for unattended cloud speech corpus acquisition and smart product testing according to an embodiment of the present disclosure.

Fig. 3 is a flow chart of the automatic acquisition program according to an embodiment of the present disclosure.

Fig. 4 is a flow chart of test audio signal generation according to an embodiment of the present disclosure.

Fig. 5 is a flow chart of generating the time annotation file of the device audio data according to an embodiment of the present disclosure.

Specific embodiment

To make the purpose, technical solution, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.

Some embodiments of the present disclosure are described more fully below with reference to the accompanying drawings, in which some but not all embodiments are shown. Indeed, the various embodiments of the disclosure can be realized in many different forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the disclosure satisfies applicable legal requirements.

In a first exemplary embodiment of the present disclosure, a system for unattended cloud speech corpus acquisition and smart product testing is provided. Fig. 1 is a schematic structural diagram of this system according to the first embodiment. As shown in Fig. 1, the system comprises a corpus data acquisition and storage unit and a test data generation and use unit.

Each component of the system of this embodiment is described in detail below.

The corpus data acquisition and storage unit captures the original audio data and stores it in the cloud, and includes:

A recording device, for capturing the speaker's audio and generating the corpus material; preferably, the recording device uses a recording microphone and a computer sound card;

A self-service acquisition module, which obtains through the sound card the corpus material captured by the recording device and generates the original audio data; preferably, the self-service acquisition module is a self-service acquisition PC;

A cloud server, connected to the self-service acquisition module, for storing the original audio data in the cloud; the cloud server and the self-service acquisition module are connected over WIFI or a wired connection;

The test data generation and use unit is mainly used to batch-generate test audio signals of a specified specification to test the smart product under test, and to automatically align and annotate the captured data that is returned. It includes:

A playback device, connected to the processing module, for playing the test data;

A processing module, connected to the cloud server, for obtaining the original audio data from the cloud, generating the test audio signal, obtaining the returned device audio data, and generating the time annotation file of the device audio data. Preferably, the processing module is a developer's PC; the processing module is connected to the cloud server over WIFI, Bluetooth, infrared, or a wired connection.

The smart product under test is connected to the processing module, captures the audio output by the playback device, and returns it to the processing module. The processing module multiplies every time coordinate in the time annotation file of the original audio data by the ratio α to obtain new time coordinates and generates the time annotation file of the device audio data, where α is the ratio of the duration of the device audio data to that of the original audio data.

The self-service acquisition module is further configured to display the number of texts already read and the number remaining, and to judge whether the speaker has misread; and to pause and resume recording during the recording process according to the speaker's operation.

This completes the description of the system for unattended cloud speech corpus acquisition and smart product testing of the first embodiment of the present disclosure.

In a second exemplary embodiment of the present disclosure, a method for unattended cloud speech corpus acquisition and smart product testing is provided. Fig. 2 is a flow chart of this method according to an embodiment of the present disclosure. As shown in Fig. 2, the method comprises:

Step S1: the speaker completes corpus acquisition in self-service fashion using the recording device and the self-service acquisition module, and the audio data is uploaded to the cloud server in real time over WIFI.

Step S2: the processing module retrieves the original audio data from the cloud, generates a test audio signal, and plays it through the playback device.

Step S3: the smart product under test captures the audio played by the playback device, generates device audio data, and returns it to the processing module; the processing module computes the ratio of the durations of the original audio data and the device audio data, generates the time annotation file of the device audio data, and outputs the test result.

Fig. 3 is a flow chart of corpus acquisition according to an embodiment of the present disclosure. As shown in Fig. 3, step S1 further comprises:

Step S11: read the corpus text file information;

Step S13: the speaker watches the alternately displayed wake-up word and corpus text and records them; the program automatically computes the recording duration of each text segment from its length;

Step S16: judge whether the speaker has misread; on a recording error, a re-recording can be triggered to overwrite the previous data, and the flow returns to step S12. Preferably, re-recording to overwrite the previous data includes deleting the previous item from the cloud and reading the previous corpus text again.

During the above recording process, the speaker can pause and resume.
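The bookkeeping behind this recording flow can be sketched as follows. Everything named here is hypothetical: the class `RecordingSession`, the helper functions, and in particular the per-character rate and margin used to derive a recording duration from text length, since the text only says that the duration is computed from the length.

```python
def build_prompts(wake_word, sentences):
    """Alternate the wake-up word with each corpus sentence."""
    prompts = []
    for sentence in sentences:
        prompts.append(wake_word)
        prompts.append(sentence)
    return prompts

def recording_seconds(text, seconds_per_char=0.3, margin=1.0):
    """Assumed rule: fixed margin plus a per-character allowance."""
    return margin + seconds_per_char * len(text)

class RecordingSession:
    def __init__(self, prompts):
        self.prompts = prompts
        self.takes = {}                  # prompt index -> recorded audio

    def save_take(self, index, audio):
        """Saving to an existing index overwrites it, as a re-recording would."""
        self.takes[index] = audio

    def progress(self):
        """Return (texts already read, texts remaining)."""
        done = len(self.takes)
        return done, len(self.prompts) - done

session = RecordingSession(build_prompts("hello", ["turn on the light", "play music"]))
session.save_take(0, b"first attempt")
session.save_take(0, b"second attempt")  # misread: re-record over the same entry
```

Keying takes by prompt index keeps the read and remaining counters correct after a re-recording, because the overwritten take is not counted twice.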

Before the corpus text is read, the method further comprises:

Step S10: collect the speaker's name information, which is used to name the saved recording files; set the wake-up word and configure the default recording parameters, such as the recording sampling frequency and quantization precision.

Fig. 4 is a flow chart of test audio signal generation according to an embodiment of the present disclosure. As shown in Fig. 4, step S2 further comprises:

Step S25: insert linear frequency-modulated signals, i.e. chirp signals, at both ends of the total signal. The chirp signal expression uses the following parameters: f_l is the start frequency of the swept-frequency signal, f_h is its end frequency, φ₀ is its phase, and T is the duration. In this embodiment the sweep runs from f_l = 2000 Hz to f_h = 8000 Hz, the duration T is 500 ms, the amplitude A is 1, and φ₀ is 0. Save the test audio, initialize the buffer, and go to step S22.
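For reference, a linear chirp consistent with the parameters defined above can be written in the standard textbook form below. This form is an assumption: the expression itself is not reproduced in this text and may differ, for example by using a cosine instead of a sine.

```latex
x(t) = A \sin\!\left( 2\pi \left( f_l\, t + \frac{f_h - f_l}{2T}\, t^{2} \right) + \varphi_0 \right),
\qquad 0 \le t \le T

% instantaneous frequency of this sweep
f(t) = f_l + \frac{f_h - f_l}{T}\, t
```

With the values of this embodiment (f_l = 2000 Hz, f_h = 8000 Hz, T = 0.5 s), the instantaneous frequency rises at 12000 Hz per second, that is f(t) = 2000 + 12000·t Hz for t between 0 and 0.5 s.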

Fig. 5 is a flow chart of generating the time annotation file of the device audio data of the smart product under test in this embodiment. As shown in Fig. 5, step S3 further comprises:

Step S31: the smart product under test captures the audio played by the playback device, generates device audio data, and returns it to the processing module; the processing module reads the generated original audio data and the device audio data. In this embodiment, the playback device is a loudspeaker;

Step S33: use the time coordinates to compute the ratio of the duration of the device-captured audio to the duration of the original test audio data:

α = (T_yend − T_ybeg) / (T_xend − T_xbeg)

This ratio is the ratio of the sampling rate of the device-captured audio to that of the test audio; T_ybeg is the start time of the device audio and T_yend is its end time; T_xbeg is the start time of the original test audio and T_xend is its end time;

Step S34: read the length information of the text annotations corresponding to the original audio, i.e. the time coordinates in the original time annotation file, multiply every time coordinate by α to obtain new time coordinates, and generate the time annotation file of the device audio data.

Step S32 further comprises:

Sub-step S323: find the time coordinate of the maximum point of each matched filter output r₁(t), r₂(t) and take it as the time coordinate of the signal start point; the time coordinate of the signal end point can be detected in the same way.

With the corpus acquisition method of the present invention, valid data accounts for more than 80% of the total, and acquisition efficiency is also greatly improved. The speaker can complete the acquisition process independently, so acquisition is unattended and the labor cost of recording is saved. Data is uploaded to the cloud in real time, avoiding accidental data loss. Test data in a specified format can be generated automatically in batches, the data captured and returned by a smart speech product can be aligned with the test data, and accurate annotation information is generated.

For brevity, any technical feature of embodiment 1 above that applies equally here is incorporated by reference and is not described again.

This completes the description of the method for unattended cloud speech corpus acquisition and smart product testing of the second embodiment of the present disclosure.

The embodiments of the present disclosure have now been described in detail with reference to the accompanying drawings. Note that implementations not shown or described in the drawings or the text of the specification take forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements and methods are not limited to the specific structures, shapes, or modes mentioned in the embodiments, and those of ordinary skill in the art may simply change or replace them.

The shapes and sizes of the components in the drawings do not reflect actual sizes and proportions and merely illustrate the content of the embodiments of the present disclosure. In addition, in the claims, any reference sign placed between parentheses shall not be construed as limiting the claim.

Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

In addition, unless steps are specifically described or must occur in sequence, the order of the above steps is not limited to that listed above and may be changed or rearranged according to the desired design. Based on design and reliability considerations, the above embodiments may be mixed and matched with one another or with other embodiments; that is, technical features in different embodiments may be freely combined to form further embodiments.

The algorithms and displays provided here are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the description above. Moreover, the present disclosure is not directed to any particular programming language. It should be understood that the content of the disclosure described here may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the disclosure.

The present disclosure may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The component embodiments of the disclosure may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the related devices according to the embodiments of the present disclosure. The disclosure may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described here. Such a program implementing the disclosure may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware.

Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the various disclosed aspects, features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the disclosure.

The specific embodiments described above further explain the purpose, technical solution, and advantageous effects of the present disclosure in detail. It should be understood that the above are merely specific embodiments of the disclosure and do not limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the disclosure shall fall within its scope of protection.

Claims

1. A system for unattended cloud speech corpus acquisition and smart product testing, comprising:

a corpus data acquisition and storage unit, for capturing original audio data and storing it in the cloud, including:

a recording device, for capturing the speaker's audio;

a self-service acquisition module, which obtains the audio captured by the recording device through a sound card and matches it with the corpus text to generate the original audio data;

a cloud server, connected to the self-service acquisition module, for storing the original audio data in the cloud;

a test data generation and use unit, for batch-generating test audio signals of a specified specification from the original audio data in the cloud to test the smart product under test, including:

a processing module, connected to the cloud server, for obtaining the original audio data from the cloud and generating the test audio signal;

a playback device, connected to the processing module, for playing the test audio signal under the control of the processing module, for testing the smart product under test.

2. system according to claim 1, wherein,

The processing module is additionally operable to that the equipment voice data automatic aligning mark returned will be gathered, including：Obtain tested intelligence The equipment voice data that product is generated by collecting test audio signal, and will be in the time-labeling file of original audio data All time coordinates are multiplied to obtain new time coordinate with ratio cc, generate the time-labeling file of equipment voice data, wherein, The ratio cc is equipment voice data and the ratio of original audio data duration.

3. The system according to claim 2, wherein

the self-service acquisition module is further configured to display the number of texts already read and the number of texts remaining, and to determine whether the speaker has misread; and to control pausing and resuming of the recording during the recording process according to the speaker's operation.

4. A method for unattended cloud voice library acquisition and smart product testing, comprising:

Step S1: a speaker completes voice library acquisition in a self-service manner through a recording device and a self-service acquisition module, and the audio data is uploaded to a cloud server in real time;

Step S2: a processing module extracts the original audio data from the cloud, generates a test audio signal, and plays it through a playing device;

5. The method according to claim 4, wherein step S2 further comprises:

Step S21: configuring a default test data duration and the silence duration to be inserted between audio segments, and initializing a cache;

Step S22: randomly selecting audio from the voice library, splicing it onto the previous audio, and cyclically accumulating each audio segment together with the spliced silence, to compute the total audio length;

Step S23: computing and recording the duration of each audio segment in the loop as a time label T_k, and generating a label text file;

Step S24: determining whether the total audio length exceeds the set length; if so, going to step S25; if not, determining whether there is a new audio file; if there is, returning to step S22; if not, ending generation of the test audio signal;

wherein f_l is the start frequency of the swept-frequency signal and f_h is its end frequency; φ₀ is the phase of the swept-frequency signal, T is its duration, and A is its amplitude; the test audio is saved, the cache is initialized, and the process returns to step S22.
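For illustration only, the splicing loop of steps S21 to S24 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `build_test_audio`, the in-memory lists standing in for the voice library, and the choice to record each label as a (start, end) pair of sample indices are all hypothetical.

```python
import random


def build_test_audio(segments, silence_len, max_len, seed=0):
    """Splice randomly ordered segments with silence between them.

    segments:    list of sample lists (hypothetical stand-in for the voice library)
    silence_len: number of zero samples appended after each segment
    max_len:     stop once the accumulated length exceeds this many samples
    Returns (samples, labels), where labels holds one (start, end) sample
    index pair per spliced segment (the time labels T_k).
    """
    rng = random.Random(seed)
    order = list(range(len(segments)))
    rng.shuffle(order)                      # random selection from the library

    out, labels = [], []                    # the "cache" of step S21
    for idx in order:
        start = len(out)
        out.extend(segments[idx])           # splice onto the previous audio
        labels.append((start, len(out)))    # record where this segment sits
        out.extend([0.0] * silence_len)     # insert silence after the segment
        if len(out) > max_len:              # total length check of step S24
            break
    return out, labels
```

Dividing each index by the sample rate would give the labels in seconds, which is the form a label text file would typically use.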

6. The method according to claim 4, wherein step S3 further comprises:

Step S33: using the time coordinates to compute the ratio of the duration of the device-captured audio data to the duration of the original test audio data:

$$\alpha = \frac{T_{yend} - T_{ybeg}}{T_{xend} - T_{xbeg}}$$

wherein α is the ratio of the sample rates of the device-captured audio and the test audio; T_ybeg is the start time of the device audio and T_yend is its end time; T_xbeg is the start time of the original test audio and T_xend is its end time.

Step S34: multiplying all time coordinates in the original time label file by α to obtain new time coordinates, and generating the time label file of the device audio data.
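As an illustration of steps S33 and S34, the ratio and the rescaling might look like the sketch below. The function names and the (start, end, text) tuple layout of a label entry are assumptions made for the example; the claims do not specify a label file format.

```python
def duration_ratio(t_ybeg, t_yend, t_xbeg, t_xend):
    """alpha = (T_yend - T_ybeg) / (T_xend - T_xbeg), as in step S33."""
    return (t_yend - t_ybeg) / (t_xend - t_xbeg)


def rescale_labels(labels, alpha):
    """Multiply every time coordinate by alpha, as in step S34.

    labels: list of (start_seconds, end_seconds, text) tuples (hypothetical layout).
    """
    return [(start * alpha, end * alpha, text) for start, end, text in labels]
```

For example, a device recording that spans 10 s for an 8 s original gives α = 1.25, so a label at 1.0 s to 2.0 s moves to 1.25 s to 2.5 s.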

7. The method according to claim 6, wherein step S33 further comprises:

Sub-step S321: generating a chirp signal identical to the one in the test audio signal, and time-reversing the chirp signal to obtain the matched filter h(t) = x(T - t);

Sub-step S322: convolving the first several tens of seconds of the device-captured audio data y(t) and of the original audio data x(t) with the matched filter, to obtain the matched filter outputs r₁(t) = h(t) * y(t) and r₂(t) = h(t) * x(t);

Sub-step S323: searching the matched filter outputs r₁(t) and r₂(t), wherein the time coordinate of the signal maximum is the time coordinate of the signal start point; the time coordinate of the signal end point is detected in the same way.
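A minimal sketch of sub-steps S321 to S323 follows, for illustration only. The chirp formula is not reproduced in the claims text, so a linear sweep is assumed here; `linear_chirp` and `matched_filter_peak` are hypothetical names, and the direct convolution loop stands in for whatever faster method a real system would use.

```python
import math


def linear_chirp(f_l, f_h, duration, rate, amp=1.0, phase=0.0):
    """Linear sweep from f_l to f_h Hz (assumed form; the source formula is not shown)."""
    n = int(duration * rate)
    k = (f_h - f_l) / duration              # sweep rate in Hz per second
    return [
        amp * math.sin(phase + 2 * math.pi * (f_l * t + 0.5 * k * t * t))
        for t in (i / rate for i in range(n))
    ]


def matched_filter_peak(signal, chirp):
    """Return the sample index in `signal` where `chirp` best aligns."""
    h = chirp[::-1]                         # matched filter h(t) = x(T - t)
    m = len(h)
    best_idx, best_val = 0, float("-inf")
    # Full convolution r[n] = sum_k h[k] * signal[n - k]
    for n in range(len(signal) + m - 1):
        lo = max(0, n - len(signal) + 1)
        hi = min(m, n + 1)
        acc = 0.0
        for k in range(lo, hi):
            acc += h[k] * signal[n - k]
        if acc > best_val:
            best_idx, best_val = n, acc
    # The convolution peaks m - 1 samples after the chirp onset.
    return best_idx - (m - 1)
```

The sketch subtracts the filter length to report the chirp onset. Because the same offset applies to both r₁(t) and r₂(t), it cancels when the start and end coordinates are differenced to form α.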

8. The method according to claim 4, wherein step S1 further comprises:

Step S11: reading the corpus text file information;

Step S13: alternately displaying the wake-up word and the corpus text for the speaker to record, and automatically computing the recording duration of each text segment according to the text length;

Step S14: each time an audio segment is captured, computing its time-domain average energy, taking the difference from the preset normalized energy value to obtain an amplification factor a, and storing and uploading the final normalized audio y_n = a·x_n to the cloud server, wherein N is the total number of samples in the captured audio, x_n is the captured audio waveform sequence, Y_rms is the preset average energy value after normalization, and y_n is the audio waveform sequence after normalization;
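The normalization of step S14 can be illustrated with the sketch below. The formulas for the energy and for the amplification factor are not reproduced in the claims text, so the sketch assumes a root-mean-square energy and a gain a = Y_rms / X_rms (a ratio in linear units, which corresponds to a difference when levels are expressed in decibels). The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sample sequence (assumed energy measure)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalize_rms(samples, target_rms):
    """Scale samples so their RMS level equals target_rms (y_n = a * x_n)."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)                # pure silence: nothing to scale
    a = target_rms / current                # assumed form of the gain factor
    return [a * x for x in samples]
```

Applying the same target level to every segment keeps recordings from different speakers and sessions at a comparable loudness before they are uploaded.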

9. The method according to claim 8, further comprising, before reading the corpus text:

Step S10: collecting the speaker's name information for naming the saved recording files; setting the wake-up word; and configuring default recording parameters, including the recording sample rate and the quantization precision.

10. The method according to claim 8, wherein, during recording, the speaker controls pausing and resuming of the recording through the self-service acquisition module.