CN110335616A - Voice data noise-reduction method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110335616A CN110335616A CN201910650447.6A CN201910650447A CN110335616A CN 110335616 A CN110335616 A CN 110335616A CN 201910650447 A CN201910650447 A CN 201910650447A CN 110335616 A CN110335616 A CN 110335616A
- Authority
- CN
- China
- Prior art keywords
- feature
- audio data
- combination
- noise reduction
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Abstract
The present application relates to a voice data noise-reduction method, apparatus, computer device and storage medium based on artificial intelligence. The method includes: receiving a noise-reduction request sent by a terminal, and obtaining the feature combinations corresponding to the to-be-processed audio data together with the association relations among the features within each combination; calculating the discrimination of each feature combination according to the features and their association relations; screening the feature combinations against a preset discrimination threshold to obtain initial feature combinations; screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations, and obtaining the to-be-processed audio data corresponding to the available feature combinations to generate first initial audio data based on discrimination; and performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data. By applying a deep-learning noise-reduction model to voice data selected by discrimination, the method improves the noise-reduction effect on voice data.
Description
Technical field
The present application relates to the field of voice processing technology, and in particular to a voice data noise-reduction method, apparatus, computer device and storage medium.
Background technique
With the continuous development of voice processing technology, voice data is widely used in daily life. Different users place different demands on voice quality, yet in everyday usage scenarios voice quality suffers from interference by various kinds of noise data and device signals and cannot meet users' needs. Voice noise-reduction technology has therefore emerged.
A commonly used voice noise-reduction method determines the signal-to-noise-ratio curve of a voice signal, uses that curve to separate the signal into speech frames and noise frames, and applies noise reduction only to the noise frames. This method is relatively crude: the accuracy with which speech frames and noise frames are distinguished leaves room for improvement, and misclassification degrades voice quality and weakens the noise-reduction effect on voice data.
Summary of the invention
In view of the above technical problems, it is necessary to provide a voice data noise-reduction method, apparatus, computer device and storage medium capable of improving the voice noise-reduction effect.
A voice data noise-reduction method, the method comprising:
receiving a noise-reduction request for to-be-processed audio data sent by a terminal, and obtaining the to-be-processed audio data;
obtaining the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculating the discrimination of each feature combination according to the features and their association relations;
screening the feature combinations against the discrimination threshold to obtain initial feature combinations;
screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations that satisfy the preset evaluation indexes;
obtaining the to-be-processed audio data corresponding to the available feature combinations, and generating first initial audio data based on discrimination;
performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data.
In one of the embodiments, the deep-learning noise-reduction model is obtained by:
obtaining, from training samples, the effective audio data corresponding to the feature combinations that do not exceed the discrimination threshold, together with the corresponding second initial audio data;
slicing the effective audio data and the second initial audio data respectively according to a predetermined length;
generating the first voiceprint map of the effective audio data from the sliced effective audio data, and extracting the first vocal-print parameters of the effective audio data from the first voiceprint map;
generating the second voiceprint map of the second initial audio data from the sliced second initial audio data, and extracting the second vocal-print parameters of the second initial audio data from the second voiceprint map;
training a deep-learning model with the second vocal-print parameters of the second initial audio data as its input and the first vocal-print parameters of the effective audio data at the corresponding moments as its output, to obtain the deep-learning noise-reduction model.
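The training setup described in this embodiment (noisy vocal-print parameters as input, clean parameters at the corresponding moments as target) can be sketched as follows. The patent does not specify the network architecture, so a single linear layer fitted by gradient descent stands in for the deep-learning model here; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def train_denoise_model(noisy_params, clean_params, lr=0.01, epochs=500):
    """Fit a single linear layer mapping noisy vocal-print parameter
    vectors to clean ones -- a minimal stand-in for the (unspecified)
    deep-learning noise-reduction model."""
    n_features = noisy_params.shape[1]
    W = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for _ in range(epochs):
        pred = noisy_params @ W + b        # forward pass
        grad = pred - clean_params         # gradient of 0.5 * MSE w.r.t. pred
        W -= lr * noisy_params.T @ grad / len(noisy_params)
        b -= lr * grad.mean(axis=0)
    return W, b

# Toy training pairs: clean parameters plus additive noise.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 4))          # "first" (clean) parameters
noisy = clean + rng.normal(scale=0.5, size=clean.shape)  # "second" (noisy)
W, b = train_denoise_model(noisy, clean)
recovered = noisy @ W + b                  # denoised parameters
```

A real implementation would replace the linear map with a deep network, but the input/output pairing is the same as in the embodiment above.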
In one of the embodiments, obtaining the feature combinations corresponding to the to-be-processed audio data and calculating the discrimination of each feature combination comprises:
obtaining the features corresponding to the to-be-processed audio data and the association relations among the features;
generating the feature combinations corresponding to the to-be-processed audio data according to the features and their association relations;
calculating the discrimination of each feature combination separately, according to the features of each combination and the association relations among them.
In one of the embodiments, screening the feature combinations against the preset discrimination threshold to obtain initial feature combinations comprises:
comparing the discrimination of each feature combination with the discrimination threshold respectively;
obtaining the feature combinations whose discrimination exceeds the discrimination threshold, and generating the initial feature combinations.
In one of the embodiments, screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations comprises:
obtaining the preset evaluation indexes, the preset evaluation indexes including the AUC value, the precision rate and the recall rate;
screening the initial feature combinations according to the AUC value, precision rate and recall rate;
obtaining the initial feature combinations that meet the requirements, and generating the available feature combinations.
In one of the embodiments, performing noise reduction on the first initial audio data based on the deep-learning noise-reduction model to generate the denoised voice data comprises:
slicing the first initial audio data according to a predetermined length;
generating the to-be-processed voiceprint map of the first initial audio data from the sliced first initial audio data, and extracting the to-be-processed vocal-print parameters of the first initial audio data from the to-be-processed voiceprint map;
inputting the to-be-processed vocal-print parameters into the deep-learning noise-reduction model to obtain the denoised voice data.
In one of the embodiments, before the step of calculating the discrimination of each feature combination, the method further comprises:
obtaining the data type corresponding to each feature combination according to the correspondence between feature combinations and data types, the data types including numeric, byte and text;
obtaining the data processing method corresponding to each data type according to the correspondence between data types and data processing methods, the data processing methods including judgement processing, assignment processing and statement processing;
performing data processing on the to-be-processed audio data corresponding to each feature combination according to the respective data processing method.
A voice data noise-reduction apparatus, the apparatus comprising:
a receiving module, configured to receive the noise-reduction request for to-be-processed audio data sent by a terminal, and to obtain the to-be-processed audio data;
a discrimination calculating module, configured to obtain the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and to calculate the discrimination of each feature combination according to the features and their association relations;
an initial-feature-combination obtaining module, configured to screen the feature combinations against a preset discrimination threshold to obtain initial feature combinations;
an available-feature-combination obtaining module, configured to screen the initial feature combinations with preset evaluation indexes to obtain the available feature combinations that satisfy the preset evaluation indexes;
an initial-audio-data generating module, configured to obtain the to-be-processed audio data corresponding to the available feature combinations and to generate first initial audio data based on discrimination;
a noise-reduction module, configured to perform noise reduction on the first initial audio data based on a deep-learning noise-reduction model, to generate the denoised voice data.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
receiving a noise-reduction request for to-be-processed audio data sent by a terminal, and obtaining the to-be-processed audio data;
obtaining the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculating the discrimination of each feature combination according to the features and their association relations;
screening the feature combinations against the discrimination threshold to obtain initial feature combinations;
screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations that satisfy the preset evaluation indexes;
obtaining the to-be-processed audio data corresponding to the available feature combinations, and generating first initial audio data based on discrimination;
performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps:
receiving a noise-reduction request for to-be-processed audio data sent by a terminal, and obtaining the to-be-processed audio data;
obtaining the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculating the discrimination of each feature combination according to the features and their association relations;
screening the feature combinations against the discrimination threshold to obtain initial feature combinations;
screening the initial feature combinations with preset evaluation indexes to obtain available feature combinations that satisfy the preset evaluation indexes;
obtaining the to-be-processed audio data corresponding to the available feature combinations, and generating first initial audio data based on discrimination;
performing noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data.
In the above voice data noise-reduction method, apparatus, computer device and storage medium, the feature combinations corresponding to the to-be-processed audio data are traversed against a preset discrimination threshold to obtain the feature combinations that meet the threshold, which are then screened with preset evaluation indexes to obtain the available feature combinations, strengthening the reliability with which voice data is distinguished from noise data. A deep-learning noise-reduction model then performs noise reduction on the first initial audio data generated based on discrimination, yielding the denoised voice data. By applying a trained deep-learning noise-reduction model on top of the improved discrimination between voice data and noise data, noise reduction of voice data is realized quickly and efficiently, further improving the noise-reduction effect.
Brief description of the drawings
Fig. 1 is an application scenario diagram of the voice data noise-reduction method in one embodiment;
Fig. 2 is a flow diagram of the voice data noise-reduction method in one embodiment;
Fig. 3 is a flow diagram of the step of obtaining the deep-learning noise-reduction model in one embodiment;
Fig. 4 is a structural block diagram of the voice data noise-reduction apparatus in one embodiment;
Fig. 5 is an internal structure diagram of the computer device in one embodiment.
Specific embodiment
To make the objects, technical solutions and advantages of the present application clearer, the application is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the application, not to limit it.
The voice data noise-reduction method provided by the present application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The server 104 receives the noise-reduction request for to-be-processed audio data sent by the terminal 102, obtains the to-be-processed audio data, obtains the corresponding feature combinations and the association relations among the features within each combination, and calculates the discrimination of each feature combination according to the features and their association relations. The server 104 screens the feature combinations against a preset discrimination threshold to obtain initial feature combinations, screens the initial feature combinations with preset evaluation indexes to obtain the available feature combinations that satisfy the indexes, obtains the to-be-processed audio data corresponding to the available feature combinations, generates the first initial audio data based on discrimination, and performs noise reduction on the first initial audio data based on a deep-learning noise-reduction model to generate the denoised voice data. The terminal 102 may be, but is not limited to, a personal computer, a laptop, a smartphone or a tablet computer; the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a voice data noise-reduction method is provided. Taking its application to the server in Fig. 1 as an example, the method comprises the following steps:
S202: receive the noise-reduction request for to-be-processed audio data sent by the terminal, and obtain the to-be-processed audio data.
S204: obtain the feature combinations corresponding to the to-be-processed audio data and the association relations among the features within each combination, and calculate the discrimination of each feature combination according to the features and their association relations.
Specifically, the to-be-processed audio data corresponding to the noise-reduction request corresponds to multiple features; by obtaining the association relations among these features, the corresponding feature combinations are generated from the features and their association relations. The server obtains the features corresponding to the to-be-processed audio data and the association relations among them, generates the feature combinations accordingly, and then calculates the discrimination of each feature combination separately from the features of each combination and their association relations.
In the present solution, the features of the audio data include the sampling frequency, bit rate, channel count, frame rate, zero-crossing rate and short-time energy. The sampling frequency is the number of sample points taken from the analog signal per unit time; assigning a number to each sampled point converts the signal into a digital one. The bit rate reflects how finely the amplitude of the analog signal (which determines the loudness of the sound) is divided into levels. The channel count is the number of audio channels, and the frame rate is the number of sound frames per unit time, where one frame may contain multiple sound samples. The zero-crossing rate is the number of times the signal crosses zero within each frame and reflects the frequency characteristics of the audio. The short-time energy reflects the strength of the audio signal at different moments. Because different audio data take different values on these features, the feature combinations generated from the features of different audio data also differ.
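Two of the frame-level features listed above, the zero-crossing rate and the short-time energy, can be computed as in this minimal sketch (frame length and hop size are illustrative choices, not values from the patent):

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Per-frame zero-crossing rate and short-time energy."""
    zcrs, energies = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Zero-crossing rate: fraction of adjacent samples whose sign changes.
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        # Short-time energy: mean squared amplitude of the frame.
        energies.append(np.mean(frame ** 2))
    return np.array(zcrs), np.array(energies)

# A 440 Hz tone sampled at 8 kHz: steady energy, steady crossing rate.
t = np.linspace(0, 1, 8000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
zcr, energy = frame_features(tone)
```

For a pure 440 Hz tone at 8 kHz the per-sample crossing rate is about 880/8000 ≈ 0.11 and the mean energy of a unit sine is about 0.5, which is what the sketch reports.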
Further, before the discrimination is calculated, the to-be-processed audio data undergoes the corresponding data processing, so as to improve the accuracy of the discrimination calculated for each feature combination. Specifically:
Different processing is applied to initial audio data of different data types. The data types include numeric, byte and text, and the corresponding processing methods are judgement processing, assignment processing and statement processing. For numeric initial audio data, judgement processing is executed: a preset value range is obtained and compared with the values of the numeric initial audio data to judge whether they fall within the range; the values that meet the preset range are extracted, the noise data in the numeric initial audio data is deleted, and available numeric data is generated.
For byte initial audio data, assignment processing is executed: whether the value of the byte initial audio data meets the default value is judged, and when it does not, the default value is assigned to the corresponding byte initial audio data; the noise data in the assigned byte initial audio data is then deleted, and available byte data is generated.
For text initial audio data, statement processing is executed: the constituents of the text initial audio data are obtained and compared with the preset constituents; when they are inconsistent, the text initial audio data is declared as the preset constituents, the noise data in the text initial audio data is deleted, and available text data is generated.
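The type-to-handler correspondence described above can be sketched as a simple dispatch table. The patent leaves the three handlers abstract, so the ranges, defaults and function names below are purely illustrative assumptions:

```python
def judge_numeric(values, lo=0.0, hi=1.0):
    # Judgement processing: keep only values inside the preset range.
    return [v for v in values if lo <= v <= hi]

def assign_bytes(values, default=0, allowed=range(256)):
    # Assignment processing: replace out-of-range values with a default.
    return [v if v in allowed else default for v in values]

def declare_text(values, template="unknown"):
    # Statement processing: re-declare entries that do not match the
    # preset constituents (here: any empty or non-string entry).
    return [v if isinstance(v, str) and v else template for v in values]

HANDLERS = {"numeric": judge_numeric, "byte": assign_bytes, "text": declare_text}

def preprocess(data_type, values):
    """Route a feature combination's data to its type-specific handler."""
    return HANDLERS[data_type](values)
```

Each feature combination's data would be passed through `preprocess` with its looked-up data type before the discrimination is computed.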
S206: screen the feature combinations against the preset discrimination threshold, and obtain the initial feature combinations.
Specifically, the server compares the discrimination of each feature combination with the discrimination threshold and takes the feature combinations whose discrimination exceeds the threshold as the initial feature combinations. The discrimination threshold is used to traverse and screen the feature combinations corresponding to the to-be-processed audio data, and thereby to obtain the to-be-processed audio data corresponding to the feature combinations that meet the threshold. That is, the server obtains the preset discrimination threshold, traverses the discrimination of each feature combination against it, obtains the feature combinations whose discrimination exceeds the threshold, and generates the initial feature combinations.
Further, the noise data in the initial audio data that falls through the threshold can also be deleted. Such noise data is data whose discrimination is lower than the initial discrimination threshold, i.e., invalid audio data, whereas the data corresponding to the initial feature combinations is data whose discrimination exceeds the threshold, i.e., the initial audio data. In the present solution, the discrimination threshold range may be set to 0.8 to 1: invalid audio data below the threshold of 0.8 is noise data on which no noise-reduction operation can be performed and does not belong to the effective audio data, so delete processing is executed on it, while the initial audio data beyond the threshold requires noise-reduction processing in order to generate effective audio data.
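Step S206 amounts to partitioning the feature combinations by their discrimination scores. A minimal sketch, with illustrative combination names and scores (the 0.8 threshold is the value given in this embodiment):

```python
def screen_by_discrimination(combos, threshold=0.8):
    """Split feature combinations by a preset discrimination threshold:
    combinations at or above it become initial feature combinations;
    the rest are treated as invalid (noise) data and dropped."""
    initial = {name: d for name, d in combos.items() if d >= threshold}
    dropped = {name: d for name, d in combos.items() if d < threshold}
    return initial, dropped

combos = {"zcr+energy": 0.93, "rate+channels": 0.42, "fps+zcr": 0.85}
initial, dropped = screen_by_discrimination(combos)
```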
S208: screen the initial feature combinations with the preset evaluation indexes, and obtain the available feature combinations that satisfy the preset evaluation indexes.
Specifically, the preset evaluation indexes obtained by the server include the AUC value, the precision rate and the recall rate. The server screens the initial feature combinations according to the obtained AUC value, precision rate and recall rate, obtains the initial feature combinations that meet the requirements, and generates the available feature combinations.
The AUC value (Area Under Curve) is defined as the area under the ROC curve, with values ranging between 0.5 and 1. The ROC (receiver operating characteristic) curve plots the true-positive rate on the vertical axis against the false-positive rate on the horizontal axis; each point on the curve reflects the response to the same signal stimulus under a different judgment criterion, so the curve is drawn from the different results obtained by applying different criteria under the same stimulus condition.
The precision rate (Precision) indicates, for a given test data set, the ratio of the number of samples correctly classified by the classifier to the total number of samples, i.e., the accuracy on the test data set when the loss function is the 0-1 loss. Expressed as a formula: precision = relevant documents retrieved by the system / total documents retrieved by the system.
The recall rate (Recall) is a measure of coverage: it measures how many of the positive examples are classified as positive. Expressed as a formula: recall = relevant documents retrieved by the system / total relevant documents in the system.
Further, the server obtains the correspondence between AUC values and initial feature combinations, screens the initial feature combinations against a preset AUC value, and obtains the initial feature combinations that meet it. Based on the value range of the AUC, the server may set the AUC threshold to 0.8 and use it to screen the initial feature combinations. The server likewise obtains the correspondence between the precision rate and the initial feature combinations and screens them against a preset precision rate, and obtains the correspondence between the recall rate and the initial feature combinations and screens them against a preset recall rate. Finally, from the initial feature combinations that satisfy the preset evaluation indexes of AUC value, precision rate and recall rate, the server generates the available feature combinations.
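The metric definitions and the joint screening by AUC, precision and recall can be sketched as follows. The 0.8/0.7 cut-offs other than the AUC value are illustrative, as the patent only specifies the AUC threshold:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def screen_combos(metrics, min_auc=0.8, min_p=0.7, min_r=0.7):
    """Keep initial feature combinations whose AUC, precision and
    recall all clear their preset minimums."""
    return [name for name, (auc, p, r) in metrics.items()
            if auc >= min_auc and p >= min_p and r >= min_r]

p, r = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
metrics = {"combo_a": (0.91, 0.80, 0.75), "combo_b": (0.79, 0.90, 0.90)}
kept = screen_combos(metrics)
```

Note that `combo_b` is rejected despite high precision and recall, because all three indexes must be satisfied at once.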
S210: obtain the to-be-processed audio data corresponding to the available feature combinations, and generate the first initial audio data based on discrimination.
S212: perform noise reduction on the first initial audio data based on the deep-learning noise-reduction model, and generate the denoised voice data.
Specifically, the server slices the first initial audio data according to the predetermined length, generates the to-be-processed voiceprint map of the first initial audio data from the sliced data, and extracts the to-be-processed vocal-print parameters of the first initial audio data from the to-be-processed voiceprint map. The to-be-processed vocal-print parameters are then input into the deep-learning noise-reduction model to obtain the denoised voice data.
Further, the server obtains the to-be-processed vocal-print parameters of the first initial audio data awaiting noise reduction and inputs them into the deep-learning noise-reduction model. It matches the second vocal-print parameters against the to-be-processed vocal-print parameters, obtains the first initial audio data corresponding to the to-be-processed vocal-print parameters that match the second vocal-print parameters, and uses the first vocal-print parameters to perform noise reduction on that first initial audio data, obtaining the denoised voice data.
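The "voiceprint map" and "vocal-print parameter" stages of this inference path can be sketched with a short-time Fourier magnitude spectrogram, assuming (the patent does not say) that the map is a spectrogram and taking one illustrative parameter per frame, the dominant frequency bin:

```python
import numpy as np

def voiceprint_map(sig, frame_len=256, hop=128):
    """Magnitude spectrogram of the sliced signal -- a stand-in for
    the voiceprint map from which parameters are extracted."""
    frames = [sig[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(sig) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def vocal_print_params(spec):
    # One illustrative parameter per frame: the dominant frequency bin.
    return spec.argmax(axis=1)

# 440 Hz tone at 8 kHz with a little additive noise.
t = np.linspace(0, 1, 8000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t) \
    + 0.05 * np.random.default_rng(1).normal(size=8000)
spec = voiceprint_map(sig)
params = vocal_print_params(spec)
```

With 256-point frames at 8 kHz the bin spacing is 31.25 Hz, so the 440 Hz tone lands in bin 14 for every frame; the extracted parameters would then be fed to the trained model.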
In the above voice data noise-reduction method, the feature combinations corresponding to the to-be-processed audio data are traversed against a preset discrimination threshold to obtain the feature combinations that meet the threshold, which are then screened with preset evaluation indexes to obtain the available feature combinations, strengthening the reliability of distinguishing voice data from noise data. A deep-learning noise-reduction model performs noise reduction on the first initial audio data generated based on discrimination, yielding the denoised voice data. On the basis of the improved discrimination between voice data and noise data, the trained deep-learning noise-reduction model realizes noise reduction of voice data quickly and efficiently, further improving the noise-reduction effect.
In one embodiment, as shown in Fig. 3, a step of obtaining the deep learning noise reduction model is provided, comprising:
S302: obtaining, from the training samples, the effective audio data corresponding to the feature combinations within the discrimination threshold and its corresponding second initial audio data.
Specifically, the training samples include the effective audio data corresponding to the feature combinations within the discrimination threshold, the second initial audio data corresponding to the effective audio data, and the invalid audio data falling outside the threshold. According to the requirements for training the deep learning noise reduction model, the server obtains the effective audio data and its corresponding second initial audio data from the training samples.
S304: slicing the effective audio data and the second initial audio data respectively according to the preset length.
Specifically, before slicing the effective audio data and the second initial audio data, the server also needs to preprocess them to obtain effective audio data and second initial audio data in a preset format. The server further obtains the preset slice length and slices the pair of preset-format effective audio data and second initial audio data accordingly.
S306: generating the first voiceprint map of the effective audio data from the sliced effective audio data, and extracting the first vocal print parameters of the effective audio data from the first voiceprint map.
S308: generating the second voiceprint map of the second initial audio data from the sliced second initial audio data, and extracting the second vocal print parameters of the second initial audio data from the second voiceprint map.
Specifically, from the sliced effective audio data and second initial audio data, the server respectively generates the first voiceprint map of the effective audio data and the corresponding second voiceprint map of the second initial audio data, then extracts the first vocal print parameters of the effective audio data from the first voiceprint map and the second vocal print parameters of the second initial audio data from the second voiceprint map.
Presently known voiceprint maps mainly include: broadband vocal prints, narrowband vocal prints, amplitude vocal prints, contour vocal prints, time-spectrum vocal prints, and sectional vocal prints (the last also divided into broadband and narrowband kinds). The first two display how the frequency and intensity of speech vary over time; the middle three display how speech intensity or sound pressure varies over time; a sectional vocal print only displays the sound-wave intensity and frequency characteristics at a single point in time.
Vocal print parameters are features unique to a piece of audio data: a content-based digital signature of the important acoustic features that represent a segment of audio data. Their main purpose is to establish an effective mechanism for comparing the perceptual acoustic quality of two pieces of audio data. They may include, but are not limited to, acoustic features related to the anatomical structure of the human vocal mechanism, such as the frequency spectrum, cepstrum, formants, fundamental tone, and reflection coefficients.
S310: using the second vocal print parameters of the second initial audio data as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as the output of the deep learning model, training the deep learning model to obtain the deep learning noise reduction model.
Specifically, the second vocal print parameters of the second initial audio data serve as the input of the deep learning model, i.e., the audio data requiring noise reduction processing, while the first vocal print parameters of the effective audio data corresponding to the second initial audio data serve as the output of the deep learning model, i.e., the corresponding audio data after noise reduction processing. By repeatedly obtaining effective audio data and second initial audio data from the samples and extracting the corresponding second and first vocal print parameters, the deep learning model is trained and the deep learning noise reduction model is obtained.
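The input/output pairing described in S310 can be sketched with a toy model; here a single least-squares linear map stands in for the deep learning model, and the random vectors standing in for the first (clean) and second (noisy) vocal print parameters are pure assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: one 129-bin parameter vector per slice.
clean = rng.random((200, 129))                          # first vocal print parameters (target output)
noisy = clean + 0.1 * rng.standard_normal((200, 129))   # second vocal print parameters (input)

# A single linear layer fitted by least squares stands in for the
# deep learning model: it maps noisy parameters toward clean ones.
W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)

denoised = noisy @ W
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((denoised - clean) ** 2)
print(err_after < err_before)  # the fitted map reduces the error
```

In the patented method this linear map would be replaced by the trained deep learning noise reduction model, but the supervision signal (noisy parameters in, clean parameters out) is the same.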
In the above steps, the server obtains, from the training samples, the effective audio data corresponding to the feature combinations within the discrimination threshold and its corresponding second initial audio data; slices both according to the preset length; generates the first voiceprint map from the sliced effective audio data and extracts the first vocal print parameters from it; and generates the second voiceprint map from the sliced second initial audio data and extracts the second vocal print parameters from it. The second vocal print parameters of the second initial audio data can then serve as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as its output, realizing the training of the deep learning model and obtaining a deep learning noise reduction model usable for voice data, which improves the noise reduction effect on voice data.
In one embodiment, a step of obtaining the feature combinations corresponding to the audio data to be processed and calculating the discrimination of each feature combination is provided, comprising:
The server obtains the features corresponding to the audio data to be processed and the association relationships between the features; generates the feature combinations corresponding to the audio data to be processed according to the features and their association relationships; and calculates the discrimination of each feature combination separately according to the features corresponding to each feature combination and the association relationships between them.
Specifically, the audio data to be processed, which corresponds to the noise reduction processing request, corresponds to multiple features; by obtaining the association relationships between these features, the corresponding feature combinations can be generated from the features and their association relationships. That is, the server obtains the features corresponding to the audio data to be processed and the association relationships between them, generates the feature combinations corresponding to the audio data to be processed accordingly, and then calculates the discrimination of each feature combination separately from the features in each combination and their association relationships.
In the present solution, the features of the audio data include the sampling frequency, bit rate, channel count, frame rate, zero-crossing rate, and short-time energy. The sampling frequency indicates the number of sample points collected from the analog signal per unit time; assigning a number to each sampled point of the analog signal converts it into a digital signal. The bit rate indicates the number of grades into which the amplitude of the analog signal (which affects the loudness of the sound) is divided. The channel count indicates the number of audio channels, and the frame rate indicates the number of sound frames per unit time, where one frame may contain multiple sound samples. The zero-crossing rate indicates the number of times the signal crosses zero in each frame, reflecting the frequency characteristics of the audio. The short-time energy reflects the strength of the audio signal at different moments. Because different audio data take different values for these features, the feature combinations generated from the features of different audio data also differ.
Further, the feature values of each piece of audio data differ, i.e., the values of the sampling frequency, bit rate, channel count, frame rate, and short-time energy differ, and the association relationships between the corresponding features are likewise inconsistent. The server can therefore calculate the discrimination between the feature combinations of different audio data separately from the different feature values and association relationships.
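Two of the features listed above, the zero-crossing rate and the short-time energy, are directly computable per frame. A minimal sketch (frame length, sampling rate, and the two test signals are assumptions, not values from the patent):

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> int:
    """Number of sign changes in the frame -- reflects frequency content."""
    return int(np.sum(np.abs(np.diff(np.signbit(frame).astype(int)))))

def short_time_energy(frame: np.ndarray) -> float:
    """Sum of squared samples -- signal strength of the frame."""
    return float(np.sum(frame ** 2))

t = np.arange(256) / 8000.0
tone = np.sin(2 * np.pi * 440 * t)   # a voiced-like frame
hiss = 0.01 * np.ones(256)           # a near-silent frame

print(zero_crossing_rate(tone) > zero_crossing_rate(hiss))  # True
print(short_time_energy(tone) > short_time_energy(hiss))    # True
```

Such per-frame values are examples of the feature values whose differences across audio data drive the discrimination calculation.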
In the above steps, the server generates the feature combinations corresponding to the audio data to be processed according to the features and the association relationships between them, and calculates the discrimination of each feature combination separately from the features in each combination and their association relationships. This realizes the separate calculation of a corresponding discrimination for each distinct feature combination, quickly distinguishing the effective audio data in the voice data from the noise data awaiting noise reduction processing, and improving work efficiency.
In one embodiment, a step of screening the feature combinations according to the preset discrimination threshold to obtain the initial feature combinations is provided, comprising:
The server compares the discrimination of each feature combination with the discrimination threshold respectively, obtains the feature combinations whose discrimination exceeds the discrimination threshold, and generates the initial feature combinations.
Specifically, the server compares the discrimination of each feature combination with the discrimination threshold respectively and takes the feature combinations whose discrimination exceeds the threshold as the initial feature combinations. That is, the server obtains the preset discrimination threshold, traverses the discrimination corresponding to each feature combination according to the threshold, obtains the feature combinations whose discrimination exceeds the threshold, and generates the initial feature combinations.
Further, the noise data in the initial audio data that falls outside the threshold may also be deleted. The noise data falling outside the threshold is the data whose discrimination is below the initial discrimination threshold, i.e., the invalid audio data, while the data corresponding to the initial feature combinations is the data whose discrimination exceeds the discrimination threshold, i.e., the initial audio data. In the present solution, the discrimination threshold range may be set to 0.8 to 1: invalid audio data whose discrimination is below the threshold of 0.8 is noise data on which no noise reduction operation can be performed and which does not belong to the effective audio data, so delete processing is executed on it. Initial audio data whose discrimination falls within the threshold range needs noise reduction processing to generate the effective audio data.
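The threshold screening above amounts to a simple filter over discrimination scores. A sketch using the 0.8 lower bound stated in the text; the feature-combination names and score values are hypothetical:

```python
# feature combination -> discrimination score (hypothetical values)
discrimination = {
    ("zero_crossing_rate", "short_time_energy"): 0.92,
    ("bit_rate", "channel_count"): 0.55,
    ("sampling_frequency", "frame_rate", "short_time_energy"): 0.86,
}

THRESHOLD = 0.8  # lower bound of the 0.8-1 range given in the text

# Combinations at or above the threshold become initial feature combinations;
# the rest correspond to invalid audio data and are deleted.
initial = {c: d for c, d in discrimination.items() if d >= THRESHOLD}
deleted = {c: d for c, d in discrimination.items() if d < THRESHOLD}

print(sorted(initial.values()))  # [0.86, 0.92]
print(sorted(deleted.values()))  # [0.55]
```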
In the above steps, the server compares the discrimination of each feature combination with the discrimination threshold respectively, obtains the feature combinations whose discrimination exceeds the threshold, and generates the initial feature combinations. Because the comparison between each feature combination's discrimination and the discrimination threshold is considered, the invalid data in the audio data to be processed can be deleted, reducing the screening workload and improving work efficiency when obtaining the initial audio data that needs noise reduction processing.
In one embodiment, a step of screening the initial feature combinations using the preset evaluation indexes to obtain the available feature combinations is provided, comprising:
The server obtains the preset evaluation indexes, which include the AUC value, the precision rate, and the recall rate; screens the initial feature combinations according to the AUC value, precision rate, and recall rate; and obtains the initial feature combinations that meet the requirements, generating the available feature combinations.
Specifically, the AUC value measures the size of the area under a curve; its full name is Area Under Curve, defined as the area under the ROC curve, with a value range between 0.5 and 1. The precision rate (Precision) indicates, for a given test data set, the ratio of the number of samples correctly classified by the classifier to the total number of samples, that is, the accuracy on the test data set when the loss function is the 0-1 loss. It can be expressed by the formula: precision = relevant documents retrieved by the system / total documents retrieved by the system. The recall rate (Recall) is a measure of coverage, measuring how many positive examples are classified as positive. It can be expressed by the formula: recall = relevant documents retrieved by the system / total relevant documents in the system.
Further, the server obtains the correspondence between the AUC value and the initial feature combinations and screens the initial feature combinations according to the preset AUC value, obtaining the initial feature combinations that meet it. For example, according to the value range of the AUC value, the server may set the AUC value to 0.8 and use it to screen the initial feature combinations, obtaining those that meet the AUC value. The server likewise obtains the correspondence between the precision rate and the initial feature combinations, screens them according to the preset precision rate, and obtains the initial feature combinations meeting it; and obtains the correspondence between the recall rate and the initial feature combinations, screens them according to the preset recall rate, and obtains the initial feature combinations meeting it. Finally, from the initial feature combinations meeting the preset evaluation indexes of AUC value, precision rate, and recall rate, the server generates the available feature combinations.
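Precision and recall as defined above can be computed directly from labeled outcomes. A minimal sketch; the label vectors are hypothetical results of one feature combination separating effective audio (1) from noise (0):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = effective audio)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 1]   # ground truth
y_pred = [1, 1, 0, 0, 1, 1]   # classification induced by one feature combination

p, r = precision_recall(y_true, y_pred)
print(p, r)  # 0.75 0.75
```

Feature combinations whose precision, recall, and AUC all clear the preset values would survive this second screening as available feature combinations.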
In the above steps, the server screens the initial feature combinations according to the preset evaluation indexes of AUC value, precision rate, and recall rate, and generates the available feature combinations. Screening the feature combinations a second time further improves the efficiency of obtaining the initial audio data.
In one embodiment, a step of performing noise reduction processing on the first initial audio data based on the deep learning noise reduction model to generate the denoised voice data is provided, comprising:
The server slices the first initial audio data according to the preset length; generates the to-be-processed voiceprint map of the first initial audio data from the sliced first initial audio data and extracts the to-be-processed vocal print parameters of the first initial audio data from it; and inputs the to-be-processed vocal print parameters into the deep learning noise reduction model to obtain the denoised voice data.
Specifically, the server obtains the to-be-processed vocal print parameters of the first initial audio data awaiting noise reduction processing and inputs them into the deep learning noise reduction model; the second vocal print parameters are matched against the to-be-processed vocal print parameters to obtain the first initial audio data whose to-be-processed vocal print parameters conform to the second vocal print parameters, and noise reduction processing is performed on that first initial audio data using the first vocal print parameters, obtaining the denoised voice data.
In the above steps, the server slices the first initial audio data according to the preset length, generates the to-be-processed voiceprint map of the first initial audio data from the sliced data, extracts the to-be-processed vocal print parameters from the voiceprint map, and inputs them into the deep learning noise reduction model to obtain the denoised voice data. By using the deep learning noise reduction model to denoise the first initial audio data selected on the basis of discrimination, the voice data noise reduction effect is improved.
In one embodiment, the voice data noise-reduction method further includes:
The server obtains the data type corresponding to each feature combination according to the correspondence between the feature combinations and the data types; the data types include the numeric type, the byte type, and the text type. According to the correspondence between the data types and the data processing methods, the server obtains the data processing method corresponding to each data type; the data processing methods include judgement processing, assignment processing, and declaration processing. According to each data processing method, the server performs data processing on the audio data to be processed corresponding to each feature combination respectively.
Specifically, for numeric-type initial data, judgement processing is executed: the preset value range is obtained and compared with the value of the numeric-type initial data to judge whether the value falls within the preset range; the numeric-type initial data meeting the preset value range is extracted and the noise data within the numeric-type initial data is deleted, generating the available numeric-type data.
For byte-type initial data, assignment processing is executed: it is judged whether the value of the byte-type initial data conforms to the default value; when it does not, the default value is assigned to the corresponding byte-type initial data, and the noise data in the assigned byte-type initial data is deleted, generating the available byte-type data.
For text-type initial data, declaration processing is executed: the constituents of the text-type initial data are obtained and compared with the default constituents; when the constituents of the text-type initial data are inconsistent with the default constituents, the text-type initial data is declared as the default constituents, and the noise data in the text-type initial data is deleted, generating the available text-type data.
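The three per-type treatments above are essentially a dispatch on data type. A sketch under stated assumptions: the value range, default byte value, and default text constituent are all hypothetical placeholders:

```python
NUMERIC_RANGE = (0.0, 1.0)   # hypothetical preset value range
DEFAULT_BYTE = b"\x00"       # hypothetical default value
DEFAULT_TEXT = "unknown"     # hypothetical default constituent

def preprocess(value):
    """Dispatch to judgement / assignment / declaration processing by type."""
    if isinstance(value, (int, float)):      # numeric type: judgement processing
        lo, hi = NUMERIC_RANGE
        return value if lo <= value <= hi else None   # None = deleted as noise
    if isinstance(value, bytes):             # byte type: assignment processing
        return value if value == DEFAULT_BYTE else DEFAULT_BYTE
    if isinstance(value, str):               # text type: declaration processing
        return value if value == DEFAULT_TEXT else DEFAULT_TEXT
    return None

print(preprocess(0.5), preprocess(2.0), preprocess(b"\x07"), preprocess("noise"))
# 0.5 None b'\x00' unknown
```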
In the above steps, before calculating the discrimination of each feature combination, the server performs the corresponding data preprocessing for each different type of audio data to be processed, which improves the accuracy of the subsequent discrimination calculation for each feature combination.
It should be understood that although the steps in the flow charts of Figs. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict restriction on the execution order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; the execution order of these sub-steps or stages is likewise not necessarily sequential, and they may be executed in turn or in alternation with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a voice data denoising device is provided, comprising: a receiving module 402, a discrimination computing module 404, an initial feature combination obtaining module 406, an available feature combination obtaining module 408, an initial audio data generation module 410, and a noise reduction module 412, wherein:
the receiving module 402 is configured to receive the noise reduction request for the audio data to be processed sent by the terminal, and to obtain the audio data to be processed;
the discrimination computing module 404 is configured to obtain the feature combinations corresponding to the audio data to be processed, obtain the association relationships between the features within each feature combination, and calculate the discrimination of each feature combination according to the features corresponding to the feature combination and the association relationships between them;
the initial feature combination obtaining module 406 is configured to screen the feature combinations according to the preset discrimination threshold to obtain the initial feature combinations;
the available feature combination obtaining module 408 is configured to screen the initial feature combinations using the preset evaluation indexes to obtain the available feature combinations meeting the preset evaluation indexes;
the initial audio data generation module 410 is configured to obtain the audio data to be processed corresponding to the available feature combinations and to generate the first initial audio data based on discrimination;
the noise reduction module 412 is configured to perform noise reduction processing on the first initial audio data based on the deep learning noise reduction model, generating the denoised voice data.
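The module pipeline 402→412 can be sketched as a skeleton class; every stage below is a stub with hypothetical placeholder logic, intended only to show how the modules chain together:

```python
class VoiceDenoisingDevice:
    """Skeleton mirroring modules 402-412; each stage is a stub here."""

    def receive(self, request):                       # receiving module 402
        return request["audio"]

    def compute_discrimination(self, audio):          # discrimination module 404
        return {("zcr", "energy"): 0.9, ("bit_rate",): 0.4}  # placeholder scores

    def screen_initial(self, scores, thr=0.8):        # module 406
        return [c for c, d in scores.items() if d >= thr]

    def screen_available(self, combos):               # module 408 (metrics stubbed)
        return combos

    def generate_initial_audio(self, audio, combos):  # module 410
        return audio if combos else None

    def denoise(self, audio):                         # noise reduction module 412
        return audio                                  # model inference stubbed

    def run(self, request):
        audio = self.receive(request)
        scores = self.compute_discrimination(audio)
        combos = self.screen_available(self.screen_initial(scores))
        initial = self.generate_initial_audio(audio, combos)
        return self.denoise(initial)

out = VoiceDenoisingDevice().run({"audio": [0.1, 0.2]})
print(out)  # [0.1, 0.2]
```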
The above voice data denoising device traverses the feature combinations corresponding to the audio data to be processed using the preset discrimination threshold to obtain the feature combinations meeting the threshold, and screens them with the preset evaluation indexes to obtain the available feature combinations, strengthening the reliability of distinguishing voice data from noise data. The deep learning noise reduction model is used to perform noise reduction processing on the first initial audio data selected on the basis of discrimination, obtaining the denoised voice data. On the basis of the improved discrimination between voice data and noise data, the trained deep learning noise reduction model realizes noise reduction quickly and efficiently, further improving the voice data noise reduction effect.
In one embodiment, a deep learning noise reduction model training module is provided, further configured to:
obtain, from the training samples, the effective audio data corresponding to the feature combinations within the discrimination threshold and its corresponding second initial audio data; slice the effective audio data and the second initial audio data respectively according to the preset length; generate the first voiceprint map of the effective audio data from the sliced effective audio data and extract the first vocal print parameters of the effective audio data from it; generate the second voiceprint map of the second initial audio data from the sliced second initial audio data and extract the second vocal print parameters of the second initial audio data from it; and, using the second vocal print parameters of the second initial audio data as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as its output, train the deep learning model to obtain the deep learning noise reduction model.
The above deep learning noise reduction model training module, by using the second vocal print parameters of the second initial audio data as the input of the deep learning model and the first vocal print parameters of the effective audio data at the corresponding moment as its output, realizes the training of the deep learning model and obtains a deep learning noise reduction model usable for voice data, improving the noise reduction effect on voice data.
In one embodiment, a discrimination computing module is provided, further configured to:
obtain the features corresponding to the audio data to be processed and the association relationships between the features; generate the feature combinations corresponding to the audio data to be processed according to the features and their association relationships; and calculate the discrimination of each feature combination separately according to the features in each combination and the association relationships between them.
The above discrimination computing module realizes the separate calculation of a corresponding discrimination for each distinct feature combination, quickly distinguishing the effective audio data in the voice data from the noise data awaiting noise reduction processing and improving work efficiency.
In one embodiment, an initial feature combination obtaining module is provided, further configured to:
compare the discrimination of each feature combination with the discrimination threshold respectively; and obtain the feature combinations whose discrimination exceeds the discrimination threshold, generating the initial feature combinations.
Because the comparison between each feature combination's discrimination and the discrimination threshold is considered, the above initial feature combination obtaining module can delete the invalid data in the audio data to be processed, reducing the screening workload and improving work efficiency when obtaining the initial audio data that needs noise reduction processing.
In one embodiment, an available feature combination obtaining module is provided, further configured to:
obtain the preset evaluation indexes, which include the AUC value, the precision rate, and the recall rate; screen the initial feature combinations according to the AUC value, precision rate, and recall rate; and obtain the initial feature combinations meeting the requirements, generating the available feature combinations.
The above available feature combination obtaining module screens the initial feature combinations according to the preset evaluation indexes of AUC value, precision rate, and recall rate, and generates the available feature combinations; screening the feature combinations a second time further improves the efficiency of obtaining the initial audio data.
In one embodiment, a noise reduction module is provided, further configured to:
slice the first initial audio data according to the preset length; generate the to-be-processed voiceprint map of the first initial audio data from the sliced first initial audio data and extract the to-be-processed vocal print parameters of the first initial audio data from it; and input the to-be-processed vocal print parameters into the deep learning noise reduction model to obtain the denoised voice data.
The above noise reduction module, by using the deep learning noise reduction model to denoise the first initial audio data selected on the basis of discrimination, improves the voice data noise reduction effect.
In one embodiment, a data processing module is provided, further configured to:
obtain the data type corresponding to each feature combination according to the correspondence between the feature combinations and the data types, the data types including the numeric type, the byte type, and the text type; obtain the data processing method corresponding to each data type according to the correspondence between the data types and the data processing methods, the data processing methods including judgement processing, assignment processing, and declaration processing; and perform data processing on the audio data to be processed corresponding to each feature combination respectively according to each data processing method.
The above data processing module performs, before the discrimination of each feature combination is calculated, the corresponding data preprocessing for each different type of audio data to be processed, improving the accuracy of the subsequent discrimination calculation for each feature combination.
For the specific limitations of the voice data denoising device, reference may be made to the limitations of the voice data noise-reduction method above, which are not repeated here. Each module in the above voice data denoising device may be implemented wholly or partially through software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, the processor in the computer equipment, or may be stored in software form in the memory of the computer equipment, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, computer equipment is provided, which may be a server whose internal structure may be as shown in Fig. 5. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer equipment provides computing and control capability. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer equipment is used to store voice data noise reduction data. The network interface of the computer equipment communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a voice data noise-reduction method.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the solution of the present application and does not constitute a limitation on the computer equipment to which the solution of the present application is applied; a specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
In one embodiment, computer equipment is provided, including a memory and a processor, the memory storing a computer program; the processor, when executing the computer program, implements the steps in each of the above method embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the steps in each of the above method embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program. The computer program may be stored in a non-volatile computer-readable storage medium, and the computer program, when executed, may include the processes of the embodiments of each of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (10)
1. A voice data noise reduction method, the method comprising:
receiving a noise reduction request, sent by a terminal, for to-be-processed audio data, and obtaining the to-be-processed audio data;
obtaining feature combinations corresponding to the to-be-processed audio data, obtaining association relationships between the features in each feature combination, and calculating a discrimination degree of each feature combination according to the features corresponding to the feature combination and the association relationships between the features;
screening each feature combination according to a preset discrimination degree threshold to obtain initial feature combinations;
screening the initial feature combinations using a preset evaluation index to obtain usable feature combinations that satisfy the preset evaluation index;
obtaining the to-be-processed audio data corresponding to the usable feature combinations, and generating discrimination-based first initial audio data; and
performing noise reduction processing on the first initial audio data based on a deep learning noise reduction model, to generate noise-reduced voice data.
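The screening pipeline of claim 1 can be sketched as follows. The patent does not define the discrimination degree formula, so this sketch uses a hypothetical stand-in (one minus the mean absolute pairwise correlation of the features in a combination); `screen_combinations` and the threshold value are likewise illustrative.

```python
import numpy as np

def discrimination(features: np.ndarray) -> float:
    """Score a feature combination (rows = features, columns = samples);
    higher means the features in the combination are less redundant."""
    if features.shape[0] < 2:
        return 1.0
    corr = np.corrcoef(features)                       # pairwise association
    upper = corr[np.triu_indices_from(corr, k=1)]      # unique feature pairs
    return 1.0 - float(np.mean(np.abs(upper)))

def screen_combinations(feature_bank: dict, threshold: float) -> list:
    """Keep only the combinations whose discrimination exceeds the preset
    threshold, mirroring the claim-1 screening step."""
    return [name for name, feats in feature_bank.items()
            if discrimination(feats) > threshold]
```

Combinations of near-duplicate features score close to 0 and are dropped; combinations of uncorrelated features score close to 1 and survive as initial feature combinations.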
2. The method according to claim 1, wherein the deep learning noise reduction model is obtained by:
obtaining, from training samples, effective audio data corresponding to feature combinations that do not exceed the discrimination degree threshold, and the corresponding second initial audio data;
slicing the effective audio data and the second initial audio data respectively according to a predetermined length;
generating a first voiceprint map of the effective audio data according to the sliced effective audio data, and extracting first voiceprint parameters of the effective audio data from the first voiceprint map;
generating a second voiceprint map of the second initial audio data according to the sliced second initial audio data, and extracting second voiceprint parameters of the second initial audio data from the second voiceprint map; and
training a deep learning model by taking the second voiceprint parameters of the second initial audio data as the input of the deep learning model and the first voiceprint parameters of the effective audio data at the corresponding moment as the output of the deep learning model, to obtain the deep learning noise reduction model.
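The slicing and voiceprint-map steps of claim 2 might look like the sketch below. The frame and hop sizes, and the use of a plain magnitude spectrogram as the "voiceprint map", are assumptions; the claim leaves those details unspecified.

```python
import numpy as np

def slice_audio(samples: np.ndarray, slice_len: int) -> np.ndarray:
    """Cut audio into fixed-length slices (claim 2's predetermined length),
    dropping the ragged tail."""
    n = len(samples) // slice_len
    return samples[: n * slice_len].reshape(n, slice_len)

def voiceprint_map(audio_slice: np.ndarray, frame: int = 64, hop: int = 32) -> np.ndarray:
    """Magnitude spectrogram of one slice, used here as a stand-in for the
    claim's 'voiceprint map': windowed frames -> real FFT -> magnitudes."""
    starts = range(0, len(audio_slice) - frame + 1, hop)
    frames = np.stack([audio_slice[s : s + frame] for s in starts])
    return np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
```

Training pairs would then be built by matching the noisy (second) and clean (effective) maps slice for slice, so each input spectrum has the clean spectrum at the corresponding moment as its target.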
3. The method according to claim 1, wherein obtaining the feature combinations corresponding to the to-be-processed audio data, obtaining the association relationships between the features in each feature combination, and calculating the discrimination degree of each feature combination according to the features corresponding to the feature combination and the association relationships between the features comprises:
obtaining the features corresponding to the to-be-processed audio data and the association relationships between the features;
generating the feature combinations corresponding to the to-be-processed audio data according to the features and the association relationships between the features; and
calculating the discrimination degree of each feature combination separately according to the features corresponding to each feature combination and the association relationships between the features.
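One plausible reading of claim 3's "generate feature combinations from association relationships" is a greedy grouping of features whose pairwise association reaches a minimum strength. The grouping rule and the `min_assoc` value below are illustrative assumptions, not the patent's specified procedure.

```python
def build_combinations(names: list, assoc: dict, min_assoc: float = 0.3) -> list:
    """Greedily group feature names whose pairwise association score
    (assoc[(a, b)]) reaches min_assoc; each feature joins one combination."""
    combos, used = [], set()
    for i, a in enumerate(names):
        if a in used:
            continue
        group = [a]
        for b in names[i + 1:]:
            score = assoc.get((a, b), assoc.get((b, a), 0.0))
            if b not in used and score >= min_assoc:
                group.append(b)
                used.add(b)
        used.add(a)
        combos.append(group)
    return combos
```

Each resulting group would then be scored with the per-combination discrimination degree of claim 1.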
4. The method according to any one of claims 1 to 3, wherein screening each feature combination according to the preset discrimination degree threshold to obtain the initial feature combinations comprises:
comparing the discrimination degree of each feature combination with the discrimination degree threshold; and
obtaining the feature combinations whose discrimination degrees exceed the discrimination degree threshold, to generate the initial feature combinations.
5. The method according to any one of claims 1 to 3, wherein screening the initial feature combinations using the preset evaluation index to obtain the usable feature combinations that satisfy the preset evaluation index comprises:
obtaining the preset evaluation index, the preset evaluation index including an AUC value, a precision rate and a recall rate;
screening the initial feature combinations according to the AUC value, the precision rate and the recall rate; and
obtaining the initial feature combinations that meet the requirements, to generate the usable feature combinations.
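The claim-5 gate can be sketched with standard definitions of the three indexes; the 0.7 thresholds and the use of hard predictions for precision/recall are assumptions, since the claim states only which indexes are used.

```python
import numpy as np

def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Rank-based AUC: probability that a positive example outscores a
    negative one (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins) / (len(pos) * len(neg))

def passes_evaluation(scores, labels, preds,
                      min_auc=0.7, min_precision=0.7, min_recall=0.7) -> bool:
    """Keep an initial feature combination only if a classifier built on it
    clears all three preset evaluation indexes (threshold values hypothetical)."""
    tp = int(np.sum((preds == 1) & (labels == 1)))
    precision = tp / max(int(np.sum(preds == 1)), 1)
    recall = tp / max(int(np.sum(labels == 1)), 1)
    return (auc(scores, labels) >= min_auc
            and precision >= min_precision
            and recall >= min_recall)
```

Combinations returning `True` would become the usable feature combinations of claim 1.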
6. The method according to claim 2, wherein performing noise reduction processing on the first initial audio data based on the deep learning noise reduction model to generate the noise-reduced voice data comprises:
slicing the first initial audio data according to the predetermined length;
generating a to-be-processed voiceprint map of the first initial audio data according to the sliced first initial audio data, and extracting to-be-processed voiceprint parameters of the first initial audio data from the to-be-processed voiceprint map; and
inputting the to-be-processed voiceprint parameters into the deep learning noise reduction model to obtain the noise-reduced voice data.
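The inference loop of claim 6 reduces to: slice, extract parameters, run the model per slice, and concatenate. In this sketch, `extract_params` and `model` are placeholder callables standing in for the voiceprint extraction and the trained deep learning noise reduction model, whose internals the claim leaves open.

```python
import numpy as np

def denoise(samples: np.ndarray, slice_len: int, extract_params, model) -> np.ndarray:
    """Slice the first initial audio data to the predetermined length,
    extract per-slice voiceprint parameters, and pass each slice's
    parameters through the trained noise-reduction model."""
    outputs = []
    for i in range(len(samples) // slice_len):
        audio_slice = samples[i * slice_len : (i + 1) * slice_len]
        outputs.append(model(extract_params(audio_slice)))
    return np.concatenate(outputs) if outputs else samples[:0]
```

In a real system the model would map noisy voiceprint parameters to clean ones (per claim 2's input/output pairing) and a synthesis step would convert parameters back to waveform samples.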
7. The method according to claim 1, further comprising, before the step of calculating the discrimination degree of each feature combination:
obtaining the data type corresponding to each feature combination according to the correspondence between feature combinations and data types, the data types including a numeric type, a byte type and a text type;
obtaining the data processing method corresponding to the data type according to the correspondence between data types and data processing methods, the data processing methods including judgement processing, assignment processing and statement processing; and
performing data processing on the to-be-processed audio data corresponding to each feature combination according to the respective data processing method.
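The type-to-method dispatch of claim 7 can be sketched as a simple lookup. The particular pairing of types to processing modes below is an illustrative assumption; the claim states only that such a correspondence exists.

```python
def processing_method(value) -> str:
    """Return a claim-7 style processing mode for a field based on its data
    type (numeric / byte / text); the type->mode pairing is hypothetical."""
    if isinstance(value, (int, float)):
        return "judgement"          # numeric type -> judgement processing
    if isinstance(value, (bytes, bytearray)):
        return "assignment"         # byte type -> assignment processing
    if isinstance(value, str):
        return "statement"          # text type -> statement processing
    raise TypeError(f"unsupported data type: {type(value).__name__}")
```

Each feature combination's audio metadata would be routed through the handler matching its type before the discrimination degrees are computed.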
8. A voice data noise reduction apparatus, the apparatus comprising:
a receiving module, configured to receive a noise reduction request, sent by a terminal, for to-be-processed audio data, and obtain the to-be-processed audio data;
a discrimination degree calculation module, configured to obtain feature combinations corresponding to the to-be-processed audio data, obtain association relationships between the features in each feature combination, and calculate a discrimination degree of each feature combination according to the features corresponding to the feature combination and the association relationships between the features;
an initial feature combination obtaining module, configured to screen each feature combination according to a preset discrimination degree threshold to obtain initial feature combinations;
a usable feature combination obtaining module, configured to screen the initial feature combinations using a preset evaluation index to obtain usable feature combinations that satisfy the preset evaluation index;
an initial audio data generation module, configured to obtain the to-be-processed audio data corresponding to the usable feature combinations and generate discrimination-based first initial audio data; and
a noise reduction module, configured to perform noise reduction processing on the first initial audio data based on a deep learning noise reduction model, to generate noise-reduced voice data.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650447.6A CN110335616A (en) | 2019-07-18 | 2019-07-18 | Voice data noise-reduction method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650447.6A CN110335616A (en) | 2019-07-18 | 2019-07-18 | Voice data noise-reduction method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110335616A (en) | 2019-10-15 |
Family
ID=68146065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910650447.6A Pending CN110335616A (en) | 2019-07-18 | 2019-07-18 | Voice data noise-reduction method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110335616A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107068161A (en) * | 2017-04-14 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | Voice de-noising method, device and computer equipment based on artificial intelligence |
CN109471853A (en) * | 2018-09-18 | 2019-03-15 | 平安科技(深圳)有限公司 | Data noise reduction, device, computer equipment and storage medium |
WO2019112468A1 (en) * | 2017-12-08 | 2019-06-13 | Huawei Technologies Co., Ltd. | Multi-microphone noise reduction method, apparatus and terminal device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020177380A1 (en) | Voiceprint detection method, apparatus and device based on short text, and storage medium | |
US11004461B2 (en) | Real-time vocal features extraction for automated emotional or mental state assessment | |
CN112818892B (en) | Multi-modal depression detection method and system based on time convolution neural network | |
CN107492382B (en) | Voiceprint information extraction method and device based on neural network | |
CN110120224B (en) | Method and device for constructing bird sound recognition model, computer equipment and storage medium | |
CN110782872A (en) | Language identification method and device based on deep convolutional recurrent neural network | |
WO2021179717A1 (en) | Speech recognition front-end processing method and apparatus, and terminal device | |
WO2018223727A1 (en) | Voiceprint recognition method, apparatus and device, and medium | |
CN108022587A (en) | Audio recognition method, device, computer equipment and storage medium | |
CN111433847A (en) | Speech conversion method and training method, intelligent device and storage medium | |
Faundez-Zanuy et al. | Nonlinear speech processing: overview and applications | |
CN108922561A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN108922543A (en) | Model library method for building up, audio recognition method, device, equipment and medium | |
CN113646833A (en) | Voice confrontation sample detection method, device, equipment and computer readable storage medium | |
EP3729419A1 (en) | Method and apparatus for emotion recognition from speech | |
CN113470688B (en) | Voice data separation method, device, equipment and storage medium | |
KR102204975B1 (en) | Method and apparatus for speech recognition using deep neural network | |
WO2021134591A1 (en) | Speech synthesis method, speech synthesis apparatus, smart terminal and storage medium | |
CN117542373A (en) | Non-air conduction voice recovery system and method | |
CN110619886B (en) | End-to-end voice enhancement method for low-resource Tujia language | |
CN110335616A (en) | Voice data noise-reduction method, device, computer equipment and storage medium | |
CN113869212A (en) | Multi-modal in-vivo detection method and device, computer equipment and storage medium | |
CN113889073A (en) | Voice processing method, device, electronic equipment and storage medium | |
CN113012680A (en) | Speech technology synthesis method and device for speech robot | |
CN116959421B (en) | Method and device for processing audio data, audio data processing equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||