CN110691296B - Channel mapping method for built-in earphone of microphone - Google Patents


Info

Publication number
CN110691296B
Authority
CN
China
Prior art keywords
channel
voice
frequency
mapping
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911183807.2A
Other languages
Chinese (zh)
Other versions
CN110691296A (en)
Inventor
何敏
王鹏
戴伟彬
陈光勤
Current Assignee
Shenzhen Yueersheng Acoustics Co ltd
Original Assignee
Shenzhen Yueersheng Acoustics Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Yueersheng Acoustics Co ltd filed Critical Shenzhen Yueersheng Acoustics Co ltd
Priority to CN201911183807.2A priority Critical patent/CN110691296B/en
Publication of CN110691296A publication Critical patent/CN110691296A/en
Application granted granted Critical
Publication of CN110691296B publication Critical patent/CN110691296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 1/00 Details of transducers, loudspeakers or microphones
    • H04R 1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R 1/1083 Reduction of ambient noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 1/00 Details of transducers, loudspeakers or microphones
    • H04R 1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R 1/1058 Manufacture or assembly
    • H04R 1/1075 Mountings of transducers in earphones or headphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Manufacturing & Machinery (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a channel mapping method for a microphone-embedded earphone, belonging to the technical field of earphones. The method adopts channel mapping based on feature statistics, establishes a mapping model between the internal channel and the original channel, and thereby corrects the internal channel. The specific steps are: (1) LPC-based feature extraction; (2) channel mapping based on formant correction; (3) LPC-based speech synthesis. By correcting the internal channel through this mapping model, the method greatly improves the sound quality of the microphone-embedded earphone.

Description

Channel mapping method for built-in earphone of microphone
Technical Field
The invention belongs to the technical field of earphones, and particularly relates to a channel mapping method for a microphone built-in earphone.
Background
At present, most earphones on the market use an external microphone arrangement; such earphones transmit the voice signal over the original channel, and the speech signal carried by the original channel suffers large noise interference.
To remedy the defects of external-microphone earphones, earphones with a built-in microphone have also appeared on the market. Chinese patent 201920436026.9 discloses a Bluetooth voice-interaction headset with a built-in microphone, comprising an earphone shell, a circuit board arranged in the shell, a central processing unit on the circuit board, a Bluetooth chip on the circuit board connected to the central processing unit, and a loudspeaker on the shell. An earlobe shell is provided in the lower part of the earphone shell and, together with the inner wall of the shell, encloses a holding cavity; a microphone is arranged in the cavity, and a support plate in the cavity fixes the microphone so that its sound-pickup end face points toward the inner wall of the shell, on which a pickup hole faces that end face. That utility model largely prevents a dropped earphone from letting the earlobe shell strike the ground directly, which would damage the earlobe shell and degrade the microphone's pickup. However, it places the microphone inside only structurally: after the microphone is built in, two signals still exist, one transmitted over the original channel and one over the internal channel, and the audio transmitted over the internal channel can sound un-"natural". A method for correcting the speech transmission of the internal channel is therefore needed to solve the speech-transmission distortion of the built-in microphone.
Disclosure of Invention
The invention aims to overcome the defects of existing earphone noise-reduction methods and provides a channel mapping method for a microphone-embedded earphone. The method adopts channel mapping based on feature statistics and establishes a mapping model between the internal channel and the original channel, thereby correcting the internal channel.
The channel mapping method is applied to a headset whose microphone is mounted in an independent cavity inside the earphone, greatly reducing the influence of ambient noise on call speech. Such a headset receives the speech signal over two channels. The first is the external channel (also called the original channel), i.e. the ordinary path formed by the glottis, the vocal tract, and lip radiation; its signal mixes with external noise, is attenuated by the earphone shell, and is collected by the built-in microphone. The second is the internal channel: the signal is conducted through the glottis, vocal tract, nasal cavity, and head cavity, and finally through the auricle to the built-in microphone. Because the second channel's signal is isolated by the earphone shell, external noise is strongly attenuated, the signal-to-noise ratio is markedly enhanced, mid- and high-frequency external noise is suppressed more deeply, and wind noise is well suppressed. However, although the internal channel's signal-to-noise ratio is enhanced, its speech characteristics differ to some extent from those of the original channel, so it is particularly necessary to correct the internal channel by establishing a mapping model between the internal channel and the original channel.
For the microphone-embedded headset, the speech signal entering the internal channel is the signal that is mapping-corrected and finally output. Because part of the original channel's speech signal remains after attenuation by the earphone shell, it shares the internal channel's spectrum but with a different delay, so it can form coherent interference. Moreover, the pickup microphone sits in an independent sealed cavity, which produces reverberation of a certain strength and reduces speech clarity. These disturbances must be suppressed in the pre-processing stage before the speech signal is processed further. In addition, because the internal channel's structure differs greatly from the original channel's, its speech characteristics, such as formants and tone contours, are changed relative to the original vocal channel. This difference degrades the auditory perception, making the speech sound less "natural", and also shifts the frequency response of each speech band relative to the original channel, for example excessive low-frequency energy and relatively low high-frequency energy, which likewise harms perception. The internal channel's speech must therefore be processed before output to restore the natural character of the original vocal channel. For these characteristics of the internal channel, the invention adopts channel mapping based on feature statistics to establish a mapping model between the internal channel and the original channel, correcting the internal channel so that the speech it transmits is not distorted.
The technical scheme of the invention is realized by the following modes:
a channel mapping method of a microphone built-in earphone adopts channel mapping based on characteristic statistics, establishes a mapping model between an internal channel and an original channel, and realizes correction of the internal channel, and comprises the following specific steps:
(1) feature extraction based on LPC (Linear predictive coding)
The feature extraction is performed using the system model of Formula I:

H(z) = G / (1 - Σ_{k=1}^{p} a_k z^{-k})    (Formula I)

wherein: z denotes the z-domain (frequency-domain) variable and H(z) is the transfer function of the internal channel;
G denotes the filter gain;
a_k denotes the coefficients of the linear constant-coefficient difference equation, k being the coefficient index of the discrete-time system;
p denotes the order of the system;
(2) channel mapping based on formant modification;
(3) LPC-based speech synthesis;
the channel mapping based on formant correction described in step (2) specifically includes the following steps:
dividing the speech frequency spectrum into 9 partitions according to the ISO Octave (8-degree interval) audio partition standard, wherein the resolution of the frequency spectrum is 50Hz, so that low frequencies are combined, each frequency band of medium and high frequencies is provided with two bins, the total number of 15 bins related to frequency space is calculated, and a variable f is usedkRepresents; the ISO Octave audio frequency partition standard is a frequency band partition method of octaves;
establishing mapping vector, selecting fk,Δfc,Δfw4 features of Δ A as variables for mapping the internal voice channel to the original voice channel frequency space, and the mapping vector is M (f)k,Δfc,Δfw,ΔA);
Wherein:
fkrepresenting a frequency space, counting 15 elements;
Δfcrepresenting the variation of the center frequency of the resonance peak, distinguishing positive and negative, taking the values as +1 and-1, and counting 2 elements;
Δfwrepresenting the variation of resonance peak Q value, and distinguishing the positive and negative values in the range of [ -4,4 []Every 0.25 is a gradient, counting 33 elements;
delta A represents the variation of the power of the formant and distinguishes positive and negative; determining one element every 2 dB; setting the difference between the maximum value and the minimum value to be 30dB, and calculating 16 elements;
in summary, vector M (f) is mappedk,Δfc,ΔfwΔ a), 4-dimensional, 15840 feature descriptors in total;
deducing the position of a resonance peak based on an LPC method;
fourthly, statistical analysis, under the same condition, comparing the position and the shape of the adjacent formants of the internal channel and the original channel, and mapping the vector M (f)k,Δfc,ΔfwDelta A), making quantitative statistics on the difference, and enabling the result of each measurement and comparison to fall into a corresponding descriptor; taking the results of multiple comparison tests as a training sample set;
fifthly, correcting the formants and the waveforms of the internal channels by using the statistical results, performing interpolation calculation, and correcting the voice frequency spectrum curve;
sixthly, inverse IFFT is carried out to recover time domain signal output, and a can be calculated according to the requirementkAnd G (see formula I) for LPC encoding.
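The descriptor count above is the product of the element counts of the four variables. A quick sketch in Python (purely illustrative; the variable names are not from the patent) confirms the arithmetic:

```python
# Element counts of the four mapping-vector variables, as listed above.
n_fk = 15                                        # frequency-space bins f_k
n_dfc = 2                                        # formant center-frequency change: +1 or -1
n_dfw = len([k * 0.25 for k in range(-16, 17)])  # Q-value change over [-4, 4] in 0.25 steps
n_dA = 30 // 2 + 1                               # power change: 2 dB steps over a 30 dB span

total = n_fk * n_dfc * n_dfw * n_dA
print(n_dfw, n_dA, total)  # 33 16 15840
```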
Wherein:
The feature extraction of step (1) operates frame by frame on short-time segments; the speech sampling frequency in the feature extraction is 22 kHz, each 20 ms short-time segment forms one frame, a sliding Hamming window truncates the speech signal for the STFT (short-time Fourier transform), and the window interval is 10 ms;
As a preferred implementation, in step (1) the linear prediction equations are established by the inverse-filtering method under the LMSE (least mean square error) criterion, and G and {a_k} are computed recursively by the autocorrelation method and the Levinson-Durbin algorithm.
The recursive computation of G and {a_k} by the autocorrelation method and the Levinson-Durbin algorithm in step (1) proceeds as shown in fig. 2, where R_n(i) denotes the i-th autocorrelation coefficient of the n-th frame and â_k is the estimate of the difference-equation coefficient a_k. The speech signal is first windowed and truncated into speech frames; the autocorrelation coefficients R_n(i), i = 1, 2, ..., p, of each frame are then computed, from which the prediction coefficients a_1, a_2, ..., a_p and the gain G are obtained by recursion.
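The recursion just described can be sketched as follows; this is a standard textbook Levinson-Durbin implementation under the polynomial convention A(z) = 1 + Σ a_k z^{-k} (the sign convention, names, and the synthetic test signal are illustrative, not taken from the patent):

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve for the LPC coefficients and gain G of one frame from
    its autocorrelation coefficients r[0..p] (Levinson-Durbin recursion)."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    E = r[0]                                     # prediction-error energy, order 0
    for i in range(1, p + 1):
        # reflection coefficient k_i
        k = -(r[i] + np.dot(a[1:i], r[i-1:0:-1])) / E
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] += k * a[1:i][::-1]           # update a_1 .. a_{i-1}
        a = a_new
        E *= (1.0 - k * k)                       # shrink the error energy
    G = np.sqrt(E)                               # filter gain of Formula I
    return a[1:], G

# Example: one "frame" of a synthetic 2nd-order autoregressive process.
rng = np.random.default_rng(0)
x = np.zeros(2000)
e = rng.standard_normal(2000)
for n in range(2, 2000):
    x[n] = 1.3 * x[n-1] - 0.6 * x[n-2] + e[n]
r = np.array([np.dot(x[:2000-i], x[i:]) for i in range(3)]) / 2000
a, G = levinson_durbin(r, 2)
print(-a)  # close to [1.3, -0.6], the true model coefficients
```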
In step (2) the speech spectrum is divided into 9 partitions; the specific segments are shown in Table 1:

TABLE 1  Speech spectrum segmentation

Lower frequency limit (Hz)   Geometric-mean center frequency (Hz)   Upper frequency limit (Hz)
22                           31.5                                   44
44                           63                                     88
88                           125                                    177
177                          250                                    355
355                          500                                    710
710                          1000                                   1420
1420                         2000                                   2840
2840                         4000                                   5680
5680                         8000                                   9000
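Table 1's structure can be checked programmatically: the nominal center frequencies double from 31.5 Hz, and each band's limits are the center divided and multiplied by √2. Table 1 lists rounded values, and the top band's upper limit is capped at 9000 Hz rather than the exact 11314 Hz, presumably to stay inside the usable bandwidth at 22 kHz sampling; that reading is an assumption. A small sketch:

```python
import math

# Nominal octave-band center frequencies of Table 1 (Hz).
centers = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000]

for c in centers:
    lo, hi = c / math.sqrt(2), c * math.sqrt(2)   # exact octave-band limits
    print(round(lo), c, round(hi))                 # Table 1 shows rounded values
```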
The LPC-based speech synthesis of step (3) synthesizes the speech signal with a multi-order AR parameter model. The specific process is as follows:

1) From the extracted p-order coefficients {a_k} and the prediction-error energy E_n, construct the single-frame speech power spectrum based on the AR parameter model, as in Formula II:

P_n(l) = E_n / | Σ_{k=0}^{p} a_k e^{-j2πkl/M} |^2    (Formula II)

where n denotes the n-th speech frame; l the l-th frequency bin of the speech signal; k the k-th coefficient; M the number of samples in one speech frame; and a_0 = 1, 0 ≤ l ≤ M-1, p < M (with a_0 = 1, the a_k for k ≥ 1 carry the opposite sign to those in the denominator of Formula I). The exponential e^{-jw}, with w = 2πl/M, evaluates the power spectral density at frequency w, M being the number of sampling points of one frame;

2) Compute each denominator term for every value of l using the STFT (short-time Fourier transform), finally obtaining the power spectrum of one frame;

3) Compute the speech time-domain signal by the IFFT (inverse Fourier transform), completing the speech synthesis process.
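Formula II can be evaluated directly: the denominator is the squared magnitude of the M-point DFT of the coefficient sequence a_0, ..., a_p, zero-padded to length M. A minimal sketch with illustrative names (the coefficient values below are made up for the example):

```python
import numpy as np

def ar_power_spectrum(a, E_n, M):
    """Single-frame power spectrum of Formula II.
    a   : [a_0 = 1, a_1, ..., a_p], denominator coefficients
    E_n : prediction-error energy of frame n
    M   : number of samples per frame (p < M)"""
    A = np.fft.fft(a, n=M)        # sum_k a_k e^{-j 2 pi k l / M}, l = 0..M-1
    return E_n / np.abs(A) ** 2   # P_n(l)

# Example: a 2nd-order model with a resonance; the spectrum peaks
# near the pole angle of the denominator polynomial.
P = ar_power_spectrum(np.array([1.0, -1.3, 0.6]), E_n=1.0, M=256)
print(P.shape, int(np.argmax(P[:129])))
```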
On the basis of the structural and acoustic design, the clarity, intelligibility, and related qualities of speech transmission over the full band are markedly enhanced by the core channel-mapping algorithm. The invention uses a DSP chip from ADI and an STM32-series MCU; the DSP chip is the core that implements the algorithm, the MCU starts and controls the DSP chip, and the engineering file is loaded into the MCU's internal FLASH.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the channel mapping method of the microphone built-in earphone is applied to installing the microphone in an independent cavity inside the earphone, and the influence of ambient noise on call voice is greatly reduced. However, although the signal-to-noise ratio of the internal channel is enhanced, the voice characteristics of the internal channel have a certain difference compared with the original channel, and a mapping model is established between the internal channel and the original channel to realize the correction of the internal channel.
2. The invention adopts the channel mapping based on the characteristic statistics, establishes a mapping model between the internal channel and the original channel, and realizes the correction of the internal channel.
Drawings
Fig. 1 is a schematic flow chart of the built-in-microphone speech enhancement algorithm of embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of the LPC-based feature extraction process of embodiment 1 of the present invention; wherein n denotes the n-th frame and â_k is the estimate of a_k.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
The embodiment of the invention provides a channel mapping method of a microphone built-in earphone, which adopts channel mapping based on characteristic statistics to establish a mapping model between an internal channel and an original channel so as to realize the correction of the internal channel.
The channel mapping method is applied to a headset whose microphone is mounted in an independent cavity inside the earphone, greatly reducing the influence of ambient noise on call speech. Such a headset receives the speech signal over two channels. The first is the external channel (also called the original channel), i.e. the ordinary path formed by the glottis, the vocal tract, and lip radiation; its signal mixes with external noise, is attenuated by the earphone shell, and is collected by the built-in microphone. The second is the internal channel: the signal is conducted through the glottis, vocal tract, nasal cavity, and head cavity, and finally through the auricle to the built-in microphone. Because the second channel's signal is isolated by the earphone shell, external noise is strongly attenuated, the signal-to-noise ratio is markedly enhanced, mid- and high-frequency external noise is suppressed more deeply, and wind noise is well suppressed. However, although the internal channel's signal-to-noise ratio is enhanced, its speech characteristics differ to some extent from those of the original channel, so it is particularly necessary to correct the internal channel by establishing a mapping model between the internal channel and the original channel.
For the microphone-embedded headset, the speech signal entering the internal channel is the signal that is mapping-corrected and finally output. Because part of the original channel's speech signal remains after attenuation by the earphone shell, it shares the internal channel's spectrum but with a different delay, so it can form coherent interference. Moreover, the pickup microphone sits in an independent sealed cavity, which produces reverberation of a certain strength and reduces speech clarity. These disturbances must be suppressed in the pre-processing stage before the speech signal is processed further. In addition, because the internal channel's structure differs greatly from the original channel's, its speech characteristics, such as formants and tone contours, are changed relative to the original vocal channel. This difference degrades the auditory perception, making the speech sound less "natural", and also shifts the frequency response of each speech band relative to the original channel, for example excessive low-frequency energy and relatively low high-frequency energy, which likewise harms perception. The internal channel's speech must therefore be processed before output to restore the natural character of the original vocal channel. For these characteristics of the internal channel, the invention adopts channel mapping based on feature statistics to establish a mapping model between the internal channel and the original channel, correcting the internal channel and preventing tonal distortion of the speech it transmits.
As shown in fig. 1, on the basis of the structural and acoustic design, the invention markedly enhances the clarity and intelligibility of speech transmission over the full band through the core channel-mapping algorithm. The invention uses a DSP chip from ADI and an STM32-series MCU; the DSP chip is the core that implements the algorithm, the MCU starts and controls the DSP chip, and the engineering file is loaded into the MCU's internal FLASH.
The technical scheme of the invention is realized by the following modes:
a channel mapping method of a microphone built-in earphone adopts channel mapping based on characteristic statistics, establishes a mapping model between an internal channel and an original channel, and realizes correction of the internal channel, and comprises the following specific steps:
(1) feature extraction based on LPC (Linear predictive coding)
The invention uses LPC for feature extraction and for the later speech-synthesis output because, apart from the pitch period, LPC can extract nearly all spectral characteristics, including the frequency, bandwidth, and amplitude of the formants, and provides the overall pitch contour and the prosodic characteristics required to generate voice from concatenated words.
The speech signal is non-stationary, but its characteristics change slowly over short intervals (20 ms to 40 ms), during which the glottal period, the vocal-tract shape, and its transfer function can be regarded as approximately constant; therefore the feature extraction, channel mapping, speech synthesis, and related processes of the invention all operate frame by frame on short-time segments; in the feature extraction, the speech sampling frequency is 22 kHz, each 20 ms short-time segment forms one frame, a sliding Hamming window truncates the speech signal for the STFT (short-time Fourier transform), and the window interval is 10 ms;
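The framing scheme just described (22 kHz sampling, 20 ms frames, 10 ms hop, sliding Hamming window, STFT) can be sketched as follows; the helper name is illustrative:

```python
import numpy as np

FS = 22000                 # sampling rate (Hz)
FRAME = int(0.020 * FS)    # 20 ms -> 440 samples per frame
HOP = int(0.010 * FS)      # 10 ms window interval -> 220 samples

def stft_frames(x):
    """Slice x into overlapping Hamming-windowed frames and
    return their short-time spectra, one row per frame."""
    w = np.hamming(FRAME)
    starts = range(0, len(x) - FRAME + 1, HOP)
    return np.array([np.fft.rfft(w * x[s:s + FRAME]) for s in starts])

x = np.sin(2 * np.pi * 1000 * np.arange(FS) / FS)   # 1 s of a 1 kHz tone
S = stft_frames(x)
print(S.shape)   # (99, 221) for this 1 s input: 99 frames, FRAME // 2 + 1 bins
```

With 440 samples per frame the bin spacing is exactly 50 Hz, matching the spectral resolution stated above, so the 1 kHz tone lands on bin 20.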
Like the original vocal channel, the speech signal of the internal channel can be simplified as the convolution of the glottal excitation with the internal transmission channel, as in Formula III:

x(n) = x_1(n) * x_2(n)    (Formula III)

where x(n) is the speech signal output by the internal channel at time n; x_1(n) is the glottal excitation signal at time n, containing the fundamental-period characteristics; x_2(n) is the impulse response of the internal channel at time n, containing the speech formant characteristics.
Extracting the features directly from the formula above is complex; instead, the relevant parameters can be solved from the external characteristics of the speech signal, i.e. its system function, and used as the basis for later speech synthesis.
Formula III is the time-domain model of the speech signal generated by the internal channel. Like the original channel, the signal of the internal channel can be regarded as linear and stationary over a short time and satisfies a linear constant-coefficient difference equation. Its frequency-domain system function can still be expressed as Formula I:

H(z) = G / (1 - Σ_{k=1}^{p} a_k z^{-k})    (Formula I)

in the formula: z denotes the z-domain variable and H(z) the system function of the internal channel;
G denotes the filter gain;
a_k denotes the coefficients of the linear constant-coefficient difference equation, k being the coefficient index of the discrete-time system;
p denotes the order of the system;
The invention establishes the linear prediction equations by the inverse-filtering method under the LMSE (least mean square error) criterion and computes G and {a_k} recursively by the autocorrelation method and the Levinson-Durbin algorithm. The solving process is shown in fig. 2, where R_n denotes the autocorrelation coefficients of the n-th frame's speech signal and â_k is the estimate of a_k. The algorithm first performs windowed framing and then solves the autocorrelation coefficients of each frame.
(2) channel mapping based on formant correction
1) According to the ISO octave band standard, which is a frequency-band partition method based on octaves, divide the speech spectrum into 9 partitions; see Table 1:

TABLE 1  Octave-based speech spectrum segmentation

Lower frequency limit (Hz)   Geometric-mean center frequency (Hz)   Upper frequency limit (Hz)
22                           31.5                                   44
44                           63                                     88
88                           125                                    177
177                          250                                    355
355                          500                                    710
710                          1000                                   1420
1420                         2000                                   2840
2840                         4000                                   5680
5680                         8000                                   9000

Considering the sampling frequency and the number of samples per frame, the obtained spectral resolution is 50 Hz; the low-frequency bands are therefore reasonably merged and each mid/high-frequency band is given two bins, for a total of 15 frequency-space bins, denoted by the variable f_k;
2) Establish the mapping vector: select the 4 features f_k, Δf_c, Δf_w, ΔA as the variables mapping the internal voice channel to the original voice channel frequency space; the mapping vector is M(f_k, Δf_c, Δf_w, ΔA);
wherein:
f_k denotes the frequency space, 15 elements;
Δf_c denotes the change of the formant center frequency, signed, taking the values +1 and -1, 2 elements;
Δf_w denotes the change of the formant Q value, signed, in the range [-4, 4] with a gradient of 0.25, 33 elements;
ΔA denotes the change of the formant power, signed; one element every 2 dB; with the difference between the maximum and minimum set to 30 dB, 16 elements;
in summary, the mapping vector M(f_k, Δf_c, Δf_w, ΔA) is 4-dimensional, with 15 × 2 × 33 × 16 = 15840 feature descriptors in total;
3) Estimate the formant positions by the LPC method;
4) Statistical analysis: under identical conditions, compare the positions and shapes of corresponding formants of the internal channel and the original channel, quantify the differences with the mapping vector M(f_k, Δf_c, Δf_w, ΔA), and let the result of each measured comparison fall into the corresponding descriptor; take the results of many comparison tests as the training sample set;
5) Correct the formants and waveform of the internal channel using the statistical results, perform interpolation, and correct the speech spectrum curve;
6) Apply the inverse FFT (IFFT) to recover the time-domain signal output; a_k, G, etc. (see fig. 2 for their definitions) can be computed as needed for LPC encoding.
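Step 4) amounts to quantizing each measured difference (f_k, Δf_c, Δf_w, ΔA) into one of the 15 × 2 × 33 × 16 = 15840 descriptors and accumulating counts over many comparison tests. A hypothetical sketch of that statistics step; the bin layout, value ranges, and names are assumptions for illustration, not details from the patent:

```python
import numpy as np

def descriptor_index(fk, dfc, dfw, dA):
    """Map one measured comparison to a descriptor index in [0, 15840).
    fk: frequency bin 0..14; dfc: sign of center-frequency change;
    dfw: Q-value change in [-4, 4]; dA: power change in dB over a 30 dB span."""
    i_fk = int(fk)                                          # 0..14
    i_dfc = 0 if dfc < 0 else 1                             # sign, 2 values
    i_dfw = int(round((np.clip(dfw, -4, 4) + 4) / 0.25))    # 0.25 gradient, 0..32
    i_dA = int(np.clip(dA + 15, 0, 30) // 2)                # 2 dB steps, 0..15
    return ((i_fk * 2 + i_dfc) * 33 + i_dfw) * 16 + i_dA

counts = np.zeros(15 * 2 * 33 * 16, dtype=int)  # training-sample histogram
counts[descriptor_index(fk=7, dfc=+1, dfw=0.5, dA=-3.0)] += 1
print(counts.sum())
```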
(3) LPC based speech synthesis
As mentioned above, the LPC-based speech synthesis method is simple and practical. The invention synthesizes the speech signal with a multi-order AR parameter model from the prediction error and the filter coefficients obtained in feature extraction; the method is simple, and a good match with the original speech signal is obtained without any other parameters. The process of synthesizing the speech signal with the AR parameter model is as follows.
1) From the extracted p-order coefficients {a_k} and the gain G_n (the prediction-error energy of frame n being E_n = G_n^2), construct the single-frame speech power spectrum based on the AR parameter model, as in Formula II:

P_n(l) = E_n / | Σ_{k=0}^{p} a_k e^{-j2πkl/M} |^2    (Formula II)

where n denotes the n-th speech frame; l the l-th frequency bin of the speech signal; k the k-th coefficient; M the number of samples in one speech frame; and a_0 = 1, 0 ≤ l ≤ M-1, p < M. The exponential e^{-jw}, with w = 2πl/M, evaluates the power spectral density at frequency w, M being the number of sampling points of one frame.

2) Compute each denominator term for every value of l using the STFT (short-time Fourier transform), finally obtaining the power spectrum of one frame;

3) Compute the speech time-domain signal by the IFFT (inverse Fourier transform), completing the speech synthesis process.
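The patent's synthesis path evaluates Formula II and applies the IFFT. For illustration, the equivalent time-domain view drives the all-pole filter of Formula I with an excitation signal; the sketch below is a standard LPC synthesis back end under assumed names and made-up coefficient values, not the patent's exact implementation:

```python
import numpy as np

def lpc_synthesize(a, G, excitation):
    """Run one frame through the all-pole model of Formula I:
    x[n] = G*e[n] + sum_{k=1..p} a_k * x[n-k], with a = [a_1, ..., a_p]."""
    p = len(a)
    x = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[k] * x[n - 1 - k] for k in range(min(p, n)))
        x[n] = G * excitation[n] + past
    return x

# Voiced excitation: an impulse train at a 100 Hz pitch (22 kHz sampling,
# one 440-sample frame), fed through an illustrative 2nd-order model.
exc = np.zeros(440)
exc[::220] = 1.0
frame = lpc_synthesize(a=[1.3, -0.6], G=1.0, excitation=exc)
print(frame[:3])  # first samples: 1.0, 1.3, 1.09
```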
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the channel mapping method of the microphone built-in earphone is applied to installing the microphone in an independent cavity inside the earphone, and the influence of ambient noise on call voice is greatly reduced. However, although the signal-to-noise ratio of the internal channel is enhanced, the voice characteristics of the internal channel have a certain difference compared with the original channel, and a mapping model is established between the internal channel and the original channel to realize the correction of the internal channel.
2. The invention adopts the channel mapping based on the characteristic statistics, establishes a mapping model between the internal channel and the original channel, and realizes the correction of the internal channel.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (7)

1. A channel mapping method for a microphone built-in earphone is characterized in that: adopting channel mapping based on characteristic statistics, establishing a mapping model between an internal channel and an original channel, and realizing the correction of the internal channel, the specific steps are as follows:
(1) LPC-based feature extraction
The feature extraction is performed using the system model of Formula I,
the system function:

H(z) = G / (1 - Σ_{k=1}^{p} a_k z^{-k})    (Formula I)

in the formula: z denotes the z-domain variable and H(z) is the transfer function of the internal channel;
G denotes the filter gain;
a_k denotes the coefficients of the linear constant-coefficient difference equation, k being the coefficient index of the discrete-time system;
p denotes the order of the system;
(2) channel mapping based on formant modification;
(3) LPC-based speech synthesis;
the channel mapping based on formant correction described in step (2) specifically includes the following steps:
dividing the voice frequency spectrum into 9 partitions according to the ISO Octave audio partition standard, wherein the resolution of the frequency spectrum is 50Hz, so that low frequencies are combined, two bins are arranged in each frequency band of medium and high frequencies, 15 bins related to frequency space are counted, and a variable f is usedkRepresents; the ISO Octave audio frequency partition standard is a frequency band partition method of octaves;
secondly, establishing the mapping vector: the 4 features f_k, Δf_c, Δf_w and ΔA are selected as the variables for mapping the internal voice channel to the frequency space of the original voice channel, and the mapping vector is M(f_k, Δf_c, Δf_w, ΔA);
Wherein:
f_k represents the frequency space, 15 elements in total;
Δf_c represents the change in the formant center frequency, distinguished by sign and taking the values +1 and −1, 2 elements in total;
Δf_w represents the change in the formant Q value, distinguished by sign, in the range [−4, 4] with one gradient every 0.25, 33 elements in total;
ΔA represents the change in formant power, distinguished by sign; one element is determined every 2 dB, and with the difference between the maximum and minimum set at 30 dB this gives 16 elements;
in summary, the mapping vector M(f_k, Δf_c, Δf_w, ΔA) is 4-dimensional, with 15 × 2 × 33 × 16 = 15840 feature descriptors in total;
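The descriptor count stated in the claim follows directly from the element counts of the four variables; a quick illustrative check:

```python
# Element counts per mapping variable, as enumerated in the claim.
n_fk = 15                                        # frequency-space bins
n_dfc = 2                                        # sign of center-frequency change: +1 / -1
n_dfw = len([x * 0.25 for x in range(-16, 17)])  # Q-value change, [-4, 4] step 0.25 -> 33
n_dA = 30 // 2 + 1                               # power change, 30 dB span, one per 2 dB -> 16

total = n_fk * n_dfc * n_dfw * n_dA
print(n_dfw, n_dA, total)  # → 33 16 15840
```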
thirdly, deducing the formant positions based on the LPC method;
fourthly, statistical analysis: under identical conditions, the positions and shapes of adjacent formants of the internal channel and the original channel are compared, quantitative statistics of the differences are made with the mapping vector M(f_k, Δf_c, Δf_w, ΔA), and the result of each measured comparison falls into the corresponding descriptor; the results of many comparison tests are taken as the training sample set;
fifthly, correcting the formants and waveform of the internal channel using the statistical results, performing interpolation, and correcting the voice spectrum curve;
sixthly, performing an inverse FFT (IFFT) to recover and output the time-domain signal; if necessary, extracting the a_k- and G-related parameters of formula I and performing LPC encoding.
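The claim does not spell out how the interpolation of the fifth step is performed. The sketch below is a hypothetical illustration of one way to apply a statistical power correction ΔA around a formant bin, tapering it linearly toward neighboring bins; the function name and triangular weighting are assumptions, not the patent's method:

```python
def apply_formant_gain(power_spec, center_bin, delta_a_db, half_width):
    """Scale a power spectrum around center_bin by delta_a_db (dB),
    tapering the correction linearly to zero at +/- half_width bins."""
    out = list(power_spec)
    for off in range(-half_width, half_width + 1):
        i = center_bin + off
        if 0 <= i < len(out):
            weight = 1.0 - abs(off) / (half_width + 1)    # triangular taper
            out[i] *= 10 ** (delta_a_db * weight / 10.0)  # dB applied to power
    return out

spec = [1.0] * 8
corrected = apply_formant_gain(spec, 4, 10.0, 2)
print(corrected[4])  # center bin receives the full 10 dB → 10.0
```

Bins outside the taper (here indices 0, 1 and 7) are left untouched.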
2. The channel mapping method of claim 1, wherein the feature extraction in step (1) is based on short-time frame-by-frame operation: the voice sampling frequency is 22 kHz, a 20 ms short-time signal forms one frame, and a sliding Hamming window with a window interval of 10 ms is used to intercept the voice signal for the STFT.
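The framing parameters of claim 2 (22 kHz sampling, 20 ms frames, 10 ms hop, Hamming window) can be sketched as follows; the helper names are illustrative, not from the patent:

```python
import math

FS = 22000               # sampling frequency, Hz
FRAME = int(0.020 * FS)  # 20 ms frame -> 440 samples
HOP = int(0.010 * FS)    # 10 ms window interval -> 220 samples

def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frames(signal):
    """Slice a signal into overlapping Hamming-windowed frames."""
    win = hamming(FRAME)
    out = []
    for start in range(0, len(signal) - FRAME + 1, HOP):
        out.append([s * w for s, w in zip(signal[start:start + FRAME], win)])
    return out

print(FRAME, HOP, len(frames([0.0] * 2200)))  # → 440 220 9
```

With a 100 ms signal (2200 samples), the 50% overlap yields 9 frames; each frame would then be passed to an FFT to build the STFT.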
3. The channel mapping method of claim 1, wherein in step (1) a set of linear prediction equations is established using the inverse-filtering method and the minimum mean-square-error criterion, and G and {a_k} are computed by recursion.
4. The channel mapping method of claim 3, wherein G and {a_k} are computed by recursion based on the autocorrelation method and the Levinson-Durbin algorithm, as follows: firstly, the speech signal is windowed and truncated to form speech frames; then the autocorrelation coefficients R_n(i), i = 1, 2, …, p, of each frame are calculated; finally, the prediction coefficients a_1, a_2, …, a_p and the gain G are solved by recursion.
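A minimal sketch of the Levinson-Durbin recursion referenced in claim 4, in its textbook form; the variable names are mine, and the sign convention assumes a prediction-error filter A(z) = 1 + Σ a_k z^{−k} (the gain of formula I can then be taken as G = √E, up to frame-energy normalization):

```python
def levinson_durbin(r, p):
    """Solve for prediction coefficients a[1..p] and residual energy E
    from autocorrelation values r[0..p]."""
    a = [1.0] + [0.0] * p
    e = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e                       # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a, e = new_a, e * (1 - k * k)
    return a, e

# AR(1)-like autocorrelation r = [1, 0.5]:
a, e = levinson_durbin([1.0, 0.5], 1)
print(a, e)  # → [1.0, -0.5] 0.75
```

For a true first-order process, raising the model order adds no new information: with r = [1, 0.5, 0.25] the second coefficient comes out (numerically) zero.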
5. The channel mapping method of claim 1, wherein in step (2) the voice spectrum is divided into 9 partitions, the specific segmentation being:
lower frequency limits: 22 Hz, 44 Hz, 88 Hz, 177 Hz, 355 Hz, 710 Hz, 1420 Hz, 2840 Hz and 5680 Hz;
geometric-mean center frequencies: 31.5 Hz, 63 Hz, 125 Hz, 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz and 8000 Hz;
upper frequency limits: 44 Hz, 88 Hz, 177 Hz, 355 Hz, 710 Hz, 1420 Hz, 2840 Hz, 5680 Hz and 9000 Hz.
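For reference, octave-band edges follow from the center frequencies as f_lower = f_c/√2 and f_upper = f_c·√2; the claim's tabulated limits are these values rounded (e.g. 1000/√2 ≈ 707 → 710 Hz). An illustrative computation:

```python
# Octave-band edges derived from the geometric-mean center frequencies.
centers = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000]
bands = [(c / 2 ** 0.5, c, c * 2 ** 0.5) for c in centers]
for lo, c, hi in bands[:3]:
    print(f"{lo:7.1f}  {c:7.1f}  {hi:7.1f}")
# The 1000 Hz band spans roughly 707-1414 Hz, tabulated in the claim as 710-1420 Hz.
```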
6. The channel mapping method of claim 1, wherein the LPC-based speech synthesis in step (3) synthesizes the voice signal using a multi-order AR parameter model.
7. The channel mapping method of claim 6, wherein in step (3) the multi-order AR parameter model synthesizes the voice signal by the following specific process:
1) From the extracted p-order coefficients {a_k} and gain G_n, a single-frame speech power spectrum is constructed based on the AR parameter model, as shown in formula II:

P_n(e^{j·w_l}) = G_n² / | Σ_{k=0}^{p} a_k · e^{−j·2π·l·k/M} |²    (formula II)

where: n denotes the n-th frame of the speech signal; l denotes the l-th frequency bin of the speech signal, with w_l = 2πl/M and 0 ≤ l ≤ M − 1; k denotes the k-th delay term, with a_0 = 1 and p < M; M denotes the number of samples in one frame of the speech signal; P_n(e^{j·w_l}) denotes the power spectral density at frequency w_l;
2) Using the STFT (short-time Fourier transform), each denominator term corresponding to each value of l is calculated, finally yielding the power spectrum of one frame of speech;
3) The voice time-domain signal is calculated by IFFT, completing the speech synthesis process.
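The per-frame power-spectrum construction of formula II can be sketched directly as the DFT of the coefficient sequence; this is a minimal illustration (a pure-Python DFT for clarity, not efficiency):

```python
import cmath

def ar_power_spectrum(a, gain, m):
    """Formula-II style power spectrum:
    P(l) = gain^2 / |sum_k a_k e^{-j*2*pi*l*k/M}|^2,
    with a[0] == 1 and len(a) - 1 == p < m."""
    spec = []
    for l in range(m):
        denom = sum(ak * cmath.exp(-2j * cmath.pi * l * k / m)
                    for k, ak in enumerate(a))
        spec.append(gain ** 2 / abs(denom) ** 2)
    return spec

# Zeroth-order model (a = [1]): a flat spectrum at gain^2.
print(ar_power_spectrum([1.0], 2.0, 4))  # → [4.0, 4.0, 4.0, 4.0]
```

A first-order model with a pole near z = 1 (e.g. a = [1, −0.9]) concentrates power at the lowest bin, which is the qualitative shape the IFFT of step 3) converts back to a time-domain frame.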
CN201911183807.2A 2019-11-27 2019-11-27 Channel mapping method for built-in earphone of microphone Active CN110691296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911183807.2A CN110691296B (en) 2019-11-27 2019-11-27 Channel mapping method for built-in earphone of microphone


Publications (2)

Publication Number Publication Date
CN110691296A CN110691296A (en) 2020-01-14
CN110691296B true CN110691296B (en) 2021-01-22

Family

ID=69117645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911183807.2A Active CN110691296B (en) 2019-11-27 2019-11-27 Channel mapping method for built-in earphone of microphone

Country Status (1)

Country Link
CN (1) CN110691296B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN209562756U (en) * 2019-04-02 2019-10-29 东莞市荣泉音响制造有限公司 A kind of blue tooth voice interactive mode earphone of built-in microphone

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100789084B1 (en) * 2006-11-21 2007-12-26 한양대학교 산학협력단 Speech enhancement method by overweighting gain with nonlinear structure in wavelet packet transform
CN108986832B (en) * 2018-07-12 2020-12-15 北京大学深圳研究生院 Binaural voice dereverberation method and device based on voice occurrence probability and consistency




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant