CN107833581B - Method, device and readable storage medium for extracting fundamental tone frequency of sound

Info

Publication number
CN107833581B
Authority
CN
China
Prior art keywords
frequency
detected
frequency point
point
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710989739.3A
Other languages
Chinese (zh)
Other versions
CN107833581A (en)
Inventor
劳振锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201710989739.3A
Publication of CN107833581A
Application granted
Publication of CN107833581B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a method and a device for extracting the pitch frequency of a sound, and a readable storage medium. First, a sound signal to be detected is acquired and converted from the time domain to the frequency domain by short-time Fourier transform. Next, the frequency band range of the sound signal is determined in the frequency domain, and the maximum harmonic number of the sound signal is determined according to that range. Energy intensity detection is then performed on each frequency point within the frequency band range, and the frequency point a with the maximum energy intensity is determined from the detection result. Finally, according to the frequency point a and the maximum harmonic number, it is judged whether there are frequency points to be detected that are maximum value points; if so, such a frequency point may be the pitch frequency of the sound signal or a harmonic component of the pitch frequency, and the pitch frequency is extracted from the sound signal accordingly. The method achieves relatively high accuracy with relatively low algorithmic complexity.

Description

Method, device and readable storage medium for extracting fundamental tone frequency of sound
Technical Field
The present invention relates to the field of audio signal technology, and in particular, to a method, an apparatus, and a readable storage medium for extracting a pitch frequency of a sound.
Background
The pitch frequency (fundamental tone frequency) is called the fundamental frequency for short. When a sounding body vibrates and produces sound, the sound can generally be decomposed into a number of pure sine waves; almost all natural sounds are composed of sine waves of different frequencies, of which the sine wave with the lowest frequency is the fundamental tone and the sine waves with higher frequencies are harmonics. The pitch frequency is a basic feature that reflects the pitch of the human voice. For example, judging whether a singer's intonation is correct generally requires the pitch, which is obtained by extracting the pitch frequency of the voice.
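As an illustration only (the notation below is not part of the original text), the decomposition described above is commonly written as a harmonic sum, under the assumption that the higher components lie at integer multiples of the fundamental frequency $f_0$:

$$x(t) = \sum_{k=1}^{K} A_k \sin\left(2\pi k f_0 t + \varphi_k\right)$$

Here $A_k$ and $\varphi_k$ are the assumed amplitude and phase of the k-th component; $k = 1$ is the fundamental tone and $k \ge 2$ are the harmonics. Under this assumption, a spectral peak at frequency $a$ that is the n-th harmonic implies a fundamental at $a/n$.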
Existing pitch frequency detection methods include the time-domain autocorrelation method, the frequency-domain cepstrum method and the frequency-domain discrete wavelet transform method, but these methods suffer from drawbacks such as complex algorithms and low detection accuracy. The pitch frequency detection method of the invention achieves higher accuracy with lower algorithmic complexity.
Disclosure of Invention
The invention mainly aims to provide a method, a device and a readable storage medium for extracting pitch frequency of voice, and aims to solve the problems of higher algorithm complexity and lower detection precision of the existing pitch frequency detection method.
To achieve the above object, the present invention provides a method for extracting a pitch frequency of a voice, the method comprising the steps of:
acquiring a sound signal to be detected, and converting the sound signal to be detected from a time domain to a frequency domain through short-time Fourier transform;
determining the frequency band range of the sound signal to be detected from the frequency domain, and determining the maximum harmonic number of the sound signal to be detected according to the frequency band range;
respectively carrying out energy intensity detection on each frequency point in the frequency band range, and determining the frequency point a with the maximum energy intensity according to an intensity detection result;
and extracting the fundamental tone frequency from the sound signal to be detected according to the frequency point a and the maximum harmonic number.
Preferably, the extracting the fundamental tone frequency from the sound signal to be detected according to the frequency point a and the maximum harmonic number specifically includes:
setting a variable n to the maximum harmonic number;
calculating a frequency point to be detected corresponding to the frequency point a according to the variable n;
judging whether each frequency point to be detected meets a first preset condition or not;
and when the frequency points to be detected do not meet the first preset condition, decrementing the variable n by 1 and returning to the step of calculating the frequency points to be detected corresponding to the frequency point a according to the variable n; once the frequency points to be detected meet the first preset condition, taking the quotient of the frequency point a and the variable n as the fundamental tone frequency of the sound signal to be detected.
Preferably, the calculating the frequency point to be measured corresponding to the frequency point a according to the variable n specifically includes:
setting a variable m to 1;
calculating a frequency point f to be measured corresponding to the frequency point a according to a formula (1);
increasing the variable m by 1, calculating the frequency point to be measured corresponding to the frequency point a again according to the formula (1), and taking each calculated frequency point to be measured as the frequency point to be measured corresponding to the frequency point a when m is equal to n-1;
wherein the formula (1) is
$$f = \frac{m \cdot a}{n}$$
Preferably, after the frequency point to be measured corresponding to the frequency point a is calculated according to the variable n, the method further includes:
rounding each frequency point to be detected to the nearest integer.
Preferably, after decrementing the variable n by 1, the method further includes:
when the variable n is 2 and the frequency points to be detected do not meet the first preset condition, taking the absolute frequency value of the frequency point a as the fundamental tone frequency of the sound signal to be detected.
Preferably, the determining whether each frequency point to be detected meets a first preset condition specifically includes:
comparing the absolute frequency values of the frequency points to be detected, and acquiring frequency domain energy corresponding to the frequency points to be detected when the comparison result meets a first preset state;
judging whether the frequency domain energy corresponding to each frequency point to be detected is a maximum value point or not;
when the frequency domain energy corresponding to each frequency point to be detected is a maximum value point, selecting the frequency point f_min with the minimum absolute frequency value from the frequency points to be detected;
judging whether the frequency domain energy corresponding to the frequency point f_min is larger than a preset energy threshold value; if so, judging that the frequency points to be detected meet the first preset condition, and if not, judging that the frequency points to be detected do not meet the first preset condition.
Preferably, the first preset state is:
the absolute frequency values of the frequency points to be detected increase as m increases, each absolute frequency value being smaller than the absolute frequency value of the frequency point a and larger than 1.
In addition, to achieve the above object, the present invention also provides an apparatus for extracting a pitch frequency of a sound, the apparatus comprising: a sound sensor for acquiring a sound signal to be detected, a memory, a processor, and a program for extracting the pitch frequency of a sound that is stored on the memory and executable on the processor, the program being configured to implement the steps of the method for extracting the pitch frequency of a sound as described above.
Furthermore, to achieve the above object, the present invention also proposes a readable storage medium storing a program for extracting the pitch frequency of a sound which, when executed by a processor, implements the steps of the method for extracting the pitch frequency of a sound as described above.
In the invention, a sound signal to be detected is first acquired and converted from the time domain to the frequency domain by short-time Fourier transform; the frequency band range of the sound signal is then determined in the frequency domain, and the maximum harmonic number of the sound signal is determined according to the frequency band range; energy intensity detection is performed on each frequency point in the frequency band range, and the frequency point a with the maximum energy intensity is determined from the detection result; finally, the fundamental tone frequency is extracted from the sound signal according to the frequency point a and the maximum harmonic number, so that the fundamental tone frequency is extracted with higher accuracy at lower algorithmic complexity.
Drawings
FIG. 1 is a schematic diagram of an apparatus in a hardware operating environment according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a first embodiment of a method for extracting a pitch frequency of a voice according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the apparatus may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a sound sensor 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The sound sensor 1004 is used to acquire the sound signal to be detected. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure shown in fig. 1 does not limit the apparatus, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in fig. 1, the memory 1005, which is a storage medium, may include an operating system, a sound signal acquiring module, a user interface module, and a program for extracting the pitch frequency of a sound.
In the apparatus of the present invention, the processor 1001 calls the program for extracting the pitch frequency of a sound stored in the memory 1005 and performs the following operations:
acquiring a sound signal to be detected, and converting the sound signal to be detected from a time domain to a frequency domain through short-time Fourier transform;
determining the frequency band range of the sound signal to be detected from the frequency domain, and determining the maximum harmonic number of the sound signal to be detected according to the frequency band range;
respectively carrying out energy intensity detection on each frequency point in the frequency band range, and determining the frequency point a with the maximum energy intensity according to an intensity detection result;
and extracting the fundamental tone frequency from the sound signal to be detected according to the frequency point a and the maximum harmonic number.
Further, the processor 1001 may call the program for extracting the pitch frequency of a sound stored in the memory 1005, and also perform the following operations:
setting a variable n to the maximum harmonic number;
calculating the frequency points to be detected corresponding to the frequency point a according to the variable n;
judging whether the frequency points to be detected meet a first preset condition;
and when the frequency points to be detected do not meet the first preset condition, decrementing the variable n by 1 and returning to the step of calculating the frequency points to be detected corresponding to the frequency point a according to the variable n; once the frequency points to be detected meet the first preset condition, taking the quotient of the frequency point a and the variable n as the fundamental tone frequency of the sound signal to be detected.
Further, the processor 1001 may call the program for extracting the pitch frequency of a sound stored in the memory 1005, and also perform the following operations:
setting a variable m to 1;
calculating a frequency point f to be detected corresponding to the frequency point a according to a formula (1);
increasing the variable m by 1 and calculating the frequency point to be detected corresponding to the frequency point a again according to the formula (1), until m is equal to n-1, and then taking all the calculated frequency points as the frequency points to be detected corresponding to the frequency point a;
wherein the formula (1) is
$$f = \frac{m \cdot a}{n}$$
Further, the processor 1001 may call the program for extracting the pitch frequency of a sound stored in the memory 1005, and also perform the following operations:
rounding each frequency point to be detected to the nearest integer.
Further, the processor 1001 may call the program for extracting the pitch frequency of a sound stored in the memory 1005, and also perform the following operations:
when the variable n is 2 and the frequency points to be detected do not meet the first preset condition, taking the absolute frequency value of the frequency point a as the fundamental tone frequency of the sound signal to be detected.
Further, the processor 1001 may call the program for extracting the pitch frequency of a sound stored in the memory 1005, and also perform the following operations:
comparing the absolute frequency values of the frequency points to be detected, and acquiring the frequency domain energy corresponding to each frequency point to be detected when the comparison result meets a first preset state;
judging whether the frequency domain energy of each frequency point to be detected is a maximum value point;
when the frequency domain energy corresponding to each frequency point to be detected is a maximum value point, selecting the frequency point f_min with the minimum absolute frequency value from the frequency points to be detected;
judging whether the frequency domain energy corresponding to the frequency point f_min is larger than a preset energy threshold value; if so, judging that the frequency points to be detected meet the first preset condition, and if not, judging that the frequency points to be detected do not meet the first preset condition.
In this embodiment, a sound signal to be detected is first acquired and converted from the time domain to the frequency domain by short-time Fourier transform; the frequency band range of the sound signal is then determined in the frequency domain, and the maximum harmonic number of the sound signal is determined according to the frequency band range; energy intensity detection is performed on each frequency point in the frequency band range, and the frequency point a with the maximum energy intensity is determined from the detection result; finally, according to the frequency point a and the maximum harmonic number, it is judged whether there are frequency points to be detected that are maximum value points; if so, such a frequency point may be the fundamental tone frequency of the sound signal or a harmonic component of the fundamental tone frequency, and the fundamental tone frequency is extracted from the sound signal accordingly.
Based on the hardware structure, the embodiment of the method for extracting the pitch frequency of the voice is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a method for extracting a pitch frequency of a voice according to the present invention.
In this embodiment, the method includes the steps of:
step S10: acquiring a sound signal to be detected, and converting the sound signal to be detected from a time domain to a frequency domain through short-time Fourier transform;
It should be noted that this embodiment is described with the processor of the above apparatus as the executing entity.
In a specific implementation, the sound signal to be detected in this embodiment is a digital audio signal divided into frames of 1024 points with a step of 512 points. That is, a 1024-point short-time Fourier transform is first applied to the acquired human voice signal, which yields 512 effective frequency points, and the index of each point corresponds to a frequency value. The human voice frequency band is typically 80-1200 Hz; for example, when the sampling rate of the audio signal is 44100 Hz, adjacent indices are 44100/1024 ≈ 43.07 Hz apart, so the corresponding frequency bin index range is 2-27. In this embodiment, the sound signal to be detected is preferably converted from the time domain to the frequency domain by short-time Fourier transform, so that the signal within each frame is relatively stable.
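The index range quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the patent's implementation: the function name `band_to_bin_range` and the choice to round the lower edge up and the upper edge down are assumptions, chosen because they reproduce the stated range 2-27.

```python
import math

def band_to_bin_range(f_low_hz, f_high_hz, sample_rate_hz, fft_size):
    """Map a frequency band to an inclusive range of FFT bin indices.

    Assumption: keep only bins whose centre frequency lies inside the band.
    """
    bin_width_hz = sample_rate_hz / fft_size          # spacing between adjacent bins
    first_bin = math.ceil(f_low_hz / bin_width_hz)    # lowest bin at or above f_low
    last_bin = math.floor(f_high_hz / bin_width_hz)   # highest bin at or below f_high
    return first_bin, last_bin

# Human voice band 80-1200 Hz, 44100 Hz sampling rate, 1024-point transform.
lo_bin, hi_bin = band_to_bin_range(80.0, 1200.0, 44100.0, 1024)
print(lo_bin, hi_bin)  # 2 27
```

With a bin spacing of about 43.07 Hz, 80 Hz falls between bins 1 and 2 and 1200 Hz between bins 27 and 28, which is consistent with the range given in the embodiment.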
Step S20: determining the frequency band range of the sound signal to be detected from the frequency domain, and determining the maximum harmonic number of the sound signal to be detected according to the frequency band range;
Understandably, the frequency band range (i.e., the frequency band) of the sound signal to be detected is determined first, and the maximum harmonic number of the sound signal is then determined according to that range. For example, the human voice frequency band is generally 80-1200 Hz; the index range corresponding to this band is determined according to the sampling rate of the audio signal, and the maximum allowable harmonic value (i.e., the maximum harmonic number) of the sound signal can be determined according to the index range. Since there are generally at most 4 harmonics within the human voice index range, the maximum harmonic number in this embodiment is 4.
Step S30: respectively carrying out energy intensity detection on each frequency point in the frequency band range, and determining the frequency point a with the maximum energy intensity according to an intensity detection result;
In a specific implementation, the frequency point a corresponding to the maximum energy value is found within the human voice index range; this frequency point is either the fundamental frequency or one of its harmonic components. It can be understood that almost all natural sounds are composed of many sine waves of different frequencies, of which the sine wave with the lowest frequency is the fundamental tone and the sine waves with higher frequencies are harmonics. Performing energy intensity detection on each frequency point in the frequency band range and determining the frequency point a with the maximum energy intensity therefore narrows the search to a range close to the true fundamental tone value that is finally extracted, since the frequency point a is either the fundamental frequency or one of its harmonic components.
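Step S30 amounts to an arg-max over the energies of the bins inside the band. The sketch below illustrates this on a synthetic frame; the helper names `band_energy` and `strongest_bin`, the use of squared DFT magnitude as "energy intensity", and the absence of a window function are all assumptions, since the text does not specify them. A direct per-bin DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def band_energy(frame, first_bin, last_bin):
    """Return {bin index: squared DFT magnitude} for bins in [first_bin, last_bin]."""
    n = len(frame)
    energy = {}
    for k in range(first_bin, last_bin + 1):
        # Direct DFT of a single bin; a real implementation would use an FFT.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        energy[k] = abs(coeff) ** 2
    return energy

def strongest_bin(energy):
    """Frequency point a: the bin index with the largest energy."""
    return max(energy, key=energy.get)

# Synthetic 1024-point frame: a weak tone at bin 5 plus a stronger
# second harmonic at bin 10.
N = 1024
frame = [0.4 * math.sin(2 * math.pi * 5 * i / N)
         + 1.0 * math.sin(2 * math.pi * 10 * i / N) for i in range(N)]
s = band_energy(frame, 2, 27)
a = strongest_bin(s)
print(a)  # 10
```

In this example the strongest bin is the second harmonic rather than the fundamental, which is exactly the ambiguity that step S40 is meant to resolve.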
Step S40: extracting the fundamental tone frequency from the sound signal to be detected according to the frequency point a and the maximum harmonic number.
It can be understood that the frequency point a corresponding to the maximum energy value is found in the frequency band of the sound to be detected, and the frequency point a is assumed to be the n-th harmonic component of the fundamental frequency (for example, with n = 4 the frequency point a is assumed to be the 4th harmonic component of the fundamental tone frequency). It is then checked whether maximum value points (i.e., wave peaks or wave troughs of the waveform) that meet the first preset condition exist in the regions of 1/n, 2/n, …, (n-1)/n of the frequency point a. It should be noted that the points at 1/n, 2/n, …, (n-1)/n of the frequency point a are collectively referred to as the frequency points to be detected of the frequency point a. If such frequency points exist, the frequency point to be detected at 1/n of the frequency point a is the real fundamental frequency (i.e., the fundamental tone frequency), and the frequency point a is the n-th harmonic of the fundamental frequency. Otherwise, the frequency point a is assumed to be the (n-1)-th harmonic of the fundamental frequency, and whether the fundamental frequency point can be found is judged in the same way. If no fundamental frequency point has been found by the time n is 2, the frequency point a is judged to be the real fundamental frequency.
In a specific implementation, this embodiment preferably uses a double loop to extract the fundamental tone frequency from the sound signal to be detected; the loop over the variable n is the outer loop. Step S40 can be divided into three sub-steps:
Sub-step one: setting a variable n to the maximum harmonic number;
Sub-step two: calculating the frequency points to be detected corresponding to the frequency point a according to the variable n, and judging whether the frequency points to be detected meet a first preset condition;
In a specific implementation, the frequency point a is assumed to be the n-th harmonic component of the fundamental frequency, where n is a variable whose initial loop value is set to the maximum harmonic number. Generally there are at most 4 harmonics within the human voice index range, so the maximum harmonic number in this embodiment is 4, and the frequency point a is first assumed to be the 4th harmonic component of the fundamental frequency.
It should be noted that judging whether the frequency points to be detected meet the first preset condition is the inner loop, which uses m as its loop variable.
In a specific implementation, it is assumed that the frequency point a is the 4th harmonic component of the fundamental frequency (since the maximum harmonic number is 4 in this embodiment), so m takes the values 1, 2 and 3. The frequency points to be detected a_nm corresponding to the frequency point a are then found: when m is 1 and n is 4, the frequency point to be detected is denoted a41; when m is 2 and n is 4, it is denoted a42; when m is 3 and n is 4, it is denoted a43. The absolute frequency value f of each frequency point to be detected is calculated according to the following formula (1):
$$f = \frac{m \cdot a}{n}$$
Preferably, in order to make the measurement result more accurate, the formula (1) is further optimized into formula (2):
$$f = \operatorname{round}\left(\frac{m \cdot a}{n}\right)$$
where $\operatorname{round}\left(\frac{m \cdot a}{n}\right)$ denotes rounding $\frac{m \cdot a}{n}$ to the nearest integer. In this embodiment, when n is 4, m takes the values 1, 2 and 3.
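The inner loop over m can be sketched as follows, assuming that formula (2) rounds m·a/n to the nearest integer. The names `round_half_up` and `candidate_bins` are hypothetical, and rounding halves upward is also an assumption: the text does not state a tie rule, and Python's built-in `round` would instead round halves to the nearest even integer.

```python
import math

def round_half_up(x):
    """Round a non-negative value to the nearest integer, halves rounding up."""
    return int(math.floor(x + 0.5))

def candidate_bins(a, n):
    """Frequency points to be detected a_n1 .. a_n(n-1) for peak bin a,
    under the hypothesis that a is the n-th harmonic."""
    return [round_half_up(m * a / n) for m in range(1, n)]

print(candidate_bins(20, 4))  # [5, 10, 15]
print(candidate_bins(20, 3))  # [7, 13]
```

For a = 20 and n = 4 this gives the three points a41, a42 and a43 at bins 5, 10 and 15; for n = 3 the non-integer quotients 6.67 and 13.33 are rounded to 7 and 13.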
According to formula (2), the frequency points to be detected are a41 = round(a/4), a42 = round(2·a/4) and a43 = round(3·a/4). The absolute frequency values of these frequency points are compared; when the comparison result meets the first preset state, the frequency domain energies s(a41), s(a42) and s(a43) corresponding to the frequency points are acquired. The first preset state is that the absolute frequency values of the frequency points to be detected of the frequency point a increase as m increases, each being smaller than the absolute frequency value of the frequency point a and greater than 1; that is, the comparison result should satisfy a > a43 > a42 > a41, with a41 > 1, a42 > 1 and a43 > 1 (the fundamental frequency point should lie in the human voice band region). It is then judged whether a41, a42 and a43 are maximum value points, i.e., whether the corresponding frequency domain energies s(a41), s(a42) and s(a43) satisfy the following model:
Figure BDA0001440497290000091
If the frequency domain energies s(a41), s(a42) and s(a43) satisfy the above model, a41, a42 and a43 are shown to be maximum value points, and the frequency point a can then be presumed to be a harmonic of the fundamental frequency. The frequency point f_min with the minimum absolute frequency value is selected from the frequency points to be detected (here a41); if s(a41) is greater than the preset energy threshold, the frequency point to be detected a41 can be determined to be the fundamental frequency point.
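The three tests that make up the first preset condition can be sketched as one predicate. This is an illustration under stated assumptions, not the patent's exact model: the function name `meets_first_condition` is hypothetical, and because the formula image for the maximum-value model is not reproduced in this text, a candidate is assumed here to be a maximum value point when its energy is strictly greater than that of both neighbouring bins.

```python
def meets_first_condition(candidates, a, energy, threshold):
    """Check the candidate bins against the first preset condition.

    candidates: bin indices a_n1 .. a_n(n-1), in order of increasing m.
    energy: sequence where energy[k] is the frequency-domain energy of bin k.
    """
    # 1. First preset state: 1 < a_n1 < a_n2 < ... < a.
    chain = [1] + list(candidates) + [a]
    if any(lo >= hi for lo, hi in zip(chain, chain[1:])):
        return False
    # 2. Every candidate must be a local maximum
    #    (assumed model: strictly above both neighbouring bins).
    for k in candidates:
        if not (energy[k] > energy[k - 1] and energy[k] > energy[k + 1]):
            return False
    # 3. The lowest candidate, f_min, must exceed the energy threshold.
    return energy[min(candidates)] > threshold

# Toy spectrum with isolated peaks at bins 5, 10, 15 and 20.
energy = [0.0] * 30
energy[5], energy[10], energy[15], energy[20] = 4.0, 6.0, 3.0, 9.0
print(meets_first_condition([5, 10, 15], 20, energy, 1.0))  # True
print(meets_first_condition([5, 10, 15], 20, energy, 5.0))  # False
```

The second call fails only at the threshold test: bins 5, 10 and 15 are all peaks, but the energy at the lowest candidate (4.0) does not exceed the threshold of 5.0.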
Sub-step three: when the frequency points to be detected do not meet the first preset condition, decrementing the variable n by 1 and returning to the step of calculating the frequency points to be detected corresponding to the frequency point a according to the variable n; once the frequency points to be detected meet the first preset condition, taking the quotient of the frequency point a and n as the real fundamental tone frequency of the sound signal to be detected.
It can be understood that, in sub-step three, if s(a41) is determined to be smaller than the preset energy threshold, the outer loop (over the variable n) proceeds: a is next assumed to be the 3rd harmonic (n = 3), and it is judged in the same way whether the real fundamental frequency point can be found. If the fundamental frequency point has still not been found when a is assumed to be the 2nd harmonic (n = 2), a is directly determined to be the real fundamental frequency point.
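Putting the sub-steps together, the double loop of step S40 can be sketched as below. This is a minimal sketch under the same assumptions as before (half-up rounding for formula (2), a strict comparison with both neighbouring bins as the maximum-value model); the function name `extract_pitch_bin` and its parameters are hypothetical, and the result is returned as a bin index rather than in Hz.

```python
import math

def extract_pitch_bin(energy, first_bin, last_bin, max_harmonic, threshold):
    """Return the estimated fundamental as a (possibly fractional) bin index."""
    # Frequency point a: the strongest bin inside the band (step S30).
    a = max(range(first_bin, last_bin + 1), key=lambda k: energy[k])
    for n in range(max_harmonic, 1, -1):              # outer loop: n = N, ..., 2
        # Inner loop over m = 1 .. n-1: candidates round(m * a / n).
        cands = [int(math.floor(m * a / n + 0.5)) for m in range(1, n)]
        chain = [1] + cands + [a]
        ordered = all(lo < hi for lo, hi in zip(chain, chain[1:]))
        peaks = ordered and all(
            energy[k] > energy[k - 1] and energy[k] > energy[k + 1]
            for k in cands)
        if peaks and energy[cands[0]] > threshold:    # cands[0] is f_min
            return a / n                              # a is the n-th harmonic
    return float(a)                                   # fallback: a is the fundamental

# Toy spectrum: peaks at bins 5, 10, 15 and 20, strongest at bin 20.
energy = [0.0] * 30
energy[5], energy[10], energy[15], energy[20] = 4.0, 6.0, 3.0, 9.0
pitch_bin = extract_pitch_bin(energy, 2, 27, 4, 1.0)
print(pitch_bin)                   # 5.0
print(pitch_bin * 44100 / 1024)    # about 215.3 Hz
```

If only bins 10 and 20 carried peaks, the hypotheses n = 4 and n = 3 would fail and n = 2 would succeed, giving bin 10; with a single peak at bin 20 every hypothesis fails and bin 20 itself is returned, matching the fallback described for n = 2.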
In this embodiment, a sound signal to be detected is first acquired and converted from the time domain to the frequency domain by short-time Fourier transform; the frequency band range of the sound signal is then determined in the frequency domain, and the maximum harmonic number of the sound signal is determined according to the frequency band range; energy intensity detection is performed on each frequency point in the frequency band range, and the frequency point a with the maximum energy intensity is determined from the detection result; finally, according to the frequency point a and the maximum harmonic number, it is judged whether there are frequency points to be detected that are maximum value points; if so, such a frequency point may be the fundamental tone frequency of the sound signal or a harmonic component of the fundamental tone frequency, and the fundamental tone frequency is extracted from the sound signal accordingly.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of software products stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk), and including instructions for causing a device (e.g., a mobile phone, a server, an air conditioner, or a network device) to perform the methods according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method of extracting a pitch frequency of a sound, the method comprising:
acquiring a sound signal to be detected, and converting the sound signal to be detected from a time domain to a frequency domain through short-time Fourier transform;
determining the frequency band range of the sound signal to be detected from the frequency domain, and determining the maximum harmonic number of the sound signal to be detected according to the frequency band range;
respectively carrying out energy intensity detection on each frequency point in the frequency band range, and determining the frequency point a with the maximum energy intensity according to an intensity detection result;
extracting the fundamental tone frequency from the sound signal to be detected according to the frequency point a and the maximum harmonic number;
wherein the step of extracting the fundamental tone frequency from the sound signal to be detected according to the frequency point a and the maximum harmonic number comprises:
acquiring all frequency points to be detected corresponding to the frequency point a, wherein each frequency point to be detected is m/n times the frequency point a, and the value range of the variable m is [1, n-1];
judging whether maximum value points meeting a first preset condition exist among the frequency points to be detected;
if so, taking as the fundamental tone frequency the frequency point to be detected that is 1/n of the frequency point a;
if not, subtracting one from the variable n and returning to the step of acquiring all frequency points to be detected corresponding to the frequency point a; and if the fundamental tone frequency has still not been obtained when n is 2, taking the frequency point a as the fundamental tone frequency.
2. The method according to claim 1, wherein said extracting a pitch frequency from the audio signal to be detected according to the frequency point a and the maximum harmonic number specifically comprises:
setting a variable n to the maximum harmonic number;
calculating a frequency point to be detected corresponding to the frequency point a according to the variable n;
judging whether each frequency point to be detected meets a first preset condition or not;
when the frequency points to be detected do not meet the first preset condition, decrementing the variable n by 1 and returning to the step of calculating the frequency points to be detected corresponding to the frequency point a according to the variable n, until the frequency points to be detected meet the first preset condition, and then taking the quotient of the frequency point a and the variable n as the fundamental tone frequency of the sound signal to be detected;
the step of calculating the frequency point to be measured corresponding to the frequency point a according to the variable n comprises the following steps:
calculating all frequency points to be measured corresponding to the frequency point a according to a formula (1), wherein the formula (1) is as follows:
f = (a × m)/n, where the variable m has a value range of [1, n-1];
the first preset condition is that the absolute frequency values of the frequency points to be detected increase as m increases, the frequency domain energy corresponding to each frequency point to be detected is a maximum value point, and the frequency domain energy corresponding to the frequency point with the minimum absolute frequency value among the frequency points to be detected is greater than a preset energy threshold.
3. The method according to claim 2, wherein the calculating the frequency point to be measured corresponding to the frequency point a according to the variable n specifically comprises:
setting a variable m to 1;
calculating a frequency point f to be measured corresponding to the frequency point a according to a formula (1);
incrementing the variable m by 1 and again calculating the frequency point to be measured corresponding to the frequency point a according to the formula (1), repeating until m is equal to n-1, and taking all the calculated frequency points as the frequency points to be measured corresponding to the frequency point a;
wherein the formula (1) is f = (a × m)/n;
after the frequency points to be measured corresponding to the frequency point a are calculated according to the variable n, the method further comprises:
rounding each frequency point to be measured to an integer.
4. The method of claim 2, wherein after decrementing the variable n by 1, the method further comprises:
when the variable n is 2 and the frequency points to be detected do not meet the first preset condition, taking the absolute frequency value of the frequency point a as the fundamental tone frequency of the sound signal to be detected.
5. The method according to claim 3 or 4, wherein the determining whether each frequency point to be detected meets a first preset condition specifically comprises:
comparing the absolute frequency values of the frequency points to be detected, and acquiring frequency domain energy corresponding to the frequency points to be detected when the comparison result meets a first preset state;
judging whether the frequency domain energy corresponding to each frequency point to be detected is a maximum value point or not;
when the frequency domain energy corresponding to each frequency point to be detected is a maximum value point, selecting, from the frequency points to be detected, the frequency point f_min with the minimum absolute frequency value;
determining whether the frequency domain energy corresponding to the frequency point f_min is greater than a preset energy threshold; if so, determining that the frequency points to be detected meet the first preset condition, and if not, determining that the frequency points to be detected do not meet the first preset condition;
the first preset state is that the absolute frequency values of the frequency points to be measured increase as m increases.
6. The method according to claim 4, wherein the first preset state specifically comprises: the absolute frequency value of each frequency point to be detected is smaller than that of the frequency point a, and the absolute frequency value of each frequency point to be detected is greater than 1.
7. An apparatus for extracting a pitch frequency of a sound, the apparatus comprising: a sound sensor for acquiring a sound signal to be detected, a memory, a processor and a program for extracting a pitch frequency of a sound stored on the memory and executable on the processor, the program for extracting a pitch frequency of a sound being configured to implement the steps of the method for extracting a pitch frequency of a sound according to any one of claims 1 to 6.
8. A readable storage medium having stored thereon a program for extracting a pitch frequency of a sound, the program, when executed by a processor, implementing the steps of the method for extracting a pitch frequency of a sound according to any one of claims 1 to 6.
CN201710989739.3A 2017-10-20 2017-10-20 Method, device and readable storage medium for extracting fundamental tone frequency of sound Active CN107833581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710989739.3A CN107833581B (en) 2017-10-20 2017-10-20 Method, device and readable storage medium for extracting fundamental tone frequency of sound

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710989739.3A CN107833581B (en) 2017-10-20 2017-10-20 Method, device and readable storage medium for extracting fundamental tone frequency of sound

Publications (2)

Publication Number Publication Date
CN107833581A CN107833581A (en) 2018-03-23
CN107833581B true CN107833581B (en) 2021-04-13

Family

ID=61648624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710989739.3A Active CN107833581B (en) 2017-10-20 2017-10-20 Method, device and readable storage medium for extracting fundamental tone frequency of sound

Country Status (1)

Country Link
CN (1) CN107833581B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292758B (en) * 2019-03-12 2022-10-25 展讯通信(上海)有限公司 Voice activity detection method and device and readable storage medium
CN110365371A (en) * 2019-07-05 2019-10-22 深圳市声临科技有限公司 The method and its system, electronic equipment that trigger signal realizes translation system control are provided based on bluetooth equipment
CN110176242A (en) * 2019-07-10 2019-08-27 广州荔支网络技术有限公司 A kind of recognition methods of tone color, device, computer equipment and storage medium
CN112532208B (en) * 2019-09-18 2024-04-05 惠州迪芬尼声学科技股份有限公司 Harmonic generator and method for generating harmonics
CN111292748B (en) * 2020-02-07 2023-07-28 普强时代(珠海横琴)信息技术有限公司 Voice input system adaptable to multiple frequencies
CN111354365B (en) * 2020-03-10 2023-10-31 苏宁云计算有限公司 Pure voice data sampling rate identification method, device and system
CN112086104B (en) * 2020-08-18 2022-04-29 珠海市杰理科技股份有限公司 Method and device for obtaining fundamental frequency of audio signal, electronic equipment and storage medium
CN113205827B (en) * 2021-05-05 2022-02-15 张茜 High-precision extraction method and device for baby voice fundamental frequency and computer equipment
US11545143B2 (en) * 2021-05-18 2023-01-03 Boris Fridman-Mintz Recognition or synthesis of human-uttered harmonic sounds
CN115416577B (en) * 2022-09-26 2024-06-21 东风汽车集团股份有限公司 Rear car horn warning method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR0128851B1 (en) * 1994-12-23 1998-10-01 양승택 Pitch detecting method by spectrum harmonics matching of variable length dual impulse having different polarity
CN1342968A (en) * 2000-09-13 2002-04-03 中国科学院自动化研究所 High-accuracy high-resolution base frequency extracting method for speech recognization
US20050149321A1 (en) * 2003-09-26 2005-07-07 Stmicroelectronics Asia Pacific Pte Ltd Pitch detection of speech signals
CN1912992A (en) * 2005-08-08 2007-02-14 中国科学院声学研究所 Voiced sound detection method based on harmonic characteristic
CN101556795A (en) * 2008-04-09 2009-10-14 展讯通信(上海)有限公司 Method and device for computing voice fundamental frequency
CN105551501A (en) * 2016-01-22 2016-05-04 大连民族大学 Harmonic signal fundamental frequency estimation algorithm and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A comparative performance study of several pitch detection algorithms";L. Rabiner 等;《IEEE Transactions on Acoustics, Speech, and Signal Processing》;19761031;第24卷(第5期);第399-418页 *
"基音周期检测算法研究";王芸;《通信理论与信号处理新进展——2005年通信理论与信号处理年会论文集》;20050630;第377-381页 *

Also Published As

Publication number Publication date
CN107833581A (en) 2018-03-23

Similar Documents

Publication Publication Date Title
CN107833581B (en) Method, device and readable storage medium for extracting fundamental tone frequency of sound
EP2828856B1 (en) Audio classification using harmonicity estimation
JP5732994B2 (en) Music searching apparatus and method, program, and recording medium
US20210193149A1 (en) Method, apparatus and device for voiceprint recognition, and medium
US9451304B2 (en) Sound feature priority alignment
CN109256138B (en) Identity verification method, terminal device and computer readable storage medium
CN111028845A (en) Multi-audio recognition method, device, equipment and readable storage medium
US10249315B2 (en) Method and apparatus for detecting correctness of pitch period
CN110070884B (en) Audio starting point detection method and device
CN111696580A (en) Voice detection method and device, electronic equipment and storage medium
CN106024017A (en) Voice detection method and device
CN112017639A (en) Voice signal detection method, terminal device and storage medium
CN110070885B (en) Audio starting point detection method and device
CN112992190A (en) Audio signal processing method and device, electronic equipment and storage medium
JP2013205830A (en) Tonal component detection method, tonal component detection apparatus, and program
US8725498B1 (en) Mobile speech recognition with explicit tone features
CN110085214B (en) Audio starting point detection method and device
JP5377167B2 (en) Scream detection device and scream detection method
CN111640450A (en) Multi-person audio processing method, device, equipment and readable storage medium
KR20090098891A (en) Method and apparatus for robust speech activity detection
JP2015161718A (en) speech detection device, speech detection method and speech detection program
TWI585756B (en) Method and device for recognizing stuttered speech and computer program product
CN110875043B (en) Voiceprint recognition method and device, mobile terminal and computer readable storage medium
CN112397087A (en) Formant envelope estimation, voice processing method and device, storage medium and terminal
CN112216285A (en) Multi-person session detection method, system, mobile terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant