CN107958672A

CN107958672A - Method and apparatus for obtaining pitch waveform data

Info

Publication number: CN107958672A
Application number: CN201711337024.6A
Authority: CN
Inventors: 肖纯智 (Xiao Chunzhi)
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2017-12-12
Filing date: 2017-12-12
Publication date: 2018-04-24

Abstract

The present disclosure relates to a method and apparatus for obtaining pitch waveform data, and belongs to the field of audio technology. The method includes: performing pitch extraction on each audio frame in a target audio to obtain a target frequency corresponding to each audio frame; for each audio frame, determining, in the spectrum data of the audio frame, a target amplitude corresponding to its target frequency; and determining pitch waveform data of the target audio based on the target amplitude and target frequency corresponding to each audio frame. Because pitch is proportional to the vibration frequency of the fundamental, the disclosure uses the pitch of each audio frame to determine the average frequency of the fundamental in that frame, obtains the pitch waveform data of each frame from that average frequency, and finally obtains the pitch waveform data of the target audio. The vibration of the fundamental in the target audio can therefore be obtained accurately.

Description

Method and apparatus for obtaining pitch waveform data

Technical field

The present disclosure relates to the field of audio technology, and in particular to a method and apparatus for obtaining pitch waveform data.

Background

As the pace of life accelerates, singing has become one of the common ways people relax and entertain themselves. Users who often sing off-key can use a multimedia device to adjust the pitch of their singing so that it approaches the standard pitch data of the corresponding song. The standard pitch data of songs are usually stored in the multimedia device in advance. Based on these data, the device can adjust the pitch of the vocal audio it captures from the user.

Sound is produced by vibration, which includes the vibration of the fundamental and the vibration of the overtones, and pitch is determined by the vibration of the fundamental. The key to changing pitch is therefore to obtain the fundamental of the vocal audio, compare its vibration with the standard pitch data, and adjust the vocal audio accordingly. This changes the pitch of the vocal audio while preserving its timbre. Pitch shifting therefore depends on obtaining the vibration of the fundamental accurately. In the prior art, the audio is usually filtered in the time domain with a band-pass filter whose passband is set to the frequency range of the fundamental in typical human vocals.

While developing the present disclosure, the inventor found that the related art has at least the following problem:

The fundamental frequency fluctuates considerably over a complete song. For example, it is relatively low in the opening section and relatively high in the middle and climax sections. The passband of the band-pass filter must therefore be wide enough to cover every fundamental frequency. Such a passband, however, also covers the frequencies of some overtones, so the vibration of the fundamental cannot be obtained accurately.
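The following Python sketch makes this limitation concrete. It is an illustration only and is not part of the disclosure. It builds one second-order band-pass filter from the widely published RBJ Audio EQ Cookbook coefficients. The vocal fundamental range of about 80 Hz to 1000 Hz and the 16 kHz sample rate are assumptions, and the helper names `bandpass_coeffs`, `filter_signal`, and `gain_at` are hypothetical. A 200 Hz fundamental and its 400 Hz overtone both fall inside this passband, so both pass with nearly unit gain.

```python
import cmath
import math


def bandpass_coeffs(f_low, f_high, fs):
    """Biquad band-pass (RBJ cookbook, 0 dB peak gain) spanning roughly f_low..f_high Hz."""
    f0 = math.sqrt(f_low * f_high)        # geometric centre of the passband
    q = f0 / (f_high - f_low)             # a wide band gives a low Q
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [alpha, 0.0, -alpha]
    a = [1 + alpha, -2 * math.cos(w0), 1 - alpha]
    return b, a


def filter_signal(b, a, x):
    """Time-domain filtering (direct form I) of the sample list x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def gain_at(b, a, f, fs):
    """Magnitude of the filter's frequency response at f Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


fs = 16000
b, a = bandpass_coeffs(80.0, 1000.0, fs)      # assumed vocal fundamental range
print(gain_at(b, a, 200.0, fs))               # a low fundamental
print(gain_at(b, a, 400.0, fs))               # its second harmonic, an overtone
```

A narrower passband would reject the overtone of a low note, but it would also cut the fundamental of a high note. The disclosure avoids this trade-off by choosing the frequency to keep separately for each frame.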

Summary

To overcome the problems in the related art, the present disclosure provides a method and apparatus for obtaining pitch waveform data. The technical solutions are as follows:

According to an embodiment of the present disclosure, a method for obtaining pitch waveform data is provided. The method includes:

performing pitch extraction on each audio frame in a target audio to obtain a target frequency corresponding to each audio frame;

for each audio frame, determining, in the spectrum data of the audio frame, a target amplitude corresponding to the target frequency of that audio frame; and

determining pitch waveform data of the target audio based on the target amplitude and target frequency corresponding to each audio frame.

Optionally, determining, for each audio frame, the target amplitude in the spectrum data of the audio frame based on its target frequency includes:

performing a Fourier transform on the audio waveform data of each audio frame to obtain the spectrum data of each audio frame; and

determining, in the spectrum data of each audio frame, the target amplitude corresponding to the target frequency.

Optionally, determining the pitch waveform data of the target audio based on the target amplitude and target frequency corresponding to each audio frame includes:

in the spectrum data of each audio frame, keeping the target amplitude at the target frequency unchanged and setting the amplitudes at all other frequencies to zero, to obtain adjusted spectrum data for each audio frame; and

performing an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.

Alternatively, the adjusted spectrum data of each audio frame may be generated directly from the target amplitude and target frequency corresponding to that audio frame before the inverse Fourier transform is performed.

Optionally, the method further includes:

performing pitch adjustment on the target audio based on the pitch waveform data of the target audio and pre-stored standard pitch data corresponding to the target audio.

According to an embodiment of the present disclosure, an audio processing method is provided. The method includes:

comparing the frequency value of each period in the pitch waveform data described above with the time-aligned standard frequency value in the standard pitch data; and, if the absolute difference between the frequency value and the standard frequency value is greater than a preset value, adjusting the target audio within the period to which that frequency value belongs.

According to an embodiment of the present disclosure, an apparatus for obtaining pitch waveform data is provided. The apparatus includes:

an extraction module, configured to perform pitch extraction on each audio frame in a target audio to obtain a target frequency corresponding to each audio frame;

a first determining module, configured to determine, for each audio frame and in the spectrum data of that audio frame, a target amplitude corresponding to the target frequency of that audio frame; and

a second determining module, configured to determine pitch waveform data of the target audio based on the target amplitude and target frequency corresponding to each audio frame.

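Optionally, the first determining module is specifically configured to: perform a Fourier transform on the audio waveform data of each audio frame to obtain the spectrum data of each audio frame; and determine, in the spectrum data of each audio frame, the target amplitude corresponding to the target frequency.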

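Optionally, the second determining module is specifically configured to: in the spectrum data of each audio frame, keep the target amplitude at the target frequency unchanged and set the amplitudes at all other frequencies to zero, to obtain adjusted spectrum data for each audio frame; and perform an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.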

Optionally, the apparatus further includes:

an adjustment module, configured to perform pitch adjustment on the target audio based on the pitch waveform data of the target audio and pre-stored standard pitch data corresponding to the target audio.

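According to an embodiment of the present disclosure, an audio processing apparatus is provided. The apparatus includes an audio adjustment module configured to: compare the frequency value of each period in the pitch waveform data described above with the time-aligned standard frequency value in the standard pitch data; and, if the absolute difference between the two is greater than a preset value, adjust the target audio within the period to which that frequency value belongs.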

According to an embodiment of the present disclosure, a terminal is provided. The terminal includes a processor and a memory. The memory stores at least one instruction, which the processor loads and executes to implement the method for obtaining pitch waveform data described above.

According to an embodiment of the present disclosure, a computer-readable storage medium is provided. The storage medium stores at least one instruction, which a processor loads and executes to implement the method for obtaining pitch waveform data described above.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:

In the embodiments of the present disclosure, a terminal such as a multimedia device uses the above method as follows. It first performs pitch extraction on each audio frame in the target audio to obtain the target frequency corresponding to each audio frame. For each audio frame, it then determines the target amplitude corresponding to the target frequency in the spectrum data of that frame. Finally, it determines the pitch waveform data of the target audio based on the target amplitude and target frequency of each audio frame. Because pitch is proportional to the vibration frequency of the fundamental, the pitch of each frame determines the average frequency of the fundamental in that frame. That average frequency yields the pitch waveform data of each frame, and these together form the pitch waveform data of the target audio. The vibration of the fundamental in the target audio can therefore be obtained accurately.

It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings are incorporated in and constitute a part of this specification. They illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the disclosure. In the drawings:

Fig. 1 is a flowchart of a method for obtaining pitch waveform data according to an exemplary embodiment;

Fig. 2 is a schematic diagram of an apparatus for obtaining pitch waveform data according to an exemplary embodiment;

Fig. 3 is a schematic diagram of an apparatus for obtaining pitch waveform data according to an exemplary embodiment;

Fig. 4 is a structural diagram of a terminal according to an exemplary embodiment.

The above drawings show specific embodiments of the present disclosure, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way. Rather, they illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. They are merely examples of apparatuses and methods consistent with some aspects of the disclosure as detailed in the appended claims.

An embodiment of the present invention provides a method for obtaining pitch waveform data, which may be implemented by a terminal. The terminal may be a tablet computer, a desktop computer, a notebook computer, or the like, and may include components such as a processor and a memory. The processor may be a CPU (Central Processing Unit) or the like. It may be used for processing such as pitch extraction on each audio frame in the target audio to obtain the target frequency of each audio frame. The memory may be RAM (Random Access Memory), Flash memory, or the like. It may be used to store received data, data required by the processing, and data generated during processing, such as audio.

The terminal may further include a transceiver, an input component, a display component, an audio output component, and the like. The transceiver may be used for data transmission with a server and may include a Bluetooth component, a WiFi (Wireless Fidelity) component, an antenna, a matching circuit, a modem, and the like. The input component may be a touchscreen, a keyboard, a mouse, or the like. The audio output component may be a speaker, headphones, or the like.

An embodiment of the present disclosure provides a method for obtaining pitch waveform data. Pitch waveform data are data describing the relationship between the amplitude of the fundamental and time. As shown in Fig. 1, the processing flow of the method may include the following steps:

In step 101, pitch extraction is performed on each audio frame in the target audio to obtain the target frequency corresponding to each audio frame.

The target audio may be vocal audio or accompaniment audio. This embodiment uses vocal audio as an example.

Sound is usually a combination of a series of vibrations of different frequencies and amplitudes produced by a sounding body. The vibration with the lowest frequency produces the fundamental, and the remaining vibrations produce the overtones. Pitch is how high or low a sound is. It is determined by the vibration frequency of the fundamental, and the two are proportional.

In implementation, the terminal performs time-domain analysis on the target audio and cuts it into audio frames, each typically 10 ms to 30 ms long. A pitch extraction algorithm is then applied to each audio frame, and the extracted pitch is the average pitch of that frame. Because pitch is proportional to the vibration frequency of the fundamental, this gives the target frequency of each audio frame, which is the average frequency of the fundamental in the frame. Common pitch extraction algorithms include the autocorrelation function method, the cepstrum method, and the YIN algorithm, which builds on the autocorrelation function method.
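The framing and per-frame pitch extraction could look like the minimal sketch below. It assumes 20 ms non-overlapping frames and the plain autocorrelation method, one of the algorithms named above; YIN or a cepstral method would fit in the same place. The function names, the 80 Hz to 1000 Hz search range, and the test tone are illustrative assumptions.

```python
import math


def split_frames(samples, fs, frame_ms=20):
    """Cut the signal into consecutive, non-overlapping frames of frame_ms milliseconds."""
    n = int(fs * frame_ms / 1000)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def autocorr_pitch(frame, fs, f_min=80.0, f_max=1000.0):
    """Estimate a frame's average fundamental frequency (Hz) from its autocorrelation peak.

    Only lags corresponding to f_min..f_max are searched. The estimate is quantised to
    whole-sample lags, so its resolution is limited (about 2.5 Hz at 200 Hz and fs=16 kHz).
    """
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else 0.0   # 0.0 means "no pitch found"


fs = 16000
# Hypothetical test tone: 200 Hz fundamental plus a weaker 400 Hz overtone, 0.1 s long.
tone = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
        for n in range(fs // 10)]
frames = split_frames(tone, fs)
target_freqs = [autocorr_pitch(f, fs) for f in frames]
```

The unnormalised sum gives slightly more weight to shorter lags, which helps the estimator avoid reporting half the true frequency.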

In step 102, for each audio frame, the target amplitude corresponding to the target frequency of the audio frame is determined in the spectrum data of that audio frame.

Optionally, after determining the target frequency of each frame, the terminal may determine the target amplitude corresponding to that target frequency. The processing may be as follows. The terminal performs a Fourier transform on the audio waveform data of each audio frame to obtain its spectrum data. It then determines, in the spectrum data of each audio frame, the target amplitude corresponding to the target frequency. This target amplitude is the amplitude at the average frequency of the fundamental in that frame.

Specifically, the audio waveform data are converted into spectrum data by a Fourier transform.
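If the Fourier transform in question is the standard discrete Fourier transform (DFT) of an N-sample frame, it can be written as follows. This is an assumption, not the patent's stated formula. Here x_m(n) is the n-th sample of frame m, X_m(k) is its k-th spectral bin, and f_s is the sampling rate:

```latex
X_m(k) = \sum_{n=0}^{N-1} x_m(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1,
\qquad f_k = \frac{k\, f_s}{N}
```

Under this assumption, bin k corresponds to the frequency f_k. For a real sinusoid lying exactly on bin k, with 0 < k < N/2, the peak amplitude is 2|X_m(k)|/N.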

In implementation, after determining the target frequency of each audio frame, the terminal first uses the above Fourier transform to convert each audio frame from time-domain data into short-time frequency-domain data. Spectrum data are data representing the correspondence between amplitude and frequency. The terminal then determines, in the spectrum data of each audio frame, the target amplitude corresponding to the target frequency. Because the target frequency of each frame is the average frequency of the fundamental, the corresponding target amplitude is the amplitude of the fundamental.
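The sketch below shows step 102 under two assumptions: the transform is a standard DFT, and the target frequency falls exactly on a DFT bin. If it does not, spectral leakage spreads the energy across neighbouring bins, and a window plus peak interpolation would be needed. A naive O(N^2) transform keeps the example dependency-free; a real implementation would use an FFT. All names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def target_amplitude(spectrum, target_freq, fs):
    """Return (bin index, amplitude) of the DFT bin closest to target_freq.

    Bin k corresponds to k * fs / N Hz. For 0 < k < N/2, 2*|X(k)|/N recovers the
    peak amplitude of a real sinusoid that falls exactly on that bin.
    """
    n = len(spectrum)
    k = round(target_freq * n / fs)
    return k, 2 * abs(spectrum[k]) / n


fs = 8000
# Hypothetical 20 ms frame: 200 Hz fundamental (amplitude 0.8) plus a 400 Hz overtone (0.3).
frame = [0.8 * math.sin(2 * math.pi * 200 * t / fs) + 0.3 * math.sin(2 * math.pi * 400 * t / fs)
         for t in range(160)]
k, amp = target_amplitude(dft(frame), 200.0, fs)
```

With 160 samples at 8 kHz the bins are 50 Hz apart, so the 200 Hz target lands on bin 4 and the recovered amplitude is the fundamental's 0.8.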

In step 103, the pitch waveform data of the target audio are determined based on the target amplitude and target frequency corresponding to each audio frame.

The pitch waveform data are data representing the correspondence between the amplitude of the fundamental and time.

In implementation, the terminal derives the pitch waveform data of the target audio from the target frequency and target amplitude of each audio frame. This step is a spectral filtering process applied to each frame of the target audio. The amplitude at the fundamental frequency of each frame is retained, and the amplitudes at the overtone frequencies are attenuated to zero. There are two specific ways to do this:

Both ways use an inverse Fourier transform to obtain the pitch waveform data of each frame.
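If the transform is a standard DFT over N-sample frames, the inverse transform can be written as follows. This is an assumption, not the patent's stated formula. Here X'_m(k) is the adjusted spectrum of frame m and p_m(n) is its pitch waveform:

```latex
p_m(n) = \frac{1}{N} \sum_{k=0}^{N-1} X'_m(k)\, e^{j 2\pi k n / N}, \qquad n = 0, 1, \dots, N-1
```

For p_m(n) to be real-valued, the adjusted spectrum must keep the conjugate symmetry X'_m(N-k) = conj(X'_m(k)). The mirror bin of the target frequency is therefore retained along with the target bin.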

Mode one: in the spectrum data of each audio frame, the terminal keeps the target amplitude at the target frequency unchanged and sets the amplitudes at all other frequencies to zero. This gives adjusted spectrum data for each audio frame. The terminal then performs an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.

In implementation, setting the amplitudes at the other frequencies to zero means the terminal attenuates them to zero. For each frame, this leaves spectrum data in which the target frequency has the target amplitude and every other frequency has zero amplitude. The terminal applies the above inverse Fourier transform to this spectrum data, which contains only the target amplitude. The result is waveform data containing only the target frequency, which is the pitch waveform data of that frame. Combining all frames gives the pitch waveform data of the target audio.
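Mode one can be sketched as below, again assuming a standard DFT and target frequencies that fall on bin centres. The text leaves one detail implicit. The spectrum of a real-valued frame is conjugate-symmetric, so the sketch keeps the target bin k together with its mirror bin N-k; keeping bin k alone would produce a complex-valued signal. Function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    """Naive inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def mode_one_frame(frame, target_freq, fs):
    """Keep the target bin (and its conjugate mirror), zero every other bin, then invert."""
    spec = dft(frame)
    n = len(spec)
    k = round(target_freq * n / fs)
    keep = {k, (n - k) % n}           # mirror bin N-k keeps the output real-valued
    adjusted = [spec[i] if i in keep else 0j for i in range(n)]
    return [v.real for v in idft(adjusted)]


def pitch_waveform(frames, target_freqs, fs):
    """Concatenate the per-frame pitch waveforms into the waveform of the whole audio."""
    out = []
    for frame, f0 in zip(frames, target_freqs):
        out.extend(mode_one_frame(frame, f0, fs))
    return out


fs = 8000
# Hypothetical audio: 200 Hz fundamental plus overtones at 400 Hz and 1000 Hz, two 20 ms frames.
audio = [0.8 * math.sin(2 * math.pi * 200 * t / fs) + 0.3 * math.sin(2 * math.pi * 400 * t / fs)
         + 0.1 * math.sin(2 * math.pi * 1000 * t / fs) for t in range(320)]
frames = [audio[0:160], audio[160:320]]
wave = pitch_waveform(frames, [200.0, 200.0], fs)
```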

Mode two: based on the target amplitude and target frequency of each audio frame, the terminal generates the adjusted spectrum data of that frame directly. It then performs an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.

In implementation, once the target frequency and target amplitude of a frame are known, the terminal may generate spectrum data for that frame. In this spectrum, the amplitude at the target frequency is the target amplitude and the amplitude at every other frequency is zero. The terminal then applies the above inverse Fourier transform to this spectrum data. This also yields waveform data containing only the target frequency, which is the pitch waveform data of each frame, and finally the pitch waveform data of the target audio.
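Mode two can be sketched by building the adjusted spectrum from scratch. The text states that mode two gives the same result as mode one. For that to hold, the sketch assumes the "target amplitude" is carried as the complex DFT value of the target bin, so its phase is preserved; a magnitude alone would lose the fundamental's phase. Only one coefficient is needed, so the sketch computes that single bin rather than a full transform. Names are hypothetical.

```python
import cmath
import math


def bin_coefficient(frame, k):
    """Complex DFT value X(k) of a single bin, computed without a full transform."""
    n = len(frame)
    return sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def mode_two_frame(frame, target_freq, fs):
    """Build a fresh spectrum holding only the target component, then invert it."""
    n = len(frame)
    k = round(target_freq * n / fs)
    coeff = bin_coefficient(frame, k)      # assumption: target amplitude kept as complex value
    spec = [0j] * n                        # every other frequency has zero amplitude
    spec[k] = coeff
    spec[(n - k) % n] = coeff.conjugate()  # Hermitian symmetry gives a real-valued output
    # Inverse DFT; only the two non-zero bins contribute to each output sample.
    bins = {k, (n - k) % n}
    return [sum(spec[j] * cmath.exp(2j * math.pi * j * t / n) for j in bins).real / n
            for t in range(n)]


fs = 8000
# Hypothetical 20 ms frame: 200 Hz fundamental plus overtones at 400 Hz and 1000 Hz.
frame = [0.8 * math.sin(2 * math.pi * 200 * t / fs) + 0.3 * math.sin(2 * math.pi * 400 * t / fs)
         + 0.1 * math.sin(2 * math.pi * 1000 * t / fs) for t in range(160)]
wave = mode_two_frame(frame, 200.0, fs)
```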

Although the two modes follow different mathematical procedures, they produce the same result. Mode one obtains the adjusted spectrum data by zeroing the amplitudes of the non-target frequencies in each frame's spectrum data. In mode two, the terminal takes the target frequency and target amplitude it has determined and combines them with the other frequencies, set to zero amplitude, to generate the adjusted spectrum data. The adjusted spectrum data from the two modes are therefore identical, and so are the pitch waveform data obtained by the inverse Fourier transform. From the pitch waveform data of the target audio, the terminal can then obtain the vibration period of the target audio and the start time and end time of each period.
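The disclosure does not state how the period boundaries are found. One simple approach, sketched here as an assumption, is to locate the upward zero crossings of the single-frequency pitch waveform and refine each one by linear interpolation. The function name and the 500 Hz test waveform are illustrative.

```python
import math


def periods_from_waveform(wave, fs):
    """Split a pitch waveform into cycles at its upward zero crossings.

    Returns a list of (start_s, end_s, frequency_hz) tuples, one per complete cycle.
    Crossing instants are refined by linear interpolation between neighbouring samples.
    """
    crossings = []
    for i in range(1, len(wave)):
        if wave[i - 1] < 0.0 <= wave[i]:
            frac = -wave[i - 1] / (wave[i] - wave[i - 1])
            crossings.append((i - 1 + frac) / fs)
    return [(t0, t1, 1.0 / (t1 - t0)) for t0, t1 in zip(crossings, crossings[1:])]


fs = 8000
wave = [math.sin(2 * math.pi * 500 * n / fs) for n in range(160)]   # hypothetical 500 Hz waveform
periods = periods_from_waveform(wave, fs)
```

Each returned tuple gives the start time, end time, and frequency value of one period, which is the information the comparison with the standard pitch data needs.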

In summary, the terminal first performs pitch extraction on each audio frame in the target audio to obtain the corresponding target frequency. Because pitch is proportional to the vibration frequency of the fundamental, this determines the average frequency of the fundamental in each frame, which is recorded as the frame's target frequency. Next, for each audio frame, the terminal determines the target amplitude corresponding to the target frequency in the spectrum data of that frame. Finally, the terminal determines the pitch waveform data of the target audio based on the target amplitude and target frequency of each audio frame. The method determines each frame's average fundamental frequency from its pitch and applies frequency-domain filtering around that frequency to obtain the frame's pitch waveform data. It then assembles these into the pitch waveform data of the target audio. In this way, the vibration of the fundamental in the target audio can be obtained accurately.

Optionally, after obtaining the pitch waveform data of the target audio with the above method, the terminal may perform pitch adjustment on the target audio. It does so based on the pitch waveform data of the target audio and the pre-stored standard pitch data corresponding to the target audio.

The standard pitch data are stored in the terminal as note data. A note usually consists of three data items: the pitch of the note, and the start time and end time of that pitch. The pitch is represented by a frequency value, and each pitch usually lasts several seconds, for example 3 seconds. Each frame of the pitch waveform data obtained by the above method contains only one frequency, so each frame is periodic and may contain several periods, each usually a few milliseconds long. The duration of each pitch in the standard pitch data therefore covers many periods of the pitch waveform data. Comparing the pitch waveform data with the standard pitch data thus only requires comparing the frequency values of the two over the corresponding time span.

The pitch adjustment of the target audio may use an algorithm that changes pitch while preserving timbre (also known as Lent's pitch-shifting, timbre-preserving algorithm). In implementation, the terminal applies this algorithm to the target audio based on the pitch waveform data of the target audio and the standard pitch data. The principle may be as follows. The frequency value of each period in the pitch waveform data is compared with the standard frequency value in the corresponding time span of the standard pitch data. If the frequency value of a period differs from the corresponding standard frequency value, the target audio in that period is adjusted. If there is no difference, no pitch adjustment is made to the target audio in that period. A specific comparison between the pitch waveform data and the standard pitch data may proceed as follows:

For example, suppose a period of the pitch waveform data starts at 15.050 s and ends at 15.052 s, so its frequency value is 1/(0.002 s) = 500 Hz. Suppose also that the pitch frequency in the standard pitch data between 15 s and 16 s is ω₀. The terminal then compares 500 Hz with ω₀. The preset range is a range close to zero. If the absolute difference between 500 Hz and ω₀ falls within it, the frequency value of the period is considered equal to the pitch frequency in the standard pitch data, and the terminal does not adjust the target audio in that period. If the absolute difference falls outside the preset range, the terminal adjusts the target audio in that period using the pitch-shifting, timbre-preserving algorithm.
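The comparison in this example can be sketched as follows. The sketch assumes the standard pitch data are held as (pitch, start time, end time) notes, as described earlier, and uses a fixed tolerance in place of the preset range. The `Note` type, the function names, the 440 Hz value used for ω₀, and the 5 Hz tolerance are all hypothetical; the disclosure does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One entry of the standard pitch data: a pitch and the time span it lasts."""
    pitch_hz: float
    start_s: float
    end_s: float


def note_at(notes, t):
    """Return the note whose time span contains instant t, or None."""
    for note in notes:
        if note.start_s <= t < note.end_s:
            return note
    return None


def needs_adjustment(period_start, period_end, notes, tolerance_hz):
    """Compare one pitch-waveform period with the time-aligned standard pitch."""
    freq = 1.0 / (period_end - period_start)     # e.g. 1 / 0.002 s = 500 Hz
    note = note_at(notes, period_start)
    if note is None:
        return False                              # no standard pitch to compare against
    return abs(freq - note.pitch_hz) > tolerance_hz


# Hypothetical standard pitch data: omega_0 taken as 440 Hz between 15 s and 16 s.
notes = [Note(pitch_hz=440.0, start_s=15.0, end_s=16.0)]
print(needs_adjustment(15.050, 15.052, notes, tolerance_hz=5.0))   # True
```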

In practice, the above method may be applied in the following scenario:

When a user sings with a multimedia device, the device's microphone captures the user's vocal audio and sends it to the device's processor. The processor first divides the vocal audio into multiple audio frames and applies a pitch extraction algorithm to each frame to obtain its target frequency. It then performs a Fourier transform on each audio frame to convert it into spectrum data, and determines the target amplitude at the target frequency in each frame's spectrum data. Finally, the processor determines the pitch waveform data of the vocal audio from the target frequencies and target amplitudes. The multimedia device then adjusts the waveform data of the vocal audio based on the pitch waveform data and the standard pitch data of the song. As a result, the singing output by the device is close to the standard version of the song.

For example, suppose a user sings "Qinghai-Tibet Plateau" with a multimedia device. The pitch in the climax of this song is quite high, and the user may be unable to reach it. In this case, the multimedia device can use the above method to adjust the captured pitch data of the climax section, so that the user's singing is closer to the standard.

As another example, for a user who often sings off-key, the multimedia device can use the above method to adjust the captured pitch data during singing, so that the user's singing is closer to the standard.

An embodiment of the present disclosure further provides an apparatus for obtaining pitch waveform data, which may be the terminal in the above embodiments. As shown in Fig. 2, the apparatus includes:

an extraction module 210, configured to perform pitch extraction on each audio frame in a target audio to obtain a target frequency corresponding to each audio frame;

a first determining module 220, configured to determine, for each audio frame and in the spectrum data of that audio frame, a target amplitude corresponding to the target frequency of that audio frame; and

a second determining module 230, configured to determine pitch waveform data of the target audio based on the target amplitude and target frequency corresponding to each audio frame.

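Optionally, the first determining module 220 is specifically configured to: perform a Fourier transform on the audio waveform data of each audio frame to obtain the spectrum data of each audio frame; and determine, in the spectrum data of each audio frame, the target amplitude corresponding to the target frequency.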

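Optionally, the second determining module 230 is specifically configured to: in the spectrum data of each audio frame, keep the target amplitude at the target frequency unchanged and set the amplitudes at all other frequencies to zero, to obtain adjusted spectrum data for each audio frame; and perform an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.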

Optionally, as shown in Fig. 3, the apparatus further includes:

an adjustment module 240, configured to perform pitch adjustment on the target audio based on the pitch waveform data of the target audio and pre-stored standard pitch data corresponding to the target audio.

In the embodiments of the present disclosure, a terminal such as a multimedia device uses the above apparatus as follows. It first performs pitch extraction on each audio frame in the target audio to obtain the target frequency corresponding to each audio frame. For each audio frame, it then determines the target amplitude corresponding to the target frequency in the spectrum data of that frame. Finally, it determines the pitch waveform data of the target audio based on the target amplitude and target frequency of each audio frame. Because pitch is proportional to the vibration frequency of the fundamental, the pitch of each frame determines the average frequency of the fundamental in that frame. That average frequency yields the pitch waveform data of each frame, and these together form the pitch waveform data of the target audio. The vibration of the fundamental in the target audio can therefore be obtained accurately.

It should be noted that the division into the above functional modules is merely an example of how the apparatus provided in the above embodiment obtains pitch waveform data. In practice, the above functions may be allocated to different functional modules as needed. That is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the apparatus for obtaining pitch waveform data provided in the above embodiment and the corresponding method embodiment belong to the same concept. For the specific implementation process, refer to the method embodiment, which is not repeated here.

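According to an embodiment of the present disclosure, an audio processing apparatus is further provided. The apparatus includes an audio adjustment module configured to: compare the frequency value of each period in the pitch waveform data described above with the time-aligned standard frequency value in the standard pitch data; and, if the absolute difference between the two is greater than a preset value, adjust the target audio within the period to which that frequency value belongs.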

In the embodiments of the present disclosure, a terminal such as a multimedia device first uses the above apparatus for obtaining pitch waveform data to obtain the vibration of the fundamental of the target audio accurately. The terminal then performs pitch adjustment on the target audio based on the pitch waveform data of the target audio and the standard pitch data. This brings the pitch of the target audio close to that of the standard pitch data.

It should be noted that the division into the above functional modules is merely an example of how the audio processing apparatus provided in the above embodiment performs audio processing. In practice, the above functions may be allocated to different functional modules as needed. That is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the audio processing apparatus provided in the above embodiment and the audio processing method embodiment belong to the same concept. For the specific implementation process, refer to the method embodiment, which is not repeated here.

According to the present disclosure, a terminal is further provided. The terminal includes a processor and a memory. The memory stores at least one instruction, which the processor loads and executes to implement the method for obtaining pitch waveform data described above.

Fig. 4 is a structural diagram of a terminal 400 according to an exemplary embodiment of the present invention. The terminal 400 may be a smartphone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 400 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.

In general, the terminal 400 includes a processor 401 and a memory 402.

The processor 401 may include one or more processing cores, for example a 4-core or 8-core processor. It may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), processes data in the awake state. The coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 401 may integrate a GPU (Graphics Processing Unit), which renders and draws the content to be shown on the display screen. In some embodiments, the processor 401 may further include an AI (Artificial Intelligence) processor that handles computing operations related to machine learning.

The memory 402 may include one or more computer-readable storage media, which may be non-transitory. The memory 402 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 402 stores at least one instruction, which is executed by the processor 401 to implement the XXXX method provided by the method embodiments of this application.

In some embodiments, the terminal 400 optionally further includes a peripheral interface 403 and at least one peripheral device. The processor 401, the memory 402, and the peripheral interface 403 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral interface 403 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 404, a touch display screen 405, a camera assembly 406, an audio circuit 407, a positioning component 408, and a power supply 409.

The peripheral interface 403 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 401 and the memory 402. In some embodiments, the processor 401, the memory 402, and the peripheral interface 403 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 401, the memory 402, and the peripheral interface 403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 404 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 404 communicates with communication networks and other communication devices through electromagnetic signals: it converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 404 may communicate with other terminals through at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 404 may further include NFC (Near Field Communication) related circuitry, which is not limited in this application.

The display screen 405 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 405 is a touch display screen, it can also capture touch signals on or above its surface; such a touch signal may be input to the processor 401 as a control signal for processing. In this case, the display screen 405 may also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 405, arranged on the front panel of the terminal 400; in other embodiments, there are at least two display screens 405, arranged on different surfaces of the terminal 400 or in a folding design; in still other embodiments, the display screen 405 is a flexible display screen arranged on a curved or folding surface of the terminal 400. The display screen 405 may even be given a non-rectangular, irregular shape, that is, a special-shaped screen. The display screen 405 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 406 is used to capture images or video. Optionally, the camera assembly 406 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to achieve background blurring, the main camera and the wide-angle camera can be fused to achieve panoramic shooting and VR (Virtual Reality) shooting, or other fused shooting functions can be achieved. In some embodiments, the camera assembly 406 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.

The audio circuit 407 may include a microphone and a speaker. The microphone captures sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 401 for processing or to the radio frequency circuit 404 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the terminal 400. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 401 or the radio frequency circuit 404 into sound waves, and may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 407 may further include a headphone jack.

The positioning component 408 is used to determine the current geographic location of the terminal 400 to enable navigation or LBS (Location Based Service). The positioning component 408 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.

The power supply 409 supplies power to the components of the terminal 400. The power supply 409 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 409 includes a rechargeable battery, the battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast-charging technology.

In some embodiments, the terminal 400 further includes one or more sensors 410, including but not limited to an acceleration sensor 411, a gyroscope sensor 412, a pressure sensor 413, a fingerprint sensor 414, an optical sensor 415, and a proximity sensor 416.

The acceleration sensor 411 can detect the magnitude of acceleration along the three coordinate axes of a coordinate system established with respect to the terminal 400. For example, the acceleration sensor 411 can detect the components of gravitational acceleration along the three axes. According to the gravitational acceleration signal captured by the acceleration sensor 411, the processor 401 can control the touch display screen 405 to display the user interface in a landscape or portrait view. The acceleration sensor 411 can also be used to collect motion data for games or of the user.
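As a hypothetical illustration of the landscape/portrait decision described above, the sketch below assumes the processor simply compares the gravity components along the terminal's x and y axes. The function name and the rule itself are assumptions, not the disclosed implementation.

```python
def display_orientation(gx: float, gy: float) -> str:
    """Hypothetical rule: choose "landscape" when gravity acts mainly along the
    terminal's x axis (device held on its side), otherwise "portrait".

    gx, gy: gravity components (m/s^2) reported by the acceleration sensor.
    """
    return "landscape" if abs(gx) > abs(gy) else "portrait"


print(display_orientation(9.8, 0.1))  # landscape
print(display_orientation(0.2, 9.7))  # portrait
```

A practical implementation would probably add hysteresis around the 45-degree boundary so that the view does not flicker when the terminal is held at an angle.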

The gyroscope sensor 412 can detect the body orientation and rotation angle of the terminal 400, and can cooperate with the acceleration sensor 411 to capture the user's 3D motion applied to the terminal 400. Based on the data collected by the gyroscope sensor 412, the processor 401 can implement functions such as motion sensing (for example, changing the UI according to a tilt operation of the user), image stabilization during shooting, game control, and inertial navigation.

The pressure sensor 413 may be arranged on a side frame of the terminal 400 and/or beneath the touch display screen 405. When the pressure sensor 413 is arranged on the side frame of the terminal 400, it can detect the user's grip on the terminal 400, and the processor 401 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 413. When the pressure sensor 413 is arranged beneath the touch display screen 405, the processor 401 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 405. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 414 is used to collect the user's fingerprint. Either the processor 401 identifies the user from the fingerprint collected by the fingerprint sensor 414, or the fingerprint sensor 414 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 401 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 414 may be arranged on the front, back, or side of the terminal 400. When the terminal 400 has a physical button or a manufacturer logo, the fingerprint sensor 414 may be integrated with the physical button or the manufacturer logo.

The optical sensor 415 is used to collect ambient light intensity. In one embodiment, the processor 401 controls the display brightness of the touch display screen 405 according to the ambient light intensity collected by the optical sensor 415: when the ambient light intensity is high, the display brightness of the touch display screen 405 is increased; when the ambient light intensity is low, the display brightness is decreased. In another embodiment, the processor 401 may also dynamically adjust the shooting parameters of the camera assembly 406 according to the ambient light intensity collected by the optical sensor 415.
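A minimal sketch of the brightness rule above, assuming a linear mapping from ambient illuminance (lux) to a normalized brightness level. The function name, the 1000-lux saturation point, and the level limits are hypothetical; the disclosure only states that brightness is raised when ambient light is high and lowered when it is low.

```python
def display_brightness(lux: float, min_level: float = 0.1,
                       max_level: float = 1.0, max_lux: float = 1000.0) -> float:
    """Map ambient light (lux) to a brightness level in [min_level, max_level].

    Brighter surroundings give a brighter screen; readings at or above
    max_lux (a hypothetical saturation point) give max_level.
    """
    ratio = min(max(lux / max_lux, 0.0), 1.0)  # clamp to [0, 1]
    return min_level + (max_level - min_level) * ratio
```

A logarithmic curve would track perceived brightness more closely; the linear form is used here only to keep the monotonic behavior easy to see.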

The proximity sensor 416, also called a distance sensor, is generally arranged on the front panel of the terminal 400 and measures the distance between the user and the front of the terminal 400. In one embodiment, when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 is gradually decreasing, the processor 401 switches the touch display screen 405 from the screen-on state to the screen-off state; when the proximity sensor 416 detects that this distance is gradually increasing, the processor 401 switches the touch display screen 405 from the screen-off state to the screen-on state.
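The screen switching described above behaves like a two-state machine driven by the trend of successive distance readings. The sketch below is a hypothetical illustration; the class name and the use of raw distance comparisons between consecutive readings are assumptions.

```python
class ProximityScreenController:
    """Hypothetical sketch: a decreasing distance turns the screen off,
    an increasing distance turns it back on; equal readings change nothing."""

    def __init__(self) -> None:
        self.screen_on = True
        self._last_distance = None

    def update(self, distance_cm: float) -> bool:
        """Feed one proximity reading; return whether the screen is on."""
        if self._last_distance is not None:
            if distance_cm < self._last_distance and self.screen_on:
                self.screen_on = False  # user approaching: screen on -> off
            elif distance_cm > self._last_distance and not self.screen_on:
                self.screen_on = True   # user moving away: screen off -> on
        self._last_distance = distance_cm
        return self.screen_on


controller = ProximityScreenController()
print([controller.update(d) for d in [10, 8, 5, 5, 7]])
# [True, False, False, False, True]
```

Real proximity sensors often report only "near" or "far" against a threshold, in which case the same two-state logic would be driven by that flag instead of by the distance trend.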

Those skilled in the art will understand that the structure shown in Fig. 4 does not constitute a limitation on the terminal 400; the terminal may include more or fewer components than illustrated, combine some components, or adopt a different arrangement of components.

Another embodiment of the disclosure provides a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform:

Optionally, the method further includes：

Other embodiments of the disclosure will readily occur to those skilled in the art after considering the specification and practicing the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It should be appreciated that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for obtaining pitch waveform data, characterized in that the method comprises:

performing pitch extraction on each audio frame in a target audio to obtain a target frequency corresponding to each audio frame;

for each audio frame, determining, based on the target frequency corresponding to the audio frame, a corresponding target amplitude in spectrum data of the audio frame; and

determining pitch waveform data of the target audio based on the target amplitude and the target frequency corresponding to each audio frame.
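The disclosure does not specify which pitch-extraction algorithm produces the target frequency of each frame. As one possibility, the first step of claim 1 can be sketched in Python with NumPy using a plain autocorrelation estimator, which is a swapped-in technique rather than the applicant's method. The function names, the frame length (1024 samples), the hop (512 samples), and the 80 to 500 Hz search range are all assumptions.

```python
import numpy as np


def split_frames(signal, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])


def estimate_f0_autocorr(frame, sr, fmin=80.0, fmax=500.0):
    """Estimate one frame's fundamental frequency (Hz) from the highest
    autocorrelation value in the lag range [sr/fmax, sr/fmin].
    Returns 0.0 for a silent frame."""
    frame = np.asarray(frame, dtype=float) - np.mean(frame)
    if not np.any(frame):
        return 0.0
    # r[k] = sum_n frame[n] * frame[n + k] for lags k >= 0
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    min_lag = max(1, int(sr / fmax))
    max_lag = min(int(sr / fmin), len(frame) - 1)
    lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
    return sr / lag


def target_frequencies(signal, sr, frame_len=1024, hop=512):
    """Target frequency of every frame in the signal (first step of claim 1)."""
    return np.array([estimate_f0_autocorr(f, sr)
                     for f in split_frames(signal, frame_len, hop)])
```

For a 250 Hz sine sampled at 16 kHz, the period is exactly 64 samples, so each frame's autocorrelation peaks at lag 64 and the estimate is 250 Hz. Production pitch trackers usually add normalization, interpolation between lags, and voicing decisions that this sketch omits.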
2. The method according to claim 1, characterized in that determining, for each audio frame and based on the target frequency corresponding to the audio frame, the corresponding target amplitude in the spectrum data of the audio frame comprises:

performing a Fourier transform on audio waveform data of each audio frame to obtain the spectrum data of each audio frame; and

determining, in the spectrum data of each audio frame, the target amplitude corresponding to the target frequency.
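The two steps of claim 2 can be sketched with NumPy's real FFT, under the assumption that the target amplitude is read from the FFT bin nearest the target frequency. The function name, the 2/N scaling (so a sinusoid centered on a bin reports its peak amplitude), and the exclusion of the DC and Nyquist bins are illustrative choices, not details taken from the disclosure.

```python
import numpy as np


def target_amplitude(frame, sr, f0):
    """Return (bin index, amplitude) of the FFT bin nearest the target frequency f0.

    The magnitude is scaled by 2/N so that a sinusoid lying exactly on a bin
    reports its peak amplitude. DC and Nyquist bins are treated as "no pitch".
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)          # spectrum data of the audio frame
    k = int(round(f0 * len(frame) / sr))   # bin nearest the target frequency
    if not 0 < k < len(frame) // 2:
        return k, 0.0
    return k, 2.0 * float(np.abs(spectrum[k])) / len(frame)


sr = 16000
n = np.arange(1024)
frame = 0.5 * np.sin(2 * np.pi * 250 * n / sr) + 0.2 * np.sin(2 * np.pi * 1000 * n / sr)
print(target_amplitude(frame, sr, 250.0))  # bin 16, amplitude about 0.5
```

Without a window function, a fundamental that falls between bins leaks into neighboring bins and the reported amplitude drops; the example uses frequencies that land exactly on bins to keep the arithmetic transparent.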
3. The method according to claim 2, characterized in that determining the pitch waveform data of the target audio based on the target amplitude and the target frequency corresponding to each audio frame comprises:

in the spectrum data of each audio frame, keeping the target amplitude corresponding to the target frequency unchanged and setting the amplitudes corresponding to all other frequencies to zero, to obtain adjusted spectrum data of each audio frame; and

performing an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.
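A minimal NumPy sketch of claim 3, assuming non-overlapping frames of 1024 samples and a target frequency already known for each frame. Only the FFT bin nearest the target frequency is kept (magnitude and phase unchanged), every other bin is set to zero, and the inverse FFTs are concatenated. The function name and framing choices are hypothetical; overlapping frames would instead need windowing and overlap-add.

```python
import numpy as np


def pitch_waveform_keep_bin(signal, sr, f0_per_frame, frame_len=1024):
    """Per non-overlapping frame: keep the FFT bin nearest the frame's target
    frequency, zero every other bin, inverse FFT, and concatenate the frames."""
    out = []
    for i, f0 in enumerate(f0_per_frame):
        frame = np.asarray(signal[i * frame_len:(i + 1) * frame_len], dtype=float)
        if len(frame) < frame_len:
            break  # drop a trailing partial frame
        spectrum = np.fft.rfft(frame)
        adjusted = np.zeros_like(spectrum)       # all other amplitudes set to zero
        k = int(round(f0 * frame_len / sr))
        if f0 > 0 and 0 < k < frame_len // 2:
            adjusted[k] = spectrum[k]            # target amplitude kept unchanged
        out.append(np.fft.irfft(adjusted, n=frame_len))
    return np.concatenate(out) if out else np.zeros(0)
```

For example, with a 250 Hz fundamental plus a weaker 1000 Hz overtone sampled at 16 kHz, both components fall exactly on FFT bins (16 and 64), so the output reproduces the 250 Hz component alone.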
4. The method according to claim 2, characterized in that determining the pitch waveform data of the target audio based on the target amplitude and the target frequency corresponding to each audio frame comprises:

generating adjusted spectrum data of each audio frame based on the target amplitude and the target frequency corresponding to that audio frame; and

performing an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.
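Claim 4 differs from claim 3 in that the adjusted spectrum is generated from the target amplitude and target frequency rather than copied from the measured spectrum. A possible NumPy sketch, assuming a single non-zero bin per frame with a chosen phase (zero by default); the function names, frame length, and phase choice are assumptions.

```python
import numpy as np


def synth_frame_from_peak(f0, amplitude, sr, frame_len=1024, phase=0.0):
    """Generate a spectrum whose only non-zero bin is the one nearest f0, then inverse-FFT it."""
    spectrum = np.zeros(frame_len // 2 + 1, dtype=complex)
    k = int(round(f0 * frame_len / sr))
    if f0 > 0 and 0 < k < frame_len // 2:
        # irfft maps X[k] = A * N/2 * e^{j*phase} to A * cos(2*pi*k*n/N + phase)
        spectrum[k] = amplitude * frame_len / 2 * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n=frame_len)


def pitch_waveform_from_peaks(f0s, amplitudes, sr, frame_len=1024):
    """Concatenate the synthesized frames into one pitch waveform."""
    frames = [synth_frame_from_peak(f, a, sr, frame_len) for f, a in zip(f0s, amplitudes)]
    return np.concatenate(frames) if frames else np.zeros(0)
```

Because every bin index k corresponds to a whole number of cycles per frame, each zero-phase frame ends one sample before returning to its starting phase, so consecutive frames join without a jump when the bin does not change.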
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

performing tone adjustment on the target audio based on the pitch waveform data of the target audio and pre-stored standard pitch data corresponding to the target audio.
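Claim 5 does not state how the tone adjustment is computed. One common way to express the correction, assumed here for illustration, is the number of equal-tempered semitones by which a frame must be shifted so that its detected frequency matches the standard pitch, 12 * log2(f_standard / f_sung). The function name and the treatment of unvoiced frames are hypothetical.

```python
import math


def semitone_correction(f_sung: float, f_standard: float) -> float:
    """Semitones needed to move f_sung onto f_standard (positive = shift up).

    Frames where either frequency is <= 0 are assumed to be unvoiced and get 0.0.
    """
    if f_sung <= 0 or f_standard <= 0:
        return 0.0
    return 12.0 * math.log2(f_standard / f_sung)


print(semitone_correction(220.0, 440.0))  # 12.0 (one octave up)
```

A pitch shifter would then scale the frame's frequencies by 2 ** (n / 12) for a correction of n semitones.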
6. An audio processing method, characterized in that the method comprises:

comparing the frequency value corresponding to each period in the pitch waveform data according to any one of claims 1 to 5 with the temporally corresponding standard frequency value in standard pitch data, and, if the absolute value of the difference between a frequency value and the standard frequency value is greater than a preset value, adjusting the target audio in the period in which that frequency value lies.
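The comparison in claim 6 amounts to a per-period threshold test. The sketch below assumes the measured and standard frequencies have already been aligned in time into two equal-length sequences, and that a value of 0 marks an unvoiced period to be skipped; both assumptions, the function name, and the threshold unit (Hz) are illustrative.

```python
def periods_to_adjust(freqs, std_freqs, threshold_hz):
    """Indices of periods whose frequency deviates from the temporally aligned
    standard frequency by more than threshold_hz (the "preset value").

    Periods where either value is <= 0 are treated as unvoiced and skipped.
    """
    return [i for i, (f, s) in enumerate(zip(freqs, std_freqs))
            if f > 0 and s > 0 and abs(f - s) > threshold_hz]


print(periods_to_adjust([220, 230, 250, 0], [220, 220, 247, 220], 5))  # [1]
```

Only the flagged periods would then be passed to the adjustment step, leaving periods that are already close to the standard pitch untouched.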
7. An apparatus for obtaining pitch waveform data, characterized in that the apparatus comprises:

an extraction module, configured to perform pitch extraction on each audio frame in a target audio to obtain a target frequency corresponding to each audio frame;

a first determining module, configured to determine, for each audio frame and based on the target frequency corresponding to the audio frame, a corresponding target amplitude in spectrum data of the audio frame; and

a second determining module, configured to determine pitch waveform data of the target audio based on the target amplitude and the target frequency corresponding to each audio frame.
8. The apparatus according to claim 7, characterized in that the first determining module is specifically configured to:

perform a Fourier transform on audio waveform data of each audio frame to obtain the spectrum data of each audio frame; and

determine, in the spectrum data of each audio frame, the target amplitude corresponding to the target frequency.
9. The apparatus according to claim 8, characterized in that the second determining module is specifically configured to:

in the spectrum data of each audio frame, keep the target amplitude corresponding to the target frequency unchanged and set the amplitudes corresponding to all other frequencies to zero, to obtain adjusted spectrum data of each audio frame; and

perform an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.
10. The apparatus according to claim 8, characterized in that the second determining module is specifically configured to:

generate adjusted spectrum data of each audio frame based on the target amplitude and the target frequency corresponding to that audio frame; and

perform an inverse Fourier transform on the adjusted spectrum data of each audio frame to obtain the pitch waveform data of the target audio.
11. The apparatus according to any one of claims 7 to 10, characterized in that the apparatus further comprises:

an adjustment module, configured to perform tone adjustment on the target audio based on the pitch waveform data of the target audio and pre-stored standard pitch data corresponding to the target audio.
12. An audio processing apparatus, characterized in that the apparatus comprises an audio adjustment module configured to:

compare the frequency value corresponding to each period in the pitch waveform data according to any one of claims 7 to 11 with the temporally corresponding standard frequency value in standard pitch data, and, if the absolute value of the difference between a frequency value and the standard frequency value is greater than a preset value, adjust the target audio in the period in which that frequency value lies.
13. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method for obtaining pitch waveform data according to any one of claims 1 to 5.
14. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction that is loaded and executed by a processor to implement the method for obtaining pitch waveform data according to any one of claims 1 to 5.