CN113409815B - Voice alignment method based on multi-source voice data - Google Patents

Voice alignment method based on multi-source voice data

Info

Publication number
CN113409815B
CN113409815B (application CN202110591658.4A)
Authority
CN
China
Prior art keywords
voice
data
voice data
frame
module
Prior art date
Legal status
Active
Application number
CN202110591658.4A
Other languages
Chinese (zh)
Other versions
CN113409815A (en)
Inventor
李天洋
胡环环
朱保龙
Current Assignee
Hefei Qunyin Information Service Co ltd
Original Assignee
Hefei Qunyin Information Service Co ltd
Priority date
Filing date
Publication date
Application filed by Hefei Qunyin Information Service Co ltd filed Critical Hefei Qunyin Information Service Co ltd
Priority to CN202110591658.4A
Publication of CN113409815A
Application granted
Publication of CN113409815B
Active legal status
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice alignment method based on multi-source voice data. It belongs to the field of voice processing and relates to voice alignment technology: the method aligns the starting points of voice data, thereby avoiding the drawbacks of manual alignment, which consumes a large amount of time and suffers from low processing efficiency and alignment accuracy. The method comprises the following steps: a plurality of voice acquisition modules acquire voice data of the same sound source at different positions and send the acquired voice data to a voice processing module; the voice processing module processes the voice data sent by the voice acquisition modules and sends the processed voice data to a voice analysis module; the voice analysis module performs voice alignment on the processed voice data and sends the aligned voice data to a voice combination module; and the voice combination module performs voice combination on the aligned voice data.

Description

Voice alignment method based on multi-source voice data
Technical Field
The invention belongs to the field of voice processing, relates to a voice alignment technology, and particularly relates to a voice alignment method based on multi-source voice data.
Background
Generally, the voice of the same speaker in the same recording scene is collected by a plurality of recording devices, and the starting points of the voice data collected by different recording devices cannot be guaranteed to be completely consistent. Therefore, how to align the voices so that the collection starting points of the voice data from multiple recording devices are consistent, and so that subsequent processing such as synthesis of the voice data becomes convenient, is a technical problem to be solved.
In the prior art, alignment is generally performed on voice data manually. For example, when facing voice data with different collection starting points, technicians need to manually compare the sound waves of the voice data and align their starting points. This manual alignment approach requires a large amount of time, has low processing efficiency and alignment accuracy, and is unsuitable for voice data of large volume.
Therefore, a voice alignment method based on multi-source voice data is provided.
Disclosure of Invention
The invention provides a voice alignment method based on multi-source voice data, which aligns the starting points of voice data and thereby avoids the drawbacks of manual alignment, namely the large amount of time it consumes and its low processing efficiency and alignment accuracy. A plurality of voice acquisition modules acquire voice data of the same sound source at different positions and send the acquired voice data to a voice processing module; the voice processing module processes the voice data sent by the voice acquisition modules and sends the processed voice data to a voice analysis module; the voice analysis module performs voice alignment on the processed voice data and sends the aligned voice data to a voice combination module. Specifically, the voice analysis module arranges the acquired data characteristic coefficients TZij of the single-frame voice data by frame number and by voice acquisition module, and arbitrarily selects the voice data acquired by one acquisition module as reference voice data; it divides the data characteristic coefficient of each single-frame voice data by that of the previous single-frame voice data, i.e. TZij / TZi(j-1), and takes the resulting quotient as a comparison value, marked Dij; the remaining single-frame voice data are processed in the same way to obtain the other comparison values. The comparison values are combined into different sequences, and the Dij in each sequence are compared with the Dij in the reference sequence; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values fall within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned; finally, the voice combination module performs voice combination on the aligned voice data.
The purpose of the invention can be realized by the following technical scheme:
a voice alignment method based on multi-source voice data comprises a voice alignment system based on the multi-source voice data, and the voice alignment system comprises a plurality of voice acquisition modules, a voice analysis module, a voice processing module and a voice combination module, wherein the voice acquisition modules are respectively positioned around a sound source and used for acquiring voice data of the same sound source at different positions and sending the acquired voice data of the sound source to the voice processing module;
the voice processing module is used for processing the voice data sent by the voice acquisition modules; the processed voice data are sent to a voice analysis module;
the voice analysis module is used for carrying out voice alignment on the processed voice data; sending the aligned voice data to a voice combination module;
and the voice combination module performs voice combination on the aligned voice data.
It should be noted that each voice acquisition module is a device with a recording function, such as a microphone; the voice acquisition modules are distributed around the sound source at different spatial distances from it, and are by default identical equipment;
the voice acquisition modules send acquired voice data to the voice processing module;
the voice processing module numbers the voice acquisition modules and marks the number as i, where i represents the number of a voice acquisition module, i = 1, 2, …, n;
the voice processing module acquires the space linear distance between the voice acquisition module and the sound source, and marks the space linear distance between the voice acquisition module and the sound source as Li;
the voice processing module acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame, obtains an amplitude value and a frequency value, and marks them Zfij and Plij respectively; where j denotes the number of a single frame of voice data, j = 1, 2, …, m;
the voice processing module calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula;
[the formula is rendered only as an image in the original; it combines Zfij, Plij and c]
wherein c is a proportionality coefficient related to the timbre of the sound source;
the voice processing module sends the calculated data characteristic coefficient TZij of the single-frame voice data to the voice analysis module;
the voice analysis module is used for analyzing the data characteristic coefficient TZij of the single-frame voice data, and the specific analysis process comprises the following steps:
the voice analysis module acquires a spatial linear distance Li between the voice acquisition module and a sound source; the voice analysis module acquires a data characteristic coefficient TZij of single-frame voice data;
the voice analysis module carries out data arrangement on the acquired data characteristic coefficient TZij of the single-frame voice data according to different frame numbers and different voice acquisition modules, and the arrangement form is as follows:
TZ11、TZ12、TZ13、TZ14、TZ15……TZ1m;
TZ21、TZ22、TZ23、TZ24、TZ25……TZ2m;
……
TZn1、TZn2、TZn3、TZn4、TZn5……TZnm;
it should be noted that, for different voice acquisition modules, when the collected voice data is processed into single-frame voice data, the total number of single frames may differ; that is, the value of m may differ across voice acquisition modules;
the voice analysis module arbitrarily selects the voice data acquired by one of the acquisition modules as reference voice data; it divides the data characteristic coefficient of each single-frame voice data by that of the previous single-frame voice data, i.e. TZij / TZi(j-1), and takes the resulting quotient as a comparison value, marked Dij;
the remaining single-frame voice data are processed in the same way to obtain the other comparison values;
the comparison values are combined into different sequences, namely a reference sequence, sequence 1, sequence 2, …, sequence n-1:
D11, D12, D13, D14, D15 … D1(m-1); (reference sequence)
D21, D22, D23, D24, D25 … D2(m-1); (sequence 1)
……
Dn1, Dn2, Dn3, Dn4, Dn5 … Dn(m-1); (sequence n-1)
The Dij in sequence 1, sequence 2, …, sequence n-1 are compared with the Dij in the reference sequence; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values fall within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned;
the voice analysis module sends the single-frame voice data to be aligned to the voice combination module; the voice combination module locates the first comparison value of the run of more than 10 consecutive consistent comparison values (or comparison values whose quotients fall within (0.95, 1.05)), thereby obtaining the position of the corresponding single-frame voice data; taking that single-frame voice data as the alignment reference, it performs voice combination frame by frame from that point, finally completing the voice alignment.
Compared with the prior art, the invention has the beneficial effects that:
1. The voice acquisition module of the invention is a device with a recording function, such as a microphone; the voice acquisition modules are distributed around the sound source at different spatial distances from it and are by default identical equipment. This guarantees the consistency of the voice data of the sound source, avoids inaccurate later-stage voice alignment caused by differing acquisition devices, and improves the accuracy of voice alignment.
2. The voice processing module of the invention acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame to obtain an amplitude value and a frequency value, marked Zfij and Plij respectively; it then calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula [rendered only as an image in the original], wherein c is a proportionality coefficient related to the timbre of the sound source; the voice processing module sends the calculated data characteristic coefficient TZij of the single-frame voice data to the voice analysis module. Processing the voice data in this way facilitates later-stage voice alignment.
3. The voice analysis module of the invention arbitrarily selects the voice data collected by one acquisition module as reference voice data; it divides the data characteristic coefficient of each single-frame voice data by that of the previous single-frame voice data, i.e. TZij / TZi(j-1), and takes the resulting quotient as a comparison value, marked Dij; the remaining single-frame voice data are processed in the same way to obtain the other comparison values, which are combined into different sequences, namely a reference sequence, sequence 1, sequence 2, …, sequence n-1:
D11, D12, D13, D14, D15 … D1(m-1); (reference sequence)
D21, D22, D23, D24, D25 … D2(m-1); (sequence 1)
……
Dn1, Dn2, Dn3, Dn4, Dn5 … Dn(m-1); (sequence n-1)
The Dij in sequence 1, sequence 2, …, sequence n-1 are compared with the Dij in the reference sequence; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values fall within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned. Alignment of the voice is thus realized by means of sequences.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a speech alignment method based on multi-source speech data according to the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a voice alignment method based on multi-source voice data comprises a voice alignment system based on multi-source voice data, including a plurality of voice acquisition modules, a voice analysis module, a voice processing module and a voice combination module, wherein the voice acquisition modules are respectively located around a sound source and are configured to acquire voice data of the same sound source at different positions and send the acquired voice data of the sound source to the voice processing module;
the voice processing module is used for processing the voice data sent by the voice acquisition modules; the processed voice data are sent to a voice analysis module;
the voice analysis module is used for carrying out voice alignment on the processed voice data; sending the aligned voice data to a voice combination module;
and the voice combination module performs voice combination on the aligned voice data.
It should be noted that each voice acquisition module is a device with a recording function, such as a microphone; the voice acquisition modules are distributed around the sound source at different spatial distances from it, and are by default identical equipment;
the voice acquisition modules send acquired voice data to the voice processing module;
the voice processing module numbers the voice acquisition modules and marks the number as i, where i represents the number of a voice acquisition module, i = 1, 2, …, n;
the voice processing module acquires the space linear distance between the voice acquisition module and the sound source, and marks the space linear distance between the voice acquisition module and the sound source as Li;
the voice processing module acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame, obtains an amplitude value and a frequency value, and marks them Zfij and Plij respectively; where j denotes the number of a single frame of voice data, j = 1, 2, …, m;
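The framing step just described can be sketched as follows. The patent does not specify the frame length or how the amplitude value Zfij and frequency value Plij are extracted, so the peak-amplitude measure and dominant-FFT-bin frequency estimate below are illustrative assumptions:

```python
import numpy as np

def frame_features(samples, sample_rate, frame_len=1024):
    """Split a mono signal into fixed-length frames and extract, per frame,
    an amplitude value (peak absolute sample, standing in for Zf_ij) and a
    frequency value (dominant FFT bin, standing in for Pl_ij).
    Frame length and both feature definitions are illustrative assumptions."""
    n_frames = len(samples) // frame_len
    amplitudes, frequencies = [], []
    for j in range(n_frames):
        frame = samples[j * frame_len:(j + 1) * frame_len]
        amplitudes.append(float(np.max(np.abs(frame))))
        spectrum = np.abs(np.fft.rfft(frame))
        peak_bin = int(np.argmax(spectrum[1:]) + 1)  # skip the DC bin
        frequencies.append(peak_bin * sample_rate / frame_len)
    return amplitudes, frequencies
```

For a 440 Hz test tone sampled at 8 kHz, each 1024-sample frame yields an amplitude near the tone's peak and a frequency estimate within one FFT bin (about 8 Hz) of 440 Hz.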
the voice processing module calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula;
[the formula is rendered only as an image in the original; it combines Zfij, Plij and c]
wherein c is a proportionality coefficient related to the timbre of the sound source;
the voice processing module sends the calculated data characteristic coefficient TZij of the single-frame voice data to the voice analysis module;
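The calculation formula for TZij appears only as an image in this text, so its exact form is not recoverable; the helper below is a placeholder assumption that simply combines the amplitude Zfij, the frequency Plij, and the timbre-related proportionality coefficient c as a product:

```python
def feature_coefficient(zf, pl, c=1.0):
    """Placeholder for the patent's data characteristic coefficient TZ_ij.
    The real formula is an image in the source; a plain product of the
    amplitude (zf), frequency (pl) and timbre coefficient (c) is assumed
    here purely for illustration."""
    return c * zf * pl
```

Any per-frame scalar that grows with amplitude and frequency would serve the same role in the later ratio comparison, since only quotients of consecutive coefficients are used.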
the voice analysis module is used for analyzing the data characteristic coefficient TZij of the single-frame voice data, and the specific analysis process comprises the following steps:
the voice analysis module acquires a spatial linear distance Li between the voice acquisition module and a sound source; the voice analysis module acquires a data characteristic coefficient TZij of single-frame voice data;
the voice analysis module carries out data arrangement on the acquired data characteristic coefficient TZij of the single-frame voice data according to different frame numbers and different voice acquisition modules, and the arrangement form is as follows:
TZ11、TZ12、TZ13、TZ14、TZ15……TZ1m;
TZ21、TZ22、TZ23、TZ24、TZ25……TZ2m;
……
TZn1、TZn2、TZn3、TZn4、TZn5……TZnm;
it should be noted that, for different voice acquisition modules, when the collected voice data is processed into single-frame voice data, the total number of single frames may differ; that is, the value of m may differ across voice acquisition modules;
the voice analysis module arbitrarily selects the voice data acquired by one of the acquisition modules as reference voice data; it divides the data characteristic coefficient of each single-frame voice data by that of the previous single-frame voice data, i.e. TZij / TZi(j-1), and takes the resulting quotient as a comparison value, marked Dij;
the remaining single-frame voice data are processed in the same way to obtain the other comparison values;
the comparison values are combined into different sequences, namely a reference sequence, sequence 1, sequence 2, …, sequence n-1:
D11, D12, D13, D14, D15 … D1(m-1); (reference sequence)
D21, D22, D23, D24, D25 … D2(m-1); (sequence 1)
……
Dn1, Dn2, Dn3, Dn4, Dn5 … Dn(m-1); (sequence n-1)
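The construction of the comparison sequences above, each frame's coefficient divided by the previous frame's, can be sketched as:

```python
def ratio_sequence(tz):
    """Given the data characteristic coefficients TZ_i1 ... TZ_im of one
    acquisition module, return the comparison values
    D_ij = TZ_ij / TZ_i(j-1); m coefficients yield m-1 ratios."""
    return [tz[j] / tz[j - 1] for j in range(1, len(tz))]
```

Because each Dij is a ratio of consecutive frames, it is insensitive to a per-channel gain: scaling every TZij of one module by a constant leaves its ratio sequence unchanged, which is what makes sequences from microphones at different distances comparable.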
The Dij in sequence 1, sequence 2, …, sequence n-1 are compared with the Dij in the reference sequence; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values fall within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned;
the voice analysis module sends the single-frame voice data to be aligned to the voice combination module; the voice combination module locates the first comparison value of the run of more than 10 consecutive consistent comparison values (or comparison values whose quotients fall within (0.95, 1.05)), thereby obtaining the position of the corresponding single-frame voice data; taking that single-frame voice data as the alignment reference, it performs voice combination frame by frame from that point, finally completing the voice alignment.
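A minimal sketch of the matching step, assuming a brute-force search (the patent does not prescribe a search strategy): slide a module's ratio sequence against the reference sequence and return the first offset at which more than 10 consecutive ratio pairs have quotients within (0.95, 1.05).

```python
def find_alignment_offset(ref, seq, run=10, lo=0.95, hi=1.05):
    """Return the first offset into `seq` at which more than `run`
    consecutive ratio pairs agree with `ref` (quotient in (lo, hi)),
    or None if no offset qualifies. The run length and tolerance follow
    the text; the search strategy itself is an illustrative choice."""
    for offset in range(max(0, len(seq) - run)):
        streak = 0
        window = min(len(ref), len(seq) - offset)
        for k in range(window):
            if lo < seq[offset + k] / ref[k] < hi:
                streak += 1
                if streak > run:
                    return offset  # more than `run` consecutive matches
            else:
                break  # streak broken; try the next offset
    return None
```

Once the offset is known, the corresponding module's frames can simply be shifted by that many frames before the voice combination module merges the channels.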
The above formulas are computed on dimensionless numerical values. Each formula was obtained by collecting a large amount of data and fitting it by software simulation to approximate the real situation as closely as possible; the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to the actual situation, or are obtained by simulation over a large amount of data.
The working principle of the invention is as follows: a plurality of voice acquisition modules acquire voice data of the same sound source at different positions and send the acquired voice data to the voice processing module; the voice processing module processes the voice data sent by the voice acquisition modules and sends the processed voice data to the voice analysis module; the voice analysis module performs voice alignment on the processed voice data and sends the aligned voice data to the voice combination module. The voice analysis module arranges the acquired data characteristic coefficients TZij of the single-frame voice data by frame number and by voice acquisition module, and arbitrarily selects the voice data acquired by one acquisition module as reference voice data; it divides the data characteristic coefficient of each single-frame voice data by that of the previous single-frame voice data, i.e. TZij / TZi(j-1), and takes the resulting quotient as a comparison value, marked Dij; the remaining single-frame voice data are processed in the same way to obtain the other comparison values. The comparison values are combined into different sequences, and the Dij in each sequence are compared with the Dij in the reference sequence; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values fall within (0.95, 1.05), the single-frame voice data can be adopted and is marked as single-frame voice data to be aligned; finally, the voice combination module performs voice combination on the aligned voice data.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and there may be other divisions when the actual implementation is performed; the modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the method of the embodiment.
It will also be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above examples are only intended to illustrate the technical process of the present invention and not to limit the same, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical process of the present invention without departing from the spirit and scope of the technical process of the present invention.

Claims (4)

1. A voice alignment method based on multi-source voice data is characterized by comprising the following steps:
the method comprises the following steps: the voice data acquisition module is used for acquiring the voice data of the same sound source at different positions and sending the acquired voice data of the sound source to the voice processing module;
step two: processing the voice data sent by the voice acquisition modules through the voice processing module; the processed voice data are sent to a voice analysis module;
the voice processing module numbers the voice acquisition modules and marks the number as i, where i represents the number of a voice acquisition module, i = 1, 2, …, n;
the voice processing module acquires the space linear distance between the voice acquisition module and the sound source, and marks the space linear distance between the voice acquisition module and the sound source as Li;
the voice processing module acquires the voice data, processes it into single-frame voice data, decodes and splits each single frame, obtains an amplitude value and a frequency value, and marks them Zfij and Plij respectively; where j denotes the number of a single frame of voice data, j = 1, 2, …, m;
the voice processing module calculates the data characteristic coefficient TZij of the single-frame voice data using a calculation formula;
[the formula is rendered only as an image in the original; it combines Zfij, Plij and c]
wherein c is a proportionality coefficient related to the timbre of the sound source;
the voice processing module sends the calculated data characteristic coefficient TZij of the single-frame voice data to the voice analysis module;
step three: performing voice alignment on the processed voice data through a voice analysis module; sending the aligned voice data to a voice combination module;
the voice analysis module arranges the acquired data characteristic coefficients TZij of the single-frame voice data by frame number and by voice acquisition module, and arbitrarily selects the voice data acquired by one acquisition module as reference voice data; it divides the data characteristic coefficient of each single-frame voice data by that of the previous single-frame voice data, i.e. TZij / TZi(j-1), and takes the resulting quotient as a comparison value, marked Dij;
the remaining single-frame voice data are processed in the same way to obtain the other comparison values;
the comparison values are combined into different sequences, and the Dij in each sequence are compared with the Dij in the reference sequence; when more than 10 consecutive comparison values are consistent, or the quotients of the comparison values fall within (0.95, 1.05), the single-frame voice data can be adopted, and the adopted single-frame voice data is marked as single-frame voice data to be aligned;
step four: and carrying out voice combination on the aligned voice data through a voice combination module.
2. The method according to claim 1, wherein each voice acquisition module is a device with a recording function; the voice acquisition modules are distributed around the sound source at different spatial distances from it.
3. The method according to claim 1, wherein, for different voice acquisition modules, when the collected voice data is processed into single-frame voice data, the total number of single frames differs; that is, the value of m differs across voice acquisition modules.
4. The method of claim 1, wherein the voice combination module locates the first comparison value of the run of more than 10 consecutive consistent comparison values (or comparison values whose quotients fall within (0.95, 1.05)), thereby obtaining the position of the corresponding single-frame voice data; taking that single-frame voice data as the alignment reference, it performs voice combination frame by frame from that point, finally completing the voice alignment.
CN202110591658.4A 2021-05-28 2021-05-28 Voice alignment method based on multi-source voice data Active CN113409815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110591658.4A CN113409815B (en) 2021-05-28 2021-05-28 Voice alignment method based on multi-source voice data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110591658.4A CN113409815B (en) 2021-05-28 2021-05-28 Voice alignment method based on multi-source voice data

Publications (2)

Publication Number Publication Date
CN113409815A (en) 2021-09-17
CN113409815B true CN113409815B (en) 2022-02-11

Family

ID=77674998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110591658.4A Active CN113409815B (en) 2021-05-28 2021-05-28 Voice alignment method based on multi-source voice data

Country Status (1)

Country Link
CN (1) CN113409815B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657947A (en) * 2017-09-20 2018-02-02 百度在线网络技术(北京)有限公司 Method of speech processing and its device based on artificial intelligence
CN111276156A (en) * 2020-01-20 2020-06-12 深圳市数字星河科技有限公司 Real-time voice stream monitoring method
CN111383658A (en) * 2018-12-29 2020-07-07 广州市百果园信息技术有限公司 Method and device for aligning audio signals

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7085717B2 (en) * 2002-05-21 2006-08-01 Thinkengine Networks, Inc. Scoring and re-scoring dynamic time warping of speech
CN105989846B (en) * 2015-06-12 2020-01-17 乐融致新电子科技(天津)有限公司 Multichannel voice signal synchronization method and device
US9697849B1 (en) * 2016-07-25 2017-07-04 Gopro, Inc. Systems and methods for audio based synchronization using energy vectors
CN108682436B (en) * 2018-05-11 2020-06-23 北京海天瑞声科技股份有限公司 Voice alignment method and device
EP3573059B1 (en) * 2018-05-25 2021-03-31 Dolby Laboratories Licensing Corporation Dialogue enhancement based on synthesized speech
CN109192223B (en) * 2018-09-20 2020-10-27 广州酷狗计算机科技有限公司 Audio alignment method and device
CN211628033U (en) * 2019-07-15 2020-10-02 兰州工业学院 Container anti-drop monitoring and transmission system
US11545132B2 (en) * 2019-08-28 2023-01-03 International Business Machines Corporation Speech characterization using a synthesized reference audio signal


Also Published As

Publication number Publication date
CN113409815A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
ES2774018T3 (en) Method and system for evaluating the sound quality of a human voice
CN109599093B (en) Intelligent quality inspection keyword detection method, device and equipment and readable storage medium
JP2022552449A (en) Methods and apparatus for inspecting wind turbine blades, and equipment and storage media therefor
CN105469807B (en) A kind of more fundamental frequency extracting methods and device
CN106375780A (en) Method and apparatus for generating multimedia file
CN106571146A (en) Noise signal determining method, and voice de-noising method and apparatus
CN104240712A (en) Three-dimensional audio multichannel grouping and clustering coding method and three-dimensional audio multichannel grouping and clustering coding system
CN115358718A (en) Noise pollution classification and real-time supervision method based on intelligent monitoring front end
CN117095694B (en) Bird song recognition method based on tag hierarchical structure attribute relationship
CN103730112A (en) Multi-channel voice simulation and acquisition method
CN111508524A (en) Method and system for identifying voice source equipment
CN113409815B (en) Voice alignment method based on multi-source voice data
CN102184733B (en) Audio attention-based audio quality evaluation system and method
CN109088793B (en) Method and apparatus for detecting network failure
CN109299312B (en) Music rhythm analysis method based on big data
CN113270110A (en) ZPW-2000A track circuit transmitter and receiver fault diagnosis method
CN116543739A (en) EMD-based power equipment noise control method
CN111179972A (en) Human voice detection algorithm based on deep learning
CN108010533A (en) The automatic identifying method and device of voice data code check
CN108271017A (en) The audio loudness measuring system and method for digital broadcast television
CN108769874B (en) Method and device for separating audio in real time
Li et al. Output-based objective speech quality measurement using continuous Hidden Markov Models
CN114372513A (en) Training method, classification method, equipment and medium of bird sound recognition model
CN114510517A (en) Data processing method and system for health management of large-scale rotating unit
CN111341321A (en) Matlab-based spectrogram generating and displaying method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant