CN111564163A - RNN-based voice detection method for multiple forgery operations


Info

Publication number
CN111564163A
Authority
CN
China
Prior art keywords: voice, lfcc, rnn, matrix, obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010382185.2A
Other languages
Chinese (zh)
Other versions
CN111564163B (en)
Inventor
严迪群
乌婷婷
王让定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University
Priority to CN202010382185.2A
Publication of CN111564163A
Application granted
Publication of CN111564163B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques where the extracted parameters are the cepstrum
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques using neural networks
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses an RNN-based method for detecting multiple voice forgery operations, which comprises the following steps: 1) obtain an original voice sample and apply M kinds of forgery processing to it, giving M forged voices plus the 1 unprocessed original voice; extract features from these voices to obtain the LFCC matrices of the training samples; and feed the LFCC matrices into an RNN classifier network for training, yielding a multi-class training model; 2) obtain a segment of test voice, extract its features to obtain the LFCC matrix of the test data, and feed that matrix into the RNN classifier trained in step 1) for classification; each test voice yields an output probability, and all output probabilities are combined into the final prediction: if the prediction is the original class, the test voice is recognized as original voice; if the prediction is a voice subjected to a particular forgery operation, the test voice is recognized as forged voice subjected to the corresponding operation.

Description

RNN-based voice detection method for multiple forgery operations
Technical Field
The invention relates to a voice detection method, and in particular to an RNN-based method for detecting multiple voice forgery operations.
Background
As voice-editing software grows more capable, even non-professionals can easily modify voice content. If malicious actors forge or modify voice recordings, and such recordings are then used in news reporting, judicial forensics, scientific research, or similar fields, the threat to social stability can be enormous and its effects immeasurable. Digital voice forensics, which detects such forgery operations, plays a vital role in verifying the originality and authenticity of audio material and is a key research topic in the multimedia forensics field.
Most existing digital voice forensic techniques detect a single forgery operation; that is, the examiner assumes the voice under test may have undergone one specific forgery operation. Mengyu Qiao et al. proposed a detection algorithm based on statistical features of quantized MDCT coefficients and their derivatives for detecting up-transcoded and down-transcoded MP3 audio files: a reference audio signal is generated by recompressing and calibrating the audio, and a support vector machine then performs classification. Experimental results show that the method effectively detects MP3 double compression and can recover the processing history of digitally forensicated audio. As another example, Wang Lihua et al. proposed CNN-based detection of the pitch-shifting processing history of speech: speech from three corpora was pitch-shifted with four different pitch-shifting tools, and a CNN detected the pitch-shift factor within each corpus, across corpora, and across pitch-shifting methods, with detection rates above 90%.
Existing digital voice forensic techniques can thus detect a single forgery operation with high accuracy. In practical applications, however, the examiner usually cannot predict which specific forgery operation was applied, and misjudgments may occur when a classifier built for one particular operation is used for detection.
At present, most digital forensic work that handles multiple forgery operations is concentrated in the digital-image field, and research on digital voice forensics remains scarce. In the digital-speech field, the Luviauqi team designed a convolutional neural network model that can detect the default audio-processing operations of two different audio-editing programs, with good results. Although that experiment pioneered the detection of multiple voice forgery operations, it has non-negligible problems, such as excessive computational complexity and an idealized application scenario for the forgery operations.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the deficiencies of the prior art, an RNN-based voice detection method for multiple forgery operations that improves detection accuracy.
The technical solution adopted by the invention to solve this problem is as follows: an RNN-based voice detection method for multiple forgery operations, characterized in that it comprises the following steps:
1) training a network: obtaining an original voice sample, performing M kinds of forgery processing on it to obtain the M forged voices and the 1 unprocessed original voice, performing feature extraction on the M forged voices and the 1 original voice to obtain the LFCC matrix of each training voice sample, and feeding the LFCC matrices into an RNN classifier network for training to obtain a multi-class training model;
2) voice recognition: obtaining a segment of test voice, performing feature extraction on it to obtain the LFCC matrix of the test voice data, feeding the LFCC matrix into the RNN classifier trained in step 1) for classification, obtaining an output probability for each test voice, and combining all output probabilities into the final prediction: if the prediction is the original voice, the test voice is recognized as original voice; if the prediction is a voice subjected to a certain forgery operation, the test voice is recognized as forged voice subjected to the corresponding forgery operation.
Preferably, in steps 1) and 2), the LFCC matrix is obtained as follows:
1) FFT: first pre-process the voice, then compute the spectral energy E(i, k) of each voice frame after the FFT:

E(i,k) = \left| \sum_{m=1}^{N} x_i(m)\, e^{-j 2\pi k m / N} \right|^2

where i is the frame index, k is the frequency component, x_i(m) is the speech signal data of the i-th frame, and N is the number of Fourier transform points;
then compute the energy of the spectral energy E(i, k) of each frame after the triangular filter bank:

H_l(k) =
\begin{cases}
0, & k < f(l-1) \\
\dfrac{k - f(l-1)}{f(l) - f(l-1)}, & f(l-1) \le k \le f(l) \\
\dfrac{f(l+1) - k}{f(l+1) - f(l)}, & f(l) < k \le f(l+1) \\
0, & k > f(l+1)
\end{cases}

S(i,l) = \sum_{k=f(l-1)}^{f(l+1)} E(i,k)\, H_l(k), \qquad l = 1, 2, \ldots, L

where H_l(k) is the frequency response of the l-th triangular filter, f(l) is the centre frequency of the l-th triangular filter, S(i, l) is the spectral-line energy after the triangular filter bank, l is the index of the triangular filter, and L is the total number of triangular filters;
2) DCT: compute the output data LFCC(i, n) of each triangular filter bank using the DCT:

\mathrm{LFCC}(i,n) = \sqrt{\frac{2}{L}} \sum_{l=1}^{L} \log S(i,l)\, \cos\!\left( \frac{\pi n (2l-1)}{2L} \right)

where n indexes the spectral line after the DCT of the i-th frame;
3) obtaining the LFCC statistical moments: take the first 12 orders of LFCC coefficients from LFCC(i, n), compute the mean and correlation coefficients, and obtain the LFCC matrix extracted from a segment of voice:

X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{s,1} & x_{s,2} & \cdots & x_{s,n} \end{bmatrix}

where x_{s,1}, \ldots, x_{s,n} are the n LFCCs computed for the s-th frame of speech data.
Preferably, the RNN classifier comprises LSTM layers followed, in sequence, by a Dropout layer, a fully connected layer, and a Softmax layer, the Dropout layer being connected to the last LSTM layer.
Preferably, there are two LSTM layers, with parameters set to (64, 128) and (128, 64) respectively.
Preferably, the LSTM network uses a tanh activation function.
Preferably, the Dropout rate of the Dropout layer is 0.5.
Preferably, the original speech is in WAV format.
Compared with the prior art, the invention has the following advantages: by using voice cepstral features and outputting class probabilities through a recurrent neural network, the accuracy of voice detection is improved, the method is better suited to digital voice carriers, and different forgery traces can be recognized; compared with existing deep-learning-based methods, the computational complexity is greatly reduced by the parameter sharing inside the RNN.
Drawings
FIG. 1 is a diagram illustrating the process of extracting the LFCC statistical moments of the speech detection method according to the embodiment of the present invention;
FIG. 2 is a general framework schematic diagram of a speech detection method according to an embodiment of the present invention;
fig. 3 is a network structure diagram of a voice detection method according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions.
In the description of the present invention, it is to be understood that terms such as "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," and "circumferential" indicate orientations and positional relationships as shown in the drawings; they are used only for convenience and simplicity of description and do not indicate or imply that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation, and are therefore not to be construed as limiting. Because the disclosed embodiments may be oriented in different directions, "lower," for example, is not necessarily limited to a direction opposite to or coincident with the direction of gravity. Furthermore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
An RNN (recurrent neural network)-based voice detection method for multiple forgery operations is realized by constructing a recurrent-neural-network framework on top of cepstral features. Referring to fig. 2, the framework consists of two parts: the cepstral features of a voice sample are first extracted, then fed into the designed network framework for classification, thereby accomplishing the task of identifying the various forgery operations.
Specifically, in the present invention, feature extraction of speech is realized in the following manner. The cepstral feature used in the present invention is the Linear Frequency Cepstral Coefficients (LFCC). Cepstral features are among the most commonly used feature parameters in speech technology; they characterize human auditory properties and are widely used for speaker recognition.
In LFCC, the band-pass filters are distributed uniformly from low to high frequency. The LFCC statistical-moment extraction process of the invention is shown in FIG. 1:
1) FFT: first pre-process the voice, then compute the spectral energy E(i, k) of each voice frame after the Fast Fourier Transform (FFT):

E(i,k) = \left| \sum_{m=1}^{N} x_i(m)\, e^{-j 2\pi k m / N} \right|^2

where i is the frame index, k is the frequency component, x_i(m) is the speech signal data of the i-th frame, and N is the number of Fourier transform points.
Next, compute the energy of the spectral energy E(i, k) of each frame after the triangular filter bank:

H_l(k) =
\begin{cases}
0, & k < f(l-1) \\
\dfrac{k - f(l-1)}{f(l) - f(l-1)}, & f(l-1) \le k \le f(l) \\
\dfrac{f(l+1) - k}{f(l+1) - f(l)}, & f(l) < k \le f(l+1) \\
0, & k > f(l+1)
\end{cases}

S(i,l) = \sum_{k=f(l-1)}^{f(l+1)} E(i,k)\, H_l(k), \qquad l = 1, 2, \ldots, L

where H_l(k) is the frequency response of the l-th triangular filter, f(l) is the centre frequency of the l-th triangular filter, S(i, l) is the spectral-line energy after the triangular filter bank, l is the index of the triangular filter, and L is the total number of triangular filters.
2) DCT: then compute the output data LFCC(i, n) of each triangular filter bank using the Discrete Cosine Transform (DCT):

\mathrm{LFCC}(i,n) = \sqrt{\frac{2}{L}} \sum_{l=1}^{L} \log S(i,l)\, \cos\!\left( \frac{\pi n (2l-1)}{2L} \right)

where n indexes the spectral line after the DCT of the i-th frame.
3) Obtaining the LFCC statistical moments: take the first 12 orders of LFCC coefficients from LFCC(i, n) and compute the mean and correlation coefficients; these steps can be carried out with existing matlab functions. Assuming a certain segment of pre-processed voice has s frames in total, the LFCC matrix extracted from that segment is:

X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{s,1} & x_{s,2} & \cdots & x_{s,n} \end{bmatrix}

where x_{s,1}, \ldots, x_{s,n} are the n LFCCs computed for the s-th frame of speech data.
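By way of illustration, the extraction chain above (framing, FFT, linearly spaced triangular filter bank, DCT, truncation to 12 coefficients) can be sketched in NumPy/SciPy. This is a minimal sketch under assumed settings: the patent performs these steps with existing matlab functions, and the frame length, hop size, FFT size, and filter count below are illustrative values the patent does not fix.

    import numpy as np
    from scipy.fftpack import dct

    def lfcc_matrix(signal, n_fft=512, frame_len=400, hop=160,
                    n_filters=20, n_ceps=12):
        """Return the s-by-n LFCC matrix of a speech signal: one row of
        n_ceps coefficients per frame, as in the matrix X above."""
        # 1) Pre-processing: split into overlapping frames, apply a Hamming window
        n_frames = 1 + (len(signal) - frame_len) // hop
        frames = np.stack([signal[i*hop : i*hop + frame_len] * np.hamming(frame_len)
                           for i in range(n_frames)])
        # 2) FFT: spectral energy E(i, k) of each frame
        E = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2      # (n_frames, n_fft//2 + 1)
        # 3) Triangular filters H_l(k) spaced LINEARLY in frequency; the linear
        #    spacing is what distinguishes LFCC from the mel-spaced MFCC bank
        edges = np.linspace(0, n_fft // 2, n_filters + 2).astype(int)
        H = np.zeros((n_filters, n_fft // 2 + 1))
        for l in range(1, n_filters + 1):
            lo, mid, hi = edges[l - 1], edges[l], edges[l + 1]
            H[l - 1, lo:mid + 1] = (np.arange(lo, mid + 1) - lo) / max(mid - lo, 1)
            H[l - 1, mid:hi + 1] = (hi - np.arange(mid, hi + 1)) / max(hi - mid, 1)
        S = E @ H.T                                        # filter-bank energies S(i, l)
        # 4) DCT of the log energies; keep the first 12 cepstral coefficients
        return dct(np.log(S + 1e-10), type=2, axis=1, norm='ortho')[:, :n_ceps]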
Referring to fig. 3, the network framework employs an RNN classifier. The choice of the number of network layers is crucial: a deeper network can learn more, but it also takes longer to train and is prone to overfitting. The network structure of the RNN classifier proposed in the present invention is therefore as shown in fig. 3. The structure comprises 2 LSTM layers, with parameters set to (64, 128) and (128, 64) respectively, and uses the tanh activation function to improve model performance. It further comprises a Dropout layer, a fully connected layer (dense), and a Softmax layer connected in sequence, with the Dropout layer connected to the last LSTM layer. Setting the Dropout value to 0.5 helps reduce overfitting, and the Softmax layer (Softmax classifier) outputs the class probabilities after the dimensionality reduction of the fully connected layer. The overall iterative training of the network framework is set to 50 rounds; certain adjustments may be made during actual training.
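A minimal Keras sketch of this classifier follows, reading the parameter pairs (64, 128) and (128, 64) as the widths of the two LSTM layers (128 and 64 output units); the input shape, optimizer, and loss are assumptions the patent does not specify, and M (the number of forgery operations) is a hypothetical placeholder.

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import LSTM, Dropout, Dense

    M = 5                        # hypothetical number of forgery operations
    n_frames, n_ceps = 100, 12   # assumed fixed shape of the input LFCC matrix

    model = Sequential([
        # Two stacked LSTM layers with tanh activations
        LSTM(128, activation='tanh', return_sequences=True,
             input_shape=(n_frames, n_ceps)),
        LSTM(64, activation='tanh'),
        Dropout(0.5),                        # Dropout value 0.5, as specified
        Dense(M + 1, activation='softmax'),  # M forgery classes + 1 original class
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

The recurrent weights of each LSTM layer are shared across all time steps, which is the parameter sharing the invention credits for its reduced computational complexity compared with CNN-based detectors.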
Referring again to fig. 2, the voice detection method comprises the following steps:
1) The network framework must first be trained. Supposing there are M forgery operations, each of the M kinds of forgery processing is applied to the original voice, giving M+1 kinds of voice samples: the voices after the M forgery operations plus the 1 unprocessed original voice. The invention places a constraint on the input original voice: a sufficiently large library of WAV-format audio samples must be provided as training data for the network framework. Features are extracted from the M+1 voice samples to obtain the LFCC matrices of the training samples, which are fed into the designed RNN classifier network for training, yielding a multi-class training model. Multiple original voice samples can be stored in a database, with each sample undergoing feature extraction before being sent to the RNN classifier for training.
2) The detection and recognition result is then obtained through the trained network framework: when a segment of test voice is obtained, its features are extracted to form the LFCC matrix of the test data, which is fed into the trained RNN classifier for classification. Each test voice yields an output probability, and all output probabilities are combined into the final prediction. If the prediction is the original class, the test voice is recognized as original voice; if the prediction is a voice subjected to a certain forgery operation, the test voice is recognized as the corresponding forged voice. The forensic examiner can judge from this result whether a given voice has undergone a forgery operation; a hypothetical usage sketch follows.
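The end-to-end flow might then look as below, building on the two sketches above; the random arrays stand in for a real WAV corpus, and the label convention (0 = original, 1..M = forgery operations) is an illustrative choice, not prescribed by the patent.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for the LFCC matrices of M+1 classes of training voices; in
    # practice each entry of X_train would come from lfcc_matrix() on one sample
    X_train = rng.normal(size=(8, n_frames, n_ceps))
    y_train = rng.integers(0, M + 1, size=8)    # 0 = original, 1..M = forgeries

    model.fit(X_train, y_train, epochs=50, batch_size=32)  # 50 rounds, per the patent

    # Recognition: the Softmax layer yields one probability per class; the
    # arg-max over the combined output probabilities is the final prediction
    X_test = rng.normal(size=(2, n_frames, n_ceps))
    probs = model.predict(X_test)
    pred = probs.argmax(axis=1)   # 0 -> original voice, k > 0 -> forgery operation k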

Claims (7)

1. An RNN-based voice detection method for multiple forgery operations, characterized in that the method comprises the following steps:
1) training a network: obtaining an original voice sample, performing M kinds of forgery processing on it to obtain the M forged voices and the 1 unprocessed original voice, performing feature extraction on the M forged voices and the 1 original voice to obtain the LFCC matrix of each training voice sample, and feeding the LFCC matrices into an RNN classifier network for training to obtain a multi-class training model;
2) voice recognition: obtaining a segment of test voice, performing feature extraction on it to obtain the LFCC matrix of the test voice data, feeding the LFCC matrix into the RNN classifier trained in step 1) for classification, obtaining an output probability for each test voice, and combining all output probabilities into the final prediction: if the prediction is the original voice, the test voice is recognized as original voice; if the prediction is a voice subjected to a certain forgery operation, the test voice is recognized as forged voice subjected to the corresponding forgery operation.
2. The RNN-based voice detection method for multiple forgery operations according to claim 1, wherein in steps 1) and 2) the LFCC matrix is obtained as follows:
1) FFT: first pre-process the voice, then compute the spectral energy E(i, k) of each voice frame after the FFT:

E(i,k) = \left| \sum_{m=1}^{N} x_i(m)\, e^{-j 2\pi k m / N} \right|^2

where i is the frame index, k is the frequency component, x_i(m) is the speech signal data of the i-th frame, and N is the number of Fourier transform points;
then compute the energy of the spectral energy E(i, k) of each frame after the triangular filter bank:

H_l(k) =
\begin{cases}
0, & k < f(l-1) \\
\dfrac{k - f(l-1)}{f(l) - f(l-1)}, & f(l-1) \le k \le f(l) \\
\dfrac{f(l+1) - k}{f(l+1) - f(l)}, & f(l) < k \le f(l+1) \\
0, & k > f(l+1)
\end{cases}

S(i,l) = \sum_{k=f(l-1)}^{f(l+1)} E(i,k)\, H_l(k), \qquad l = 1, 2, \ldots, L

where H_l(k) is the frequency response of the l-th triangular filter, f(l) is the centre frequency of the l-th triangular filter, S(i, l) is the spectral-line energy after the triangular filter bank, l is the index of the triangular filter, and L is the total number of triangular filters;
2) DCT: compute the output data LFCC(i, n) of each triangular filter bank using the DCT:

\mathrm{LFCC}(i,n) = \sqrt{\frac{2}{L}} \sum_{l=1}^{L} \log S(i,l)\, \cos\!\left( \frac{\pi n (2l-1)}{2L} \right)

where n indexes the spectral line after the DCT of the i-th frame;
3) obtaining the LFCC statistical moments: take the first 12 orders of LFCC coefficients from LFCC(i, n), compute the mean and correlation coefficients, and obtain the LFCC matrix extracted from a segment of voice:

X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{s,1} & x_{s,2} & \cdots & x_{s,n} \end{bmatrix}

where x_{s,1}, \ldots, x_{s,n} are the n LFCCs computed for the s-th frame of speech data.
3. The RNN-based voice detection method for multiple forgery operations according to claim 1, wherein the RNN classifier comprises LSTM layers followed, in sequence, by a Dropout layer, a fully connected layer, and a Softmax layer, the Dropout layer being connected to the last LSTM layer.
4. The RNN-based voice detection method for multiple forgery operations according to claim 3, wherein there are two LSTM layers, with parameters set to (64, 128) and (128, 64) respectively.
5. The RNN-based voice detection method for multiple forgery operations according to claim 3, wherein the LSTM network uses a tanh activation function.
6. The RNN-based voice detection method for multiple forgery operations according to claim 3, wherein the Dropout rate of the Dropout layer is 0.5.
7. The RNN-based voice detection method for multiple forgery operations according to claim 1, wherein the original voice is in WAV format.
CN202010382185.2A 2020-05-08 2020-05-08 RNN-based multiple fake operation voice detection method Active CN111564163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010382185.2A CN111564163B (en) 2020-05-08 2020-05-08 RNN-based multiple fake operation voice detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010382185.2A CN111564163B (en) 2020-05-08 2020-05-08 RNN-based multiple fake operation voice detection method

Publications (2)

Publication Number Publication Date
CN111564163A (en) 2020-08-21
CN111564163B CN111564163B (en) 2023-12-15

Family

ID=72071821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010382185.2A Active CN111564163B (en) 2020-05-08 2020-05-08 RNN-based multiple fake operation voice detection method

Country Status (1)

Country Link
CN (1) CN111564163B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113299315A (en) * 2021-07-27 2021-08-24 中国科学院自动化研究所 Method for generating voice features through continuous learning without original data storage
CN113362814A (en) * 2021-08-09 2021-09-07 中国科学院自动化研究所 Voice identification model compression method fusing combined model information
CN113488027A (en) * 2021-09-08 2021-10-08 中国科学院自动化研究所 Hierarchical classification generated audio tracing method, storage medium and computer equipment
CN113488073A (en) * 2021-07-06 2021-10-08 浙江工业大学 Multi-feature fusion based counterfeit voice detection method and device
CN113555007A (en) * 2021-09-23 2021-10-26 中国科学院自动化研究所 Voice splicing point detection method and storage medium
CN115249487A (en) * 2022-07-21 2022-10-28 中国科学院自动化研究所 Incremental generated voice detection method and system for playback boundary load sample
CN116229960A (en) * 2023-03-08 2023-06-06 江苏微锐超算科技有限公司 Robust detection method, system, medium and equipment for deceptive voice
CN117690455A (en) * 2023-12-21 2024-03-12 合肥工业大学 Sliding window-based partial synthesis fake voice detection method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201514943D0 (en) * 2015-08-21 2015-10-07 Validsoft Uk Ltd Replay attack detection
US9299364B1 (en) * 2008-06-18 2016-03-29 Gracenote, Inc. Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications
KR20160125628A (en) * 2015-04-22 2016-11-01 (주)사운드렉 A method for recognizing sound based on acoustic feature extraction and probabillty model
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 A kind of camouflage audio recognition method based on convolutional neural networks
CN109599116A (en) * 2018-10-08 2019-04-09 中国平安财产保险股份有限公司 The method, apparatus and computer equipment of supervision settlement of insurance claim based on speech recognition
CN110491391A (en) * 2019-07-02 2019-11-22 厦门大学 A kind of deception speech detection method based on deep neural network
US20190384981A1 (en) * 2018-06-15 2019-12-19 Adobe Inc. Utilizing a trained multi-modal combination model for content and text-based evaluation and distribution of digital video content to client devices
CN110931022A (en) * 2019-11-19 2020-03-27 天津大学 Voiceprint identification method based on high-frequency and low-frequency dynamic and static characteristics

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9299364B1 (en) * 2008-06-18 2016-03-29 Gracenote, Inc. Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications
KR20160125628A (en) * 2015-04-22 2016-11-01 (주)사운드렉 A method for recognizing sound based on acoustic feature extraction and probabillty model
GB201514943D0 (en) * 2015-08-21 2015-10-07 Validsoft Uk Ltd Replay attack detection
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 A kind of camouflage audio recognition method based on convolutional neural networks
US20190384981A1 (en) * 2018-06-15 2019-12-19 Adobe Inc. Utilizing a trained multi-modal combination model for content and text-based evaluation and distribution of digital video content to client devices
CN109599116A (en) * 2018-10-08 2019-04-09 中国平安财产保险股份有限公司 The method, apparatus and computer equipment of supervision settlement of insurance claim based on speech recognition
CN110491391A (en) * 2019-07-02 2019-11-22 厦门大学 A kind of deception speech detection method based on deep neural network
CN110931022A (en) * 2019-11-19 2020-03-27 天津大学 Voiceprint identification method based on high-frequency and low-frequency dynamic and static characteristics

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Kantheti Srinivas: "Combining Phase-based Features for Replay Spoof Detection System", 2018 11th International Symposium on Chinese Spoken Language Processing, pages 151-155.
Qin Zhenzhen: "Mapping model of network scenarios and routing metrics in DTN", Journal of Nanjing University of Science and Technology, vol. 40, no. 3, pages 291-296.
Wu Tingting et al.: "A digital voice forensics algorithm for multiple forgery operations" (针对多种伪造操作的数字语音取证算法), Wireless Communication Technology, no. 3, pages 37-45.
Chen Zhuxin: "Research on voiceprint spoofing detection based on deep neural networks" (基于深度神经网络的声纹欺骗检测研究), no. 1, pages 136-340.

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488073A (en) * 2021-07-06 2021-10-08 浙江工业大学 Multi-feature fusion based counterfeit voice detection method and device
CN113488073B (en) * 2021-07-06 2023-11-24 浙江工业大学 Fake voice detection method and device based on multi-feature fusion
CN113299315A (en) * 2021-07-27 2021-08-24 中国科学院自动化研究所 Method for generating voice features through continuous learning without original data storage
CN113362814A (en) * 2021-08-09 2021-09-07 中国科学院自动化研究所 Voice identification model compression method fusing combined model information
CN113362814B (en) * 2021-08-09 2021-11-09 中国科学院自动化研究所 Voice identification model compression method fusing combined model information
CN113488027A (en) * 2021-09-08 2021-10-08 中国科学院自动化研究所 Hierarchical classification generated audio tracing method, storage medium and computer equipment
CN113555007B (en) * 2021-09-23 2021-12-14 中国科学院自动化研究所 Voice splicing point detection method and storage medium
US11410685B1 (en) 2021-09-23 2022-08-09 Institute Of Automation, Chinese Academy Of Sciences Method for detecting voice splicing points and storage medium
CN113555007A (en) * 2021-09-23 2021-10-26 中国科学院自动化研究所 Voice splicing point detection method and storage medium
CN115249487A (en) * 2022-07-21 2022-10-28 中国科学院自动化研究所 Incremental generated voice detection method and system for playback boundary load sample
CN116229960A (en) * 2023-03-08 2023-06-06 江苏微锐超算科技有限公司 Robust detection method, system, medium and equipment for deceptive voice
CN116229960B (en) * 2023-03-08 2023-10-31 江苏微锐超算科技有限公司 Robust detection method, system, medium and equipment for deceptive voice
CN117690455A (en) * 2023-12-21 2024-03-12 合肥工业大学 Sliding window-based partial synthesis fake voice detection method and system
CN117690455B (en) * 2023-12-21 2024-05-28 合肥工业大学 Sliding window-based partial synthesis fake voice detection method and system

Also Published As

Publication number Publication date
CN111564163B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN111564163B (en) RNN-based multiple fake operation voice detection method
Badshah et al. Deep features-based speech emotion recognition for smart affective services
CN109816092A (en) Deep neural network training method, device, electronic equipment and storage medium
CN109949824B (en) City sound event classification method based on N-DenseNet and high-dimensional mfcc characteristics
CN111933124B (en) Keyword detection method capable of supporting self-defined awakening words
CN111783534B (en) Sleep stage method based on deep learning
CN107911346B (en) Intrusion detection method based on extreme learning machine
CN110120230B (en) Acoustic event detection method and device
Huang et al. A novel method for detecting image forgery based on convolutional neural network
CN111652318B (en) Currency identification method, identification device and electronic equipment
CN106910495A (en) Audio classification system and method applied to abnormal sound detection
CN111275165A (en) Network intrusion detection method based on improved convolutional neural network
CN112087442A (en) Time sequence related network intrusion detection method based on attention mechanism
CN113488073A (en) Multi-feature fusion based counterfeit voice detection method and device
CN111863025A (en) Audio source anti-forensics method
CN111191742A (en) Sliding window length self-adaptive adjustment method for multi-source heterogeneous data stream
CN114495950A (en) Voice deception detection method based on deep residual shrinkage network
Mallick et al. Copy move and splicing image forgery detection using cnn
CN113707175B (en) Acoustic event detection system based on feature decomposition classifier and adaptive post-processing
CN113299315B (en) Method for generating voice features through continuous learning without original data storage
CN113450806A (en) Training method of voice detection model, and related method, device and equipment
Jia [Retracted] Music Emotion Classification Method Based on Deep Learning and Explicit Sparse Attention Network
CN117649621A (en) Fake video detection method, device and equipment
CN116229960B (en) Robust detection method, system, medium and equipment for deceptive voice
Qin et al. Multi-branch feature aggregation based on multiple weighting for speaker verification

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant