CN114040052A - Method for voice frequency collection and effective voice frequency screening of telephone voiceprint recognition - Google Patents


Info

Publication number
CN114040052A
CN114040052A
Authority
CN
China
Prior art keywords
call
data
recording
real
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111280727.6A
Other languages
Chinese (zh)
Other versions
CN114040052B (en)
Inventor
陈萍
施道平
袁哲
陈辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Haobai Technology Co ltd
Original Assignee
Jiangsu Best Tone Information Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Best Tone Information Service Co ltd filed Critical Jiangsu Best Tone Information Service Co ltd
Priority to CN202111280727.6A priority Critical patent/CN114040052B/en
Publication of CN114040052A publication Critical patent/CN114040052A/en
Application granted granted Critical
Publication of CN114040052B publication Critical patent/CN114040052B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/42 Systems providing special services or facilities to subscribers
    • H04M 3/42221 Conversation recording systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/06 Decision making techniques; Pattern matching strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for audio collection and effective-audio screening for telephone voiceprint recognition. First, a large number of processed call recordings must be obtained for voiceprint algorithm model training; second, real-time recorded audio is provided during authentication to confirm the identity of the user on the call. The method comprises two main processes: real-time recording acquisition and storage, and training/recognition data preprocessing. The real-time recording acquisition and storage process forwards, parses and stores real-time call data packets, while the training data preprocessing process covers source data acquisition, effective recording extraction, data augmentation and feature extraction. The invention solves the problems that real-time recording data is difficult to acquire and that speaker roles must be separated in practical telephone-scene applications of voiceprint recognition, addresses the poor quality of raw data after processing during the preprocessing of voiceprint model training and recognition data, and significantly improves the quality of call data.

Description

Method for voice frequency collection and effective voice frequency screening of telephone voiceprint recognition
Technical Field
The invention relates to the technical field of voiceprint processing, and in particular to a method for acquiring real-time recordings over a telephone channel and a pre-screening method for the recording data used in voiceprint model training and recognition.
Background
With the continuous development of information technology and artificial intelligence, the services merchants provide have become more intelligent and convenient; telephone conversation is no longer restricted to person-to-person dialogue, and intelligent voice robots have quietly flourished. An intelligent voice robot can interact with the user through a flow designed around the service content, bringing simple, convenient and in-depth service. As service content develops toward personalization and individual customization, both human and intelligent customer service inevitably touch user privacy. To prevent criminals from impersonating a user, the user's identity must be confirmed, that is, authenticated, and service can proceed only once identity is confirmed. It is well known that a person's biometric features, such as voiceprints, fingerprints, palm prints, faces, irises and handwritten signatures, can uniquely identify that person. Driven by the application and constrained by the conversation scenario, user authentication based on voiceprint features has clear advantages over other biometric modalities.
Voiceprint training and recognition require support from both algorithms and data, and the authentication service in a call scenario requires a highly responsive system. Compared with the face-to-face sound collection of smart assistants such as Xiaodu and XiaoAI, the telephone channel is more complex and variable: sound is difficult to collect in real time, and separating the roles of the two parties in a call must also be considered, which raises the problem of obtaining single-role call recording data in real time. In addition, user speech in a telephone scenario may be too fast, too quiet, or consist of short, simple sentences; directly preprocessing such data with existing VAD (voice activity detection) technology can leave the data insufficiently cleaned, or leave too little audio after cleaning, so that voice features are inadequate. Meanwhile, the call environment in a real scenario is not a quiet studio and is often accompanied by substantial noise; a simple VAD cannot cut silence from call recordings well and retains a large amount of noise. Both factors strongly affect voiceprint model training and recognition results, so the data must be pre-screened. How to acquire real-time call recordings and then apply screening and preprocessing strategies to improve the quality of the call data used for voiceprint model training and recognition has therefore become an urgent problem.
Disclosure of Invention
The invention aims to provide a method for telephone voiceprint recognition audio collection and effective-audio screening, and in particular a method for acquiring single-role call recording data in real time and preprocessing and screening it for tasks such as voiceprint model training, recognition and authentication in a telephone scenario. First, a mirror server is deployed to obtain, in real time, the SIP and RTP data packets sent during a call; real-time recording files are parsed out of these packets and generated according to the call direction of the calling and called parties, which solves the problems of real-time recording acquisition and role separation in practical call-scenario applications of voiceprint recognition. Second, a method combining ASR-assisted screening with noise addition solves the problem that raw data remains of poor quality after processing, owing to environmental noise, audio mismatch, user speaking habits and similar conditions, during the preprocessing of voiceprint model training and recognition data, and significantly improves the quality of the call data.
The technical scheme adopted by the invention is as follows: a method for telephone voiceprint recognition audio collection and effective-audio screening. First, a large number of processed call recordings are obtained and used for voiceprint algorithm model training; second, real-time recorded audio is provided during authentication to confirm the identity of the user on the call. The method comprises two main processes: real-time recording acquisition and storage, and training/recognition data preprocessing. The real-time recording acquisition and storage process forwards, parses and stores real-time call data packets; the training data preprocessing process covers source data acquisition, effective recording extraction, data augmentation and feature extraction.
Further, the real-time recording acquisition and storage process includes: step one, data stream acquisition; step two, data parsing; step three, data forwarding. The specific steps are as follows:
the method comprises the following steps: acquiring data flow, wherein when a user is in a call, data information of a real-time call is forwarded and connected through a central network switch, and at the moment, a mirror image service captures data packets of each call in the forwarding connection from the switch in real time and sends the data packets to a voice server;
Step two: data parsing. After the voice server receives the captured and forwarded data packets, it parses the SIP and RTP data in them to obtain information such as the IP addresses of the calling and called parties and the call flow, determines from this information the name and channel of the real-time voice stream file to be saved, extracts the media payload from the RTP packets, and stores it under that name.
Step three: data forwarding. When another server requests call recordings for model training or real-time voiceprint recognition, the real-time recording file is located according to the calling/called information and the date and time, and the data is forwarded.
Furthermore, in step two, the call state is confirmed according to the SIP protocol, and acquisition and storage of the real-time voice stream begin once session establishment is confirmed. The call direction is determined by combining the calling and called IP addresses from the SIP protocol with the source and destination IP addresses in the RTP packets; the media payload is extracted, and the sound stream is written out as a real-time single-channel recording.
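The direction judgment just described, matching each RTP packet's source and destination addresses against the caller/callee IPs learned from SIP, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the per-role byte-stream writer are hypothetical, not taken from the patent.

```python
def rtp_direction(src_ip, dst_ip, caller_ip, callee_ip):
    """Classify an RTP packet by who is speaking, using the
    caller/callee IPs learned from the SIP/SDP exchange."""
    if src_ip == caller_ip and dst_ip == callee_ip:
        return "caller"   # audio sent by the calling party
    if src_ip == callee_ip and dst_ip == caller_ip:
        return "callee"   # audio sent by the called party
    return "unknown"      # packet does not belong to this call

def route_payload(streams, src_ip, dst_ip, caller_ip, callee_ip, payload):
    """Append an RTP media payload to the per-role byte stream,
    yielding one single-channel recording per speaker."""
    role = rtp_direction(src_ip, dst_ip, caller_ip, callee_ip)
    if role != "unknown":
        streams.setdefault(role, bytearray()).extend(payload)
    return streams
```

Writing each direction to its own stream is what makes role separation trivial here, compared with diarizing a mixed two-speaker recording after the fact.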
Further, the training/recognition data preprocessing process includes: step one, source data preparation; step two, silence and background-noise removal; step three, duration detection; step four, ASR-assisted screening; step five, data augmentation; step six, training and recognition data preparation; step seven, voiceprint feature extraction; step eight, model training; step nine, model recognition. The specific steps are as follows:
the method comprises the following steps: preparing source data, acquiring and processing the telephone call records of the user under the condition that the user is informed, reserving a call record named by the number of the user for each user, checking the record, converting the call record format into a wav file, and entering the step two, wherein the call record is a single-channel file only containing the conversation sound of the user;
Step two: silence and background-noise removal. Silence detection is performed on the recording, and long silent sections in each call recording are removed; segments whose level is below the threshold and whose duration exceeds the limit are detected as background noise and cut out. The remaining portions, free of silence and background noise, are then concatenated. Proceed to step three.
Step three: duration detection. The recording duration t_wav after silence and noise removal is measured and compared with a threshold τ to guarantee sample length. Samples whose duration does not meet the requirement are screened out directly; when the duration meets the threshold, proceed to step four.
Step four: ASR-assisted detection is performed on the call recordings retained in the previous step, mainly checking speech rate and content. After this check, if the recordings are for training the voiceprint model, proceed to step five for data augmentation; if they are for voiceprint recognition authentication, proceed to step six to continue preprocessing.
Step five: data augmentation. The training corpus, composed of the recordings of multiple speakers obtained in step four, is noise-augmented: reverberation or natural noise at a given signal-to-noise ratio is added in proportion to obtain a mixed training corpus. Proceed to step six.
Step six: training and recognition data preparation. Each recording is resampled and divided into equal segments, and a fixed number of cut call recording segments is taken to obtain the prepared samples. Proceed to step seven.
Step seven: each call recording segment is framed and its MFCC features are extracted; for model training proceed to step eight, and for model recognition proceed to step nine.
Step eight: model training. The prepared call recording features are input to the model and trained iteratively until the algorithm converges.
Step nine: model recognition. The features of the caller's enrolled call recordings and the real-time call features are input to the model, which decides whether they belong to the same person and returns the result.
Furthermore, in step four, the recording is transcribed to obtain the transcribed word count w_trans; based on w_trans and the call recording duration t_wav, recordings with too fast or too slow speech and call recordings with abnormal dialogue are screened out.
First, the speech rate s_wav is calculated:
s_wav = w_trans / t_wav
Call recordings with speech rate s_wav ∈ [3, 5) are retained.
Then, based on the transcription, the dialogue content is analyzed to screen out abnormal dialogue recordings, such as calls on hold.
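The rate screen above reduces to a few lines. The [3, 5) band (words per second of audio) follows the text; the function names are illustrative:

```python
def speech_rate(w_trans, t_wav):
    """s_wav = w_trans / t_wav: transcribed words per second of audio."""
    return w_trans / t_wav

def passes_rate_screen(w_trans, t_wav, lo=3.0, hi=5.0):
    """Keep only recordings whose speech rate falls in [lo, hi)."""
    return lo <= speech_rate(w_trans, t_wav) < hi
```

Note the half-open interval: a recording at exactly 5 words per second is rejected, matching the [3, 5) notation.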
Furthermore, in step seven, when taking the MFCC, the input continuous speech is first pre-emphasized, i.e. the speech signal is passed through a high-pass filter H(z) = 1 - μz^(-1) to boost the high-frequency part; framing is then performed with a 32 ms frame, each frame is windowed, and FFT (fast Fourier transform), Mel filter-bank filtering, logarithm and DCT (discrete cosine transform) operations are applied to finally obtain the MFCC parameters.
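A minimal NumPy sketch of the front-end just described: pre-emphasis with H(z) = 1 - μz^(-1), 32 ms Hamming-windowed frames, and the FFT power spectrum. The sample rate, hop length and μ = 0.97 are assumptions; the Mel filter bank, logarithm and DCT stages that complete the MFCC are omitted for brevity.

```python
import numpy as np

def pre_emphasis(x, mu=0.97):
    # y[n] = x[n] - mu * x[n-1], i.e. the high-pass filter H(z) = 1 - mu*z^-1
    return np.append(x[0], x[1:] - mu * x[:-1])

def frame_and_window(x, sr=8000, frame_ms=32, hop_ms=16):
    # Split into overlapping 32 ms frames and apply a Hamming window per frame
    flen, hop = sr * frame_ms // 1000, sr * hop_ms // 1000
    n = 1 + (len(x) - flen) // hop
    frames = np.stack([x[i * hop:i * hop + flen] for i in range(n)])
    return frames * np.hamming(flen)

def power_spectrum(frames, nfft=512):
    # Magnitude-squared FFT of each frame (positive-frequency half)
    return np.abs(np.fft.rfft(frames, nfft)) ** 2
```

At the 8 kHz sample rate typical of telephone audio, a 32 ms frame is 256 samples, which keeps the FFT size small.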
The invention has the following beneficial effects: it provides a method for acquiring real-time single-role call recordings in a telephone scenario and pre-screening and processing them for tasks such as voiceprint training and recognition.
The invention first uses a data mirror service to forward call data, parses the calling and called numbers, IP addresses and port numbers from the SIP and RTP packets, and acquires and stores the media sound stream according to the call direction of the calling and called parties, thereby capturing single-channel call recordings in real time and solving the practical problems that real-time call recordings are hard to acquire and user roles are hard to separate. The advantages are, first, real-time operation, which can serve voiceprint real-time authentication tasks with strict demands on recording acquisition; second, compared with role separation applied to a mixed dialogue recording, separating two or more parties directly by call direction reduces the loss of algorithm training and recognition accuracy caused by imprecise separation; in addition, the mirror forwarding service is easy to deploy and widely applicable.
the invention uses the text recognition ASR to carry out auxiliary pre-screening processing on the call records, and assists to screen out the low-quality call records with high noise, short conversation content and invalid conversation content in the training test sample by analyzing the speech speed and the content. The method has the advantages that the characteristics of noisy environment, user conversation content and uncertain length in practical application are considered, compared with direct preprocessing, the method greatly improves the quality of samples and is beneficial to improving model training and recognition results.
Drawings
FIG. 1 is a general flow diagram of the application of the present invention;
FIG. 2 is a block diagram of a real-time data acquisition phase of the present invention;
FIG. 3 is a flow chart of the training data preprocessing stage of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
As shown in fig. 1, suppose a voiceprint recognition algorithm must be trained to support the authentication service of intelligent calls and realize voiceprint recognition during a real-time call; this requires a large amount of training data and real-time call data. The process mainly comprises two parts: real-time recording acquisition and storage (fig. 2) and training/recognition data preprocessing (fig. 3).
(I) The real-time recording acquisition and storage process comprises the following steps:
Step 1: during the user's call, the call data packets are forwarded through the central network switch, and the mirror service on the server simultaneously sends the call data to the voice server;
Step 2: the voice server receives the data packets captured by the mirror server and parses them;
Step 2.1, connection establishment: SIP packets are received, the Invite data in them is parsed to determine the calling and called information of the call, and the SDP data is parsed to obtain the media port number and media address of each party; the call connection is then established;
Step 2.2, voice stream saving: RTP packets are received; the call party and call direction are determined from the source IP address and port and the destination IP address and port, the corresponding voice stream file is selected, and the media payload in the RTP packet, i.e. the voice stream data, is extracted and written to that file;
Step 2.3, step 2.2 is repeated until a BYE message of the SIP protocol is received, whereupon the call connection is torn down and saving of the call recording finishes;
Step 3: when an algorithm training or recognition task needs real-time recordings, a recording request is sent to the voice server; after receiving the request, the voice server identifies the voice file from the calling/called information, the time and other related information, and forwards the recording.
(II) The training and recognition data preprocessing process:
Step 1: after the recording forwarded by the voice server is obtained, its data is saved. If the recording is obtained in real time, the file may not yet be completely written and may be only a partial recording; in that case the metadata of the call recording file must be completed to ensure the recording is not corrupted. Whether the call recording is single-channel is checked; it is stored in pcm or wav format and converted to a single-channel wav file;
Step 2: segments of the call recording whose level is below the threshold and whose duration exceeds 400 ms are detected and deleted, and the remaining effective sound portions are concatenated;
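Step 2 can be sketched as an energy-based trim: windows whose RMS stays below a threshold for at least 400 ms are dropped and the rest concatenated. The 20 ms window size and the RMS threshold below are assumptions for illustration; only the 400 ms minimum gap comes from the text.

```python
import numpy as np

def trim_low_energy(x, sr=8000, win_ms=20, rms_thresh=1e-3, min_gap_ms=400):
    """Remove runs of quiet windows lasting >= min_gap_ms; keep the rest."""
    win = sr * win_ms // 1000
    n = len(x) // win
    rms = np.sqrt(np.mean(x[:n * win].reshape(n, win) ** 2, axis=1))
    quiet = rms < rms_thresh
    min_run = min_gap_ms // win_ms          # windows needed to count as a gap
    keep = np.ones(n, dtype=bool)
    i = 0
    while i < n:
        if quiet[i]:
            j = i
            while j < n and quiet[j]:
                j += 1
            if j - i >= min_run:            # only cut silences long enough
                keep[i:j] = False
            i = j
        else:
            i += 1
    return x[:n * win].reshape(n, win)[keep].ravel()
```

Short pauses inside a sentence survive this trim; only sustained silence or steady background noise below the threshold is removed.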
Step 3: both training and recognition place requirements on audio length; the call recording duration t is measured, and call recordings shorter than 20 s are screened out;
Step 4: ASR-assisted detection is applied to the retained recordings to obtain the transcription and word count w; the speech rate is calculated, call recordings with a speech rate in [3, 5) are kept, and the rest are screened out. The transcription is then checked against a library of abnormal dialogues, and recordings that do not meet normal-dialogue requirements are screened out. Depending on the task, proceed to step 5 for voiceprint model training, or to step 6 for voiceprint recognition and related tasks;
Step 5: noise and reverberation are added to the obtained call recordings in proportion to produce new samples and obtain a mixed training corpus;
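The additive part of step 5 can be illustrated by scaling a noise clip to a target signal-to-noise ratio before mixing. The SNR scaling formula is standard; the reverberation part (convolution with a room impulse response) is omitted here, and all parameter values are illustrative.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix noise into clean speech at the requested SNR in dB."""
    if len(noise) < len(clean):               # loop short noise clips
        reps = -(-len(clean) // len(noise))   # ceiling division
        noise = np.tile(noise, reps)
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale so that 10*log10(p_clean / p_scaled_noise) == snr_db
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
</imports>```

Applying this at several SNRs to each clean recording is one way to obtain the "mixed training corpus" the text describes.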
Step 6: the call recording is cut into equal segments of 4-5 s each; 4-6 of these segments are taken, the remainder is deleted, and the segment files of each recording are stored in a folder named after the recording;
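Step 6's equal cutting can be sketched as below. The 4 s segment length and the cap of 6 segments follow the text; discarding an incomplete tail clip is an assumption.

```python
def segment_recording(x, sr=8000, seg_seconds=4, max_segments=6):
    """Cut a recording into equal fixed-length clips and keep at most
    max_segments of them; an incomplete tail clip is discarded."""
    seg = sr * seg_seconds
    clips = [x[i:i + seg] for i in range(0, len(x), seg) if i + seg <= len(x)]
    return clips[:max_segments]
```

Fixed-length clips give the model uniformly shaped inputs and multiply the number of training samples per call.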
Step 7: the recording segments in the call recording folder are taken, each segment is framed, and its MFCC (Mel-frequency cepstral coefficient) features are extracted; depending on the task, proceed to step 8 for voiceprint model training or to step 9 for model recognition;
Step 8: the extracted voiceprint features are fed to the model in batches and trained iteratively until the algorithm converges or a set number of epochs completes, yielding the voiceprint recognition model;
Step 9: the features of the enrolled call recording and of the real-time call recording are input to the voiceprint recognition model and a score is computed; if the score exceeds the threshold the speaker is judged to be the enrolled user, otherwise not, and the result is returned.
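The patent does not name the scoring function in step 9; a common choice for comparing voiceprint embeddings is cosine similarity against a tuned threshold, sketched here. The 0.7 threshold and the function names are purely illustrative assumptions.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two voiceprint embedding vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_speaker(enrolled_emb, live_emb, threshold=0.7):
    """Accept the caller as the enrolled user iff the score clears the threshold."""
    return cosine_score(enrolled_emb, live_emb) >= threshold
```

In practice the threshold would be tuned on held-out trials to balance false accepts against false rejects.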
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the scope of the present invention in any way, and all technical solutions obtained by using equivalent substitution methods fall within the scope of the present invention.
The parts not involved in the present invention are the same as or can be implemented using the prior art.

Claims (8)

1. A method for telephone voiceprint recognition audio collection and effective-audio screening, characterized in that a large number of processed call recordings are obtained and used for voiceprint algorithm model training; real-time recorded audio is then provided during authentication to confirm the identity of the user on the call; the method comprises two main processes: real-time recording acquisition and storage, and training/recognition data preprocessing; the real-time recording acquisition and storage process forwards, parses and stores real-time call data packets, and the training data preprocessing process covers source data acquisition, effective recording extraction, data augmentation and feature extraction.
2. The method of claim 1, wherein the real-time recording acquisition and storage process comprises: step one, data stream acquisition; step two, data parsing; and step three, data forwarding.
3. The method for telephone voiceprint recognition audio collection and effective-audio screening according to claim 1 or 2, wherein the real-time recording acquisition and storage process comprises the following specific steps:
the method comprises the following steps: acquiring data flow, wherein when a user is in a call, data information of a real-time call is forwarded and connected through a central network switch, and at the moment, a mirror image service captures data packets of each call in the forwarding connection from the switch in real time and sends the data packets to a voice server;
step two: data analysis, after the voice server successfully receives the data packet which is captured and forwarded, the data packet is analyzed, SIP and RTP data in the data packet are analyzed, information such as IP of a calling party, a called party and call flow is obtained, the name and sound channel information of the stored real-time voice flow file are determined according to the information, media information is obtained from RTP, and the media information is stored according to the name;
step three: and data forwarding, namely when a call recording acquisition request is sent from other servers to be used for model training or real-time voiceprint recognition, acquiring a real-time recording file according to the calling and called information and the date and time information, and performing data forwarding.
4. The method according to claim 3, wherein in step two the call state is confirmed according to the SIP protocol, and acquisition and storage of the real-time voice stream begin once session establishment is confirmed; the call direction is determined by combining the calling and called IP addresses from the SIP protocol with the source and destination IP addresses in the RTP packets, the media payload is extracted, and the sound stream is written out as a real-time single-channel recording.
5. The method of claim 1, wherein the training/recognition data preprocessing comprises: step one, source data preparation; step two, silence and background-noise removal; step three, duration detection; step four, ASR-assisted screening; step five, data augmentation; step six, training and recognition data preparation; step seven, voiceprint feature extraction; step eight, model training; and step nine, model recognition.
6. The method for telephone voiceprint recognition audio collection and effective-audio screening according to claim 1 or 5, wherein the training data preprocessing process comprises the following specific steps:
the method comprises the following steps: preparing source data, acquiring and processing the telephone call records of the user under the condition that the user is informed, reserving a call record named by the number of the user for each user, checking the record, converting the call record format into a wav file, and entering the step two, wherein the call record is a single-channel file only containing the conversation sound of the user;
step two: cutting off mute and bottom noise, performing mute detection operation on the recording, and removing the mute of a large section in each call recording; detecting the background noise recording which is smaller than the threshold value and continues to exceed the duration in the recording, cutting off the part, then combining the recordings without the silence and the background noise, and entering the third step;
step three: duration detection, detecting recording duration t after removing silence and noisewavDetecting whether the call recording duration exceeds a threshold value tau or not, ensuring the length of the sample, directly screening out the samples when the duration does not meet the requirement, and entering a fourth step when the duration often meets the threshold value requirement;
step four: carrying out ASR auxiliary detection on the call records screened in the last step, and mainly detecting the speed and the content of the records; detecting the screened recording files, if the screened recording files are the training voiceprint models, entering a fifth step for data expansion, and if the screened recording files are voiceprint recognition authentication, entering a sixth step for continuous preprocessing;
step five: data expansion, namely, carrying out noise adding treatment on the training corpus consisting of a plurality of speaker recordings obtained in the fourth step, adding reverberation or natural noise with a certain signal-to-noise ratio into the training corpus according to a proportion to obtain a mixed training corpus, and entering the sixth step;
step six: training and identifying data preparation, namely performing data frequency conversion on each recording and equally dividing the recording, taking a fixed number of cut call recording fragments to obtain a prepared sample, and entering the seventh step;
step seven: framing each call recording fragment, extracting the MFCC characteristics of each call recording fragment, carrying out model training in the step eight, and carrying out model identification in the step nine;
step eight: model training, namely inputting the prepared call recording characteristics into a model, and performing iterative training until the algorithm is converged;
step nine: and (3) model identification, namely inputting the characteristics of the registered call records of the call participants and the real-time call characteristics into a model, identifying whether the call participants are the same person or not, and returning the result.
7. The method of claim 6, wherein in step four, the recording is translated to obtain the number of translated words wtransAccording to wtransAnd the call recording time twavScreening out records with too fast/slow speech speed and call records with abnormal conversation in the call records;
first, the speech rate s_wav is calculated as

s_wav = w_trans / t_wav

and call recordings with s_wav ∈ [3, 5) are selected;
then, based on the transcription result, the dialogue content is analyzed to screen out abnormal call recordings, such as calls placed on hold.
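The speech-rate screen of claim 7 (keep a recording only if s_wav = w_trans / t_wav falls in [3, 5)) can be expressed as a small helper. This is a sketch; the function name `speech_rate_ok` and the default bounds are taken from the claim but the interface is hypothetical:

```python
def speech_rate_ok(n_words: int, duration_s: float,
                   low: float = 3.0, high: float = 5.0) -> bool:
    """Keep a recording only if its speech rate n_words / duration_s lies in [low, high)."""
    if duration_s <= 0:
        return False  # guard against empty or corrupt recordings
    rate = n_words / duration_s
    return low <= rate < high
```

Note the half-open interval: a rate of exactly 5 words per second is rejected, matching s_wav ∈ [3, 5) in the claim.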
8. The method of claim 6, wherein in step seven, MFCC features are extracted as follows: the input continuous speech is first pre-emphasized, i.e. the speech signal is passed through the high-pass filter H(z) = 1 - μz^(-1) to boost the high-frequency part; the signal is then framed with a frame length of 32 ms, and each frame is windowed and passed through the FFT (fast Fourier transform), Mel filter bank, logarithm, and DCT (discrete cosine transform) operations to finally obtain the MFCC parameters.
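The claim-8 chain (pre-emphasis with H(z) = 1 - μz^(-1), 32 ms framing, windowing, FFT, Mel filter bank, logarithm, DCT) can be sketched with NumPy. This is an illustrative reconstruction, not the patent's code: the non-overlapping frames, FFT size, and filter/coefficient counts are assumed values.

```python
import numpy as np

def mfcc(signal, sr=8000, mu=0.97, frame_ms=32, n_mels=26, n_ceps=13):
    """Sketch of the MFCC chain: pre-emphasis, framing, windowing,
    FFT, Mel filter bank, log, DCT."""
    # 1. Pre-emphasis: y[n] = x[n] - mu * x[n-1], i.e. H(z) = 1 - mu z^-1.
    emphasized = np.append(signal[0], signal[1:] - mu * signal[:-1])
    # 2. Framing: 32 ms frames (non-overlapping here for simplicity).
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(emphasized) // frame_len
    frames = emphasized[:n_frames * frame_len].reshape(n_frames, frame_len)
    # 3. Hamming window, then the power spectrum via the FFT.
    frames = frames * np.hamming(frame_len)
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filter bank: triangular filters evenly spaced on the mel scale.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(mel(0), mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 5. Log energy of each mel band (epsilon avoids log of zero).
    log_energy = np.log(power @ fbank.T + 1e-10)
    # 6. DCT-II to decorrelate the bands, keeping the first n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T
```

One second of 8 kHz audio yields 31 frames of 256 samples, so the returned feature matrix has shape (31, 13) with these defaults.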
CN202111280727.6A 2021-11-01 2021-11-01 Method for identifying audio collection and effective audio screening of telephone voiceprint Active CN114040052B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111280727.6A CN114040052B (en) 2021-11-01 2021-11-01 Method for identifying audio collection and effective audio screening of telephone voiceprint

Publications (2)

Publication Number Publication Date
CN114040052A true CN114040052A (en) 2022-02-11
CN114040052B CN114040052B (en) 2024-01-19

Family

ID=80142346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111280727.6A Active CN114040052B (en) 2021-11-01 2021-11-01 Method for identifying audio collection and effective audio screening of telephone voiceprint

Country Status (1)

Country Link
CN (1) CN114040052B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107731233A (en) * 2017-11-03 2018-02-23 王华锋 A kind of method for recognizing sound-groove based on RNN
CN109346086A (en) * 2018-10-26 2019-02-15 平安科技(深圳)有限公司 Method for recognizing sound-groove, device, computer equipment and computer readable storage medium
CN109754812A (en) * 2019-01-30 2019-05-14 华南理工大学 A kind of voiceprint authentication method of the anti-recording attack detecting based on convolutional neural networks
CN110556114A (en) * 2019-07-26 2019-12-10 国家计算机网络与信息安全管理中心 Speaker identification method and device based on attention mechanism
CN113395284A (en) * 2021-06-16 2021-09-14 中国电信股份有限公司 Multi-scene voice service real-time matching method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN114040052B (en) 2024-01-19

Similar Documents

Publication Publication Date Title
US10249304B2 (en) Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
CN108922538B (en) Conference information recording method, conference information recording device, computer equipment and storage medium
CN110853646B (en) Conference speaking role distinguishing method, device, equipment and readable storage medium
US20100076770A1 (en) System and Method for Improving the Performance of Voice Biometrics
JP2006079079A (en) Distributed speech recognition system and its method
CN111199751B (en) Microphone shielding method and device and electronic equipment
CN111626061A (en) Conference record generation method, device, equipment and readable storage medium
CN113744742B (en) Role identification method, device and system under dialogue scene
CN113516987B (en) Speaker recognition method, speaker recognition device, storage medium and equipment
CN115862658A (en) System and method for extracting target speaker voice
CN114040052B (en) Method for identifying audio collection and effective audio screening of telephone voiceprint
CN110782901B (en) Method, storage medium and device for identifying voice of network telephone
CN210606618U (en) System for realizing voice and character recording
CN116472705A (en) Conference content display method, conference system and conference equipment
Fabien et al. Open-Set Speaker Identification pipeline in live criminal investigations
US20230005479A1 (en) Method for processing an audio stream and corresponding system
US20230206938A1 (en) Intelligent noise suppression for audio signals within a communication platform
Pao et al. Integration of Negative Emotion Detection into a VoIP Call Center System
KR20230171168A (en) Voice Conversation Method Using A terminal having an app with speaker diarisation technology
CN114598837A (en) Method and system for performing multi-role separation on microphone
CN117877510A (en) Voice automatic test method, device, electronic equipment and storage medium
CN117457008A (en) Multi-person voiceprint recognition method and device based on telephone channel
CN112151070A (en) Voice detection method and device and electronic equipment
CN114842851A (en) Voiceprint recognition method, system, equipment and storage medium
CN115662475A (en) Audio data processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 210006 No. 501 Zhongshan South Road, Nanjing, Jiangsu Province

Patentee after: Jiangsu Haobai Technology Co.,Ltd.

Country or region after: China

Address before: 210000 17F, No. 501, south Zhongshan Road, Nanjing, Jiangsu

Patentee before: JIANGSU BEST TONE INFORMATION SERVICE CO.,LTD.

Country or region before: China
