CN112700781A - Voice interaction system based on artificial intelligence - Google Patents

Voice interaction system based on artificial intelligence Download PDF

Info

Publication number
CN112700781A
Authority
CN
China
Prior art keywords
interaction
audio information
user
module
marking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011551759.0A
Other languages
Chinese (zh)
Other versions
CN112700781B (en)
Inventor
李本松 (Li Bensong)
许兵兵 (Xu Bingbing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Taide Intelligence Technology Co Ltd
Original Assignee
Jiangxi Taide Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Taide Intelligence Technology Co Ltd filed Critical Jiangxi Taide Intelligence Technology Co Ltd
Priority to CN202011551759.0A priority Critical patent/CN112700781B/en
Publication of CN112700781A publication Critical patent/CN112700781A/en
Application granted granted Critical
Publication of CN112700781B publication Critical patent/CN112700781B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice interaction system based on artificial intelligence, relating to the technical field of artificial intelligence and comprising a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module and a distribution management module. The controller audits and filters the received audio information to find a target voiceprint, so the audio information of the person to be recognized can be recognized well and the recognition precision is high. Meanwhile, before the target audio information is sent to the voice recognition module, the audio analysis module judges its validity in combination with the vowel intervals and vowel intensities, which effectively guarantees the clarity and accuracy of the recognized voice and improves the voice recognition speed. The distribution management module receives the unresolved signal and assigns corresponding background personnel for remote interaction; background personnel can be assigned reasonably for remote interaction according to the user's interaction value, improving the user experience.

Description

Voice interaction system based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a voice interaction system based on artificial intelligence.
Background
Artificial intelligence, abbreviated as AI, is a new technical science that studies and develops the theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. Artificial intelligence has gained increasing attention in the computer field and has been widely used in robots, economic and political decision-making, control systems, and simulation systems.
HCI is an abbreviation of human-computer interaction, which refers to the media and dialogue interfaces for transferring and exchanging information between humans and computers, and is an important component of a computer system. Human-computer interaction has long been an important issue in optimizing the use of computers. In recent years, with the rapid rise of artificial intelligence, human-computer interaction has advanced quickly. The general trend of human-computer interaction is towards user-centered, more intuitive interaction approaches.
However, when the recognition site is noisy or many people are speaking at the same time, an existing voice interaction system cannot recognize the voice of the person to be recognized well: the recognition precision is low, and the recognized voice cannot be guaranteed to be clear and accurate, so that when problems arise during a user's voice consultation, the regulation response is slow, the user experience suffers, and the voice recognition speed needs to be improved. Moreover, when the user is not satisfied with the answer, background personnel cannot be reasonably assigned for remote interaction, which also affects the user experience.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a voice interaction system based on artificial intelligence. The invention can recognize the audio information of the person to be recognized well even when the voice recognition site is noisy or many people are speaking at the same time, with high recognition precision; meanwhile, the validity of the target audio information is judged in combination with the vowel intervals and vowel intensities before the target audio information is sent to the voice recognition module, which effectively ensures the clarity and accuracy of the recognized voice and improves the voice recognition speed.
The purpose of the invention can be realized by the following technical scheme:
a voice interaction system based on artificial intelligence comprises a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module and a distribution management module;
the data acquisition module is in communication connection with a mobile terminal of a user; the data acquisition module is used for acquiring voiceprints and audio information of indoor personnel in real time and sending the acquired voiceprints and audio information to the controller, and the controller is used for auditing and filtering the received audio information and then transmitting corresponding target audio information to the storage module and the voice recognition module;
the audio analysis module is used for acquiring target audio information and judging the effectiveness of the target audio information before sending the target audio information to the voice recognition module; if the target audio information is valid, the target audio information is sent to a voice recognition module; if the audio information is invalid, the audio information is collected again;
the voice recognition module is used for performing voice recognition by using the target audio information distributed by the controller to generate an analysis text and returning the analysis text to the controller, and the controller is used for calling voice library data according to the analysis text and pushing the voice library data to a mobile terminal of a user;
the input module is in communication connection with a mobile terminal of a user; the input module is used for feeding back an evaluation signal to the controller by a user, wherein the evaluation signal comprises a solution signal and an unresolved signal;
the controller is used for receiving the solved signal and the unresolved signal and sending the unresolved signal to the distribution management module when the unresolved signal is received; and the distribution management module is used for receiving the unresolved signals and distributing corresponding background personnel for remote interaction.
Further, the registration login module is used for a user to register and log in after inputting personal information through the mobile terminal, forming a registered person, and for sending the personal information to the controller; the personal information comprises name, gender, age, real-name-authenticated mobile phone number and identification number. The controller is used for sending the personal information of registered persons to the database for storage. The controller performs voice training by adopting an NLP algorithm and outputs results corresponding to analysis texts; the results generated by the NLP voice training are stored in the voice library. The data acquisition module is in communication connection with the database, which stores the voiceprint characteristics of each registered person, associated with the registered person's identity information.
Further, the method for the controller to audit and filter the audio information comprises the following steps:
the method comprises the following steps: when the audio information of a plurality of persons is collected, acquiring the voiceprint characteristics of each piece of audio information through a voiceprint recognition technology, comparing the voiceprint characteristics with the voiceprint characteristics of registered persons stored in a database, finding out the same voiceprint, and marking the same voiceprint as a primary voiceprint;
step two: acquiring a mobile phone number of a mobile terminal, comparing the mobile phone number of the mobile terminal with a real-name authentication mobile phone number of a registered person stored in a database, and acquiring identity information and corresponding voiceprint characteristics of a user; marking the voiceprint characteristics corresponding to the user as standard voiceprints;
step three: comparing the initial selected voiceprint with the standard voiceprint, and marking the initial selected voiceprint consistent with the standard voiceprint as a target voiceprint; and marking the audio information corresponding to the target voiceprint as target audio information.
Further, the specific analysis steps of the audio analysis module are as follows:
SS 1: carrying out noise reduction enhancement processing on the target audio information;
SS 2: acquiring the acquisition time of each vowel in the target audio information and marking the acquisition time as Ti;i=1,…,n;
Using formula Ci=Ti+1-TiCalculating the time difference between two adjacent vowels and marking the time difference as a single interval Ci
SS 3: single interval CiComparing with an interval threshold; the interval thresholds include a first interval threshold G1, a second interval threshold G2; and G1 < G2;
if CiWhen G2 is not less than G, the single interval is marked as an influence interval; the interval threshold corresponding to the influence interval is a second interval threshold G2;
if CiWhen G1 is not more than G, marking the single interval as an influence interval; the interval threshold corresponding to the influence interval is a first interval threshold G1;
counting the occurrence times of the influence intervals, marking the occurrence times as D1, calculating the difference value between the influence intervals and the corresponding interval threshold value to obtain an offset value, and marking the offset value as D2;
SS 4: setting a plurality of offset coefficients and marking as Kc; c is 1, 2, …, w; k1 is more than K2 is more than … is more than Kw; each offset coefficient Kc corresponds to a preset offset value range which is respectively (k1, k 2) in sequence],(k2,k3],…,(kw,kw+1](ii) a K1 is more than k2 is more than … is more than kw + 1;
when D2E is (kw, kw)+1]If yes, the offset coefficient corresponding to the preset offset value range is Kw;
obtaining an influence value D3 corresponding to the offset value by using a formula D3-D2 Kw; summing the influence values corresponding to all the deviation values to obtain a total deviation influence value, and marking the total deviation influence value as D4;
SS 5: obtaining an interval influence value D5 by using a formula D5 ═ D1 × A1+ D4 × A2; wherein A1 and A2 are proportionality coefficients;
SS 6: if the interval influence value D5 is smaller than the corresponding interval influence threshold value, the target audio information is valid, otherwise, the target audio information is invalid;
SS 7: the intensity of each vowel in the target audio information is obtained and marked as QiObtaining a vowel intensity information group; calculating to obtain real-time Q according to a standard deviation calculation formulaiWhen the standard deviation alpha of the information group is smaller than a preset value, the information group is in a state to be verified;
SS 8: when Q isiWhen in the state to be verified, Q is setiObtaining Q according to the sequence from high to lowiIs marked as Qmax, and Q is obtainediIs marked as Qmin;
SS 9: setting the preset intensity of each vowel as QS, calculating QiDifference from preset intensity QSWhen the intensity difference QJi is reached, if all QJi are smaller than the preset intensity difference and the difference between Qmax and Qmin is smaller than the preset intensity difference, the target audio information is valid, otherwise the target audio information is invalid.
Further, the specific allocation steps of the allocation management module are as follows:
p1: the user feeds back an unresolved signal through the mobile terminal, and the mobile terminal is marked as j; marking the moment when the user feeds back the unresolved signal as a signal feedback moment; marking the moment when the data acquisition module acquires the audio information as the interaction starting moment;
calculating the time difference between the interaction starting time and the signal feedback time to obtain waiting time length, and marking the waiting time length as R1;
p2: acquiring a voice interaction record of a mobile terminal j ten days before the current time of the system; the voice interaction record comprises interaction times, interaction starting time and interaction ending time;
accumulating the interaction times to form interaction frequency, and marking as L1;
calculating the time difference between the interaction starting time and the interaction ending time to obtain interaction duration, accumulating all the interaction durations to form total interaction duration, and marking the total interaction duration as L2;
acquiring the number of times of feeding back the unresolved signal by the mobile terminal j, and marking as L3;
p3: the method comprises the steps of obtaining the model of a mobile terminal j, setting a corresponding preset value for each model of the mobile terminal, matching the model of the mobile terminal j with all models to obtain the corresponding preset value of the mobile terminal j, and marking the preset value as L4;
p4: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, acquiring a consumption value, and marking the consumption value as X1;
p5: normalizing the waiting time, the interactive frequency, the interactive total time, the times of the unresolved signals, the corresponding preset values and the consumption values and taking the numerical values;
obtaining the user interaction value QW by the formula QW = R1 × a1 + L1 × a2 + L2 × a3 + L4 × a4 + X1 × a5 - L3 × a6, wherein a1, a2, a3, a4, a5 and a6 are all proportional coefficients;
p6: comparing the user's interaction value QW with the interaction thresholds YT1 and YT2, where YT1 < YT2; specifically:
p61: if QW ≥ YT2, the user's interaction level is high, and background personnel of the VIP line are connected for remote interaction;
p62: if YT1 ≤ QW < YT2, the user's interaction level is moderate, and background personnel of the first-level special line are connected for remote interaction;
p63: if QW < YT1, the user's interaction level is general, and background personnel of the common special line are connected for remote interaction.
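The allocation rule of steps P1–P6 reduces to a weighted score followed by a three-way tier comparison. A hedged sketch (the function name and tier labels are illustrative; the coefficients are free parameters, and the inputs are assumed already normalized as in step P5):

```python
def dispatch_tier(R1, L1, L2, L3, L4, X1, a, YT1, YT2):
    """P5-P6: compute the interaction value QW and choose a support line.

    R1 -- waiting duration      L1 -- interaction frequency
    L2 -- total interaction duration
    L3 -- count of unresolved signals (penalty term)
    L4 -- terminal-model preset value
    X1 -- consumption value
    a  -- the six proportional coefficients (a1..a6)
    """
    # QW = R1*a1 + L1*a2 + L2*a3 + L4*a4 + X1*a5 - L3*a6
    QW = R1*a[0] + L1*a[1] + L2*a[2] + L4*a[3] + X1*a[4] - L3*a[5]
    if QW >= YT2:
        return QW, "VIP line"            # P61: high interaction level
    if QW >= YT1:
        return QW, "first-level line"    # P62: moderate interaction level
    return QW, "common line"             # P63: general interaction level
```

With the example coefficients given in embodiment 2 (a1 = 0.35, …, a6 = 0.87), a user with a positive score between YT1 and YT2 is routed to the mid-tier line.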
Further, the consumption value in step P4 is calculated by:
p41: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, and acquiring consumption records of the user;
p42: the yearly average GDP of the user is labeled GP 1; marking the user's monthly average income as GP2 and the user's monthly average consumption as GP 3;
obtain the user's liquidity and label as LD 1; the mobile fund is a current deposit of the user;
p43: the consumption value X1 of the user is obtained by using the formula X1 ═ GP1 × b1+ GP2 × b2+ GP3 × b3+ LD1 × b4, wherein b1, b2, b3 and b4 are all proportional coefficients.
The invention has the beneficial effects that:
1. when the audio information of a plurality of persons is collected, the controller is used for auditing and filtering the received audio information, acquiring the voiceprint characteristics of each piece of audio information through a voiceprint recognition technology, and comparing the voiceprint characteristics with the voiceprint characteristics of registered persons stored in the database so as to find out a target voiceprint; marking the audio information corresponding to the target voiceprint as target audio information; before the target audio information is sent to the voice recognition module, the audio analysis module is used for acquiring the target audio information and judging the effectiveness of the target audio information by combining the vowel interval and the vowel intensity of the target audio information; if the target audio information is valid, the target audio information is sent to a voice recognition module; if the audio information is invalid, the audio information is collected again; the audio information of a person to be identified can be well identified, and the identification precision is high; meanwhile, the clarity and accuracy of the recognized voice are effectively ensured, and the voice recognition speed is improved;
2. The input module is used for the user to feed back an evaluation signal to the controller; the distribution management module receives the unresolved signal, normalizes the waiting duration, interaction frequency, total interaction duration, number of unresolved signals, corresponding preset value and consumption value, and calculates the user's interaction value QW; if QW ≥ YT2, background personnel of the VIP line are connected for remote interaction; if YT1 ≤ QW < YT2, background personnel of the first-level special line are connected for remote interaction; if QW < YT1, background personnel of the common special line are connected for remote interaction; background personnel can thus be reasonably assigned for remote interaction according to the user's interaction value, improving the user experience.
Drawings
In order to facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings.
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a block diagram of a system according to embodiment 1 of the present invention;
fig. 3 is a system block diagram of embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1-3, a voice interaction system based on artificial intelligence comprises a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module, and a distribution management module;
the registration login module is used for a user to register and log in after inputting personal information through the mobile terminal, forming a registered person, and for sending the personal information to the controller; the personal information comprises name, gender, age, real-name-authenticated mobile phone number and identification number; the controller is used for sending the personal information of registered persons to the database for storage;
example 1
As shown in fig. 2, the data acquisition module is in communication connection with the mobile terminal of the user;
the data acquisition module is used for acquiring voiceprints and audio information of indoor personnel in real time and sending the acquired voiceprints and audio information to the controller, and the controller is used for auditing and filtering the received audio information and then transmitting corresponding target audio information to the storage module and the voice recognition module;
the data acquisition module is in communication connection with a database, the database is used for storing the voiceprint characteristics of each registered person, and the voiceprint characteristics are associated with the identity information of the registered person;
the method for auditing and filtering the audio information by the controller comprises the following steps:
step one: when the audio information of a plurality of persons is collected, acquiring the voiceprint characteristics of each piece of audio information through voiceprint recognition, comparing them with the voiceprint characteristics of registered persons stored in the database, finding the matching voiceprints, and marking them as primary voiceprints;
step two: acquiring the mobile phone number of the mobile terminal, comparing it with the real-name-authenticated mobile phone numbers of registered persons stored in the database, and acquiring the user's identity information and corresponding voiceprint characteristics; marking the voiceprint characteristics corresponding to the user as the standard voiceprint;
step three: comparing the primary voiceprints with the standard voiceprint, and marking the primary voiceprint consistent with the standard voiceprint as the target voiceprint; marking the audio information corresponding to the target voiceprint as the target audio information;
the audio analysis module is used for acquiring target audio information and judging the effectiveness of the target audio information before sending the target audio information to the voice recognition module; if the target audio information is valid, the target audio information is sent to a voice recognition module; if the audio information is invalid, the audio information is collected again; the specific analysis steps are as follows:
SS 1: carrying out noise reduction enhancement processing on the target audio information;
SS 2: acquiring the acquisition time of each vowel in the target audio information and marking the acquisition time as Ti;i=1,…,n;
Using formula Ci=Ti+1-TiCalculating the time difference between two adjacent vowels and marking the time difference as a single interval Ci
SS 3: single interval CiComparing with an interval threshold; the interval thresholds comprise a first interval threshold G1, a second interval threshold G2; and G1 < G2;
if CiWhen G2 is not less than G, the single interval is marked as an influence interval; the interval threshold corresponding to the influence interval is a second interval threshold G2;
if CiWhen G1 is not more than G, marking the single interval as an influence interval; the interval threshold corresponding to the influence interval is a first interval threshold G1;
counting the occurrence times of the influence intervals, marking the occurrence times as D1, calculating the difference value between the influence intervals and the corresponding interval threshold value to obtain an offset value, and marking the offset value as D2;
SS 4: setting a plurality of offset coefficients and marking as Kc; c is 1, 2, …, w; k1 is more than K2 is more than … is more than Kw; each offset coefficient Kc corresponds to a preset offset value range which is respectively (k1, k 2) in sequence],(k2,k3],…,(kw,kw+1](ii) a And k1 < k2 < … < kw+1
When D2E is (kw, kw)+1]If yes, the offset coefficient corresponding to the preset offset value range is Kw;
obtaining an influence value D3 corresponding to the offset value by using a formula D3-D2 Kw; summing the influence values corresponding to all the deviation values to obtain a total deviation influence value, and marking the total deviation influence value as D4;
SS 5: obtaining an interval influence value D5 by using a formula D5 ═ D1 × A1+ D4 × A2; wherein A1 and A2 are proportionality coefficients; for example, a1 takes on a value of 0.44, a2 takes on a value of 0.67;
SS 6: if the interval influence value D5 is smaller than the corresponding interval influence threshold value, the target audio information is valid, otherwise, the target audio information is invalid;
SS 7: the intensity of each vowel in the target audio information is obtained and marked as QiObtaining a vowel intensity information group; calculating to obtain real-time Q according to a standard deviation calculation formulaiWhen the standard deviation alpha of the information group is smaller than a preset value, the information group is in a state to be verified;
SS 8: when Q isiWhen in the state to be verified, Q is setiObtaining Q according to the sequence from high to lowiIs marked as Qmax, and Q is obtainediIs marked as Qmin;
SS 9: setting the preset intensity of each vowel as QS, calculating QiObtaining an intensity difference QJi from the difference between the preset intensity QS, if all QJi values are smaller than the preset intensity difference and the difference between Qmax and Qmin is smaller than the preset intensity difference, the target audio information is valid, otherwise the target audio information is invalid;
In embodiment 1, when the speech recognition site is noisy or many people are speaking at the same time, the audio information of the person to be recognized can still be recognized well, with high recognition precision; meanwhile, before the target audio information is sent to the speech recognition module, its validity is judged in combination with the vowel intervals and vowel intensities, which effectively guarantees the clarity and accuracy of the recognized speech and improves the speech recognition speed;
the voice recognition module is used for performing voice recognition by using the target audio information distributed by the controller to generate an analysis text and returning the analysis text to the controller, and the controller is used for calling voice library data according to the analysis text and pushing the voice library data to a mobile terminal of a user;
the controller performs voice training by adopting an NLP algorithm and outputs a result corresponding to the analysis text; the NLP algorithm carries out voice training to generate a corresponding result, and the corresponding result is stored in a voice library;
example 2
As shown in fig. 3; the input module is in communication connection with a mobile terminal of a user; the input module is used for feeding back an evaluation signal to the controller by a user, the evaluation signal is used for evaluating the voice database data pushed to the mobile terminal by the user, and the evaluation signal comprises a solution signal and an unsolved signal;
the controller is used for receiving the solved signal and the unresolved signal and sending the unresolved signal to the distribution management module when the unresolved signal is received;
the distribution management module is used for receiving the unresolved signal and distributing corresponding background personnel for remote interaction, and the specific distribution steps are as follows:
P1: the user feeds back an unresolved signal through the mobile terminal, and the mobile terminal is marked as j; the moment when the user feeds back the unresolved signal is marked as the signal feedback moment; the moment when the data acquisition module acquires the audio information is marked as the interaction starting moment;
calculating the time difference between the interaction starting moment and the signal feedback moment to obtain the waiting duration, marked as R1;
P2: acquiring the voice interaction records of the mobile terminal j in the ten days before the current system time; each voice interaction record comprises the interaction times, the interaction starting moment and the interaction ending moment;
accumulating the interaction times to form the interaction frequency, marked as L1;
calculating the time difference between the interaction starting moment and the interaction ending moment to obtain the interaction duration, and accumulating all interaction durations to form the total interaction duration, marked as L2;
acquiring the number of times the mobile terminal j has fed back the unresolved signal, marked as L3;
P3: acquiring the model of the mobile terminal j; a corresponding preset value is set for each mobile terminal model, and the model of the mobile terminal j is matched against all models to obtain its corresponding preset value, marked as L4;
P4: acquiring the identity information of the user, comparing it with the consumption identity information stored in the big data platform, and obtaining the consumption value, marked as X1;
P5: normalizing the waiting duration, the interaction frequency, the total interaction duration, the number of unresolved signals, the corresponding preset value and the consumption value, and taking their numerical values;
obtaining the interaction value QW of the user by using the formula QW = R1 × a1 + L1 × a2 + L2 × a3 + L4 × a4 + X1 × a5 − L3 × a6, wherein a1, a2, a3, a4, a5 and a6 are all proportionality coefficients; for example, a1 takes the value 0.35, a2 takes 0.42, a3 takes 0.51, a4 takes 0.19, a5 takes 0.48 and a6 takes 0.87;
P6: comparing the interaction value QW of the user with the interaction thresholds YT1 and YT2, where YT1 < YT2; specifically:
P61: if QW ≥ YT2, the interaction level of the user is high, and background personnel of the VIP line are connected for remote interaction;
P62: if YT1 ≤ QW < YT2, the interaction level of the user is moderate, and background personnel of the first-level line are connected for remote interaction;
P63: if QW < YT1, the interaction level of the user is general, and background personnel of the ordinary line are connected for remote interaction;
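The distribution steps P1–P6 reduce to a weighted score followed by a threshold comparison. A minimal Python sketch, using the example coefficients from the text and assuming the inputs R1, L1, L2, L3, L4 and X1 have already been normalized as in step P5 (the function and variable names are illustrative, not from the source):

```python
# Hypothetical sketch of steps P1-P6: computing a user's interaction value QW
# and routing the unresolved request to a support line. Coefficient values are
# the examples given in the text; normalization of the inputs is assumed to
# have happened upstream (step P5).

def interaction_value(r1, l1, l2, l3, l4, x1,
                      a=(0.35, 0.42, 0.51, 0.19, 0.48, 0.87)):
    """QW = R1*a1 + L1*a2 + L2*a3 + L4*a4 + X1*a5 - L3*a6."""
    a1, a2, a3, a4, a5, a6 = a
    return r1*a1 + l1*a2 + l2*a3 + l4*a4 + x1*a5 - l3*a6

def route(qw, yt1, yt2):
    """Map the interaction value to a support line (steps P61-P63)."""
    if qw >= yt2:
        return "VIP line"          # high interaction level
    if yt1 <= qw < yt2:
        return "first-level line"  # moderate interaction level
    return "ordinary line"         # general interaction level
```

For example, with illustrative thresholds YT1 = 0.5 and YT2 = 5.0, a user with a moderate score is routed to the first-level line; note that the unresolved-signal count L3 is the only term that lowers the score.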
The calculation method of the consumption value in step P4 is as follows:
P41: acquiring the identity information of the user, comparing it with the consumption identity information stored in the big data platform, and obtaining the consumption records of the user;
P42: the user's yearly average GDP is marked as GP1; the user's monthly average income is marked as GP2 and the user's monthly average consumption as GP3;
obtaining the user's liquid funds and marking them as LD1; the liquid funds are the user's current deposits;
P43: obtaining the consumption value X1 of the user by using the formula X1 = GP1 × b1 + GP2 × b2 + GP3 × b3 + LD1 × b4, wherein b1, b2, b3 and b4 are all proportionality coefficients; for example, b1 takes the value 0.44, b2 takes 0.61, b3 takes 0.38 and b4 takes 0.74;
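Steps P41–P43 amount to a single weighted sum. A minimal sketch with the example coefficients from the text, assuming GP1, GP2, GP3 and LD1 are already normalized numerical values (the names are illustrative):

```python
# Hypothetical sketch of step P43: the consumption value as a weighted sum of
# the user's yearly average GDP, monthly average income, monthly average
# consumption and liquid funds. Inputs are assumed normalized; coefficients
# are the examples from the text.

def consumption_value(gp1, gp2, gp3, ld1, b=(0.44, 0.61, 0.38, 0.74)):
    """X1 = GP1*b1 + GP2*b2 + GP3*b3 + LD1*b4."""
    b1, b2, b3, b4 = b
    return gp1*b1 + gp2*b2 + gp3*b3 + ld1*b4
```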
This embodiment 2 can judge whether the user's problem has been solved according to the evaluation signal fed back by the user; when the problem has not been solved, background personnel can be reasonably distributed according to the user's interaction value for remote interaction, improving the user experience.
The working principle of the invention is as follows:
When the artificial-intelligence-based voice interaction system works, the data acquisition module acquires the voiceprints and audio information of indoor personnel in real time, and the controller audits and filters the received audio information. When the audio information of a plurality of persons is collected, the voiceprint characteristics of each piece of audio information are acquired through voiceprint recognition and compared with the voiceprint characteristics of registered persons stored in the database; the matching voiceprints are marked as primary voiceprints. The mobile phone number of the mobile terminal is acquired and compared with the real-name-authenticated mobile phone numbers of registered persons stored in the database to obtain the identity information of the user and the corresponding voiceprint characteristics; the voiceprint characteristics corresponding to the user are marked as the standard voiceprint. The primary voiceprints are compared with the standard voiceprint; the primary voiceprint consistent with the standard voiceprint is marked as the target voiceprint, and the audio information corresponding to the target voiceprint is marked as the target audio information. Before the target audio information is sent to the voice recognition module, the audio analysis module acquires it and judges its validity by combining the vowel intervals and vowel intensities of the target audio information; if the target audio information is valid, it is sent to the voice recognition module, and if invalid, the audio information is collected again. This effectively ensures that the recognized voice is clear and accurate and improves the voice recognition speed;
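The audit-filtering flow above can be sketched as follows. This is an illustrative sketch only: voiceprints are reduced to opaque string identifiers and matching to exact equality, whereas a real system would compare speaker embeddings against a similarity threshold; all names are hypothetical.

```python
# Hypothetical sketch of the controller's audit filtering: from the audio of
# multiple indoor speakers, select the clip whose voiceprint matches the
# registered user bound to the requesting mobile terminal's phone number.

def audit_filter(clips, registered_voiceprints, phone_book, phone_number):
    """clips: list of (voiceprint, audio) pairs captured indoors.
    registered_voiceprints: set of voiceprints of all registered persons.
    phone_book: phone number -> (identity, voiceprint) for registered persons.
    Returns the target audio information, or None if no match is found."""
    # Step 1: keep clips whose voiceprint matches any registered person
    primary = [(vp, audio) for vp, audio in clips
               if vp in registered_voiceprints]          # primary voiceprints
    # Step 2: resolve the user's standard voiceprint via the phone number
    entry = phone_book.get(phone_number)
    if entry is None:
        return None
    _identity, standard_vp = entry                       # standard voiceprint
    # Step 3: the primary voiceprint consistent with the standard voiceprint
    # identifies the target audio information
    for vp, audio in primary:
        if vp == standard_vp:
            return audio
    return None
```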
the voice recognition module performs voice recognition on the target audio information distributed by the controller, generates an analysis text and returns it to the controller; the controller calls voice library data according to the analysis text and pushes the data to the mobile terminal of the user. The input module allows the user to feed back an evaluation signal to the controller; the controller receives the solved and unresolved signals and, on receiving an unresolved signal, sends it to the distribution management module. The distribution management module receives the unresolved signal and distributes corresponding background personnel for remote interaction according to the interaction value QW of the user: if QW ≥ YT2, background personnel of the VIP line are connected for remote interaction; if YT1 ≤ QW < YT2, background personnel of the first-level line are connected; if QW < YT1, background personnel of the ordinary line are connected. Background personnel can thus be reasonably distributed according to the interaction value of the user, improving the user experience.
The formulas and proportionality coefficients are obtained by collecting a large amount of data for software simulation, with parameters set by corresponding experts, so that formulas and proportionality coefficients consistent with real results are obtained.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (6)

1. A voice interaction system based on artificial intelligence is characterized by comprising a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module and a distribution management module;
the data acquisition module is in communication connection with a mobile terminal of a user; the data acquisition module is used for acquiring voiceprints and audio information of indoor personnel in real time and sending the acquired voiceprints and audio information to the controller, and the controller is used for auditing and filtering the received audio information and then transmitting corresponding target audio information to the storage module and the voice recognition module;
the audio analysis module is used for acquiring target audio information and judging the effectiveness of the target audio information before sending the target audio information to the voice recognition module; if the target audio information is valid, the target audio information is sent to a voice recognition module; if the audio information is invalid, the audio information is collected again;
the voice recognition module is used for performing voice recognition by using the target audio information distributed by the controller to generate an analysis text and returning the analysis text to the controller, and the controller is used for calling voice library data according to the analysis text and pushing the voice library data to a mobile terminal of a user;
the input module is in communication connection with a mobile terminal of a user; the input module is used for feeding back an evaluation signal to the controller by a user, wherein the evaluation signal comprises a solution signal and an unresolved signal;
the controller is used for receiving the solved signal and the unresolved signal and sending the unresolved signal to the distribution management module when the unresolved signal is received; and the distribution management module is used for receiving the unresolved signals and distributing corresponding background personnel for remote interaction.
2. The artificial intelligence based voice interaction system of claim 1, wherein the registration login module is used for a user to enter personal information through a mobile terminal and log in to become a registered person, and for sending the personal information to the controller, wherein the personal information comprises a name, a gender, an age, a real-name-authenticated phone number and an identification number; the controller is used for sending the personal information of registered persons to the database for storage; the controller performs voice training by adopting an NLP algorithm and outputs a result corresponding to the analysis text; the results generated by the NLP voice training are stored in the voice library; the data acquisition module is in communication connection with the database, the database is used for storing the voiceprint characteristics of each registered person, and the voiceprint characteristics are associated with the identity information of the registered person.
3. The artificial intelligence based voice interaction system of claim 1, wherein the method for the controller to perform audit filtering on the audio information comprises:
the method comprises the following steps: when the audio information of a plurality of persons is collected, acquiring the voiceprint characteristics of each piece of audio information through a voiceprint recognition technology, comparing the voiceprint characteristics with the voiceprint characteristics of registered persons stored in a database, finding out the same voiceprint, and marking the same voiceprint as a primary voiceprint;
step two: acquiring a mobile phone number of a mobile terminal, comparing the mobile phone number of the mobile terminal with a real-name authentication mobile phone number of a registered person stored in a database, and acquiring identity information and corresponding voiceprint characteristics of a user; marking the voiceprint characteristics corresponding to the user as standard voiceprints;
step three: comparing the primary voiceprints with the standard voiceprint, and marking the primary voiceprint consistent with the standard voiceprint as the target voiceprint; and marking the audio information corresponding to the target voiceprint as the target audio information.
4. The artificial intelligence based voice interaction system of claim 1, wherein the audio analysis module comprises the following specific analysis steps:
SS 1: carrying out noise reduction enhancement processing on the target audio information;
SS2: acquiring the acquisition time of each vowel in the target audio information and marking it as Ti, i = 1, …, n;
calculating the time difference between two adjacent vowels by using the formula Ci = Ti+1 − Ti, and marking it as the single interval Ci;
SS3: comparing the single interval Ci with the interval thresholds; the interval thresholds comprise a first interval threshold G1 and a second interval threshold G2, with G1 < G2; specifically:
if Ci ≥ G2, the single interval is marked as an influence interval, and the interval threshold corresponding to this influence interval is the second interval threshold G2;
if Ci ≤ G1, the single interval is marked as an influence interval, and the interval threshold corresponding to this influence interval is the first interval threshold G1;
counting the occurrence times of the influence intervals, marking the occurrence times as D1, calculating the difference value between the influence intervals and the corresponding interval threshold value to obtain an offset value, and marking the offset value as D2;
SS4: setting a plurality of offset coefficients marked as Kc, c = 1, 2, …, w, with K1 < K2 < … < Kw; each offset coefficient Kc corresponds to a preset offset value range, respectively (k1, k2], (k2, k3], …, (kw, kw+1], with k1 < k2 < … < kw+1;
when D2 ∈ (kw, kw+1], the offset coefficient corresponding to this preset offset value range is Kw;
obtaining the influence value D3 corresponding to the offset value by using the formula D3 = D2 × Kw; summing the influence values corresponding to all offset values to obtain the total offset influence value, marked as D4;
SS5: obtaining the interval influence value D5 by using the formula D5 = D1 × A1 + D4 × A2, wherein A1 and A2 are proportionality coefficients;
SS 6: if the interval influence value D5 is smaller than the corresponding interval influence threshold value, the target audio information is valid, otherwise, the target audio information is invalid;
SS7: acquiring the intensity of each vowel in the target audio information and marking it as Qi, obtaining a vowel intensity information group; calculating the standard deviation α of the Qi information group in real time according to the standard deviation formula; when α is smaller than a preset value, the information group is in a state to be verified;
SS8: when the Qi information group is in the state to be verified, sorting the Qi from high to low; the maximum value of Qi is marked as Qmax and the minimum value of Qi as Qmin;
SS9: setting the preset intensity of each vowel as QS and calculating the difference between Qi and the preset intensity QS to obtain the intensity difference QJi; if every QJi is smaller than the preset intensity difference and the difference between Qmax and Qmin is smaller than the preset intensity difference, the target audio information is valid; otherwise the target audio information is invalid.
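The interval analysis of steps SS2–SS6 can be sketched as follows. This is an illustrative sketch, not part of the claims: the thresholds G1/G2, the offset-coefficient table and the proportionality coefficients A1/A2 are placeholder assumptions, and all names are hypothetical.

```python
# Hypothetical sketch of steps SS2-SS6: derive the single intervals Ci from
# vowel timestamps, mark influence intervals against G1/G2, weight each offset
# value D2 by its offset coefficient, and combine into the interval influence
# value D5 = D1*A1 + D4*A2.

def interval_influence(vowel_times, g1, g2, offset_ranges, a1, a2):
    """vowel_times: acquisition time Ti of each vowel, ascending.
    offset_ranges: list of ((low, high), K) pairs; an offset value D2 in the
    half-open range (low, high] gets offset coefficient K.
    Returns the interval influence value D5."""
    intervals = [t2 - t1 for t1, t2 in zip(vowel_times, vowel_times[1:])]  # Ci
    d1 = 0      # D1: number of influence intervals
    d4 = 0.0    # D4: total offset influence value
    for c in intervals:
        if c >= g2:
            threshold = g2
        elif c <= g1:
            threshold = g1
        else:
            continue                 # within (G1, G2): not an influence interval
        d1 += 1
        d2 = abs(c - threshold)      # offset value against the matched threshold
        for (low, high), k in offset_ranges:
            if low < d2 <= high:
                d4 += d2 * k         # D3 = D2 * Kc, accumulated into D4
                break
    return d1 * a1 + d4 * a2         # D5
```

In step SS6 the returned D5 would then be compared against the interval influence threshold to decide whether the target audio information is valid.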
5. The artificial intelligence based voice interaction system of claim 1, wherein the specific distribution steps of the distribution management module are as follows:
P1: the user feeds back an unresolved signal through the mobile terminal, and the mobile terminal is marked as j; the moment when the user feeds back the unresolved signal is marked as the signal feedback moment; the moment when the data acquisition module acquires the audio information is marked as the interaction starting moment;
calculating the time difference between the interaction starting moment and the signal feedback moment to obtain the waiting duration, marked as R1;
P2: acquiring the voice interaction records of the mobile terminal j in the ten days before the current system time; each voice interaction record comprises the interaction times, the interaction starting moment and the interaction ending moment;
accumulating the interaction times to form the interaction frequency, marked as L1;
calculating the time difference between the interaction starting moment and the interaction ending moment to obtain the interaction duration, and accumulating all interaction durations to form the total interaction duration, marked as L2;
acquiring the number of times the mobile terminal j has fed back the unresolved signal, marked as L3;
P3: acquiring the model of the mobile terminal j; a corresponding preset value is set for each mobile terminal model, and the model of the mobile terminal j is matched against all models to obtain its corresponding preset value, marked as L4;
P4: acquiring the identity information of the user, comparing it with the consumption identity information stored in the big data platform, and obtaining the consumption value, marked as X1;
P5: normalizing the waiting duration, the interaction frequency, the total interaction duration, the number of unresolved signals, the corresponding preset value and the consumption value, and taking their numerical values;
obtaining the interaction value QW of the user by using the formula QW = R1 × a1 + L1 × a2 + L2 × a3 + L4 × a4 + X1 × a5 − L3 × a6, wherein a1, a2, a3, a4, a5 and a6 are all proportionality coefficients;
P6: comparing the interaction value QW of the user with the interaction thresholds YT1 and YT2, where YT1 < YT2; specifically:
P61: if QW ≥ YT2, the interaction level of the user is high, and background personnel of the VIP line are connected for remote interaction;
P62: if YT1 ≤ QW < YT2, the interaction level of the user is moderate, and background personnel of the first-level line are connected for remote interaction;
P63: if QW < YT1, the interaction level of the user is general, and background personnel of the ordinary line are connected for remote interaction.
6. The artificial intelligence based voice interaction system of claim 5, wherein the consumption value in step P4 is calculated as follows:
P41: acquiring the identity information of the user, comparing it with the consumption identity information stored in the big data platform, and obtaining the consumption records of the user;
P42: the user's yearly average GDP is marked as GP1; the user's monthly average income is marked as GP2 and the user's monthly average consumption as GP3; the user's liquid funds are obtained and marked as LD1;
P43: obtaining the consumption value X1 of the user by using the formula X1 = GP1 × b1 + GP2 × b2 + GP3 × b3 + LD1 × b4, wherein b1, b2, b3 and b4 are all proportionality coefficients.
CN202011551759.0A 2020-12-24 2020-12-24 Voice interaction system based on artificial intelligence Active CN112700781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011551759.0A CN112700781B (en) 2020-12-24 2020-12-24 Voice interaction system based on artificial intelligence


Publications (2)

Publication Number Publication Date
CN112700781A true CN112700781A (en) 2021-04-23
CN112700781B CN112700781B (en) 2022-11-11

Family

ID=75509993




Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018072697A (en) * 2016-11-02 2018-05-10 日本電信電話株式会社 Phoneme collapse detection model learning apparatus, phoneme collapse section detection apparatus, phoneme collapse detection model learning method, phoneme collapse section detection method, program
US20180374499A1 (en) * 2017-06-21 2018-12-27 Ajit Arun Zadgaonkar System and method for determining cardiac parameters and physiological conditions by analysing speech samples
CN109040484A (en) * 2018-07-16 2018-12-18 安徽信尔联信息科技有限公司 A kind of Auto-matching contact staff method
CN109036413A (en) * 2018-09-18 2018-12-18 深圳市优必选科技有限公司 Voice interactive method and terminal device
CN111081257A (en) * 2018-10-19 2020-04-28 珠海格力电器股份有限公司 Voice acquisition method, device, equipment and storage medium
CN111061831A (en) * 2019-10-29 2020-04-24 深圳绿米联创科技有限公司 Method and device for switching machine customer service to manual customer service and electronic equipment
CN111862913A (en) * 2020-07-16 2020-10-30 广州市百果园信息技术有限公司 Method, device, equipment and storage medium for converting voice into rap music
CN111988208A (en) * 2020-08-28 2020-11-24 广东台德智联科技有限公司 Household control system and method based on intelligent sound box
CN111933141A (en) * 2020-08-31 2020-11-13 江西台德智慧科技有限公司 Artificial intelligence voice interaction system based on big data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
T.SAKAI,等: "The Automatic Speech Recognition System for Conversational Sound", 《IEEE TRANSACTIONS ON ELECTRONIC COMPUTERS》, 31 December 1963 (1963-12-31), pages 835 - 846 *
张利平,等: "基于元音检测的汉语连续语音端点检测方法", 《计算机工程与应用》, 31 December 2010 (2010-12-31), pages 114 - 115 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448514A (en) * 2021-06-02 2021-09-28 合肥群音信息服务有限公司 Automatic processing system of multisource voice data
CN113448514B (en) * 2021-06-02 2022-03-15 合肥群音信息服务有限公司 Automatic processing system of multisource voice data
CN113409820A (en) * 2021-06-09 2021-09-17 合肥群音信息服务有限公司 Quality evaluation method based on voice data
CN113409820B (en) * 2021-06-09 2022-03-15 合肥群音信息服务有限公司 Quality evaluation method based on voice data
CN114125494A (en) * 2021-09-29 2022-03-01 阿里巴巴(中国)有限公司 Content auditing auxiliary processing method and device and electronic equipment
CN114743541A (en) * 2022-04-24 2022-07-12 广东海洋大学 Interactive system for English listening and speaking learning
CN116002270A (en) * 2023-02-10 2023-04-25 德明尚品科技集团有限公司 Warehouse goods storage management method and system based on Internet of things
CN116913277A (en) * 2023-09-06 2023-10-20 北京惠朗时代科技有限公司 Voice interaction service system based on artificial intelligence
CN116913277B (en) * 2023-09-06 2023-11-21 北京惠朗时代科技有限公司 Voice interaction service system based on artificial intelligence



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant