CN112700781B - Voice interaction system based on artificial intelligence - Google Patents

Voice interaction system based on artificial intelligence

Info

Publication number
CN112700781B
CN112700781B
Authority
CN
China
Prior art keywords
interaction
marking
audio information
user
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011551759.0A
Other languages
Chinese (zh)
Other versions
CN112700781A (en)
Inventor
李本松
许兵兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Taide Intelligence Technology Co Ltd
Original Assignee
Jiangxi Taide Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Taide Intelligence Technology Co Ltd filed Critical Jiangxi Taide Intelligence Technology Co Ltd
Priority to CN202011551759.0A priority Critical patent/CN112700781B/en
Publication of CN112700781A publication Critical patent/CN112700781A/en
Application granted granted Critical
Publication of CN112700781B publication Critical patent/CN112700781B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice interaction system based on artificial intelligence, relating to the technical field of artificial intelligence. The system comprises a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module and a distribution management module. The controller audits and filters the received audio information to find out the target voiceprint, so that the audio information of the person to be recognized can be recognized well and with high accuracy. Before the target audio information is sent to the voice recognition module, the audio analysis module judges its validity in combination with the vowel interval and the vowel intensity, which effectively ensures that the recognized voice is clear and accurate and improves the voice recognition speed. The distribution management module receives the unresolved signal and allocates corresponding background personnel for remote interaction; because background personnel are allocated reasonably according to the interaction value of the user, the user experience is improved.

Description

Voice interaction system based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a voice interaction system based on artificial intelligence.
Background
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. Artificial intelligence has gained increasing attention in the computer field and has been widely applied to robotics, economic and political decision-making, control systems, and simulation systems.
HCI (Human-Computer Interaction) refers to the media and dialogue interfaces through which information is transferred and exchanged between a human and a computer, and is an important component of a computer system. Human-computer interaction has long been an important issue in optimizing the utilization of computers. In recent years, with the boom of artificial intelligence, the development of human-computer interaction has advanced rapidly. The general trend of human-computer interaction is towards user-centered, more intuitive interaction approaches.
However, in existing voice interaction systems, when the voice recognition site is noisy or many people speak at the same time, the voice of the person to be recognized cannot be recognized well: recognition precision is low, and it cannot be guaranteed that the recognized voice is clear and accurate. As a result, when problems occur during a user's voice consultation, the regulation and control response is slow, which affects the user experience, and the voice recognition speed needs to be improved. Moreover, when a user is not satisfied with an answer, background personnel cannot be reasonably allocated for remote interaction, which also affects the user experience.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a voice interaction system based on artificial intelligence. The invention can accurately identify the audio information of the person to be identified even when the voice recognition site is noisy or many people speak at the same time. In addition, before the target audio information is sent to the voice recognition module, its validity is judged in combination with the vowel interval and the vowel intensity, which effectively ensures the clarity and accuracy of the recognized voice and improves the voice recognition speed.
The purpose of the invention can be realized by the following technical scheme:
a voice interaction system based on artificial intelligence comprises a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module and a distribution management module;
the data acquisition module is in communication connection with a mobile terminal of a user; the data acquisition module is used for acquiring voiceprints and audio information of indoor personnel in real time and sending the acquired voiceprints and audio information to the controller, and the controller is used for auditing and filtering the received audio information and then transmitting corresponding target audio information to the storage module and the voice recognition module;
the audio analysis module is used for acquiring target audio information and judging the effectiveness of the target audio information before sending the target audio information to the voice recognition module; if the target audio information is valid, the target audio information is sent to a voice recognition module; if the audio information is invalid, the audio information is collected again;
the voice recognition module is used for performing voice recognition by using the target audio information distributed by the controller to generate an analysis text and returning the analysis text to the controller, and the controller is used for calling voice library data according to the analysis text and pushing the voice library data to a mobile terminal of a user;
the input module is in communication connection with the mobile terminal of the user; the input module is used by the user to feed back an evaluation signal to the controller, wherein the evaluation signal comprises a solved signal and an unresolved signal;
the controller is used for receiving the solved signal and the unresolved signal and sending the unresolved signal to the distribution management module when the unresolved signal is received; and the distribution management module is used for receiving the unresolved signals and distributing corresponding background personnel for remote interaction.
Further, the registration login module is used for registering and logging in a user after the user inputs personal information through the mobile terminal, thereby forming a registered person, and for sending the personal information to the controller; the personal information comprises name, gender, age, real-name-authenticated mobile phone number and identification number. The controller is used for sending the personal information of registered personnel to the database for storage. The controller performs voice training by adopting an NLP algorithm and outputs the result corresponding to the analysis text; the corresponding results generated by the NLP voice training are stored in the voice library. The data acquisition module is in communication connection with the database; the database is used for storing the voiceprint characteristics of each registered person, and the voiceprint characteristics are associated with the identity information of the registered person.
Further, the method for the controller to audit and filter the audio information comprises the following steps:
the method comprises the following steps: when the audio information of a plurality of persons is collected, acquiring the voiceprint characteristics of each piece of audio information through a voiceprint recognition technology, comparing the voiceprint characteristics with the voiceprint characteristics of registered persons stored in a database, finding out the same voiceprint, and marking the same voiceprint as a primary voiceprint;
step two: acquiring a mobile phone number of a mobile terminal, comparing the mobile phone number of the mobile terminal with a real-name authentication mobile phone number of a registered person stored in a database, and acquiring identity information and corresponding voiceprint characteristics of a user; marking the corresponding voiceprint characteristics of the user as standard voiceprints;
step three: comparing the primary voiceprints with the standard voiceprint, and marking a primary voiceprint consistent with the standard voiceprint as the target voiceprint; and marking the audio information corresponding to the target voiceprint as target audio information.
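The three-step audit and filter above can be sketched in code. This is a minimal illustration under stated assumptions: the patent does not specify how voiceprint characteristics are represented or matched, so voiceprints are modeled here as opaque string identifiers, and "consistent with" is modeled as exact equality.

```python
# Illustrative sketch of the controller's audit-and-filter steps one to three.
# Assumption: voiceprint features are opaque, comparable identifiers.

def filter_target_audio(collected, registered_prints, standard_print):
    """collected: (voiceprint, audio) pairs from the data acquisition module.
    registered_prints: voiceprint features of registered persons in the database.
    standard_print: the voiceprint resolved from the real-name phone number.
    Returns the audio segments whose voiceprint matches the standard voiceprint."""
    # Step one: keep voiceprints that match a registered person (primary voiceprints)
    primary = [(vp, audio) for vp, audio in collected if vp in registered_prints]
    # Step three: a primary voiceprint consistent with the standard voiceprint
    # is the target voiceprint; its audio is the target audio information
    return [audio for vp, audio in primary if vp == standard_print]

segments = filter_target_audio(
    [("vp_alice", "audio_a"), ("vp_bob", "audio_b"), ("vp_eve", "audio_c")],
    {"vp_alice", "vp_bob"},
    "vp_alice",
)  # only the user's own utterances survive the filter
```

The design point the steps encode: registration narrows the candidate set, and the phone number then pins down which registered speaker is the current user.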
Further, the specific analysis steps of the audio analysis module are as follows:
SS1: carrying out noise reduction and enhancement processing on the target audio information;
SS2: acquiring the acquisition time of each vowel in the target audio information and marking it as T_i, i = 1, …, n;
using the formula C_i = T_{i+1} − T_i, calculating the time difference between two adjacent vowels and marking it as the single interval C_i;
SS3: comparing the single interval C_i with the interval thresholds; the interval thresholds comprise a first interval threshold G1 and a second interval threshold G2, with G1 < G2;
if C_i ≥ G2, marking the single interval as an influence interval; in this case the interval threshold corresponding to the influence interval is the second interval threshold G2;
if C_i ≤ G1, marking the single interval as an influence interval; in this case the interval threshold corresponding to the influence interval is the first interval threshold G1;
counting the number of occurrences of influence intervals and marking it as D1; calculating the difference between each influence interval and its corresponding interval threshold to obtain an offset value, marked as D2;
SS4: setting a plurality of offset coefficients marked Kc, c = 1, 2, …, w, with K1 < K2 < … < Kw; each offset coefficient Kc corresponds to a preset offset value range, in order (k1, k2], (k2, k3], …, (kw, kw+1], with k1 < k2 < … < kw < kw+1;
when D2 ∈ (kw, kw+1], the offset coefficient corresponding to that preset offset value range is Kw;
obtaining the influence value D3 corresponding to the offset value using the formula D3 = D2 × Kw; summing the influence values corresponding to all offset values to obtain the total offset influence value, marked as D4;
SS5: obtaining the interval influence value D5 using the formula D5 = D1 × A1 + D4 × A2, wherein A1 and A2 are both proportionality coefficients;
SS6: if the interval influence value D5 is smaller than the corresponding interval influence threshold, the target audio information is valid; otherwise it is invalid;
SS7: obtaining the intensity of each vowel in the target audio information, marked Q_i, to obtain a vowel intensity information group; calculating the standard deviation α of the real-time Q_i information group according to the standard deviation formula; when α is smaller than a preset value, the group is in the state to be verified;
SS8: when the Q_i are in the state to be verified, sorting the Q_i from high to low; marking the maximum of Q_i as Qmax and the minimum of Q_i as Qmin;
SS9: setting the preset intensity of each vowel as QS, and calculating the difference between each Q_i and the preset intensity QS to obtain the intensity difference QJ_i; if every QJ_i is smaller than the preset intensity difference and the difference between Qmax and Qmin is smaller than the preset intensity difference, the target audio information is valid; otherwise it is invalid.
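The validity judgment of steps SS2 to SS9 reduces to two checks, one on vowel intervals and one on vowel intensities. The sketch below is illustrative only: the thresholds G1 and G2, the offset ranges and coefficients, A1 and A2, the preset intensity QS and the preset standard deviation and difference values are all left open by the patent, so the parameters below (and the use of an absolute difference as the offset value) are assumptions.

```python
import statistics

def interval_influence(times, G1, G2, k_bounds, K, A1, A2):
    """times: acquisition time T_i of each vowel. Returns D5 = D1*A1 + D4*A2."""
    gaps = [b - a for a, b in zip(times, times[1:])]   # single intervals C_i
    D1, D4 = 0, 0.0
    for c in gaps:
        if c >= G2 or c <= G1:                         # influence interval
            ref = G2 if c >= G2 else G1                # its corresponding threshold
            D1 += 1
            D2 = abs(c - ref)                          # offset value (assumed absolute)
            # find the offset coefficient Kc whose range (k_c, k_{c+1}] holds D2
            for Kc, lo, hi in zip(K, k_bounds, k_bounds[1:]):
                if lo < D2 <= hi:
                    D4 += D2 * Kc                      # influence values D3 summed into D4
                    break
    return D1 * A1 + D4 * A2                           # interval influence value D5

def intensity_valid(Q, QS, preset_diff, preset_std):
    """Q: vowel intensities Q_i. SS7 to SS9: steady group, all close to QS."""
    if statistics.pstdev(Q) >= preset_std:             # SS7: not in state to be verified
        return False
    if max(Q) - min(Q) >= preset_diff:                 # SS8/SS9: Qmax - Qmin spread
        return False
    return all(abs(q - QS) < preset_diff for q in Q)   # SS9: every QJ_i small enough
```

Under SS6, the target audio would then be accepted only when interval_influence(...) falls below the interval influence threshold and intensity_valid(...) returns True.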
Further, the specific allocation steps of the allocation management module are as follows:
p1: the user feeds back an unresolved signal through the mobile terminal, and the mobile terminal is marked as j; marking the moment when the user feeds back the unresolved signal as a signal feedback moment; marking the moment when the data acquisition module acquires the audio information as the interaction starting moment;
calculating the time difference between the interaction starting time and the signal feedback time to obtain waiting time length, and marking the waiting time length as R1;
p2: acquiring a voice interaction record of a mobile terminal j ten days before the current time of the system; the voice interaction record comprises interaction times, interaction starting time and interaction ending time;
accumulating the interaction times to form interaction frequency, and marking the interaction frequency as L1;
calculating the time difference between the interaction starting time and the interaction ending time to obtain interaction duration, accumulating all the interaction durations to form total interaction duration, and marking the total interaction duration as L2;
acquiring the number of times mobile terminal j has fed back an unresolved signal, and marking it as L3;
p3: acquiring the model of mobile terminal j; each mobile terminal model is set with a corresponding preset value; matching the model of mobile terminal j against all models to obtain its corresponding preset value, and marking it as L4;
p4: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, acquiring a consumption value, and marking the consumption value as X1;
p5: normalizing the waiting time, the interactive frequency, the interactive total time, the times of the unresolved signals, the corresponding preset values and the consumption values and taking the numerical values;
obtaining an interaction value QW of a user by using a formula QW = R1 × a1+ L1 × a2+ L2 × a3+ L4 × a4+ X1 × a5-L3 × a6, wherein a1, a2, a3, a4, a5 and a6 are all proportionality coefficients;
p6: comparing the interaction value QW of the user with interaction thresholds, wherein the interaction thresholds comprise YT1 and YT2; YT1 is less than YT2; the method comprises the following specific steps:
p61: if the QW is larger than or equal to YT2, the interaction level of the user is high, and background personnel of the VIP special line are connected for remote interaction;
p62: if YT1 is not more than QW and is less than YT2, the interaction level of the user is moderate, and background personnel of a first-level special line are connected for remote interaction;
p63: if QW is less than YT1, the interaction level of the user is general, and background personnel of a common special line are connected for remote interaction.
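Steps P5 and P6 above amount to one weighted sum followed by a three-way comparison. A hedged sketch: the coefficients a1 to a6 below are the illustrative values given later in embodiment 2 (a1 = 0.35 through a6 = 0.87), while the thresholds YT1 and YT2 are assumed; inputs are taken as already normalized.

```python
# Sketch of the interaction value QW and line assignment (P5-P63).
# Coefficients a1..a6 follow embodiment 2's example; YT1/YT2 are assumed.

def interaction_tier(R1, L1, L2, L3, L4, X1,
                     a=(0.35, 0.42, 0.51, 0.19, 0.48, 0.87),
                     YT1=1.0, YT2=2.0):
    """R1: waiting time, L1: interaction frequency, L2: total interaction time,
    L3: unresolved-signal count, L4: terminal preset value, X1: consumption value."""
    a1, a2, a3, a4, a5, a6 = a
    QW = R1*a1 + L1*a2 + L2*a3 + L4*a4 + X1*a5 - L3*a6   # interaction value
    if QW >= YT2:
        return QW, "VIP line"          # P61: high interaction level
    if QW >= YT1:
        return QW, "first-level line"  # P62: moderate interaction level
    return QW, "ordinary line"         # P63: general interaction level
```

Note that L3 enters with a negative sign: repeated unresolved feedback lowers, rather than raises, the user's interaction value.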
Further, the calculation method of the consumption value in step P4 is:
p41: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, and acquiring consumption records of the user;
p42: the annual average GDP of the user is marked as GP1; marking the monthly average income of the user as GP2 and the monthly average consumption of the user as GP3;
acquiring the mobile fund of a user and marking as LD1; the mobile fund is a current deposit of the user;
p43: the consumption value X1 of the user is obtained by using the formula X1= GP1 × b1+ GP2 × b2+ GP3 × b3+ LD1 × b4, where b1, b2, b3, and b4 are all proportional coefficients.
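The consumption value of steps P41 to P43 is likewise a plain weighted sum; a minimal sketch follows, with the proportional coefficients b1 to b4 (unspecified in the patent) assumed equal for illustration.

```python
# Sketch of P43. Assumption: b1..b4 all equal 0.25; the patent leaves them open.

def consumption_value(GP1, GP2, GP3, LD1, b=(0.25, 0.25, 0.25, 0.25)):
    """GP1: annual average GDP of the user, GP2: monthly average income,
    GP3: monthly average consumption, LD1: mobile fund (current deposit)."""
    b1, b2, b3, b4 = b
    return GP1*b1 + GP2*b2 + GP3*b3 + LD1*b4  # consumption value X1
```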
The invention has the beneficial effects that:
1. When the audio information of a plurality of persons is collected, the controller audits and filters the received audio information: the voiceprint characteristics of each piece of audio information are acquired through voiceprint recognition and compared with the voiceprint characteristics of registered persons stored in the database to find the target voiceprint, and the audio information corresponding to the target voiceprint is marked as target audio information. Before the target audio information is sent to the voice recognition module, the audio analysis module acquires it and judges its validity in combination with the vowel interval and the vowel intensity; if valid, the target audio information is sent to the voice recognition module, and if invalid, the audio information is collected again. The audio information of the person to be identified can thus be identified well and with high precision, while the clarity and accuracy of the recognized voice are effectively ensured and the voice recognition speed is improved;
2. The input module is used by the user to feed back an evaluation signal to the controller. When an unresolved signal is received, the distribution management module normalizes the waiting time length, the interaction frequency, the total interaction time length, the number of unresolved signals, the corresponding preset value and the consumption value, and calculates the interaction value QW of the user. If QW ≥ YT2, background personnel of the VIP special line are connected for remote interaction; if YT1 ≤ QW < YT2, background personnel of the first-level special line are connected for remote interaction; if QW < YT1, background personnel of the common special line are connected for remote interaction. Background personnel can thus be reasonably allocated for remote interaction according to the interaction value of the user, improving the user experience.
Drawings
To facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings.
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a block diagram of a system according to embodiment 1 of the present invention;
fig. 3 is a system block diagram of embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1-3, a voice interaction system based on artificial intelligence comprises a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module, and a distribution management module;
the registration login module is used for registering and logging in personal information after a user inputs the personal information through the mobile terminal to form a registered person, and sending the personal information to the controller, wherein the personal information comprises name, gender, age, real-name authentication mobile phone number and identification number; the controller is used for sending the personal information of the registered personnel to the database for storage;
example 1
As shown in fig. 2, the data acquisition module is in communication connection with the mobile terminal of the user;
the data acquisition module is used for acquiring voiceprints and audio information of indoor personnel in real time and sending the acquired voiceprints and audio information to the controller, and the controller is used for auditing and filtering the received audio information and then transmitting corresponding target audio information to the storage module and the voice recognition module;
the data acquisition module is in communication connection with a database, the database is used for storing the voiceprint characteristics of each registered person, and the voiceprint characteristics are associated with the identity information of the registered person;
the method for auditing and filtering the audio information by the controller comprises the following steps:
the method comprises the following steps: when the audio information of a plurality of persons is collected, acquiring the voiceprint characteristics of each piece of audio information through a voiceprint recognition technology, comparing the voiceprint characteristics with the voiceprint characteristics of registered persons stored in a database, finding out the same voiceprint, and marking the same voiceprint as a primary voiceprint;
step two: acquiring a mobile phone number of a mobile terminal, comparing the mobile phone number of the mobile terminal with a real-name authentication mobile phone number of a registered person stored in a database, and acquiring identity information and corresponding voiceprint characteristics of a user; marking the voiceprint characteristics corresponding to the user as standard voiceprints;
step three: comparing the primary voiceprints with the standard voiceprint, and marking a primary voiceprint consistent with the standard voiceprint as the target voiceprint; marking the audio information corresponding to the target voiceprint as target audio information;
the audio analysis module is used for acquiring target audio information and judging the effectiveness of the target audio information before sending the target audio information to the voice recognition module; if the target audio information is valid, sending the target audio information to a voice recognition module; if the audio information is invalid, the audio information is collected again; the specific analysis steps are as follows:
SS1: carrying out noise reduction and enhancement processing on the target audio information;
SS2: acquiring the acquisition time of each vowel in the target audio information and marking it as T_i, i = 1, …, n;
using the formula C_i = T_{i+1} − T_i, calculating the time difference between two adjacent vowels and marking it as the single interval C_i;
SS3: comparing the single interval C_i with the interval thresholds; the interval thresholds comprise a first interval threshold G1 and a second interval threshold G2, with G1 < G2;
if C_i ≥ G2, marking the single interval as an influence interval; in this case the interval threshold corresponding to the influence interval is the second interval threshold G2;
if C_i ≤ G1, marking the single interval as an influence interval; in this case the interval threshold corresponding to the influence interval is the first interval threshold G1;
counting the number of occurrences of influence intervals and marking it as D1; calculating the difference between each influence interval and its corresponding interval threshold to obtain an offset value, marked as D2;
SS4: setting a plurality of offset coefficients marked Kc, c = 1, 2, …, w, with K1 < K2 < … < Kw; each offset coefficient Kc corresponds to a preset offset value range, in order (k1, k2], (k2, k3], …, (kw, kw+1], with k1 < k2 < … < kw < kw+1;
when D2 ∈ (kw, kw+1], the offset coefficient corresponding to that preset offset value range is Kw;
obtaining the influence value D3 corresponding to the offset value using the formula D3 = D2 × Kw; summing the influence values corresponding to all offset values to obtain the total offset influence value, marked as D4;
SS5: obtaining the interval influence value D5 using the formula D5 = D1 × A1 + D4 × A2, wherein A1 and A2 are both proportionality coefficients; for example, A1 takes a value of 0.44 and A2 takes a value of 0.67;
SS6: if the interval influence value D5 is smaller than the corresponding interval influence threshold, the target audio information is valid; otherwise it is invalid;
SS7: obtaining the intensity of each vowel in the target audio information, marked Q_i, to obtain a vowel intensity information group; calculating the standard deviation α of the real-time Q_i information group according to the standard deviation formula; when α is smaller than a preset value, the group is in the state to be verified;
SS8: when the Q_i are in the state to be verified, sorting the Q_i from high to low; marking the maximum of Q_i as Qmax and the minimum of Q_i as Qmin;
SS9: setting the preset intensity of each vowel as QS, and calculating the difference between each Q_i and the preset intensity QS to obtain the intensity difference QJ_i; if every QJ_i is smaller than the preset intensity difference and the difference between Qmax and Qmin is smaller than the preset intensity difference, the target audio information is valid; otherwise it is invalid;
In this embodiment 1, when the speech recognition site is noisy or many people speak at the same time, the audio information of the person to be recognized can still be recognized well and with high accuracy; meanwhile, before the target audio information is sent to the speech recognition module, its validity is judged in combination with the vowel interval and the vowel intensity, which effectively ensures the clarity and accuracy of the recognized speech and increases the speech recognition speed;
the voice recognition module is used for performing voice recognition by using the target audio information distributed by the controller to generate an analysis text and returning the analysis text to the controller, and the controller is used for calling voice library data according to the analysis text and pushing the voice library data to a mobile terminal of a user;
the controller performs voice training by adopting an NLP algorithm and outputs a result corresponding to the analysis text; the NLP algorithm carries out voice training to generate a corresponding result, and the corresponding result is stored in a voice library;
example 2
As shown in fig. 3, the input module is in communication connection with the mobile terminal of the user; the input module is used by the user to feed back an evaluation signal to the controller; the evaluation signal evaluates the voice library data pushed to the mobile terminal and comprises a solved signal and an unresolved signal;
the controller is used for receiving the solved signal and the unresolved signal and sending the unresolved signal to the distribution management module when the unresolved signal is received;
the distribution management module is used for receiving the unresolved signals and distributing corresponding background personnel for remote interaction, and the specific distribution steps are as follows:
p1: the user feeds back an unresolved signal through the mobile terminal, and the mobile terminal is marked as j; marking the moment when the user feeds back the unresolved signal as the signal feedback moment; marking the moment when the data acquisition module acquires the audio information as the interaction starting moment;
calculating the time difference between the interaction starting time and the signal feedback time to obtain waiting time length, and marking the waiting time length as R1;
p2: acquiring a voice interaction record of a mobile terminal j ten days before the current time of the system; the voice interaction record comprises interaction times, interaction starting time and interaction ending time;
accumulating the interaction times to form interaction frequency, and marking the interaction frequency as L1;
calculating the time difference between the interaction starting time and the interaction ending time to obtain interaction duration, accumulating all the interaction durations to form interaction total duration, and marking the interaction total duration as L2;
acquiring the frequency of the feedback of the unresolved signal of the mobile terminal j, and marking the frequency as L3;
p3: the method comprises the steps of obtaining the model of a mobile terminal j, setting a corresponding preset value for each model of the mobile terminal, matching the model of the mobile terminal j with all models to obtain the corresponding preset value of the mobile terminal j, and marking the preset value as L4;
p4: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, acquiring a consumption value, and marking the consumption value as X1;
p5: normalizing the waiting time, the interactive frequency, the interactive total time, the times of the unresolved signals, the corresponding preset values and the consumption values and taking the numerical values;
obtaining the interaction value QW of the user by using the formula QW = R1×a1 + L1×a2 + L2×a3 + L4×a4 + X1×a5 - L3×a6, wherein a1, a2, a3, a4, a5 and a6 are proportionality coefficients; for example, a1 = 0.35, a2 = 0.42, a3 = 0.51, a4 = 0.19, a5 = 0.48 and a6 = 0.87;
p6: comparing the interaction value QW of the user with interaction thresholds, wherein the interaction thresholds comprise YT1 and YT2; YT1 is less than YT2; the method specifically comprises the following steps:
p61: if the QW is larger than or equal to YT2, the interaction level of the user is high, and background personnel of the VIP special line are connected for remote interaction;
p62: if YT1 is not more than QW and is less than YT2, the interaction level of the user is moderate, and background personnel of a first-level special line are connected for remote interaction;
p63: if QW is less than YT1, the interaction level of the user is general, and background personnel of a common special line are connected for remote interaction;
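The allocation steps P1 to P6 above can be sketched in a few lines; this is a minimal illustration rather than the patented implementation, with the coefficient defaults taken from the example values in step P5 and the thresholds YT1 and YT2 assumed as inputs (the function names are hypothetical):

```python
def interaction_value(r1, l1, l2, l3, l4, x1,
                      a=(0.35, 0.42, 0.51, 0.19, 0.48, 0.87)):
    """Step P5: QW = R1*a1 + L1*a2 + L2*a3 + L4*a4 + X1*a5 - L3*a6.

    r1: waiting duration, l1: interaction frequency, l2: total interaction
    duration, l3: unresolved-signal count, l4: terminal preset value,
    x1: consumption value (all assumed already normalized).
    """
    a1, a2, a3, a4, a5, a6 = a
    return r1 * a1 + l1 * a2 + l2 * a3 + l4 * a4 + x1 * a5 - l3 * a6


def route_line(qw, yt1, yt2):
    """Steps P61-P63: map QW to a support line, assuming YT1 < YT2."""
    if qw >= yt2:
        return "VIP special line"          # high interaction level
    if yt1 <= qw < yt2:
        return "first-level special line"  # moderate interaction level
    return "common special line"           # general interaction level
```

For instance, a user whose QW falls at or above YT2 would be routed to the VIP special line, while one below YT1 would reach the common special line.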
the calculation method of the consumption value in the step P4 is as follows:
p41: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, and acquiring consumption records of the user;
p42: the annual average GDP of the user is marked as GP1; marking the monthly average income of the user as GP2 and the monthly average consumption of the user as GP3;
acquiring the mobile fund of a user and marking as LD1; the mobile fund is the current deposit of the user;
p43: obtaining the consumption value X1 of the user by using the formula X1 = GP1×b1 + GP2×b2 + GP3×b3 + LD1×b4, wherein b1, b2, b3 and b4 are proportionality coefficients; for example, b1 = 0.44, b2 = 0.61, b3 = 0.38 and b4 = 0.74;
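Step P43 above amounts to a single weighted sum; the following sketch uses the example coefficients given in the text as defaults, and the function name is hypothetical:

```python
def consumption_value(gp1, gp2, gp3, ld1,
                      b=(0.44, 0.61, 0.38, 0.74)):
    """Step P43: X1 = GP1*b1 + GP2*b2 + GP3*b3 + LD1*b4.

    gp1: annual average GDP, gp2: monthly average income,
    gp3: monthly average consumption, ld1: mobile fund (current deposit).
    """
    b1, b2, b3, b4 = b
    return gp1 * b1 + gp2 * b2 + gp3 * b3 + ld1 * b4
```

The resulting X1 then feeds into the interaction value QW of step P5.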
In Embodiment 2, whether the user's problem has been solved can be judged from the evaluation signal fed back by the user, and when the problem is not solved, background personnel can be reasonably allocated for remote interaction according to the interaction value of the user, improving the user experience.
The working principle of the invention is as follows:
In this voice interaction system based on artificial intelligence, when the system works, the data acquisition module acquires the voiceprints and audio information of indoor personnel in real time, and the controller audits and filters the received audio information. When the audio information of a plurality of persons is collected, the voiceprint characteristics of each piece of audio information are acquired through voiceprint recognition technology and compared with the voiceprint characteristics of registered persons stored in the database; identical voiceprints are found and marked as primary selected voiceprints. The mobile phone number of the mobile terminal is acquired and compared with the real-name authenticated mobile phone numbers of registered persons stored in the database to obtain the identity information and corresponding voiceprint characteristics of the user; the voiceprint characteristics corresponding to the user are marked as the standard voiceprint. The primary selected voiceprints are compared with the standard voiceprint, and a primary selected voiceprint consistent with the standard voiceprint is marked as the target voiceprint; the audio information corresponding to the target voiceprint is marked as the target audio information. Before the target audio information is sent to the voice recognition module, the audio analysis module acquires the target audio information and judges its validity by combining its vowel interval and vowel intensity; if valid, the target audio information is sent to the voice recognition module; if invalid, the audio information is collected again. This effectively ensures the clarity and accuracy of the recognized voice and improves the voice recognition speed;
the voice recognition module is used for performing voice recognition by utilizing the target audio information distributed by the controller to generate an analysis text and returning the analysis text to the controller, and the controller is used for calling voice library data according to the analysis text and pushing the voice library data to the mobile terminal of the user; the input module is used by the user to feed back an evaluation signal to the controller, and the controller is used for receiving the solved and unresolved signals and sending an unresolved signal to the distribution management module when one is received; the distribution management module is used for receiving the unresolved signal and distributing corresponding background personnel for remote interaction, obtaining the interaction value QW of the user; if QW is larger than or equal to YT2, background personnel of the VIP special line are connected for remote interaction; if YT1 is not more than QW and QW is less than YT2, background personnel of the first-level special line are connected for remote interaction; if QW is less than YT1, background personnel of the common special line are connected for remote interaction; background personnel can thus be reasonably distributed according to the interaction value of the user for remote interaction, improving the user experience.
The formulas and proportionality coefficients are obtained by collecting a large amount of data for software simulation, with parameter setting performed by corresponding experts, yielding formulas and coefficients consistent with real results.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A voice interaction system based on artificial intelligence is characterized by comprising a registration login module, a controller, a database, a data acquisition module, a storage module, a voice recognition module, an audio analysis module, a voice library, an input module and a distribution management module;
the data acquisition module is in communication connection with a mobile terminal of a user; the data acquisition module is used for acquiring voiceprints and audio information of indoor personnel in real time and sending the acquired voiceprints and audio information to the controller, and the controller is used for auditing and filtering the received audio information and then transmitting corresponding target audio information to the storage module and the voice recognition module;
the audio analysis module is used for acquiring target audio information and judging the effectiveness of the target audio information before sending the target audio information to the voice recognition module; if the target audio information is valid, sending the target audio information to a voice recognition module; if the audio information is invalid, the audio information is collected again; the specific analysis steps are as follows:
SS1: carrying out noise reduction and enhancement processing on the target audio information;
SS2: acquiring the acquisition time of each vowel in the target audio information and marking it as Ti, i = 1, …, n;
calculating the time difference between two adjacent vowels by using the formula Ci = Ti+1 - Ti and marking it as the single interval Ci;
SS3: comparing the single interval Ci with an interval threshold; the interval threshold comprises a first interval threshold G1 and a second interval threshold G2, wherein G1 < G2;
if Ci is larger than or equal to G2, marking the single interval as an influence interval; at this time, the interval threshold corresponding to the influence interval is a second interval threshold G2;
if Ci is less than or equal to G1, marking the single interval as an influence interval; at this time, the interval threshold corresponding to the influence interval is a first interval threshold G1;
counting the number of occurrences of influence intervals and marking it as D1; calculating the difference between each influence interval and its corresponding interval threshold to obtain an offset value, marked as D2;
SS4: setting a plurality of offset coefficients marked as Kc, c = 1, 2, …, w, with K1 < K2 < … < Kw; each offset coefficient Kc corresponds to a preset offset value range, in sequence (k1, k2), (k2, k3), …, (kw, kw+1), wherein k1 < k2 < … < kw+1;
when the offset value D2 belongs to the range (kc, kc+1), the corresponding offset coefficient is Kc;
obtaining the influence value D3 corresponding to the offset value by using the formula D3 = D2 × Kc; summing the influence values corresponding to all offset values to obtain a total offset influence value, marked as D4;
SS5: obtaining the interval influence value D5 by using the formula D5 = D1×A1 + D4×A2, wherein A1 and A2 are both proportionality coefficients;
SS6: if the interval influence value D5 is less than the corresponding interval influence threshold, the target audio information is valid; otherwise, it is invalid;
SS7: acquiring the intensity of each vowel in the target audio information and marking it as Qi to obtain a vowel intensity information group; calculating the standard deviation α of the Qi information group according to the standard deviation formula; when α is smaller than a preset value, the group enters a state to be verified;
SS8: when the Qi group is in the state to be verified, sorting the Qi values from high to low, acquiring the maximum value and marking it as Qmax, and acquiring the minimum value and marking it as Qmin;
SS9: setting the preset intensity of each vowel as QS, and calculating the difference between each Qi and the preset intensity QS to obtain an intensity difference QJi; if all QJi are smaller than the preset intensity difference and the difference between Qmax and Qmin is also smaller than the preset intensity difference, the target audio information is valid; otherwise, it is invalid;
the voice recognition module is used for performing voice recognition by using the target audio information distributed by the controller to generate an analysis text and returning the analysis text to the controller, and the controller is used for calling voice library data according to the analysis text and pushing the voice library data to a mobile terminal of a user;
the input module is in communication connection with a mobile terminal of a user; the input module is used for feeding back an evaluation signal to the controller by a user, wherein the evaluation signal comprises a solution signal and an unresolved signal;
the controller is used for receiving the solved signal and the unresolved signal and sending the unresolved signal to the distribution management module when the unresolved signal is received; and the distribution management module is used for receiving the unresolved signals and distributing corresponding background personnel for remote interaction.
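The validity analysis of steps SS1 to SS9 in claim 1 can be sketched as follows. This is an illustrative reading under stated assumptions: the offset value D2 is taken as the distance from the violated threshold, the offset coefficient ranges are passed explicitly as ((lo, hi), Kc) pairs, and all function names are hypothetical:

```python
def single_intervals(times):
    """SS2: single intervals Ci = T(i+1) - Ti from vowel acquisition times."""
    return [t2 - t1 for t1, t2 in zip(times, times[1:])]


def interval_influence(intervals, g1, g2, offset_ranges, a1, a2):
    """SS3-SS5: count influence intervals (Ci <= G1 or Ci >= G2), map each
    offset D2 to its coefficient Kc via the preset ranges, and combine the
    count D1 and total offset influence D4 into D5 = D1*A1 + D4*A2."""
    d1 = 0      # number of influence intervals
    d4 = 0.0    # total offset influence value
    for ci in intervals:
        if ci >= g2:
            d2 = ci - g2        # offset beyond the second threshold
        elif ci <= g1:
            d2 = g1 - ci        # offset below the first threshold
        else:
            continue            # within (G1, G2): not an influence interval
        d1 += 1
        for (lo, hi), kc in offset_ranges:   # ((kc, kc+1), Kc) pairs
            if lo < d2 <= hi:
                d4 += d2 * kc   # D3 = D2 * Kc, summed into D4
                break
    return d1 * a1 + d4 * a2


def intensity_valid(qi_values, qs_preset, max_diff):
    """SS7-SS9: every vowel intensity must be close to the preset QS and
    the spread Qmax - Qmin must stay below the preset intensity difference."""
    qmax, qmin = max(qi_values), min(qi_values)
    per_vowel_ok = all(abs(q - qs_preset) < max_diff for q in qi_values)
    return per_vowel_ok and (qmax - qmin) < max_diff
```

Under this reading, the target audio is forwarded to the voice recognition module only when the interval influence value stays below its threshold and the intensity check passes; otherwise the audio is collected again.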
2. The artificial intelligence based voice interaction system of claim 1, wherein the registration login module is configured to log in to become a registrant after a user enters personal information through a mobile terminal, and send the personal information to the controller, wherein the personal information includes a name, a gender, an age, a real-name authentication phone number and an identification number; the controller is used for sending the personal information of the registered personnel to the database for storage; the controller performs voice training by adopting an NLP algorithm and outputs a result corresponding to the analysis text; the NLP algorithm carries out voice training to generate a corresponding result, and the corresponding result is stored in a voice library; the data acquisition module is in communication connection with a database, the database is used for storing the voiceprint characteristics of each registered person, and the voiceprint characteristics are associated with the identity information of the registered person.
3. The artificial intelligence based voice interaction system of claim 1, wherein the controller performs the auditing and filtering of the audio information by:
Step one: when the audio information of a plurality of persons is acquired, acquiring the voiceprint characteristics of each piece of audio information through a voiceprint recognition technology, comparing the voiceprint characteristics with the voiceprint characteristics of registered persons stored in a database, finding out the same voiceprint, and marking it as a primary selected voiceprint;
step two: acquiring a mobile phone number of a mobile terminal, comparing the mobile phone number of the mobile terminal with a real-name authentication mobile phone number of a registered person stored in a database, and acquiring identity information and corresponding voiceprint characteristics of a user; marking the voiceprint characteristics corresponding to the user as standard voiceprints;
step three: comparing the initial selected voiceprint with the standard voiceprint, and marking the initial selected voiceprint consistent with the standard voiceprint as a target voiceprint; and marking the audio information corresponding to the target voiceprint as target audio information.
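The three-step audit-and-filter procedure of claim 3 can be sketched as follows, treating voiceprint features as directly comparable identifiers purely for illustration (a real system would match voiceprints by a similarity score, not equality); the function name is hypothetical:

```python
def filter_target_audio(captured, registered_prints, user_print):
    """Claim 3, steps one to three: keep only audio whose voiceprint matches
    a registered print (primary selection) and equals the user's standard
    voiceprint obtained via the real-name authenticated phone number.

    captured: list of (voiceprint, audio) pairs from the acquisition module.
    registered_prints: set of voiceprints of registered persons (database).
    user_print: the standard voiceprint of the identified user.
    """
    # Step one: primary selection against registered voiceprints.
    primary = [(vp, audio) for vp, audio in captured if vp in registered_prints]
    # Steps two and three: keep audio matching the standard voiceprint.
    return [audio for vp, audio in primary if vp == user_print]
```

Any audio surviving this filter is the target audio information passed on to the audio analysis module.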
4. The artificial intelligence based voice interaction system of claim 1, wherein the specific allocation steps of the allocation management module are as follows:
p1: the user feeds back an unresolved signal through the mobile terminal, and the mobile terminal is marked as j; marking the moment when the user feeds back the unresolved signal as a signal feedback moment; marking the moment when the data acquisition module acquires the audio information as the interaction starting moment;
calculating the time difference between the interaction starting time and the signal feedback time to obtain waiting time length, and marking the waiting time length as R1;
p2: acquiring a voice interaction record of a mobile terminal j ten days before the current time of the system; the voice interaction record comprises interaction times, interaction starting time and interaction ending time;
accumulating the interaction times to form interaction frequency, and marking the interaction frequency as L1;
calculating the time difference between the interaction starting time and the interaction ending time to obtain interaction duration, accumulating all the interaction durations to form interaction total duration, and marking the interaction total duration as L2;
acquiring the frequency of the feedback of the unresolved signal of the mobile terminal j, and marking the frequency as L3;
p3: the method comprises the steps of obtaining the model of a mobile terminal j, setting each model of the mobile terminal to have a corresponding preset value, matching the model of the mobile terminal j with all models to obtain the corresponding preset value of the mobile terminal j, and marking the preset value as L4;
p4: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, acquiring a consumption value, and marking the consumption value as X1;
p5: normalizing the waiting time, the interactive frequency, the interactive total time, the times of the unresolved signals, the corresponding preset values and the consumption values and taking the numerical values;
obtaining the interaction value QW of the user by using the formula QW = R1×a1 + L1×a2 + L2×a3 + L4×a4 + X1×a5 - L3×a6, wherein a1, a2, a3, a4, a5 and a6 are all proportionality coefficients;
p6: comparing the user's interaction value QW with interaction thresholds, said interaction thresholds comprising YT1, YT2; YT1 is less than YT2; the method specifically comprises the following steps:
p61: if QW is larger than or equal to YT2, the interaction level of the user is high, and background personnel of the special line of the VIP are connected for remote interaction;
p62: if YT1 is not more than QW and less than YT2, the interaction level of the user is moderate, and background personnel of a first-level special line are connected for remote interaction;
p63: if QW is less than YT1, the interaction level of the user is general, and background personnel of a common special line are connected for remote interaction.
5. The artificial intelligence based voice interaction system of claim 4, wherein the consumption value in step P4 is calculated by:
p41: acquiring identity information of a user, comparing the identity information with consumption identity information stored in a big data platform, and acquiring consumption records of the user;
p42: the annual average GDP of the user is marked as GP1; marking the monthly average income of the user as GP2 and the monthly average consumption of the user as GP3; acquiring the mobile fund of a user and marking as LD1;
p43: the consumption value X1 of the user is obtained by using the formula X1 = GP1×b1 + GP2×b2 + GP3×b3 + LD1×b4, wherein b1, b2, b3 and b4 are all proportionality coefficients.
CN202011551759.0A 2020-12-24 2020-12-24 Voice interaction system based on artificial intelligence Active CN112700781B (en)

Publications (2)

Publication Number Publication Date
CN112700781A CN112700781A (en) 2021-04-23
CN112700781B (en) 2022-11-11





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant