CN111488433A

CN111488433A - Artificial intelligence interaction system suitable for banks and capable of improving on-site experience

Info

Publication number: CN111488433A
Application number: CN202010594489.5A
Authority: CN
Inventors: 丁晓平; 姜欣荣; 吴迪
Original assignee: Bank Of Nanjing Co ltd
Current assignee: Bank Of Nanjing Co ltd
Priority date: 2020-06-28
Filing date: 2020-06-28
Publication date: 2020-08-04
Anticipated expiration: 2040-06-28
Also published as: CN111488433B

Abstract

The invention discloses an artificial intelligence interaction system suitable for banks and capable of improving the on-site experience. The system is applied to a banking business handling terminal and comprises a client and a server. The client is used for performing face-to-face interaction with a user through a digital employee, receiving the user's voice input information and sending it to the server, and receiving and displaying the response content fed back by the server; the user's voice input information comprises business handling requirement information. The server is used for responding to the user's voice input information and for configuring the information of the client and the server. Through natural language understanding, context understanding and slot filling, the invention can fully understand the user's intention during continuous interaction, carry out multi-round interaction with the user, and present responses to the customer through the digital employee and multimedia resources, which makes the service more human-like and intelligent and improves the customer experience. In addition, intelligent interruption can be realized, so that user requirements are responded to in time, human-computer interaction becomes more efficient, faster and more natural, and business handling efficiency is improved.

Description

Artificial intelligence interaction system suitable for banks and capable of improving on-site experience

Technical Field

The invention belongs to the field of financial AI (Artificial intelligence) and particularly relates to an artificial intelligence interaction system suitable for banks and capable of improving on-site experience.

Background

With the rapid development of artificial intelligence technology, its application scenarios have become increasingly rich. The growing number of intelligent terminals has raised the combination of artificial intelligence and manual banking services to a new starting point. The digitization and mobilization of traditional banking business are advancing rapidly, which opens a wider development space for the manual services of the traditional banking industry and other financial industries. Diversified business handling modes let users obtain more service experiences, improve business handling efficiency and save customers' time.

At present, financial fields such as the traditional banking industry still largely rely on text prompts and manual explanation to serve customers. Text prompts include short messages, e-mails and the like; manual explanation includes manual voice service, on-site explanation and the like, typically provided by online customer service staff and lobby managers, who explain financial service knowledge and financial products and answer customer questions during working hours. Banks have already rolled out some interactive systems to address the limitations of these approaches, but such systems remain at the level of an intelligent customer service system that can only realize basic question answering. Such question-answering systems mainly fall into three types: task-oriented, knowledge-oriented and chat-oriented.

The existing intelligent customer service system has the following problems:

1. The customer's intention cannot be accurately identified. The existing intelligent customer service system can only handle basic services such as transfers, balance inquiries and points inquiries by jumping to a primary page. In most cases it searches the customer's speech or text for keywords; when a matching keyword such as 'transfer' is found, it jumps directly to the primary transfer page, which merely saves the customer the step of clicking the transfer button.

2. Service scenarios are fragmented and cannot form closed-loop management of the whole process. Although the intelligent voice customer service system can realize chatting and simple business handling through interaction, the services are independent of one another. After jumping to a primary page to handle a service, the system cannot return to the intelligent voice customer service system with parameters, so the customer must initiate a new round of interaction to handle a new service.

3. Roles cannot be selected, and the sense of scene is weak. The existing intelligent customer service system has no realistic image as a support and serves customers only under the name of a robot. It offers no sense of role immersion or on-site experience, which reduces customers' trust to a certain extent.

4. Security is low, and identity cannot be verified when the system is started. The intelligent customer service system has no identity recognition capability when voice interaction is performed, so its security needs to be improved.

Disclosure of Invention

The invention aims to provide an artificial intelligence interaction system which is suitable for banks and can improve the on-site experience for solving the problems in the prior art.

The technical solution for realizing the purpose of the invention is as follows: an artificial intelligence interactive system suitable for banks and capable of improving on-site experience sense is applied to a banking business handling terminal and comprises a client and a server;

the client is used for performing face-to-face interaction with a user through digital staff, receiving voice input information of the user, sending the voice input information to the server, responding to and displaying response content fed back by the server; the user voice input information comprises service handling requirement information;

and the server is used for responding to the voice input information of the user, and is also used for configuring the information of the client and the server.

Further, the server includes:

the response library construction subsystem is used for constructing a knowledge response library for interaction between the client and the user, wherein the knowledge response library comprises a business handling response library;

the intelligent communication subsystem is used for extracting and filtering noise information in the user voice input information and sending the filtered user voice input information to the ASR subsystem;

the ASR subsystem is used for analyzing the user voice input information into characters and sending the characters to the information analysis subsystem;

the information analysis subsystem is used for analyzing the characters, identifying user requirements including service handling requirements, and sending the demand information to the response subsystem;

the response subsystem is used for responding to the demand information, extracting response content from the response library and pushing the response content to the client; the response content comprises digital employee response voice, characters, driving information and a service interaction window; the digital employee driving information comprises an action, an expression and a mouth shape;

and the data information configuration subsystem is used for managing and configuring the data information of the client and the server.

Further, the response library construction subsystem includes:

the first response element sub-library generating module is used for generating a handling process of each service, and comprises a multi-stage service interaction process;

the second response element sub-library generating module is used for converting each level of service interaction process into corresponding interactive voice and characters and generating multi-level interactive voice and characters;

the third response element sub-library generating module is used for generating a service interaction window corresponding to each level of service interaction process;

the fourth response element sub-library generating module is used for generating digital employee driving information corresponding to each level of interactive voice and characters;

and the mapping relation establishing module is used for establishing the mapping relation between each level of service interaction process and the corresponding interaction voice and character, the mapping relation between each level of service interaction process and the corresponding service interaction window, and the mapping relation between each level of interaction voice and character and the corresponding digital employee driving information.
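The mapping relations described above can be pictured as a small lookup structure. The following Python sketch is illustrative only: the class name `InteractionLevel`, the field names, the window identifiers and the sample transfer entries are all assumptions and are not part of the disclosed system.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionLevel:
    """One level of a service interaction process (one question/answer turn)."""
    level_id: str
    prompt_text: str   # interactive text; the interactive voice is synthesized from it
    window: str        # identifier of the service interaction window
    driving: dict = field(default_factory=dict)  # action, expression, mouth shape


def build_response_library(levels):
    """Build the three mapping relations for a list of interaction levels."""
    level_to_text = {lv.level_id: lv.prompt_text for lv in levels}
    level_to_window = {lv.level_id: lv.window for lv in levels}
    # Driving information is mapped from the interactive text, not from the level.
    text_to_driving = {lv.prompt_text: lv.driving for lv in levels}
    return level_to_text, level_to_window, text_to_driving


# Hypothetical two-level transfer process.
transfer_levels = [
    InteractionLevel(
        "transfer.select_account",
        "Which of Zhang San's accounts should receive the transfer?",
        "window.payee_accounts",
        {"action": "point", "expression": "smile", "mouth_shape": "sequence_1"},
    ),
    InteractionLevel(
        "transfer.confirm",
        "Please confirm the transfer details.",
        "window.transfer_confirm",
        {"action": "nod", "expression": "neutral", "mouth_shape": "sequence_2"},
    ),
]

level_to_text, level_to_window, text_to_driving = build_response_library(transfer_levels)
```

Keeping the three relations as separate dictionaries mirrors the text, in which each relation is established independently by the mapping relation establishing module.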

Further, the response subsystem comprises:

the first response module is used for driving the first response element sub-library to respond to the demand information, triggering a certain level of service interaction process corresponding to the demand information and generating a first trigger instruction;

the second response module is used for driving the second response element sub-library to respond to the first trigger instruction, triggering a certain level of interactive voice and characters corresponding to the certain level of service interactive process, sending the certain level of interactive voice and characters to the response output module, and generating a second trigger instruction at the same time;

the third response module is used for driving the third response element sub-library to respond to the second trigger instruction, triggering a service interaction window corresponding to the certain level of interaction voice and characters, and sending the service interaction window to the response output module;

the fourth response module is used for driving the fourth response element sub-library to respond to the second trigger instruction, triggering digital employee driving information corresponding to the certain level of interactive voice and characters, sending the digital employee driving information to the response output module, and generating a third trigger instruction at the same time;

and the response output module is used for responding to the third trigger instruction and synchronously pushing the certain level of interactive voice, text and business interactive window corresponding to the demand information and the digital employee driving information to the client.
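The chain of trigger instructions described above can be sketched as a sequence of lookups. This is a minimal illustration under stated assumptions: the function name `respond`, the dictionary layout and the sample entries are hypothetical, and the window is looked up by interaction level here for simplicity.

```python
def respond(demand, flows, texts, windows, driving):
    """Walk the trigger chain for one demand and return the content to push."""
    level = flows[demand]     # first response module -> first trigger instruction
    text = texts[level]       # second response module -> second trigger instruction
    window = windows[level]   # third response module
    drive = driving[text]     # fourth response module -> third trigger instruction
    # Response output module: push text/voice, window and driving info together.
    return {"text": text, "window": window, "driving": drive}


# Hypothetical sub-library contents for a single transfer step.
flows = {"transfer_to_payee": "transfer.select_account"}
texts = {"transfer.select_account": "Which account should receive the transfer?"}
windows = {"transfer.select_account": "window.payee_accounts"}
driving = {
    "Which account should receive the transfer?": {
        "action": "point",
        "expression": "smile",
        "mouth_shape": "sequence_1",
    }
}

reply = respond("transfer_to_payee", flows, texts, windows, driving)
```

The returned dictionary stands in for the synchronized push to the client; a real system would also carry the synthesized voice.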

Further, the server also includes a security verification subsystem, which includes:

the security identity library construction module is used for constructing a security identity library comprising user audio data; the security identity library comprises a registration library and a verification library, wherein the registration library comprises the voice audio registered by each user, and the verification library comprises voice audio input by the user for a specified text;

the first safety verification module is used for performing secondary identity safety verification after primary user identity safety verification is performed on the banking business handling terminal, and starting the artificial intelligent interaction system after the verification is passed;

and the second security verification module is used for judging whether the user is a security identity associated with the bank account in real time according to the voice input information of the user, triggering and locking the artificial intelligence interaction system if the user is not the security identity, and further triggering and locking the banking business handling terminal.

Further, the client comprises an authority setting subsystem for setting a plurality of security identities associated with the bank account, wherein each security identity information comprises voice information thereof.

Further, the server further includes a data switching subsystem, which includes:

the role switching module is used for responding to the demand information and switching the digital employee role matched with the demand information;

and the scene switching module is used for responding to the demand information and switching the service interaction scene matched with the demand information.

Further, the server also comprises an interruption subsystem, which is used for receiving new user voice input information in real time while the digital employee is outputting the response voice, pausing the output of the current response voice when new user voice input information is received, and at the same time generating and sending a trigger instruction to the ASR subsystem so that a new round of response is carried out for the new user voice input information.
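The interruption behaviour can be sketched as a small controller. The class name `BargeInController`, its methods and the callable standing in for the ASR subsystem are assumptions made for illustration, not the disclosed implementation.

```python
class BargeInController:
    """Tracks whether the digital employee is speaking and handles interruptions."""

    def __init__(self, send_to_asr):
        self.send_to_asr = send_to_asr  # callable standing in for the ASR subsystem
        self.speaking = False
        self.current_reply = None
        self.paused_reply = None

    def start_reply(self, reply_text):
        """The digital employee starts outputting a response voice."""
        self.current_reply = reply_text
        self.speaking = True

    def on_user_audio(self, audio):
        """Called whenever new user voice input arrives."""
        if self.speaking:
            # Pause the current response voice before handling the new input.
            self.paused_reply = self.current_reply
            self.speaking = False
        # Trigger the ASR subsystem so a new round of response can begin.
        self.send_to_asr(audio)


received = []
controller = BargeInController(received.append)
controller.start_reply("Your credit card bill for this month is being retrieved.")
controller.on_user_audio(b"new-utterance")
```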

Further, the server also comprises a dormancy subsystem, which is used for monitoring the user's voice input information in real time and controlling the artificial intelligence interaction system to enter a dormant state when the length of time during which no user voice input information is detected exceeds a preset threshold; during dormancy, if user voice input information is detected, the artificial intelligence interaction system is immediately awakened and restored to its state before dormancy.
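A minimal sketch of the dormancy logic follows. The class name `DormancyMonitor`, the 60-second threshold and the string used as the saved state are illustrative assumptions; the text only specifies a preset threshold and restoration of the state before dormancy.

```python
class DormancyMonitor:
    """Enters dormancy after prolonged silence and restores state on wake-up."""

    def __init__(self, idle_threshold_s):
        self.idle_threshold_s = idle_threshold_s
        self.last_input_at = 0.0
        self.dormant = False
        self.saved_state = None

    def tick(self, now, current_state):
        """Periodic check; go dormant once the silence exceeds the threshold."""
        if not self.dormant and now - self.last_input_at > self.idle_threshold_s:
            self.saved_state = current_state
            self.dormant = True
        return self.dormant

    def on_voice_input(self, now):
        """Record input; if dormant, wake up and return the saved state."""
        self.last_input_at = now
        if self.dormant:
            self.dormant = False
            restored, self.saved_state = self.saved_state, None
            return restored
        return None


monitor = DormancyMonitor(idle_threshold_s=60)
monitor.on_voice_input(now=0.0)
asleep_early = monitor.tick(now=30.0, current_state="transfer.confirm")
asleep_late = monitor.tick(now=61.0, current_state="transfer.confirm")
restored = monitor.on_voice_input(now=75.0)
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic.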

Further, the server also includes a storage query subsystem, which includes:

the interactive record storage module is used for recording and storing interactive information of the digital staff and the user, wherein the interactive information comprises voice and characters;

and the interaction record query module is used for querying the information stored by the interaction record storage module.

Furthermore, the server also comprises an authority control subsystem for controlling the authority of the user for accessing each subsystem of the server.

Further, the data information configuration subsystem comprises:

the first configuration module is used for configuring the information of the digital staff;

the second configuration module is used for configuring the information of the response library;

the third configuration module is used for configuring the authentication information of the identity security authentication performed by the security authentication subsystem;

a fourth configuration module, configured to configure switching data information of the data switching subsystem;

a fifth configuration module, configured to configure information that the hibernation subsystem enters or suspends a hibernation state;

the sixth configuration module is used for configuring the storage query subsystem to realize recording and storing information;

and the seventh configuration module is used for configuring the authority information of the authority control subsystem.

Compared with the prior art, the invention has the following remarkable advantages:

1) Digital employees are constructed through image modeling technology; compared with other intelligent customer service systems, the system provides customers with warmer and more humanized services.

2) Through natural language understanding, context understanding and slot filling, the user's intention can be fully understood during continuous interaction, and multi-round interaction is carried out with the user. Compared with existing intelligent customer service systems, which can only jump directly to a primary page for business handling, the invention can directly extract the key business elements and fill in the relevant business handling information, so that the user only needs to confirm on the primary page, which makes business handling more convenient. After jumping to a primary page to handle a service and returning, the user can continue to handle new services without initiating a completely new round of interaction from the beginning. The digital employee also presents content to the user through multimedia resources, which is closer to manual business handling and more human-like and intelligent, while being more efficient than manual service.

3) Multi-scene switching can be realized, including a hall scene, a transfer scene, a financing scene and the like, which gives customers an on-site experience and enhances user trust to a certain extent.

4) Intelligent interruption at any time can be realized, so that the user's requirements are responded to in time and human-computer interaction becomes more efficient, faster and more natural, which helps enhance the customer experience.

5) The system has stronger security functions and, through multi-level security verification, protects the customer's information security and account security in real time and in all respects.

6) The provided feature extraction method comprehensively considers the pitch, the Mel-frequency cepstral coefficients and the texture features of the audio, which can improve the precision of the subsequent similarity calculation to a certain extent and thereby improve the security verification precision.

7) When the secure identity is judged by voice, two-stage security judgment in different modes is adopted. The first-stage judgment is relatively simple, and if the judgment is completed at the first stage, the complexity of the algorithm is reduced; the second-stage judgment builds a similarity model based on deep learning and has higher calculation precision.

The present invention is described in further detail below with reference to the attached drawing figures.

Drawings

FIG. 1 is a schematic diagram of an artificial intelligence interaction system adapted for use in a bank to enhance on-site experience in one embodiment.

FIG. 2 is a schematic structural diagram of an artificial intelligence interaction system for improving on-site experience suitable for banks in an embodiment.

FIG. 3 is a block diagram of the response library construction subsystem in one embodiment.

FIG. 4 is a block diagram of the response subsystem in one embodiment.

FIG. 5 is a flow diagram of digital lobby transfer in one embodiment.

FIG. 6 is a flow diagram of a digital lobby credit card bill inquiry in one embodiment.

FIG. 7 is a diagram illustrating a service interaction window corresponding to a service interaction process at a certain level in the transfer service in one embodiment.

FIG. 8 is a flow diagram that illustrates determining whether a user is a secure identity associated with a bank account, under an embodiment.

FIG. 9 is a flow diagram of constructing a deep learning based acoustic similarity calculation model, under an embodiment.

FIG. 10 is a flow diagram of feature extraction in one embodiment.

FIG. 11 is a flow diagram of feature extraction in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, with reference to fig. 1 and fig. 2, an artificial intelligence interaction system suitable for banks and capable of improving on-site experience is provided, and is applied to a banking transaction terminal, where the system includes a client and a server;

Here, the artificial intelligence interaction system may also be referred to as a digital business hall. The banking business handling terminal includes a mobile banking app and the like, and the digital business hall is embedded into the mobile banking app in the form of an SDK. A customer can jump into the digital business hall through the digital business hall entrance and the floating frame on the primary page of the mobile banking app, and a WebRTC session channel is automatically established after entry.

The client is used for performing face-to-face voice interaction with a user through digital staff, receiving voice input information of the user, sending the voice input information to the server, responding to and displaying response content fed back by the server; the user voice input information comprises service handling requirement information;

Here, the digital employee is a 'living person' with expressions, actions and speech, created from one or more pictures of a person taken at different angles together with several sentences of that person's voice, thereby realizing a 'digital clone' of the person.

Here, voice interaction supports two modes. In call mode, the session state is kept open at all times to realize a real-time conversation. In talkback mode, the user speaks while holding down the voice button, which prevents environmental noise from being captured and analyzed in real time and improves response efficiency and accuracy.

Further, in one embodiment, the server includes:

here, the knowledge response base also includes a daily communication response base unrelated to the business.

Here, automatic speech recognition (ASR) is a technology that enables a machine to convert human speech into text through a process of recognition and understanding. Speech recognition is a key technology for human-computer interaction and is widely applied in call centers, telecommunication value-added services and enterprise information systems. With its growing use in newer application fields such as voice search and voice control, speech recognition is regarded in the industry as a key technology that may lead to a revolution in human-computer interfaces. Speech recognition technology has advanced considerably in recent years, and the invention preferably adopts deep neural network (DNN) technology, which provides a powerful, stable and easy-to-use core engine for building high-value self-service voice services.

The information analysis subsystem is used for analyzing the voice input information and characters of the user, identifying user requirements including service handling requirements, and sending the required information to the response subsystem;

Here, the parsing of the text is specifically realized by natural language processing (NLP) technology, and the user requirements are identified through semantic understanding, context understanding and the like.

In this way, the user's intention can be fully understood even when the same requirement is expressed in different words, which improves the accuracy of the analysis. For example, for a transfer, the user does not need to use precise professional terms such as 'transfer' or 'remittance'; even if the requirement is phrased colloquially, such as 'I want to transfer money' or 'I want to remit money', the information analysis subsystem can still understand that the user's intention is 'transfer' or 'remittance'.

In this way, the information analysis subsystem can directly parse and respond to the user's final requirement, which may be a particular link in a business handling process. It does not merely identify the business to be handled and then query and respond step by step from the initial link until the final requirement is reached. This is more targeted, improves the efficiency of information analysis and in turn the efficiency of the subsequent response. For example, if the user says 'I want to transfer 100 yuan to Zhang San', the information analysis subsystem can directly identify the transfer and the transfer recipient, and the system then directly responds with the transfer recipient interaction window shown in fig. 7.
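As a rough illustration of how an utterance can be reduced to an intent plus filled slots, the following Python sketch uses keyword and regular-expression matching on English text. It is a deliberately simple stand-in: the patent relies on NLP-based semantic and context understanding, and the synonym table, slot names and patterns here are all assumptions.

```python
import re

# Hypothetical phrases that all map to the same transfer intent.
INTENT_SYNONYMS = {
    "transfer": ("transfer", "remit", "send money"),
}

# Hypothetical slots that a transfer needs before it can be confirmed.
TRANSFER_SLOTS = ("amount", "payee", "payee_account")


def parse_utterance(text):
    """Return (intent, slots) extracted from a single utterance."""
    lowered = text.lower()
    intent = None
    for name, phrases in INTENT_SYNONYMS.items():
        if any(phrase in lowered for phrase in phrases):
            intent = name
            break

    slots = {}
    amount = re.search(r"(\d+(?:\.\d+)?)\s*yuan", lowered)
    if amount:
        slots["amount"] = float(amount.group(1))
    # A capitalized name following "to" is treated as the payee.
    payee = re.search(r"\bto\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)", text)
    if payee:
        slots["payee"] = payee.group(1)
    return intent, slots


def next_missing_slot(slots):
    """Return the first unfilled transfer slot, or None when all are filled."""
    for slot in TRANSFER_SLOTS:
        if slot not in slots:
            return slot
    return None


intent, slots = parse_utterance("I want to transfer 100 yuan to Zhang San")
missing = next_missing_slot(slots)
```

Because the amount and payee are already filled from the first utterance, the only remaining slot is the payee account, which corresponds to answering directly with the transfer recipient window instead of starting from the first step of the process.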

Further, in one embodiment, with reference to fig. 3, the response library construction subsystem comprises:

Here, each level of the service interaction process corresponds to one question-and-answer exchange in the business handling process, and either the user or the digital employee may be the one asking or answering. The exchanges alternate continuously to realize a complete business handling process. Fig. 5 and fig. 6 show the transfer and credit card bill inquiry processes respectively, and each decision corresponds to one question-and-answer exchange.

Here, the conversion of each level of the service interaction process into corresponding interactive speech is realized by text-to-speech (TTS) synthesis, a technology that converts text information into speech data. It draws on acoustics, linguistics, digital signal processing, multimedia and other fields, and is an advanced technology in the field of Chinese information processing. The invention preferably uses the artificial intelligence speech synthesis system STTS. The work of a synthesis engine in converting text to speech data can be broken down into two steps. The front end first performs grammatical analysis on the text, processes it with a dictionary and rules into a normalized format carrying grammar-level information, and passes the result to the back end. The back end performs prosody analysis on the front-end result to obtain prosodic information such as duration and pitch, then either generates speech units from that information (on-the-fly synthesis) or selects the most suitable pre-recorded units from a speech library (concatenative synthesis), and finally obtains the speech data by adjusting and splicing the speech units.

The third response element sub-library generating module is used for generating a service interaction window corresponding to each level of the service interaction process; fig. 5 illustrates this for the service interaction process 'User: I want to transfer 100 yuan to Zhang San; Digital employee: I found the following account information for Zhang San. Which account would you like to transfer to?' and its corresponding service interaction window.

Further, in one embodiment, with reference to fig. 4, the response subsystem includes:

Here, natural language processing (NLP) technology is used to perform contextual semantic understanding of the demand information and obtain the level of the service interaction process corresponding to it; at that level, the digital employee may either answer a question from the user or actively ask the user a question.

here, the third responding module and the fourth responding module may not be limited to the above-mentioned sequential execution, may respond synchronously, or may respond first by the fourth responding module.

Here, the third trigger instruction may instead be generated by the third response module.

Here, by way of example and with reference to fig. 5 and fig. 7, take a transfer. Assume the user's initial requirement is 'I want to transfer 100 yuan to Zhang San'. Through conversational analysis and semantic understanding, the first response module triggers the corresponding level of the service interaction process, in which the digital employee tells the user that Zhang San's account information has been found and asks which account to transfer to. The second, third and fourth response modules and the response output module are then triggered in turn to obtain the response voice, text and interaction window corresponding to that prompt and push them synchronously to the client, which realizes the interaction between the digital employee and the user for this level of the service interaction process. The user can then input new voice information in reply to the digital employee's question, which triggers the intelligent communication subsystem, the ASR subsystem, the information analysis subsystem and the response subsystem to carry out a new round of response. This cycle realizes multi-round interaction until the user exits the system.

With the scheme of this embodiment, the user's intention is fully understood during continuous interaction, multi-round interaction is carried out with the user, and the digital employee presents content to the user through multimedia resources. This is closer to manual business handling and more human-like and intelligent, while handling business more efficiently than manual service.

Further, in one embodiment, the server further includes a security verification subsystem, which includes:

here, the secondary identity security verification includes voiceprint verification, face recognition verification, or the like.

By adopting the scheme of the embodiment, the safety of the personal information and the account information of the user can be ensured in real time and in multiple ways.

Further, in an embodiment, with reference to fig. 8, the determining whether the user is a secure identity associated with a bank account in real time according to the voice input information of the user includes:

extracting a certain voice audio of the user associated with the current bank account from the security identity library, and respectively extracting the characteristics of the voice audio and the voice audio input by the current user to obtain respective corresponding audio vectors;

calculating the similarity of the two audio vectors by using a cosine similarity formula, and recording the similarity as a first similarity;

judging whether the first similarity is higher than a first preset threshold value, if so, judging that the current user is a safe identity, otherwise, executing the next step;

constructing a sound similarity calculation model based on deep learning; with reference to fig. 9, the specific process includes:

extracting the voice audio of user A from the registration library and performing feature extraction on it to obtain the corresponding audio vector, recorded as audio feature $a$;

extracting the voice audio of each user in the verification library and performing feature extraction to obtain the corresponding audio vectors, recorded respectively as audio features $b_1, b_2, \ldots, b_n$, which form the audio feature set $B = \{b_1, b_2, \ldots, b_n\}$, where $n$ is the number of voice audios in the verification library;

randomly extracting a plurality of audio features from the audio feature set $B$, each of which forms one sample together with the audio feature $a$, so as to form a training sample set;

inputting the training sample set into an LSTM neural network for training to generate the sound similarity calculation model;

inputting the audio vector of the voice audio input by the current user, together with the audio feature in the audio feature set $B$ corresponding to that user, into the sound similarity calculation model, and outputting the similarity between the two, recorded as a second similarity;

and judging whether the second similarity is higher than a second preset threshold, if so, judging that the current user is a safe identity, otherwise, judging that the current user is an unsafe identity.

Here, the first preset threshold and the second preset threshold are preferably set to 80% and 50%, respectively.

By adopting the scheme of this embodiment, security is judged in two stages that use different methods. The first-stage judgment is relatively simple, and if the decision is reached at the first stage the complexity of the algorithm is reduced; the second-stage judgment builds a similarity model with a deep learning method, so its calculation precision is higher.
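The two-stage judgment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the `model_similarity` callback (a hypothetical stand-in for the trained LSTM similarity model) and the use of plain Python lists as audio vectors are all assumptions; only the cosine similarity formula and the 80% and 50% thresholds come from the text.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity of two audio vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_secure_identity(
    reference_vector: Vector,
    current_vector: Vector,
    model_similarity: Callable[[Vector, Vector], float],
    first_threshold: float = 0.80,   # first preset threshold from the text
    second_threshold: float = 0.50,  # second preset threshold from the text
) -> bool:
    """Two-stage check: cheap cosine similarity first, learned model only if needed."""
    first_similarity = cosine_similarity(reference_vector, current_vector)
    if first_similarity > first_threshold:
        return True  # decided at stage one, the model is never called
    # Stage two: hypothetical stand-in for the deep-learning similarity model.
    second_similarity = model_similarity(reference_vector, current_vector)
    return second_similarity > second_threshold
```

Because the model is only consulted when the first similarity does not exceed its threshold, clearly matching voices are accepted without the cost of running the neural network.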

Further, in one embodiment, with reference to fig. 10, the specific process of feature extraction includes:

extracting the pitch feature of the voice audio, wherein the specific process comprises the following steps:

processing the voice audio signal by utilizing a YIN algorithm;

detecting the pitch feature, namely the period of the fundamental frequency of the signal processed by the YIN algorithm, specifically comprises the following steps:

extracting minimum value points of the signal processed by the YIN algorithm;

judging whether the minimum value point is smaller than a preset judgment threshold value or not, and if not, taking the offset of the minimum value point and the preset judgment threshold value as the period of the fundamental frequency; if yes, executing the following steps;

acquiring all minimum value points smaller than a preset judgment threshold, classifying the minimum value points of which the relative absolute difference of the minimum values is smaller than a preset error into the same minimum value point sequence to obtain a plurality of minimum value point sequences, and performing ascending arrangement on the minimum value points in each sequence;

calculating the weight of each minimum value point sequence according to the number of the minimum value points in each minimum value point sequence;

acquiring the minimum value point in the minimum value point sequence with the maximum weight value, and taking the offset between the minimum value point and the preset judgment threshold value as the period of the fundamental frequency;

extracting Mel frequency cepstrum coefficient of the voice audio;

extracting the texture features of the voice audio; the texture features may include: uniform local binary patterns, local binary pattern histogram Fourier features, local phase quantization, etc.;

and splicing the pitch feature, the Mel frequency cepstrum coefficient and the texture feature to form an audio vector.
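For orientation, the pitch step can be illustrated with the standard YIN procedure: a squared difference function, its cumulative mean normalization, and selection of the first dip below a threshold. This sketch deliberately shows only textbook YIN; the grouping of minimum value points into weighted sequences described above is not reproduced, and the function names, the threshold of 0.1 and the pure-Python list representation are assumptions.

```python
import math
from typing import List, Sequence


def difference_function(signal: Sequence[float], window: int, max_lag: int) -> List[float]:
    """Squared difference d(tau) for tau = 0..max_lag over a fixed window."""
    if len(signal) < window + max_lag:
        raise ValueError("signal must have at least window + max_lag samples")
    return [
        sum((signal[j] - signal[j + tau]) ** 2 for j in range(window))
        for tau in range(max_lag + 1)
    ]


def cumulative_mean_normalized(diff: Sequence[float]) -> List[float]:
    """d'(0) = 1; d'(tau) = d(tau) * tau / sum(d(1..tau)) for tau >= 1."""
    normalized = [1.0]
    running_sum = 0.0
    for tau in range(1, len(diff)):
        running_sum += diff[tau]
        normalized.append(diff[tau] * tau / running_sum if running_sum > 0 else 1.0)
    return normalized


def yin_period(signal: Sequence[float], window: int, max_lag: int,
               threshold: float = 0.1) -> int:
    """Return the estimated fundamental period in samples."""
    d = cumulative_mean_normalized(difference_function(signal, window, max_lag))
    for tau in range(1, max_lag + 1):
        if d[tau] < threshold:
            # Walk down to the bottom of this dip (the local minimum).
            while tau + 1 <= max_lag and d[tau + 1] < d[tau]:
                tau += 1
            return tau
    # No dip below the threshold: fall back to the global minimum.
    return min(range(1, max_lag + 1), key=lambda t: d[t])


# Usage: a sine wave with a period of 20 samples.
samples = [math.sin(2 * math.pi * j / 20) for j in range(200)]
period = yin_period(samples, window=100, max_lag=60)
```

The fundamental frequency follows as the sampling rate divided by the returned period.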

Further, in one embodiment, with reference to fig. 11, the specific process of feature extraction includes:

constructing a function:

wherein:

the AMDF (average magnitude difference function) transformation is given by the following formula:

in the formula, x denotes the voice audio signal, L is the length of the data frame, the offset ranges from 0 to L-1, and x_i denotes the signal at position i in the data frame;

the sum of the minimum values of the respective data in the two frame signals is given by the following formula:

the sum of the maximum values of the respective data in the two frame signals is given by the following formula:

the total number of data points at which the data values at corresponding positions in the two frame signals have opposite signs is given by the following formula:

the minimum of the total number of data points at which adjacent data points change in the positive direction and the total number of data points at which adjacent data points change in the negative direction, in the two frame signals, is given by the following formula:

transforming a speech audio signal using the function;

detecting the period of the fundamental frequency of the signal after the function transformation, namely the pitch characteristic, specifically as follows:

extracting minimum value points of the signals after function transformation;

extracting Mel frequency cepstrum coefficient of the voice audio;

extracting the texture features of the voice audio;

By adopting the scheme of the embodiment, the pitch of the audio, the Mel frequency cepstrum coefficient and the textural features are comprehensively considered, the calculation precision of the subsequent similarity can be improved to a certain extent, and the safety verification precision is further improved.
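As a point of reference for the AMDF term only, the sketch below uses the common textbook form of the average magnitude difference function, the mean of |x_i - x_(i+lag)| over the overlapping part of a frame, and takes the lag with the smallest value as the pitch period. It does not reproduce the constructed function of this embodiment (the minimum-sum, maximum-sum, sign-change and direction-change terms are omitted), and the function names and the lag search range are assumptions.

```python
import math
from typing import Sequence


def amdf(frame: Sequence[float], lag: int) -> float:
    """Textbook AMDF: mean absolute difference between the frame and itself shifted by lag."""
    overlap = len(frame) - lag
    if lag < 0 or overlap <= 0:
        raise ValueError("lag must satisfy 0 <= lag < len(frame)")
    return sum(abs(frame[i] - frame[i + lag]) for i in range(overlap)) / overlap


def amdf_period(frame: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the lag in [min_lag, max_lag] with the smallest AMDF value."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))


# Usage: a sine wave with a period of 20 samples.
frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]
estimated_period = amdf_period(frame, min_lag=5, max_lag=30)
```

Dividing by the overlap length keeps large lags, which compare fewer samples, from looking artificially small.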

Further, in one embodiment, the client includes an authority setting subsystem for setting a plurality of security identities associated with the bank account. Since the present invention is a system oriented toward voice interaction, each piece of security identity information preferably includes the voice information of that identity.

Here, the secure identity information may also be a voiceprint, a face, or the like.

Further, in one embodiment, the server further includes a data switching subsystem, where the subsystem includes:

Here, if the demand is business consultation or other daily consultation, the role of the matched digital employee can be a hall manager; if the demand is business handling, the role of the matched digital employee can be a teller or a business handler.

The business interaction scenes are constructed according to the actual scenes of bank outlets and are divided by function into various scenes, including welcome, business handling, and marketing product recommendation such as wealth management; a scene may be a dynamic video, a promotional picture, or the like. The welcome scene is switched to automatically when the artificial intelligence interaction system is awakened.

By adopting the scheme of the embodiment, the field experience (including pleasure, trust and the like) of the user can be improved.
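The role and scene matching described above amounts to a lookup from the recognized demand to a digital employee role and an interaction scene. The following is a minimal sketch under stated assumptions: the demand labels, the role and scene identifiers, and the scene chosen for each demand are hypothetical; only the pairing of consultation with a hall manager, of business handling with a teller, and of wake-up with the welcome scene comes from the text.

```python
from typing import Tuple

# Role pairings stated in the text (identifiers are illustrative).
ROLE_BY_DEMAND = {
    "consultation": "hall_manager",
    "business_handling": "teller",
}

# Assumption: which scene accompanies each demand is not specified in the text.
SCENE_BY_DEMAND = {
    "consultation": "welcome",
    "business_handling": "business_handling",
    "wealth_management": "product_recommendation",
}

DEFAULT_ROLE = "hall_manager"  # assumption: fallback role for unknown demands
WAKE_SCENE = "welcome"         # from the text: shown when the system is awakened


def switch_for_demand(demand: str) -> Tuple[str, str]:
    """Return the (role, scene) pair matched to the recognized demand."""
    role = ROLE_BY_DEMAND.get(demand, DEFAULT_ROLE)
    scene = SCENE_BY_DEMAND.get(demand, WAKE_SCENE)
    return role, scene
```

Keeping the pairings in data rather than code fits the later statement that switching data is configured through the data information configuration subsystem.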

Further, in one embodiment, the server further includes an interrupt subsystem, configured to receive new user voice input information in real time during a process of outputting the response voice by the digital staff, suspend output of a current response voice when receiving the new user voice input information, and simultaneously generate and send a trigger command to the ASR subsystem to perform a new round of response to the new user voice input information.

Existing interruption modes are more restrictive: Xiaodu can be interrupted only by speaking its wake phrase "Xiaodu", and Siri can be interrupted only by triggering the dialogue button. With the scheme of this embodiment, the user interrupts simply by speaking. In addition, even a user who does not know the system's operating manual or trigger phrases can use the system without barriers; the interaction is more human-like and intelligent, resembling normal person-to-person communication, and the threshold for using the system is lowered.
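The interruption behaviour can be sketched as a small state holder: if new user voice arrives while a response is being spoken, playback is paused and the audio is handed to speech recognition for a new round. This is an illustrative sketch only; the class name, method names and the `asr_trigger` callback (a hypothetical stand-in for the trigger command sent to the ASR subsystem) are assumptions.

```python
from typing import Callable, Optional


class DigitalEmployeeSession:
    """Tracks whether a response is being spoken and handles barge-in."""

    def __init__(self, asr_trigger: Callable[[bytes], None]) -> None:
        self._asr_trigger = asr_trigger  # hypothetical hook into the ASR subsystem
        self.speaking = False
        self.current_response: Optional[str] = None

    def start_response(self, text: str) -> None:
        """Begin outputting a response voice."""
        self.current_response = text
        self.speaking = True

    def finish_response(self) -> None:
        """Mark the response voice as fully played."""
        self.speaking = False

    def on_user_voice(self, audio: bytes) -> bool:
        """Handle new user voice; return True if a response was interrupted."""
        interrupted = self.speaking
        if interrupted:
            self.speaking = False  # pause the current response voice
        self._asr_trigger(audio)   # start a new round of recognition and response
        return interrupted
```

No wake phrase or button is involved: any detected user voice is enough to pause the output.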

Further, in one embodiment, the server further includes a sleep subsystem, configured to monitor the user voice input information in real time, and to control the artificial intelligence interaction system to enter a dormant state when the length of time for which no user voice input information is detected exceeds a preset threshold; during dormancy, if user voice input information is detected, the artificial intelligence interaction system is immediately awakened and restored to its state before dormancy.

Here, in the dormant state the artificial intelligence interaction system may be locked, or promotional content for activities or products may be pushed to the client and played in a scrolling loop.

By adopting the scheme of this embodiment, business handling does not have to be restarted repeatedly because of automatic interruption caused by unexpected events on the user's side; the current business handling state is preserved and restored promptly on wake-up (unless the user actively asks to exit the current business handling flow), which improves the user experience.

By adopting the scheme of the embodiment, the safety of the user account information can be further ensured through locking.

By adopting the scheme of this embodiment, bank business and products can be promoted in real time through publicity pushing, making full use of resources, so that the cost of manual promotion is reduced and the success rate of promotion is improved to a certain extent.
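The dormancy logic can be sketched as an idle timer that saves the current business state on entering dormancy and restores it on the next voice input. This is a minimal sketch: the class and attribute names, the string state labels and the explicit timestamps passed in by the caller are assumptions made for illustration.

```python
from typing import Optional


class SleepMonitor:
    """Enters dormancy after a period of silence and restores state on wake-up."""

    def __init__(self, idle_threshold_s: float, now: float = 0.0) -> None:
        self.idle_threshold_s = idle_threshold_s  # the preset threshold
        self.last_voice_at = now
        self.state = "welcome"                    # hypothetical current business state
        self.asleep = False
        self._saved_state: Optional[str] = None

    def tick(self, now: float) -> None:
        """Call periodically; enters dormancy once silence exceeds the threshold."""
        if not self.asleep and now - self.last_voice_at > self.idle_threshold_s:
            self._saved_state = self.state  # remember where the user was
            self.state = "dormant"          # could instead lock or show promotions
            self.asleep = True

    def on_voice(self, now: float) -> None:
        """Call when user voice is detected; wakes and restores the saved state."""
        self.last_voice_at = now
        if self.asleep:
            self.state = self._saved_state or "welcome"
            self.asleep = False
            self._saved_state = None
```

Passing the time in explicitly keeps the sketch deterministic; a real deployment would read a clock instead.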

Further, in one embodiment, the server further includes a storage query subsystem, which includes:

By adopting the scheme of this embodiment, subsequent analysis of customers is facilitated, for example classifying customer intentions to identify the customers' most frequent needs, so that customers can be revisited and followed up in a targeted manner, which improves customer goodwill.

Further, in one embodiment, the server further includes an authority control subsystem for controlling the authority of the user for accessing the subsystems of the server.

By adopting the scheme of this embodiment, the security of the whole artificial intelligence interaction system and the privacy of the user can be fully guaranteed.

Further, in one embodiment, the data information configuration subsystem includes:

Here, the information of the digital employee includes the character identity, image, and the like of the digital employee.

Here, the information of the response library includes the specification, functional area division, data description method, etc. of the response library itself, the number, tone, etc. of the response voice units, the font, arrangement, etc. of the response text, the display method (including animation, color, display position, etc.) and display content of the business interaction window, and the expression, action, mouth shape, etc. of the digital employee.

Here, the different authentication manners correspond to different authentication information, and the authentication information may include one or more of a user name and a password, a face, a fingerprint, a voice, and the like.

Here, the switching data information includes all data to be switched (roles, scenes, and the like), and instructions for switching, a switching manner, and the like.

Here, the information includes a specific value of the preset threshold, a trigger instruction to enter or suspend the dormant state, a specific manner of the dormant state, and the like.

Here, the information includes the data content to be stored (recording date, time point, business transaction, etc.), the format or manner of data storage, etc. (for example, data in a certain period of time may form a data packet, which facilitates later data search and analysis).

Here, the authority information includes the authority of each server-side user to access each subsystem (for example, whether access is authorized, the period during which the authority is valid, and the like), the type of the authority (read-only, modifiable, and the like), and the like.
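The authority information listed above (whether access is granted, its validity period, and its type) can be sketched as a record plus a check. This is an illustrative sketch only: the `Permission` record, its field names, the action labels and the `is_allowed` function are assumptions, not part of the described system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Permission:
    subsystem: str         # which server-side subsystem the grant covers
    can_modify: bool       # False means read-only, True means modifiable
    valid_from: datetime   # start of the validity period
    valid_until: datetime  # end of the validity period


def is_allowed(permissions: Iterable[Permission], subsystem: str,
               action: str, at: datetime) -> bool:
    """Return True if some grant covers this subsystem, action and time."""
    if action not in ("read", "modify"):
        raise ValueError("action must be 'read' or 'modify'")
    for grant in permissions:
        if grant.subsystem != subsystem:
            continue
        if not (grant.valid_from <= at <= grant.valid_until):
            continue  # grant exists but is outside its validity period
        if action == "read" or grant.can_modify:
            return True
    return False
```

A user with no matching grant is denied by default, which matches the goal of guarding both the system and user privacy.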

The systems and modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above system is described with its functions divided into various modules. Of course, when implementing one or more embodiments of the present description, the functions of each module may be implemented in one or more pieces of software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, etc. The above-described system embodiments are merely illustrative; for example, the division of the modules is only one logical division, and other divisions may be used in practice: a plurality of modules or unit components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. An artificial intelligence interactive system suitable for banks and capable of improving on-site experience sense is applied to a banking business handling terminal and is characterized by comprising a client and a server;

the server side is used for responding to the voice input information of the user for response and configuring the information of the client side and the server side;

the server side comprises:

2. The artificial intelligence interaction system for improving on-site experience suitable for banks according to claim 1, wherein the response library construction subsystem includes:

3. The bank-adapted artificial intelligence interaction system for improving on-site experience according to claim 2, wherein the response subsystem comprises:

4. The artificial intelligence interaction system for improving on-site experience suitable for banks according to claim 1, wherein the server further comprises a security verification subsystem, which comprises:

5. The artificial intelligence interaction system applicable to banks and capable of improving on-site experience sense according to claim 4, wherein the real-time judgment of whether the user is a security identity associated with a bank account or not is performed according to the voice input information of the user, and the specific process includes:

constructing a sound similarity calculation model based on deep learning; the specific process comprises the following steps:

forming the audio feature set B, where n is the number of voice audios in the verification library;

6. The artificial intelligence interaction system applicable to banks and capable of improving experience sense in the field is characterized in that the specific process of feature extraction comprises the following steps:

processing the voice audio signal by utilizing a YIN algorithm;

extracting minimum value points of the signal processed by the YIN algorithm;

extracting Mel frequency cepstrum coefficient of the voice audio;

extracting the texture features of the voice audio;

7. The artificial intelligence interaction system applicable to banks and capable of improving experience sense in the field is characterized in that the specific process of feature extraction comprises the following steps:

constructing a function:

wherein:

transforming a speech audio signal using the function;

extracting minimum value points of the signals after function transformation;

extracting Mel frequency cepstrum coefficient of the voice audio;

extracting the texture features of the voice audio;

8. The artificial intelligence interaction system suitable for banks for improving experience in the field according to claim 1, wherein the server further comprises a data switching subsystem, which comprises:

the scene switching module is used for responding to the demand information and switching a service interaction scene matched with the demand information;

the server side also comprises an interruption subsystem which is used for receiving new user voice input information in real time in the process of outputting the response voice by the digital staff, pausing the output of the current response voice when receiving the new user voice input information, simultaneously generating and sending a trigger command to the ASR subsystem, and carrying out a new round of response on the new user voice input information.

9. The artificial intelligence interaction system for improving on-site experience for banks according to claim 1,

the server side also comprises a dormancy subsystem which is used for monitoring the voice input information of the user in real time and controlling the artificial intelligence interaction system to enter a dormant state when the length of time for which no voice input information of the user is detected exceeds a preset threshold value; during dormancy, if the voice input information of the user is detected, the artificial intelligence interaction system is immediately awakened and restored to its state before dormancy;

the server also comprises a storage query subsystem, which comprises:

the interactive record inquiry module is used for inquiring the information stored by the interactive record storage module;

the server also comprises an authority control subsystem used for controlling the authority of the user for accessing each subsystem of the server.

10. The artificial intelligence interaction system for improving the on-site experience of the bank as claimed in any one of claims 2 to 9, wherein the data information configuration subsystem comprises: