CN116542780A

CN116542780A - Data analysis method, device, equipment and storage medium based on artificial intelligence

Info

Publication number: CN116542780A
Application number: CN202310498118.0A
Authority: CN
Inventors: 李雨洁
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2023-05-05
Filing date: 2023-05-05
Publication date: 2023-08-04

Abstract

The embodiments of the present application belong to the field of artificial intelligence and relate to an artificial intelligence-based data analysis method comprising the following steps: acquiring an initial claim evidence text; performing word segmentation on the initial claim evidence text to obtain a claim evidence text; converting the claim evidence text through a word vector model to generate word vector data; invoking a plurality of text classification models in one-to-one correspondence with a plurality of evidence chain categories; inputting the word vector data into each text classification model to obtain the classification label result output by each model; integrating the classification label results to obtain a target classification label result; and analyzing the target classification label result to generate an evidence integrity result. The application also provides an artificial intelligence-based data analysis device, a computer device and a storage medium. In addition, the application relates to blockchain technology, in which the evidence integrity result may be stored. The method and device improve the efficiency of evidence integrity analysis of claim evidence text.

Description

Data analysis method, device, equipment and storage medium based on artificial intelligence

Technical Field

The present disclosure relates to the field of artificial intelligence technology, and in particular to an artificial intelligence-based data analysis method and apparatus, a computer device, and a storage medium.

Background

With rapid economic and social development, public awareness of insurance keeps growing, and claim settlement, one of the most important touchpoints for customers, is receiving increasing public attention. As the insurance industry continues to digitize, insurance companies rely on digital transformation to keep improving their claim settlement services.

At present, for insurance lines that cover people's life and health, such as life insurance and health insurance, evidence analysis for claims is still immature: the integrity of the evidence is judged entirely by staff manually reviewing the claim materials submitted by customers. This approach incurs high labor cost, and although manual review is flexible and detailed, large volumes of claim materials become backlogged, which lowers review efficiency.

Disclosure of Invention

An object of the embodiments of the present application is to provide an artificial intelligence-based data analysis method and apparatus, a computer device, and a storage medium, so as to solve the technical problem that existing evidence analysis for insurance claims relies entirely on staff manually reviewing the claim materials submitted by customers to judge evidence integrity, which incurs high labor cost and lowers review efficiency.

In order to solve the above technical problems, the embodiments of the present application provide a data analysis method based on artificial intelligence, which adopts the following technical scheme:

acquiring an initial claim evidence text, input by a user, corresponding to a claim case;

performing word segmentation on the initial claim evidence text to obtain a claim evidence text;

converting the claim evidence text through a preset word vector model to generate corresponding word vector data;

invoking a plurality of text classification models in one-to-one correspondence with a plurality of preset evidence chain categories;

inputting the word vector data into each text classification model to obtain the classification label result output by each text classification model, wherein each text classification model is an attention-based bidirectional long short-term memory (Bi-LSTM) network model constructed from the relationship between preset claim evidence texts and preset labels, and consists of a Bi-LSTM layer and an Attention layer;

integrating the classification label results to obtain an integrated target classification label result;

and analyzing the target classification label result to generate an evidence integrity result corresponding to the claim evidence text.

Further, the step of converting the claim evidence text through a preset word vector model to generate corresponding word vector data specifically includes:

preprocessing the claim evidence text to obtain a specified claim evidence text;

determining a target word vector model from a plurality of preset word vector models;

and converting the specified claim evidence text through the target word vector model to generate the corresponding word vector data.

Further, the step of preprocessing the claim evidence text to obtain the specified claim evidence text specifically includes:

removing stop words from the claim evidence text to obtain a first text;

converting the words in the first text based on a preset conversion rule to obtain a second text;

removing special punctuation marks from the second text to obtain a third text;

and taking the third text as the specified claim evidence text.

Further, the step of removing stop words from the claim evidence text to obtain the first text specifically includes:

calling a preset stop word data table;

reading all words in the claim evidence text;

matching all words in the claim evidence text against the words in the stop word data table, and screening out the successfully matched target words;

and deleting the target words from the claim evidence text to obtain the first text.

Further, before the step of invoking the plurality of text classification models in one-to-one correspondence with the plurality of preset evidence chain categories, the method further comprises:

acquiring a pre-collected training text corresponding to a specified evidence chain category, the specified evidence chain category being any one of all the evidence chain categories;

performing word segmentation on the training text to obtain a plurality of training words;

generating, with the word vector model, a training word vector for each training word, and combining the training word vectors into a training text vector;

acquiring a text category label corresponding to the training text;

and training the Bi-LSTM network model with the training text vector as its input and the text category label as the expected output of the text classification model, to obtain a specified text classification model corresponding to the specified evidence chain category.

Further, the step of analyzing the target classification label result to generate an evidence integrity result corresponding to the claim evidence text specifically includes:

acquiring a preset standard label set;

determining whether the target classification label result contains all labels in the standard label set;

if yes, generating a first evidence integrity result indicating that the claim evidence text constitutes complete evidence;

if not, generating a second evidence integrity result indicating that the claim evidence text does not constitute complete evidence.

Further, after the step of analyzing the target classification label result and generating the evidence integrity result corresponding to the claim evidence text, the method further includes:

if the claim evidence text does not constitute complete evidence, matching the target classification label result against all labels in the standard label set, and screening out from the standard label set the target labels that are absent from the integrated classification label result;

generating corresponding missing-evidence warning information based on the target labels;

and displaying the missing-evidence warning information.

In order to solve the above technical problems, the embodiments of the present application further provide an artificial intelligence based data analysis device, which adopts the following technical scheme:

the first acquisition module is configured to acquire an initial claim evidence text, input by a user, corresponding to a claim case;

the first processing module is configured to perform word segmentation on the initial claim evidence text to obtain a claim evidence text;

the second processing module is configured to convert the claim evidence text through a preset word vector model to generate corresponding word vector data;

the calling module is configured to invoke a plurality of text classification models in one-to-one correspondence with a plurality of preset evidence chain categories;

the third processing module is configured to input the word vector data into each text classification model to obtain the classification label result output by each model, wherein each text classification model is an attention-based Bi-LSTM network model constructed from the relationship between preset claim evidence texts and preset labels, and consists of a Bi-LSTM layer and an Attention layer;

the integration module is configured to integrate the classification label results to obtain an integrated target classification label result;

the first generation module is configured to analyze the target classification label result and generate an evidence integrity result corresponding to the claim evidence text.

In order to solve the above technical problems, the embodiments of the present application further provide a computer device, which adopts the following technical schemes:

In order to solve the above technical problems, embodiments of the present application further provide a computer readable storage medium, which adopts the following technical solutions:

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

In the embodiments of the present application, an initial claim evidence text, input by a user and corresponding to a claim case, is first acquired; word segmentation is performed on it to obtain a claim evidence text; the claim evidence text is converted through a preset word vector model to generate word vector data; a plurality of text classification models in one-to-one correspondence with a plurality of preset evidence chain categories are invoked; the word vector data are input into each model to obtain each model's classification label result; the classification label results are integrated into a target classification label result; and finally the target classification label result is analyzed to generate an evidence integrity result for the claim evidence text. By using Bi-LSTM and Attention-based text classification models to analyze the evidence integrity of claim evidence text, the embodiments automate the insurance claim workflow, improve the efficiency of processing claim applications and reviewing claim materials, greatly speed up evidence integrity analysis, improve the customer claim experience, and shorten the claim process.

Drawings

To describe the solutions of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a diagram of an exemplary system architecture to which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of an artificial intelligence based data analysis method according to the present application;

FIG. 3 is a schematic diagram of one embodiment of an artificial intelligence based data analysis device according to the present application;

FIG. 4 is a schematic structural diagram of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description, claims, and drawings of the present application are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the description, claims, or drawings are used to distinguish between different objects and not necessarily to describe a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as web browsers, shopping applications, search applications, instant messaging tools, email clients, and social platform software, may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the data analysis method based on artificial intelligence provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the data analysis device based on artificial intelligence is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow chart of one embodiment of an artificial intelligence based data analysis method according to the present application is shown. The artificial intelligence-based data analysis method comprises the following steps:

Step S201: acquire the initial claim evidence text, input by a user, corresponding to a claim case.

In this embodiment, the electronic device on which the artificial intelligence-based data analysis method runs (e.g., the server/terminal device shown in FIG. 1) may acquire the initial claim evidence text through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G, WiFi, Bluetooth, WiMAX, ZigBee, UWB (ultra wideband), and other wireless connection types now known or developed in the future. An insurance claim generally involves the following evidence chains (evidence chain categories): 1. accident proof evidence, including materials such as accident certificates and disability certificates; 2. medical evidence, including diagnostic certificates, surgical certificates, medical expense receipts, and itemized lists; 3. relationship evidence between the beneficiary and the insured; 4. other claim materials required for a particular insurance product, etc.

Step S202: perform word segmentation on the initial claim evidence text to obtain the claim evidence text.

In this embodiment, a word segmentation tool such as jieba may be used to segment the initial claim evidence text into the claim evidence text.
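The step above names jieba. To keep the example dependency-free, the sketch below swaps in a simpler dictionary-based technique, forward maximum matching, which is not jieba's algorithm; the toy vocabulary and the `fmm_segment` name are hypothetical.

```python
def fmm_segment(text: str, vocab: set, max_len: int = 4) -> list:
    """Segment text by forward maximum matching against a word dictionary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary of claim-related words.
VOCAB = {"医疗", "费用", "医疗费用", "收据", "诊断", "证明"}

print(fmm_segment("医疗费用收据和诊断证明", VOCAB))
# ['医疗费用', '收据', '和', '诊断', '证明']
```

In a real deployment this function would simply be replaced by a call such as `jieba.lcut(text)`, which returns a list of tokens.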

Step S203: convert the claim evidence text through a preset word vector model to generate corresponding word vector data.

In this embodiment, the specific implementation of converting the claim evidence text through the preset word vector model to generate the corresponding word vector data is described in detail in a later embodiment and is not repeated here.

Step S204: invoke a plurality of text classification models in one-to-one correspondence with a plurality of preset evidence chain categories.

In this embodiment, Bi-LSTM+Attention text classification models are used to check the relevance between the claim evidence text and each evidence chain. Because the model considers the forward order and the reverse order of the text as well as a weight for each position in the sequence, it can focus on the important information, which improves model performance. In practical applications, different evidence chain nodes can be set for different claim scenarios, and a text classification model can be trained separately for each node, which makes the scheme easy to extend.
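The one-to-one mapping between evidence chain categories and models can be pictured as a registry keyed by category, where every model receives the same input. In the sketch below, the category names, the keywords and the keyword-matching `keyword_stub` are hypothetical placeholders for trained Bi-LSTM+Attention classifiers, used only so the example runs without a trained network.

```python
from typing import Callable, Dict, List, Optional

# A "model" maps a token list to its category label, or None when the text
# carries no evidence for that category (hypothetical stand-in interface).
Classifier = Callable[[List[str]], Optional[str]]


def keyword_stub(label: str, keywords: set) -> Classifier:
    """Build a placeholder classifier that fires when any keyword appears."""
    def predict(tokens: List[str]) -> Optional[str]:
        return label if any(tok in keywords for tok in tokens) else None
    return predict


# One model per evidence chain category (one-to-one correspondence).
# Category names and keywords are hypothetical.
MODELS: Dict[str, Classifier] = {
    "accident_proof": keyword_stub("accident_proof", {"事故证明", "伤残证明"}),
    "medical": keyword_stub("medical", {"诊断证明", "手术证明", "收据"}),
    "relationship": keyword_stub("relationship", {"户口本", "出生证明"}),
}


def classify_all(tokens: List[str]) -> Dict[str, Optional[str]]:
    """Invoke every per-category model on the same input."""
    return {category: model(tokens) for category, model in MODELS.items()}


print(classify_all(["诊断证明", "收据"]))
# {'accident_proof': None, 'medical': 'medical', 'relationship': None}
```

With this layout, adding an evidence chain node for a new claim scenario amounts to adding one entry to `MODELS`.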

Step S205: input the word vector data into each text classification model to obtain the classification label result output by each text classification model.

In this embodiment, each text classification model is an attention-based Bi-LSTM network model constructed from the relationship between preset claim evidence texts and preset labels, and consists of a Bi-LSTM layer and an Attention layer. Compared with traditional manual review, machine review makes the claim step more efficient and reduces the backlog of claim materials. In the Bi-LSTM+Attention structure, the Bi-LSTM layer uses both preceding and following context to extract text features, and the Attention layer makes the model concentrate on the information most relevant to the output, which improves the quality of the output and the overall model performance. Introducing artificial intelligence in this way changes the traditional insurance claim review mode, enables review in near real time, and substantially improves review efficiency and accuracy.
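A minimal NumPy sketch of the forward pass of one such classifier follows, using random, untrained weights. It assumes an additive attention score (a tanh projection followed by a dot product with a learned vector), since the description does not specify the scoring function; the parameter names, sizes, and the `bilstm_attention` function are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()


def lstm_pass(X, W, U, b):
    """Run one LSTM direction over X (T x d). W: d x 4h, U: h x 4h, b: 4h."""
    h_dim = U.shape[0]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    outputs = []
    for x_t in X:
        z = x_t @ W + h @ U + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)  # T x h


def bilstm_attention(X, params):
    fwd = lstm_pass(X, *params["fwd"])
    bwd = lstm_pass(X[::-1], *params["bwd"])[::-1]  # re-align to forward order
    H = np.concatenate([fwd, bwd], axis=1)          # T x 2h
    scores = np.tanh(H @ params["W_att"]) @ params["v_att"]  # one score per token
    alpha = softmax(scores)                         # attention weights
    context = alpha @ H                             # weighted sum, 2h
    probs = softmax(context @ params["W_out"] + params["b_out"])
    return probs, alpha


def init_params(d, h, n_labels, seed=0):
    rng = np.random.default_rng(seed)

    def direction():
        return (rng.normal(0, 0.1, (d, 4 * h)), rng.normal(0, 0.1, (h, 4 * h)), np.zeros(4 * h))

    return {
        "fwd": direction(),
        "bwd": direction(),
        "W_att": rng.normal(0, 0.1, (2 * h, 2 * h)),
        "v_att": rng.normal(0, 0.1, 2 * h),
        "W_out": rng.normal(0, 0.1, (2 * h, n_labels)),
        "b_out": np.zeros(n_labels),
    }


# Six tokens with 8-dimensional word vectors, two output classes.
X = np.random.default_rng(1).normal(size=(6, 8))
probs, alpha = bilstm_attention(X, init_params(d=8, h=16, n_labels=2))
print(probs, alpha)
```

The returned `alpha` holds one weight per token, which is the weighting that lets the model focus on evidence-bearing words. In practice a framework LSTM layer with `bidirectional=True` (for example PyTorch `torch.nn.LSTM`) would replace the hand-written recurrence.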

Step S206: integrate the classification label results to obtain an integrated target classification label result.

In this embodiment, the target classification label result is composed of all the classification label results, i.e., it gathers the labels output by every text classification model.
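Reading "composed of all the classification label results" as a set union, the integration step can be sketched as follows; the label names and the `integrate_labels` helper are hypothetical.

```python
from typing import Dict, Iterable, Set


def integrate_labels(per_model: Dict[str, Iterable[str]]) -> Set[str]:
    """Merge the label results of all models into one target label set (set union)."""
    target: Set[str] = set()
    for labels in per_model.values():
        target.update(labels)  # duplicates across models collapse automatically
    return target


# Hypothetical per-model outputs; an empty list means no label was predicted.
results = {
    "accident_proof": [],
    "medical": ["diagnosis_certificate", "expense_receipt"],
    "relationship": ["relationship_proof"],
}
print(sorted(integrate_labels(results)))
# ['diagnosis_certificate', 'expense_receipt', 'relationship_proof']
```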

Step S207: analyze the target classification label result to generate the evidence integrity result corresponding to the claim evidence text.

In this embodiment, the specific implementation of this analysis is described in detail in a later embodiment and is not repeated here.

In this embodiment, an initial claim evidence text corresponding to a claim case input by a user is acquired and segmented into a claim evidence text; the claim evidence text is converted into word vector data through a preset word vector model; the word vector data are input into each of the text classification models that correspond one-to-one to the preset evidence chain categories; the resulting classification label results are integrated into a target classification label result; and the target classification label result is analyzed to generate an evidence integrity result for the claim evidence text. Using Bi-LSTM and Attention-based text classification models for evidence integrity analysis automates the insurance claim workflow, improves the efficiency of processing claim applications and checking claim materials, speeds up evidence integrity analysis, improves the customer claim experience, and accelerates the claim process.

In some alternative implementations, step S203 includes the steps of:

Preprocess the claim evidence text to obtain the specified claim evidence text.

In this embodiment, the specific implementation of this preprocessing is described in detail in a later embodiment and is not repeated here.

Determine a target word vector model from a plurality of preset word vector models.

In this embodiment, the word vector models include word2vec, GloVe, fastText, ELMo, GPT, BERT, XLNet, etc. The way the target word vector model is determined is not limited. For example, it may be selected from the available word vector models according to the user's own needs, to improve the user experience; alternatively, analysis reports of the word vector models may be generated in advance through testing, and the model that performs best in the reports may be selected as the target word vector model.

Convert the specified claim evidence text through the target word vector model to generate the corresponding word vector data.

In this embodiment, the target word vector model may be used to convert the segmented specified claim evidence text into word vectors.
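Picking the best-scoring model from an analysis report and then converting tokens with it can be sketched with a static lookup table. The scores, the 4-dimensional table and the zero-vector fallback for unknown words are all assumptions for illustration; contextual models such as ELMo, BERT or XLNet compute vectors from the whole sentence and would not fit a plain lookup.

```python
import numpy as np

# Made-up evaluation scores standing in for the "analysis report" of candidate models.
REPORT = {"word2vec": 0.81, "GloVe": 0.79, "fastText": 0.84}


def pick_target_model(report: dict) -> str:
    """Return the candidate with the highest score in the report."""
    return max(report, key=report.get)


def to_word_vectors(tokens: list, table: dict, dim: int) -> np.ndarray:
    """Stack one vector per token; out-of-vocabulary tokens map to zeros (assumed policy)."""
    rows = [table.get(tok, np.zeros(dim)) for tok in tokens]
    return np.stack(rows) if rows else np.zeros((0, dim))


# Hypothetical 4-dimensional embedding table for the selected model.
TABLE = {
    "医疗费用": np.array([0.1, 0.2, 0.3, 0.4]),
    "收据": np.array([0.5, 0.1, 0.0, 0.2]),
}
print(pick_target_model(REPORT))  # fastText
vectors = to_word_vectors(["医疗费用", "收据", "未知词"], TABLE, dim=4)
print(vectors.shape)  # (3, 4)
```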

In this embodiment, the claim evidence text is preprocessed to obtain the specified claim evidence text; a target word vector model is determined from a plurality of preset word vector models; and the specified claim evidence text is converted through the target word vector model to generate the corresponding word vector data. Converting the specified claim evidence text with the target word vector model is fast and accurately produces word vector data that meet the input requirements of the text classification models, which reduces the processing workload of those models and improves their classification efficiency.

In some optional implementations of this embodiment, preprocessing the claim evidence text to obtain the specified claim evidence text includes the following steps:

Remove stop words from the claim evidence text to obtain a first text.

In this embodiment, the specific implementation of stop word removal is described in detail in a later embodiment and is not repeated here.

Convert the words in the first text based on a preset conversion rule to obtain a second text.

In this embodiment, the conversion rule may include at least traditional-to-simplified Chinese conversion, case conversion, and the like.

Remove special punctuation marks from the second text to obtain a third text.

In this embodiment, the special punctuation mark may be preset according to the actual service usage requirement.

And taking the third text as the specified claim evidence text.

In this embodiment, stop words are removed from the claim evidence text to obtain a first text; the words in the first text are converted based on a preset conversion rule to obtain a second text; special punctuation marks are removed from the second text to obtain a third text; and the third text is taken as the specified claim evidence text. Applying stop word removal, word conversion, and special punctuation removal to the claim evidence text quickly and accurately produces a specified claim evidence text that meets the input requirements of the word vector model, which reduces the model's processing workload and improves its conversion efficiency.
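The conversion and punctuation steps can be sketched as two small functions over a token list that has already had its stop words removed. The character map, which covers only five characters, and the special punctuation set are hypothetical; a real system would use a complete traditional-to-simplified table and a business-defined symbol list.

```python
# Hypothetical, deliberately tiny traditional -> simplified character map.
T2S = str.maketrans({"醫": "医", "療": "疗", "證": "证", "費": "费", "據": "据"})

# Hypothetical "special punctuation" set, configured per business need.
SPECIAL_PUNCT = set("#*~^|【】★")


def apply_conversion_rules(first_text: list) -> list:
    """Second text: traditional->simplified conversion, then lowercase."""
    return [tok.translate(T2S).lower() for tok in first_text]


def remove_special_punct(second_text: list) -> list:
    """Third text: strip special punctuation inside tokens, drop tokens left empty."""
    cleaned = ("".join(ch for ch in tok if ch not in SPECIAL_PUNCT) for tok in second_text)
    return [tok for tok in cleaned if tok]


first_text = ["醫療", "費用", "收據", "CT", "★", "#001"]
third_text = remove_special_punct(apply_conversion_rules(first_text))
print(third_text)  # ['医疗', '费用', '收据', 'ct', '001']
```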

In some optional implementations, removing stop words from the claim evidence text to obtain the first text includes the following steps:

Call a preset stop word data table.

In this embodiment, the stop word data table may be a data table, created in advance according to actual usage requirements, that stores stop words.

Read all words in the claim evidence text.

Match all words in the claim evidence text against the words in the stop word data table, and screen out the successfully matched target words.

In this embodiment, a successfully matched target word is a word in the claim evidence text that is identical to a word contained in the stop word data table.

Delete the target words from the claim evidence text to obtain the first text.

In this embodiment, a preset stop word data table is called; all words in the claim evidence text are read and matched against the words in the table, and the successfully matched target words are screened out; and the target words are deleted from the claim evidence text to obtain the first text. Removing stop words with the stop word data table quickly and accurately produces a first text that contains no stop words.
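Treating the stop word data table as an in-memory set, the matching and deletion steps reduce to membership tests, as in this sketch; the table contents and the function names are hypothetical.

```python
def load_stop_word_table(lines):
    """Build the stop word data table (a set) from one-word-per-line text."""
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens, table):
    """Return (first_text, matched target words) for a segmented claim evidence text."""
    targets = [tok for tok in tokens if tok in table]         # successfully matched words
    first_text = [tok for tok in tokens if tok not in table]  # target words deleted
    return first_text, targets


# Hypothetical table contents; a real table would be loaded from a file.
STOP_TABLE = load_stop_word_table(["的", "和", "了", ""])
first_text, targets = remove_stop_words(["医疗", "费用", "的", "收据", "和", "诊断"], STOP_TABLE)
print(first_text)  # ['医疗', '费用', '收据', '诊断']
print(targets)     # ['的', '和']
```

Using a set makes each lookup constant-time on average, so the cost grows linearly with the length of the claim evidence text.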

In some alternative implementations, before step S204, the electronic device may further perform the following steps:

Acquire a pre-collected training text corresponding to a specified evidence chain category.

In this embodiment, the specified evidence chain category is any one of all the evidence chain categories, and the training text is text data collected in advance for that category.

Perform word segmentation on the training text to obtain a plurality of training words.

In this embodiment, a word segmentation tool such as jieba may be used to segment the training text.

Generate, with the word vector model, a training word vector for each training word, and combine the training word vectors into a training text vector.

In this embodiment, the training text vector is composed of all the training word vectors.

Acquire the text category label corresponding to the training text.

In this embodiment, the training text may be labeled in advance according to actual business requirements, by manual labeling, machine labeling, or a combination of both, so that the labeled category can serve as the expected output category for training the text classification model.
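Because one evidence text may carry labels for several evidence chain nodes while each model is trained for a single node, one way to derive each model's expected output, assumed here rather than taken from the description, is to encode the annotations as a multi-hot matrix and give each model its own column (the binary relevance approach to multi-label learning). The category names are hypothetical.

```python
# Hypothetical evidence chain node names, in a fixed column order.
CATEGORIES = ["accident_proof", "medical", "relationship"]


def multi_hot(annotations, categories=CATEGORIES):
    """One row per training text, one 0/1 column per evidence chain category."""
    return [[1 if c in labels else 0 for c in categories] for labels in annotations]


def targets_for(category, annotations, categories=CATEGORIES):
    """Expected outputs for one category's model: its column of the multi-hot matrix."""
    col = categories.index(category)
    return [row[col] for row in multi_hot(annotations, categories)]


# Three hypothetical annotated training texts.
annotations = [{"medical"}, {"accident_proof", "medical"}, set()]
print(multi_hot(annotations))               # [[0, 1, 0], [1, 1, 0], [0, 0, 0]]
print(targets_for("medical", annotations))  # [1, 1, 0]
```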

Train the Bi-LSTM network model with the training text vector as its input and the text category label as the expected output of the text classification model, to obtain the specified text classification model corresponding to the specified evidence chain category.

In this embodiment, judging whether the evidence chain is complete means judging whether each evidence chain node has corresponding evidence, which turns the problem into computing the correlation between each evidence chain node and the evidence text. The materials required for a claim, such as accident proof evidence, medical evidence, and relationship evidence, serve as label nodes. Because one evidence text may contain several types of information and thus belong to several evidence chain nodes, multi-label learning is required. Training a text classification model may proceed as follows: the training text vector is input into the Bi-LSTM layer, which processes the text in both forward and reverse order to capture more complete semantics; the Bi-LSTM output is fed into the Attention layer, which computes a weighted sum of the sequence elements with weights reflecting the importance of each element; and the final classification label result is obtained through a softmax function. To train the model quickly, it is optimized with the Adam method. Plain (batch) gradient descent computes the gradient over the whole data set at every iteration, which is expensive for large data sets; Adam instead updates the parameters from mini-batch gradients with per-parameter adaptive step sizes, so training is faster. A separate text classification model is trained for each evidence chain node following these steps.
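For reference, the Adam update rule is sketched below in NumPy on a toy quadratic objective rather than on the classifier itself. The default hyperparameters are the commonly used values, and the `Adam` class is a hypothetical stand-in for a framework optimizer such as `torch.optim.Adam`.

```python
import numpy as np


class Adam:
    """Adam: bias-corrected moving averages of the gradient and the squared
    gradient give each parameter its own adaptive step size."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimate
        self.v = None  # second-moment (uncentered variance) estimate
        self.t = 0     # step counter used for bias correction

    def step(self, theta, grad):
        if self.m is None:
            self.m = np.zeros_like(theta)
            self.v = np.zeros_like(theta)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Toy usage: minimise ||theta - target||^2, whose gradient is 2 * (theta - target).
target = np.array([1.0, 2.0])
theta = np.array([5.0, -3.0])
opt = Adam(lr=0.1)
for _ in range(1000):
    theta = opt.step(theta, 2 * (theta - target))
print(theta)  # close to [1. 2.]
```

In mini-batch training, `grad` would be the gradient of the loss on one batch rather than on the full training set.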

Specifically, a training text that was collected in advance and corresponds to a specified evidence chain category is obtained. Word segmentation is then performed on the training text to obtain a plurality of training words. The word vector model generates a training word vector for each training word, and these training word vectors are combined into a training text vector. A text category label corresponding to the training text is then obtained. Finally, the bidirectional long short-term memory network model is trained with the training text vector as its input and the text category label as the expected output of the text classification model, yielding the specified text classification model corresponding to the specified evidence chain category. This training process allows the text classification models for the various evidence chain categories to be built quickly, and it effectively improves their accuracy.
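Since one model is trained per evidence chain category, each category needs its own training set. The following sketch shows one possible way to build it. Assumptions: the training texts are already segmented into words; the word vector model is reduced to a plain word-to-vector dictionary; unknown words map to a zero vector; and the word vectors are stacked in order, because a Bi-LSTM consumes a sequence. The label for the specified category is 1 if the text carries that category and 0 otherwise. All of these choices and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def text_to_sequence(words: Sequence[str], embeddings: Dict[str, Vector],
                     dim: int) -> List[Vector]:
    """Stack word vectors in word order; unknown words become zero vectors."""
    zero = [0.0] * dim
    return [list(embeddings.get(w, zero)) for w in words]


def build_training_set(samples: List[Tuple[List[str], Set[str]]], category: str,
                       embeddings: Dict[str, Vector],
                       dim: int) -> List[Tuple[List[Vector], int]]:
    """Build (input sequence, 0/1 label) pairs for one evidence chain category."""
    dataset = []
    for words, labels in samples:
        x = text_to_sequence(words, embeddings, dim)
        y = 1 if category in labels else 0  # one-vs-rest target for this category
        dataset.append((x, y))
    return dataset
```

Calling `build_training_set` once per category yields the separate data sets from which the per-category classifiers would be trained.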

In some alternative implementations of the present embodiment, step S207 includes the steps of:

Acquiring a preset standard label set.

In this embodiment, the standard label set is constructed according to actual business requirements and contains at least labels for accident proof evidence (accident proof, disability proof, etc.), medical evidence (diagnosis proof, surgery proof, medical expense receipts, expense lists, etc.) and relationship evidence.

Judging whether the target classification label result contains all labels in the standard label set.

If so, generating a first evidence integrity result indicating that the claim evidence text constitutes complete evidence.

In this embodiment, if the target classification label result includes all labels in the standard label set, every evidence chain node of the claim evidence text is covered, that is, no evidence chain node is missing.

If not, generating a second evidence integrity result indicating that the claim evidence text does not constitute complete evidence.

In this embodiment, if the target classification label result does not include all labels in the standard label set, at least one evidence chain node is not covered, that is, the claim evidence text has a missing evidence chain node.

In summary, a preset standard label set is obtained, and the method judges whether the target classification label result contains all labels in that set. If so, a first evidence integrity result indicating that the claim evidence text constitutes complete evidence is generated. If not, a second evidence integrity result indicating that it does not constitute complete evidence is generated. By analyzing the target classification label result against the standard label set, the evidence integrity result corresponding to the claim evidence text can be generated automatically and accurately, which improves both the efficiency and the accuracy of generating evidence integrity results.
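The completeness judgment reduces to a subset test. The following is a minimal sketch, assuming labels are plain strings and encoding the first and second evidence integrity results as the hypothetical values `"complete"` and `"incomplete"`:

```python
from typing import Iterable, Set


def evidence_integrity(target_labels: Iterable[str], standard_labels: Set[str]) -> str:
    """Return the first result if every standard label is present, else the second."""
    if set(standard_labels).issubset(target_labels):
        return "complete"    # first evidence integrity result
    return "incomplete"      # second evidence integrity result
```

Extra labels in the target result do not affect the outcome. Only the absence of a standard label makes the evidence incomplete.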

In some optional implementations of this embodiment, after step S207, the electronic device may further perform the following steps:

If the claim evidence text does not constitute complete evidence, matching the target classification label result against all labels in the standard label set, and screening out from the standard label set the target labels that are absent from the integrated classification label result.

In this embodiment, the target labels screened out from the standard label set (those absent from the integrated classification label result) correspond to the evidence chain nodes that are missing from the claim evidence text.
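Screening the target labels is a set difference between the standard label set and the integrated result. In the sketch below, the standard labels are passed as an ordered sequence so that the missing labels come out in a stable, predictable order. That ordering choice, and the label names in the test, are assumptions.

```python
from typing import Iterable, List, Sequence


def missing_labels(target_labels: Iterable[str], standard_labels: Sequence[str]) -> List[str]:
    """Return the standard labels that do not appear in the integrated result."""
    present = set(target_labels)
    return [label for label in standard_labels if label not in present]
```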

Generating corresponding missing early warning information based on the target labels.

In this embodiment, a preset information template may be obtained, and the target labels are then filled into the template to generate the missing early warning information. The information template is written according to actual usage requirements.
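The template-filling step can be sketched with Python's standard `string.Template`. The template wording, the `case_id` placeholder, and the comma-joined rendering of the target labels are hypothetical; the patent states only that the template is written according to actual usage requirements.

```python
from string import Template
from typing import List

# Hypothetical warning template; the real wording is defined by the business.
WARNING_TEMPLATE = Template(
    "Claim $case_id is missing evidence: $missing. Please upload the related materials."
)


def build_missing_warning(case_id: str, target_labels: List[str]) -> str:
    """Fill the screened target labels into the preset information template."""
    return WARNING_TEMPLATE.substitute(case_id=case_id, missing=", ".join(target_labels))
```

For example, `build_missing_warning("C001", ["accident_proof", "relationship"])` produces `"Claim C001 is missing evidence: accident_proof, relationship. Please upload the related materials."`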

Displaying the missing early warning information.

In this embodiment, the display mode of the missing early warning information is not limited, and text display, picture display, voice display and other modes can be adopted.

When the claim evidence text is detected not to constitute complete evidence, the target classification label result is matched against all labels in the standard label set, and the target labels absent from the integrated classification label result are screened out of the standard label set. Corresponding missing early warning information is then generated from the target labels and displayed. In this way, the missing target labels are identified automatically and the customer can be promptly notified to upload the missing materials. This effectively reduces the time needed to review claim evidence materials and improves the efficiency of insurance claim settlement.

It is emphasized that to further guarantee the privacy and security of the evidence integrity results, the evidence integrity results may also be stored in nodes of a blockchain.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
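The cryptographic linking described above can be illustrated with a minimal hash chain that stores evidence integrity results. This is only a sketch of the linking idea. It has no distribution, peer-to-peer transport, or consensus, and the block layout (`data`, `prev_hash`, `hash`) and SHA-256 over sorted JSON are assumptions, not the platform the application refers to.

```python
import hashlib
import json
from typing import Any, Dict, List


def _digest(data: Dict[str, Any], prev_hash: str) -> str:
    """SHA-256 over the block body, serialised deterministically."""
    body = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_block(data: Dict[str, Any], prev_hash: str) -> Dict[str, Any]:
    """Create a block whose hash covers its data and the previous block's hash."""
    return {"data": data, "prev_hash": prev_hash, "hash": _digest(data, prev_hash)}


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute each hash and check that every block points at its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != _digest(block["data"], block["prev_hash"]):
            return False  # block contents were altered after creation
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True
```

Altering any stored evidence integrity result changes that block's recomputed hash, so `verify_chain` detects the tampering.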

The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by computer-readable instructions stored in a computer-readable storage medium. When executed, these instructions may carry out the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an artificial intelligence-based data analysis apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus is particularly applicable to various electronic devices.

As shown in fig. 3, the artificial intelligence based data analysis apparatus 300 according to the present embodiment includes: a first acquisition module 301, a first processing module 302, a second processing module 303, a calling module 304, a third processing module 305, an integration module 306, and a first generation module 307. Wherein:

the first obtaining module 301 is configured to obtain an initial claim evidence text corresponding to a claim case input by a user;

the first processing module 302 is configured to perform word segmentation processing on the initial claim evidence text to obtain a claim evidence text;

the second processing module 303 is configured to perform conversion processing on the claim evidence text through a preset word vector model, and generate corresponding word vector data;

the calling module 304 is configured to call a plurality of text classification models in one-to-one correspondence with a plurality of preset evidence chain categories;

the third processing module 305 is configured to input the word vector data into each text classification model and obtain the classification label result output by each text classification model; each text classification model is an attention-based bidirectional long short-term memory network model constructed from the relationship between preset claim evidence texts and preset labels, and consists of a Bi-LSTM layer and an Attention layer;

the integrating module 306 is configured to integrate the classification label results to obtain an integrated target classification label result;

the first generating module 307 is configured to analyze the target classification label result and generate an evidence integrity result corresponding to the claim evidence text.
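The calling, classification and integration modules can be sketched together. The sketch makes several assumptions: each per-category model is any callable that returns the probability that its label applies; a label is accepted at a hypothetical threshold of 0.5; and integration means collecting the accepted labels into one sorted list. The patent does not specify the integration rule, so this is illustrative only.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical model interface: word vector data in, P(label applies) out.
Classifier = Callable[[Sequence[Vector]], float]


def integrate_labels(word_vectors: Sequence[Vector], models: Dict[str, Classifier],
                     threshold: float = 0.5) -> List[str]:
    """Run every per-category model on the same input and keep the accepted labels."""
    return sorted(category for category, model in models.items()
                  if model(word_vectors) >= threshold)
```

The returned list plays the role of the integrated target classification label result that the first generating module then analyzes.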

In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence-based data analysis method in the foregoing embodiment one by one, and are not described herein again.

In some alternative implementations of the present embodiment, the second processing module 303 includes:

the first processing sub-module is used for preprocessing the claim evidence text to obtain a specified claim evidence text;

the determining submodule is used for determining a target word vector model from a plurality of preset word vector models;

and the second processing sub-module is used for converting the specified claim evidence text through the target word vector model to generate corresponding word vector data.
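The patent does not say how the target word vector model is chosen from the preset candidates. The following sketch uses a hypothetical criterion: it picks the model whose vocabulary covers the most tokens of the specified claim evidence text. Each model is also simplified to a word-to-vector dictionary, and out-of-vocabulary tokens become zero vectors. All three choices are assumptions for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]
WordVectorModel = Dict[str, Vector]  # hypothetical: a model reduced to a lookup table


def select_target_model(models: Dict[str, WordVectorModel], tokens: Sequence[str]) -> str:
    """Pick the model whose vocabulary covers the most tokens (illustrative criterion)."""
    return max(models, key=lambda name: sum(tok in models[name] for tok in tokens))


def convert(tokens: Sequence[str], model: WordVectorModel, dim: int) -> List[Vector]:
    """Map each token to its vector; out-of-vocabulary tokens become zero vectors."""
    zero = [0.0] * dim
    return [list(model.get(tok, zero)) for tok in tokens]
```

In a real system each candidate would typically be a trained embedding model (for example word2vec or GloVe vectors) rather than a small dictionary.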

In some optional implementations of this embodiment, the first processing submodule includes:

the first processing unit is used for removing stop words from the claim evidence text to obtain a first text;

the second processing unit is used for carrying out conversion processing on words in the first text based on a preset conversion rule to obtain a second text;

the third processing unit is used for removing special punctuation marks from the second text to obtain a third text;

and the determining unit is used for taking the third text as the specified claim evidence text.
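The second and third preprocessing stages can be sketched as follows, starting from the first text as a list of words. The patent specifies neither the "preset conversion rule" nor the set of "special punctuation marks". This sketch assumes the conversion is NFKC normalization (full-width to half-width) plus lowercasing and an optional synonym map, and that the special punctuation is ASCII punctuation plus common Chinese marks.

```python
import string
import unicodedata
from typing import Dict, List, Optional, Sequence

# Hypothetical "special punctuation" set: ASCII punctuation plus common Chinese marks.
SPECIAL_PUNCTUATION = set(string.punctuation + "，。！？、；：“”‘’（）《》【】…")


def apply_conversion_rule(tokens: Sequence[str],
                          synonyms: Optional[Dict[str, str]] = None) -> List[str]:
    """Second text: NFKC-normalise (full-width -> half-width), lowercase, map synonyms."""
    synonyms = synonyms or {}
    converted = []
    for tok in tokens:
        norm = unicodedata.normalize("NFKC", tok).lower()
        converted.append(synonyms.get(norm, norm))
    return converted


def remove_special_punctuation(tokens: Sequence[str]) -> List[str]:
    """Third text: strip punctuation characters and drop tokens left empty."""
    cleaned = ("".join(ch for ch in tok if ch not in SPECIAL_PUNCTUATION) for tok in tokens)
    return [tok for tok in cleaned if tok]


def to_specified_text(first_text: Sequence[str],
                      synonyms: Optional[Dict[str, str]] = None) -> List[str]:
    """Chain the two stages; the result is used as the specified claim evidence text."""
    return remove_special_punctuation(apply_conversion_rule(first_text, synonyms))
```

For example, `to_specified_text(["ＣＴ", "报告", "，", "费用!"])` yields `["ct", "报告", "费用"]`: the full-width letters are normalized and lowercased, and the punctuation is removed.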

In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the data analysis method based on artificial intelligence in the foregoing embodiment one by one, which is not described herein again.

In some optional implementations of the present embodiment, the first processing unit includes:

the calling subunit is used for calling a preset stop word data table;

a reading subunit, configured to read all words in the claim evidence text;

the processing subunit is used for matching all words in the claim evidence text against the words in the stop word data table and screening out the successfully matched target words;

and the deleting subunit is used for deleting the target words from the claim evidence text to obtain the first text.
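The four subunits above (call the table, read the words, match, delete) map onto a few lines of Python. The sketch assumes the text has already been segmented into words, and it uses a small hypothetical stop-word table in place of a maintained data table.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical stop-word data table; a real system would load a maintained list.
DEFAULT_STOP_WORDS: Set[str] = {"的", "了", "和", "是", "在"}


def remove_stop_words(words: Iterable[str],
                      stop_word_table: Optional[Set[str]] = None) -> List[str]:
    """Return the first text: the claim evidence words minus any stop words."""
    table = DEFAULT_STOP_WORDS if stop_word_table is None else stop_word_table  # call the table
    all_words = list(words)                                  # read all words
    targets = {w for w in all_words if w in table}           # successfully matched target words
    return [w for w in all_words if w not in targets]        # delete the target words
```

A set lookup keeps matching linear in the number of words, regardless of how large the stop-word table grows.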

In some optional implementations of the present embodiments, the artificial intelligence based data analysis apparatus further includes:

the second acquisition module is used for obtaining a training text that was collected in advance and corresponds to a specified evidence chain category; the specified evidence chain category is any one of all the evidence chain categories;

the fourth processing module is used for carrying out word segmentation processing on the training text to obtain a plurality of training words;

the second generation module is used for generating, with the word vector model, a training word vector for each training word and combining the training word vectors into a training text vector;

the third acquisition module is used for acquiring text category labels corresponding to the training texts;

and the training module is used for training the bidirectional long short-term memory network model, with the training text vector as its input and the text category label as the expected output of the text classification model, to obtain a specified text classification model corresponding to the specified evidence chain category.

In some optional implementations of the present embodiment, the first generating module 307 includes:

the acquisition sub-module is used for acquiring a preset standard label set;

the judging submodule is used for judging whether the target classification label result contains all labels in the standard label set or not;

the first generation sub-module is used for generating, if the target classification label result contains all labels in the standard label set, a first evidence integrity result indicating that the claim evidence text constitutes complete evidence;

the second generation sub-module is used for generating, if not, a second evidence integrity result indicating that the claim evidence text does not constitute complete evidence;

the screening module is used for matching the target classification label result against all labels in the standard label set if the claim evidence text does not constitute complete evidence, and screening out from the standard label set the target labels that are absent from the integrated classification label result;

the third generation module is used for generating corresponding missing early warning information based on the target labels;

and the display module is used for displaying the missing early warning information.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42 and a network interface 43, communicatively connected to each other via a system bus. It should be noted that the figure shows only the computer device 4 with components 41-43. Not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to predetermined or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk or an optical disk. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the computer device 4. The memory 41 may also comprise both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is typically used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the artificial intelligence-based data analysis method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as executing computer readable instructions of the artificial intelligence based data analysis method.

The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.

The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of an artificial intelligence-based data analysis method as described above.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware alone; in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk) and comprises several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.

Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application but do not limit its patent scope. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and completely, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or substitute equivalents for some of their elements. Any equivalent structure made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of the present application.

Claims

1. A data analysis method based on artificial intelligence, comprising the steps of:

2. The artificial intelligence-based data analysis method according to claim 1, wherein the step of converting the claim evidence text by a preset word vector model to generate corresponding word vector data specifically comprises:

3. The artificial intelligence-based data analysis method according to claim 2, wherein the step of preprocessing the claim evidence text to obtain the specified claim evidence text comprises:

removing stop words from the claim evidence text to obtain a first text;

and taking the third text as the specified claim evidence text.

4. The artificial intelligence-based data analysis method according to claim 3, wherein the step of removing stop words from the claim evidence text to obtain a first text comprises:

calling a preset stop word data table;

reading all words in the claim evidence text;

5. The artificial intelligence-based data analysis method according to claim 1, further comprising, before the step of calling a plurality of text classification models in one-to-one correspondence with a plurality of preset evidence chain categories:

acquiring a text category label corresponding to the training text;

6. The method of claim 1, wherein the step of analyzing the target classification label result to generate an evidence integrity result corresponding to the claim evidence text specifically comprises:

acquiring a preset standard label set;

7. The artificial intelligence based data analysis method of claim 6, further comprising, after the step of analyzing the target classification label result to generate an evidence integrity result corresponding to the claim evidence text:

and displaying the missing early warning information.

8. An artificial intelligence based data analysis device comprising:

9. A computer device comprising a memory having stored therein computer readable instructions which when executed implement the steps of the artificial intelligence based data analysis method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based data analysis method of any of claims 1 to 7.