CN114780602A - Data tracing analysis method and device, computer equipment and storage medium - Google Patents


Info

Publication number: CN114780602A
Authority: CN (China)
Legal status: Pending
Application number: CN202210144943.6A
Other languages: Chinese (zh)
Inventor: 王海平
Current assignee: Ping An Life Insurance Company of China Ltd
Original assignee: Ping An Life Insurance Company of China Ltd


Classifications

All classes fall under G06F (section G, Physics; class G06, Computing; calculating or counting; subclass G06F, Electric digital data processing).

    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/243 Natural language query formulation
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/24155 Bayesian classification
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of artificial intelligence and discloses a data tracing analysis method and apparatus, a computer device, and a storage medium. The method comprises: acquiring data to be processed; performing feature extraction on the policy data in the data to be processed by using an extraction model to obtain standardized data; tracing the source of the resource data in the data to be processed according to the standardized data by using a tracing model to obtain a detailed report; and integrating the data in the detailed report according to a first preset condition to obtain a target report. The application also relates to blockchain technology: the detailed report and the target report are stored in a blockchain. The application enables rapid and clear tracing of resource data.

Description

Data tracing analysis method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a data tracing analysis method and apparatus, a computer device, and a storage medium.
Background
At present, large companies sell many products, often dozens or even hundreds, and products are frequently sold in bundles, while the cost of selling them is generally settled by main product or by company. As a result, it is difficult to obtain the profit of each individual product directly. Data apportionment is currently carried out by computer, mainly processed and viewed with Oracle PL/SQL. However, as the original data becomes increasingly fine-grained, Oracle PL/SQL can only check one class or item at a time, mechanically, and as the volume of original data keeps growing, processing and viewing with PL/SQL becomes slower and slower. How to trace the source of data quickly and clearly has therefore become an urgent problem.
Disclosure of Invention
The application provides a data tracing analysis method and apparatus, a computer device, and a storage medium, aiming to solve the prior-art problem of how to trace the source of data quickly and clearly.
In order to solve the above problem, the present application provides a data tracing analysis method, including:
acquiring data to be processed;
carrying out feature extraction on policy data in the data to be processed by utilizing an extraction model to obtain standardized data;
tracing the resource data in the data to be processed according to the standardized data by using a tracing model to obtain a detailed report;
and integrating the data in the detailed report according to a first preset condition to obtain a target report.
Further, before the performing feature extraction on the policy data in the data to be processed by using an extraction model, the method further includes:
identifying abnormal values in the policy data by using a second preset condition, and extracting text data of positions where the abnormal values are located;
performing semantic extraction on the text data through a semantic recognition model to obtain an extraction result;
and correcting the abnormal value by using the extraction result.
Further, the extracting the features of the policy data in the data to be processed by using the extraction model comprises:
and performing feature extraction on the policy data through a preset regular expression in the extraction model to obtain standardized data.
Further, the obtaining of standardized data by performing feature extraction on policy data in the data to be processed by using an extraction model includes:
respectively inputting each paragraph text in the policy data into an extraction model for feature extraction to obtain key data corresponding to each paragraph, wherein the extraction model is obtained based on LDA model training;
and combining the key data corresponding to each paragraph in the policy data to obtain the standardized data.
Further, after the obtaining of the standardized data by performing feature extraction on the policy data in the data to be processed by using the extraction model, the method further includes:
acquiring a standard data type corresponding to the data based on the data type of each data in the standardized data;
comparing the data type corresponding to the data with the standard data type, and judging whether the data type is consistent with the standard data type;
and if they are inconsistent, converting the data by using a conversion algorithm.
Further, the tracing the resource data in the data to be processed according to the standardized data by using the tracing model includes:
and based on the tracing model, performing multi-dimensional tracing on the resource data by using the standardized data to realize subdivision of the resource data.
Further, the integrating the data in the detail report according to the first preset condition includes:
receiving an execution instruction from a front end;
extracting the corresponding first preset condition according to the execution instruction;
and integrating the data in the detail report according to the first preset condition.
In order to solve the above problem, the present application further provides a data tracing analysis apparatus, including:
the acquisition module is used for acquiring data to be processed;
the extraction module is used for extracting the characteristics of policy data in the data to be processed by utilizing an extraction model to obtain standardized data;
the source tracing module is used for tracing the source of the resource data in the data to be processed according to the standardized data by using a source tracing model to obtain a detailed report;
and the integration module is used for integrating the data in the detailed report according to a first preset condition to obtain a target report.
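The four modules of the apparatus can be pictured as a simple pipeline in which the output of each module feeds the next. The following Python sketch is a hypothetical illustration only: the class name `DataTracingApparatus`, the record fields, and the toy stand-in functions are assumptions, not the applicant's implementation, and the real extraction and tracing models would replace the stand-ins.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class DataTracingApparatus:
    """Hypothetical wiring of the four modules; each module is an injected callable."""

    acquire: Callable[[], Dict[str, List[Record]]]                # acquisition module
    extract: Callable[[List[Record]], List[Record]]               # extraction module
    trace: Callable[[List[Record], List[Record]], List[Record]]   # source tracing module
    integrate: Callable[[List[Record], str], List[Record]]        # integration module

    def run(self, condition: str) -> List[Record]:
        data = self.acquire()
        standardized = self.extract(data["policy"])
        detail_report = self.trace(data["resource"], standardized)
        return self.integrate(detail_report, condition)


# Toy stand-ins for the real models, for illustration only.
def acquire():
    return {
        "policy": [{"policy_no": "P1", "channel": "Agency"},
                   {"policy_no": "P2", "channel": "Bank"}],
        "resource": [{"policy_no": "P1", "amount": 10.0},
                     {"policy_no": "P2", "amount": 5.0}],
    }


def extract(policies):
    # Keep only the fields used as tracing factors.
    return [{"policy_no": p["policy_no"], "channel": p["channel"]} for p in policies]


def trace(resources, standardized):
    # Attach each resource amount to its policy's standardized attributes.
    by_policy = {s["policy_no"]: s for s in standardized}
    return [{**by_policy[r["policy_no"]], "amount": r["amount"]} for r in resources]


def integrate(detail_report, condition):
    # Sum amounts per value of the field named by the preset condition.
    totals = defaultdict(float)
    for row in detail_report:
        totals[row[condition]] += row["amount"]
    return [{condition: key, "amount": value} for key, value in sorted(totals.items())]


apparatus = DataTracingApparatus(acquire, extract, trace, integrate)
print(apparatus.run("channel"))
```

Injecting each module as a callable keeps the pipeline order fixed while allowing, for example, the regular-expression or LDA-based extraction model to be swapped in without touching the other modules.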
In order to solve the above problem, the present application also provides a computer device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data traceability analysis method as described above.
In order to solve the above problem, the present application further provides a non-volatile computer readable storage medium, on which computer readable instructions are stored, and when the computer readable instructions are executed by a processor, the data traceability analysis method as described above is implemented.
Compared with the prior art, the data traceability analysis method, the data traceability analysis device, the computer equipment and the storage medium provided by the embodiment of the application have at least the following beneficial effects:
the data to be processed is acquired, and feature extraction is performed on the policy data in it by using an extraction model to obtain standardized data, which improves the efficiency of the subsequent processing steps. A tracing model then traces the source of the resource data in the data to be processed according to the standardized data to obtain a detailed report containing refined data of the resource data in multiple dimensions. Finally, the data in the detailed report is integrated according to a first preset condition to obtain a target report, so that the refined data can be combined as required and the resource data can be traced quickly and clearly.
Drawings
To illustrate the solution of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a data tracing analysis method according to an embodiment of the present application;
Fig. 2 is a flowchart of one embodiment of step S2 in Fig. 1;
Fig. 3 is a schematic block diagram of a data tracing analysis apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. One skilled in the art will explicitly or implicitly appreciate that the embodiments described herein can be combined with other embodiments.
The application provides a data tracing analysis method. Referring to Fig. 1, Fig. 1 is a schematic flowchart of a data tracing analysis method according to an embodiment of the present application.
In this embodiment, the data tracing analysis method includes:
S1, acquiring data to be processed;
Specifically, the data to be processed includes policy data and resource data. The policy data is acquired from a business system and the resource data from a financial system. Both are pulled from these systems in real time or on a schedule and stored in a database, such as a Hive database, from which they are read directly in subsequent steps.
The policy data specifically includes the policy number, insurance type, premium, organization number, customer number, channel number, the insured person's details, and similar data.
The resource data includes commission (per insurance type), insurance fees, claim settlements, business expenses, and the like.
S2, performing feature extraction on policy data in the data to be processed by using an extraction model to obtain standardized data;
Specifically, feature extraction is performed on the policy data in the data to be processed by using regular expressions in the extraction model, and the results are integrated to obtain standardized data; or alternatively,
feature extraction is performed separately on the text of each paragraph in the policy data by using an extraction model trained from an LDA model to obtain the key data of each paragraph, and the key data is integrated to obtain the standardized data.
The standardized data includes the policy number, insurance type, purchase channel, affiliated organization, and the like, and is subsequently used as the tracing factor for processing the resource data.
Further, before the performing feature extraction on the policy data in the data to be processed by using an extraction model, the method further includes:
identifying abnormal values in the policy data by using a second preset condition, and extracting text data of positions of the abnormal values;
performing semantic extraction on the text data through a semantic recognition model to obtain an extraction result;
and correcting the abnormal value by using the extraction result.
Specifically, abnormal values in the policy data are identified by using a second preset condition. For example, the second preset condition limits the insured person's age to 0-100 years. When the insured person's details in the policy data are checked against this condition and an age of 130 years is found, the text data of the paragraph containing that age, or of its adjacent paragraphs, is extracted (the number of extracted paragraphs can be adjusted as required), and semantic extraction is performed on this text by the semantic recognition model to obtain the extraction result.
When the insured person's details are recorded, family information (birth year and month and identity numbers) is usually recorded as well. The family information is semantically recognized to obtain the ages of the insured person's parents and/or children, from which the insured person's age is inferred, and the inferred age replaces the corresponding abnormal value; or alternatively,
when the insured person's age is abnormal, the semantic recognition model recognizes the insured person's birth year and month or identity number in the text data and derives the insured person's actual age from it, thereby correcting the abnormal value.
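The correction rule in the second alternative can be sketched as follows. This is a hypothetical illustration, not the applicant's method: instead of the HMM-based semantic recognition model, it locates the identity number with a plain regular expression, and it relies on the layout of the 18-character Chinese resident identity number, which encodes the birth date as YYYYMMDD in characters 7-14. The function names, the sample text, and the made-up identity number are assumptions.

```python
import re
from datetime import date
from typing import Optional

AGE_RANGE = (0, 100)  # the second preset condition from the example: ages 0-100
# 18-character resident identity number: 17 digits plus a digit or X check character.
ID_PATTERN = re.compile(r"(?<!\d)(\d{17}[\dXx])(?!\d)")


def is_abnormal(age: int) -> bool:
    low, high = AGE_RANGE
    return not low <= age <= high


def age_from_id(text: str, on: date) -> Optional[int]:
    """Derive an age from the birth date encoded in characters 7-14 (YYYYMMDD)."""
    match = ID_PATTERN.search(text)
    if match is None:
        return None
    digits = match.group(1)
    try:
        birth = date(int(digits[6:10]), int(digits[10:12]), int(digits[12:14]))
    except ValueError:
        return None  # the embedded birth date is not a valid calendar date
    # Subtract one year if the birthday has not yet occurred on the reference date.
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def correct_age(age: int, text: str, on: date) -> int:
    """Replace an out-of-range age with the age derived from the identity number."""
    if not is_abnormal(age):
        return age
    derived = age_from_id(text, on)
    return derived if derived is not None and not is_abnormal(derived) else age


# Made-up sample text; the identity number is fictitious.
text = "Insured person: Zhang San, ID number 110101198503150012, age 130."
print(correct_age(130, text, date(2022, 2, 16)))
```

If no usable identity number is found, the sketch leaves the abnormal value unchanged so that it can be flagged for another correction path, such as inference from family members' ages.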
The semantic recognition model is obtained by training a Hidden Markov Model (HMM). An HMM is a probabilistic model of time sequences: a hidden Markov chain randomly generates an unobservable sequence of hidden states, and each hidden state randomly generates an observation, forming an observable random sequence. The key assumptions are that the state sequence satisfies the Markov property and that each observation is generated from its hidden state with a certain probability.
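Once an HMM is trained, the most likely hidden-state sequence for an observed token sequence is commonly found with the Viterbi algorithm. The patent does not specify its states, observations, or decoding procedure, so the sketch below is hypothetical: it uses two assumed states (`TEXT` for ordinary wording, `FIELD` for a value such as a birth date) and coarse observation classes with made-up probabilities.

```python
import math

STATES = ("TEXT", "FIELD")
START = {"TEXT": 0.8, "FIELD": 0.2}
TRANS = {"TEXT": {"TEXT": 0.7, "FIELD": 0.3},
         "FIELD": {"TEXT": 0.4, "FIELD": 0.6}}
EMIT = {"TEXT": {"word": 0.9, "digits": 0.1},
        "FIELD": {"word": 0.2, "digits": 0.8}}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for `obs`, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


print(viterbi(["word", "word", "digits", "digits"], STATES, START, TRANS, EMIT))
```

Working with log-probabilities avoids numerical underflow on long paragraphs, where the product of many small probabilities would otherwise round to zero.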
The abnormal value is corrected by utilizing the semantic recognition model, so that the accuracy of tracing the data subsequently is improved.
Further, the extracting the features of the policy data in the data to be processed by using the extraction model comprises:
and performing feature extraction on the policy data through a preset regular expression in the extraction model to obtain standardized data.
Specifically, because the text in the policy data is formatted, feature extraction can be performed with preset regular expressions, and the extracted features can then be combined to obtain the standardized data.
Feature extraction is performed through the regular expression to obtain standardized data, and processing efficiency is improved.
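Regular-expression extraction of formatted policy text might look like the sketch below. The patent does not publish its expressions, so the field labels (`Policy No.:`, `Insurance type:`, and so on), the patterns, and the sample text are hypothetical assumptions chosen only to show the extract-and-combine step.

```python
import re

# Hypothetical field patterns for a formatted policy; not the applicant's expressions.
FIELD_PATTERNS = {
    "policy_no": re.compile(r"Policy No\.:[ \t]*(\w+)"),
    "insurance_type": re.compile(r"Insurance type:[ \t]*([^\n]+)"),
    "channel": re.compile(r"Channel:[ \t]*([^\n]+)"),
    "organization": re.compile(r"Organization:[ \t]*([^\n]+)"),
    "premium": re.compile(r"Premium:[ \t]*([\d.]+)"),
}


def extract_standardized(policy_text: str) -> dict:
    """Apply each preset pattern and combine the matches into one standardized record."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(policy_text)
        # Missing fields are kept as None so that later checks can flag them.
        record[field] = match.group(1).strip() if match else None
    if record["premium"] is not None:
        record["premium"] = float(record["premium"])
    return record


sample = """Policy No.: P20220001
Insurance type: Term Life
Channel: Agency
Organization: Shenzhen Branch
Premium: 1200.00"""

print(extract_standardized(sample))
```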
Further, as shown in Fig. 2, performing feature extraction on the policy data in the data to be processed by using the extraction model to obtain the standardized data includes:
respectively inputting each paragraph text in the policy data into an extraction model for feature extraction to obtain key data corresponding to each paragraph, wherein the extraction model is obtained based on LDA model training;
and combining the key data corresponding to each paragraph in the policy data to obtain the standardized data.
Specifically, the policy data is relatively well formatted, but it has no fixed format when a policy covers several insurance types or when the insured person's details are described. An extraction model trained from LDA handles both formatted and unformatted data relatively well, so feature extraction is performed by inputting each paragraph of text in the policy data into the extraction model;
alternatively, each part of the text is input into the extraction model separately to obtain the key data for that part. The parts are, for example, the policy's insurance types and the insured person's details; each part undergoes a single feature extraction regardless of how many text paragraphs the insured person's details span.
The key data corresponding to each paragraph in the policy data is combined to obtain the standardized data.
LDA (Latent Dirichlet Allocation) is a document topic generation model, also called a three-level Bayesian probability model, with a three-level structure of words, topics, and documents. As a generative model, it assumes that each word of a document is obtained by the process "select a topic with a certain probability, then select a word from that topic with a certain probability".
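The generative process just described can be made concrete with a short sketch. It is a simplified illustration under stated assumptions: the vocabulary and the two topic-word distributions are made up, the topics are fixed rather than drawn from their own Dirichlet prior as in full LDA, and training an LDA-based extraction model means inverting this process (for example with Gibbs sampling or variational inference), which is not shown.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics: Sequence[Sequence[float]], vocab: Sequence[str],
                      n_words: int, alpha: float,
                      rng: random.Random) -> Tuple[List[float], List[str]]:
    theta = sample_dirichlet(alpha, len(topics), rng)  # the document's topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]   # select a topic
        words.append(rng.choices(vocab, weights=topics[z])[0])  # select a word from it
    return theta, words


VOCAB = ["policy", "premium", "insurance_type", "insured", "age", "birth_date"]
# Hypothetical topic-word distributions: policy terms vs. insured-person details.
TOPICS = [
    [0.35, 0.35, 0.25, 0.05, 0.0, 0.0],
    [0.0, 0.05, 0.0, 0.35, 0.3, 0.3],
]

rng = random.Random(7)
theta, words = generate_document(TOPICS, VOCAB, 12, alpha=1.0, rng=rng)
print(theta, words)
```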
By utilizing the extraction model obtained based on LDA model training, the feature extraction is carried out, the processing efficiency is improved, and meanwhile, the accuracy of the feature extraction can be improved.
Further, after the obtaining of the standardized data by performing feature extraction on the policy data in the data to be processed by using the extraction model, the method further includes:
acquiring a standard data type corresponding to the data based on the data type of each data in the standardized data;
comparing the data type corresponding to the data with the standard data type, and judging whether the data types are consistent;
and if they are inconsistent, converting the data by using a conversion algorithm.
Specifically, after the standardized data is obtained, the unit, that is, the data type, of each data item in it is also checked, because inconsistent data types would seriously affect the subsequent tracing process. According to the data type of each data item in the standardized data, the corresponding standard data type is acquired, the data type of each item is compared with the standard data type, and it is determined whether they are consistent;
for example, the payment type may be annual or monthly, and the premium amounts differ accordingly. When the standard data type for the payment type is monthly payment, premiums paid annually are converted by using the conversion algorithm. If the types are consistent, no processing is performed.
By judging whether the data types of the data are unified or not and converting the non-unified data by using a conversion algorithm, the data types of the data are unified, and the accuracy of tracing the data subsequently is improved.
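The payment-type check and conversion can be sketched as below. The patent does not define its conversion algorithm; this sketch assumes that a premium is spread evenly over the months of its payment period (no discount for annual payment), and the field names and the set of payment types are hypothetical.

```python
# Months covered by one payment of each type (assumed set of payment types).
MONTHS_PER_PERIOD = {"monthly": 1, "quarterly": 3, "semi-annual": 6, "annual": 12}


def to_standard(record: dict, standard: str = "monthly") -> dict:
    """Convert a premium to the standard payment type by even spreading (an assumption)."""
    pay_type = record["payment_type"]
    if pay_type == standard:
        return record  # data type already consistent: no processing
    factor = MONTHS_PER_PERIOD[pay_type] / MONTHS_PER_PERIOD[standard]
    converted = dict(record)
    converted["premium"] = round(record["premium"] / factor, 2)
    converted["payment_type"] = standard
    return converted


print(to_standard({"payment_type": "annual", "premium": 1200.0}))
```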
S3, tracing the source of the resource data in the data to be processed according to the standardized data by using a tracing model to obtain a detailed report;
Specifically, the resource data is generally financial data at the product category level. It corresponds to the policy data but is too coarse-grained to support subsequent analysis, so the tracing model traces the resource data according to the standardized data to obtain a detailed report, namely detailed financial data.
Further, the tracing the resource data in the data to be processed according to the standardized data by using the tracing model includes:
and based on the tracing model, performing multi-dimensional tracing on the resource data by using the standardized data to realize subdivision of the resource data.
Specifically, the resource data is generally financial data refined only to the product category level, which has little value for the relevant personnel, so it needs to be further refined to sub-categories, insurance types, or finer categories. The standardized data and the tracing model are used to achieve this multi-dimensional refinement of the resource data.
The standardized data serves as the tracing factor and the resource data as the data to be traced; the tracing factor is used to trace the data to be traced.
For example, the resource data can be refined layer by layer: by sub-category under each product major category, then by insurance type under each sub-category, and then by policy under each insurance type; or alternatively,
each sub-category under a product major category can be further refined according to the channel to which its policies belong;
in this way, the resource data is traced in multiple dimensions and subdivided.
The tracing model can allocate the resource data downward layer by layer in the order product major category, product sub-category, insurance type, and policy, reaching the finest dimension.
And fast and clear tracing of the resource data is realized by utilizing a tracing model, and the resource data is subdivided.
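Layer-by-layer allocation down the hierarchy can be sketched as a recursive split. The patent does not state the allocation basis, so this sketch assumes that each layer receives a share proportional to its premium; the field names, policy data, and amounts are hypothetical, and policy numbers are assumed to be unique.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Order of refinement below one product major category.
LEVELS = ("sub_category", "insurance_type", "policy_no")


def allocate(amount: float, policies: List[dict],
             levels: Sequence[str] = LEVELS) -> Dict[str, float]:
    """Split `amount` down the hierarchy, assuming premium-proportional shares."""
    if not levels:
        # Finest dimension reached: exactly one policy remains in this group.
        return {policies[0]["policy_no"]: amount}
    groups = defaultdict(list)
    for p in policies:
        groups[p[levels[0]]].append(p)
    total_premium = sum(p["premium"] for p in policies)
    if total_premium == 0:
        raise ValueError("cannot allocate: total premium is zero")
    result: Dict[str, float] = {}
    for members in groups.values():
        share = sum(p["premium"] for p in members) / total_premium
        result.update(allocate(amount * share, members, levels[1:]))
    return result


policies = [
    {"sub_category": "Life", "insurance_type": "Term", "policy_no": "P1", "premium": 100.0},
    {"sub_category": "Life", "insurance_type": "Whole", "policy_no": "P2", "premium": 300.0},
    {"sub_category": "Health", "insurance_type": "Medical", "policy_no": "P3", "premium": 600.0},
]
# A major-category commission of 1000.0 traced down to individual policies.
print(allocate(1000.0, policies))
```

With a purely proportional basis the layered split gives the same result as a direct split at the policy level; the layered form matters once different layers use different drivers, for example channel-based shares at one level.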
S4, integrating the data in the detail report according to a first preset condition to obtain a target report;
specifically, the first preset condition is obtained correspondingly according to the selection of the user at the front end, and the data in the detail report is integrated according to the first preset condition, so that a target report is obtained.
Further, the integrating the data in the detail report according to the first preset condition includes:
receiving an execution instruction from a front end;
extracting the corresponding first preset condition according to the execution instruction;
and integrating the data in the detail report according to the first preset condition.
Specifically, the user selects the required target report at the front end according to business needs, and the front end generates a corresponding execution instruction. The back end receives the execution instruction, extracts the corresponding first preset condition, and finally integrates the data in the detail report according to that condition;
for example, when the user selects the per-channel profit report at the front end, the detail report already holds detail data obtained by multi-dimensional tracing of the resource data with the standardized data, so the profit data of each channel can be obtained by classifying the detail data by channel; the profit data of each channel is then aggregated and sorted to obtain the target report.
In other embodiments of the present application, the target report can also be a profit report for a single product, a profit report for an organization, and the like.
According to the execution instruction, the corresponding first preset condition is obtained to integrate the detailed report, so that the corresponding target report is generated according to the user requirement, and the corresponding target report is quickly and clearly obtained.
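Mapping a front-end instruction to a preset condition and integrating the detail report might look like the following sketch. The instruction names, the field names, and the choice of summing and sorting profit in descending order are hypothetical assumptions used only to illustrate the per-channel, per-product, and per-organization reports mentioned above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from execution instruction to first preset condition (grouping field).
PRESET_CONDITIONS: Dict[str, str] = {
    "channel_profit": "channel",
    "product_profit": "product",
    "organization_profit": "organization",
}


def build_target_report(instruction: str, detail_rows: List[dict]) -> List[Tuple[str, float]]:
    """Group detail rows by the preset condition, sum profit, and sort descending."""
    if instruction not in PRESET_CONDITIONS:
        raise ValueError(f"unknown report instruction: {instruction}")
    group_field = PRESET_CONDITIONS[instruction]
    totals: Dict[str, float] = defaultdict(float)
    for row in detail_rows:
        totals[row[group_field]] += row["profit"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


rows = [
    {"channel": "Agency", "product": "Term", "organization": "SZ", "profit": 50.0},
    {"channel": "Bank", "product": "Term", "organization": "SH", "profit": 80.0},
    {"channel": "Agency", "product": "Medical", "organization": "SZ", "profit": 40.0},
]
print(build_target_report("channel_profit", rows))
```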
It is emphasized that, in order to further ensure the privacy and security of the data, all the data of the detail report and the target report may also be stored in the nodes of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each block contains information about a batch of network transactions that is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
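The cryptographic linking of blocks can be illustrated with a minimal hash chain. This sketch covers only that linking: it has no distributed storage, peer-to-peer transmission, or consensus mechanism, and the block layout and report records are hypothetical. Each block stores the SHA-256 hash of its predecessor, so altering any stored report breaks verification.

```python
import hashlib
import json
from typing import List


def _digest(index: int, records: list, prev_hash: str) -> str:
    body = {"index": index, "records": records, "prev_hash": prev_hash}
    # sort_keys gives a canonical serialization so the hash is reproducible.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(index: int, records: list, prev_hash: str) -> dict:
    return {"index": index, "records": records, "prev_hash": prev_hash,
            "hash": _digest(index, records, prev_hash)}


def verify_chain(chain: List[dict]) -> bool:
    """Check each block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != _digest(block["index"], block["records"], block["prev_hash"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = make_block(0, [{"channel": "Agency", "profit": 90.0}], "0" * 64)
second = make_block(1, [{"channel": "Bank", "profit": 80.0}], genesis["hash"])
chain = [genesis, second]
print(verify_chain(chain))
```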
The data to be processed is acquired, and feature extraction is performed on the policy data in it by using an extraction model to obtain standardized data, which improves the efficiency of the subsequent processing steps. A tracing model then traces the source of the resource data in the data to be processed according to the standardized data to obtain a detail report containing refined data of the resource data in multiple dimensions. Finally, the data in the detail report is integrated according to a first preset condition to obtain a target report, so that the refined data can be combined as required and the resource data can be traced quickly and clearly.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
This embodiment further provides a data tracing analysis apparatus. Fig. 3 is a functional module diagram of the data tracing analysis apparatus according to the present application.
The data tracing analysis apparatus 100 may be installed in an electronic device. According to the functions it implements, the apparatus 100 may include an obtaining module 101, an extraction module 102, a tracing module 103 and an integration module 104. A module (also called a unit in this application) is a series of computer program segments that are stored in the memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions of the respective modules/units are as follows:
the obtaining module 101 is configured to obtain the data to be processed;
the extraction module 102 is configured to perform feature extraction on the policy data in the data to be processed by using an extraction model to obtain standardized data;
further, the data tracing analysis apparatus 100 also includes an anomaly identification module, a semantic extraction module and a correction module:
the anomaly identification module is configured to identify abnormal values in the policy data by using a second preset condition and to extract the text data at the positions of the abnormal values;
the semantic extraction module is configured to perform semantic extraction on the text data through a semantic recognition model to obtain an extraction result;
the correction module is configured to correct the abnormal values by using the extraction result.
Through the cooperation of the anomaly identification module, the semantic extraction module and the correction module, abnormal values are corrected with the help of the semantic recognition model, which improves the accuracy of subsequent data tracing.
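The identify, extract and correct loop above can be sketched as follows. This is a minimal illustration under stated assumptions: the application does not disclose the second preset condition or the semantic recognition model, so here the condition is a hypothetical per-field numeric range (`VALID_RANGES`), the "text data at the position of the abnormal value" is a per-field snippet, and the semantic model is replaced by a hypothetical regular-expression stub (`regex_semantic_stub`) that a real system would swap for a trained NLP model.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical "second preset condition": allowed numeric range per field.
VALID_RANGES: Dict[str, Tuple[float, float]] = {"premium": (0.0, 1_000_000.0)}


def is_abnormal(field_name: str, value: object) -> bool:
    # Abnormal if missing, non-numeric, or outside the allowed range.
    low, high = VALID_RANGES[field_name]
    try:
        number = float(value)
    except (TypeError, ValueError):
        return True
    return not (low <= number <= high)


def regex_semantic_stub(field_name: str, text: str) -> Optional[float]:
    # Stand-in for the semantic recognition model: look for "<field>: <number>".
    match = re.search(rf"{field_name}\s*[:=]\s*([\d,]+(?:\.\d+)?)", text, re.IGNORECASE)
    return float(match.group(1).replace(",", "")) if match else None


def correct_record(
    record: Dict[str, object],
    source_text: Dict[str, str],
    extractor: Callable[[str, str], Optional[float]] = regex_semantic_stub,
) -> Dict[str, object]:
    corrected = dict(record)
    for field_name in VALID_RANGES:
        if is_abnormal(field_name, record.get(field_name)):
            # Re-read the value from the text around the abnormal position.
            extracted = extractor(field_name, source_text.get(field_name, ""))
            if extracted is not None and not is_abnormal(field_name, extracted):
                corrected[field_name] = extracted
    return corrected
```

For example, an OCR-damaged premium such as `"1,2OO"` (letter O instead of zero) fails the range check and is replaced by the value read from the snippet `"Annual Premium: 1,200.50 yuan"`. Passing the extractor in as a parameter keeps the stub easy to replace with a real model.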
Further, the extraction module 102 includes a regular expression extraction sub-module:
the regular expression extraction sub-module is configured to perform feature extraction on the policy data through preset regular expressions in the extraction model to obtain standardized data.
Performing feature extraction with regular expressions through this sub-module yields standardized data and improves processing efficiency.
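Regular-expression feature extraction of this kind can be sketched as below. The field names, label wording and patterns are hypothetical, since the application does not disclose its preset regular expressions. The sketch also normalizes what it extracts (thousands separators removed, dates rewritten as ISO `YYYY-MM-DD`) to show how extraction can directly yield standardized data. Both ASCII and full-width colons are accepted because Chinese policy text often uses the latter.

```python
import re
from typing import Dict, Optional

# Hypothetical preset patterns; a real extraction model would hold its own set.
FIELD_PATTERNS: Dict[str, re.Pattern] = {
    "policy_no": re.compile(r"Policy\s*No\.?\s*[:：]\s*([A-Z0-9-]+)", re.IGNORECASE),
    "insured": re.compile(r"Insured\s*[:：]\s*([^\n;]+)", re.IGNORECASE),
    "premium": re.compile(r"Premium\s*[:：]\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE),
    "effective_date": re.compile(
        r"Effective\s*Date\s*[:：]\s*(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})", re.IGNORECASE
    ),
}


def extract_standardized(policy_text: str) -> Dict[str, Optional[str]]:
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(policy_text)
        if match is None:
            result[name] = None  # field absent from this policy
        elif name == "effective_date":
            year, month, day = match.groups()
            result[name] = f"{int(year):04d}-{int(month):02d}-{int(day):02d}"
        elif name == "premium":
            result[name] = match.group(1).replace(",", "")  # drop thousands separators
        else:
            result[name] = match.group(1).strip()
    return result
```

Applied to `"Policy No: PA-2022-001\nInsured: Zhang San\nPremium: 12,000.00\nEffective Date: 2022/3/5"`, this yields `policy_no="PA-2022-001"`, `premium="12000.00"` and `effective_date="2022-03-05"`.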
Further, the extraction module 102 includes a feature extraction sub-module and a combination sub-module:
the feature extraction sub-module is configured to input the text of each paragraph in the policy data separately into the extraction model for feature extraction to obtain key data corresponding to each paragraph, the extraction model being obtained by training an LDA (Latent Dirichlet Allocation) model;
the combination sub-module is configured to combine the key data corresponding to the paragraphs of the policy data to obtain the standardized data.
Through the cooperation of the feature extraction sub-module and the combination sub-module, feature extraction is performed with the LDA-based extraction model, which improves both processing efficiency and the accuracy of feature extraction.
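A minimal sketch of per-paragraph LDA extraction followed by combination is shown below, using a small pure-Python collapsed Gibbs sampler rather than any particular library. The application does not say how "key data" is derived from the LDA model, so this sketch assumes it means the paragraph's words that carry the most weight in that paragraph's dominant topic. The tokenizer, stop-word list, hyperparameters (`alpha`, `beta`, `iters`) and the `paragraph_<i>` output keys are all illustrative assumptions.

```python
import random
import re
from typing import Dict, List, Tuple

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "by", "with"}


def tokenize(text: str) -> List[str]:
    # Lower-case alphabetic tokens minus a tiny stop-word list (illustrative only).
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def train_lda(docs: List[List[str]], n_topics: int, iters: int = 200,
              alpha: float = 0.1, beta: float = 0.01,
              seed: int = 0) -> Tuple[List[List[int]], List[List[int]], Dict[str, int]]:
    """Collapsed Gibbs sampling for LDA; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments: List[List[int]] = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, word_id


def extract_key_data(paragraphs: List[str], n_topics: int = 2,
                     top_n: int = 3, seed: int = 0) -> Dict[str, List[str]]:
    """Key data per paragraph (dominant topic's top words), combined into one dict."""
    docs = [tokenize(p) for p in paragraphs]
    doc_topic, topic_word, word_id = train_lda(docs, n_topics, seed=seed)
    standardized: Dict[str, List[str]] = {}
    for d, doc in enumerate(docs):
        topic = max(range(n_topics), key=lambda t: doc_topic[d][t])
        ranked = sorted(set(doc), key=lambda w: (-topic_word[topic][word_id[w]], w))
        standardized[f"paragraph_{d}"] = ranked[:top_n]
    return standardized
```

In practice a library implementation and a Chinese word segmenter would replace the toy sampler and tokenizer. The per-paragraph loop plus the final dictionary mirror the split between the feature extraction and combination sub-modules.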
Further, the data tracing analysis apparatus 100 also includes a standard data acquisition module, a judging module and a conversion module:
the standard data acquisition module is configured to acquire, based on the data type of each item in the standardized data, the standard data type corresponding to that item;
the judging module is configured to compare the data type of each item with its standard data type and judge whether they are consistent;
the conversion module is configured to convert the item by using a conversion algorithm if they are inconsistent.
Through the cooperation of the standard data acquisition module, the judging module and the conversion module, the apparatus judges whether the data types are unified and converts any non-unified data with the conversion algorithm, so that all data types are unified and the accuracy of subsequent data tracing is improved.
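The acquire, compare and convert steps can be sketched as a small schema check. The schema (`STANDARD_TYPES`) and the conversion functions (`CONVERTERS`) are hypothetical examples, because the application does not specify the standard data types or the conversion algorithm. An exact type comparison stands in for the "judge whether consistent" step.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Callable, Dict

# Hypothetical schema: the standard data type expected for each field.
STANDARD_TYPES: Dict[str, type] = {
    "policy_no": str,
    "premium": Decimal,
    "effective_date": date,
    "insured_count": int,
}


def _to_date(value: Any) -> date:
    if isinstance(value, datetime):
        return value.date()
    return datetime.strptime(str(value), "%Y-%m-%d").date()


# Hypothetical conversion algorithms, keyed by the target (standard) type.
CONVERTERS: Dict[type, Callable[[Any], Any]] = {
    str: str,
    Decimal: lambda v: Decimal(str(v).replace(",", "")),
    date: _to_date,
    int: lambda v: int(str(v).replace(",", "")),
}


def unify_types(record: Dict[str, Any]) -> Dict[str, Any]:
    unified: Dict[str, Any] = {}
    for name, value in record.items():
        standard = STANDARD_TYPES.get(name)                 # acquire the standard type
        if standard is None or type(value) is standard:    # unknown field or consistent
            unified[name] = value
        else:                                               # inconsistent: convert it
            unified[name] = CONVERTERS[standard](value)
    return unified
```

For instance, a numeric policy number becomes a string, `"12,000.50"` becomes `Decimal("12000.50")`, and `"2022-03-05"` becomes a `date`. `Decimal` is used for monetary amounts to avoid binary floating-point rounding.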
the tracing module 103 is configured to trace the source of the resource data in the data to be processed according to the standardized data by using a tracing model to obtain a detailed report;
further, the tracing module 103 includes a subdivision sub-module:
the subdivision sub-module is configured to perform multi-dimensional tracing of the resource data with the standardized data based on the tracing model, so as to subdivide the resource data.
Through the subdivision sub-module, the tracing model traces the resource data quickly and clearly and subdivides it.
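One simple way to picture multi-dimensional tracing is to join each resource record to its policy's standardized attributes and subtotal the amounts by the chosen dimensions. The record layout (`policy_no`, `amount`), the dimension names and the `UNTRACED` bucket are hypothetical; the application does not disclose the internals of its tracing model.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Dict, List, Tuple


def trace_resources(
    resources: List[Dict[str, Any]],
    standardized: Dict[str, Dict[str, str]],
    dimensions: List[str],
) -> List[Dict[str, Any]]:
    """Attribute each resource record to its policy's dimensions and subtotal it."""
    totals: Dict[Tuple[str, ...], Decimal] = defaultdict(Decimal)
    for item in resources:
        policy = standardized.get(item["policy_no"])
        # Records whose policy cannot be found are kept in a visible bucket.
        key = tuple(policy[d] if policy else "UNTRACED" for d in dimensions)
        totals[key] += Decimal(str(item["amount"]))
    # One detailed-report row per combination of dimension values.
    return [dict(zip(dimensions, key), amount=amount) for key, amount in sorted(totals.items())]
```

With policies tagged by `channel` and `product`, this produces one row per (channel, product) pair. Keeping untraceable records in their own row, rather than silently dropping them, means the report totals still reconcile with the raw resource data.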
the integration module 104 is configured to integrate the data in the detailed report according to a first preset condition to obtain a target report.
The integration module 104 includes a receiving sub-module, a condition extraction sub-module and a corresponding integration sub-module:
the receiving sub-module is configured to receive an execution instruction from a front end;
the condition extraction sub-module is configured to extract the corresponding first preset condition according to the execution instruction;
the corresponding integration sub-module is configured to integrate the data in the detailed report according to the first preset condition.
Through the cooperation of the receiving sub-module, the condition extraction sub-module and the corresponding integration sub-module, the first preset condition matching the execution instruction is obtained and used to integrate the detailed report, so that the target report the user requires is generated quickly and clearly.
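The receive, extract condition and integrate flow can be sketched as a lookup from a front-end instruction to a preset condition, followed by filtering and re-aggregation of the detailed report. The instruction format (`{"report": ...}`), the names in `PRESET_CONDITIONS` and the condition structure (`group_by` plus a `where` filter) are hypothetical; the application does not define what a first preset condition contains.

```python
from decimal import Decimal
from typing import Any, Dict, List, Tuple

# Hypothetical "first preset conditions", keyed by the front-end instruction.
PRESET_CONDITIONS: Dict[str, Dict[str, Any]] = {
    "by_channel": {"group_by": ["channel"], "where": lambda row: True},
    "agent_by_product": {"group_by": ["product"], "where": lambda row: row["channel"] == "agent"},
}


def integrate(detail_report: List[Dict[str, Any]], instruction: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Condition extraction: map the received instruction to its preset condition.
    condition = PRESET_CONDITIONS[instruction["report"]]
    totals: Dict[Tuple[Any, ...], Decimal] = {}
    for row in detail_report:
        if condition["where"](row):
            key = tuple(row[d] for d in condition["group_by"])
            totals[key] = totals.get(key, Decimal(0)) + row["amount"]
    # Target report: one row per value combination of the requested dimensions.
    return [dict(zip(condition["group_by"], key), amount=amt) for key, amt in sorted(totals.items())]
```

Keeping the conditions in a table means a new report view only needs a new entry, not new integration code.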
With this apparatus, the data tracing analysis apparatus 100, through the cooperation of the obtaining module 101, the extraction module 102, the tracing module 103 and the integration module 104, obtains the data to be processed; performs feature extraction on the policy data in it with the extraction model to obtain standardized data, which improves the efficiency of subsequent processing steps; traces the resource data in the data to be processed according to the standardized data with the tracing model to obtain a detailed report containing refined data of the resource data in multiple dimensions; and finally integrates the data in the detailed report according to a first preset condition to obtain a target report. The refined data are thus combined as required, and the resource data are traced quickly and clearly.
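How the four modules hand data to one another can be sketched by injecting each module as a callable. The class name, field names and the toy stand-ins (`toy_extract`, `toy_trace`, `toy_integrate`, `build_demo_apparatus`) are hypothetical and serve only to show the data flow from data to be processed, to standardized data, to detailed report, to target report. They are not the application's models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DataTracingAnalysisApparatus:
    """Hypothetical wiring of modules 101-104; each module is an injected callable."""
    obtain: Callable[[], Dict[str, Any]]                          # obtaining module 101
    extract: Callable[[List[str]], Dict[str, Row]]                # extraction module 102
    trace: Callable[[List[Row], Dict[str, Row]], List[Row]]       # tracing module 103
    integrate: Callable[[List[Row], Dict[str, Any]], List[Row]]   # integration module 104

    def run(self, instruction: Dict[str, Any]) -> List[Row]:
        raw = self.obtain()                                        # data to be processed
        standardized = self.extract(raw["policy_data"])            # standardized data
        detail_report = self.trace(raw["resource_data"], standardized)
        return self.integrate(detail_report, instruction)          # target report


# Toy stand-ins for the models, for demonstration only.
def toy_extract(policy_texts: List[str]) -> Dict[str, Row]:
    # "P1|agent" -> {"P1": {"channel": "agent"}}
    return {no: {"channel": ch} for no, ch in (t.split("|") for t in policy_texts)}


def toy_trace(resources: List[Row], standardized: Dict[str, Row]) -> List[Row]:
    # Attribute each resource record to its policy's channel.
    return [{"channel": standardized[r["policy_no"]]["channel"], "amount": r["amount"]}
            for r in resources]


def toy_integrate(detail_report: List[Row], instruction: Dict[str, Any]) -> List[Row]:
    # Keep the channels requested by the front end and sum amounts per channel.
    totals: Dict[str, int] = {}
    for row in detail_report:
        if row["channel"] in instruction["channels"]:
            totals[row["channel"]] = totals.get(row["channel"], 0) + row["amount"]
    return [{"channel": c, "amount": a} for c, a in sorted(totals.items())]


def build_demo_apparatus() -> DataTracingAnalysisApparatus:
    data = {
        "policy_data": ["P1|agent", "P2|bank"],
        "resource_data": [{"policy_no": "P1", "amount": 10},
                          {"policy_no": "P2", "amount": 5},
                          {"policy_no": "P1", "amount": 1}],
    }
    return DataTracingAnalysisApparatus(lambda: data, toy_extract, toy_trace, toy_integrate)
```

Calling `build_demo_apparatus().run({"channels": ["agent"]})` returns `[{"channel": "agent", "amount": 11}]`. Because each module is injected, any stage (for example the LDA-based extraction model) can be replaced without touching the others.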
An embodiment of the present application further provides a computer device. Referring to Fig. 4, Fig. 4 is a block diagram of the basic structure of the computer device according to this embodiment.
The computer device 4 includes a memory 41, a processor 42 and a network interface 43 that are communicatively connected to one another through a system bus. It should be noted that only the computer device 4 with components 41-43 is shown; not all of the shown components are required, and more or fewer components may be implemented instead. As those skilled in the art will understand, the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server or another computing device. The computer device can interact with a user through a keyboard, a mouse, a remote control, a touch panel, a voice control device, or the like.
The memory 41 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk or an optical disk. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as its hard disk or internal memory. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the computer device 4. The memory 41 may also include both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, for example the computer-readable instructions of the data tracing analysis method. The memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer-readable instructions stored in the memory 41 or to process data, for example to execute the computer-readable instructions of the data tracing analysis method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
In this embodiment, when the processor executes the computer-readable instructions stored in the memory, it implements the steps of the data tracing analysis method of the above embodiments: it obtains the data to be processed; performs feature extraction on the policy data in it with the extraction model to obtain standardized data, which improves the efficiency of subsequent processing steps; traces the source of the resource data in the data to be processed according to the standardized data with the tracing model to obtain a detailed report containing refined data of the resource data in multiple dimensions; and finally integrates the data in the detailed report according to a first preset condition to obtain a target report. The refined data are thus combined as required, and the resource data are traced quickly and clearly.
An embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the data tracing analysis method described above: obtaining the data to be processed; performing feature extraction on the policy data in it with an extraction model to obtain standardized data, which improves the efficiency of subsequent processing steps; tracing the resource data in the data to be processed according to the standardized data with a tracing model to obtain a detailed report containing refined data of the resource data in multiple dimensions; and finally integrating the data in the detailed report according to a first preset condition to obtain a target report. The refined data are thus combined as required, and the resource data can be traced quickly and clearly.
From the description of the embodiments above, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application, or the part of them that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The data tracing analysis apparatus, the computer device and the computer-readable storage medium according to the embodiments of the present application have the same technical effects as the data tracing analysis method of the above embodiments, which are not repeated here.
Obviously, the embodiments described above are only some, not all, of the embodiments of the present application; the accompanying drawings show preferred embodiments but do not limit the patent scope of the application. The application may be implemented in many different forms, and these embodiments are provided so that the disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of the present application.

Claims (10)

1. A data tracing analysis method is characterized by comprising the following steps:
acquiring data to be processed;
carrying out feature extraction on policy data in the data to be processed by utilizing an extraction model to obtain standardized data;
tracing the resource data in the data to be processed according to the standardized data by using a tracing model to obtain a detailed report;
and integrating the data in the detailed report according to a first preset condition to obtain a target report.
2. The data tracing analysis method of claim 1, further comprising, before performing feature extraction on the policy data in the data to be processed by using the extraction model:
identifying abnormal values in the policy data by using a second preset condition, and extracting text data of positions of the abnormal values;
performing semantic extraction on the text data through a semantic recognition model to obtain an extraction result;
and correcting the abnormal value by using the extraction result.
3. The data tracing analysis method of claim 1, wherein performing feature extraction on the policy data in the data to be processed by using the extraction model comprises:
performing feature extraction on the policy data through a preset regular expression in the extraction model to obtain the standardized data.
4. The data tracing analysis method of claim 1, wherein performing feature extraction on the policy data in the data to be processed by using the extraction model to obtain standardized data comprises:
inputting the text of each paragraph in the policy data separately into the extraction model for feature extraction to obtain key data corresponding to each paragraph, wherein the extraction model is obtained based on LDA model training; and
combining the key data corresponding to each paragraph in the policy data to obtain the standardized data.
5. The data tracing analysis method according to claim 1, further comprising, after performing feature extraction on the policy data in the data to be processed by using the extraction model to obtain standardized data:
acquiring, based on the data type of each item of data in the standardized data, a standard data type corresponding to that item;
comparing the data type of the item with the standard data type and judging whether they are consistent; and
if they are inconsistent, converting the item by using a conversion algorithm.
6. The data tracing analysis method according to claim 1, wherein tracing, by using the tracing model, the resource data in the data to be processed according to the standardized data comprises:
performing, based on the tracing model, multi-dimensional tracing of the resource data by using the standardized data to subdivide the resource data.
7. The data tracing analysis method of claim 1, wherein integrating the data in the detailed report according to the first preset condition comprises:
receiving an execution instruction from a front end;
extracting the corresponding first preset condition according to the execution instruction; and
integrating the data in the detailed report according to the first preset condition.
8. A data tracing analysis apparatus, comprising:
an obtaining module, configured to obtain data to be processed;
an extraction module, configured to perform feature extraction on policy data in the data to be processed by using an extraction model to obtain standardized data;
a tracing module, configured to trace the source of resource data in the data to be processed according to the standardized data by using a tracing model to obtain a detailed report; and
an integration module, configured to integrate the data in the detailed report according to a first preset condition to obtain a target report.
9. A computer device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores computer-readable instructions which, when executed by the processor, implement the data tracing analysis method of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the data tracing analysis method of any one of claims 1 to 7.
CN202210144943.6A 2022-02-17 2022-02-17 Data tracing analysis method and device, computer equipment and storage medium Pending CN114780602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210144943.6A CN114780602A (en) 2022-02-17 2022-02-17 Data tracing analysis method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114780602A true CN114780602A (en) 2022-07-22

Family

ID=82423251

Country Status (1)

Country Link
CN (1) CN114780602A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630045A (en) * 2022-12-06 2023-01-20 遵义时月凯网络科技有限公司 Data tracing and tracking method based on artificial intelligence and AI system
CN115630045B (en) * 2022-12-06 2023-07-21 上海埃林哲软件***股份有限公司 Data tracing method and AI system based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN111695439A (en) Image structured data extraction method, electronic device and storage medium
CN110765101B (en) Label generation method and device, computer readable storage medium and server
CN110147540B (en) Method and system for generating business security requirement document
CN112347254B (en) Method, device, computer equipment and storage medium for classifying news text
CN113726784A (en) Network data security monitoring method, device, equipment and storage medium
CN111783471A (en) Semantic recognition method, device, equipment and storage medium of natural language
CN111625567A (en) Data model matching method, device, computer system and readable storage medium
CN114780602A (en) Data tracing analysis method and device, computer equipment and storage medium
CN117273968A (en) Accounting document generation method of cross-business line product and related equipment thereof
CN117408821A (en) Insurance claim verification method, apparatus, computer device and storage medium
CN116956326A (en) Authority data processing method and device, computer equipment and storage medium
CN117133006A (en) Document verification method and device, computer equipment and storage medium
CN116757812A (en) Method, device, electronic equipment and storage medium for detecting abnormal data
CN113779198A (en) Electronic business card generating method, device, equipment and medium based on artificial intelligence
CN113822215A (en) Equipment operation guide file generation method and device, electronic equipment and storage medium
CN114064893A (en) Abnormal data auditing method, device, equipment and storage medium
US12026458B2 (en) Systems and methods for generating document templates from a mixed set of document types
CN116364223B (en) Feature processing method, device, computer equipment and storage medium
CN116628128B (en) Method, device and equipment for standardization of supply chain data and storage medium thereof
CN117389607A (en) Signboard configuration method and device, computer equipment and storage medium
CN115829768A (en) Data calculation method, device and equipment based on rule engine and storage medium
CN114997507A (en) Behavior path generation method, behavior path generation device, behavior path generation equipment and storage medium
CN117251799A (en) Financial certificate processing method and device, computer equipment and storage medium
CN117933699A (en) Task analysis method, device, computer equipment and storage medium
CN115829763A (en) Data transmission method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination