CN115238009A

CN115238009A - Metadata management method, apparatus, device and storage medium based on blood relationship analysis

Info

Publication number: CN115238009A
Application number: CN202210938163.9A
Authority: CN
Inventors: 龚官岱
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2022-08-05
Filing date: 2022-08-05
Publication date: 2022-10-25

Abstract

The application discloses a metadata management method, apparatus, device and storage medium based on blood relationship analysis, and belongs to the technical field of artificial intelligence. The method obtains the metadata description corresponding to the data to be processed, performs blood relationship analysis according to the metadata description to obtain the blood relationship information of the data to be processed, and divides the data according to the blood relationship information to obtain a blood relationship data set. The blood relationship data set is input into a pre-trained data classification model to obtain a data classification result, in which the mapping relation among the data to be processed is recorded. A data map of the data to be processed is constructed according to the mapping relation and the metadata description, and data management is performed on the data to be processed according to the data map. In addition, the application also relates to blockchain technology, and the data to be processed may be stored in a blockchain. By combining the blood relationship information and the metadata description of the data to classify and manage the data to be processed, the application improves the reliability of data management and further improves the utilization value of the data.

Description

Metadata management method, apparatus, device and storage medium based on blood relationship analysis

Technical Field

The application belongs to the technical field of artificial intelligence, and particularly relates to a metadata management method, device, equipment and storage medium based on blood relationship analysis.

Background

Data management is the process of effectively collecting, storing, processing and applying data by means of computer hardware and software technologies. In practice, however, the storage, processing and application of much data do not fully conform to enterprise management practices, and even small enterprises often lack a complete data management method. As a result, enterprise data is difficult to access, use and analyze, which affects the overall operation and management efficiency of the enterprise.

The data asset management platforms currently on the market only provide data display, and offer no complete solution to deeper problems such as data governance, data quality monitoring and data standard management. Taking the metadata management of a company's data tables as an example, the current approach has the following problems: because the data tables are not analyzed in depth, data monitoring stops at the data in the monitored tables, and the data information at the metadata management layer is not effectively displayed or processed. When the metadata corresponding to a data table is abnormal, developers cannot learn the cause of the abnormality in time, so the data abnormality cannot be handled in time.

Disclosure of Invention

An object of the embodiments of the present application is to provide a metadata management method, apparatus, computer device and storage medium based on blood relationship analysis, so as to solve the technical problems that data monitoring in existing data asset management schemes stops at the data in the monitored data tables, and that when the metadata corresponding to a data table is abnormal, developers cannot learn the cause of the abnormality in time and therefore cannot handle the data abnormality in time.

In order to solve the above technical problem, an embodiment of the present application provides a metadata management method based on blood relationship analysis, which adopts the following technical solutions:

a metadata management method based on blood relationship analysis, comprising:

acquiring data to be processed from a preset data table, and acquiring metadata description corresponding to each data to be processed to obtain a first metadata description;

performing blood relationship analysis on the data to be processed according to the first metadata description to obtain blood relationship information of the data to be processed;

dividing the data to be processed according to the blood relationship information of the data to be processed to obtain a blood relationship data set;

inputting the blood relationship data set into a pre-trained data classification model to obtain a data classification result of the data to be processed, wherein the data classification result records the mapping relation among the data to be processed;

constructing a data map of the data to be processed according to the mapping relation among the data to be processed in the data classification result and the metadata description corresponding to the data to be processed;

and performing data management on the data to be processed according to the data map.

Further, performing blood relationship analysis on the data to be processed according to the first metadata description to obtain blood relationship information of the data to be processed, specifically comprising:

converting the data to be processed carrying the first metadata description into a script file of an SQL code;

extracting a regularized SQL statement from a script file of the SQL code, and converting the SQL statement into an abstract syntax tree;

traversing the abstract syntax tree to obtain the logical relations of all tree nodes in the abstract syntax tree;

and obtaining the blood relationship information of the data to be processed based on the logical relations of all the tree nodes.

Further, the tree nodes include root nodes and leaf nodes, the abstract syntax tree is traversed, and the logic relationships of all the tree nodes in the abstract syntax tree are obtained, which specifically includes:

traversing the abstract syntax tree from the root node to the leaf node at the lowest layer of the abstract syntax tree;

and extracting the logical relations among all the adjacent tree nodes to obtain the logical relations of all the tree nodes in the abstract syntax tree.

Further, the data classification model includes an encoding layer and a decoding layer, and inputting the blood relationship data set into the pre-trained data classification model to obtain a data classification result of the data to be processed specifically includes:

performing feature extraction and feature vector conversion on the data to be processed in the blood relationship data set to obtain a data feature vector;

encoding the data feature vector through the encoding layer of the data classification model to obtain a data encoding vector;

performing spatial mapping on the data encoding vector to obtain a spatial mapping result of the data to be processed;

and decoding the spatial mapping result of the data to be processed through the decoding layer of the data classification model to obtain a data classification result of the data to be processed.

Further, before the blood relationship data set is input into the pre-trained data classification model to obtain the data classification result of the data to be processed, the method further comprises the following steps:

acquiring sample data from a preset database, and acquiring the metadata description corresponding to each piece of sample data to obtain a second metadata description;

performing blood relationship analysis on the sample data according to the second metadata description to obtain blood relationship information of the sample data;

performing data division on the sample data according to the blood relationship information of the sample data to obtain a sample blood relationship data set;

importing the sample blood relationship data set into a preset transformer pre-training model, wherein the transformer pre-training model comprises an encoding layer and a decoding layer;

performing feature extraction and feature vector conversion on the sample data in the sample blood relationship data set to obtain a sample feature vector;

encoding the sample feature vector through the encoding layer of the transformer pre-training model to obtain a sample encoding vector;

performing spatial mapping on the sample encoding vector to obtain a spatial mapping result of the sample data;

decoding the spatial mapping result of the sample data through the decoding layer of the transformer pre-training model to obtain a data classification result of the sample data;

and iteratively updating the transformer pre-training model based on the data classification result of the sample data to obtain a trained data classification model.

Further, iteratively updating the transformer pre-training model based on the data classification result of the sample data to obtain a trained data classification model specifically comprises:

obtaining a loss function of a transformer pre-training model;

calculating a relative error between the data classification result and a preset standard classification result based on a loss function to obtain a classification error;

propagating the classification error through the transformer pre-training model, and comparing the classification error with a preset error threshold;

and if the classification error is larger than a preset error threshold value, iteratively updating the transformer pre-training model until the model is fitted to obtain a trained data classification model.

Further, constructing a data map of the data to be processed according to the mapping relationship between the data to be processed in the data classification result and the metadata description corresponding to the data to be processed, specifically including:

drawing an initial data map according to the mapping relation among the data to be processed in the data classification result;

and adding metadata description corresponding to each data to be processed in the initial data map to form a data map of the data to be processed.

In order to solve the above technical problem, an embodiment of the present application further provides a metadata management apparatus based on blood relationship analysis, which adopts the following technical solutions:

a metadata management apparatus based on blood relationship analysis, comprising:

the data acquisition module is used for acquiring data to be processed from a preset data table and acquiring metadata descriptions corresponding to the data to be processed to obtain a first metadata description;

the first analysis module is used for carrying out blood relationship analysis on the data to be processed according to the first metadata description to obtain blood relationship information of the data to be processed;

the first dividing module is used for dividing the data to be processed according to the blood relationship information of the data to be processed to obtain a blood relationship data set;

the data classification module is used for inputting the blood relationship data set into a pre-trained data classification model to obtain a data classification result of the data to be processed, wherein the data classification result records the mapping relation among the data to be processed;

the data summarizing module is used for constructing a data map of the data to be processed according to the mapping relation among the data to be processed in the data classification result and the metadata description corresponding to the data to be processed;

and the data management module is used for performing data management on the data to be processed according to the data map.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:

a computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, implements the steps of the metadata management method based on blood relationship analysis as described in any one of the above.

In order to solve the foregoing technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:

a computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the metadata management method based on blood relationship analysis as described in any one of the above.

Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:

The application discloses a metadata management method, apparatus, device and storage medium based on blood relationship analysis, and belongs to the technical field of artificial intelligence. The method obtains the metadata description corresponding to the data to be processed, performs blood relationship analysis according to the metadata description to obtain the blood relationship information of the data to be processed, and divides the data according to the blood relationship information to obtain a blood relationship data set. The blood relationship data set is input into a pre-trained data classification model to obtain a data classification result, in which the mapping relation among the data to be processed is recorded. A data map of the data to be processed is constructed according to the mapping relation and the metadata description, and data management is performed on the data to be processed according to the data map. The application combines the blood relationship information and the metadata description of the data to give the data to be processed a simple preliminary classification, and then performs data classification management on the data to be processed through a trained data classification model, which improves the reliability of data management and further improves the utilization value of the data.

Drawings

In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 illustrates an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 illustrates a flow diagram of one embodiment of a metadata management method based on blood relationship analysis according to the present application;

FIG. 3 illustrates a schematic structural diagram of one embodiment of a metadata management apparatus based on blood relationship analysis according to the present application;

FIG. 4 shows a schematic block diagram of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.

The server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103, and may be an independent server, or a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.

It should be noted that the metadata management method based on blood relationship analysis provided in the embodiments of the present application is generally performed by a server, and accordingly, the metadata management apparatus based on blood relationship analysis is generally disposed in the server.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow diagram of one embodiment of a metadata management method based on blood relationship analysis according to the present application is shown. The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best result.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, machine learning/deep learning, and the like. The metadata management method based on blood relationship analysis comprises the following steps:

S201, obtaining data to be processed from a preset data table, and obtaining the metadata description corresponding to each piece of data to be processed to obtain a first metadata description.

Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes data attributes (properties) and supports functions such as indicating storage location, recording history, resource searching and file recording. Metadata describes objects such as information resources or data, and serves the following purposes: identifying resources; evaluating resources; tracking changes to resources during use; enabling simple and efficient management of large amounts of networked data; and enabling information resources to be effectively discovered, searched and organized as a whole, so that the resources in use are effectively managed. Metadata can be regarded as an electronic catalog: to build the catalog, the contents or features of the data must be described and collected, which in turn assists data retrieval.

Specifically, the preset data table stores the data to be processed and the metadata descriptions corresponding to the data to be processed. The server obtains a plurality of pieces of data to be processed from the preset data table and, at the same time, obtains the corresponding metadata descriptions to obtain a first metadata description. For example, the server obtains from the data table the data to be processed "XX serious risk insurance premium: 500,000", where "500,000" is the data itself and "XX serious risk insurance premium" is the metadata description of the data.

S202, performing blood relationship analysis on the data to be processed according to the first metadata description to obtain blood relationship information of the data to be processed.

The relationships between data are expressed by analogy with blood relationships in human society and are called the blood relationships of data (also known as data lineage). Blood relationships of data have some specific characteristics: attribution (the organization or individual to which the data belongs), multiple sources (the same data may have more than one source), traceability (the whole life cycle of the data from generation to extinction), hierarchy (classifying, aggregating and summarizing data forms a data hierarchy), and the like. Analyzing the blood relationships of data is called blood relationship analysis.

Specifically, the server converts the to-be-processed data carrying the metadata description into an SQL statement and converts the SQL statement into an abstract syntax tree, in which the leaf nodes represent the blood relationships between the to-be-processed data. By traversing the abstract syntax tree, the server obtains the logical relations of all tree nodes in the abstract syntax tree, and obtains the blood relationship information of the to-be-processed data based on these logical relations.

S203, performing data division on the data to be processed according to the blood relationship information of the data to be processed to obtain a blood relationship data set.

Specifically, the server divides the data to be processed according to the blood relationship information of the data to be processed to obtain a blood relationship data set. In a specific embodiment of the present application, to-be-processed data having a blood relationship is divided into the same data set, and when all to-be-processed data are divided, a plurality of blood relationship data sets are obtained.
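Dividing data so that items sharing a blood relationship land in the same data set amounts to finding connected groups among the items. The following is a minimal sketch under that reading, using a union-find structure. The function name `split_by_lineage`, the `(upstream, downstream)` pair format and the example item names are assumptions made for illustration, not details given in the application.

```python
from collections import defaultdict


def split_by_lineage(items, lineage_edges):
    """Group items so that items linked by a lineage edge share one data set.

    Every name appearing in lineage_edges is assumed to be present in items.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair of items that have a blood relationship.
    for upstream, downstream in lineage_edges:
        root_up, root_down = find(upstream), find(downstream)
        if root_up != root_down:
            parent[root_down] = root_up

    groups = defaultdict(list)
    for item in items:
        groups[find(item)].append(item)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


# Example input; the item names are invented for illustration.
example_items = ["premium", "policy", "claim", "payout", "branch"]
example_edges = [("policy", "premium"), ("claim", "payout")]
lineage_sets = split_by_lineage(example_items, example_edges)
```

An item with no blood relationship to any other item ends up in a data set of its own, which is one possible convention; the application does not state how such items are handled.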

S204, inputting the blood relationship data set into a pre-trained data classification model to obtain a data classification result of the data to be processed, wherein the data classification result records the mapping relation among the data to be processed.

The data classification model is obtained by training a transformer pre-training model. The transformer pre-training model is based on an encoder-decoder structure, in which both the encoder and the decoder are composed of attention modules and feed-forward neural networks. Because the model is built purely on attention, it computes faster and achieves better results on translation tasks. The Transformer was originally proposed as a sequence-to-sequence model for machine translation, and later studies showed that Transformer-based pre-trained models (PTMs) achieve state-of-the-art performance on a variety of tasks. The Transformer has therefore become the preferred architecture in the NLP field, especially for PTMs. In addition to language-related applications, Transformers are also used in computer vision (CV), audio processing, and even chemistry and the life sciences.

Specifically, the server trains a transformer pre-training model in advance to obtain the data classification model. When data classification is performed, the blood relationship data set is input into the pre-trained data classification model, and the data classification result of the data to be processed is obtained directly, wherein the data classification result records the mapping relation among the data to be processed.

It should be noted that, when the data classification model processes the blood relationship data sets, feature extraction and feature encoding are first performed on the to-be-processed data in each blood relationship data set, and the encoded features are then mapped to the same feature space. In this way, the relationships between to-be-processed data in different blood relationship data sets can be obtained, and the to-be-processed data is further classified according to these relationships.

S205, constructing a data map of the data to be processed according to the mapping relation among the data to be processed in the data classification result and the metadata description corresponding to the data to be processed.

The data map is an enterprise data catalog management module built on metadata. It covers functions such as global data retrieval, metadata detail viewing, data preview, data blood relationship (lineage) display and data category management. With the data map, data can be searched, understood and used more effectively.

Specifically, the server constructs an initial data map according to the mapping relation among the data to be processed in the data classification result, and adds the metadata description corresponding to each piece of data to be processed to the initial data map to form the data map of the data to be processed. The data map makes it easier for developers to understand the data information of each piece of data to be processed intuitively.
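The two-stage construction described above (an initial map from the mapping relations, then metadata descriptions attached to it) can be sketched as follows. This is a minimal illustration, not the implementation of the application: the dictionary layout and the names `build_data_map`, `related` and `metadata` are assumptions introduced here.

```python
def build_data_map(mappings, metadata):
    """Build a data map from (source, target) mapping relations and metadata.

    The entry layout {"metadata": ..., "related": [...]} is a hypothetical
    representation chosen for this sketch.
    """
    data_map = {}

    # Stage 1: draw the initial data map from the mapping relations.
    for source, target in mappings:
        data_map.setdefault(source, {"metadata": None, "related": []})
        data_map.setdefault(target, {"metadata": None, "related": []})
        data_map[source]["related"].append(target)

    # Stage 2: add the metadata description of each piece of data.
    for name, description in metadata.items():
        data_map.setdefault(name, {"metadata": None, "related": []})
        data_map[name]["metadata"] = description

    return data_map


# Example input; the names and descriptions are invented for illustration.
example_mappings = [("policy", "premium"), ("claim", "payout")]
example_metadata = {"premium": "XX insurance premium", "branch": "issuing branch"}
example_map = build_data_map(example_mappings, example_metadata)
```

Keeping the relations and the metadata in one entry per data item means a single lookup returns both what the item is and what it is related to.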

S206, performing data management on the data to be processed according to the data map.

Specifically, the server performs data management on the data to be processed according to the data map, for example, when the data to be processed needs to be changed, the server receives the data change instruction, analyzes the data change instruction, obtains a data change field and a metadata description change field, and adaptively modifies the data to be processed and the metadata description in the data map according to the data change field and the metadata description change field.
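The change-handling example above can be sketched as follows. The instruction format is not specified in the application, so the keys `target`, `data_change` and `metadata_change`, as well as the entry layout of the data map, are hypothetical names used only for this illustration.

```python
def apply_change_instruction(data_map, instruction):
    """Parse a data change instruction and update the matching data map entry.

    Assumed (hypothetical) instruction keys:
      target          - name of the data item to change
      data_change     - new data value, optional
      metadata_change - new metadata description, optional
    """
    target = instruction["target"]
    if target not in data_map:
        raise KeyError(f"unknown data item: {target}")

    entry = data_map[target]
    # Modify the data itself only if a data change field is present.
    if "data_change" in instruction:
        entry["value"] = instruction["data_change"]
    # Modify the metadata description only if that field is present.
    if "metadata_change" in instruction:
        entry["metadata"] = instruction["metadata_change"]
    return entry


# Example data map and instruction, both invented for illustration.
managed_map = {"premium": {"value": 500000, "metadata": "XX insurance premium"}}
changed_entry = apply_change_instruction(
    managed_map,
    {"target": "premium", "data_change": 520000},
)
```

Treating both change fields as optional lets one instruction update the data, the metadata description, or both.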

In this embodiment, the electronic device (for example, the server shown in fig. 1) on which the metadata management method based on blood relationship analysis runs may receive the data change instruction through a wired connection or a wireless connection. It is noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future.

In this embodiment, the data to be processed is given a simple preliminary classification by combining the blood relationship information and the metadata description of the data, and is then subjected to data classification management through a trained data classification model, which improves the reliability of data management and further improves the utilization value of the data.

Specifically, the server converts the to-be-processed data carrying the first metadata description into SQL codes, combines the SQL codes to form an SQL script file, extracts regular SQL statements from the script file of the SQL codes, converts the SQL statements into an abstract syntax tree, traverses the abstract syntax tree, obtains logical relationships of all tree nodes in the abstract syntax tree, and obtains the blood-related information of the to-be-processed data based on the logical relationships of all the tree nodes.
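To make the idea of deriving blood relationships from an SQL script concrete, the sketch below extracts table-level source-to-target pairs from `INSERT INTO ... SELECT` statements. Note the substitution: it uses regular expressions in place of the abstract syntax tree described above, so it only handles simple statements and is not the method of the application. The function name, the patterns and the example table names are all assumptions.

```python
import re

# Target table of an INSERT INTO statement.
INSERT_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
# Source tables named directly after FROM or JOIN (subqueries are ignored).
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(script):
    """Return (source_table, target_table) pairs found in an SQL script."""
    edges = []
    for statement in script.split(";"):
        target = INSERT_RE.search(statement)
        if target is None:
            continue  # Only INSERT INTO ... SELECT statements carry lineage here.
        for source in SOURCE_RE.findall(statement):
            edge = (source, target.group(1))
            if edge not in edges:
                edges.append(edge)
    return edges


# Example script; the table names are invented for illustration.
example_script = """
INSERT INTO dw.premium_summary
SELECT p.policy_id, SUM(pay.amount)
FROM ods.policy p JOIN ods.payment pay ON p.policy_id = pay.policy_id
GROUP BY p.policy_id;
INSERT INTO rpt.premium_report SELECT * FROM dw.premium_summary;
"""
example_lineage = table_lineage(example_script)
```

A production implementation would normally rely on a real SQL parser, since regular expressions cannot handle nested queries, comments or quoted identifiers reliably.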

In another specific embodiment of the present application, the data to be processed may also be stored in the preset database in the form of SQL code in advance. In this case, the server directly obtains the data to be processed in the form of SQL code, converts the SQL statements into an abstract syntax tree, traverses the abstract syntax tree to obtain the logical relationships of all tree nodes in the abstract syntax tree, and obtains the blood relationship information of the data to be processed based on the logical relationships of all tree nodes.

Specifically, the tree nodes include a root node and a plurality of leaf nodes, the server traverses the abstract syntax tree from the root node to the leaf nodes at the bottom layer of the abstract syntax tree, and extracts the logical relations between all adjacent tree nodes to obtain the logical relations of all the tree nodes in the abstract syntax tree.
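The root-to-leaf traversal can be illustrated with a small hand-built tree. In this sketch the abstract syntax tree is assumed to be a nested dictionary with `label` and `children` keys, and a "logical relation between adjacent tree nodes" is assumed to mean a parent-child pair; neither representation comes from the application.

```python
def adjacent_relations(root):
    """Walk the tree from the root down to the leaves and collect the
    (parent_label, child_label) pair for every pair of adjacent nodes."""
    relations = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.get("children", []):
            relations.append((node["label"], child["label"]))
            stack.append(child)  # Descend until leaf nodes are reached.
    return relations


# Hand-built stand-in for the syntax tree of a single INSERT ... SELECT
# statement; the labels are invented for illustration.
example_tree = {
    "label": "INSERT",
    "children": [
        {"label": "target:dw.premium_summary"},
        {
            "label": "SELECT",
            "children": [
                {"label": "source:ods.policy"},
                {"label": "source:ods.payment"},
            ],
        },
    ],
}
example_relations = adjacent_relations(example_tree)
```

An explicit stack is used instead of recursion so that deep trees do not hit Python's recursion limit.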

It should be additionally noted that, after the blood relationship information of the data to be processed is obtained, the blood relationship information needs to be verified. During verification, the server traverses the entire abstract syntax tree in the reverse direction, starting from the leaf nodes at the bottom layer, to obtain a reverse logical relationship. The blood relationship information is verified by comparing the logical relationships of all tree nodes in the abstract syntax tree with the reverse logical relationship, and once the verification passes, the verified blood relationship information is output.
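One way to read the verification step is as a consistency check between a root-to-leaf pass and a leaf-to-root pass over the same tree. The sketch below follows that reading. The tree is assumed to be a dictionary mapping each node to its list of children, and all function names are hypothetical.

```python
def forward_relations(children, root):
    """Collect (parent, child) pairs by walking from the root to the leaves."""
    relations, queue = set(), [root]
    while queue:
        node = queue.pop(0)
        for child in children.get(node, []):
            relations.add((node, child))
            queue.append(child)
    return relations


def reverse_relations(children):
    """Collect (child, parent) pairs by walking from every leaf upwards."""
    parent_of = {c: p for p, kids in children.items() for c in kids}
    leaves = [node for node, kids in children.items() if not kids]
    relations = set()
    for leaf in leaves:
        node = leaf
        while node in parent_of:
            relations.add((node, parent_of[node]))
            node = parent_of[node]
    return relations


def verify_lineage(children, root):
    """Pass only if both traversal directions yield the same relations."""
    forward = forward_relations(children, root)
    backward = {(parent, child) for child, parent in reverse_relations(children)}
    return forward == backward


# A well-formed tree, and one with a fragment unreachable from the root.
good_tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
broken_tree = {"root": ["a"], "a": [], "x": ["y"], "y": []}
good_ok = verify_lineage(good_tree, "root")
broken_ok = verify_lineage(broken_tree, "root")
```

In this sketch the check fails when some relation is reachable in one direction only, for example a fragment that is not connected to the root.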

Specifically, the data classification model comprises an encoding layer and a decoding layer. After the server inputs the blood relationship data set into the pre-trained data classification model, feature extraction and feature vector conversion are first performed on the data to be processed in the blood relationship data set to obtain a data feature vector. The data feature vector is then encoded through the encoding layer of the data classification model to obtain a data encoding vector. The data encoding vector is spatially mapped, so that the data features extracted from different blood relationship data sets are mapped to the same feature space, giving a spatial mapping result of the data to be processed. Finally, the spatial mapping result is decoded through the decoding layer of the data classification model to obtain the data classification result of the data to be processed.

carrying out space mapping on the sample coding vector to obtain a space mapping result of the sample data;

Specifically, the data classification model needs to be trained in advance, before data classification is performed, and is obtained based on a transformer pre-training model. The server obtains sample data from a preset database and obtains the metadata description corresponding to each sample data to obtain a second metadata description. The server performs blood relationship analysis on the sample data according to the second metadata description to obtain blood relationship information of the sample data, and performs data division on the sample data according to that blood relationship information to obtain a sample blood relationship data set. The sample blood relationship data set is imported into a preset transformer pre-training model, which comprises a coding layer and a decoding layer. Feature extraction and feature vector conversion are performed on the sample data in the sample blood relationship data set to obtain sample feature vectors; the sample feature vectors are encoded by the coding layer of the transformer pre-training model to obtain sample coding vectors; and the sample coding vectors are spatially mapped, so that the extracted feature vectors are mapped into the same feature space, to obtain a spatial mapping result of the sample data. The spatial mapping result of the sample data is decoded by the decoding layer to obtain a data classification result of the sample data, and the transformer pre-training model is iteratively updated based on the data classification result of the sample data to obtain the trained data classification model.

It should be noted that the transformer pre-training model further includes a self-attention layer and a Softmax function layer. The self-attention layer is used to focus on features while reducing computational complexity. In the standard self-attention mechanism, each token needs to attend to all other tokens. However, it is observed that for a trained Transformer, the learned attention matrix A is typically very sparse across most data points. Computational complexity can therefore be reduced by incorporating a structural bias that limits the number of query-key pairs each query attends to. Softmax is an activation function that normalizes a vector of values into a probability distribution whose components sum to 1. The Softmax function can be used as the last layer of a neural network to produce the output of a multi-class classification problem, and is often used together with a cross-entropy loss function.
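As an illustration of the Softmax normalization and its usual pairing with a cross-entropy loss, a minimal sketch in plain Python follows. The logit values and the class index are made up for the example, and subtracting the maximum logit is a common numerical-stability measure rather than something stated in this application.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Normalize raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: list[float], target: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


# Hypothetical scores for three classes; class 0 is taken as the correct label.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, target=0)
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1.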

Further, iteratively updating the transformer pre-training model based on the data classification result of the sample data to obtain a trained data classification model, which specifically comprises:

obtaining a loss function of a transformer pre-training model;

transmitting a classification error in a transformer pre-training model, and comparing the classification error with a preset error threshold;

Specifically, the server obtains the loss function of the transformer pre-training model, and calculates the relative error between the data classification result and a preset standard classification result based on the loss function to obtain a classification error. The server transmits the classification error through the transformer pre-training model based on a back propagation algorithm and compares the classification error with a preset error threshold. If the classification error is larger than the preset error threshold, the transformer pre-training model is iteratively updated until the model is fitted, so as to obtain the trained data classification model.
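The threshold-controlled iteration can be sketched with a much smaller model than a transformer. The example below assumes a one-parameter-pair logistic classifier, a mean cross-entropy loss, and hand-derived gradients standing in for back propagation; the sample values, `error_threshold`, `learning_rate`, and the `max_steps` safeguard are all hypothetical.

```python
import math


def predict(w: float, b: float, x: float) -> float:
    """Probability of class 1 under a one-feature logistic model."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def classification_error(w, b, samples):
    """Mean cross-entropy between predictions and the standard labels."""
    total = 0.0
    for x, y in samples:
        p = min(max(predict(w, b, x), 1e-12), 1 - 1e-12)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(samples)


def train(samples, error_threshold=0.1, learning_rate=0.5, max_steps=10_000):
    """Update the model while the error stays above the preset threshold."""
    w, b = 0.0, 0.0
    for _ in range(max_steps):
        if classification_error(w, b, samples) <= error_threshold:
            break  # the error no longer exceeds the threshold, so stop updating
        # Gradients of the mean cross-entropy with respect to w and b.
        grad_w = sum((predict(w, b, x) - y) * x for x, y in samples) / len(samples)
        grad_b = sum(predict(w, b, x) - y for x, y in samples) / len(samples)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, classification_error(w, b, samples)


# Hypothetical labelled samples: (feature, standard classification result).
samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, final_error = train(samples)
```

The loop mirrors the comparison step only; how the error is propagated through an actual transformer depends on the framework used.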

and adding metadata description corresponding to each piece of data to be processed in the initial data map to form a data map of the data to be processed.

Specifically, the server draws an initial data map according to the mapping relationship among the data to be processed in the data classification result, and then adds the metadata description corresponding to each piece of data to be processed to the initial data map to form the data map of the data to be processed, so that developers can understand the data information of each piece of data to be processed more intuitively through the data map.
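The two-step construction, first an initial map from the mapping relationships and then the metadata descriptions attached to it, can be sketched as below. The dictionary layout, the function names, and the table names are assumptions for illustration; the application does not prescribe a storage format for the data map.

```python
def draw_initial_map(mappings):
    """Build an initial data map from (source, target) mapping relationships."""
    data_map = {}
    for source, target in mappings:
        data_map.setdefault(source, {"metadata": None, "maps_to": []})
        data_map.setdefault(target, {"metadata": None, "maps_to": []})
        data_map[source]["maps_to"].append(target)
    return data_map


def add_metadata(data_map, descriptions):
    """Attach each metadata description to its entry in the initial data map."""
    for name, description in descriptions.items():
        if name in data_map:
            data_map[name]["metadata"] = description
    return data_map


# Hypothetical mapping relationships and metadata descriptions.
mappings = [("ods.orders", "dw.order_summary"), ("dw.order_summary", "ads.report")]
descriptions = {
    "ods.orders": {"owner": "sales", "fields": ["id", "uid", "amount"]},
    "ads.report": {"owner": "bi", "fields": ["day", "total"]},
}
data_map = add_metadata(draw_initial_map(mappings), descriptions)
```

Entries without a known description keep `metadata` set to `None`, which makes missing metadata easy to spot when browsing the map.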

In the above embodiment, the application discloses a metadata management method based on blood relationship analysis, which belongs to the technical field of artificial intelligence. The application obtains the metadata description corresponding to the data to be processed, performs blood relationship analysis according to the metadata description to obtain blood relationship information of the data to be processed, and divides the data according to the blood relationship information to obtain a blood relationship data set. The blood relationship data set is input into a pre-trained data classification model to obtain a data classification result, in which the mapping relationship among the data to be processed is recorded. A data map of the data to be processed is constructed according to the mapping relationship and the metadata description, and data management is performed on the data to be processed according to the data map. By combining the blood relationship information and the metadata description of the data, the application first performs a simple preliminary classification of the data to be processed, and then performs data classification management on the data to be processed through the trained data classification model, which improves the reliability of data management and further improves the data utilization value.

It is emphasized that, in order to further ensure the privacy and security of the data to be processed, the data to be processed may also be stored in a node of a block chain.

The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
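The idea of data blocks associated by a cryptographic method can be illustrated with a minimal hash chain. This sketch covers only the linking and the tamper check; it has no network, consensus mechanism, or platform layers, and the block fields (`index`, `records`, `previous_hash`) are hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 digest of a block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, records: list) -> None:
    """Append a block that stores the hash of the block before it."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "records": records, "previous_hash": previous})


def is_valid(chain: list) -> bool:
    """Check that every block still points at the unmodified previous block."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["metadata for ods.orders"])
append_block(chain, ["metadata for dw.order_summary"])
append_block(chain, ["metadata for ads.report"])
valid_before = is_valid(chain)

chain[0]["records"].append("tampered entry")  # modify an earlier block
valid_after = is_valid(chain)
```

Because each block records the digest of its predecessor, changing an earlier block breaks the link held by the next one.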

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions instructing the associated hardware. The computer readable instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a metadata management apparatus based on blood relationship analysis, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.

As shown in fig. 3, the metadata management apparatus 300 based on blood relationship analysis according to the present embodiment includes:

the data acquisition module 301 is configured to acquire data to be processed from a preset data table, and acquire a metadata description corresponding to each data to be processed to obtain a first metadata description;

the first analysis module 302 is configured to perform blood relationship analysis on the data to be processed according to the first metadata description to obtain blood relationship information of the data to be processed;

the first dividing module 303 is configured to perform data division on the data to be processed according to the blood relationship information of the data to be processed, so as to obtain a blood relationship data set;

the data classification module 304 is configured to input the blood relationship data set into a pre-trained data classification model to obtain a data classification result of the to-be-processed data, where the data classification result records a mapping relationship between the to-be-processed data;

the data summarizing module 305 is configured to construct a data map of the to-be-processed data according to the mapping relationship between the to-be-processed data in the data classification result and the metadata description corresponding to the to-be-processed data;

and the data management module 306 is configured to perform data management on the data to be processed according to the data map.
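The application does not spell out how the first dividing module 303 groups data by blood relationship (lineage) information. One possible reading, shown here purely as an assumption, is that tables connected by lineage edges fall into the same blood relationship data set, which amounts to finding connected components. The function name and the table names are hypothetical.

```python
from collections import defaultdict


def divide_by_lineage(tables, lineage_edges):
    """Group tables into data sets; tables linked by lineage share a set."""
    neighbours = defaultdict(set)
    for source, target in lineage_edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    data_sets = []
    for table in tables:
        if table in seen:
            continue
        group = []
        pending = [table]
        seen.add(table)
        while pending:
            current = pending.pop()
            group.append(current)
            for other in sorted(neighbours[current]):
                if other not in seen:
                    seen.add(other)
                    pending.append(other)
        data_sets.append(sorted(group))
    return data_sets


# Hypothetical tables and lineage edges (source table, derived table).
tables = ["ods.orders", "dw.order_summary", "ods.logs", "ads.report"]
edges = [("ods.orders", "dw.order_summary"), ("dw.order_summary", "ads.report")]
data_sets = divide_by_lineage(tables, edges)
```

A table with no lineage edges, such as `ods.logs` here, ends up in a data set of its own.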

Further, the first analysis module 302 specifically includes:

the code conversion unit is used for converting the data to be processed carrying the first metadata description into a script file of an SQL (structured query language) code;

the syntax tree construction unit is used for extracting a regularized SQL statement from a script file of the SQL code and converting the SQL statement into an abstract syntax tree;

the relation abstraction unit is used for traversing the abstract syntax tree to obtain the logical relation of all tree nodes in the abstract syntax tree;

and the blood relationship information acquisition unit is used for acquiring blood relationship information of the data to be processed based on the logical relations of all the tree nodes.
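To make the output of these units concrete, the sketch below derives (source table, target table) pairs from a small SQL script. It deliberately swaps in regular expressions for the abstract syntax tree described above, so it only handles simple `INSERT INTO ... SELECT ... FROM ... JOIN ...` statements; a production implementation would rely on a real SQL parser. The patterns, function name, and table names are assumptions for illustration.

```python
import re

# Simplified patterns; they do not cover subqueries, CTEs, or quoted names.
TARGET = re.compile(r"\binsert\s+(?:into|overwrite\s+table)\s+([\w.]+)", re.IGNORECASE)
SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql_script: str) -> list[tuple[str, str]]:
    """Return (source_table, target_table) pairs for each INSERT statement."""
    lineage = []
    for statement in sql_script.split(";"):
        target = TARGET.search(statement)
        if target is None:
            continue  # statements without a write target add no lineage
        for source in SOURCE.findall(statement):
            lineage.append((source, target.group(1)))
    return lineage


script = """
INSERT INTO dw.order_summary
SELECT o.id, u.name FROM ods.orders o JOIN ods.users u ON o.uid = u.id;
INSERT INTO ads.report SELECT * FROM dw.order_summary;
"""
lineage = extract_lineage(script)
```

Each pair records that the target table is derived from the source table, which is the kind of relationship the blood relationship information is meant to capture.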

Further, the relationship abstraction unit specifically includes:

the traversal subunit is used for traversing the abstract syntax tree from the root node to the leaf node at the bottom layer of the abstract syntax tree;

and the relationship abstraction subunit is used for extracting the logical relationship among all the adjacent tree nodes to obtain the logical relationship of all the tree nodes in the abstract syntax tree.

Further, the data classification model includes an encoding layer and a decoding layer, and the data classification module 304 specifically includes:

the characteristic processing unit is used for carrying out characteristic extraction and characteristic vector conversion on the data to be processed in the blood relationship data set to obtain a data characteristic vector;

the encoding unit is used for encoding the data characteristic vectors through the encoding layer of the data classification model to obtain data encoding vectors;

the space mapping unit is used for carrying out space mapping on the data coding vector to obtain a space mapping result of the data to be processed;

and the decoding unit is used for decoding the space mapping result of the data to be processed through the decoding layer of the data classification model to obtain the data classification result of the data to be processed.

Further, the apparatus 300 for managing metadata based on blood relationship analysis further comprises:

the sample acquisition module is used for acquiring sample data from a preset database and acquiring the metadata description corresponding to each sample data to obtain a second metadata description;

the second analysis module is used for carrying out blood relationship analysis on the sample data according to the second metadata description to obtain blood relationship information of the sample data;

the second division module is used for carrying out data division on the sample data according to the blood relationship information of the sample data to obtain a sample blood relationship data set;

the sample importing module is used for importing the sample blood relationship data set into a preset transformer pre-training model, wherein the transformer pre-training model comprises an encoding layer and a decoding layer;

the sample processing module is used for performing feature extraction and feature vector conversion on the sample data in the sample blood relationship data set to obtain a sample feature vector;

the sample coding module is used for coding the sample characteristic vector through a coding layer of a transformer pre-training model to obtain a sample coding vector;

the sample mapping module is used for carrying out space mapping on the sample coding vector to obtain a space mapping result of the sample data;

the sample decoding module is used for decoding the space mapping result of the sample data through a decoding layer of the transformer pre-training model to obtain a data classification result of the sample data;

and the model iteration module is used for iteratively updating the transformer pre-training model based on the data classification result of the sample data to obtain a trained data classification model.

Further, the model iteration module specifically includes:

the loss function acquisition unit is used for acquiring a loss function of the transformer pre-training model;

the classification error calculation unit is used for calculating the relative error between the data classification result and the preset standard classification result based on the loss function to obtain a classification error;

the error comparison unit is used for transmitting the classification error in the transformer pre-training model and comparing the classification error with a preset error threshold value;

and the model iteration unit is used for iteratively updating the transformer pre-training model when the classification error is larger than a preset error threshold value until the model is fitted to obtain a trained data classification model.

Further, the data summarization module 305 specifically includes:

the initial map drawing unit is used for drawing an initial data map according to the mapping relation among the data to be processed in the data classification result;

and the metadata description adding unit is used for adding metadata descriptions corresponding to the data to be processed in the initial data map to form a data map of the data to be processed.

In the above embodiment, the application discloses a metadata management device based on blood relationship analysis, which belongs to the technical field of artificial intelligence. The application obtains the metadata description corresponding to the data to be processed, performs blood relationship analysis according to the metadata description to obtain blood relationship information of the data to be processed, and divides the data according to the blood relationship information to obtain a blood relationship data set. The blood relationship data set is input into a pre-trained data classification model to obtain a data classification result, in which the mapping relationship among the data to be processed is recorded. A data map of the data to be processed is constructed according to the mapping relationship and the metadata description, and data management is performed on the data to be processed according to the data map. By combining the blood relationship information and the metadata description of the data, the application first performs a simple preliminary classification of the data to be processed, and then performs data classification management on the data to be processed through the trained data classification model, which improves the reliability of data management and further improves the data utilization value.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42, and a network interface 43, which are communicatively connected to each other via a system bus. It is noted that only the computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user in a keyboard mode, a mouse mode, a remote controller mode, a touch panel mode or a voice control equipment mode.

The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device thereof. In this embodiment, the memory 41 is generally used for storing an operating system and various types of application software installed on the computer device 4, such as computer readable instructions of the metadata management method based on blood relationship analysis. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the metadata management method based on blood relationship analysis.

The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing a communication connection between the computer device 4 and other electronic devices.

The application discloses a computer device, which belongs to the technical field of artificial intelligence. The application obtains the metadata description corresponding to the data to be processed, performs blood relationship analysis according to the metadata description to obtain blood relationship information of the data to be processed, and divides the data according to the blood relationship information to obtain a blood relationship data set. The blood relationship data set is input into a pre-trained data classification model to obtain a data classification result, in which the mapping relationship among the data to be processed is recorded. A data map of the data to be processed is constructed according to the mapping relationship and the metadata description, and data management is performed on the data to be processed according to the data map. By combining the blood relationship information and the metadata description of the data, the application first performs a simple preliminary classification of the data to be processed, and then performs data classification management on the data to be processed through the trained data classification model, which improves the reliability of data management and further improves the data utilization value.

The present application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the metadata management method based on blood relationship analysis as described above.

The application discloses a storage medium, which belongs to the technical field of artificial intelligence. The application obtains the metadata description corresponding to the data to be processed, performs blood relationship analysis according to the metadata description to obtain blood relationship information of the data to be processed, and divides the data according to the blood relationship information to obtain a blood relationship data set. The blood relationship data set is input into a pre-trained data classification model to obtain a data classification result, in which the mapping relationship among the data to be processed is recorded. A data map of the data to be processed is constructed according to the mapping relationship and the metadata description, and data management is performed on the data to be processed according to the data map. By combining the blood relationship information and the metadata description of the data, the application first performs a simple classification of the data to be processed, and then performs data classification management on the data to be processed through the trained data classification model, which improves the reliability of data management and further improves the data utilization value.

Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

It should be understood that the above-described embodiments are merely exemplary of some, and not all, embodiments of the present application, and that the drawings illustrate preferred embodiments of the present application without limiting the scope of the claims appended hereto. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that modifications can be made to the embodiments described in the foregoing detailed description, or equivalents can be substituted for some of the features described therein. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields, and all the equivalent structures are within the protection scope of the present application.

Claims

1. A method for managing metadata based on blood relationship analysis is characterized by comprising the following steps:

performing data division on the data to be processed according to the blood relationship information of the data to be processed to obtain a blood relationship data set;

inputting the blood relationship data set into a pre-trained data classification model to obtain a data classification result of the data to be processed, wherein the data classification result records a mapping relation among the data to be processed;

2. The method for managing metadata based on blood relationship analysis according to claim 1, wherein the performing blood relationship analysis on the data to be processed according to the first metadata description to obtain the blood relationship information of the data to be processed specifically comprises:

extracting a regularized SQL statement from the script file of the SQL code, and converting the SQL statement into an abstract syntax tree;

traversing the abstract syntax tree to obtain the logic relations of all tree nodes in the abstract syntax tree;

3. The method for managing metadata based on blood relationship analysis according to claim 2, wherein the tree nodes include a root node and a leaf node, and the traversing the abstract syntax tree to obtain the logical relationship of all the tree nodes in the abstract syntax tree specifically includes:

traversing the abstract syntax tree from the root node downwards until the leaf node at the lowest layer of the abstract syntax tree;

4. The method for managing metadata based on blood relationship analysis according to claim 1, wherein the data classification model includes an encoding layer and a decoding layer, and the step of inputting the blood relationship data set into the pre-trained data classification model to obtain the data classification result of the data to be processed specifically includes:

performing feature extraction and feature vector conversion on the data to be processed in the blood relationship data set to obtain a data feature vector;

performing spatial mapping on the data coding vector to obtain a spatial mapping result of the data to be processed;

5. The method for managing metadata based on blood relationship analysis according to claim 1, wherein before the inputting the blood relationship data set into a pre-trained data classification model to obtain the data classification result of the data to be processed, the method further comprises:

acquiring sample data from a preset database, and acquiring metadata description corresponding to each sample data to obtain a second metadata description;

performing data division on the sample data according to the blood relationship information of the sample data to obtain a sample blood relationship data set;

importing the sample blood relationship data set into a preset transformer pre-training model, wherein the transformer pre-training model comprises an encoding layer and a decoding layer;

performing feature extraction and feature vector conversion on the sample data in the sample blood relationship data set to obtain a sample feature vector;

coding the sample characteristic vector through a coding layer of the transformer pre-training model to obtain a sample coding vector;

decoding the space mapping result of the sample data through a decoding layer of the transformer pre-training model to obtain a data classification result of the sample data;

6. The method for managing metadata based on blood relationship analysis according to claim 5, wherein the iteratively updating the transformer pre-training model based on the data classification result of the sample data to obtain a trained data classification model specifically comprises:

obtaining a loss function of the transformer pre-training model;

calculating a relative error between the data classification result and a preset standard classification result based on the loss function to obtain a classification error;

transmitting the classification error in the transformer pre-training model, and comparing the classification error with a preset error threshold;

7. The method for managing metadata based on blood relationship analysis according to any one of claims 1 to 6, wherein constructing a data map of the to-be-processed data according to the mapping relationship between the to-be-processed data in the data classification result and the metadata description corresponding to the to-be-processed data specifically includes:

8. A metadata management apparatus based on blood relationship analysis, comprising:

the data acquisition module is used for acquiring data to be processed from a preset data table and acquiring metadata description corresponding to each data to be processed to obtain a first metadata description;

the data classification module is used for inputting the blood relationship data set into a pre-trained data classification model to obtain a data classification result of the data to be processed, wherein the data classification result records the mapping relation among the data to be processed;

9. A computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, performs the steps of the method for managing metadata based on blood relationship analysis according to any one of claims 1 to 7.

10. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the method for managing metadata based on blood relationship analysis according to any one of claims 1 to 7.