CN112231420A

CN112231420A - Data analysis method, data analysis device, electronic device, and storage medium

Info

Publication number: CN112231420A
Application number: CN202011174732.4A
Authority: CN
Inventors: 姚石; 傅君玉; 吴梁纯; 李俊杰; 常晋曦
Original assignee: Ping An Zhitong Consulting Co Ltd
Current assignee: Ping An Zhitong Consulting Co Ltd
Priority date: 2020-10-28
Filing date: 2020-10-28
Publication date: 2021-01-15

Abstract

The application is applicable to the technical field of big data, and provides a data analysis method, a data analysis device, an electronic device, and a computer-readable storage medium. The method comprises the following steps: acquiring at least one original data stream through a preset data platform; classifying each original data stream according to the data category; cleaning the classified original data stream to obtain an effective data stream; and calling a preset analysis tool, extracting target effective data according to the association relationship among the effective data streams, and analyzing the target effective data to obtain an analysis map under the dimension to be analyzed. The method can improve the efficiency and accuracy of data analysis.

Description

Data analysis method, data analysis device, electronic device, and storage medium

Technical Field

The present application relates to the field of big data technologies, and in particular, to a data analysis method, a data analysis apparatus, an electronic device, and a computer-readable storage medium.

Background

Financial cases are now occurring with high frequency, owing to changes in the macroeconomic situation and the successive collapses of internet finance companies. Financial cases typically involve a large number of parties, and public sentiment tends to run high. At the review and prosecution stage, the procuratorate often faces voluminous, complicated case files and tangled accounts, and must analyze the case from those files.

In the face of complex transactions, massive information, and complicated analysis work, the traditional analysis map serves only as a data import tool: some or all of a case is analyzed under manual selection, the resulting analysis may be fragmented, and scenario-based, systematic analysis results are difficult to obtain.

Disclosure of Invention

In view of this, embodiments of the present application provide a data analysis method, a data analysis apparatus, an electronic device, and a computer-readable storage medium, which can remove redundant data before analysis, and enhance the correlation of data to be analyzed, so as to improve the efficiency and accuracy of data analysis.

A first aspect of an embodiment of the present application provides a data analysis method, including:

acquiring at least one original data stream through a preset data platform;

classifying each original data stream according to the data category;

cleaning the classified original data stream to obtain an effective data stream;

and calling a preset analysis tool, extracting target effective data according to the association relationship among the effective data streams, and analyzing the target effective data to obtain an analysis map under the dimension to be analyzed.

A second aspect of an embodiment of the present application provides a data analysis apparatus, including:

an acquisition unit, configured to acquire at least one original data stream through a preset data platform;

a classification unit, configured to classify each original data stream according to the data category;

a cleaning unit, configured to clean the classified original data stream to obtain an effective data stream;

and an analysis unit, configured to call a preset analysis tool, extract target effective data according to the association relationship among the effective data streams, and analyze the target effective data to obtain an analysis map under the dimension to be analyzed.

A third aspect of the embodiments of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the electronic device, where the processor implements the steps of the data analysis method provided in the first aspect when executing the computer program.

A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the data analysis method provided by the first aspect.

By implementing the data analysis method provided by the embodiments of the present application, at least one original data stream is acquired through a preset data platform; the original data streams are classified according to data category; the classified original data streams are cleaned to obtain effective data streams; and a preset analysis tool is called to extract target effective data according to the association relationship among the effective data streams and to analyze the target effective data, yielding an analysis map under the dimension to be analyzed. In this process, the classification operation summarizes and sorts the original data contained in the original data streams, so that originally unordered data becomes ordered. The cleaning operation screens out useless data, so that only effective data that is meaningful for the subsequent analysis is retained. Extracting the target effective data before analysis, based on the association relationship among the effective data streams, removes data redundancy, improves analysis efficiency, and yields scenario-based, structured analysis results.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of an implementation of a data analysis method provided by an embodiment of the present application;

FIG. 2 is a flowchart illustrating an implementation of step 102 in a data analysis method provided by an embodiment of the present application;

FIG. 3 is a flowchart illustrating an implementation of step 104 in a data analysis method provided by an embodiment of the present application;

FIG. 4 is a flowchart of another implementation of step 104 in a data analysis method provided by an embodiment of the present application;

FIG. 5 is a diagram of an example of a fund transaction relationship network diagram in a data analysis method provided in an embodiment of the present application;

FIG. 6 is a block diagram of a data analysis apparatus according to an embodiment of the present application;

FIG. 7 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The data analysis method according to the embodiment of the present application may be applied to electronic devices such as a server, a desktop computer, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, and a Personal Digital Assistant (PDA). The embodiment of the present application does not limit the specific type of the electronic device.

FIG. 1 is a flowchart illustrating an implementation of a data analysis method according to an embodiment of the present application. As shown in FIG. 1, the data analysis method provided in this embodiment may include:

Step 101, obtaining at least one original data stream through a preset data platform.

In this embodiment, the electronic device may be integrated with a data analysis system, and the data analysis system may access a preset data platform. The data platform is a platform for storage, calculation and presentation, and can provide services such as data access, data processing and data storage for certain specific data types. Illustratively, each data platform has set up a respective access interface; then, the electronic device can access the data in each data platform through the access interface to obtain at least one original data stream.

In some embodiments, the number of preset data platforms may be one, two, or more, and is not limited herein. Based on this, the data analysis system can also adopt a distributed framework to improve the concurrency of data analysis.

For example only, the original data stream in this embodiment may be obtained from a case to be analyzed in a court, such as a financial case, which is not limited herein. Different data platforms can import different original data streams for the same case to be analyzed. For example, data platform 1 can import the bank transaction data stream of financial case 1, and data platform 2 can import the involved-person data stream of financial case 1. The original data streams that different data platforms can import are not limited herein.

In some embodiments, after importing the original data streams through the respective data platforms, the electronic device may consolidate and organize them. For example, two or more original data streams belonging to the same financial case are integrated into a new original data stream.
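By way of illustration only, Step 101 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the functions `platform_1` and `platform_2` are hypothetical stand-ins for the access interfaces of two data platforms, and the record fields (`case`, `payer`, `payee`, `amount`, `person`, `age`) are invented for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical stand-ins for platform access interfaces: each returns the
# raw records (one original data stream) for a given case identifier.
def platform_1(case_id: str) -> List[dict]:
    return [{"case": case_id, "payer": "A", "payee": "B", "amount": "1200.00"}]

def platform_2(case_id: str) -> List[dict]:
    return [{"case": case_id, "person": "A", "age": "41"}]

def acquire_streams(
    case_id: str, platforms: Dict[str, Callable[[str], List[dict]]]
) -> Dict[str, List[dict]]:
    """Fetch one original data stream per platform through its interface."""
    return {name: fetch(case_id) for name, fetch in platforms.items()}

def merge_by_case(streams: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Integrate records from all streams that belong to the same case."""
    merged: Dict[str, List[dict]] = defaultdict(list)
    for name, records in streams.items():
        for record in records:
            # Remember which stream each record came from.
            merged[record["case"]].append({**record, "source": name})
    return dict(merged)

streams = acquire_streams(
    "case-1", {"platform_1": platform_1, "platform_2": platform_2}
)
merged = merge_by_case(streams)
```

Here `merged["case-1"]` holds the records of both streams, each tagged with the platform it was imported from.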

Step 102, classifying the original data streams according to the data categories.

In this embodiment, the data contained in an original data stream is original data. One original data stream may contain multiple items of original data belonging to different categories. By way of example only, the bank transaction data stream acquired for a financial case may include items of original data such as the fund outflow party, the fund inflow party, the fund amount, and the fund flow time; the fund outflow party and the fund inflow party belong to the involved-institution subject category, the fund amount belongs to the transaction amount category, and the fund flow time belongs to the transaction time category. Because the original data streams may originate from different data platforms, and the data types of the original data in different streams differ, the original data in the streams is relatively unordered. Therefore, in this embodiment, the original data contained in each original data stream may be classified so that the unordered data becomes ordered and the data category of each item of original data is obtained, thereby classifying the original data streams.

In some embodiments, several data categories are predefined in the data analysis system. The data categories are generally divided based on content words. For example, the set data categories may include two major classes, numeric and non-numeric. The numeric class can be subdivided into the involved-person age category, the transaction amount category, and the transaction time category; the non-numeric class can be subdivided into the involved-person subject category, the involved-institution subject category, the transaction area category, and so on. The data categories are not limited herein. Based on this, referring to FIG. 2, Step 102 may specifically include:

Step 1021, extracting the content words contained in each original data stream to obtain the original data contained in each original data stream.

Because the data categories are generally divided based on content words, only the content words contained in each original data stream are extracted in order to improve the efficiency of data analysis; the extracted content words are the original data contained in that original data stream.

Step 1022, for each original data stream, matching each item of original data contained in the original data stream with at least one preset data category.

Each data category has its corresponding format and/or rules. For example, data in the transaction time category must include date information, and date information has an inherent format, such as day/month/year, year/month/day, or month/day/year; such data may also include time-of-day information, which likewise has its own format, such as hours, minutes, and seconds. Based on this, any item of original data in an original data stream can be matched against the format and/or rules associated with a data category. If the match succeeds, the item is considered to belong to that data category; otherwise, the item is matched against the format and/or rules of the next data category until a match succeeds, so as to determine the data category to which each item of original data belongs.

Step 1023, classifying the original data stream according to the matching result and the data category of each item of original data.

Although the original data under each data category can be determined from the matching result, each item of original data remains associated with the original data stream from which it came. For example, each classified item of original data may be marked with its source location, which indicates the original data stream to which the item belongs.
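By way of illustration only, the matching in Step 1022 and the source marking in Step 1023 can be sketched with regular expressions. This is a sketch under stated assumptions: the patterns in `CATEGORY_RULES`, the category names, and the `unclassified` fallback are hypothetical and are not specified by the embodiment.

```python
import re
from typing import Dict, List, Optional

# Hypothetical format rules, one per data category, tried in order.
CATEGORY_RULES = [
    # year/month/day, optionally followed by hours:minutes:seconds
    ("transaction_time", re.compile(r"\d{4}/\d{1,2}/\d{1,2}( \d{2}:\d{2}:\d{2})?")),
    # an amount with exactly two decimal places
    ("transaction_amount", re.compile(r"\d+\.\d{2}")),
    # an age written as a bare integer
    ("person_age", re.compile(r"\d{1,3}")),
]

def match_category(item: str) -> Optional[str]:
    """Return the first category whose format the whole item matches."""
    for category, pattern in CATEGORY_RULES:
        if pattern.fullmatch(item):
            return category
    return None

def classify_stream(stream_id: str, items: List[str]) -> Dict[str, List[dict]]:
    """Group items by category and mark each with its source stream."""
    classified: Dict[str, List[dict]] = {}
    for item in items:
        category = match_category(item) or "unclassified"
        classified.setdefault(category, []).append(
            {"value": item, "source": stream_id}
        )
    return classified

result = classify_stream(
    "bank_stream_1", ["2020/10/28 09:30:00", "1500.00", "35", "Zhang San"]
)
```

Trying the rules in a fixed order mirrors the "match the next category until one succeeds" behavior described above; the order matters whenever two formats could accept the same item.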

In some embodiments, considering that the purpose of analyzing a financial case is to identify its criminal suspects, key original data (namely, key involved-person subjects) can be determined under the involved-person subject category through a feature extraction operation, and the key original data can be marked. The feature extraction operation may be: for each item of original data (namely, each involved-person subject), detecting, from the original data stream to which the item belongs, the fund exchange between that subject and third parties within a preset time period; and if the fund exchange with third parties exceeds a preset fund threshold, determining the item to be key original data. For example, suppose there are three involved-person subjects A, B, and C (original data) under the involved-person subject category. The bank transaction data stream associated with A indicates that A's fund exchange with third parties within half a year is X1; the stream associated with B indicates that B's is X2; and the stream associated with C indicates that C's is X3. Only X2 exceeds the preset fund threshold. B can therefore be determined to be a key involved-person subject (key original data), and B and all original data streams associated with B can be highlighted so that a user can grasp the key points when reviewing.
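By way of illustration only, the feature extraction operation can be sketched as a sum over a time window followed by a threshold comparison. This is a minimal sketch: the record fields (`subject`, `date`, `amount`), the half-year window, and the threshold of 1,000,000 are assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set

def find_key_subjects(
    transactions: List[dict], start: date, end: date, threshold: float
) -> Set[str]:
    """Return subjects whose total fund exchange with third parties
    within [start, end] exceeds the threshold."""
    totals: Dict[str, float] = defaultdict(float)
    for tx in transactions:
        if start <= tx["date"] <= end:
            totals[tx["subject"]] += tx["amount"]
    return {subject for subject, total in totals.items() if total > threshold}

# Hypothetical transactions for subjects A, B, and C.
transactions = [
    {"subject": "A", "date": date(2020, 3, 1), "amount": 20_000.0},
    {"subject": "B", "date": date(2020, 4, 15), "amount": 600_000.0},
    {"subject": "B", "date": date(2020, 5, 2), "amount": 500_000.0},
    {"subject": "C", "date": date(2020, 6, 9), "amount": 80_000.0},
]
key_subjects = find_key_subjects(
    transactions, date(2020, 1, 1), date(2020, 6, 30), threshold=1_000_000.0
)
```

With these figures only B exceeds the threshold, matching the A, B, C example above.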

Step 103, cleaning the classified original data stream to obtain an effective data stream.

In the embodiment of the present application, it is considered that the original data is often entered into the data platform by a manual method or an Optical Character Recognition (OCR) method, and the two methods cannot guarantee a hundred percent entry accuracy, which may result in invalid data in the original data. The data format and the data rule to be followed of the original data under the same data category are often the same, so that the classified original data streams can be cleaned, and the obtained cleaned data streams are effective data streams.

In some embodiments, the cleaning operation is mainly performed on original data belonging to the numeric class. Accordingly, the data categories belonging to the numeric class may first be screened out from the at least one preset data category as the data categories to be cleaned; then, for each data category to be cleaned, a cleaning operation is performed on the original data in that category to obtain the effective data in that category. The cleaning operation includes, but is not limited to, a blank data filling operation, a noise data removing operation, and/or an illegal data removing operation.

In one application scenario, the cleaning operation includes a blank data filling operation, and Step 103 is embodied as: for each data category to be cleaned, detecting whether blank data exists under that category; and if blank data exists, calculating the mean of the original data under that category and filling the blank data with the mean. For example, if a blank age entry exists under the involved-person age category, the electronic device may calculate the mean of the original data under that category (i.e., the mean age of the involved persons) and fill the blank age entry with the mean.
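By way of illustration only, the blank data filling operation can be sketched as follows. The function name `fill_blanks_with_mean` and the use of `None` to represent a blank entry are assumptions made for the example.

```python
from statistics import mean
from typing import List, Optional

def fill_blanks_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace blank entries (None) with the mean of the non-blank entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot compute a mean: every value is blank")
    average = mean(present)
    return [average if v is None else v for v in values]

# Ages under the involved-person age category, with one blank entry.
ages = [30, None, 50, 40]
filled = fill_blanks_with_mean(ages)
```

The blank entry is replaced by 40, the mean of 30, 50, and 40.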

In another application scenario, the cleaning operation includes a noise data removing operation, and Step 103 is embodied as: for each data category to be cleaned, a corresponding noise data interval is preset in the data analysis system; when a cleaning operation is needed, detecting whether data falling in the corresponding noise data interval (namely, noise data) exists under the data category to be cleaned; and if noise data exists, removing it. For example, under the involved-person age category, considering that in real life few minors or very elderly people commit financial crimes, the data analysis system can set two noise data intervals: less than 16 and more than 80 (in years of age). When the original data under the involved-person age category is cleaned, an original data value of 8 is abnormal data belonging to the noise data interval of less than 16 and can be removed.

In yet another application scenario, the cleaning operation includes an illegal data removing operation, and Step 103 is embodied as: for each data category to be cleaned, a corresponding illegal data interval is preset in the data analysis system; when a cleaning operation is needed, detecting whether data falling in the corresponding illegal data interval (namely, illegal data) exists under the data category to be cleaned; and if illegal data exists, removing it. For example, under the transaction time category, an operating system exception or a virus attack may cause the entered original data (i.e., the transaction time) to be "year 1900, month x, day y", which is obviously an illegal time. When the original data under the transaction time category is cleaned, this illegal data needs to be removed to ensure that the remaining original data is all legal.

In some embodiments, considering that both the noise data and the illegal data are data to be removed, for each category of data to be cleaned, the electronic device may obtain an abnormal data interval after merging the noise data interval and the illegal data interval, so that the original data in the abnormal data interval corresponding to each category of data to be cleaned may be removed; that is, the noise data and the illegal data are collectively referred to as abnormal data, and the noise data removing operation and/or the illegal data removing operation are combined into one abnormal data removing operation.
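By way of illustration only, merging the noise data intervals and illegal data intervals into abnormal data intervals, and then removing the abnormal data, can be sketched as follows. This sketch assumes that every interval is open, `(low, high)`; the noise intervals follow the age example above, while the illegal intervals (negative ages and ages above 150) are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # open interval (low, high)

def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping open intervals into a minimal sorted list."""
    merged: List[Interval] = []
    for low, high in sorted(intervals):
        if merged and low < merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged

def remove_abnormal(values: List[float], abnormal: List[Interval]) -> List[float]:
    """Keep only values that fall in none of the abnormal intervals."""
    return [
        v for v in values
        if not any(low < v < high for low, high in abnormal)
    ]

noise = [(-math.inf, 16), (80, math.inf)]     # less than 16, more than 80
illegal = [(-math.inf, 0), (150, math.inf)]   # hypothetical illegal ages
abnormal = merge_intervals(noise + illegal)
cleaned = remove_abnormal([8, 16, 35, 80, 92, -3], abnormal)
```

Because the intervals are open, the boundary values 16 and 80 are retained, which is consistent with "less than 16" and "more than 80".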

Step 104, calling a preset analysis tool, extracting target effective data according to the association relationship among the effective data streams, and analyzing the target effective data to obtain an analysis map under the dimension to be analyzed.

In this embodiment, the data analysis system is configured in advance with a plurality of analysis dimensions, each associated with at least one data category. After the analysis dimension of interest for the analysis operation (namely, the dimension to be analyzed) is determined, a preset analysis tool can be called based on the dimension to be analyzed to extract the target effective data and analyze it, so as to obtain an analysis map under the dimension to be analyzed. It should be noted that the number of dimensions to be analyzed is not limited herein. Illustratively, the analysis dimensions may include an account analysis dimension, a multi-dimensional group analysis dimension, a fund penetration analysis dimension, and the like, which are not limited herein. Each analysis dimension may also include a plurality of sub-analysis dimensions. For example, the account analysis dimension may include a sub-analysis dimension for the number of fund inflow and outflow transactions, a sub-analysis dimension for the fund inflow and outflow amount, a sub-analysis dimension for abnormal transaction frequency, and the like, which are not limited herein.

In this embodiment, the analysis tool may first determine the presentation form of the analysis map to be output for the dimension to be analyzed, and draw the corresponding analysis map in that form to obtain a visualization result. The analysis map can take various presentation forms. For example, for the account analysis dimension, the analysis maps of the fund-related sub-analysis dimensions may be presented as a pie chart or a bar chart, and the analysis map of the abnormal transaction frequency sub-analysis dimension may be presented as a bar chart and/or a graph. The presentation form adopted by each analysis dimension and its sub-analysis dimensions is not limited herein.

In some embodiments, the analysis tool may further determine a display scale for each analysis map based on the report template corresponding to the dimension to be analyzed, then draw graphic elements based on the corresponding display scale and presentation form, and fill text into the graphic elements based on the target effective data to obtain the analysis map.

In some embodiments, referring to FIG. 3, the analysis tool may obtain the association relationship among the effective data streams through a preset cluster analysis model and extract the corresponding target effective data, so as to reduce data redundancy during data analysis. In this case, the step of calling the preset analysis tool and extracting the target effective data according to the association relationship among the effective data streams may specifically include:

Step 1041, performing vectorization processing on the effective data stream to obtain an effective data stream vector.

The electronic device can be configured in advance with a plurality of vector dimensions, each indicating a data category. Each effective data stream is vectorized along the configured vector dimensions to obtain its vectorized expression, namely an effective data stream vector. For example only, the vector dimensions may include an involved-institution subject category A, a transaction amount category B, a transaction time category C, and the like. Because one data stream may contain multiple items of original data, each dimension of the effective data stream vector generated from an effective data stream may contain multiple elements.
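By way of illustration only, one possible vectorization is sketched below. The embodiment allows several elements per vector dimension; to obtain a fixed-length numeric vector, this sketch reduces each dimension to a single number, which is an assumption and not something the embodiment prescribes. The category keys and the chosen aggregates (distinct institution count, mean amount, mean date as an ordinal) are likewise hypothetical.

```python
from datetime import date
from statistics import mean
from typing import Dict, List

def vectorize_stream(stream: Dict[str, list]) -> List[float]:
    """Reduce each data category to one number (an assumed aggregation)."""
    institutions = stream.get("institution_subject", [])
    amounts = stream.get("transaction_amount", [])
    days = [d.toordinal() for d in stream.get("transaction_time", [])]
    return [
        float(len(set(institutions))),              # dimension A: distinct institutions
        float(mean(amounts)) if amounts else 0.0,   # dimension B: mean amount
        float(mean(days)) if days else 0.0,         # dimension C: mean transaction day
    ]

stream = {
    "institution_subject": ["Bank X", "Bank Y", "Bank X"],
    "transaction_amount": [100.0, 300.0],
    "transaction_time": [date(2020, 1, 1), date(2020, 1, 3)],
}
vector = vectorize_stream(stream)
```

In practice the dimensions would usually be rescaled before clustering, since amounts and date ordinals differ by orders of magnitude.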

Step 1042, constructing a cluster analysis model based on a preset clustering algorithm, and inputting the effective data stream vectors into the cluster analysis model to obtain at least one group, wherein each group comprises at least one effective data stream vector.

In the embodiment of the present application, the obtained effective data stream vectors may be placed in a vector space and clustered. Specifically, the electronic device may build a cluster analysis model in advance based on a given clustering algorithm (e.g., the K-means algorithm or another algorithm) and perform the clustering operation through the cluster analysis model. The clustering result is at least one cluster, and each cluster can be regarded as a group, so that each group contains at least one effective data stream vector. The effective data stream vectors in the same group can be considered to have a certain correlation; that is, the effective data streams corresponding to the effective data stream vectors in the same group have a certain correlation.
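By way of illustration only, a K-means clustering of effective data stream vectors can be sketched in plain Python as follows. This is a minimal sketch of the standard Lloyd iteration, not the cluster analysis model of the embodiment; initializing the centroids with the first `k` vectors and running a fixed number of iterations are simplifying assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def distance(p: Vector, q: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def k_means(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return a group label (0..k-1) for each vector."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Assumed deterministic initialization: the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a group became empty
                centroids[c] = mean_vector(members)
    return labels

vectors = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1)]
labels = k_means(vectors, k=2)
```

Vectors with the same label form one group; here the three vectors near (1, 1) share one label and the two near (9, 9) share the other.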

Step 1043, extracting the target effective data according to the at least one group.

In this embodiment, some of the at least one group obtained by clustering may be selected as target groups, and the effective data streams associated with the effective data stream vectors in the target groups may be determined to be the target effective data. Specifically, through the clustering algorithm, groups that are mutually exclusive (i.e., groups whose correlation is lower than a preset lower limit of correlation) can be identified and marked as mutually exclusive groups, and groups that are associated with each other (i.e., groups whose correlation is higher than a preset upper limit of correlation) can be marked as associated groups. The target groups can then be determined among the at least one group according to the associated groups and the mutually exclusive groups. For example, assume that groups 1, 2, 3, 4, and 5 are currently available, where group 1 and group 2 are associated groups, group 4 and group 5 are associated groups, and group 2 and group 5 are mutually exclusive groups. The electronic device may determine group 1, group 2, and/or group 3 as target groups, or may determine group 3, group 4, and/or group 5 as target groups, but groups 1 and 2 cannot be determined as target groups at the same time as groups 4 and 5. That is, the target groups are determined in accordance with the obtained mutual exclusion and association relationships: mutually exclusive groups cannot be target groups simultaneously, whereas associated groups are recommended to be target groups simultaneously. After the target groups are determined, the electronic device may determine the effective data in the effective data streams associated with the effective data stream vectors in the target groups to be the target effective data, that is, the data to be analyzed. It will be appreciated that the correlation between two groups can be calculated from the distance between the mean vectors of the two groups.
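By way of illustration only, the marking of associated and mutually exclusive groups and the selection of target groups can be sketched as follows. This is one possible reading, under stated assumptions: the mapping `1 / (1 + distance)` from mean-vector distance to correlation, the thresholds, and the greedy selection starting from a chosen `seed` group are all hypothetical; the embodiment states only that correlation can be calculated from the distance between mean vectors.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Sequence, Set, Tuple

Pair = Tuple[int, int]

def correlation(mean_a: Sequence[float], mean_b: Sequence[float]) -> float:
    """Assumed mapping: closer mean vectors give a correlation nearer to 1."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_a, mean_b)))
    return 1.0 / (1.0 + dist)

def label_pairs(means: Dict[int, Sequence[float]], lower: float, upper: float):
    """Split group pairs into associated and mutually exclusive sets."""
    associated: Set[Pair] = set()
    exclusive: Set[Pair] = set()
    for a, b in combinations(sorted(means), 2):
        score = correlation(means[a], means[b])
        if score > upper:
            associated.add((a, b))
        elif score < lower:
            exclusive.add((a, b))
    return associated, exclusive

def select_target_groups(
    seed: int, groups: Iterable[int], associated: Set[Pair], exclusive: Set[Pair]
) -> Set[int]:
    """Grow a target set from seed: pull in associated groups together,
    and never combine mutually exclusive groups."""
    groups = list(groups)

    def linked(pairs: Set[Pair], a: int, b: int) -> bool:
        return (a, b) in pairs or (b, a) in pairs

    def closure(start: int) -> Set[int]:
        found, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for other in groups:
                if other not in found and linked(associated, current, other):
                    found.add(other)
                    frontier.append(other)
        return found

    target = closure(seed)
    for group in groups:
        if group in target:
            continue
        union = target | closure(group)
        if not any(linked(exclusive, a, b) for a in union for b in union):
            target = union
    return target

# The five-group example from the text above.
associated = {(1, 2), (4, 5)}
exclusive = {(2, 5)}
from_group_1 = select_target_groups(1, [1, 2, 3, 4, 5], associated, exclusive)
from_group_4 = select_target_groups(4, [1, 2, 3, 4, 5], associated, exclusive)
```

Starting from group 1 yields groups 1, 2, and 3; starting from group 4 yields groups 3, 4, and 5, matching the two admissible selections in the example.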

In some embodiments, to implement targeted data analysis, the electronic device can further filter the data to be analyzed based on the data categories associated with the dimensions to be analyzed. For example, assuming that the valid data streams associated with the valid data stream vectors in the target group are data streams 1 and 2, the electronic device may extract valid data belonging to the target data category (i.e., the data category associated with the dimension to be analyzed) in the data streams 1 and 2 as target valid data. That is, the electronic device may further filter the valid data streams associated with the valid data stream vectors in the target group, and only the valid data belonging to the target data category in the valid data streams is reserved as the target valid data.
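By way of illustration only, filtering the effective data streams of the target groups by the data categories associated with the dimension to be analyzed can be sketched as follows. The dictionary `DIMENSION_CATEGORIES` and its keys are assumptions; in particular, the categories listed for the `account` dimension are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical mapping from analysis dimension to associated data categories.
DIMENSION_CATEGORIES: Dict[str, Set[str]] = {
    "fund_penetration": {"person_subject", "transaction_amount", "institution_subject"},
    "account": {"transaction_amount", "transaction_time"},
}

def extract_target_data(
    streams: List[Dict[str, list]], dimension: str
) -> Dict[str, list]:
    """Keep only the data whose category is associated with the dimension."""
    wanted = DIMENSION_CATEGORIES[dimension]
    target: Dict[str, list] = {category: [] for category in sorted(wanted)}
    for stream in streams:
        for category, values in stream.items():
            if category in wanted:
                target[category].extend(values)
    return target

# Data streams 1 and 2 associated with the target groups.
stream_1 = {"transaction_amount": [100.0], "person_age": [41]}
stream_2 = {"transaction_amount": [250.0], "transaction_time": ["2020/10/28"]}
target_data = extract_target_data([stream_1, stream_2], "account")
```

The age data in data stream 1 is dropped because its category is not associated with the chosen dimension.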

In some embodiments, referring to fig. 4, for a financial case, if the dimension to be analyzed is a fund penetration analysis dimension, the step 104 may specifically include:

Step 1044, generating a fund transaction relationship network graph based on the target effective data.

Target data categories associated with the fund penetration analysis dimension include, but are not limited to, the involved-person subject category, the transaction amount category, and the involved-institution subject category. The loan direction can be obtained by analyzing these target data categories and the target effective data under them. The fund transaction relationship network graph can therefore be generated based on the involved-person subjects (namely, the target effective data under the involved-person subject category), the transaction amounts (the target effective data under the transaction amount category), and the loan direction (expressed based on the target effective data under the involved-institution subject category).

In the fund transaction relationship network graph, the nodes are involved persons or involved institutions; the nodes are connected based on the loan direction to indicate the flow of the funds. Illustratively, the connections between nodes may be labeled based on the amount of the transaction between the nodes. For example, if the transaction amount between two nodes is in a preset first amount interval, marking the connection line between the two nodes as green; if the transaction amount between the two nodes is in a preset second amount interval, marking the connecting line between the two nodes as yellow; and if the transaction amount between the two nodes is in a preset third amount interval, marking the connecting line between the two nodes as red. Wherein the maximum value of the first monetary interval is less than or equal to the minimum value of the second monetary interval; the maximum value of the second amount interval is less than or equal to the minimum value of the third amount interval. Through the connection marking operation, the flow direction of abnormal funds can be quickly found out. Alternatively, the links between nodes may be labeled based on the direction of the loan. For example, after a target node is selected, the money flowing into the target node is marked as blue, and the money flowing out of the target node is marked as red, so that the money flowing in/out state of a single node is quickly known.
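By way of illustration only, building the connections of the fund transaction relationship network graph and marking them by amount or by direction can be sketched as follows. The interval boundaries in `AMOUNT_COLORS` are hypothetical; the embodiment requires only that the three amount intervals be ordered.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (payer, payee): funds flow from payer to payee

# Hypothetical amount intervals [low, high) and their marking colors.
AMOUNT_COLORS = [
    (0.0, 10_000.0, "green"),
    (10_000.0, 100_000.0, "yellow"),
    (100_000.0, float("inf"), "red"),
]

def color_for(amount: float) -> str:
    """Return the marking color of the interval containing the amount."""
    for low, high, color in AMOUNT_COLORS:
        if low <= amount < high:
            return color
    raise ValueError(f"amount {amount} is outside every configured interval")

def build_fund_graph(transfers: List[Tuple[str, str, float]]) -> Dict[Edge, dict]:
    """Aggregate transfers into directed connections marked by total amount."""
    totals: Dict[Edge, float] = defaultdict(float)
    for payer, payee, amount in transfers:
        totals[(payer, payee)] += amount
    return {
        edge: {"amount": total, "color": color_for(total)}
        for edge, total in totals.items()
    }

def direction_colors(graph: Dict[Edge, dict], node: str) -> Dict[Edge, str]:
    """Mark connections into the selected node blue and out of it red."""
    colors: Dict[Edge, str] = {}
    for payer, payee in graph:
        if payee == node:
            colors[(payer, payee)] = "blue"
        elif payer == node:
            colors[(payer, payee)] = "red"
    return colors

transfers = [("A", "B", 5_000.0), ("A", "B", 7_000.0),
             ("B", "C", 250_000.0), ("C", "A", 800.0)]
graph = build_fund_graph(transfers)
around_b = direction_colors(graph, "B")
```

Here the two transfers from A to B total 12,000 and fall in the second interval, while the single large transfer from B to C is marked red.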

Step 1045, monitoring whether a user command input based on the fund transaction relationship network graph is received.

After the electronic device generates the fund transaction relationship network graph, the fund transaction relationship network graph can be displayed on a screen of the electronic device for being referred by a user. The user can input user instructions based on the fund transaction relationship network diagram so as to realize various operations of the fund transaction relationship network diagram. By way of example only, the user instructions may include: editing instructions and viewing instructions, wherein the viewing instructions comprise click viewing instructions and floating viewing instructions.

Step 1046, if the user instruction is detected, processing the fund transaction relationship network graph as indicated by the user instruction.

When the electronic device receives a user instruction, the fund transaction relationship network graph may be processed based on the user instruction. Here, the above step 1046 is explained by taking a user operating through a computer as an example:

For the viewing instructions: a user may input a viewing instruction to the electronic device by hovering the cursor or clicking the left mouse button.

If the electronic device receives a floating viewing instruction input on the fund transaction relationship network graph through a cursor placement operation, it determines the placement position of the cursor. If the cursor is placed on a node, the electronic device may display the account details contained in the node in a floating frame; if the cursor is placed on a connection between nodes, the electronic device may display the aggregate transaction amount between those nodes in a floating frame.

If the electronic device receives a click viewing instruction input on the fund transaction relationship network graph through a left mouse button click, it determines the click position. If the click position is a node, the electronic device may select the node and highlight the name of the selected node; if a viewing instruction is received again at the same click position, the name of the node is hidden.
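A minimal sketch of how the two viewing instructions might be dispatched is given below. The `FundGraph` structure, its field names, and the handler names `on_hover` and `on_click` are hypothetical; the sketch models "hiding the name" simply as removing the node from a highlighted set and does not involve any real GUI toolkit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass
class FundGraph:
    # Node name -> account details shown when the cursor rests on the node.
    node_accounts: Dict[str, List[str]]
    # (payer, payee) -> aggregate transaction amount on that connection.
    edge_totals: Dict[Tuple[str, str], float]
    # Nodes whose names are currently highlighted.
    highlighted: Set[str] = field(default_factory=set)


def on_hover(graph: FundGraph, target: Union[str, Tuple[str, str]]) -> Optional[str]:
    """Return the floating-frame text for a hovered node or connection."""
    if isinstance(target, str) and target in graph.node_accounts:
        return "; ".join(graph.node_accounts[target])
    if isinstance(target, tuple) and target in graph.edge_totals:
        return f"Aggregate amount: {graph.edge_totals[target]:.2f}"
    return None  # The cursor is not on a node or a connection.


def on_click(graph: FundGraph, node: str) -> bool:
    """Toggle the highlight of a node name; return True if it is now highlighted."""
    if node not in graph.node_accounts:
        return False
    if node in graph.highlighted:
        graph.highlighted.remove(node)  # A second click hides the name again.
        return False
    graph.highlighted.add(node)
    return True
```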

For the editing instructions: the editing instructions comprise editing instructions for a single node and editing instructions for more than two nodes.

For the editing instruction for a single node, a user can first select a node by clicking it with the left mouse button and then right-click the selected node to call up a function menu. The function menu includes an "edit" option, and the related information of the node is edited by clicking that option. The related information includes: node name, node color, node display size, and/or node icon. Optionally, an edit panel can also be called up through the "edit" option; an icon representing the node type is displayed in the edit panel. By way of example only, the data analysis system is pre-provisioned with a plurality of node types. For example, the node types that can be assigned to a counterparty include: real estate, general personal, high-end consumption, automotive services, general consumption, dining, hotels, travel, stocks, futures, trusts, financial products, consumer finance, currency, securities, banks, insurance, capital markets, derived securities, investment financing, funds (private, public), international collection, financial management, trade finance, local finance, foreign exchange management, risk management, and the like, without limitation herein.

For the editing instructions for more than two nodes, a user can select the nodes to be edited by holding down the Control (Ctrl) key on the keyboard and clicking more than two nodes with the left mouse button; alternatively, the nodes to be edited can be selected by dragging a selection box over a node area with the left mouse button. The electronic device may highlight the selected nodes. The user then right-clicks to call up a function menu that includes a "merge" option. When the user clicks the "merge" option with the left mouse button, the selected nodes in the fund transaction relationship network graph are merged. The electronic device may choose to render the merged node with a blurring effect.
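The merge operation can be illustrated with a small sketch that collapses the selected nodes into one node and re-aggregates the transaction amounts on the affected connections. The edge representation and the function name `merge_nodes` are assumptions, as is the choice to drop transfers between the merged nodes themselves; the application does not specify how such internal transfers are handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (payer, payee)


def merge_nodes(
    edges: Dict[Edge, float], selected: Iterable[str], merged_name: str
) -> Dict[Edge, float]:
    """Collapse the selected nodes into a single node named merged_name."""
    selected_set = set(selected)
    merged: Dict[Edge, float] = defaultdict(float)
    for (payer, payee), amount in edges.items():
        if payer in selected_set and payee in selected_set:
            continue  # Assumption: transfers inside the merged group are hidden.
        new_payer = merged_name if payer in selected_set else payer
        new_payee = merged_name if payee in selected_set else payee
        merged[(new_payer, new_payee)] += amount  # Sum parallel connections.
    return dict(merged)
```

Because the function returns a new mapping and leaves `edges` untouched, the pre-merge graph remains available for a before-and-after comparison.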

In some embodiments, as shown in fig. 5, after the selected nodes have been merged, the electronic device may further display the nodes as they were before merging in a floating window in a designated area of the screen (e.g., the lower right corner), so that the graph can be compared before and after the merge.

In some embodiments, the user can also manually label color blocks and/or features of the fund transaction relationship network graph. The data analysis system can transmit the parameters of the labeled nodes to the background of the data system, so that the background can perform machine learning on data of the same kind and build up a training set. This learning data can then be used as a data set to provide more accurate clustering and intelligent interface prompts for subsequent data analysis operations. The background can be a Hadoop platform.

In some embodiments, after the step 104, in order to realize the structured output of the data analysis result, the data analysis method may further include:

outputting the analysis map to a report template corresponding to the dimension to be analyzed to obtain an analysis report.

In this embodiment, the electronic device presets a corresponding report template for each possible analysis dimension, and each report template contains content to be filled (that is, blanks). The blanks in the report template are filled based on the obtained analysis map to obtain a complete analysis report. By way of example only, a report template may read: "Case ①: case subject ② and others, ③ case-related accounts; during the transaction period from ④ to ⑤, cumulative transactions of RMB ⑥ yuan, involving counterparties comprising ⑦ natural persons and ⑧ enterprises." The numbered parts ① to ⑧ are all content to be filled; the target data can be obtained by querying the analysis maps output under each dimension to be analyzed (and its sub-dimensions), and the target data is then filled into the position of the corresponding blank. Through the report template, the analysis results to be displayed under the corresponding analysis dimension are strung together in organized prose, which makes them convenient for a user to review.
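Filling a report template of this kind can be sketched with Python's standard `string.Template`. The template wording and all placeholder names (`case_name`, `subject`, `account_count`, and so on) are hypothetical stand-ins for the eight blanks of the example; they are not taken from the application.

```python
from string import Template
from typing import Mapping

# Hypothetical template for one analysis dimension; each $name is one blank.
REPORT_TEMPLATE = Template(
    "Case $case_name: subject $subject and others hold $account_count "
    "case-related accounts. From $start_date to $end_date, cumulative "
    "transactions totaled RMB $total_amount yuan, involving "
    "$person_count natural persons and $enterprise_count enterprises."
)


def fill_report(values: Mapping[str, object]) -> str:
    """Fill every blank in the template; raise KeyError if a value is missing."""
    return REPORT_TEMPLATE.substitute(values)
```

Using `substitute` rather than `safe_substitute` means an incomplete analysis map produces an error instead of a report with unfilled blanks, which seems the safer default for a document of this kind.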

In some embodiments, the data analysis method further includes:

uploading the valid data stream, the analysis map and/or the analysis report to a blockchain.

To ensure the security of the data and fairness and transparency for the user, each valid data stream, the analysis map and/or the analysis report may be uploaded to a blockchain for evidence storage. The user can then download the valid data stream, the analysis map and/or the analysis report from the blockchain through their own devices to verify whether the data has been tampered with. The blockchain in this embodiment is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
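The tamper-verification idea can be illustrated with a toy, in-memory hash chain. This is not a real blockchain platform and involves no consensus or network layer; the block layout and the function names `append_block` and `verify` are assumptions made purely for illustration.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # Placeholder "previous hash" for the first block.


def block_hash(index: int, payload_digest: str, prev_hash: str) -> str:
    """Hash the block fields in a stable (sorted-key) JSON encoding."""
    record = json.dumps(
        {"index": index, "payload_digest": payload_digest, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], payload: bytes) -> Dict:
    """Store the SHA-256 digest of a payload (e.g. a report) as a new block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    digest = hashlib.sha256(payload).hexdigest()
    block = {"index": len(chain), "payload_digest": digest, "prev_hash": prev_hash}
    block["hash"] = block_hash(block["index"], digest, prev_hash)
    chain.append(block)
    return block


def verify(chain: List[Dict], index: int, payload: bytes) -> bool:
    """Check that the chain links are intact and the payload matches block index."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = block_hash(block["index"], block["payload_digest"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False  # A block was altered after it was appended.
        prev_hash = block["hash"]
    return chain[index]["payload_digest"] == hashlib.sha256(payload).hexdigest()
```

Because each block's hash covers the previous block's hash, changing any stored digest invalidates every later link, which is what allows a downloaded report to be checked against the chain.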

As can be seen from the above, in the data analysis method provided in this embodiment, at least one original data stream is obtained through a preset data platform; the original data streams are then classified according to data category; the classified original data streams are cleaned to obtain effective data streams; and a preset analysis tool is called to extract target effective data according to the association relationship between the effective data streams and to analyze the target effective data, so as to obtain an analysis map in the dimension to be analyzed. In this process, the classification operation summarizes and sorts the original data contained in the original data streams, so that the originally unordered data becomes ordered; the cleaning operation screens out useless data, so that only effective data that is meaningful for the subsequent analysis is retained; before data analysis, target effective data is extracted as the data to be analyzed based on the association relationship between the effective data streams, which removes data redundancy and yields scenario-based, structured analysis results; finally, the analysis result is output in report form, providing the user with a more organized and targeted analysis result.

Referring to fig. 6, fig. 6 is a block diagram of a data analysis device according to an embodiment of the present disclosure. In this embodiment, each unit included in the data analysis device is configured to execute a step in the data analysis method embodiment; for details, refer to the relevant description in the embodiment corresponding to the data analysis method. For convenience of explanation, only the portions related to the present embodiment are shown.

Referring to fig. 6, the data analysis device 6 includes:

an obtaining unit 601, configured to obtain at least one original data stream through a preset data platform;

a classifying unit 602, configured to classify each original data stream according to a data category;

a cleaning unit 603, configured to clean the classified original data stream to obtain valid data;

an analysis unit 604, configured to invoke a preset analysis tool, extract target valid data according to an association relationship between the valid data, and analyze the target valid data to obtain an analysis map in the dimension to be analyzed.

As an embodiment of the present application, the classification unit 602 includes:

the real word extracting subunit is used for respectively extracting real words contained in each original data stream to obtain original data contained in each original data stream;

the data matching subunit is configured to match, for each original data stream, each piece of original data included in the original data stream with at least one preset data category;

and the category determining subunit is used for determining the data category to which each original data belongs according to the matching result so as to realize the classification of the original data stream.
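The three subunits of the classification unit can be sketched as a simple keyword-matching pipeline. The preset categories, their keyword sets, the regular-expression tokenizer used as a stand-in for real word extraction, and the "unclassified" bucket are all assumptions; the application does not prescribe a particular matching rule in this passage.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical preset data categories, each described by a keyword set.
PRESET_CATEGORIES: Dict[str, Set[str]] = {
    "account": {"account", "card", "bank"},
    "transaction": {"transfer", "amount", "payment"},
    "person": {"name", "id", "phone"},
}


def extract_words(stream: str) -> List[str]:
    """Stand-in for real word extraction: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", stream.lower())


def classify_stream(stream: str) -> Dict[str, List[str]]:
    """Match each extracted word against the preset categories."""
    result: Dict[str, List[str]] = defaultdict(list)
    for word in extract_words(stream):
        category = next(
            (name for name, keywords in PRESET_CATEGORIES.items() if word in keywords),
            "unclassified",  # Assumption: unmatched data is kept in its own bucket.
        )
        result[category].append(word)
    return dict(result)
```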

As an embodiment of the present application, the analysis unit 604 includes:

the vectorization subunit is used for vectorizing the effective data to obtain an effective data vector;

the clustering subunit is used for constructing a clustering analysis model based on a preset clustering algorithm and inputting the effective data vectors into the clustering analysis model to obtain at least one group, wherein each group comprises at least one effective data vector;

and the extraction subunit is used for extracting the target effective data according to the at least one group.
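The vectorization, clustering, and extraction subunits can be sketched as below. Plain k-means with deterministic initialization is used here only as an example of a "preset clustering algorithm"; the application does not name a specific algorithm in this passage, and the field names and helper functions are hypothetical. The determination of mutually exclusive and associated groups is not covered by this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def vectorize(record: Dict[str, float], fields: Sequence[str]) -> Vector:
    """Turn one piece of effective data into a numeric vector."""
    return tuple(float(record[name]) for name in fields)


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; the first k vectors serve as the initial centroids.

    Returns the group index assigned to each input vector.
    """
    centroids = list(vectors[:k])
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def extract_group(records: List[dict], labels: List[int], target_group: int) -> List[dict]:
    """Return the effective data whose vectors fall in the target group."""
    return [r for r, lab in zip(records, labels) if lab == target_group]
```

In practice the features would normally be scaled before clustering so that a large-valued field such as an amount does not dominate the distance; that step is omitted here for brevity.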

As an embodiment of the present application, the extraction subunit includes:

a group relation determining subunit, configured to determine a mutually exclusive group and an associated group in the at least one group according to the cluster analysis model;

a target group determining subunit, configured to determine a target group in the at least one group based on the mutually exclusive group and the association group;

and the target valid data determining subunit is used for determining the valid data associated with the valid data vectors in the target group as target valid data.

As an embodiment of the present application, the cleaning unit 603 includes:

the data category screening subunit is used for screening out data categories belonging to a digital category from at least one preset data category as data categories to be cleaned;

and the original data cleaning subunit is used for cleaning the original data in the category of the data to be cleaned according to each category of the data to be cleaned to obtain the effective data in the category of the data to be cleaned.

As an embodiment of the present application, the cleaning operation includes: blank data stuffing operations, noise data removal operations, and/or illegal data removal operations.
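The three cleaning operations can be sketched for a single numeric category such as transaction amounts. The concrete rules are assumptions chosen for illustration: negative or non-finite values are treated as illegal data, values above a caller-supplied ceiling as noise, and blanks (`None`) are filled with the median of the remaining values.

```python
from statistics import median
from typing import List, Optional


def clean_amounts(values: List[Optional[float]], noise_ceiling: float) -> List[float]:
    """Apply illegal-data removal, noise removal, and blank filling in turn."""
    # Keep blanks for now; drop illegal values (negative or NaN) and noise
    # (values above the ceiling). NaN fails both comparisons and is dropped.
    kept = [v for v in values if v is None or (0 <= v <= noise_ceiling)]
    known = [v for v in kept if v is not None]
    # Assumption: blanks are filled with the median of the surviving values.
    fill = median(known) if known else 0.0
    return [fill if v is None else v for v in kept]
```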

As an embodiment of the present application, if the dimension to be analyzed is a fund penetration analysis dimension, the analyzing unit 604 includes:

the network diagram generating subunit is used for generating a fund transaction relationship network diagram based on the target effective data;

the instruction monitoring subunit is used for monitoring whether a user instruction input based on the fund transaction relationship network diagram is received;

and the network graph processing subunit is used for processing the fund transaction relationship network graph under the instruction of the user instruction if the user instruction is monitored.

As an embodiment of the present application, the user instruction includes an editing instruction and a floating viewing instruction, and the network diagram processing subunit includes:

a network diagram editing subunit, configured to, if an editing instruction input based on the fund transaction relationship network diagram is received, determine a node to be edited in the fund transaction relationship network diagram as indicated by the editing instruction, and perform a node selection and/or node merging operation on the node to be edited in the fund transaction relationship network diagram;

and a network diagram viewing subunit, configured to, if a floating viewing instruction input based on the fund transaction relationship network diagram is received, display each piece of target effective data related to the node pointed to by the floating viewing instruction in a floating frame as indicated by the floating viewing instruction.

As an embodiment of the present application, the data analysis device 6 further includes:

and the uploading unit is used for uploading the effective data and/or the analysis map to the block chain.

In the embodiment of the application, the data analysis device obtains at least one original data stream through a preset data platform, classifies each original data stream according to data category, cleans the classified original data streams to obtain effective data streams, and then calls a preset analysis tool to extract target effective data according to the association relationship between the effective data streams and to analyze the target effective data, so as to obtain an analysis map in the dimension to be analyzed. In this process, the classification operation summarizes and sorts the original data contained in the original data streams, so that the originally unordered data becomes ordered; the cleaning operation screens out useless data, so that only effective data that is meaningful for the subsequent analysis is retained; during data analysis, target effective data is extracted as the data to be analyzed based on the association relationship between the effective data streams, which removes data redundancy and yields scenario-based, structured analysis results; finally, the analysis result is output in report form, providing the user with a more organized and targeted analysis result.

It should be noted that, because the information interaction and execution processes between the above units are based on the same concept as the method embodiments, their specific functions and technical effects are described in the method embodiment section and are not repeated here.

Fig. 7 is a block diagram of an electronic device according to another embodiment of the present application. As shown in fig. 7, the electronic device 7 of this embodiment includes: a processor 71, a memory 72, and a computer program 73, for example a program for the data analysis method, stored in the memory 72 and executable on the processor 71. The processor 71, when executing the computer program 73, implements the steps in the various embodiments of the data analysis method described above, such as steps 101 to 104 shown in fig. 1. Alternatively, when the processor 71 executes the computer program 73, the functions of the units in the embodiment corresponding to fig. 6, for example the functions of the units 601 to 604 shown in fig. 6, are implemented; refer to the related description in the embodiment corresponding to fig. 6, which is not repeated here.

Illustratively, the computer program 73 may be divided into one or more units, which are stored in the memory 72 and executed by the processor 71 to implement the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 73 in the electronic device 7. For example, the computer program 73 may be divided into an acquisition unit, a classification unit, a cleaning unit, and an analysis unit, and the specific functions of the units are as described above.

The electronic device may include, but is not limited to, a processor 71 and a memory 72. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the electronic device 7 and does not constitute a limitation of the electronic device 7, which may include more or fewer components than those shown, combine certain components, or use different components; for example, the electronic device may also include input/output devices, network access devices, buses, etc.

The processor 71 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 72 may be an internal storage unit of the electronic device 7, such as a hard disk or internal memory of the electronic device 7. The memory 72 may also be an external storage device of the electronic device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), or the like provided on the electronic device 7. Further, the memory 72 may include both an internal storage unit and an external storage device of the electronic device 7. The memory 72 is used for storing the computer program and other programs and data required by the electronic device. The memory 72 may also be used to temporarily store data that has been output or is to be output.

In this embodiment, when the processor 71 executes the computer program 73 to implement the steps in any of the data analysis method embodiments, at least one original data stream is obtained through a preset data platform, each original data stream is classified according to data category, the classified original data streams are cleaned to obtain effective data streams, and a preset analysis tool is then invoked to extract target effective data according to the association relationship between the effective data streams and to analyze the target effective data, so as to obtain an analysis map in the dimension to be analyzed. In this process, the classification operation summarizes and sorts the original data contained in the original data streams, so that the originally unordered data becomes ordered; the cleaning operation screens out useless data, so that only effective data that is meaningful for the subsequent analysis is retained; during data analysis, target effective data is extracted as the data to be analyzed based on the association relationship between the effective data streams, which removes data redundancy and yields scenario-based, structured analysis results; finally, the analysis result is output in report form, so that the user can review it in a more orderly and targeted manner.

An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program can implement the steps in the data analysis method embodiments.

The embodiments of the present application further provide a computer program product which, when run on an electronic device, enables the electronic device to implement the steps in the above data analysis method embodiments.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A method of data analysis, comprising:

acquiring at least one original data stream through a preset data platform;

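classifying each original data stream according to the data category;

cleaning the classified original data stream to obtain an effective data stream;

and calling a preset analysis tool, extracting target effective data according to the incidence relation among the effective data streams, and analyzing the target effective data to obtain an analysis map under the dimension to be analyzed.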

2. The data analysis method of claim 1, wherein the classifying each raw data stream according to data class comprises:

respectively extracting real words contained in each original data stream to obtain original data contained in each original data stream;

for each original data stream, respectively matching each original data contained in the original data stream with at least one preset data category;

and determining the data category of each original data according to the matching result so as to realize the classification of the original data stream.

3. The data analysis method of claim 1, wherein the invoking of a preset analysis tool to extract target valid data according to an association relationship between the valid data streams comprises:

vectorizing the effective data flow to obtain an effective data flow vector;

constructing a cluster analysis model based on a preset clustering algorithm, and inputting the effective data flow vectors into the cluster analysis model to obtain at least one group, wherein each group comprises at least one effective data flow vector;

and extracting target effective data according to the at least one group.

4. The data analysis method of claim 3, wherein the extracting target valid data from the at least one group comprises:

determining a mutually exclusive group and an associated group in the at least one group according to the cluster analysis model;

determining a target group among the at least one group based on the mutually exclusive group and the associated group;

determining valid data streams associated with valid data stream vectors in the target group as target valid data.

5. The data analysis method of claim 1, wherein if the dimension to be analyzed is a fund penetration analysis dimension, the analyzing the target valid data to obtain an analysis map in the dimension to be analyzed comprises:

generating a fund transaction relationship network graph based on the target effective data;

monitoring whether a user instruction input based on the fund transaction relationship network diagram is received;

and if the user instruction is monitored, processing the fund transaction relationship network graph under the instruction of the user instruction.

6. The data analysis method of claim 5, wherein the user instruction comprises an editing instruction and a floating viewing instruction, and the processing of the fund transaction relationship network graph under the instruction of the user instruction if the user instruction is monitored comprises:

if an editing instruction input based on the fund transaction relationship network graph is received, determining a node to be edited in the fund transaction relationship network graph under the instruction of the editing instruction, and performing a node selection and/or node merging operation on the node to be edited in the fund transaction relationship network graph;

and if a floating viewing instruction input based on the fund transaction relationship network graph is received, displaying each piece of target effective data related to the node pointed to by the floating viewing instruction in a floating frame under the instruction of the floating viewing instruction.

7. The data analysis method according to any one of claims 1 to 6, wherein after the analyzing the target valid data to obtain the analysis map in the dimension to be analyzed, the data analysis method further comprises:

uploading the valid data stream and/or the analysis map to a blockchain.

8. A data analysis apparatus, comprising:

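the acquisition unit is used for acquiring at least one original data stream through a preset data platform;

the classification unit is used for classifying each original data stream according to the data category;

the cleaning unit is used for cleaning the classified original data stream to obtain an effective data stream;

and the analysis unit is used for calling a preset analysis tool, extracting target effective data according to the incidence relation among the effective data streams, and analyzing the target effective data to obtain an analysis map under the dimension to be analyzed.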

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the computer program is executed by the processor.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.