CN114511353A - Data analysis method and device

Info

Publication number: CN114511353A
Application number: CN202210106089.4A
Authority: CN
Inventors: 陆怡; 贾玉红; 徐聿帆; 林孙镇江
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-01-28
Filing date: 2022-01-28
Publication date: 2022-05-17

Abstract

Embodiments of the present application provide a data analysis method and device, relating to the technical field of big data. The method comprises: displaying a first interface, wherein the first interface comprises a query entry and part or all of a user behavior path knowledge graph; when the query entry acquires a query instruction, analyzing the content of the user behavior path knowledge graph according to the query instruction to obtain an analysis result; and displaying a second interface, wherein the second interface comprises the analysis result. Because the method analyzes user behavior paths with knowledge graph technology, it can present richer information, which is significant for intuitively understanding the whole user behavior path, drilling into and mining user behavior characteristics, analyzing path defects, and improving product value.

Description

Data analysis method and device

Technical Field

The present application relates to the field of big data technologies, and in particular, to a data analysis method and apparatus.

Background

In the big data era, collecting and analyzing user data helps developers improve and optimize products and provide better product services to users.

While a user operates a web page or an application (app), the terminal device may record each operation behavior of the user and generate a corresponding log. A product manager or an operation analyst may analyze user behavior based on these logs. For example, a user behavior path may be derived from the logs and then used to characterize user behavior or to optimize and improve product design. However, existing user behavior path analysis methods produce incomplete and inaccurate analysis results.

Disclosure of Invention

The embodiment of the application provides a data analysis method and device, which can comprehensively collect user behavior path data and improve the analysis capability of a user behavior path and the accuracy of an analysis result.

In a first aspect, an embodiment of the present application provides a data analysis method, including:

displaying a first interface, wherein the first interface comprises a query entrance and part or all of a user behavior path knowledge graph; the user behavior path knowledge graph comprises a plurality of nodes, relations among the nodes, node attributes of any node and relation attributes of any relation; the node attribute comprises one or more of a node type or a click frequency, and the relationship attribute comprises one or more of a path frequency, a path type or a path occurrence time of the user behavior path;

when the query entrance acquires a query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result;

and displaying a second interface, wherein the second interface comprises the analysis result.

Optionally, the query entry includes an encapsulated visual query interface and/or an input area for writing statements in a query language.

The query instruction includes one or more of: a shortest path query instruction, a node expansion instruction, a node merging instruction, a node splitting instruction, an instruction for filtering nodes by node attribute conditions, a relation merging instruction, a relation splitting instruction, an instruction for filtering relations by relation attribute conditions, a relation weight calculation instruction, or a graph index calculation instruction.
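As an illustration of the shortest path query instruction, the following is a minimal sketch using breadth-first search over directed page-to-page relations. The application does not specify the algorithm or the graph database used; the function name `shortest_path` and the page names are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the shortest directed path from start to goal, or None."""
    # Build an adjacency list from (source, target) relations.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical user behavior relations between pages.
edges = [
    ("home", "list"), ("list", "detail"), ("home", "search"),
    ("search", "detail"), ("detail", "pay"), ("list", "home"),
]
path = shortest_path(edges, "home", "pay")
print(path)
```

Because the relations are directed, a query in the opposite direction (for example from `pay` back to `home`) returns no path in this sample data.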

Optionally, when the query entry obtains the query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result, including:

when the query entry acquires a node merging instruction, merging a plurality of nodes corresponding to the node merging instruction, and merging the relationship among the plurality of nodes corresponding to the node merging instruction to obtain an analysis result, wherein the analysis result comprises a user behavior path indication map after the nodes are merged;

or when the query entry acquires the node splitting instruction, restoring the merged node to a state in which detailed node information can be viewed, and splitting the relationship between the nodes corresponding to the splitting instruction to obtain an analysis result, wherein the analysis result comprises a user behavior path indication map after the node splitting;

or when the query entry acquires a node filtering instruction according to the node attribute screening condition, hiding nodes which do not accord with the node attribute screening condition and corresponding relations to obtain an analysis result, wherein the analysis result comprises a user behavior path indicating map of the hidden part of nodes;

or when the query entry acquires the relationship merging instruction, merging a plurality of relationships between nodes corresponding to the relationship merging instruction to obtain an analysis result, wherein the analysis result comprises a user behavior path indication map after relationship merging;

or when the query entry acquires the relationship splitting instruction, restoring the merged relationship to a state in which detailed relationship information can be viewed, and acquiring an analysis result, wherein the analysis result comprises a user behavior path indication map after the relationship splitting;

or when the query entry acquires the relation filtering instruction according to the relation attribute screening condition, hiding the nodes which do not accord with the relation attribute screening condition and the corresponding relation to obtain an analysis result, wherein the analysis result comprises a user behavior path indication map of the hidden part of nodes;

or when the query entry acquires a relation weight calculation instruction, calculating the proportion of the count of a given path to the total count of paths between two nodes, and marking the thickness of the relation according to the proportion, wherein the larger the proportion, the thicker the line representing the relation, to obtain an analysis result, wherein the analysis result comprises a user behavior path indication map after the relations are marked;

or when the query entry acquires the graph index calculation instruction, calculating the user behavior path knowledge graph to acquire a link file, wherein the analysis result comprises the link file.
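The relation weight calculation in the list above can be sketched as follows. This is only an illustration under stated assumptions: "between two points" is taken to mean relations sharing the same source and target node, and the field names (`source`, `target`, `count`) and the linear mapping from proportion to line width are hypothetical.

```python
from collections import defaultdict


def relation_weights(relations, min_width=1.0, max_width=8.0):
    """Attach a share and a line width to every relation."""
    # Total count of all paths between each (source, target) pair.
    totals = defaultdict(int)
    for rel in relations:
        totals[(rel["source"], rel["target"])] += rel["count"]

    weighted = []
    for rel in relations:
        total = totals[(rel["source"], rel["target"])]
        share = rel["count"] / total if total else 0.0
        # Assumed linear mapping: a larger share gives a thicker line.
        width = min_width + share * (max_width - min_width)
        weighted.append({**rel, "share": share, "line_width": width})
    return weighted


relations = [
    {"source": "A", "target": "B", "type": "long_stay", "count": 30},
    {"source": "A", "target": "B", "type": "short_stay", "count": 10},
    {"source": "B", "target": "A", "type": "short_stay", "count": 5},
]
weighted = relation_weights(relations)
for rel in weighted:
    print(rel["source"], rel["target"], rel["type"], rel["share"], rel["line_width"])
```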

Optionally, the query instruction further includes: at least one of a node statistical instruction and a relationship statistical instruction; when the query entrance acquires a query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result, wherein the analysis result comprises the following steps:

when a node counting instruction is obtained at a query entrance, counting the number of each node in the user behavior path knowledge graph or the number queried at this time according to the name of a node label or a node attribute, or counting the distribution situation of each node in the user behavior path knowledge graph or the distribution situation of different values in a query result at this time according to the value of the node label or the node attribute, or counting the distribution situation of the value of a target node in the user behavior path knowledge graph or the distribution situation of the value in the query result at this time according to a specific attribute in the target node, or dividing a plurality of communities according to a function module to which the node label belongs, and counting the node distribution situations and the corresponding quantity information in the different communities.

Or when the query entry acquires the relation statistics instruction, counting, according to the name of a relation label or relation attribute, the number of each node in the user behavior path knowledge graph or in the current query result; or counting, according to the value of the relation label or relation attribute, the distribution of each node in the user behavior path knowledge graph or the distribution of different values in the current query result; or counting, according to a specific attribute of a target relation, the value distribution of the target relation in the user behavior path knowledge graph or in the current query result; or counting, according to the 3-sigma principle, the aggregate and distribution counts of mainstream paths and niche paths, and highlighting all illegal paths according to a full-display principle; or counting all reverse paths under a given condition. A mainstream path is a path whose traversal frequency is greater than or equal to a frequency threshold, a niche path is a path whose traversal frequency is less than the frequency threshold, and an illegal path is a user operation path beyond the range permitted by the program.
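The statistics described above can be sketched as simple counting operations. The example below is a minimal illustration, not the application's implementation: the node fields (`label`, `type`, `module`) are hypothetical, and the frequency threshold is passed in directly because the text does not state how it is derived from the 3-sigma principle.

```python
from collections import Counter


def count_by(items, key):
    """Count how many items share each value of the given field."""
    return Counter(item[key] for item in items)


def classify_paths(path_frequency, frequency_threshold):
    """Split paths into mainstream (>= threshold) and niche (< threshold)."""
    mainstream, niche = {}, {}
    for path, frequency in path_frequency.items():
        if frequency >= frequency_threshold:
            mainstream[path] = frequency
        else:
            niche[path] = frequency
    return mainstream, niche


nodes = [
    {"label": "login_button", "type": "button", "module": "account"},
    {"label": "pay_button", "type": "button", "module": "payment"},
    {"label": "order_page", "type": "page", "module": "payment"},
]
type_distribution = count_by(nodes, "type")   # distribution by attribute value
community_sizes = count_by(nodes, "module")   # communities by function module

path_frequency = {
    ("home", "list"): 120,
    ("list", "detail"): 95,
    ("detail", "home"): 4,
    ("list", "help"): 2,
}
mainstream, niche = classify_paths(path_frequency, frequency_threshold=50)
print(type_distribution, community_sizes, mainstream, niche)
```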

Optionally, the first interface further includes a sliding time window, and the data analysis method further includes:

when a modification of the start and end times of the sliding time window is received, the first interface displays the part of the user behavior path knowledge graph that falls within the modified sliding time window.
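A sliding time window of this kind can be sketched as a filter on the path occurrence time of each relation. The field name `occurred_at` and the inclusive window bounds are assumptions made for illustration only.

```python
from datetime import datetime


def relations_in_window(relations, start, end):
    """Keep relations whose occurrence time lies in [start, end]."""
    visible = [r for r in relations if start <= r["occurred_at"] <= end]
    # Only nodes touched by a visible relation remain on screen.
    nodes = {r["source"] for r in visible} | {r["target"] for r in visible}
    return visible, nodes


relations = [
    {"source": "A", "target": "B", "occurred_at": datetime(2022, 1, 10, 9, 0)},
    {"source": "B", "target": "A", "occurred_at": datetime(2022, 1, 10, 9, 5)},
    {"source": "A", "target": "C", "occurred_at": datetime(2022, 1, 12, 14, 0)},
]
visible, visible_nodes = relations_in_window(
    relations, datetime(2022, 1, 10), datetime(2022, 1, 11)
)
print(len(visible), sorted(visible_nodes))
```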

Optionally, displaying the first interface includes:

sending a POST request to a graph database;

obtaining a result of the POST request, wherein the request result is JavaScript Object Notation (JSON) string data;

converting the request result into the JSON data format required by visualization;

performing visual rendering based on a preset visual rendering framework, a force-directed layout, visual style configuration information, and the result in the JSON data format required by visualization, to obtain a visualization result; the visualization result supports one or more of node dragging, interactive display of node and relation attributes, and right-click node expansion.
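The conversion step can be sketched as follows. The application does not name the graph database or the rendering framework, so both JSON shapes here are hypothetical: the response is assumed to contain a `records` list of source/relation/target triples, and the visualization format is assumed to be a `nodes`/`links` structure of the kind many force-directed layout libraries accept. No network request is made; a sample response string stands in for the POST result.

```python
import json


def to_visual_json(response_text):
    """Convert a graph-database JSON response into a nodes/links JSON string."""
    records = json.loads(response_text)["records"]
    nodes = {}   # keyed by node id so each node appears only once
    links = []
    for record in records:
        for end in ("source", "target"):
            raw = record[end]
            nodes[raw["id"]] = {
                "id": raw["id"],
                "name": raw["label"],
                "category": raw["type"],
            }
        links.append({
            "source": record["source"]["id"],
            "target": record["target"]["id"],
            "value": record["relation"]["count"],
        })
    return json.dumps({"nodes": list(nodes.values()), "links": links})


# Stand-in for the JSON string returned by the POST request (assumed shape).
sample_response = json.dumps({"records": [
    {"source": {"id": 1, "label": "home", "type": "route"},
     "relation": {"count": 12},
     "target": {"id": 2, "label": "list", "type": "route"}},
    {"source": {"id": 2, "label": "list", "type": "route"},
     "relation": {"count": 3},
     "target": {"id": 1, "label": "home", "type": "route"}},
]})
visual = json.loads(to_visual_json(sample_response))
print(visual)
```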

Optionally, the data analysis method provided by the present application further includes:

collecting a user behavior log;

classifying the user behavior logs to perform entity data processing and relation data processing to obtain entity data and relation data;

and constructing a user behavior path knowledge graph based on the entity data and the relation data.

Optionally, the user behavior log includes a buried point log and a routing log; the entity data processing comprises the following steps: processing an entity label and an entity attribute; the relational data processing comprises the following steps: processing a relation label and processing a relation attribute;

wherein the entity label processing includes: buried point entity label processing and routing entity label processing; in the buried point entity label processing, buried point entity labels are divided into different entities according to different entity service requirements, and the division according to different entity service requirements comprises one or more of: dividing according to control type, dividing according to the function of the clicked object, dividing according to service function, and the like; in the routing entity label processing, routing entity labels are divided according to one or more of: page depth, page breadth, and page function;

the entity attribute processing comprises the following steps: dividing the tags obtained after the entity tag processing into a plurality of first tags according to different service requirements, taking one of the plurality of first tags as an entity tag, and taking the rest information except the entity tag in the plurality of first tags as the entity attribute information;

in the relation label processing, relation labels are divided into different relations according to different relation service requirements; the division according to different relation service requirements includes one or more of: dividing according to path frequency, dividing according to path value, dividing according to path dwell time, dividing according to whether the path starting depths are the same, and dividing according to whether the path starting depths are the same together with whether access proceeds from shallow to deep;

the processing of the relationship attribute comprises the following steps: dividing the labels obtained after the processing of the relation labels into a plurality of second labels according to different service requirements, taking one of the plurality of second labels as the relation label, and taking the rest information except the relation label in the plurality of second labels as the relation attribute information.
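The attribute processing described above, in which one label is chosen as the entity (or relation) label and the remaining information becomes attribute information, can be sketched as below. The field names and the helper `split_label_and_attributes` are hypothetical and serve only to illustrate the split.

```python
def split_label_and_attributes(fields, label_key):
    """Use one field as the label and keep the rest as attribute information."""
    attributes = {key: value for key, value in fields.items() if key != label_key}
    return fields[label_key], attributes


# Hypothetical labels obtained after buried point entity label processing.
buried_point_fields = {
    "control_name": "pay_button",
    "control_type": "button",
    "service_function": "payment",
    "click_count": 42,
}
entity_label, entity_attributes = split_label_and_attributes(
    buried_point_fields, "control_name"
)
print(entity_label, entity_attributes)
```

The same helper applies unchanged to relation labels, for example by choosing a path type field as the relation label and keeping path frequency and occurrence time as relation attributes.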

Optionally, the constructing the user behavior path knowledge graph based on the entity data and the relationship data includes:

utilizing the entity data and the relation data to carry out ontology construction and knowledge mapping to obtain a user behavior path knowledge graph;

wherein, the ontology construction comprises: building nodes by using entity labels, wherein the entity labels correspond to the nodes one by one, and each node corresponds to respective entity attribute information; establishing a relationship by using the relationship labels, wherein the relationship labels correspond to the relationships one by one, and each relationship corresponds to respective relationship attribute information;

the knowledge mapping comprises: mapping, one by one, the fields of the data files used in actual data processing to the entity labels, the relation labels, and their corresponding attribute information defined in the ontology construction.
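Ontology construction and knowledge mapping can be sketched as a schema plus a field-by-field mapping of data rows onto it. This is a minimal illustration under assumptions: the `ONTOLOGY` dictionary, all field names, and the in-memory node/relation structures are hypothetical, and a real system would load the result into a graph database.

```python
# Hypothetical ontology: which data-file field is the label, which are attributes.
ONTOLOGY = {
    "entity": {
        "label_field": "page_name",
        "attribute_fields": ["page_depth", "click_count"],
    },
    "relation": {
        "label_field": "path_type",
        "source_field": "from_page",
        "target_field": "to_page",
        "attribute_fields": ["path_count", "occurred_at"],
    },
}


def build_graph(entity_rows, relation_rows, ontology=ONTOLOGY):
    """Map data rows onto nodes and relations defined by the ontology."""
    ent, rel = ontology["entity"], ontology["relation"]
    # One node per entity label, each with its own attribute information.
    nodes = {
        row[ent["label_field"]]: {f: row[f] for f in ent["attribute_fields"]}
        for row in entity_rows
    }
    relations = [
        {
            "label": row[rel["label_field"]],
            "source": row[rel["source_field"]],
            "target": row[rel["target_field"]],
            "attributes": {f: row[f] for f in rel["attribute_fields"]},
        }
        for row in relation_rows
    ]
    return nodes, relations


entity_rows = [
    {"page_name": "home", "page_depth": 1, "click_count": 200},
    {"page_name": "detail", "page_depth": 2, "click_count": 80},
]
relation_rows = [
    {"path_type": "short_stay", "from_page": "home", "to_page": "detail",
     "path_count": 60, "occurred_at": "2022-01-10"},
]
nodes, relations = build_graph(entity_rows, relation_rows)
print(nodes, relations)
```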

In a second aspect, the present application provides a data analysis apparatus comprising:

the first display module is used for displaying a first interface, and the first interface comprises a query entrance and part or all of the user behavior path knowledge graph; the user behavior path knowledge graph comprises a plurality of nodes, relations among the nodes, node attributes of any node and relation attributes of any relation; the node attribute comprises one or more of a node type and a click frequency, and the relationship attribute comprises one or more of a path frequency, a path type or a path occurrence time of a user behavior path;

the analysis module is used for analyzing the content in the user behavior path knowledge graph according to the query instruction when the query instruction is received by the query entrance to obtain an analysis result;

and the second display module is used for displaying a second interface, and the second interface comprises an analysis result.

In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;

the memory is used for storing a computer program; the processor is configured to execute the memory-stored computer program to implement the method of any one of the first aspects.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method of any one of the first aspect.

In a fifth aspect, the present application provides a computer program product comprising a computer program that, when executed by a processor, implements the method of any one of the first aspects.

According to the data analysis method and device provided by the present application, after the data analysis system receives a query instruction input by the user, it analyzes the content of the user behavior path knowledge graph and obtains an analysis result. Compared with existing user behavior path analysis methods, the method analyzes user behavior paths with knowledge graph technology, so that it can analyze not only one-way user behavior paths but also reverse paths and paths that jump repeatedly between pages, yielding a more comprehensive and more accurate analysis result. This is significant for intuitively understanding the whole user behavior path, drilling into and mining user behavior characteristics, analyzing path defects, and improving product value.

Drawings

FIG. 1 is a schematic view of a scenario provided in an embodiment of the present application;

FIG. 2 is a schematic flow chart of a data analysis method provided in an embodiment of the present application;

FIG. 3 is a schematic illustration of a first interface provided in accordance with an embodiment of the present application;

FIG. 4 is a schematic interface diagram of a user-triggered query control according to an embodiment of the present application;

FIG. 5 is a schematic interface diagram of a user-triggered node merge control according to an embodiment of the present application;

FIG. 6 is a schematic interface diagram of a user-triggered node splitting control according to an embodiment of the present application;

FIG. 7 is a schematic interface diagram of a user-triggered conditional filter node control according to an embodiment of the present application;

FIG. 8 is a schematic interface diagram of a user-triggered relationship merge control provided in an embodiment of the present application;

FIG. 9 is a schematic interface diagram of a user-triggered relationship splitting control according to an embodiment of the present application;

FIG. 10 is a schematic interface diagram of a user-triggered relationship weight calculation control according to an embodiment of the present application;

FIG. 11 is a schematic interface diagram of a user-triggered node statistics control provided in an embodiment of the present application;

FIG. 12 is a schematic illustration of a first interface provided in accordance with an embodiment of the present application;

FIG. 13 is a schematic illustration of a visualization configuration interface provided in accordance with an embodiment of the present application;

FIG. 14 is a schematic flow chart of a data analysis method provided in an embodiment of the present application;

FIG. 15 is a schematic diagram of a data analysis system provided in an embodiment of the present application;

FIG. 16 is a schematic diagram of a data analysis apparatus according to an embodiment of the present application.

Detailed Description

In order to facilitate clear description of the technical solutions of the embodiments of the present application, some terms and techniques referred to in the embodiments of the present application are briefly described below:

1) Buried point log: a buried point refers to tool code implanted in the program of a business system to collect event data for various events. A buried point log records information about a specific process, is used to track how a function is used, and provides data support for continuous product optimization or for operations. The buried point log is a control-level log; the information it provides mainly includes visit count, visitor count, dwell time, page view count, bounce rate, and the like.

2) Routing log: a log that records the working state of the router and the page jump information of a specific flow, and is used to track page switching as a whole. The routing log is a page-level log; the information it provides mainly includes page response duration, access source information, and the like. The routing log can provide user behavior path data at the macro page level.

3) Knowledge graph: a series of graphs that display the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among them. It is a modern theory that combines the theories and methods of applied mathematics, graphics, information visualization technology, information science, and other disciplines with methods such as bibliometric citation analysis and co-occurrence analysis, and uses visual graphs to vividly display the core structure, development history, frontier fields, and overall knowledge framework of a discipline, thereby achieving multi-disciplinary integration.

4) Visualization: the theories, methods, and techniques of using computer graphics and image processing to convert data into graphics or images displayed on a screen and then to process them interactively.

5) User behavior path: the access path formed while a user operates a web page or an app. For example, the user behavior path includes a path formed by a previous buried point log pointing to a next buried point log, or a path formed by a previous routing log pointing to a next routing log.

6) POST request: a request that submits data to be processed to a specified resource.

7) Other terms

In the embodiments of the present application, the words "first", "second", and the like are used to distinguish between identical or similar items having substantially the same functions and actions. For example, the first interface and the second interface are named only to distinguish different interfaces, and their order is not limited. Those skilled in the art will appreciate that the terms "first", "second", and the like do not denote any order or importance.

It should be noted that in the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.

In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.

The data analysis method provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings. It should be noted that "when ..." in the embodiments of the present application may refer to the instant at which a certain condition occurs, or to a certain period of time after the condition occurs, which is not specifically limited in the embodiments of the present application.

Collecting and analyzing user data helps developers improve and optimize products and provide better product services to users. Analyzing user data often means analyzing user behavior paths.

The user behavior path analysis mainly comprises the steps of analyzing the circulation rules and characteristics of each user when operating each function according to the click behavior log of each user in the operation process, and mining the access or click mode of the user so as to solve some specific service problems. Such as improving the arrival rate of core functions, extracting the mainstream path behavior of a specific user group, characterizing user browsing characteristics, optimizing and improving product design, etc.

Across different types of products, product managers and operation analysts mostly use funnel charts and Sankey diagrams for user behavior analysis. A funnel chart is goal-oriented, intuitive, and single-purpose; it suits products with a single link chain and helps achieve key breakthroughs on local conversion problems. A Sankey diagram supports time-sequence analysis and diverse paths; it suits the analysis of mainstream and niche paths and is commonly used in value attribution and personalized recommendation scenarios.

Both the funnel chart and the Sankey diagram share a drawback: because both are unidirectional in nature, it is difficult to observe and analyze reverse paths or to track user behavior that jumps repeatedly between pages. Reverse paths and repeated jumps often reveal product design defects and potentially valuable user thinking patterns. Observing and analyzing these paths and behaviors is important for optimizing product design, understanding the psychological factors behind user behavior, and exploring what value users expect from the product.

Reverse paths and repeated jumps are essentially associations, and a knowledge graph is a relationship network obtained by connecting different kinds of information and knowledge, which provides the ability to analyze problems from the perspective of relationships. Further, knowledge graph technology mainly comprises knowledge extraction, knowledge representation, knowledge storage, knowledge mining, knowledge reasoning, and related techniques, and can be used in business scenarios such as relationship mining, group recognition, network analysis, and event propagation. These techniques and applicable scenarios match the requirements of user behavior path analysis. Moreover, visualizing the knowledge graph can present rich information, which is significant for intuitively understanding the whole user behavior path, drilling into and mining user behavior characteristics, analyzing path defects, and improving product value.

In view of this, the present application provides a data analysis method and apparatus for a user behavior path, which is comprehensive in user behavior path coverage (including a sequential path and a reverse path) and supports flexible path exploration, so that a data analyst can intuitively know the whole user behavior path, and can more efficiently and conveniently use big data to complete actual service requirements, improve data mining efficiency, and further improve data value.

The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. The data analysis method provided by the embodiment of the invention will be described below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of a scenario provided in an embodiment of the present application. As shown in FIG. 1, the scenario includes a user A1 and a terminal device B1, where the terminal device can receive operations performed by the user. A data analysis system for analyzing user behavior paths is installed on the terminal device; when the terminal device receives the user's trigger on the data analysis system, it enters the data analysis system and displays an interface for data analysis. The terminal device may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, which is not limited in the embodiments of the present application; the user operation received by the terminal device may be, but is not limited to, clicking or touching, which is likewise not limited in the embodiments of the present application.

In one embodiment, as shown in fig. 2, a data analysis method is provided, which is exemplified by the application of the method to the data analysis system in fig. 1, and includes the following steps:

Step 101: displaying a first interface, the first interface including a query entry and part or all of a user behavior path knowledge graph.

The user behavior path knowledge graph comprises a plurality of nodes, relations among the nodes, node attributes of any node and relation attributes of any relation; the node attribute comprises one or more of a node type and a click number, and the relationship attribute comprises one or more of a path frequency, a path type or a path occurrence time of the user behavior path.

In the embodiment of the application, the user behavior path knowledge graph may be a user behavior relationship network obtained by concatenating access paths formed by users in a process of operating a webpage or an app by using a knowledge graph technology. For example, the user stays at page a for N minutes initially, the operation is switched from page a to page B, the operation is returned from page B to page a, and the operation is switched to page C, and the user behaviors are arranged in series according to the time sequence by using the knowledge graph technology to obtain the user behavior relationship network.

The node type refers to the type of a buried point or a route; the number of clicks refers to the number of times a website or a web page is accessed within a period of time, and usually takes days as a unit; the path frequency of the user behavior path refers to the frequency of the user accessing the website or the page through the path; the path type refers to dividing the path of the user accessing the website or page into different types, for example, a long-time resident path, a short-time resident path, etc.; the path occurrence time refers to the time when the user accesses a website or page through the path.
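The example above (page A, then B, back to A, then C) can be sketched as a small graph-building routine that derives the click count of each node and the path frequency of each relation from time-ordered page sequences. The function names are hypothetical, and treating a page pair traversed in both directions as a back-and-forth path is an assumption made for illustration; the application does not give a formal definition.

```python
from collections import Counter


def build_path_graph(sessions):
    """Derive node click counts and directed path frequencies from sessions."""
    clicks = Counter()
    paths = Counter()
    for pages in sessions:
        clicks.update(pages)                  # node attribute: click count
        paths.update(zip(pages, pages[1:]))   # relation attribute: path frequency
    return clicks, paths


def back_and_forth_pairs(paths):
    """Page pairs traversed in both directions (assumed reverse-path signal)."""
    return sorted((a, b) for (a, b) in paths if (b, a) in paths and a < b)


# One session, in time order: A -> B -> A -> C.
sessions = [["A", "B", "A", "C"]]
clicks, paths = build_path_graph(sessions)
pairs = back_and_forth_pairs(paths)
print(dict(clicks), dict(paths), pairs)
```

Unlike a funnel or other one-way view, the directed relation counts keep both `("A", "B")` and `("B", "A")`, so the return from B to A remains visible.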

When the terminal device detects that a user has logged in to the data analysis system, the first interface of step 101 is displayed through the data analysis system.

The first interface is illustrated in FIG. 3, where the query entry may include a query control, a statistics control, and an analysis control. The data analysis system can receive the trigger of the user on any one of the controls, generates a corresponding query instruction when receiving the trigger of the user on any one of the controls, and analyzes the content in the user behavior path knowledge graph according to the query instruction.

When the user triggers the query control, the data analysis system queries the content of the user behavior path knowledge graph to obtain a query result; when the user triggers the statistics control, the data analysis system performs statistics on the content of the user behavior path knowledge graph to obtain a statistical result; when the user triggers the analysis control, the data analysis system analyzes the content of the user behavior path knowledge graph to obtain an analysis result.

The first interface also displays part or all of the user behavior path knowledge graph. Specifically, the content of the user behavior path knowledge graph differs from product to product. When a user logs in to the data analysis system, if loading and displaying the entire user behavior path knowledge graph would take a long time, only part of the graph is displayed; if it would take a short time, the entire graph is displayed. The amount of graph content displayed may also differ depending on the terminal device on which the data analysis system runs.

Step 102: when the query entry acquires the query instruction, analyze the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result.

Step 103: display a second interface, wherein the second interface includes the analysis result.

After the data analysis system receives any query instruction in step 102, the data analysis system displays the analysis result corresponding to the instruction in the second interface.

The analysis result presents the user behavior path knowledge graph differently according to different query instructions, including but not limited to merging, splitting, hiding, or marking content in the user behavior path knowledge graph.

The analysis result may also be presented to the user in the form of a chart, including but not limited to a bar chart, a line chart, a radar chart, etc., and the form of the chart is not limited in the present application. It is understood that the analysis result may be displayed in other manners, and the application is not limited thereto.

In the data analysis method, after the data analysis system receives the query instruction input by the user, the content in the user behavior path knowledge graph can be analyzed to obtain an analysis result. Compared with existing user behavior path analysis methods, this method uses the knowledge graph technology to analyze the user behavior path, so that not only one-way user behavior paths but also reverse paths and paths that jump repeatedly between pages can be analyzed, and a more comprehensive and more accurate analysis result can be obtained. The method is of great significance for visually understanding the entire user behavior path, drilling down into and mining user behavior characteristics, analyzing path defects, and improving product value.

Optionally, on the basis of the embodiment corresponding to fig. 2, a process of analyzing the content in the user behavior path knowledge graph is specifically described below.

Optionally, the query entry includes an encapsulated visual query interface, and/or a write area for writing a query language. The query instruction includes one or more of the following: a shortest path query instruction, a node expansion instruction, a node merging instruction, a node splitting instruction, a node filtering instruction based on node attribute screening conditions, a relationship merging instruction, a relationship splitting instruction, a relationship filtering instruction based on relationship attribute screening conditions, a relationship weight calculation instruction, or a graph index calculation instruction.

When the data analysis system receives a user trigger on the query control in the interface shown in fig. 3, the data analysis system may display the encapsulated visual query interface, or the write area for writing a query language, or both at the same time.

Illustratively, as shown in a in fig. 4, when the data analysis system receives a user trigger on the query control, it simultaneously displays the visual query interface and the write area for writing a query language. The data analysis system can receive a query statement written by the user in the query language write area 4010 and analyze the content in the user behavior path knowledge graph accordingly to obtain an analysis result; alternatively, it can receive the user's click, in area 4011, on one of the visual query interfaces provided by the data analysis system, which solidify and encapsulate common query logic, and analyze the content in the user behavior path knowledge graph to obtain an analysis result.

Optionally, after the data analysis system obtains an analysis result through either of the two query modes, it supports further querying on that analysis result; this query mode is called an extended query. An extended query can only be performed by using an encapsulated K-out query interface or a fixed query statement. In this case, when the data analysis system receives a second user trigger on the query control, a plurality of fixed query interfaces are displayed for the user to select. For example, after the query result displays information related to node A, an extended query is performed with node A as the center; at this time, the query can be performed only through a query interface provided by the data analysis system, for example, by clicking the interface for the longest path from node A to node F.

The shortest path query instruction is used to query the shortest path between two nodes. The node expansion instruction is used to further analyze a node after an analysis result for that node has been obtained. The node merging instruction merges a plurality of nodes into one node. The node splitting instruction restores a merged node to its original state. The node filtering instruction based on node attribute screening conditions hides nodes that do not satisfy a given screening condition. The relationship merging instruction merges the relationships between two nodes. The relationship splitting instruction restores a merged relationship to its original state. The relationship filtering instruction based on relationship attribute screening conditions hides relationships that do not satisfy a given screening condition. The relationship weight calculation instruction performs weight analysis on the paths between two nodes. The graph index calculation instruction performs an overall analysis of the user behavior path knowledge graph according to a fixed algorithm.

Optionally, when the query entry acquires the query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain the analysis result may include the following implementations:

In a first possible implementation manner, when the query entry acquires the node merging instruction, the plurality of nodes corresponding to the node merging instruction are merged, and the relationships among those nodes are merged, to obtain an analysis result, wherein the analysis result includes the user behavior path knowledge graph after the nodes are merged.

For example, the query instruction may include a corresponding instruction control, and when the data analysis system receives a trigger of the user to the instruction control, the data analysis system generates a corresponding query instruction, and executes an analysis process corresponding to the query instruction.

For example, the query entry acquires the node merging instruction as follows:

As shown in a in fig. 5, when the data analysis system receives a user trigger on the analysis control, a secondary interface is displayed, which includes a node analysis control, a relationship analysis control, and a graph index calculation control. When the data analysis system receives a user trigger on the node analysis control, the interface shown in b in fig. 5 is displayed, which includes a node merging control, a node splitting control, and a filter-nodes-by-condition control.

When the data analysis system receives a user trigger on the node merging control, it generates a node merging instruction. The remaining query instructions are generated in the same or a similar way, which is not described again below; their generation is simply expressed as the user triggering the related instruction control.

As shown in c in fig. 5, when the data analysis system receives a user trigger on the node merging control, a next-level page may be displayed, which includes a plurality of node names and a merging control. When the data analysis system receives the user's selection of a plurality of node names followed by a trigger on the merging control, the selected nodes are merged into one node. At the same time, the data analysis system automatically merges the relationships between the corresponding nodes to generate the merged user behavior path knowledge graph.

In a second possible implementation manner, when the query entry acquires the node splitting instruction, the merged nodes are restored to the state in which detailed node information can be viewed, and the relationship between the nodes corresponding to the splitting instruction is split, so that an analysis result is obtained, wherein the analysis result includes the user behavior path knowledge graph after the nodes are split.

As shown in fig. 6, when the data analysis system receives a user trigger on the node splitting control, a next-level page is displayed, which includes a plurality of merged node names and a splitting control. When the data analysis system receives the user's selection of a merged node name followed by a trigger on the splitting control, the merged node is split. Node splitting refers to the operation of restoring a merged node to a state in which detailed node information can be viewed. After the node is split, the data analysis system automatically splits the corresponding relationships of the node to form the split user behavior path knowledge graph.
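The node merging and node splitting operations described above can be sketched as follows. This is only an illustration under simplifying assumptions: relationships are `(source, target, label)` tuples, the class name `PathGraph` is hypothetical, and splitting restores a snapshot taken at merge time, so it does not account for other edits made between the merge and the split.

```python
class PathGraph:
    """Toy in-memory graph; relationships are (source, target, label) tuples."""

    def __init__(self, edges):
        self.edges = list(edges)
        self._merged = {}  # merged node name -> (member set, edges before merge)

    def merge_nodes(self, members, name):
        """Merge several nodes into one and merge their relationships."""
        members = set(members)
        before = list(self.edges)
        after = []
        for src, dst, label in self.edges:
            s = name if src in members else src
            d = name if dst in members else dst
            if s == name and d == name:
                continue  # relationship inside the group is folded into the node
            if (s, d, label) not in after:
                after.append((s, d, label))  # duplicate relationships collapse
        self._merged[name] = (members, before)
        self.edges = after

    def split_node(self, name):
        """Restore the merged node so that detailed information is visible again."""
        members, before = self._merged.pop(name)
        self.edges = before
        return members


original = [("A", "B", "click"), ("B", "C", "click"),
            ("A", "C", "click"), ("C", "D", "jump")]
graph = PathGraph(original)
graph.merge_nodes(["A", "B"], "AB")
merged_edges = list(graph.edges)
members = graph.split_node("AB")
restored_edges = list(graph.edges)
```

After the merge, the two relationships from A and B to C collapse into a single relationship from the merged node, which mirrors the automatic relationship merging described in the text.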

In a third possible implementation manner, when the query entry acquires the node filtering instruction based on node attribute screening conditions, nodes that do not satisfy the node attribute screening condition, together with their corresponding relationships, are hidden to obtain an analysis result, wherein the analysis result includes the user behavior path knowledge graph with some nodes hidden.

As shown in fig. 7, when the data analysis system receives a user trigger on the filter-nodes-by-condition control, a next-level page may be displayed, which includes an input box in which a node attribute screening condition can be entered and a filtering control.

After the data analysis system receives the node attribute screening condition entered by the user in the input box and a trigger on the filtering control, it displays the nodes related to the entered screening condition and the relationships between those nodes. Filtering nodes by condition supports layer-by-layer filtering, that is, filtering again on the basis of the previous filtering result.

For example, node A and node C are nodes related to a control type, and node B is a node unrelated to the control type. After the data analysis system receives the user's selection of the control type in the input box and a trigger on the filtering control, only node A, node C, and the relationship between node A and node C are displayed in the corresponding user behavior path knowledge graph, while node B, the relationship between node A and node B, and the relationship between node C and node B are hidden.
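A minimal sketch of filtering nodes by an attribute condition, including the layer-by-layer filtering mentioned above, might look as follows. The attribute names `kind` and `clicks` and the function `filter_nodes` are hypothetical; the screening condition is modeled as an ordinary Python predicate.

```python
def filter_nodes(nodes, edges, predicate):
    """Keep nodes whose attributes satisfy the predicate; hide the rest.

    A relationship stays visible only if both of its end nodes stay visible.
    """
    visible = {name: attrs for name, attrs in nodes.items() if predicate(attrs)}
    kept_edges = [(s, d) for s, d in edges if s in visible and d in visible]
    return visible, kept_edges


nodes = {
    "A": {"kind": "control", "clicks": 40},
    "B": {"kind": "route", "clicks": 12},
    "C": {"kind": "control", "clicks": 3},
}
edges = [("A", "B"), ("B", "C"), ("A", "C")]

# First layer: keep only nodes related to the control type.
first, first_edges = filter_nodes(nodes, edges, lambda a: a["kind"] == "control")
# Second layer: filter again on top of the previous result.
second, second_edges = filter_nodes(first, first_edges, lambda a: a["clicks"] >= 10)
```

The second call receives the output of the first, so each layer narrows the previously displayed graph instead of starting again from the full graph.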

In a fourth possible implementation manner, when the query entry acquires the relationship merging instruction, the plurality of relationships between the nodes corresponding to the relationship merging instruction are merged to obtain the analysis result, wherein the analysis result includes the user behavior path knowledge graph after the relationships are merged.

Relationship merging is typically used when there are many relationships between two nodes. As shown in fig. 8, when the data analysis system receives a user trigger on the relationship merging control, a next-level page may be displayed, which includes a plurality of node names, an input box in which a relationship merging condition can be entered, and a merging control.

When the data analysis system receives the user's selection of two node names that have a plurality of relationships between them, the selection of a relationship merging condition, and a trigger on the merging control, the relationships between the two selected nodes are merged into one, and the merged user behavior path knowledge graph is generated. The system provides three relationship merging conditions for the user to choose from: merging grouped by label, merging grouped by attribute value, and merging all.
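The three relationship merging conditions can be illustrated with a small grouping function. This is a sketch under assumptions: each relationship between the selected node pair is a dict with `label` and `attrs` keys, and the mode names `"label"`, `"attribute"`, and `"all"` are hypothetical stand-ins for the options shown in the interface.

```python
from collections import defaultdict


def merge_relations(relations, mode="all", attribute=None):
    """Group the relationships of ONE node pair; each group becomes one merged relationship."""
    groups = defaultdict(list)
    for rel in relations:
        if mode == "label":
            key = rel["label"]                 # merge grouped by label
        elif mode == "attribute":
            key = rel["attrs"].get(attribute)  # merge grouped by attribute value
        elif mode == "all":
            key = "all"                        # merge everything into one
        else:
            raise ValueError(f"unknown merge mode: {mode}")
        groups[key].append(rel)
    return dict(groups)


relations = [
    {"label": "click", "attrs": {"channel": "app"}},
    {"label": "click", "attrs": {"channel": "web"}},
    {"label": "route", "attrs": {"channel": "app"}},
]
by_label = merge_relations(relations, mode="label")
by_channel = merge_relations(relations, mode="attribute", attribute="channel")
merged_all = merge_relations(relations)
```

Keeping the original relationships inside each group is a deliberate choice in this sketch: it leaves enough information to restore or regroup them later.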

In a fifth possible implementation manner, when the query entry acquires the relationship splitting instruction, the merged relationship is restored to a state in which detailed relationship information can be viewed, or the merged relationship is split in a manner different from that used during merging, to obtain an analysis result, wherein the analysis result includes the user behavior path knowledge graph after the relationship is split.

As shown in fig. 9, when the data analysis system receives a trigger of a user to the relationship splitting control, the data analysis system splits the relationship between the merged nodes.

The relationship can be split in two ways. First, when the data analysis system receives an operation in which the user selects a node name and triggers the relationship splitting control, the merged relationship is restored to a state in which detailed relationship information can be viewed. Second, when the data analysis system receives an operation in which the user selects a node name, inputs a relationship splitting condition, and triggers the splitting control, a virtual relationship carrying richer information is formed that is not persisted to disk in the graph database. Specifically, the relationship is split not according to the condition used when it was merged but according to a new condition, and the resulting relationships are temporary and are not recorded in the graph database. For example, the relationships between node A and node B are merged by relationship label; when the merged relationship is split, it is split by relationship attribute value. The splitting result is displayed only temporarily: if the data analysis system receives a refresh instruction, the splitting result disappears and the original merged relationship is retained.

In a sixth possible implementation manner, when the query entry acquires the relationship filtering instruction based on relationship attribute screening conditions, relationships that do not satisfy the relationship attribute screening condition, together with the corresponding nodes, are hidden to obtain an analysis result, wherein the analysis result includes the user behavior path knowledge graph with some nodes hidden.

After receiving a user trigger on the filter-relationships-by-condition control, the data analysis system displays a next-level interface (not shown in the figures), which includes an input box in which a relationship attribute screening condition can be entered and a filtering control.

After the data analysis system receives the relationship attribute screening condition entered by the user in the input box and a trigger on the filtering control, it hides the relationships that do not satisfy the condition. Hiding relationships may leave some nodes isolated, and the data analysis system hides those isolated nodes as well. For example, relationships exist among node A, node B, and node C, and the entered relationship screening condition hides the relationship between node A and node B and the relationship between node B and node C. Node B then becomes an isolated node, and the data analysis system hides node B at the same time.
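The behavior of hiding relationships and then hiding any node left isolated can be sketched as below. The dict-based relationship format, the `weight` attribute, and the page names are hypothetical; note that in this simplified version a node that had no relationships to begin with is hidden as well.

```python
def filter_relations(nodes, relations, predicate):
    """Hide relationships that fail the predicate, then hide isolated nodes."""
    kept = [rel for rel in relations if predicate(rel)]
    # A node remains visible only if at least one visible relationship touches it.
    connected = {rel["source"] for rel in kept} | {rel["target"] for rel in kept}
    visible_nodes = [name for name in nodes if name in connected]
    return visible_nodes, kept


nodes = ["home", "list", "detail"]
relations = [
    {"source": "home", "target": "list", "weight": 1},
    {"source": "list", "target": "detail", "weight": 2},
    {"source": "home", "target": "detail", "weight": 9},
]
# Screening condition: keep only relationships with weight >= 5.
visible_nodes, visible_relations = filter_relations(
    nodes, relations, lambda rel: rel["weight"] >= 5
)
```

Here both relationships touching `list` are hidden, so `list` is left isolated and is hidden together with them.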

In a seventh possible implementation manner, when the query entry acquires the relationship weight calculation instruction, the proportion of the internal quantity of a given path to the total quantity of all paths between the two nodes is calculated, and the thickness of the relationship is marked according to the proportion: the larger the proportion, the thicker the relationship line. An analysis result is thereby obtained, wherein the analysis result includes the user behavior path knowledge graph after the relationships are marked.

As shown in fig. 10, after receiving the trigger of the user on the relationship weight calculation control, the data analysis system displays a next-level page including a plurality of node name controls, function selection controls, and calculation controls.

After the data analysis system receives the user's selection of two nodes connected by a plurality of paths and of the function to be used, followed by a trigger on the calculation control, it automatically calculates the proportion of the internal quantity of a given path to the quantity of all paths between the two nodes, and automatically marks the thickness of the relationship according to the proportion: the larger the proportion, the thicker the relationship line. The data analysis system provides methods such as logarithmic normalization and MapMinMax normalization to control line thickness, so that the thickness ratios are displayed in a reasonable range. For example, there are multiple paths between node A and node F; after the weight calculation, path 1 includes 5 nodes and path 2 includes 3 nodes, so the line of path 1 between node A and node F is thicker than the line of path 2.
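The weight calculation above can be sketched in a few lines. The text does not define the "internal quantity" of a path precisely, so the sketch simply takes one number per path (5 and 3, as in the example). The function names, the pixel range of 1 to 8, and the use of plain min-max scaling as a stand-in for the MapMinMax normalization are all assumptions.

```python
import math


def proportions(quantities):
    """Share of each path's quantity in the total of all paths between two nodes."""
    total = sum(quantities.values())
    return {path: q / total for path, q in quantities.items()}


def min_max_widths(values, lo=1.0, hi=8.0):
    """Scale values linearly into the line-width range [lo, hi]."""
    smallest, largest = min(values.values()), max(values.values())
    if largest == smallest:
        return {path: (lo + hi) / 2 for path in values}
    span = largest - smallest
    return {path: lo + (v - smallest) * (hi - lo) / span for path, v in values.items()}


def log_widths(quantities, lo=1.0, hi=8.0):
    """Logarithmic normalization: compress large quantities before scaling."""
    return min_max_widths({p: math.log1p(q) for p, q in quantities.items()}, lo, hi)


quantities = {"path 1": 5, "path 2": 3}
shares = proportions(quantities)
widths = min_max_widths(shares)
```

With only two paths, min-max scaling always assigns the extremes of the range; the logarithmic variant becomes more useful when many paths with very different quantities share one view.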

In an eighth possible implementation manner, when the query entry acquires the graph index calculation instruction, a calculation is performed on the user behavior path knowledge graph to obtain a link file, and the analysis result includes the link file.

When the data analysis system receives a user trigger on the graph index calculation control, a fixed algorithm provided by the data analysis system performs an overall analysis of the user behavior path knowledge graph to obtain the corresponding link file. The fixed algorithms provided by the system include a loop identification algorithm, a community discovery algorithm, and other graph computation algorithms. A link refers to a physical line from one node to an adjacent node without any other switching node in between. A link file is a collection of multiple links.

For example, a loop recognition algorithm may be used to compute all paths for node a and node F in the entire user behavior path knowledge graph, forming a link file for node a to node F.
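To make the idea of collecting all paths between two nodes concrete, the following sketch uses a plain depth-first enumeration of simple paths. It is not the loop identification algorithm named in the text, whose details are not given; the adjacency-dict format and the name `all_simple_paths` are assumptions.

```python
def all_simple_paths(adjacency, start, goal):
    """Enumerate every path from start to goal that visits no node twice."""
    paths = []

    def walk(node, trail):
        if node == goal:
            paths.append(trail)
            return
        for nxt in adjacency.get(node, []):
            if nxt not in trail:  # skip nodes already on the trail (avoids loops)
                walk(nxt, trail + [nxt])

    walk(start, [start])
    return paths


# B -> A is a reverse jump that would form a loop if followed blindly.
adjacency = {"A": ["B", "C"], "B": ["A", "F"], "C": ["B", "F"]}
link_file = all_simple_paths(adjacency, "A", "F")
```

The number of simple paths can grow exponentially with graph size, so a real system would likely bound the path length or the number of results.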

Optionally, the query instruction further includes: at least one of a node statistical instruction and a relationship statistical instruction; and when the query entrance acquires the query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result.

Optionally, when the query instruction obtained by the query entry is the node statistical instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction, and obtaining an analysis result may include the following implementation:

In a first possible implementation manner, when the query entry acquires the node statistical instruction, the number of each kind of node in the user behavior path knowledge graph, or in the current query result, is counted according to the name of the node label or node attribute.

As shown in fig. 11, when the data analysis system receives a user trigger on the node statistics control, a next-level interface is displayed, which includes multiple input boxes in which the user can select node statistics conditions, a full graph statistics control, a cancel control, and a determine control.

The data analysis system can receive the name of a node label or node attribute selected by the user and count the number of each kind of node in the user behavior path knowledge graph or in the current query; the number in the current query refers to the maximum number that can be displayed, taking as the reference point the time at which the data analysis system receives the user trigger on the node statistics control, and this maximum can be adjusted according to the configuration of the terminal device on which the system runs.

For example, as shown in fig. 11, when the data analysis system receives a trigger of the user to the full graph statistics control, the data analysis system performs statistics on the number of each node in the user behavior path knowledge graph; and when the data analysis system receives the trigger of the user on the determination control, counting the number of the query.

In a second possible implementation manner, the distribution of the nodes in the user behavior path knowledge graph, or the distribution of the different values in the current query result, is counted according to the value of the node label or node attribute.

The data analysis system may receive the value of the node label or the node attribute selected by the user in the interface shown in fig. 11, and count the distribution of each node in the user behavior path knowledge graph or the distribution of different values in the current query result.

In a third possible implementation manner, the distribution of the values of a specific attribute of the target node in the user behavior path knowledge graph, or in the current query result, is counted.

The data analysis system may receive a specific attribute of a target node selected by a user in the interface shown in fig. 11, and count the distribution of values of the target node in the user behavior path knowledge graph or the distribution of values in the query result.

In a fourth possible implementation manner, a plurality of communities are divided according to the functional modules to which the node labels belong, and the node distribution and the corresponding quantity information in the different communities are counted.

The data analysis system may receive, in the interface shown in fig. 11, the user's selection of the functional module to which the node labels belong in order to perform community division, and then count the internal node distribution and the corresponding quantity information of each community. For example, after the nodes are divided into different node communities according to the function of the clicked object, the distribution and quantity information of user visits to each entry in the entry and channel click community are counted.
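The node statistics described in the implementations above amount to counting and grouping. A minimal sketch, assuming each node is a dict with hypothetical `label` and `module` fields:

```python
from collections import Counter, defaultdict

# Hypothetical node records; real nodes would come from the knowledge graph.
nodes = [
    {"name": "home_tab", "label": "tab_click", "module": "navigation"},
    {"name": "mine_tab", "label": "tab_click", "module": "navigation"},
    {"name": "pay_entry", "label": "entry_click", "module": "payment"},
    {"name": "pay_confirm", "label": "button_click", "module": "payment"},
]


def count_by(nodes, key):
    """Count nodes per value of a label or attribute."""
    return Counter(node[key] for node in nodes)


def community_stats(nodes, community_key="module", inner_key="label"):
    """Divide nodes into communities, then count the node distribution in each."""
    communities = defaultdict(Counter)
    for node in nodes:
        communities[node[community_key]][node[inner_key]] += 1
    return {name: dict(counts) for name, counts in communities.items()}


label_counts = count_by(nodes, "label")
communities = community_stats(nodes)
```

Passing either the full node list or only the nodes of the current query result to the same functions covers both the full graph statistics and the statistics of the current query.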

Optionally, when the query instruction obtained by the query entry is the relationship statistic instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction, and obtaining the analysis result may include the following implementation:

Similar to fig. 11, when the data analysis system receives a user trigger on the relationship statistics control, it displays a next-level interface, which includes a plurality of input boxes in which the user can select relationship statistics conditions, a full graph statistics control, a cancel control, and a confirm control. The specific analysis steps are the same as or similar to those described for node statistics and are not repeated in the following implementations.

In a first possible implementation manner, when the query entry acquires the relationship statistics instruction, the number of each node in the user behavior path knowledge graph or the number queried this time is counted according to the name of the relationship label or the relationship attribute.

In a second possible implementation manner, the distribution of each node in the user behavior path knowledge graph, or the distribution of the different values in the current query result, is counted according to the value of the relationship label or relationship attribute.

In a third possible implementation manner, the distribution of the values of a specific attribute of the target relationship in the user behavior path knowledge graph, or in the current query result, is counted.

In a fourth possible implementation manner, the aggregate distribution counts of mainstream paths and minority paths are counted according to the Pauta (3σ) criterion, and all illegal paths are highlighted on the principle of full display.

A mainstream path is a path whose traversal frequency is greater than or equal to a frequency threshold; a minority path is a path whose traversal frequency is less than the frequency threshold. The threshold may be set manually by a user of the data analysis system based on an analysis of user behavior paths. An illegal path is a user operation path outside the range permitted by the program. For example, a developer forgets to block an old code entry of a product, and some users reach certain pages through this entry that should not be open, which forms an illegal path; or a user maliciously obtains, by some means, a GET or POST request that goes directly to the back end and sends the request directly to obtain product information. The log records these behaviors, and connecting them in chronological order also forms an illegal path.
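The classification described above can be sketched as follows. The text does not say how the 3σ criterion enters the calculation, so `three_sigma_bounds` only computes mean ± 3σ as values an analyst might consult when choosing the threshold; that usage, the function names, and the allowed-transition set used to detect illegal paths are all assumptions.

```python
from statistics import mean, pstdev


def split_by_frequency(frequencies, threshold):
    """Mainstream paths have frequency >= threshold; minority paths fall below it."""
    mainstream = {p: f for p, f in frequencies.items() if f >= threshold}
    minority = {p: f for p, f in frequencies.items() if f < threshold}
    return mainstream, minority


def three_sigma_bounds(values):
    """Mean minus and plus three population standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    return mu - 3 * sigma, mu + 3 * sigma


def illegal_paths(paths, allowed_transitions):
    """A path is flagged if any of its page switches is not an allowed transition."""
    return [
        path for path in paths
        if any(step not in allowed_transitions for step in zip(path, path[1:]))
    ]


frequencies = {
    ("home", "list", "detail"): 120,
    ("home", "search", "detail"): 45,
    ("home", "legacy", "detail"): 2,  # reaches a page through an old entry
}
mainstream, minority = split_by_frequency(frequencies, threshold=40)
allowed = {("home", "list"), ("list", "detail"),
           ("home", "search"), ("search", "detail")}
flagged = illegal_paths(list(frequencies), allowed)
lower, upper = three_sigma_bounds(list(frequencies.values()))
```

Illegal paths are checked independently of frequency, so a rarely used but legitimate path is reported as a minority path rather than an illegal one.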

In a fifth possible implementation manner, all reverse paths under a given condition are counted.

The data analysis system may receive the user's selection and count all reverse paths under a given condition. For example, a normal path is one in which the user enters a second-level page from a first-level page and then enters a third-level page from the second-level page; a reverse path is one in which the user returns from the third-level page to the second-level page and then returns from the second-level page to the first-level page.
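A reverse path in the sense of the example above can be detected by comparing page levels along a path. In this sketch the `level` mapping from page name to depth and the page names are hypothetical:

```python
def reverse_transitions(path, level):
    """Return the page switches that go from a deeper page back to a shallower one."""
    return [(a, b) for a, b in zip(path, path[1:]) if level[b] < level[a]]


level = {"home": 1, "list": 2, "detail": 3}
path = ["home", "list", "detail", "list", "home"]
reverse = reverse_transitions(path, level)
```

Counting reverse paths under a condition then reduces to applying this check to every path that satisfies the condition and tallying the non-empty results.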

The foregoing explains how, when the data analysis system acquires the query instruction, the content in the user behavior path knowledge graph is analyzed according to the query instruction to obtain an analysis result. On this basis, the data analysis system can also receive further analysis of the analysis result by the user.

Optionally, after the data analysis system obtains the analysis result corresponding to the query instruction, it may receive further operations by the user on the analysis result, which include the following implementations:

In a first possible implementation manner, the data analysis system further provides marking of the analysis result, including key node marking and key relationship marking. Marked content is not affected by operations such as splitting, merging, or filtering by condition. The data analysis system cancels a mark only when it receives a user trigger on the key node marking control again.

In a second possible implementation manner, the data analysis system further provides output and download of the analysis result content. When the data analysis system receives a user trigger on the download control, the current analysis result is output in full, which avoids the blurring and the loss of information outside the image boundary that occur when output is produced by screenshot.

Optionally, the first interface displayed by the data analysis system further includes a sliding time window control 1201, as shown in fig. 12.

The data analysis system displays the first interface after receiving the login operation of the user. By default, the user behavior path knowledge graph displayed on the first interface selects the longest time window for full data display; that is, taking the time at which the user enters the data analysis system as the reference point, the maximum number of items that the system supports displaying in a single query is displayed. After the data analysis system receives the user's modification of the start and end times of the sliding time window control 1201, the visualized content is updated in real time to display only the node information within the current time period; correspondingly, the relationship information also changes in real time as node information is displayed or hidden.

Optionally, the displaying the first interface by the data analysis system further includes the following steps:

Step 1011: send a POST request to the graph database.

The data analysis system sends a POST request to the graph database, wherein the POST request is a query request; the graph database stores the underlying data needed to construct the first interface, for example, the number of clicks and the dwell time of users on certain interfaces.

Step 1012: obtain the result of the POST request, where the result is JavaScript Object Notation (JSON) string data.

After the graph database receives the query request from the data analysis system, it parses the request and returns the JSON string data corresponding to the result of the POST request to the data analysis system.

Step 1013: convert the JSON string data into the JSON data format required for visualization.

Step 1014: based on a preset visual rendering framework, a force-directed layout, and visual style configuration information, visually render the data in the JSON format required for visualization to obtain a visualization result.

Specifically, based on the preset visual rendering framework, visual rendering is performed using a force-directed layout in combination with the visual style configuration information. After rendering is complete, a visualization result with a stable and reasonable layout is presented. The visualization result supports functions such as dragging nodes, interactively displaying node and relationship attributes, and expanding a node after right-clicking it.
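The conversion in step 1013 can be illustrated with a short sketch. Neither the response format of the graph database nor the format expected by the rendering framework is given in the text, so both shapes below are assumptions: the response is taken to be a JSON object with a `results` list of source/target rows, and the output is the nodes-and-links structure commonly fed to force-directed layouts.

```python
import json


def to_visual_format(raw):
    """Convert a (hypothetical) graph database JSON string into nodes and links."""
    rows = json.loads(raw)["results"]
    nodes, links = {}, []
    for row in rows:
        for end in ("source", "target"):
            node = row[end]
            # Deduplicate nodes by id; a node may appear in many rows.
            nodes[node["id"]] = {"id": node["id"], "label": node["label"]}
        links.append({
            "source": row["source"]["id"],
            "target": row["target"]["id"],
            "type": row["type"],
        })
    return {"nodes": list(nodes.values()), "links": links}


# Sample response string standing in for the result of the POST request.
raw = json.dumps({"results": [
    {"source": {"id": 1, "label": "page A"},
     "target": {"id": 2, "label": "page B"}, "type": "switch"},
    {"source": {"id": 2, "label": "page B"},
     "target": {"id": 1, "label": "page A"}, "type": "return"},
]})
visual = to_visual_format(raw)
```

Separating nodes from links in this way lets style configuration such as node size or line thickness be attached to each element before rendering.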

Optionally, as shown in fig. 13, the visual style configuration information in step 1014 includes: node style configuration information and relationship style configuration information.

The node style configuration information includes: the node text display content, the node size, the node color, and the node icon.

The relationship style configuration information includes: the relationship text display content, the thickness of the relationship line, the color of the relationship line, and the line type of the relationship line.

Illustratively, when the data analysis system receives a user trigger on the visual configuration control, the interface shown in fig. 13 is displayed; when the data analysis system receives user triggers on the node style configuration control and the node text control, a window in which the user can enter text is displayed. The data analysis system configures the corresponding visual interface according to the text entered by the user and the trigger on the save configuration control.

Optionally, as shown in fig. 14, the data analysis method in the embodiment of the present application further includes the following steps:

Step 1401: collect user behavior logs.

Wherein the user behavior log comprises: a buried point log and a route log.

Step 1402: classify the user behavior logs and perform entity data processing and relationship data processing to obtain entity data and relationship data.

Wherein, entity data processing includes entity label processing and entity attribute processing; relationship data processing includes relationship label processing and relationship attribute processing.

Optionally, the entity label processing includes buried point entity label processing and routing entity label processing.

Buried points are of many types, and in buried point entity label processing the labels can be divided into different entities according to different entity business requirements, including but not limited to one or more of the following:

by control type, into picture control entities, text control entities, button control entities and the like; by the function of the clicked object, into tab click entities, entry and channel click entities, specific content click entities, setting option click entities and other function button click entities; by business function, into payment process entities and search process entities.

Routing entity labels can be divided according to one or more of the following:

by page depth, into zero-level page entities (home page entities), first-level page entities, second-level page entities and the like; by page breadth, by setting thresholds on the number of inbound and outbound page links (in-degree/out-degree) and assigning pages to different entities accordingly, where the thresholds may be exact integers (such as 0, 1, 2, ...) or interval ranges (such as [0,1], [2,4], [3,6], ...); by page function, into information collection pages, static information display pages and dynamic information query pages.
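
The depth and breadth divisions can be illustrated with two small helper functions. This is only a sketch: the function names, the returned label strings and the choice to use the first matching interval are assumptions made for the example.

```python
from typing import Sequence, Tuple


def depth_label(depth: int) -> str:
    """Map a page depth to a page-level entity label."""
    names = {0: "zero-level page", 1: "first-level page", 2: "second-level page"}
    return names.get(depth, f"level-{depth} page")


def breadth_label(degree: int, ranges: Sequence[Tuple[int, int]]) -> str:
    """Map a link in-degree or out-degree to the first interval containing it.

    Intervals are inclusive on both ends; degrees outside every interval
    fall into a catch-all bucket.
    """
    for low, high in ranges:
        if low <= degree <= high:
            return f"degree[{low},{high}]"
    return "degree[other]"


home = depth_label(0)
bucket = breadth_label(3, [(0, 1), (2, 4)])
```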

Optionally, the entity attribute processing includes: dividing the labels obtained after the entity label processing into a plurality of first labels according to different service requirements, taking one of the plurality of first labels as an entity label, and taking the rest information except the entity label in the plurality of first labels as the entity attribute information. For example, after selecting the function of different click objects as an entity tag, the entity of each category may have attributes such as a control type, a service function type, and the like.
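
A minimal sketch of this split, assuming the candidate labels for one entity are held in a dictionary: one key is chosen as the entity label and the remaining entries become the entity attributes. The function name and the dictionary keys are hypothetical.

```python
from typing import Dict, Tuple


def split_label_and_attributes(
    candidates: Dict[str, str], label_key: str
) -> Tuple[str, Dict[str, str]]:
    """Use candidates[label_key] as the label and the rest as attributes."""
    if label_key not in candidates:
        raise KeyError(f"unknown label key: {label_key}")
    label = candidates[label_key]
    attributes = {k: v for k, v in candidates.items() if k != label_key}
    return label, attributes


# Choosing the click-object function as the entity label.
label, attributes = split_label_and_attributes(
    {
        "click_function": "tab click",
        "control_type": "button",
        "business_function": "payment",
    },
    "click_function",
)
```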

Optionally, arranging and connecting the buried point or routing information of a single user in chronological order yields the original user behavior path relationship data of that user. User behavior path relationships are of many types, and the relationship labels can be divided into different relationships according to different relationship business requirements, including but not limited to one or more of the following:

by path passing frequency, into mainstream paths, niche paths and illegal paths; by path value, into high-value paths, low-value paths, worthless paths and the like; by path dwell time, into long-dwell paths, short-dwell paths, instant-jump paths and the like, or into paths of several dwell-time grades; by whether the start and end depths of the path are the same, into transverse paths (same depth) and longitudinal paths (different depths); and, when the start and end depths differ, by whether access proceeds from shallow to deep, into sequential paths (shallow source node, deep target node) and reverse paths (deep source node, shallow target node).
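
The following sketch shows how path relationship data might be derived from one user's time-ordered events and how a single path could be labeled by direction. The event representation (timestamp and page name pairs), the function names and the label strings are assumptions; the sketch also treats sequential and reverse paths as the two cases in which source and target depths differ, which is an interpretation of the division above.

```python
from typing import List, Tuple


def build_path_edges(events: List[Tuple[int, str]]) -> List[Tuple[str, str]]:
    """Sort (timestamp, page) events by time and link consecutive pages."""
    ordered = sorted(events, key=lambda e: e[0])
    pages = [page for _, page in ordered]
    return list(zip(pages, pages[1:]))


def direction_label(source_depth: int, target_depth: int) -> str:
    """Label a path by comparing the depth of its source and target nodes."""
    if source_depth == target_depth:
        return "transverse"
    return "sequential" if source_depth < target_depth else "reverse"


edges = build_path_edges([(3, "pay"), (1, "home"), (2, "detail")])
```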

Specifically, a mainstream path is a path whose passing frequency is greater than or equal to a frequency threshold, and a niche path is a path whose passing frequency is less than the frequency threshold. The threshold may be set manually by a user of the data analysis system based on analysis of the user behavior paths.

For example, the path frequency threshold may be set to 70%; among all user behavior paths, a path whose passing frequency is greater than 70% is a high-value path. Taking a shopping product as an example, a high-value path is one through which many users place and pay for orders, a low-value path is one through which few users place and pay for orders, and a worthless path is one through which almost no users place and pay for orders.
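
A threshold-based division by passing frequency can be sketched as follows. Here the passing frequency of a path is taken to be the share of users who traverse it at least once, and an illegal path is taken to be any path outside a set of allowed paths; both definitions, along with the names used, are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def classify_by_frequency(
    user_paths: List[List[Edge]], allowed: Set[Edge], threshold: float = 0.7
) -> Dict[Edge, str]:
    """Label each edge as 'mainstream', 'niche' or 'illegal'."""
    total_users = len(user_paths)
    counts: Counter = Counter()
    for path in user_paths:
        counts.update(set(path))  # count each user at most once per edge
    labels: Dict[Edge, str] = {}
    for edge, n in counts.items():
        if edge not in allowed:
            labels[edge] = "illegal"
        elif n / total_users >= threshold:
            labels[edge] = "mainstream"
        else:
            labels[edge] = "niche"
    return labels


paths = [
    [("home", "detail")],
    [("home", "detail"), ("detail", "pay")],
    [("home", "detail"), ("home", "admin")],
]
labels = classify_by_frequency(paths, {("home", "detail"), ("detail", "pay")})
```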

Optionally, the relationship attribute processing includes dividing the labels obtained from the relationship label processing into a plurality of second labels according to different business requirements, using one of the plurality of second labels as the relationship label, and using the remaining information in the plurality of second labels as the relationship attribute information. For example, when the path passing frequency is selected as the relationship label, each category of relationship may have attributes such as path value, path dwell time and path direction.

Step 1403: construct the user behavior path knowledge graph based on the entity data and the relationship data.

Specifically, ontology construction and knowledge mapping are performed by using entity data and relationship data, and a user behavior path knowledge graph is obtained.

Wherein, the ontology construction comprises: building nodes by using entity labels, wherein the entity labels correspond to the nodes one by one, and each node corresponds to respective entity attribute information; and establishing a relationship by using the relationship labels, wherein the relationship labels correspond to the relationships one by one, and each relationship corresponds to respective relationship attribute information.

The ontology is a generalized data model that models only abstract general types and contains no concrete entity information; that is, the ontology defines entity types and describes the attributes of those entity types.

For example, when buried point entity labels are divided by control type, any one of the picture control entity, the text control entity and the button control entity is selected as an entity label and a corresponding node is constructed, and other information such as the business function and the button function is selected as the corresponding attribute information.
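
As an illustration of ontology construction, the sketch below registers node types and relationship types, each identified by a unique label and carrying a list of attribute names. It models types only and holds no instance data. The class names and the example labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class TypeDef:
    """One node type or relationship type: a label plus its attribute names."""
    label: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Ontology:
    node_types: Dict[str, TypeDef] = field(default_factory=dict)
    relationship_types: Dict[str, TypeDef] = field(default_factory=dict)

    def add_node_type(self, label: str, attributes: Iterable[str]) -> None:
        self._add(self.node_types, label, attributes)

    def add_relationship_type(self, label: str, attributes: Iterable[str]) -> None:
        self._add(self.relationship_types, label, attributes)

    @staticmethod
    def _add(registry: Dict[str, TypeDef], label: str,
             attributes: Iterable[str]) -> None:
        # Labels map one-to-one to types, so duplicates are rejected.
        if label in registry:
            raise ValueError(f"label already defined: {label}")
        registry[label] = TypeDef(label, list(attributes))


ontology = Ontology()
ontology.add_node_type("button_control", ["business_function", "button_function"])
ontology.add_relationship_type(
    "mainstream_path", ["path_value", "dwell_time", "direction"]
)
```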

The knowledge mapping comprises: mapping the fields of the data files used in actual data processing, one by one, to the entity labels, the relationship labels and their corresponding attribute information defined in the ontology construction. That is, the ontology model is given a concrete real-world interpretation. For example, if an attribute of an entity is named "page_depth" (page depth) and a field in the actual data source is also named "page_depth", the two can be matched automatically.
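
The automatic matching in the example above can be sketched as an exact-name lookup between ontology attributes and data source fields. The function name and the handling of unmatched attributes (returned separately so that they can be mapped by hand) are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def map_fields(
    ontology_attributes: Iterable[str], source_fields: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Match ontology attributes to source fields that share the same name.

    Returns (mapping, unmatched): attributes with an identically named
    source field, and attributes that still need a manual mapping.
    """
    available = set(source_fields)
    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for attr in ontology_attributes:
        if attr in available:
            mapping[attr] = attr
        else:
            unmatched.append(attr)
    return mapping, unmatched


mapping, unmatched = map_fields(["page_depth", "page_name"], ["page_depth", "url"])
```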

Fig. 15 is a schematic diagram illustrating a configuration of a data analysis system according to an embodiment of the present application. As shown in fig. 15, the data analysis system includes an analysis module and a construction module. The analysis module is a data analysis system upper layer display module and is used for analyzing the content in the user behavior path knowledge graph according to the query instruction after receiving the query instruction input by the user to obtain an analysis result. The construction module is a bottom layer module supporting the normal operation of the analysis module.

Wherein, the analysis module includes a data analysis layer. The construction module includes a log collection layer, a data processing layer, a graph construction layer and a graph visualization layer.

Optionally, the construction module supports the normal operation of the analysis module through the following steps:

S1501, the log collection layer collects user behavior logs.

Wherein the user behavior log comprises: a buried point log and a route log.

S1502, the data processing layer classifies the user behavior logs collected by the log collection layer and performs entity data processing and relationship data processing to obtain entity data and relationship data.

Wherein, entity data processing includes entity label processing and entity attribute processing; relationship data processing includes relationship label processing and relationship attribute processing. The entity data processing and relationship data processing are the same as in step 1402 and are not described again here.

S1503, the graph construction layer constructs the user behavior path knowledge graph based on the entity data and the relationship data processed by the data processing layer.

The process by which the graph construction layer builds the user behavior path knowledge graph is the same as in step 1403 and is not described again here.

S1504, the graph visualization layer visually configures the user behavior path knowledge graph constructed by the graph construction layer.

The way in which the graph visualization layer visually configures the constructed user behavior path knowledge graph is the same as that shown in fig. 13 and is not described again here.

As shown in fig. 16, an embodiment of the present application further provides a data analysis apparatus 160, including:

a first display module 1601, configured to display a first interface, where the first interface includes a query entry and part or all of a user behavior path knowledge graph; the user behavior path knowledge graph comprises a plurality of nodes, relations among the nodes, node attributes of any node and relation attributes of any relation; the node attribute comprises one or more of a node type and a click number, and the relationship attribute comprises one or more of a path frequency, a path type or a path occurrence time of the user behavior path.

The analysis module 1602 is configured to, when the query entry receives the query instruction, analyze the content in the user behavior path knowledge graph according to the query instruction, and obtain an analysis result.

A second display module 1603 for displaying a second interface, which includes the analysis result.

The embodiment of the application also provides the electronic equipment, wherein the electronic equipment comprises a processor and a memory which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the electronic device is used for storing visualization data, view resource data and the like. The computer program is executed by a processor to implement a data analysis method.

The embodiment of the application also provides a computer readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer storage media and communication media, and may include any medium that can communicate a computer program from one place to another. A storage medium may be any target medium that can be accessed by a computer.

In one possible implementation, the computer-readable medium may include Random Access Memory (RAM), Read-Only Memory (ROM), compact disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above embodiments are only for illustrating the embodiments of the present invention and are not to be construed as limiting the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the embodiments of the present invention shall be included in the scope of the present invention.

Claims

1. A method of data analysis, comprising:

displaying a first interface, wherein the first interface comprises a query entry and part or all of a user behavior path knowledge graph; the user behavior path knowledge graph comprises a plurality of nodes, relationships among the plurality of nodes, node attributes of any one of the nodes, and relationship attributes of any one of the relationships; the node attribute comprises one or more of a node type or a click count, and the relationship attribute comprises one or more of a path frequency, a path type or a path occurrence time of a user behavior path;

when the query entry acquires a query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result;

displaying a second interface, the second interface including the analysis result.

2. The method of claim 1, wherein the query entry comprises an encapsulated visual query interface and/or a write area for writing query language;

3. The method according to claim 2, wherein when the query entry obtains the query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result, including:

when the query entry acquires the node merging instruction, merging a plurality of nodes corresponding to the node merging instruction, and merging the relationship among the plurality of nodes corresponding to the node merging instruction to obtain the analysis result, wherein the analysis result comprises a user behavior path indication map after the nodes are merged;

or when the query entry acquires the node splitting instruction, restoring the merged node to a state in which detailed node information can be viewed, and splitting the relationship between the nodes corresponding to the splitting instruction to obtain the analysis result, wherein the analysis result comprises a user behavior path indication map after the node is split;

or when the query entry acquires the instruction to filter nodes by a node attribute screening condition, hiding the nodes that do not satisfy the node attribute screening condition and their corresponding relationships to obtain the analysis result, wherein the analysis result comprises a user behavior path indication map with some nodes hidden;

or when the query entry acquires the relationship merging instruction, merging a plurality of relationships between nodes corresponding to the relationship merging instruction to obtain the analysis result, wherein the analysis result comprises a user behavior path indication map after relationship merging;

or when the query entry acquires the relationship splitting instruction, restoring the merged relationship to a state in which detailed relationship information can be viewed, and acquiring the analysis result, wherein the analysis result comprises a user behavior path indication map after the relationship splitting;

or when the query entry acquires the instruction to filter relationships by a relationship attribute screening condition, hiding the nodes that do not satisfy the relationship attribute screening condition and their corresponding relationships to obtain the analysis result, wherein the analysis result comprises a user behavior path indication map with some nodes hidden;

or when the query entry acquires the relationship weight calculation instruction, calculating the proportion of the quantity of a given path to the total quantity of paths between two nodes, and marking the thickness of the relationship according to the proportion, wherein the larger the proportion, the thicker the relationship line, to obtain the analysis result, wherein the analysis result comprises a user behavior path indication map after the relationship is marked;

or when the query entry acquires the graph index calculation instruction, calculating the user behavior path knowledge graph to obtain a link file, wherein the analysis result comprises the link file.

4. The method of claim 2 or 3, wherein the query instruction further comprises: at least one of a node statistical instruction and a relationship statistical instruction; when the query entrance acquires a query instruction, analyzing the content in the user behavior path knowledge graph according to the query instruction to obtain an analysis result, including:

when the query entry acquires the node statistical instruction: counting, according to the name of a node label or a node attribute, the number of each node in the user behavior path knowledge graph or in the current query result; or counting, according to the value of the node label or the node attribute, the distribution of each node in the user behavior path knowledge graph or the distribution of different values in the current query result; or counting, according to a specific attribute of a target node, the value distribution of the target node in the user behavior path knowledge graph or in the current query result; or dividing the nodes into a plurality of communities according to the functional modules to which the node labels belong, and counting the node distribution and the corresponding quantity information in the different communities;

or when the query entry acquires the relationship statistical instruction: counting, according to the name of a relationship label or a relationship attribute, the number of each node in the user behavior path knowledge graph or in the current query result; or counting, according to the value of the relationship label or the relationship attribute, the distribution of each node in the user behavior path knowledge graph or the distribution of different values in the current query result; or counting, according to a specific attribute of a target relationship, the value distribution of the target relationship in the user behavior path knowledge graph or in the current query result; or counting, according to the 3-sigma principle, the collection and distribution numbers of mainstream paths and niche paths, and highlighting all illegal paths according to a full-display principle; or counting all reverse paths under given conditions; wherein a mainstream path is a path whose passing frequency is greater than or equal to a frequency threshold, a niche path is a path whose passing frequency is less than the frequency threshold, and an illegal path is a user operation path outside the range permitted by the program.

5. The method of any of claims 1-3, wherein the first interface further comprises a sliding time window, the method further comprising:

when a start-stop time modification to the sliding time window is received, the first interface displays a portion of the user behavior path knowledge graph within the time of the sliding time window after the modification.

6. The method of any of claims 1-3, wherein displaying the first interface comprises:

sending a post request to a graph database;

obtaining a result of the POST request, wherein the result is JavaScript Object Notation (JSON) string data;

converting the result into a json data format required by visualization;

performing visual rendering on the result in the JSON data format required for visualization, based on a preset visual rendering framework, a force-directed layout and visualization style configuration information, to obtain a visualization result; wherein the visualization result supports one or more of node dragging, interactive display of node relationship attributes, and expansion of a node after it is right-clicked.

7. The method of claim 6, wherein the visualization style configuration information comprises: node style configuration information and relationship style configuration information;

the node style configuration information includes: one or more of node character display content, node size, node color and node icon;

the relationship style configuration information includes: the relation text display content, the thickness of the relation line, the color of the relation line and the line type of the relation line.

8. The method according to any one of claims 1-3, further comprising:

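collecting a user behavior log;

classifying the user behavior log and performing entity data processing and relationship data processing to obtain entity data and relationship data;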

and constructing the user behavior path knowledge graph based on the entity data and the relation data.

9. The method of claim 8, wherein the user behavior log comprises a buried point log and a routing log; the entity data processing comprises entity label processing and entity attribute processing; the relationship data processing comprises relationship label processing and relationship attribute processing;

wherein the entity label processing comprises buried point entity label processing and routing entity label processing; in the buried point entity label processing, the buried point entity labels are divided according to different entity business requirements to obtain different entities, wherein the division according to different entity business requirements comprises one or more of the following: division by control type, division by the function of the clicked object, and division by business function; in the routing entity label processing, the routing entity labels are divided according to one or more of the following: division by page depth, division by page breadth, and division by page function;

the entity attribute processing comprises: dividing the tags obtained after the entity tag processing into a plurality of first tags according to different service requirements, taking one of the plurality of first tags as the entity tag, and taking the rest information except the entity tag in the plurality of first tags as the entity attribute information;

in the relationship label processing, the relationship labels are divided according to different relationship business requirements to obtain different relationships; the division according to different relationship business requirements comprises one or more of the following: division by path frequency, division by path value, division by path dwell time, division by whether the start and end depths of the path are the same, and division by whether the start and end depths of the path are the same together with whether access proceeds from shallow to deep;

the relationship attribute processing comprises: dividing the labels obtained after the processing of the relationship labels into a plurality of second labels according to different service requirements, taking one of the plurality of second labels as the relationship label, and taking the rest information except the relationship label in the plurality of second labels as the relationship attribute information.

10. The method of claim 9, wherein building the user behavior path knowledge graph based on the entity data and the relationship data comprises:

using the entity data and the relation data to perform ontology construction and knowledge mapping to obtain the user behavior path knowledge graph;

wherein the ontology building comprises: constructing nodes by using the entity labels, wherein the entity labels correspond to the nodes one by one, and each node corresponds to respective entity attribute information; establishing a relationship by using the relationship labels, wherein the relationship labels correspond to the relationship one by one, and each relationship corresponds to respective relationship attribute information;

the knowledge mapping includes: mapping the fields of the data files used in actual data processing, one by one, to the entity labels, the relationship labels and their corresponding attribute information defined in the ontology construction.

11. A data analysis apparatus, comprising:

the first display module is used for displaying a first interface, and the first interface comprises a query entrance and part or all of a user behavior path knowledge graph; the user behavior path knowledge graph comprises a plurality of nodes, relationships among the plurality of nodes, node attributes of any one of the nodes, and relationship attributes of any one of the relationships; the node attribute comprises one or more of a node type and a click frequency, and the relationship attribute comprises one or more of a path frequency, a path type or a path occurrence time of a user behavior path;

the analysis module is used for analyzing the content in the user behavior path knowledge graph according to the query instruction when the query entrance receives the query instruction to obtain an analysis result;

and the second display module is used for displaying a second interface, and the second interface comprises the analysis result.

12. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the method of any one of claims 1 to 10.

13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 10.