WO2019153543A1 - Data dimension generation method, apparatus, device, and computer-readable storage medium - Google Patents

Data dimension generation method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2019153543A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
table information
dimension
algorithm
preset
Prior art date
Application number
PCT/CN2018/085258
Other languages
English (en)
French (fr)
Inventor
陈健鹏
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019153543A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Definitions

  • the present application relates to the field of computer technologies, and in particular, to a data dimension generation method, apparatus, device, and computer readable storage medium.
  • a so-called data dimension refers to data with specific attributes, such as time, geographic, or spatial attributes; analyzing data dimensions requires treating the data as entities.
  • the embodiments of the present application provide a data dimension generation method, apparatus, device, and computer readable storage medium, which can greatly reduce the time consumption of dimension statistics and analysis, and simplify the processing steps.
  • the embodiment of the present application provides a data dimension generation method, where the method includes:
  • obtaining a data table in which dimension data is saved;
  • parsing the data table to obtain table information;
  • analyzing the table information according to a preset analysis algorithm; and
  • optimizing the analysis result of the table information according to a preset optimization function, and generating a data dimension about the data table.
  • an embodiment of the present application provides a data dimension generating apparatus, where the apparatus includes:
  • an obtaining unit configured to obtain a data table in which dimension data is saved;
  • a parsing unit configured to parse the data table to obtain table information;
  • an analyzing unit configured to analyze the table information according to a preset analysis algorithm; and
  • an optimization unit configured to optimize the analysis result of the table information according to a preset optimization function, and generate a data dimension about the data table.
  • the embodiment of the present application further provides a data dimension generating device, including:
  • a memory for storing a program that implements a data dimension generation method; and
  • a processor configured to execute the program stored in the memory to implement the data dimension generation method, performing the following operations:
  • obtaining a data table in which dimension data is saved;
  • parsing the data table to obtain table information;
  • analyzing the table information according to a preset analysis algorithm; and
  • optimizing the analysis result of the table information according to a preset optimization function, and generating a data dimension about the data table.
  • an embodiment of the present application further provides a computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the following steps:
  • obtaining a data table in which dimension data is saved;
  • parsing the data table to obtain table information;
  • analyzing the table information according to a preset analysis algorithm; and
  • optimizing the analysis result of the table information according to a preset optimization function, and generating a data dimension about the data table.
  • the embodiment of the present application greatly reduces the time consumption of dimension statistics and analysis, and simplifies the processing steps by improving the data dimension generation method.
  • FIG. 1 is a schematic flowchart of a data dimension generation method provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a data dimension generation method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a data dimension generation method according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a data dimension generation method according to another embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a data dimension generating apparatus according to an embodiment of the present application.
  • FIG. 6 is another schematic block diagram of a data dimension generating apparatus according to an embodiment of the present application.
  • FIG. 7 is another schematic block diagram of a data dimension generating apparatus according to an embodiment of the present application.
  • FIG. 8 is another schematic block diagram of a data dimension generating apparatus according to an embodiment of the present application.
  • FIG. 9 is another schematic block diagram of a data dimension generating apparatus according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a data dimension generating device according to an embodiment of the present application.
  • the method can be run on terminals such as smart phones (for example, Android or iOS phones), tablets, laptops, and smart devices.
  • the data dimension generation method described in the embodiment of the present application can greatly reduce the time consumption of dimension statistics and analysis, and simplify the processing steps.
  • FIG. 1 is a schematic flowchart of a data dimension generation method provided by an embodiment of the present application. The method includes steps S101 to S104.
  • multi-dimensionality in a data table means observing the data from different dimensions to obtain different results, so that people can understand the essence of things more comprehensively and clearly; this is common in ordinary data tables.
  • Multi-dimensional analysis operations mainly include drilling (drill-up and drill-down), slicing, dicing, and rotation.
  • drilling changes the level of a dimension and thus the granularity of the analysis; it comprises drill-up and drill-down.
  • Drill-up (roll-up) summarizes low-level detail data into high-level summary data along a dimension, reducing the dimensionality of the analysis; drill-down is the opposite, refining high-level summary data into low-level detail data and increasing the dimensionality of the analysis.
  • a slice fixes a single value on one dimension of a multi-dimensional analysis, and the result is called a slice of the original analysis; a dice restricts several dimensions, each to a set of value ranges, and the result is called a dice of the original analysis; rotation means that, since the dimensions of a multi-dimensional analysis are displayed in a certain order, changing that order or direction (for example, swapping the positions of two dimensions) is called a rotation.
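  • The five operations above can be illustrated on a toy fact table. This is a minimal sketch, not the patent's implementation: the table FACTS, its dimensions (year, quarter, region), the measure "sales", and all function names are hypothetical. Roll-up drops a dimension from the grouping (coarser totals), drill-down adds one (finer detail), slice and dice filter rows, and pivot swaps the two dimensions of a two-dimensional result.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions (year, quarter, region), one measure (sales).
FACTS = [
    {"year": 2017, "quarter": "Q1", "region": "North", "sales": 120},
    {"year": 2017, "quarter": "Q2", "region": "North", "sales": 90},
    {"year": 2017, "quarter": "Q1", "region": "South", "sales": 60},
    {"year": 2018, "quarter": "Q1", "region": "North", "sales": 150},
    {"year": 2018, "quarter": "Q2", "region": "South", "sales": 80},
]

def aggregate(rows, dims, measure="sales"):
    """Sum `measure` grouped by `dims`; the list of dims sets the granularity."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def roll_up(rows, dims, drop):
    """Drill-up: remove the finer level `drop`, summarizing detail into totals."""
    return aggregate(rows, [d for d in dims if d != drop])

def drill_down(rows, dims, add):
    """Drill-down: add the finer level `add`, splitting totals into detail."""
    return aggregate(rows, dims + [add])

def slice_(rows, dim, value):
    """Slice: keep only rows where one dimension equals a single value."""
    return [r for r in rows if r[dim] == value]

def dice(rows, ranges):
    """Dice: restrict several dimensions, each to a set of allowed values."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in ranges.items())]

def pivot(cube):
    """Rotation: swap the positions of the two dimensions in a 2-D result."""
    return {(b, a): v for (a, b), v in cube.items()}
```

  • For example, rolling up the quarter level gives yearly totals of 270 (2017) and 230 (2018), while drilling down from year to quarter splits 2017 into 180 (Q1) and 90 (Q2).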
  • the table information includes a data dimension field and a data depth
  • the parsing the data table to obtain table information specifically includes the following steps: parsing the data table to obtain a data dimension field and a data depth.
  • the data tables in the SVN path corresponding to the job application are scanned to obtain the data table; there may be one or more data tables, and their number is not limited herein.
  • dimension data is saved in the data table, and the table is parsed by a preset parsing tool.
  • the preset parsing tool may be, for example, CapAnalysis (a visual data table parsing tool). Specifically, the parsing tool automatically scans the data table, acquires a keyword of the data table (for example, a primary key), and parses the data table using the obtained keyword, thereby obtaining the table information of the data table storing the dimensional data, such as the dimension fields and the data depth.
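  • The keyword-driven parsing step can be sketched with SQLite standing in for the data table. This is a hypothetical illustration: the source does not define "data depth" or how dimension fields are recognized, so here the primary key is read from SQLite's PRAGMA table_info, non-key text columns are assumed to be dimension fields, and "data depth" is assumed to mean the row count. The function name parse_table is invented for this sketch.

```python
import sqlite3

def parse_table(conn: sqlite3.Connection, table: str):
    """Return (primary_key, dimension_fields, data_depth) for `table` (hypothetical)."""
    # PRAGMA statements cannot take bound parameters, so `table` must be a trusted name.
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    key_columns = sorted((c for c in columns if c[5] > 0), key=lambda c: c[5])
    primary_key = [c[1] for c in key_columns]
    # Assumed heuristic: non-key text columns are treated as dimension fields.
    dimension_fields = [
        c[1] for c in columns
        if c[5] == 0 and any(t in c[2].upper() for t in ("TEXT", "CHAR"))
    ]
    # Assumed definition: "data depth" is the number of rows in the table.
    data_depth = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return primary_key, dimension_fields, data_depth
```

  • With a table sales(id INTEGER PRIMARY KEY, region TEXT, month VARCHAR(7), amount REAL) holding two rows, the sketch returns (["id"], ["region", "month"], 2); the numeric column amount is treated as a measure rather than a dimension.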
  • the parsed table information is analyzed.
  • the preset analysis algorithm used herein includes a shortest path algorithm and a minimum spanning tree algorithm. Among the paths that start at one vertex and follow the edges of a graph to another vertex, the path whose edge weights have the smallest sum is called the shortest path; a minimum spanning tree is the spanning tree, among all spanning trees of a connected network, whose edge costs have the smallest sum.
  • for example, when laying fiber-optic cables among n cities, the main goal is to enable communication between any two of the n cities; because laying cable is expensive and the cost differs between each pair of cities, another goal is to minimize the total cost of laying the cables.
  • an indicator is a specific dimension element that can be measured as a total or a ratio; it is also commonly called a measure. Examples include population, GDP, income, number of users, profit margin, retention rate, and coverage.
  • a KPI indicator system measures a company's business operations through several key indicators. Indicators must be aggregated (summed or averaged) under certain preconditions, such as time, place, and scope, which is what is usually called the statistical caliber and scope. Indicators can be divided into absolute-number indicators and relative-number indicators.
  • Absolute-number indicators reflect scale, such as population, GDP, income, and number of users, while relative-number indicators mainly reflect quality, such as profit margin, retention rate, and coverage.
  • data on its own can be deceptive.
  • for example, Company A's sales this month are 600W ("W" abbreviates 万, ten thousand, so 6,000,000),
  • while its two largest competitors, Company B and Company C, have sales of 300W and 250W respectively.
  • On the surface Company A's sales look fine, but Company A's sales last month were 800W, while those of the two competitors were 200W and 180W respectively.
  • the absolute figures look good, but in terms of comparison and ratios Company A has declined significantly. Therefore, data for analysis is best expressed as ratios.
  • Relative, comparative data is needed; one or a few isolated data points are meaningless. Points are connected into lines, and lines form surfaces for display. Finally, a series of dimensional analyses is performed on the indicators of the generated data dimensions, and the analysis results are generated.
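  • The Company A example can be made concrete by converting the absolute sales figures into two relative indicators. In this sketch the choice of indicators (month-over-month growth and share of the three-company total) is an illustrative assumption; the figures are the ones given above, in units of W.

```python
# Monthly sales from the example above, in units of W (10,000).
this_month = {"A": 600, "B": 300, "C": 250}
last_month = {"A": 800, "B": 200, "C": 180}

def growth(company):
    """Month-over-month change as a fraction of last month's sales."""
    return (this_month[company] - last_month[company]) / last_month[company]

def share(sales, company):
    """Company's share of the three-company total for one month."""
    return sales[company] / sum(sales.values())

for company in this_month:
    print(f"{company}: growth {growth(company):+.1%}, "
          f"share {share(last_month, company):.1%} -> {share(this_month, company):.1%}")
```

  • This prints "A: growth -25.0%, share 67.8% -> 52.2%", "B: growth +50.0%, share 16.9% -> 26.1%", and "C: growth +38.9%, share 15.3% -> 21.7%": Company A is still the largest in absolute terms, but the ratios expose the decline.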
  • the preset analysis algorithm includes a shortest path algorithm and a minimum spanning tree algorithm
  • the step S103 includes steps S201 to S202.
  • the table information is analyzed according to the shortest path algorithm to obtain single-source shortest paths with non-negative weights for the table information.
  • the shortest path algorithm finds, among the paths that start at one vertex and follow the edges of the graph to another vertex, the path with the smallest sum of edge weights.
  • when the table information is analyzed with the shortest path algorithm, it is abstracted into a path graph, and any point in the graph can be selected as the source vertex.
  • in this way, the single-source shortest paths with non-negative weights of the table information can be obtained.
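  • Single-source shortest paths with non-negative weights are classically computed with Dijkstra's algorithm; the source does not name a specific algorithm, so this is a sketch of that standard technique. The graph below is hypothetical: table names as vertices and made-up edge weights (for example, join costs), since the source does not say what the vertices and weights of the abstracted path graph represent.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest distances on a graph with non-negative edge weights.

    graph: dict mapping node -> list of (neighbor, weight) pairs.
    Returns dict node -> shortest distance from source (unreachable nodes omitted).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter path to `node` was already found
        for neighbor, weight in graph.get(node, []):
            if weight < 0:
                raise ValueError("Dijkstra's algorithm requires non-negative weights")
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical path graph abstracted from table information.
GRAPH = {
    "orders": [("customers", 2), ("products", 5)],
    "customers": [("regions", 1), ("products", 1)],
    "products": [("categories", 3)],
}
```

  • Starting from "orders", the direct edge to "products" costs 5, but the route through "customers" costs 2 + 1 = 3, so dijkstra(GRAPH, "orders") reports 3 for "products" and 6 for "categories".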
  • a minimum spanning tree is the spanning tree, among all spanning trees of a connected network, with the smallest total edge cost.
  • when the table information is analyzed with the minimum spanning tree algorithm, it is abstracted into a spanning tree, and any edge of the tree can be selected as the starting edge.
  • in this way, the multi-source, negative-weight shortest paths of the table information can be obtained. For example, in practice the minimum spanning tree algorithm can be used to lay optical cables among n cities: the main goal is to enable communication between any two of the n cities, but laying optical cable is expensive and the cost differs between pairs of cities, so another goal is to minimize the total cost of laying the cables.
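  • The cable-laying example can be sketched with Kruskal's algorithm, one standard way to build a minimum spanning tree; the source does not say which MST algorithm it uses, and the city labels and costs below are hypothetical. Note that an MST minimizes the total cost of connecting all vertices; in general it does not give the shortest path between a particular pair of vertices.

```python
def kruskal(n, edges):
    """Minimum spanning tree via Kruskal's algorithm with union-find.

    n: number of cities, labelled 0..n-1.
    edges: list of (cost, u, v) cable-laying costs between cities u and v.
    Returns (total_cost, chosen_edges) with chosen edges as (u, v, cost).
    """
    parent = list(range(n))

    def find(x):
        # Walk to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for cost, u, v in sorted(edges):  # cheapest cables first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the cable joins two unconnected groups: keep it
            parent[root_u] = root_v
            total += cost
            chosen.append((u, v, cost))
            if len(chosen) == n - 1:  # all cities connected
                break
    return total, chosen
```

  • For four hypothetical cities with costs (4, 0, 1), (1, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3), the sketch keeps the cables 0-2, 1-3, and 1-2 for a total cost of 6, which connects every pair of cities.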
  • the preset optimization function refers to optimizing the analysis result using dynamic programming, which has standard mathematical expressions and clear problem-solving methods.
  • dynamic programming is usually applied to optimization problems. Because problems differ in nature, the conditions that determine the optimal solution also differ; dynamic programming therefore solves different problems in different ways, and there is no universal method.
  • the dynamic programming algorithm can solve a wide range of optimization problems, but each specific problem must be analyzed on its own terms, building a model with imagination and solving it with creative techniques.
  • the specific optimization process may be: (1) determine the decision object of the problem; (2) divide the decision process into stages; (3) determine the state variable of each stage; (4) determine the cost function and the objective function according to the state variables; (5) establish the transfer process of the state variables between stages and determine the state transition equation.
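  • Steps (1) to (5) correspond to the standard backward recursion of dynamic programming. Written with X_k as the state and U_k as the decision at stage k, and with the transition T_k, stage cost v_k, allowed-decision set D_k, and optimal value f_k introduced here as illustrative symbols (they are not named in the source), the recursion over n stages is:

```latex
\begin{aligned}
X_{k+1} &= T_k(X_k, U_k), \\
f_k(X_k) &= \min_{U_k \in D_k(X_k)} \bigl[\, v_k(X_k, U_k) + f_{k+1}(X_{k+1}) \,\bigr], \qquad k = n, n-1, \dots, 1, \\
f_{n+1}(X_{n+1}) &= 0 .
\end{aligned}
```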
  • the step S104 includes steps S301 to S304.
  • the analysis result is divided into optimization stages.
  • the analysis result of the table information is taken as the decision object of the optimization, and optimization stages are divided for that object: the optimization process is divided into several interrelated stages in a preset order so that they can be optimized in sequence, for example the four stages between points A, B, C, D, and E in spatial order.
  • a decision must be made at each stage, and the decision depends on the situation of the system.
  • the state is the information necessary to describe the condition of the system.
  • the starting position of each stage is its state.
  • the state can be described by a variable, which is called the state variable.
  • the cost function and the objective function measure the quality of the optimization process.
  • making a decision U_k in state X_k of stage k not only transfers the system to a new state but also affects the cost function and the objective function.
  • the stage effect is the influence that executing a stage decision has on the cost function and the objective function.
  • once X_k and U_k are fixed, the state variable X_{k+1} of stage k+1 is determined accordingly; that is, X_{k+1} is a function of X_k and U_k, written X_{k+1} = T_k(X_k, U_k).
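  • The stage, state, decision, and transition concepts above can be sketched on the A-to-E example. This is a hypothetical illustration: the intermediate states (B1, B2, C1, C2, D1, D2) and the stage costs are invented. Each decision U_k picks the next state, so the transition is simply X_{k+1} = U_k, and the values are computed backward from E as in the standard dynamic programming recursion.

```python
# Hypothetical four-stage problem from A to E: STAGES[k] maps each state X_k to
# its allowed decisions U_k (the next state) and the stage cost of that decision.
STAGES = [
    {"A": {"B1": 2, "B2": 4}},
    {"B1": {"C1": 7, "C2": 3}, "B2": {"C1": 1, "C2": 4}},
    {"C1": {"D1": 1, "D2": 4}, "C2": {"D1": 6, "D2": 3}},
    {"D1": {"E": 3}, "D2": {"E": 4}},
]

def solve(stages, terminal="E"):
    """Backward recursion f_k(x) = min over u of [cost_k(x, u) + f_{k+1}(u)]."""
    f = {terminal: 0}  # no remaining cost once the terminal state is reached
    policy = {}
    for stage in reversed(stages):
        f_next, f = f, {}
        for state, decisions in stage.items():
            best_next, best_cost = min(
                ((nxt, cost + f_next[nxt]) for nxt, cost in decisions.items()),
                key=lambda pair: pair[1],
            )
            f[state] = best_cost
            policy[state] = best_next  # optimal decision, i.e. the next state
    return f, policy

def best_path(policy, start="A", terminal="E"):
    """Follow the optimal decisions from start to terminal."""
    path = [start]
    while path[-1] != terminal:
        path.append(policy[path[-1]])
    return path
```

  • Running solve(STAGES) gives an optimal total cost of 9 from A, and best_path follows A, B2, C1, D1, E (costs 4 + 1 + 1 + 3).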
  • the embodiment of the present application obtains the data table in which the dimension data is saved; parses the data table to obtain the table information; analyzes the table information according to the preset analysis algorithm; and optimizes the analysis result of the table information according to a preset optimization function to generate a data dimension about the data table.
  • the embodiment of the present application greatly reduces the time consumption of dimension statistics and analysis, and simplifies the processing steps by improving the data dimension generation method.
  • FIG. 4 is a schematic flowchart of a data dimension generation method according to an embodiment of the present application.
  • the method can be run on terminals such as smart phones (for example, Android or iOS phones), tablets, laptops, and smart devices. As shown in FIG. 4, the method includes steps S401 to S405.
  • the embodiment of the present application further provides a data dimension generation apparatus, and the apparatus 100 includes: an acquisition unit 101, a parsing unit 102, an analysis unit 103, and an optimization unit 104.
  • the obtaining unit 101 is configured to acquire a data table in which dimension data is stored.
  • the parsing unit 102 is configured to parse the data table to obtain table information.
  • the analyzing unit 103 is configured to analyze the table information according to a preset analysis algorithm.
  • the optimization unit 104 is configured to optimize the analysis result of the table information according to a preset optimization function, and generate a data dimension about the data table.
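  • The division of the apparatus into four units can be sketched as a pipeline of injected callables. This is a hypothetical structure for illustration only: the class name, field names, and the idea of passing each unit in as a function are assumptions, not the patent's design; the comments map each field to the unit numbers above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class DataDimensionApparatus:
    """Hypothetical composition of units 101-104 as pluggable callables."""
    obtain: Callable[[], List[Any]]    # acquisition unit 101: returns data tables
    parse: Callable[[Any], Any]        # parsing unit 102: table -> table information
    analyze: Callable[[Any], Any]      # analysis unit 103: table information -> analysis result
    optimize: Callable[[Any], Any]     # optimization unit 104: analysis result -> data dimension

    def run(self) -> List[Any]:
        """Apply parse, analyze, and optimize to every obtained table, in order."""
        dimensions = []
        for table in self.obtain():
            info = self.parse(table)
            result = self.analyze(info)
            dimensions.append(self.optimize(result))
        return dimensions
```

  • Keeping each unit as a separate callable mirrors the statement that units may be combined, divided, or deleted as needed: any stage can be swapped without touching the others.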
  • the embodiment of the present application obtains a data table in which dimension data is stored; parses the data table to obtain table information; analyzes the table information according to a preset analysis algorithm; and optimizes the analysis result of the table information according to a preset optimization function to generate a data dimension about the data table.
  • the table information includes a data dimension field and a data depth
  • the parsing unit 102 includes:
  • the parsing subunit 1021 is configured to parse the data table to obtain a data dimension field and a data depth.
  • the preset analysis algorithm includes a shortest path algorithm and a minimum spanning tree algorithm
  • the analyzing unit 103 includes:
  • the first analyzing sub-unit 1031 is configured to analyze the table information according to a shortest path algorithm to obtain single-source shortest paths with non-negative weights for the table information.
  • the second analysis sub-unit 1032 is configured to analyze the table information according to a minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
  • the optimization unit 104 includes:
  • the dividing unit 1041 is configured to divide the analysis result into optimization stages.
  • the first determining unit 1042 is configured to determine a state variable of each optimization stage.
  • the second determining unit 1043 is configured to determine the cost function and the objective function according to the state variable.
  • the establishing unit 1044 is configured to establish a transfer process of each stage state variable according to the cost function and the objective function, and determine a state transition equation to optimize the analysis result.
  • the embodiment of the present application further provides a data dimension generation apparatus, where the apparatus 200 includes: an acquisition unit 201, a parsing unit 202, an analysis unit 203, an optimization unit 204, and a sorting unit 205.
  • the obtaining unit 201 is configured to acquire a data table in which dimension data is stored.
  • the parsing unit 202 is configured to parse the data table to obtain table information.
  • the analyzing unit 203 is configured to analyze the table information according to a preset analysis algorithm.
  • the optimization unit 204 is configured to optimize the analysis result of the table information according to a preset optimization function, and generate a data dimension about the data table.
  • the sorting unit 205 is configured to sort the data dimensions of the data table by the magnitude of their correlation.
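  • The sorting step can be sketched as follows, under a loudly stated assumption: the source does not say which correlation is meant, so this hypothetical version ranks each dimension column by the absolute Pearson correlation between its values and a chosen target measure. The column names, data, and function names are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0  # constant column -> 0

def sort_dimensions(columns, target):
    """Dimension names ordered by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical dimension columns and target measure.
COLUMNS = {
    "ad_spend": [1, 2, 3, 4],
    "temperature": [3, 1, 4, 1],
    "discount": [8, 6, 5, 1],
}
TARGET = [10, 20, 30, 40]
```

  • Using the absolute value ranks a strongly negative relationship ("discount", about -0.96) above a weak one ("temperature", about -0.26), so sort_dimensions(COLUMNS, TARGET) places "ad_spend" first, then "discount", then "temperature".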
  • the above data dimension generating apparatus can be implemented in the form of a computer program that can be run on the device shown in FIG. 10.
  • FIG. 10 is a schematic structural diagram of a data dimension generating device according to the present application.
  • the device may be a terminal or a server, wherein the terminal may be a communication device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device.
  • the server can be a standalone server or a server cluster consisting of multiple servers.
  • the computer device 500 includes a processor 502, a non-volatile storage medium 503, an internal memory 504, and a network interface 505 connected by a system bus 501.
  • the non-volatile storage medium 503 of the computer device 500 can store an operating system 5031 and a computer program 5032.
  • when the computer program 5032 is executed, the processor 502 can be caused to execute a data dimension generation method.
  • the processor 502 of the computer device 500 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, which when executed by the processor, causes the processor 502 to perform a data dimension generation method.
  • the network interface 505 of the computer device 500 is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer device to which the solution of the present application is applied.
  • a specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the processor 502 performs the following operations:
  • obtaining a data table in which dimension data is saved;
  • parsing the data table to obtain table information;
  • analyzing the table information according to a preset analysis algorithm; and
  • optimizing the analysis result of the table information according to a preset optimization function, and generating a data dimension about the data table.
  • the table information includes a data dimension field and a data depth
  • the parsing the data table to obtain the table information includes:
  • the data table is parsed to obtain a data dimension field and a data depth.
  • the preset analysis algorithm includes a shortest path algorithm and a minimum spanning tree algorithm, and analyzing the table information according to the preset analysis algorithm includes:
  • analyzing the table information according to the shortest path algorithm to obtain single-source shortest paths with non-negative weights for the table information; and
  • analyzing the table information according to the minimum spanning tree algorithm to obtain multi-source, negative-weight shortest paths of the table information.
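  • optimizing the analysis result of the table information according to a preset optimization function includes:
  • dividing the analysis result into optimization stages;
  • determining the state variable of each optimization stage;
  • determining the cost function and the objective function according to the state variables; and
  • establishing the transfer process of the state variables of each stage according to the cost function and the objective function, and determining the state transition equation to optimize the analysis result.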
  • the processor 502 also performs the following operations:
  • the data dimensions of the data table are sorted according to the size of the correlation.
  • the embodiment of the data dimension generating device shown in FIG. 10 does not limit the specific configuration of the data dimension generating device.
  • in other embodiments, the data dimension generating device may include more or fewer components than illustrated, combine some components, or arrange the components differently.
  • in some embodiments, the data dimension generating device includes only the memory and the processor; in such embodiments, the structure and function of the memory and the processor are the same as in the embodiment shown in FIG. 10 and are not described again here.
  • the present application provides a computer readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the data dimension generation method described above.
  • the foregoing storage medium of the present application includes: a magnetic disk, an optical disk, a read-only memory (ROM), and the like, which can store various program codes.
  • the units in all the embodiments of the present application may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
  • the steps of the data dimension generation method in the embodiments of the present application may be reordered, merged, or deleted according to actual needs.
  • the units of the data dimension generation apparatus in the embodiments of the present application may be combined, divided, or deleted according to actual needs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

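The embodiments of the present application disclose a data dimension generation method, apparatus, device, and computer-readable storage medium. The method includes: obtaining a data table storing dimension data; parsing the data table to obtain table information; analyzing the table information according to a preset analysis algorithm; and optimizing an analysis result of the table information according to a preset optimization function and generating a data dimension of the data table. By improving the method of generating data dimensions, the embodiments of the present application substantially reduce the time spent on dimension statistics and analysis and simplify the processing steps.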

Description

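Data Dimension Generation Method, Apparatus, Device, and Computer-Readable Storage Medium
This application claims priority to Chinese Patent Application No. CN 201810135918.5, filed with the China Patent Office on February 9, 2018 and entitled "Data Dimension Generation Method, Apparatus, Device, and Computer-Readable Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data dimension generation method, apparatus, device, and computer-readable storage medium.
Background
In the prior art, there is no complete tool for analyzing data dimensions. A data dimension refers to data having a specific attribute, such as a time attribute, a geographic attribute, or a spatial attribute. Analyzing data dimensions requires treating the data as entities, and existing methods require writing complex code to analyze each dimension of the data. As a result, statistics and analysis of data dimensions take a long time to run and consume a great deal of manpower.
Summary
In view of this, the embodiments of the present application provide a data dimension generation method, apparatus, device, and computer-readable storage medium that can substantially reduce the time spent on dimension statistics and analysis and simplify the processing steps.
In one aspect, an embodiment of the present application provides a data dimension generation method, the method including:
obtaining a data table storing dimension data;
parsing the data table to obtain table information;
analyzing the table information according to a preset analysis algorithm; and
optimizing an analysis result of the table information according to a preset optimization function, and generating a data dimension of the data table.
In another aspect, an embodiment of the present application provides a data dimension generation apparatus, the apparatus including:
an obtaining unit, configured to obtain a data table storing dimension data;
a parsing unit, configured to parse the data table to obtain table information;
an analysis unit, configured to analyze the table information according to a preset analysis algorithm; and
an optimization unit, configured to optimize an analysis result of the table information according to a preset optimization function and generate a data dimension of the data table.
In yet another aspect, an embodiment of the present application further provides a data dimension generation device, including:
a memory, configured to store a program implementing the data dimension generation method; and
a processor, configured to run the program stored in the memory that implements the data dimension generation method, so as to perform the following operations:
obtaining a data table storing dimension data;
parsing the data table to obtain table information;
analyzing the table information according to a preset analysis algorithm; and
optimizing an analysis result of the table information according to a preset optimization function, and generating a data dimension of the data table.
In still another aspect, an embodiment of the present application further provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the following steps:
obtaining a data table storing dimension data;
parsing the data table to obtain table information;
analyzing the table information according to a preset analysis algorithm; and
optimizing an analysis result of the table information according to a preset optimization function, and generating a data dimension of the data table.
By improving the method of generating data dimensions, the embodiments of the present application substantially reduce the time spent on dimension statistics and analysis and simplify the processing steps.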
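Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data dimension generation method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data dimension generation method according to an embodiment of the present application (sub-steps of step S103);
FIG. 3 is a schematic flowchart of a data dimension generation method according to an embodiment of the present application (sub-steps of step S104);
FIG. 4 is a schematic flowchart of a data dimension generation method according to another embodiment of the present application;
FIG. 5 is a schematic block diagram of a data dimension generation apparatus according to an embodiment of the present application;
FIG. 6 is another schematic block diagram of a data dimension generation apparatus according to an embodiment of the present application;
FIG. 7 is another schematic block diagram of a data dimension generation apparatus according to an embodiment of the present application;
FIG. 8 is another schematic block diagram of a data dimension generation apparatus according to an embodiment of the present application;
FIG. 9 is another schematic block diagram of a data dimension generation apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data dimension generation device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.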
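Referring to FIG. 1, FIG. 1 is a schematic flowchart of a data dimension generation method according to an embodiment of the present application. The method may run on terminals such as smartphones (e.g., Android and iOS phones), tablet computers, notebook computers, and other smart devices. The data dimension generation method of this embodiment can substantially reduce the time spent on dimension statistics and analysis and simplify the processing steps. As shown in FIG. 1, the method includes steps S101 to S104.
S101: Obtain a data table storing dimension data.
In this embodiment, the multiple dimensions of a data table refer to observing the data from different dimensions to obtain different results, so that people can understand the essence of things more comprehensively and clearly. Common multidimensional analysis operations on a data table are drilling (roll-up and drill-down), slicing, dicing, and pivoting. Drilling changes the level of a dimension, that is, the granularity of the analysis, and comprises roll-up and drill-down. Roll-up summarizes low-level detail data into high-level aggregate data along a dimension, reducing the number of dimensions analyzed. Drill-down is the reverse: it refines high-level aggregate data down to low-level detail data, increasing the number of dimensions analyzed. Slicing: in multidimensional analysis, restricting one dimension to a single value is called a slice of the original analysis. Dicing: restricting several dimensions, each to a range of values, is called a dice of the original analysis. Pivoting: in multidimensional analysis, dimensions are displayed in a certain order, and changing the order and direction of the dimensions, or swapping the positions of two dimensions, is called pivoting (rotation).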
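The four operations can be illustrated on a tiny in-memory fact table. This is a sketch with hypothetical column names and values, not the application's implementation. It treats roll-up and drill-down as aggregation over a coarser or finer list of dimensions. Slicing fixes one dimension to a single value, dicing restricts several dimensions to sets of values, and pivoting swaps the axes of a two-dimensional aggregate.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Set, Tuple

Row = Dict[str, Any]

# Hypothetical fact table: dimensions year/month/region/product, measure amount.
SALES: List[Row] = [
    {"year": 2017, "month": 1, "region": "North", "product": "A", "amount": 10},
    {"year": 2017, "month": 1, "region": "South", "product": "B", "amount": 5},
    {"year": 2017, "month": 2, "region": "North", "product": "B", "amount": 7},
    {"year": 2018, "month": 1, "region": "South", "product": "A", "amount": 4},
]

def aggregate(rows: List[Row], dims: Sequence[str]) -> Dict[Tuple, int]:
    """Sum `amount` grouped by `dims`: fewer dims = roll-up, more dims = drill-down."""
    totals: Dict[Tuple, int] = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["amount"]
    return dict(totals)

def slice_rows(rows: List[Row], dim: str, value: Any) -> List[Row]:
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]

def dice_rows(rows: List[Row], allowed: Dict[str, Set[Any]]) -> List[Row]:
    """Dice: restrict several dimensions, each to a set of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

def pivot(table: Dict[Tuple, int]) -> Dict[Tuple, int]:
    """Pivot: swap the two dimensions of a 2-D aggregate keyed by (d1, d2)."""
    return {(b, a): v for (a, b), v in table.items()}

print(aggregate(SALES, ["year"]))           # {(2017,): 22, (2018,): 4}
print(aggregate(SALES, ["year", "month"]))  # {(2017, 1): 15, (2017, 2): 7, (2018, 1): 4}
```

Drill-down here is simply the same aggregation with an extra, finer dimension. A real OLAP engine would also maintain the dimension hierarchy (year > month) explicitly.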
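S102: Parse the data table to obtain table information.
In this embodiment, the table information includes a data dimension field and a data depth, and parsing the data table to obtain the table information specifically includes: parsing the data table to obtain the data dimension field and the data depth. Specifically, a script that automatically collects data tables is run to scan the data tables under the SVN path corresponding to the job application, thereby obtaining the data table. There may be one or more data tables, and their number is not limited here. Assuming the dimension data is stored in a data table, the data in that table is parsed with a preset parsing tool, for example CapAnalysis (a visual data table parsing tool). Specifically, the parsing tool first automatically scans the data table and obtains its key (for example, the primary key), and then uses the obtained key to parse the data table. This yields the table information of the data table storing the dimension data, such as the dimension fields and the data depth.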
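Below is a minimal sketch of the parsing step on a SQLite table. It uses only Python's standard sqlite3 module in place of the SVN-scanning script and the CapAnalysis tool named above. The source does not define two of the terms, so both interpretations here are assumptions. "Dimension fields" are taken to be the non-primary-key TEXT columns, and "data depth" is taken to be the number of distinct values in each such field. The table and its columns are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

def parse_table_info(
    conn: sqlite3.Connection, table: str
) -> Tuple[List[str], List[str], Dict[str, int]]:
    """Return (primary-key columns, dimension fields, data depth per field).

    Assumed definitions (not from the source): dimension fields are the
    non-key TEXT columns; data depth is the count of distinct values.
    The table name is interpolated into SQL, so only pass trusted names.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    keys = [c[1] for c in columns if c[5] > 0]
    dims = [c[1] for c in columns if c[5] == 0 and c[2].upper() == "TEXT"]
    depth = {
        d: conn.execute(f'SELECT COUNT(DISTINCT "{d}") FROM "{table}"').fetchone()[0]
        for d in dims
    }
    return keys, dims, depth

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, month, amount) VALUES (?, ?, ?)",
    [("North", "2018-01", 10.0), ("South", "2018-01", 5.0), ("North", "2018-02", 7.0)],
)
print(parse_table_info(conn, "sales"))
# (['id'], ['region', 'month'], {'region': 2, 'month': 2})
```

The primary key is read first, as the text describes, although this sketch only reports it and does not use it to drive the parse.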
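S103: Analyze the table information according to a preset analysis algorithm.
In this embodiment, the parsed table information is analyzed using a preset analysis algorithm, which includes a shortest path algorithm and a minimum spanning tree algorithm. Consider all paths that start from a vertex and travel along the edges of a graph to another vertex. The shortest path is the one among them whose edge weights have the smallest sum. A minimum spanning tree is, among all spanning trees of a connected network, the one with the smallest total edge cost. For example, suppose optical cable is to be laid among n cities. The primary goal is that any two of the n cities can communicate. Laying cable is expensive, however, and the cost differs between pairs of cities, so a second goal is to minimize the total cost of laying the cable.
It should be noted that analyzing the table information with the preset analysis algorithm makes it convenient and fast to generate the various metrics of the data dimensions. A metric is a concrete dimensional element that can be measured as a total or a ratio, and it is also commonly called a measure. Examples include population, GDP, revenue, number of users, profit margin, retention rate, and coverage rate. Many companies have their own KPI system, which uses a few key metrics to gauge how well the business is operating. Metrics are obtained by aggregate calculations such as summing and averaging. The aggregation must be performed under certain preconditions, such as time, place, and scope, commonly called the statistical definition and scope. Metrics can be divided into absolute metrics and relative metrics. Absolute metrics reflect scale, such as population, GDP, revenue, and number of users, while relative metrics mainly reflect quality, such as profit margin, retention rate, and coverage rate. To analyze how far something has developed, we can therefore look at both quantity and quality and obtain a comprehensive measure.
Data itself can be deceptive. Suppose, for example, that Company A's sales this month are 6 million, while its two largest competitors, Company B and Company C, have 3 million and 2.5 million respectively. On the surface, B and C combined do not even match A's sales. Last month, however, A's sales were 8 million, while the competitors had 2 million and 1.8 million. The current figures look impressive, but in comparative and ratio terms A has declined sharply. Data analysis should therefore preferably be expressed as ratios, using relative data that supports comparison, because one or a few isolated figures are meaningless. Points should be connected into lines, and lines into surfaces, for presentation. Finally, the metrics of the generated data dimensions are presented, and a series of analyses, such as checking the completeness of the dimension data, is carried out to produce the analysis results.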
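A few lines of arithmetic can check the comparison above. The sketch computes two relative metrics from the example's figures: month-over-month growth, and each company's share of the three companies' combined sales. The figures are in units of 10,000, the source's "W", so 600 means 6 million. The function names are illustrative.

```python
# Sales figures from the example, in units of 10,000 ("W"): 600 means 6 million.
this_month = {"A": 600, "B": 300, "C": 250}
last_month = {"A": 800, "B": 200, "C": 180}

def growth(now: float, before: float) -> float:
    """Month-over-month growth rate (a relative metric)."""
    return (now - before) / before

def share(sales: dict, company: str) -> float:
    """Share of `company` in the combined sales of the listed companies."""
    return sales[company] / sum(sales.values())

for c in this_month:
    print(f"{c}: growth {growth(this_month[c], last_month[c]):+.1%}, "
          f"share {share(last_month, c):.1%} -> {share(this_month, c):.1%}")
```

Company A's growth comes out at -25.0%, and its share of the three-company total drops from about 67.8% to 52.2%. Over the same month, B grows by 50.0% and C by about 38.9%. The largest absolute figure therefore hides a clear relative decline.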
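Further, as shown in FIG. 2, the preset analysis algorithm includes a shortest path algorithm and a minimum spanning tree algorithm, and step S103 includes steps S201 to S202.
S201: Analyze the table information according to the shortest path algorithm to obtain a single-source, non-negative-weight shortest path of the table information.
In this embodiment, the shortest path algorithm considers the paths that start from a vertex and travel along the edges of a graph to another vertex, and finds the one whose edge weights have the smallest sum. To analyze the table information this way, the table information is abstracted into a path graph, and any point in the graph can be chosen as the source vertex. Computing the shortest paths from that vertex to the destinations in the graph yields the single-source, non-negative-weight shortest paths of the table information.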
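Dijkstra's algorithm computes exactly this single-source shortest path with non-negative weights. The following minimal sketch assumes the table information has already been abstracted into a graph. In this hypothetical graph, vertices are table names and edge weights are invented join costs. The source does not say how the graph is built or which shortest path algorithm it uses.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]

def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest distance from `source` to every reachable vertex (weights must be >= 0)."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: u was already reached more cheaply
        for v, w in graph.get(u, []):
            if w < 0:
                raise ValueError("Dijkstra's algorithm requires non-negative weights")
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical join graph between tables, weighted by an invented join cost.
JOIN_GRAPH: Graph = {
    "orders": [("customers", 1.0), ("products", 4.0)],
    "customers": [("regions", 2.0), ("products", 2.0)],
    "products": [("categories", 1.0)],
}
print(dijkstra(JOIN_GRAPH, "orders"))
# {'orders': 0.0, 'customers': 1.0, 'products': 3.0, 'regions': 3.0, 'categories': 4.0}
```

The `w < 0` check makes the "no negative weights" precondition explicit. Dijkstra's greedy settling of vertices is not correct when negative edges are present.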
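S202: Analyze the table information according to the minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
A minimum spanning tree is, among all spanning trees of a connected network, the one with the smallest total edge cost. To analyze the table information with the minimum spanning tree algorithm, the table information is abstracted into a spanning tree, and any edge of the tree can be chosen as the starting edge. Computing the shortest path from the starting edge to a target edge yields the multi-source, negative-weight shortest paths of the table information. For example, in practice the minimum spanning tree algorithm applies to laying optical cable among n cities. The primary goal is that any two of the n cities can communicate, but laying cable is expensive and the cost differs between pairs of cities, so a second goal is to minimize the total cost of laying the cable.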
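The cable-laying example is the classic minimum spanning tree problem. The sketch below uses Kruskal's algorithm, one standard MST algorithm, since the source does not name the one it uses. The four cities and their costs are hypothetical. Edges are taken cheapest-first and kept only if they connect two previously separate groups of cities, which a union-find structure tracks. Note that an MST minimizes the total cost of the whole network, not the path length between any particular pair of cities. Kruskal's algorithm also works unchanged with negative edge costs.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, str, str]  # (cost, city_a, city_b)

def kruskal(cities: List[str], edges: List[Edge]) -> Tuple[float, List[Edge]]:
    """Return (total cost, chosen edges) of a minimum spanning tree."""
    parent: Dict[str, str] = {c: c for c in cities}

    def find(x: str) -> str:
        # Follow parent links to the group's representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0.0, []
    for cost, a, b in sorted(edges):      # cheapest edges first
        root_a, root_b = find(a), find(b)
        if root_a != root_b:              # joins two separate groups: keep it
            parent[root_a] = root_b
            tree.append((cost, a, b))
            total += cost
    if len(tree) != len(cities) - 1:
        raise ValueError("the cities cannot all be connected")
    return total, tree

# Four hypothetical cities and invented cable-laying costs.
CABLE_COSTS: List[Edge] = [(4, "A", "B"), (1, "A", "C"), (3, "B", "C"), (2, "B", "D"), (5, "C", "D")]
print(kruskal(["A", "B", "C", "D"], CABLE_COSTS))
# (6.0, [(1, 'A', 'C'), (2, 'B', 'D'), (3, 'B', 'C')])
```

Three cables (A-C, B-D, B-C) connect all four cities for a total cost of 6. Adding either remaining edge would only close a loop.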
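S104: Optimize the analysis result of the table information according to a preset optimization function, and generate a data dimension of the data table.
In this embodiment, the preset optimization function refers to optimizing the analysis result by dynamic programming. Search and numerical computation each have a standard mathematical formulation and a clear solution procedure, but dynamic programming does not; it usually targets an optimization problem. Because problems differ in nature, the conditions that determine the optimal solution also differ. Dynamic programming therefore solves each problem in its own way, and no universal dynamic programming algorithm can solve every optimization problem. Each problem must be analyzed on its own terms, with imagination used to build the model and creative techniques used to solve it. In practice, the optimization process may be: (1) determine the decision object of the problem; (2) divide the decision process into stages; (3) determine the state variables of each stage; (4) determine the cost function and the objective function from the state variables; and (5) establish the transition of the state variables between stages and determine the state transition equation.
Further, as shown in FIG. 3, step S104 includes steps S301 to S304.
S301: Divide the analysis result into optimization stages.
In this embodiment, the analysis result of the table information is taken as the decision object for optimization, and optimization stages are divided for this decision object. For example, the optimization process is divided, in a preset order, into several interrelated stages so that optimization can proceed in that order. In spatial order, for instance, the route A-B-C-D-E can be divided into four stages (A-B, B-C, C-D, D-E).
S302: Determine the state variables of each optimization stage.
In this embodiment, a multistage decision process requires a decision at every stage, and each decision depends on the current situation of the system. A state is the information needed to describe that situation. For example, in the four stages above, the starting point of each stage is its state. In general a state can be described by a variable, called the state variable. In practice, the state variable of stage k is denoted $X_k$, where $k = 1, 2, \ldots, n$.
S303: Determine a cost function and an objective function according to the state variables.
In this embodiment, the cost function and the objective function are the indicators used to measure how good the optimization process is. Executing a decision in state $X_k$ at stage k changes the system state and also necessarily affects the cost function and the objective function. The stage effect is this influence of the stage decision on the two functions. The cost function may be the sum of the per-stage functions, that is, $R = r_1(X_1, U_1) + r_2(X_2, U_2) + \cdots + r_n(X_n, U_n)$. The objective function may be their product, that is, $R = r_1(X_1, U_1) \cdot r_2(X_2, U_2) \cdots r_n(X_n, U_n)$. Here $U_k$ denotes the decision taken at stage k and $r_k$ is the corresponding stage effect.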
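For the additive form, the textbook dynamic-programming way to evaluate $R$ stage by stage is the backward (Bellman) recursion sketched below. It is a standard formulation rather than the applicant's stated procedure, and it assumes the goal is to minimize $R$ (use max instead of min to maximize). It also uses the state transition equation $X_{k+1} = T(X_k, U_k)$ from step S304. The symbol $f_k(X_k)$ is introduced here, not taken from the source, for the best achievable value of stages $k$ through $n$ starting from state $X_k$.

```latex
\begin{aligned}
f_{n+1}(X_{n+1}) &= 0, \\
f_k(X_k) &= \min_{U_k} \Bigl\{ r_k(X_k, U_k) + f_{k+1}\bigl(T(X_k, U_k)\bigr) \Bigr\},
  \qquad k = n, n-1, \ldots, 1, \\
\min R &= f_1(X_1).
\end{aligned}
```

For the product form, the same recursion applies with $f_{n+1} \equiv 1$ and $+$ replaced by $\cdot$, provided every $r_k \ge 0$ so that the product is monotone in $f_{k+1}$.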
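S304: Establish the transition of the state variables between stages according to the cost function and the objective function, and determine the state transition equation to optimize the analysis result.
In this embodiment, in a multistage decision process, once the state variable $X_k$ and the decision variable $U_k$ of stage k are given, the state variable $X_{k+1}$ of stage k+1 is also determined. That is, $X_{k+1}$ is a function of $X_k$ and $U_k$. This relationship can be written as $X_{k+1} = T(X_k, U_k)$ and is called the state transition equation.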
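Steps S301 to S304 can be sketched on the A-B-C-D-E stage example from S301. This is a minimal sketch under hypothetical assumptions. The intermediate states (B1, B2, ...) and the stage costs are invented for illustration. The decision $U_k$ is simply the choice of the next state, so $T(X_k, U_k) = U_k$, and the additive cost $R$ is minimized by working backwards from the last stage.

```python
from typing import Dict, List, Tuple

def solve_multistage(
    stages: List[List[str]], cost: Dict[Tuple[str, str], float]
) -> Tuple[float, List[str]]:
    """Minimize R = r_1 + ... + r_n over a staged graph by backward DP.

    stages[k] lists the possible states X_k; cost[(x, u)] is the stage effect
    r_k(x, u) of moving from state x to the next state u (so T(x, u) = u).
    State names are assumed to be unique across stages.
    """
    f = {x: 0.0 for x in stages[-1]}          # value 0 at the final stage
    best_next: Dict[str, str] = {}             # optimal decision U_k per state
    for k in range(len(stages) - 2, -1, -1):   # walk the stages backwards
        f_k: Dict[str, float] = {}
        for x in stages[k]:
            options = [
                (cost[(x, u)] + f[u], u)
                for u in stages[k + 1]
                if (x, u) in cost and u in f
            ]
            if options:
                f_k[x], best_next[x] = min(options)
        f = f_k
    start = stages[0][0]
    path = [start]
    while path[-1] in best_next:               # replay the optimal decisions
        path.append(best_next[path[-1]])
    return f[start], path

STAGES = [["A"], ["B1", "B2"], ["C1", "C2"], ["D1", "D2"], ["E"]]
COST = {
    ("A", "B1"): 2, ("A", "B2"): 4,
    ("B1", "C1"): 7, ("B1", "C2"): 3, ("B2", "C1"): 1, ("B2", "C2"): 4,
    ("C1", "D1"): 1, ("C1", "D2"): 4, ("C2", "D1"): 6, ("C2", "D2"): 3,
    ("D1", "E"): 3, ("D2", "E"): 4,
}
print(solve_multistage(STAGES, COST))  # (9.0, ['A', 'B2', 'C1', 'D1', 'E'])
```

Note that a greedy choice at the first stage (A to B1, cost 2) leads to a worse total of 12. The backward pass finds the cost-9 route because every state's value already accounts for all later stages.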
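As can be seen from the above, the embodiments of the present application obtain a data table storing dimension data. They parse the data table to obtain table information and analyze the table information according to a preset analysis algorithm. They then optimize the analysis result of the table information according to a preset optimization function and generate a data dimension of the data table. By improving the method of generating data dimensions, the embodiments of the present application substantially reduce the time spent on dimension statistics and analysis and simplify the processing steps.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a data dimension generation method according to another embodiment of the present application. The method may run on terminals such as smartphones (e.g., Android and iOS phones), tablet computers, notebook computers, and other smart devices. As shown in FIG. 4, the method includes steps S401 to S405.
S401: Obtain a data table storing dimension data.
In this embodiment, the multiple dimensions of a data table refer to observing the data from different dimensions to obtain different results, so that people can understand the essence of things more comprehensively and clearly. Common multidimensional analysis operations on a data table are drilling (roll-up and drill-down), slicing, dicing, and pivoting. Drilling changes the level of a dimension, that is, the granularity of the analysis, and comprises roll-up and drill-down. Roll-up summarizes low-level detail data into high-level aggregate data along a dimension, reducing the number of dimensions analyzed. Drill-down is the reverse: it refines high-level aggregate data down to low-level detail data, increasing the number of dimensions analyzed. Slicing: in multidimensional analysis, restricting one dimension to a single value is called a slice of the original analysis. Dicing: restricting several dimensions, each to a range of values, is called a dice of the original analysis. Pivoting: in multidimensional analysis, dimensions are displayed in a certain order, and changing the order and direction of the dimensions, or swapping the positions of two dimensions, is called pivoting (rotation).
S402: Parse the data table to obtain table information.
In this embodiment, the table information includes a data dimension field and a data depth, and parsing the data table to obtain the table information specifically includes: parsing the data table to obtain the data dimension field and the data depth. Specifically, a script that automatically collects data tables is run to scan the data tables under the SVN path corresponding to the job application, thereby obtaining the data table. There may be one or more data tables, and their number is not limited here. Assuming the dimension data is stored in a data table, the data in that table is parsed with a preset parsing tool, for example CapAnalysis (a visual data table parsing tool). Specifically, the parsing tool first automatically scans the data table and obtains its key (for example, the primary key), and then uses the obtained key to parse the data table. This yields the table information of the data table storing the dimension data, such as the dimension fields and the data depth.
S403: Analyze the table information according to a preset analysis algorithm.
In this embodiment, the parsed table information is analyzed using a preset analysis algorithm, which includes a shortest path algorithm and a minimum spanning tree algorithm. Consider all paths that start from a vertex and travel along the edges of a graph to another vertex. The shortest path is the one among them whose edge weights have the smallest sum. A minimum spanning tree is, among all spanning trees of a connected network, the one with the smallest total edge cost. For example, suppose optical cable is to be laid among n cities. The primary goal is that any two of the n cities can communicate. Laying cable is expensive, however, and the cost differs between pairs of cities, so a second goal is to minimize the total cost of laying the cable.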
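The analysis step calls for shortest paths that are multi-source and that tolerate negative weights. A minimum spanning tree does not by itself give shortest paths between pairs of vertices. The textbook algorithm with this multi-source, negative-weight property is Floyd-Warshall, sketched below as a swapped-in illustration rather than the applicant's stated method. The node names and weights in the example are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

def floyd_warshall(
    nodes: List[str], edges: Iterable[Tuple[str, str, float]]
) -> Dict[str, Dict[str, float]]:
    """All-pairs shortest distances; negative edges are allowed, negative cycles are not."""
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)   # keep the cheapest parallel edge
    for k in nodes:                       # allow k as an intermediate vertex
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    if any(dist[v][v] < 0 for v in nodes):
        raise ValueError("graph contains a negative-weight cycle")
    return dist

d = floyd_warshall(["a", "b", "c"], [("a", "b", 4), ("a", "c", 2), ("c", "b", -1)])
print(d["a"]["b"])  # the detour a -> c -> b costs 2 + (-1) = 1, cheaper than the direct 4
```

The triple loop runs in O(n^3) time, which is acceptable for the modest number of tables or fields a schema graph usually has.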
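It should be noted that analyzing the table information with the preset analysis algorithm makes it convenient and fast to generate the various metrics of the data dimensions. A metric is a concrete dimensional element that can be measured as a total or a ratio, and it is also commonly called a measure. Examples include population, GDP, revenue, number of users, profit margin, retention rate, and coverage rate. Many companies have their own KPI system, which uses a few key metrics to gauge how well the business is operating. Metrics are obtained by aggregate calculations such as summing and averaging. The aggregation must be performed under certain preconditions, such as time, place, and scope, commonly called the statistical definition and scope. Metrics can be divided into absolute metrics and relative metrics. Absolute metrics reflect scale, such as population, GDP, revenue, and number of users, while relative metrics mainly reflect quality, such as profit margin, retention rate, and coverage rate. To analyze how far something has developed, we can therefore look at both quantity and quality and obtain a comprehensive measure.
Data itself can be deceptive. Suppose, for example, that Company A's sales this month are 6 million, while its two largest competitors, Company B and Company C, have 3 million and 2.5 million respectively. On the surface, B and C combined do not even match A's sales. Last month, however, A's sales were 8 million, while the competitors had 2 million and 1.8 million. The current figures look impressive, but in comparative and ratio terms A has declined sharply. Data analysis should therefore preferably be expressed as ratios, using relative data that supports comparison, because one or a few isolated figures are meaningless. Points should be connected into lines, and lines into surfaces, for presentation. Finally, the metrics of the generated data dimensions are presented, and a series of analyses, such as checking the completeness of the dimension data, is carried out to produce the analysis results.
S404: Optimize the analysis result of the table information according to a preset optimization function, and generate a data dimension of the data table.
In this embodiment, the preset optimization function refers to optimizing the analysis result by dynamic programming. Search and numerical computation each have a standard mathematical formulation and a clear solution procedure, but dynamic programming does not; it usually targets an optimization problem. Because problems differ in nature, the conditions that determine the optimal solution also differ. Dynamic programming therefore solves each problem in its own way, and no universal dynamic programming algorithm can solve every optimization problem. Each problem must be analyzed on its own terms, with imagination used to build the model and creative techniques used to solve it. In practice, the optimization process may be: (1) determine the decision object of the problem; (2) divide the decision process into stages; (3) determine the state variables of each stage; (4) determine the cost function and the objective function from the state variables; and (5) establish the transition of the state variables between stages and determine the state transition equation.
S405: Sort the data dimensions of the data table by the magnitude of their correlation.
In this embodiment, building on the dimension correlation analysis, the dimensions are sorted by the magnitude of their correlation so that adjacent dimensions are strongly correlated. First, one dimension is selected and placed at the first position of the sequence. Next, the dimension most correlated with it is placed after it, and so on, until all dimensions have been arranged.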
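The greedy ordering described above can be sketched as follows. The source does not say how correlation is measured or how the first dimension is chosen. This sketch therefore assumes the absolute Pearson correlation between numeric columns and takes the first dimension as an input. Each subsequent dimension is the remaining one most correlated with the dimension placed just before it. Column names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return cov / (norm_a * norm_b)

def order_by_correlation(columns: Dict[str, List[float]], first: str) -> List[str]:
    """Greedy chain: each next dimension is the one most correlated with the last placed."""
    order = [first]
    remaining = set(columns) - {first}
    while remaining:
        last = columns[order[-1]]
        # sorted() makes ties resolve alphabetically, so the result is deterministic.
        nxt = max(sorted(remaining), key=lambda name: abs(pearson(last, columns[name])))
        order.append(nxt)
        remaining.remove(nxt)
    return order

COLUMNS = {
    "price":   [1, 2, 3, 4, 5],
    "revenue": [2, 4, 5, 8, 10],
    "cost":    [1, 3, 2, 5, 4],
    "visits":  [3, 1, 4, 1, 5],
}
print(order_by_correlation(COLUMNS, "price"))  # ['price', 'revenue', 'cost', 'visits']
```

Because the chain is greedy, it guarantees strong correlation only between each dimension and its predecessor, not a globally optimal ordering.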
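Referring to FIG. 5, corresponding to the data dimension generation method described above, an embodiment of the present application further provides a data dimension generation apparatus 100. The apparatus 100 includes an obtaining unit 101, a parsing unit 102, an analysis unit 103, and an optimization unit 104.
The obtaining unit 101 is configured to obtain a data table storing dimension data.
The parsing unit 102 is configured to parse the data table to obtain table information.
The analysis unit 103 is configured to analyze the table information according to a preset analysis algorithm.
The optimization unit 104 is configured to optimize an analysis result of the table information according to a preset optimization function and generate a data dimension of the data table.
As can be seen from the above, the embodiments of the present application obtain a data table storing dimension data. They parse the data table to obtain table information and analyze the table information according to a preset analysis algorithm. They then optimize the analysis result of the table information according to a preset optimization function and generate a data dimension of the data table. By improving the method of generating data dimensions, the embodiments of the present application substantially reduce the time spent on dimension statistics and analysis and simplify the processing steps.
As shown in FIG. 6, the table information includes a data dimension field and a data depth, and the parsing unit 102 includes:
a parsing subunit 1021, configured to parse the data table to obtain the data dimension field and the data depth.
As shown in FIG. 7, the preset analysis algorithm includes a shortest path algorithm and a minimum spanning tree algorithm, and the analysis unit 103 includes:
a first analysis subunit 1031, configured to analyze the table information according to the shortest path algorithm to obtain a single-source, non-negative-weight shortest path of the table information; and
a second analysis subunit 1032, configured to analyze the table information according to the minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
As shown in FIG. 8, the optimization unit 104 includes:
a division unit 1041, configured to divide the analysis result into optimization stages;
a first determining unit 1042, configured to determine the state variables of each optimization stage;
a second determining unit 1043, configured to determine a cost function and an objective function according to the state variables; and
an establishing unit 1044, configured to establish the transition of the state variables between stages according to the cost function and the objective function, and determine the state transition equation to optimize the analysis result.
As can be seen from the above, the embodiments of the present application obtain a data table storing dimension data. They parse the data table to obtain table information and analyze the table information according to a preset analysis algorithm. They then optimize the analysis result of the table information according to a preset optimization function and generate a data dimension of the data table. By improving the method of generating data dimensions, the embodiments of the present application substantially reduce the time spent on dimension statistics and analysis and simplify the processing steps.
Referring to FIG. 9, corresponding to the data dimension generation method described above, an embodiment of the present application further provides a data dimension generation apparatus 200. The apparatus 200 includes an obtaining unit 201, a parsing unit 202, an analysis unit 203, an optimization unit 204, and a sorting unit 205.
The obtaining unit 201 is configured to obtain a data table storing dimension data.
The parsing unit 202 is configured to parse the data table to obtain table information.
The analysis unit 203 is configured to analyze the table information according to a preset analysis algorithm.
The optimization unit 204 is configured to optimize an analysis result of the table information according to a preset optimization function and generate a data dimension of the data table.
The sorting unit 205 is configured to sort the data dimensions of the data table by the magnitude of their correlation.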
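The unit structure of apparatus 200 can be sketched as a pipeline of pluggable callables. The class and field names are hypothetical, and each field stands in for one unit. The toy lambdas in the usage example are placeholders for the real parsing, analysis, optimization, and sorting logic, not the application's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class DataDimensionGenerator:
    """Hypothetical composition of units 201-205; each unit is a plain callable."""
    obtain: Callable[[], Any]                # obtaining unit 201
    parse: Callable[[Any], Any]              # parsing unit 202
    analyze: Callable[[Any], Any]            # analysis unit 203
    optimize: Callable[[Any], List[str]]     # optimization unit 204
    sort: Callable[[List[str]], List[str]]   # sorting unit 205

    def run(self) -> List[str]:
        table = self.obtain()
        info = self.parse(table)
        result = self.analyze(info)
        dimensions = self.optimize(result)
        return self.sort(dimensions)

# Placeholder units: "depth" = distinct values per column, deepest dimension first.
demo = DataDimensionGenerator(
    obtain=lambda: {"region": ["N", "S", "N"], "month": ["01", "02", "03"]},
    parse=lambda table: {name: len(set(vals)) for name, vals in table.items()},
    analyze=lambda info: sorted(info.items(), key=lambda kv: -kv[1]),
    optimize=lambda ranked: [name for name, _ in ranked],
    sort=lambda dims: dims,  # identity placeholder for the correlation-based sort
)
print(demo.run())  # ['month', 'region']
```

Keeping each unit as an injected callable mirrors the statement that units may be combined, divided, or deleted as needed. Any stage can be swapped without touching the others.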
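The data dimension generation apparatus described above may be implemented in the form of a computer program, which can run on the device shown in FIG. 10.
FIG. 10 is a schematic structural diagram of a data dimension generation device of the present application. The device may be a terminal or a server. The terminal may be an electronic device with a communication function, such as a smartphone, tablet computer, notebook computer, desktop computer, personal digital assistant, or wearable device. The server may be a standalone server or a server cluster composed of multiple servers. Referring to FIG. 10, the computer device 500 includes a processor 502, a non-volatile storage medium 503, an internal memory 504, and a network interface 505 connected through a system bus 501. The non-volatile storage medium 503 of the computer device 500 can store an operating system 5031 and a computer program 5032; when the computer program 5032 is executed, it causes the processor 502 to perform a data dimension generation method. The processor 502 of the computer device 500 provides computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when executed by the processor, the computer program causes the processor 502 to perform a data dimension generation method. The network interface 505 of the computer device 500 is used for network communication, such as sending assigned tasks. A person skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of the part of the structure related to the solution of the present application. It does not limit the computer device to which the solution is applied, and a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 performs the following operations:
obtaining a data table storing dimension data;
parsing the data table to obtain table information;
analyzing the table information according to a preset analysis algorithm; and
optimizing an analysis result of the table information according to a preset optimization function, and generating a data dimension of the data table.
In one embodiment, the table information includes a data dimension field and a data depth, and parsing the data table to obtain the table information includes:
parsing the data table to obtain the data dimension field and the data depth.
In one embodiment, the preset analysis algorithm includes a shortest path algorithm and a minimum spanning tree algorithm, and analyzing the table information according to the preset analysis algorithm includes:
analyzing the table information according to the shortest path algorithm to obtain a single-source, non-negative-weight shortest path of the table information; and
analyzing the table information according to the minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
In one embodiment, optimizing the analysis result of the table information according to the preset optimization function includes:
dividing the analysis result into optimization stages;
determining the state variables of each optimization stage;
determining a cost function and an objective function according to the state variables; and
establishing the transition of the state variables between stages according to the cost function and the objective function, and determining the state transition equation to optimize the analysis result.
In one embodiment, the processor 502 further performs the following operation:
sorting the data dimensions of the data table by the magnitude of their correlation.
A person skilled in the art can understand that the embodiment of the data dimension generation device shown in FIG. 10 does not limit the specific configuration of the data dimension generation device. In other embodiments, the device may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components. For example, in some embodiments the device includes only a memory and a processor, and in such embodiments the structure and function of the memory and the processor are the same as in the embodiment shown in FIG. 10. Details are not repeated here.
The present application provides a computer-readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the data dimension generation method described above.
The foregoing storage medium of the present application includes various media that can store program code, such as a magnetic disk, an optical disk, and a read-only memory (ROM).
The units in all embodiments of the present application may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
The steps of the data dimension generation method in the embodiments of the present application may be reordered, merged, or deleted according to actual needs.
The units of the data dimension generation apparatus in the embodiments of the present application may be combined, divided, or deleted according to actual needs.
The foregoing descriptions are merely specific implementations of the present application, and the protection scope of the present application is not limited thereto. Any equivalent modification or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.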

Claims (20)

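  1. A data dimension generation method, wherein the method comprises:
    obtaining a data table storing dimension data;
    parsing the data table to obtain table information;
    analyzing the table information according to a preset analysis algorithm; and
    optimizing an analysis result of the table information according to a preset optimization function, and generating a data dimension of the data table.
  2. The method according to claim 1, wherein the table information comprises a data dimension field and a data depth, and parsing the data table to obtain the table information comprises:
    parsing the data table to obtain the data dimension field and the data depth.
  3. The method according to claim 1, wherein the preset analysis algorithm comprises a shortest path algorithm and a minimum spanning tree algorithm, and analyzing the table information according to the preset analysis algorithm comprises:
    analyzing the table information according to the shortest path algorithm to obtain a single-source, non-negative-weight shortest path of the table information; and
    analyzing the table information according to the minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
  4. The method according to claim 1, wherein optimizing the analysis result of the table information according to the preset optimization function comprises:
    dividing the analysis result into optimization stages;
    determining state variables of each optimization stage;
    determining a cost function and an objective function according to the state variables; and
    establishing a transition process of the state variables of each stage according to the cost function and the objective function, and determining a state transition equation to optimize the analysis result.
  5. The method according to claim 1, wherein the method further comprises:
    sorting the data dimensions of the data table by the magnitude of correlation.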
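  6. A data dimension generation apparatus, wherein the apparatus comprises:
    an obtaining unit, configured to obtain a data table storing dimension data;
    a parsing unit, configured to parse the data table to obtain table information;
    an analysis unit, configured to analyze the table information according to a preset analysis algorithm; and
    an optimization unit, configured to optimize an analysis result of the table information according to a preset optimization function and generate a data dimension of the data table.
  7. The apparatus according to claim 6, wherein the table information comprises a data dimension field and a data depth, and the parsing unit comprises:
    a parsing subunit, configured to parse the data table to obtain the data dimension field and the data depth.
  8. The apparatus according to claim 6, wherein the preset analysis algorithm comprises a shortest path algorithm and a minimum spanning tree algorithm, and the analysis unit comprises:
    a first analysis subunit, configured to analyze the table information according to the shortest path algorithm to obtain a single-source, non-negative-weight shortest path of the table information; and
    a second analysis subunit, configured to analyze the table information according to the minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
  9. The apparatus according to claim 6, wherein the optimization unit comprises:
    a division unit, configured to divide the analysis result into optimization stages;
    a first determining unit, configured to determine state variables of each optimization stage;
    a second determining unit, configured to determine a cost function and an objective function according to the state variables; and
    an establishing unit, configured to establish a transition process of the state variables of each stage according to the cost function and the objective function, and determine a state transition equation to optimize the analysis result.
  10. The apparatus according to claim 6, wherein the apparatus further comprises:
    a sorting unit, configured to sort the data dimensions of the data table by the magnitude of correlation.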
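  11. A data dimension generation device, comprising:
    a memory, configured to store a program implementing a data dimension generation method; and
    a processor, configured to run the program stored in the memory that implements the data dimension generation method, so as to perform the following operations:
    obtaining a data table storing dimension data;
    parsing the data table to obtain table information;
    analyzing the table information according to a preset analysis algorithm; and
    optimizing an analysis result of the table information according to a preset optimization function, and generating a data dimension of the data table.
  12. The device according to claim 11, wherein the table information comprises a data dimension field and a data depth, and parsing the data table to obtain the table information comprises:
    parsing the data table to obtain the data dimension field and the data depth.
  13. The device according to claim 11, wherein the preset analysis algorithm comprises a shortest path algorithm and a minimum spanning tree algorithm, and analyzing the table information according to the preset analysis algorithm comprises:
    analyzing the table information according to the shortest path algorithm to obtain a single-source, non-negative-weight shortest path of the table information; and
    analyzing the table information according to the minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
  14. The device according to claim 11, wherein optimizing the analysis result of the table information according to the preset optimization function comprises:
    dividing the analysis result into optimization stages;
    determining state variables of each optimization stage;
    determining a cost function and an objective function according to the state variables; and
    establishing a transition process of the state variables of each stage according to the cost function and the objective function, and determining a state transition equation to optimize the analysis result.
  15. The device according to claim 11, wherein the processor further performs the following operation:
    sorting the data dimensions of the data table by the magnitude of correlation.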
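  16. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more computer programs, the one or more computer programs being executable by one or more processors to implement the following steps:
    obtaining a data table storing dimension data;
    parsing the data table to obtain table information;
    analyzing the table information according to a preset analysis algorithm; and
    optimizing an analysis result of the table information according to a preset optimization function, and generating a data dimension of the data table.
  17. The computer-readable storage medium according to claim 16, wherein the table information comprises a data dimension field and a data depth, and parsing the data table to obtain the table information comprises:
    parsing the data table to obtain the data dimension field and the data depth.
  18. The computer-readable storage medium according to claim 16, wherein the preset analysis algorithm comprises a shortest path algorithm and a minimum spanning tree algorithm, and analyzing the table information according to the preset analysis algorithm comprises:
    analyzing the table information according to the shortest path algorithm to obtain a single-source, non-negative-weight shortest path of the table information; and
    analyzing the table information according to the minimum spanning tree algorithm to obtain a multi-source, negative-weight shortest path of the table information.
  19. The computer-readable storage medium according to claim 16, wherein optimizing the analysis result of the table information according to the preset optimization function comprises:
    dividing the analysis result into optimization stages;
    determining state variables of each optimization stage;
    determining a cost function and an objective function according to the state variables; and
    establishing a transition process of the state variables of each stage according to the cost function and the objective function, and determining a state transition equation to optimize the analysis result.
  20. The computer-readable storage medium according to claim 16, wherein the steps further comprise:
    sorting the data dimensions of the data table by the magnitude of correlation.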
PCT/CN2018/085258 2018-02-09 2018-05-02 Data dimension generation method, apparatus, device, and computer-readable storage medium WO2019153543A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810135918.5A CN108415981B (zh) 2018-02-09 2018-02-09 Data dimension generation method, apparatus, device, and computer-readable storage medium
CN201810135918.5 2018-02-09

Publications (1)

Publication Number Publication Date
WO2019153543A1 (zh) 2019-08-15

Family

ID=63128191

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/085258 WO2019153543A1 (zh) 2018-02-09 2018-05-02 Data dimension generation method, apparatus, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108415981B (zh)
WO (1) WO2019153543A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685674A (zh) * 2020-12-30 2021-04-20 百果园技术(新加坡)有限公司 Method and apparatus for evaluating features that affect user retention

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102135994A (zh) * 2011-03-17 2011-07-27 新太科技股份有限公司 OLAP-based intelligent analysis method
CN106997386A (zh) * 2017-03-28 2017-08-01 上海跬智信息技术有限公司 OLAP pre-computation model, automatic modeling method, and automatic modeling system
US20170286502A1 (en) * 2015-12-22 2017-10-05 Opera Solutions Usa, Llc System and Method for Interactive Reporting in Computerized Data Modeling and Analysis
CN107657010A (zh) * 2017-09-25 2018-02-02 南京市城市与交通规划设计研究院股份有限公司 Motor vehicle data analysis system and method

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8566360B2 (en) * 2010-05-28 2013-10-22 Drexel University System and method for automatically generating systematic reviews of a scientific field
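CN104282026B (zh) * 2014-10-24 2017-06-13 上海交通大学 Distribution uniformity evaluation method based on the watershed algorithm and a minimum spanning tree
CN104573039A (zh) * 2015-01-19 2015-04-29 北京航天福道高技术股份有限公司 Keyword query method for a relational database
CN106548206B (zh) * 2016-10-27 2019-08-02 太原理工大学 Multimodal magnetic resonance image data classification method based on a minimum spanning tree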

Also Published As

Publication number Publication date
CN108415981B (zh) 2020-10-09
CN108415981A (zh) 2018-08-17

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.11.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18905837

Country of ref document: EP

Kind code of ref document: A1