CN114969161A - Data processing method and device and data center system - Google Patents

Info

Publication number
CN114969161A
Authority
CN
China
Prior art keywords
data
service
type
business
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210719522.1A
Other languages
Chinese (zh)
Other versions
CN114969161B (en)
Inventor
李翛然
于琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210719522.1A (CN114969161B)
Publication of CN114969161A
Application granted
Publication of CN114969161B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a data center system, a data processing method, and a data processing device, and relates to the technical field of big data. A specific implementation of the data center system is as follows: receiving at least one preset type of business data related to at least one industry; in response to a preset type of business data meeting its corresponding specification standard, obtaining standard data corresponding to that preset type of business data, and fusing the standard data corresponding to the at least one preset type of business data to obtain an access data packet; storing the access data packet in a pre-constructed data warehouse, processing the access data packet to obtain processed data, and performing calculation and statistics on the processed data to obtain at least one business index of the industry; and presenting different application strategies for different business objects based on the processed data and the business indexes. The embodiment breaks down data islands and improves the efficiency and accuracy of data output.

Description

Data processing method and device and data center system
Technical Field
The present disclosure relates to the field of data processing technologies, in particular to big data and related technologies, and more particularly to a data processing method and apparatus, a data center system, an electronic device, a computer-readable storage medium, and a computer program product.
Background
The data center system is a concept: business is turned into data and data is turned into business, so that data and business are genuinely connected. The data center system links the data foreground and the data background and breaks through data limitations. It provides enterprises with more flexible, efficient, and low-cost data analysis and mining services, and spares them the large, costly, and repetitive data development effort otherwise needed to meet the data analysis requirements of one specific department.
Existing data center systems generally interact with upstream data from multiple businesses, extract the general attributes shared across industries in order to extract, store, manage, and apply the data, and output general-purpose indexes to downstream consumers.
Disclosure of Invention
The present disclosure provides a data center system, a data processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to a first aspect, there is provided a data processing method, the method comprising: receiving at least one preset type of business data related to at least one industry; in response to a preset type of business data meeting its corresponding specification standard, obtaining standard data corresponding to that preset type of business data, and fusing the standard data corresponding to the at least one preset type of business data to obtain an access data packet; storing the access data packet in a pre-constructed data warehouse, processing the access data packet to obtain processed data, and performing calculation and statistics on the processed data to obtain at least one business index of the industry; and presenting different application strategies for different business objects based on the processed data and the business indexes.
According to a second aspect, there is provided a data processing apparatus comprising: a receiving unit configured to receive at least one preset type of business data related to at least one industry; an association unit configured to, in response to a preset type of business data meeting its corresponding specification standard, obtain standard data corresponding to that preset type of business data, and fuse the standard data corresponding to the at least one preset type of business data to obtain an access data packet; a calculation unit configured to store the access data packet in a pre-constructed data warehouse, process the access data packet to obtain processed data, and perform calculation and statistics on the processed data to obtain at least one business index of the industry; and a presentation unit configured to present different application strategies for different business objects based on the processed data and the business indexes.
According to a third aspect, there is provided a data center system comprising: a data access module that receives at least one preset type of business data related to at least one industry, obtains, in response to a preset type of business data meeting its corresponding specification standard, standard data corresponding to that preset type of business data, and fuses the standard data corresponding to the at least one preset type of business data to obtain an access data packet; a data management module that stores the access data packet in a pre-constructed data warehouse, processes the access data packet to obtain processed data, and performs calculation and statistics on the processed data to obtain at least one business index of the industry; and a data application module that presents different application strategies for different business objects based on the processed data and the business indexes.
According to a fourth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any one of the implementations of the first aspect.
According to a sixth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
The data processing method provided by the embodiments of the present disclosure first receives at least one preset type of business data related to at least one industry; second, in response to a preset type of business data meeting its corresponding specification standard, obtains standard data corresponding to that preset type of business data and fuses the standard data corresponding to the at least one preset type of business data to obtain an access data packet; third, stores the access data packet in a pre-constructed data warehouse, processes the access data packet to obtain processed data, and performs calculation and statistics on the processed data to obtain at least one business index of the industry; and finally, presents different application strategies for different business objects based on the processed data and the business indexes. In this way, the business indexes generated per industry by the data warehouse preserve the unique characteristics of the different businesses in each industry; accepting multiple types of business data allows the infrastructure of a business data circulation system to be built quickly and accurately, so that data can be reused conveniently and at low cost across multiple industry fields; and presenting different application strategies for different business objects provides reliable guidance for product positioning and development, and more data and manpower support for mining the deep value of data.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of one embodiment of a data processing method according to the present disclosure;
FIG. 2 is a schematic block diagram of one embodiment of a data processing apparatus according to the present disclosure;
FIG. 3 is a block diagram illustrating one embodiment of a data center system according to the present disclosure;
FIG. 4 is a schematic block diagram of one embodiment of a data management module according to the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing a data processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 shows a flow 100 of one embodiment of a data processing method according to the present disclosure, the data processing method comprising the steps of:
Step 101, receiving at least one preset type of business data related to at least one industry.
In this embodiment, the business data is data related to the businesses of industries. The at least one industry may be one industry or multiple industries; for example, the industry may be hotels, movie ticketing, express delivery, education and training, on-site services, lifestyle services, and the like. Businesses are the different services within different industries, for example, the lodging business in the hotel industry.
In this embodiment, the execution body on which the data processing method runs may obtain the preset type of business data in a number of ways. For example, the execution body may obtain business data stored in a database server over a wired or wireless connection. As another example, the execution body may communicate with a terminal to obtain, in real time, at least one preset type of business data related to at least one industry collected by the terminal.
In this embodiment, the preset types of business data may be determined based on the business; for example, the preset types include log-type data and dimension data.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the business data involved all comply with relevant laws and regulations and do not violate public order and good customs.
Step 102, in response to a preset type of business data meeting its corresponding specification standard, obtaining standard data corresponding to that preset type of business data, and fusing the standard data corresponding to the at least one preset type of business data to obtain an access data packet.
In this embodiment, each preset type of data has its own specification standard, that is, each type of business data corresponds to one specification standard. The specification standard of each preset type may be derived from the basic features of that type. For example, log-type data describes user behavior and reflects temporal and spatial information, so the specification standard for log-type data requires time, place, and other such information to be present.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user behavior data involved all comply with relevant laws and regulations and do not violate public order and good customs.
In this embodiment, each preset type of business data is checked against its corresponding specification standard to obtain the standard data for that type. This reduces the intake of junk data and unifies the data format. Related business data that have an association relationship within the standard data are then fused, so that heterogeneous data from multiple sources are unified, the data island problem is solved, development efficiency is improved, and the communication cost for developers is reduced.
In this embodiment, business data carries dimension information when it is collected, and the dimension information reflects the business characteristics of the business data. Therefore, after log-type data meets its corresponding specification standard, the dimension information in the log-type data is extracted and fused with the log-type data, so that the log-type data can be effectively characterized from multiple aspects.
In this embodiment, setting specification standards for the various preset types of business data achieves the following four objectives:
(1) Complete and accurate data: data coverage meets the requirements of different businesses, and the output data is accurate and credible.
(2) Stable and timely output: resources are allocated reasonably and computation is efficient.
(3) Convenient and efficient operation and maintenance: monitoring, alarming, and daily task maintenance are convenient and efficient, and metadata coverage is complete.
(4) Simple and convenient use: the output data is simple and easy to use, and is convenient to query.
In this embodiment, the business data of the at least one industry is no longer distinguished by industry; instead, the specification check is performed according to data type, and the data of each type that satisfies its standard is passed to the downstream stages of the data center system. When any type of data in the business data does not satisfy its corresponding specification standard, that data may be discarded, or a request may be sent upstream so that the upstream resends data that satisfies the standard.
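The type-based check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `SPEC_STANDARDS` table, the field names (`type`, `time`, `place`, `key`), and the function `check_by_type` are all hypothetical, chosen only to show records being judged by data type rather than by industry.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical specification standards, keyed by preset data type (not by industry).
SPEC_STANDARDS: Dict[str, Callable[[Record], bool]] = {
    # Log-type data must carry time and place information.
    "log": lambda r: bool(r.get("time")) and bool(r.get("place")),
    # Dimension data must carry a key plus at least one attribute besides "type".
    "dimension": lambda r: bool(r.get("key")) and len(r) > 2,
}


def check_by_type(records: List[Record]) -> Dict[str, List[Record]]:
    """Split records into those that meet their type's standard and those that do not."""
    result: Dict[str, List[Record]] = {"passed": [], "failed": []}
    for rec in records:
        check = SPEC_STANDARDS.get(str(rec.get("type")))
        ok = check is not None and check(rec)
        result["passed" if ok else "failed"].append(rec)
    return result
```

Records in `failed` would then be discarded or re-requested from upstream, as the paragraph above describes.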
Step 103, storing the access data packet in a pre-constructed data warehouse, processing the access data packet to obtain processed data, and performing calculation and statistics on the processed data to obtain at least one business index of the industry.
In this embodiment, the data warehouse may store and manage the access data packet, form a business theme table for each business of each industry in the at least one industry, and calculate the business index of each business in each industry based on the data stored in the data warehouse and the business theme tables.
Step 104, presenting different application strategies for different business objects based on the processed data and the business indexes.
In this embodiment, different business objects are objects that focus on different aspects of the business. For example, the business object may be a business product, and the application strategy presented for the business product may be a BI report generated from the stored data and the business indexes, so that the current status of the business product can be observed in real time.
The business object may also be a person, and the application strategy presented for a person may be a monitoring scenario generated for that person from the stored data and the business indexes, for example, an abnormal-data monitoring scenario or a test monitoring scenario.
The data processing method provided by the embodiments of the present disclosure first receives at least one preset type of business data related to at least one industry; second, in response to a preset type of business data meeting its corresponding specification standard, obtains standard data corresponding to that preset type of business data and fuses the standard data corresponding to the at least one preset type of business data to obtain an access data packet; third, stores the access data packet in a pre-constructed data warehouse, processes the access data packet to obtain processed data, and performs calculation and statistics on the processed data to obtain at least one business index of the industry; and finally, presents different application strategies for different business objects based on the processed data and the business indexes. In this way, the business indexes generated per industry by the data warehouse preserve the unique characteristics of the different businesses in each industry; accepting multiple types of business data allows the infrastructure of a business data circulation system to be built quickly and accurately, so that data can be reused conveniently and at low cost across multiple industry fields; and presenting different application strategies for different business objects provides reliable guidance for product positioning and development, and more data and manpower support for mining the deep value of data.
In another embodiment of the present disclosure, the data processing method further includes: in response to a preset type of business data not meeting its corresponding specification standard, supplementing the business data of that type based on the attributes of the business data of that type.
In the data processing method provided in this embodiment, when at least one preset type of business data does not satisfy its corresponding specification standard, the business data of that type is supplemented based on its attributes.
When the preset type of business data is log-type data and does not meet its corresponding specification standard, supplementing the business data of that type based on its attributes includes: receiving the log-type data and, when the precision of the log-type data is not session granularity, using log data for a preset time period or a preset time point, selected according to the attributes of the log-type data, as supplementary log-type data.
Optionally, in response to a preset type of business data not meeting its corresponding specification standard, the data source of that business data is determined, and a resend request is sent to the data source so that the data source outputs business data that meets the specification standard.
Optionally, when a preset type of business data does not meet its corresponding specification standard, third-party data may be accessed, and business data related to that preset type of data may be acquired from the third-party data.
Optionally, when a preset type of business data does not meet its corresponding specification standard, the business data that does not meet the standard may also be discarded directly.
In the data processing method provided in this embodiment, when at least one preset type of business data does not meet its corresponding specification standard, the business data of that type is supplemented based on its attributes; supplementing the data effectively ensures its comprehensiveness.
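The four options above (supplement, request a resend, consult third-party data, discard) can be viewed as interchangeable fallback policies. The sketch below is only one way to arrange them; the `Fallback` enum, the `handle_failed` function, and its parameters (`defaults`, `resend_requests`, `third_party`) are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Dict, List, Optional

Record = Dict[str, object]


class Fallback(Enum):
    SUPPLEMENT = 1      # fill gaps from the data type's attributes
    REQUEST_RESEND = 2  # ask the data source to send conforming data
    THIRD_PARTY = 3     # look up related data from a third party
    DISCARD = 4         # drop the record


def handle_failed(
    record: Record,
    policy: Fallback,
    defaults: Record,
    resend_requests: List[str],
    third_party: Dict[str, Record],
) -> Optional[Record]:
    """Return a repaired record, or None if the record leaves the pipeline."""
    if policy is Fallback.SUPPLEMENT:
        # Only missing fields are filled; existing values win over defaults.
        return {**defaults, **record}
    if policy is Fallback.REQUEST_RESEND:
        # Remember which source must resend; the record itself is dropped.
        resend_requests.append(str(record.get("source")))
        return None
    if policy is Fallback.THIRD_PARTY:
        extra = third_party.get(str(record.get("key")))
        return {**extra, **record} if extra else None
    return None  # Fallback.DISCARD
```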
In some optional implementations of this embodiment, the business data includes log-type data, and obtaining the standard data and fusing it to obtain the access data packet includes: in response to the log-type data meeting the 5W standard, detecting whether the precision of the log-type data is a preset granularity; in response to the precision of the log-type data being the preset granularity, detecting whether the log-type data meets a transmission consistency rule; in response to the log-type data meeting the transmission consistency rule, obtaining standard data and detecting whether data related to the standard data exists; and in response to data related to the standard data existing, associating that data with the standard data to obtain the access data packet.
In this embodiment, the preset granularity is industry-specific, and different industries have different preset granularities for their log-type data. For example, in the community industry the preset granularity is day-level, while in the e-commerce industry the preset granularity is session granularity, where session granularity is the granularity of one session life cycle.
With the method for obtaining the access data packet provided by this optional implementation, log-type data is checked step by step against the 5W standard, the preset granularity, the transmission consistency rule, and data association. Log-type data from different sources can thus be unified, the log-type data island problem is solved, development efficiency is improved, and communication cost is reduced.
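The step-by-step gate for log-type data can be sketched as a chain of early exits. Several details here are assumptions, since the text does not define them: the five fields in `FIVE_W` are one common reading of "5W", the transmission consistency rule is modeled as "received record count equals the count declared by the sender", and the names `admit_log_batch`, `granularity`, and `related` are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]

# Assumed reading of "5W"; the source does not enumerate the fields.
FIVE_W = ("who", "when", "where", "what", "why")


def admit_log_batch(
    batch: List[Record],
    preset_granularity: str,
    declared_count: int,
    related: Dict[str, Record],
) -> Optional[List[Record]]:
    """Run the four checks in order; return the access packet or None on failure."""
    # 1. 5W standard: every record must have all five fields filled.
    if not all(rec.get(field) for rec in batch for field in FIVE_W):
        return None
    # 2. Precision must equal the industry's preset granularity.
    if any(rec.get("granularity") != preset_granularity for rec in batch):
        return None
    # 3. Transmission consistency (assumed rule: nothing lost in transit).
    if len(batch) != declared_count:
        return None
    # 4. Attach related data where it exists.
    packet: List[Record] = []
    for rec in batch:
        merged = dict(rec)
        extra = related.get(str(rec.get("who")))
        if extra is not None:
            merged["related"] = extra
        packet.append(merged)
    return packet
```

A per-industry table such as `{"community": "day", "e-commerce": "session"}` could supply `preset_granularity`, matching the examples in the text.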
In some optional implementations of this embodiment, the business data further includes dimension data, and obtaining the standard data and fusing it to obtain the access data packet further includes: determining the richness of the dimension data; in response to the richness of the dimension data meeting a preset richness value, detecting whether the dimension data meets a preset format; and in response to the dimension data meeting the preset format, taking the dimension data as one kind of standard data and, when the dimension data is related to the log-type data, having the dimension data and the log-type data use the same key value.
With the method for obtaining the access data packet provided by this optional implementation, dimension data is checked step by step for the preset richness, the preset format, and its relation to the log-type data. Heterogeneous dimension data from multiple sources can thus be unified, the dimension data island problem is solved, development efficiency is improved, and communication cost is reduced.
In some optional implementations of this embodiment, obtaining the standard data and fusing it to obtain the access data packet further includes: recording the key values of the related dimension data in the log-type data.
With the method for obtaining the access data packet provided by this optional implementation, the key values of the related dimension data are recorded in the log-type data, so that when the log-type data is associated with the dimension data it can readily carry the characteristics of the dimension data, improving the fusion of the various types of data.
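A rough sketch of the dimension-data checks and the shared-key association follows. The source does not say how richness is measured or what the preset format is, so both are assumptions here: richness is taken as the fraction of expected fields that are filled, and the format check is a hypothetical key pattern. The names `richness`, `admit_dimension`, `join_on_key`, and `dim_key` are illustrative only.

```python
import re
from typing import Dict, Optional, Sequence

Record = Dict[str, object]

# Hypothetical preset format for dimension keys, e.g. "hotel_1".
KEY_FORMAT = re.compile(r"^[a-z]+_\d+$")


def richness(row: Record, expected_fields: Sequence[str]) -> float:
    """Assumed richness measure: share of expected fields that are filled."""
    if not expected_fields:
        return 0.0
    filled = sum(1 for f in expected_fields if row.get(f) not in (None, ""))
    return filled / len(expected_fields)


def admit_dimension(
    row: Record, expected_fields: Sequence[str], min_richness: float
) -> Optional[Record]:
    """Richness check first, then format check; None means rejected."""
    if richness(row, expected_fields) < min_richness:
        return None
    if not KEY_FORMAT.match(str(row.get("key", ""))):
        return None
    return dict(row)


def join_on_key(log: Record, dimensions: Dict[str, Record]) -> Record:
    """The log record stores the dimension key, so the join is a plain lookup."""
    dim = dimensions.get(str(log.get("dim_key")))
    return {**log, "dimension": dim} if dim else dict(log)
```

Because the log record already carries the key of its dimension row, association needs no fuzzy matching across sources.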
In some optional implementations of this embodiment, performing calculation and statistics on the processed data to obtain the business index of the at least one industry includes: performing extraction, transformation, and loading (ETL) on the processed data to obtain a basic data table; performing feature extraction and dimension aggregation on the basic data table to obtain an intermediate table; obtaining a business theme table of the at least one industry based on the basic data table and the intermediate table; and calculating and presenting the business indexes of the at least one industry based on the basic data table, the intermediate table, and the business theme table.
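The layering (basic data table, intermediate table, business theme table, business index) can be illustrated with plain Python containers. This is a toy sketch under stated assumptions: the record fields (`industry`, `business`, `time`, `amount`), the daily sum used as the aggregation, and all four function names are hypothetical, and a real warehouse would use SQL or a distributed engine instead.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]
Key = Tuple[str, str, str]  # (industry, business, day)


def etl(processed: List[Record]) -> List[Record]:
    """Extract, transform, load: drop rows without an amount and normalize types."""
    return [
        {
            "industry": str(r["industry"]),
            "business": str(r["business"]),
            "day": str(r["time"])[:10],
            "amount": float(r["amount"]),
        }
        for r in processed
        if r.get("amount") is not None
    ]


def build_intermediate(base: List[Record]) -> Dict[Key, float]:
    """Dimension aggregation: total amount per industry, business, and day."""
    agg: Dict[Key, float] = defaultdict(float)
    for row in base:
        agg[(row["industry"], row["business"], row["day"])] += row["amount"]
    return dict(agg)


def build_theme_tables(intermediate: Dict[Key, float]) -> Dict[str, Dict[Tuple[str, str], float]]:
    """One theme table per industry, so industry-specific features are kept apart."""
    themes: Dict[str, Dict[Tuple[str, str], float]] = defaultdict(dict)
    for (industry, business, day), total in intermediate.items():
        themes[industry][(business, day)] = total
    return dict(themes)


def business_index(themes: Dict[str, Dict[Tuple[str, str], float]], industry: str) -> float:
    """Example index: total amount for one industry across its theme table."""
    return sum(themes.get(industry, {}).values())
```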
Optionally, performing calculation and statistics on the processed data to obtain the at least one business index may further include: storing the dimension data in the access data packet using a data dimension layer. The data dimension layer is a layer of the data warehouse that can store and update dimension data.
Optionally, processing the access data packet to obtain the processed data includes: storing and updating the access data packet using a data access layer. The data access layer is a layer of the data warehouse above the basic data layer, the middle data layer, the data theme layer, and the data application layer; it can also periodically delete expired data from the access data packet to obtain the processed data.
The method for obtaining the business indexes of each industry provided by this optional implementation stores and analyzes the access data packet from different aspects through the different layers of the data warehouse, thereby mining the deep value of the data.
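As a small illustration of the periodic clean-up performed by the data access layer, the sketch below keeps only records inside a retention window. The retention period, the `ingested_at` field, and the function name `purge_expired` are assumptions; the source only states that expired data is deleted periodically.

```python
from datetime import datetime, timedelta
from typing import Dict, List

Record = Dict[str, object]


def purge_expired(packet: List[Record], now: datetime, retention: timedelta) -> List[Record]:
    """Return the records whose ingestion time is still inside the retention window."""
    cutoff = now - retention
    return [rec for rec in packet if rec["ingested_at"] >= cutoff]
```

A scheduler would call this at a fixed interval and treat the returned list as the processed data.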
In some optional implementations of this embodiment, presenting different application strategies for different business objects based on the processed data and the business indexes includes: for a business product, constructing and presenting a business product report based on the processed data and the business indexes; for an engineering object, determining key indexes based on the business indexes, constructing alarm rules for the key indexes, and extracting and presenting abnormal data using the alarm rules and the processed data; and for a strategy object, constructing a small-traffic test platform based on the processed data and the business indexes, and feeding back the test results of the test platform.
The method of presenting different application strategies for different business objects provided by this optional implementation works in three directions. In the product direction, it improves the efficiency and accuracy of data output, assists multidimensional data analysis, and provides guidance for product positioning and development. In the research and development efficiency direction, it improves abnormal data monitoring so that problems are located quickly, research and development efficiency is raised, and product quality is ensured. In the strategy effect direction, building an experiment management platform improves the iteration efficiency of strategy models.
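The three presentation paths can be sketched as a simple dispatch on the kind of business object. Everything concrete here is assumed for illustration: alarm rules are modeled as allowed `(low, high)` ranges per metric, the small-traffic test takes a fixed share of rows, and the names `present`, `product_report`, `engineering_alerts`, and `small_traffic_sample` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
Rules = Dict[str, Tuple[float, float]]  # metric -> (allowed low, allowed high)


def product_report(rows: List[Row], indexes: Dict[str, float]) -> Dict[str, object]:
    """Product view: a report summarizing the data and the business indexes."""
    return {"kind": "report", "row_count": len(rows), "indexes": dict(indexes)}


def engineering_alerts(rows: List[Row], rules: Rules) -> Dict[str, object]:
    """Engineering view: rows where any key metric falls outside its allowed range."""
    abnormal = [
        row
        for row in rows
        if any(m in row and not lo <= row[m] <= hi for m, (lo, hi) in rules.items())
    ]
    return {"kind": "alerts", "abnormal": abnormal}


def small_traffic_sample(rows: List[Row], share: float) -> Dict[str, object]:
    """Strategy view: a small slice of traffic set aside for an experiment."""
    n = max(1, int(len(rows) * share)) if rows else 0
    return {"kind": "experiment", "sample": rows[:n]}


def present(
    business_object: str,
    rows: List[Row],
    indexes: Dict[str, float],
    rules: Rules,
    share: float = 0.05,
) -> Dict[str, object]:
    if business_object == "product":
        return product_report(rows, indexes)
    if business_object == "engineering":
        return engineering_alerts(rows, rules)
    if business_object == "strategy":
        return small_traffic_sample(rows, share)
    raise ValueError(f"unknown business object: {business_object}")
```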
In some optional implementations of this embodiment, the service data includes: vehicle industry dimension data and individual behavior data; the access data comprises: vehicle appearance, vehicle type, price, individual location; the processing data includes: the system comprises an individual and vehicle appearance association table, a vehicle and price association table, an individual and price association table and a vehicle type association table, wherein the individual and vehicle appearance association table is arranged at different positions; the service indexes comprise: a service coverage distribution index and a service coverage trend index.
In this optional implementation, the processing data is intermediate processing data in the data warehouse, and may exist directly in data form or in table form. In this optional implementation, the processing data includes a plurality of association tables obtained by processing the access data. The association table of individuals at different positions and vehicle appearances refers to the association between individuals at different positions and different vehicle appearances, for example, individuals in province X prefer appearance Y. The association table of vehicles and prices refers to the association between different vehicles and different prices. The association table of individuals at different positions and prices refers to the association between individuals at different positions and different vehicle prices. The association table of individuals at different positions and vehicle types refers to the association between individuals at different positions and different vehicle types, for example, individuals in province Z prefer vehicle type T.
In this optional implementation, the service coverage distribution index refers to the distribution of different service data within a preset number of service data. The service coverage trend index refers to the variation trend of different service data within a preset number of service data.
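The two indexes above can be illustrated with a minimal sketch. It assumes that "distribution" means the share of each value among a batch of records and that "trend" means the change in that share between an earlier and a later batch; the source does not give formulas, and the function names and the `vehicle_type` field are hypothetical.

```python
from collections import Counter

def coverage_distribution(records, field):
    """Share of each value of `field` among the given records (assumed definition)."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}

def coverage_trend(earlier, later, field):
    """Change in each value's share between two batches (assumed definition)."""
    before = coverage_distribution(earlier, field)
    after = coverage_distribution(later, field)
    return {v: after.get(v, 0.0) - before.get(v, 0.0)
            for v in set(before) | set(after)}

# Hypothetical sample batches of a preset size (four records each).
week1 = [{"vehicle_type": "SUV"}, {"vehicle_type": "sedan"},
         {"vehicle_type": "sedan"}, {"vehicle_type": "sedan"}]
week2 = [{"vehicle_type": "SUV"}, {"vehicle_type": "SUV"},
         {"vehicle_type": "sedan"}, {"vehicle_type": "sedan"}]

dist = coverage_distribution(week2, "vehicle_type")
trend = coverage_trend(week1, week2, "vehicle_type")
```

Here `dist` gives the share of each vehicle type in the later batch, and `trend` gives how much each share moved relative to the earlier batch.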
The method for obtaining the service index provided by the optional implementation mode is applied to the vehicle industry, is used for searching the service index between an individual and different service data in the vehicle industry, and improves the accuracy of vehicle service analysis.
With further reference to fig. 2, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a data processing apparatus, which is particularly applicable in various electronic devices.
As shown in fig. 2, the data processing apparatus 200 provided in the present embodiment includes: the device comprises a receiving unit 201, an association unit 202, a calculation unit 203 and a display unit 204. The receiving unit 201 may be configured to receive at least one preset type of service data related to at least one industry. The association unit 202 may be configured to, in response to that certain preset type of service data meets a corresponding specification standard, obtain specification data corresponding to the preset type of service data, and merge the specification data corresponding to at least one preset type of service data to obtain an access data packet. The calculating unit 203 may be configured to store the access data packet by using a pre-constructed data warehouse, process the access data packet to obtain processed data, and obtain the service index of the at least one industry through calculation and statistics on the processed data. The presentation unit 204 may be configured to perform presentation of different application policies on different business objects based on the processing data and the business index.
In the present embodiment, in the data processing apparatus 200: the detailed processing of the receiving unit 201, the associating unit 202, the calculating unit 203, and the displaying unit 204 and the technical effects thereof can refer to the related descriptions of step 101, step 102, step 103, and step 104 in the corresponding embodiments of fig. 1, which are not described herein again.
In some optional implementations of this embodiment, the apparatus 200 includes: a supplementary unit (not shown in the figure). The supplementing unit may be configured to, in response to that certain preset type of service data does not satisfy a corresponding specification standard, supplement the type of service data based on an attribute of the type of service data.
In some optional implementations of this embodiment, the service data includes: log type data; the association unit 202 is further configured to: responding to the log type data meeting the 5W standard, and detecting whether the precision of the log type data is a preset granularity or not; responding to the fact that the precision of the log type data is a preset granularity, and detecting whether the log type data meets a transmission consistency rule or not; responding to that the log data meets the transmission consistency rule, obtaining standard data, and detecting whether data related to the standard data exist or not; and responding to the data related to the specification data, and associating the data related to the specification data with the specification data to obtain an access data packet.
In some optional implementation manners of this embodiment, the service data further includes: dimension data; the association unit 202 is further configured to: determining the richness of the dimension data; responding to the richness of the dimension data meeting a preset richness value, and detecting whether the dimension data meets a preset format; and in response to the dimensional data meeting a preset format, taking the dimensional data as one of the specification data, and enabling the dimensional data and the log-type data to use the same key value when the dimensional data and the log-type data are related.
In some optional implementations of the present embodiment, the associating unit 202 is further configured to: key values of relevant dimension data are recorded in log-type data.
In some optional implementations of the present embodiment, the calculating unit 203 is further configured to: extracting, converting and loading the processed data to obtain a basic data table; performing feature extraction and dimension aggregation on the basic data table to obtain an intermediate table; obtaining a business theme table of at least one industry based on the basic data table and the intermediate table; and calculating and displaying the business indexes of at least one industry based on the basic data table, the intermediate table and the business theme table.
In some optional implementations of the present embodiment, the presentation unit 204 is further configured to: aiming at the business product, constructing and displaying a business product report based on the processing data and the business index; determining key indexes based on business indexes aiming at engineering objects, constructing key index alarm rules, and extracting and displaying abnormal data through the alarm rules and processing data; and aiming at the strategy object, constructing a small flow test platform based on the processing data and the service index, and feeding back the test result of the test platform.
In the data processing apparatus provided by the embodiment of the present disclosure, first, the receiving unit 201 receives at least one preset type of service data related to at least one industry; secondly, in response to that certain preset type of service data meets the corresponding specification standard, the association unit 202 obtains specification data corresponding to the preset type of service data, and fuses the specification data corresponding to at least one preset type of service data to obtain an access data packet; thirdly, the computing unit 203 stores the access data packet by using a pre-constructed data warehouse, processes the access data packet to obtain processed data, and calculates and counts the processed data to obtain at least one business index of the industry; finally, the display unit 204 performs display of different application strategies on different business objects based on the processing data and the business indexes. Therefore, the unique characteristics of different businesses in each industry are reserved through the business indexes of each industry generated by the data warehouse; the data access of various service data can quickly and accurately realize the infrastructure of a service data circulation system, so that the data can be conveniently and low-cost multiplexed in a plurality of industry fields; different application strategies are provided for different business objects, reliable guidance is provided for positioning and development of products, and more data and manpower support is provided for mining deep values of data.
To address the problem in the conventional technology of data islands forming between the data of different industries, the present disclosure provides a data center system. Fig. 3 shows a flow 300 according to an embodiment of the data center system of the present disclosure, where the data center system includes: a data access module 301, a data management module 302 and a data application module 303.
The data access module 301 receives at least one preset type of service data related to at least one industry, obtains specification data corresponding to the preset type of service data in response to that a certain type of service data meets a corresponding specification standard, and fuses the specification data to obtain an access data packet.
The data management module 302 is configured to store the access data packet by using a pre-established data warehouse, process the access data packet to obtain processed data, and obtain the service index of the at least one industry by calculating and counting the processed data.
And the data application module 303 is configured to display different application strategies for the processing data and the service index according to different service objects.
In this embodiment, the whole data circulation system of the data center system is divided into three stages, i.e., upstream, midstream, and downstream. The data access module 301 belongs to the upstream of a data circulation system and is used for realizing data generation and access; the data management module 302 belongs to a data circulation system midstream and is used for realizing data storage and management; the data application module 303 belongs to the downstream of the data circulation system and is used for implementing data application.
In this embodiment, the at least one industry may be one industry or multiple industries. The service data is data related to a service, and may be log-type data recording user behaviors, dimension data recording service dimensions, and the like. The preset types of service data of the at least one industry may come from multiple sources, such as HDFS files, JSON, a MongoDB database, a MySQL database, and logs.
Log-type data is generally obtained through buried points (event-tracking instrumentation), which are of two kinds: front-end buried points and back-end buried points. For example, statistics such as the number of clicks on a certain button and the number of users who clicked it are usually collected through a front-end buried point, whereas counting the number of comments a user makes on a post or the number of likes a certain opinion receives relies on a back-end buried point.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the log-type data of user behaviors all comply with relevant laws and regulations and do not violate public order and good morals.
In this embodiment, the data access module 301 judges at least one preset type of data related to at least one industry against the specification standards and screens out the compliant data in each type of data; the compliant data is the specification data. This reduces the access of junk data and unifies data formats. The module also associates data that has an association relationship and unifies heterogeneous data from multiple sources, which breaks down data islands, improves development efficiency, and reduces the data communication cost of developers.
In this embodiment, the following four objectives can be achieved by setting a standard for preset data in service data: (1) the data is complete and accurate: the data coverage can meet the requirements of different services, and the output data is accurate and credible. (2) The output is stable and timely: the resource allocation is reasonable and the calculation efficiency is high. (3) The operation and maintenance are convenient and efficient: monitoring alarm and daily task maintenance are convenient and efficient, and metadata coverage is complete. (4) Simple and convenient to use: the output data result is simple and easy to use, and the data is output and convenient to query.
In this embodiment, the business data of at least one industry does not distinguish the industry any more, but performs specification standard judgment according to the data type, the specification data meeting the specification standard enters the data warehouse, and when any kind of data in the business data does not meet the corresponding specification standard, the current kind of data may be discarded or a request may be sent upstream to cause the upstream to resend the data meeting the specification standard.
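The type-based admission described above can be sketched as follows. This is only an illustration under stated assumptions: the `admit` function, the per-type validator table, and the action strings `"discard"` and `"request_resend"` are hypothetical names, and the real specification checks are far richer than the one-line validators used here.

```python
def admit(batch, validators, on_reject="discard"):
    """Judge a batch by its data type (not its industry) and split it.

    batch: {"type": <data type>, "rows": [...]} (hypothetical shape).
    Returns (accepted rows, rejected rows, action taken for rejected rows).
    """
    check = validators[batch["type"]]
    accepted = [row for row in batch["rows"] if check(row)]
    rejected = [row for row in batch["rows"] if not check(row)]
    action = None
    if rejected:
        # Either drop the non-compliant data or ask upstream to resend it.
        action = "request_resend" if on_reject == "resend" else "discard"
    return accepted, rejected, action

# Hypothetical validators keyed by data type.
validators = {
    "log": lambda row: bool(row.get("user_id")),
    "dimension": lambda row: bool(row.get("key")),
}

accepted, rejected, action = admit(
    {"type": "log", "rows": [{"user_id": "u1"}, {"user_id": ""}]},
    validators,
    on_reject="resend",
)
```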
In this embodiment, the above-mentioned fusion of the specification data corresponding to at least one preset type of service data may be the fusion of different data in the same preset type of data, or the fusion between multiple preset types of data. Further, the fusing of the specification data corresponding to the at least one preset type of service data includes: and correlating related service data in the same preset type data, for example, the track points of the same individual at different moments, and the track points at different moments are correlated under the same individual. Optionally, the merging of the specification data corresponding to the at least one preset type of service data further includes: and associating the related service data in the different preset types of data, for example, associating the time and the position of the same individual.
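The track-point example above can be sketched as a simple grouping step. The field names `individual_id`, `time` and `position` are assumptions made for illustration; the source does not define a record layout.

```python
from collections import defaultdict

def fuse_track_points(points):
    """Associate track points recorded at different moments under the same individual."""
    by_individual = defaultdict(list)
    for point in points:
        by_individual[point["individual_id"]].append((point["time"], point["position"]))
    # Order each individual's points by time so the track reads chronologically.
    return {individual: sorted(track) for individual, track in by_individual.items()}

# Hypothetical input: points arrive in arbitrary order.
points = [
    {"individual_id": "a", "time": 2, "position": "P2"},
    {"individual_id": "b", "time": 1, "position": "Q1"},
    {"individual_id": "a", "time": 1, "position": "P1"},
]
tracks = fuse_track_points(points)
```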
In this embodiment, the related data in the same or different preset types of service data may be service data within the same industry or service data across different industries; for example, data of the hotel industry may be associated with data of the travel industry, or with data of the e-commerce industry.
In this embodiment, the data management module 302 stores and manages data by using a pre-constructed data warehouse and a data warehouse management submodule. All access data output by the data access module may be stored in the data warehouse, where the data warehouse processes the access data packet (for example, by data cleaning, data updating, and deleting) to obtain processed data, and the data warehouse management submodule decides, according to the data types and the relations among the data, how to process the processed data into intermediate data results (such as an intermediate table). The data management module of this embodiment combines a plurality of service lines of a plurality of industries and realizes storage and management from source data to data indexes. It should be noted that the service indexes of different industries may be the same or may be different. The data warehouse can perform feature extraction and dimension aggregation on different business data of at least one industry to obtain the business indexes of each industry.
In this embodiment, the data application module 303 provides different application strategies for different service objects, so that the data truly serves the service and points the service toward an effective and reliable development direction.
In this embodiment, the business object may be a product, and for the product, all general indicators related to the product may be converted into a data table structure having fields, and the data stored in the data management module is combined with the data table structure, so as to generate an intelligent report for the product, and visually display the intelligent report, so that business personnel can know the current situation of the product in real time.
Furthermore, all the intelligent reports of the general indexes related to the product can be displayed to business personnel completely, or partial intelligent reports can be displayed based on the retrieval requirements of the business personnel, so that a reliable direction is provided for the positioning and development of the product.
In the data center system provided in the embodiment of the present disclosure, the data access module 301 receives at least one preset type of service data related to at least one industry, obtains specification data corresponding to a certain type of service data in response to that type of service data meeting the corresponding specification standard, and fuses the specification data to obtain an access data packet. The data management module 302 is configured to store the access data packet by using a pre-constructed data warehouse, process the access data packet to obtain processed data, and obtain the service index of the at least one industry by calculation and statistics on the processed data. The data application module 303 is configured to display different application strategies for different service objects based on the processing data and the service index. Therefore, the service indexes of each industry generated by the data warehouse retain the unique characteristics of the different businesses in each industry; the access of various kinds of service data allows the infrastructure of a service data circulation system to be built quickly and accurately, so that data can be reused across multiple industry fields conveniently and at low cost; and different application strategies are provided for different business objects, which gives reliable guidance for the positioning and development of products and provides more data and manpower support for mining the deep value of data.
In some optional implementations of this embodiment, the service data includes: log-type data and dimensional data; the specification standards for log-type data include: whether the standard of 5W is met, whether the precision is a preset granularity, whether the transmission meets the consistency rule and the relevance degree of the dimension data.
In this embodiment, the 5W standard includes: (1) WHO: who produced the data? That is, the basic information of the user is recorded, including basic user data such as the user name and user source. (2) WHEN: when was the buried point triggered? That is, the time stamp at which the buried point was triggered, together with the user state, network environment and the like. (3) WHERE: where was it triggered? That is, which product in the product matrix, which page, and which component the user-triggered buried point belongs to. (4) WHAT: what was done? That is, the behavior of the user, namely the type of the buried point, such as display, click, or mark. (5) WHY: why was the behavior generated? This records the continuous behavior relation of the user by linking to the user's previous behavior.
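A minimal sketch of a 5W completeness check is shown below. The mapping from each W to concrete fields (`user_id`, `timestamp`, `product`, `page`, `component`, `event_type`, `previous_event`) is an assumption chosen for illustration; the source names the five questions but not a field layout.

```python
# Hypothetical mapping from each of the five W's to the fields that answer it.
REQUIRED_5W = {
    "who": ("user_id",),
    "when": ("timestamp",),
    "where": ("product", "page", "component"),
    "what": ("event_type",),
    "why": ("previous_event",),
}

def missing_5w(record):
    """Return the 5W aspects for which the record lacks a required field."""
    return [
        aspect
        for aspect, fields in REQUIRED_5W.items()
        if any(record.get(field) in (None, "") for field in fields)
    ]

def meets_5w(record):
    """A record meets the 5W standard when no aspect is missing."""
    return not missing_5w(record)

# Hypothetical log record whose link to the previous behavior is empty.
event = {
    "user_id": "u1",
    "timestamp": 1700000000,
    "product": "app",
    "page": "home",
    "component": "buy_button",
    "event_type": "click",
    "previous_event": "",
}
gaps = missing_5w(event)
```

In this sample `gaps` names only the WHY aspect, so the record would be rejected or sent back for supplementation until the previous behavior is filled in.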
In this embodiment, the preset granularity depends on the industry, and the preset granularities corresponding to log-type data of different industries differ. For example, in a community-type industry the preset granularity is day-level granularity, while in the e-commerce industry the preset granularity is session granularity, where session granularity is the granularity of one life cycle.
When log-type data is accurate to session granularity, a user may access the product multiple times in the same day. So that each life cycle of the user can be classified later, the log-type data should label the data belonging to the same life cycle.
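One way to label log records that belong to the same life cycle is sketched below. The source does not say how a life cycle is delimited; this sketch assumes, purely for illustration, that a new session starts after 30 minutes of inactivity, and the names `label_sessions`, `ts` and `session_id` are hypothetical.

```python
def label_sessions(events, gap_seconds=1800):
    """Tag each event with a session label; a gap above `gap_seconds` starts a new session."""
    labeled = []
    last_seen = {}  # user_id -> (timestamp of last event, current session index)
    for event in sorted(events, key=lambda e: (e["user_id"], e["ts"])):
        previous = last_seen.get(event["user_id"])
        if previous is None:
            index = 0
        elif event["ts"] - previous[0] > gap_seconds:
            index = previous[1] + 1
        else:
            index = previous[1]
        last_seen[event["user_id"]] = (event["ts"], index)
        labeled.append({**event, "session_id": f'{event["user_id"]}-{index}'})
    return labeled

# Hypothetical events (timestamps in seconds): two visits by the same user in one day.
events = [
    {"user_id": "u1", "ts": 7200},
    {"user_id": "u1", "ts": 0},
    {"user_id": "u1", "ts": 600},
]
labeled = label_sessions(events)
```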
In this embodiment, for a scenario in which a plurality of product scenarios jointly form one user life cycle, whether the transmission of the log-type data satisfies the consistency rule includes: whether the log-type data carries the same data identification at different circulation stages. Because different scenarios have their own buried point standards and characteristics, identifying information such as the user ID and the session ID should be passed along throughout the whole process, so that a change in log format does not break the continuity of the user's behaviors.
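The consistency rule above can be sketched as a comparison of identifiers across stages. The function name and the choice of `user_id` and `session_id` as the identifying fields follow the example in the text, while the record layout is an assumption.

```python
def is_transmission_consistent(stage_records, id_fields=("user_id", "session_id")):
    """Check that one log record keeps the same identifiers at every circulation stage.

    stage_records: the same logical record as observed at each stage, in order.
    """
    if not stage_records:
        return True
    reference = tuple(stage_records[0].get(field) for field in id_fields)
    if None in reference:
        return False  # identifiers must be present from the first stage onward
    return all(
        tuple(record.get(field) for field in id_fields) == reference
        for record in stage_records[1:]
    )

# Hypothetical observations of one record at two stages.
consistent = is_transmission_consistent([
    {"stage": "scene_a", "user_id": "u1", "session_id": "s9"},
    {"stage": "scene_b", "user_id": "u1", "session_id": "s9"},
])
broken = is_transmission_consistent([
    {"stage": "scene_a", "user_id": "u1", "session_id": "s9"},
    {"stage": "scene_b", "user_id": "u1"},  # session ID lost after a format change
])
```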
When log-type data has corresponding dimension data, both are indispensable, so the association between them is determined through the degree of association with the dimension data. Determining the degree of association with the dimension data includes: checking whether the key value of the corresponding dimension data recorded in the log-type data is empty. If the key value exists, the log-type data is determined to be associated with the dimension data, which makes it convenient to associate the two in subsequent analysis.
In this embodiment, the specification standard of the dimension data includes: whether a preset richness value is satisfied, whether a preset format is satisfied, and a degree of association with the log-type data.
In this embodiment, the dimension data is data related to a business, for example, data related to a user, an order, a thread, a material, and the like. Further, the dimension data may include data updated in real time, such as accumulated user and order data, and data updated only at long intervals, such as service city and store information. Dimension data is typically stored in a relational database (such as MySQL) or a non-relational database (such as MongoDB).
In this embodiment, whether the preset richness value is satisfied includes: whether the dimension data describes at least two aspects of information, for example dimensions such as the age and gender of the user. The richer the dimensions of the service data, the more angles are available and the more comprehensive and detailed the results obtained during data analysis.
Whether the preset format is satisfied includes the following. In this embodiment, service data from different sources often overlaps, so data of the same type must be stored in a standardized, unified format. For example, some tables store the city information as "beijing" while others store "beijing city", which is a non-uniform format. As another example, if two dimension tables store school information as the Chinese name "beijing university" and as the school code "10001" respectively, a third correspondence table is needed so that the correspondence between the school name and the school code can be queried.
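The two format examples above can be sketched as a normalization step plus a correspondence table. The alias table and the name-to-code table below contain only the values mentioned in the text and are hypothetical stand-ins for real reference data.

```python
# Hypothetical alias table mapping non-uniform spellings to one canonical form.
CITY_ALIASES = {"beijing city": "beijing"}

# Hypothetical third correspondence table between school names and school codes.
SCHOOL_CODE_BY_NAME = {"beijing university": "10001"}

def normalize_city(name):
    """Return the canonical city name for a stored value."""
    cleaned = name.strip().lower()
    return CITY_ALIASES.get(cleaned, cleaned)

def school_code(value):
    """Return the school code whether the table stored a name or a code."""
    cleaned = value.strip().lower()
    if cleaned in SCHOOL_CODE_BY_NAME.values():
        return cleaned
    return SCHOOL_CODE_BY_NAME.get(cleaned)  # None when the name is unknown

same_city = normalize_city("Beijing City") == normalize_city("beijing")
code_from_name = school_code("Beijing University")
code_from_code = school_code("10001")
```

With both tables storing values through these helpers, rows from the two dimension tables can be compared on the same city name and the same school code.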
In this embodiment, the association between the dimension data and the log-type data is determined through the degree of association with the log-type data, which includes: the dimension data should be referenced in the log-type data by a unique key value, so that the log-type data is not matched with wrong data or dirty data when it is matched against the dimension data.
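A minimal sketch of key-based association follows. It assumes a single key field shared by both kinds of data (here the hypothetical `store_id`), rejects duplicate dimension keys, and treats an empty or unknown key in a log record as "not associated".

```python
def build_dimension_index(dimension_rows, key):
    """Index dimension rows by key, rejecting duplicates so every key is unique."""
    index = {}
    for row in dimension_rows:
        value = row[key]
        if value in index:
            raise ValueError(f"duplicate dimension key: {value!r}")
        index[value] = row
    return index

def associate(log_rows, dimension_index, key):
    """Join log rows to dimension rows; logs with an empty or unknown key stay unmatched."""
    matched, unmatched = [], []
    for log in log_rows:
        dimension = dimension_index.get(log.get(key))
        if dimension is None:
            unmatched.append(log)
        else:
            matched.append({**dimension, **log})
    return matched, unmatched

# Hypothetical dimension table and log records.
index = build_dimension_index([{"store_id": "s1", "city": "beijing"}], "store_id")
matched, unmatched = associate(
    [{"store_id": "s1", "event_type": "click"}, {"store_id": "", "event_type": "click"}],
    index,
    "store_id",
)
```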
In the optional implementation mode, based on the main type of the service data, the service data is divided into log-type data for recording user behaviors and dimensional data for recording the basis of the service data, and different standard standards are respectively set for the log-type data and the dimensional data, so that the reliability of service data processing and the uniformity of data in a data center system are improved.
In some optional implementations of this embodiment, the data warehouse includes: a data dimension layer, a data access layer, a basic data layer, an intermediate data layer, a data theme layer and a data application layer. The data dimension layer is used for storing and updating dimension data; the data access layer is used for storing and updating the access data packet output by the data access module. The data management module further includes a data warehouse management submodule for controlling the data warehouse. The data warehouse management submodule performs extraction, transformation and loading on the data of the data access layer to obtain a basic data table, and stores the basic data table in the basic data layer; performs feature extraction and dimension aggregation on the basic data table to obtain an intermediate table, and stores the intermediate table in the intermediate data layer; obtains a business theme table of at least one industry based on the basic data table and the intermediate table, and stores the business theme table in the data theme layer; and calculates and displays the business index of the at least one industry based on the basic data table, the intermediate table and the business theme table, and stores the business index in the data application layer.
In this embodiment, the data warehouse is an integrated, subject-oriented, relatively stable data set that reflects historical changes and is used to support the decision analysis processes of an enterprise or organization. Within the data warehouse, the data dimension layer stores the dimension data and updates it according to the data category of the business data. The data access layer stores the access data packets collected upstream (by the data access module) and periodically deletes data in the access data packets. A control module for controlling the data warehouse (such as the data warehouse management submodule) performs Extract-Transform-Load (ETL) processing on the access data packet, where ETL describes the process of extracting data from a source end, transforming it, and loading it to a destination end, and stores the data processed by ETL in the basic data layer. The control module performs feature extraction and dimension aggregation on the basic data table of the basic data layer to generate an intermediate table, and stores the intermediate table in the intermediate data layer. The control module constructs a business theme table based on the different businesses in different industries, and stores the business theme table in the data theme layer. The data warehouse management submodule performs index calculation according to the basic data table, the intermediate table and the business theme table to obtain business indexes and stores them in the data application layer, and the data application layer displays and applies the data and the business indexes in various forms.
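The progression from access data to a business index can be sketched with three small functions. This is an illustration under stated assumptions, not the disclosed implementation: the row fields (`user_id`, `city`, `clicks`), the cleaning rules, and the "share of total" index are all hypothetical, and the theme layer is omitted for brevity.

```python
from collections import defaultdict

def etl(access_packet):
    """Access layer -> basic data table: drop incomplete rows and normalize values."""
    table = []
    for row in access_packet:
        if row.get("user_id") and row.get("city"):
            table.append({
                "user_id": row["user_id"],
                "city": row["city"].strip().lower(),
                "clicks": int(row.get("clicks", 0)),
            })
    return table

def aggregate_by(table, dimension, measure):
    """Basic table -> intermediate table: sum a measure per dimension value."""
    totals = defaultdict(int)
    for row in table:
        totals[row[dimension]] += row[measure]
    return dict(totals)

def business_index(intermediate):
    """Intermediate table -> index: each dimension value's share of the total."""
    total = sum(intermediate.values())
    return {key: value / total for key, value in intermediate.items()} if total else {}

# Hypothetical access data packet with one incomplete row.
packet = [
    {"user_id": "u1", "city": " Beijing ", "clicks": "3"},
    {"user_id": "u2", "city": "shanghai", "clicks": 1},
    {"user_id": None, "city": "beijing", "clicks": 5},
]
basic_table = etl(packet)
intermediate_table = aggregate_by(basic_table, "city", "clicks")
index = business_index(intermediate_table)
```

Each function reads only the output of an earlier layer, which mirrors the progressive relationship among the layers.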
In this embodiment, in the six-layer structure of the data warehouse, apart from the data dimension layer, the other five layers, namely the data access layer, the basic data layer, the intermediate data layer, the data theme layer, and the data application layer, are in a progressive relationship, and among these five layers the data of an upper layer can only be derived from a lower layer.
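The progressive relationship among the five layers can be sketched as a dependency check. The sketch reads the rule as "a table may only be derived from a strictly lower layer", which is an interpretation of the text rather than a stated definition; the layer labels and the function name are hypothetical, and the data dimension layer is left out because it sits outside the progression.

```python
# The five progressive layers, ordered from lowest to highest.
LAYERS = ["access", "basic", "intermediate", "theme", "application"]

def may_derive_from(target_layer, source_layer):
    """Allow a dependency only when the source layer is strictly below the target layer."""
    return LAYERS.index(source_layer) < LAYERS.index(target_layer)

allowed = may_derive_from("application", "basic")      # index built from a basic table
forbidden = may_derive_from("basic", "intermediate")   # lower layer reading a higher one
```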
The data warehouse provided by this optional implementation adopts a six-layer structure consisting of a data dimension layer, a data access layer, a basic data layer, an intermediate data layer, a data theme layer and a data application layer, and realizes storage and management from source data to data indexes by combining a plurality of service lines of a plurality of industries.
In some optional implementations of this embodiment, the data management module is further configured to: and receiving the metadata and mapping the metadata and the service data.
In this optional implementation, the metadata is data describing the service data, and may be data input into the data center system by a developer through a service foreground.
The data management module ensures that the data standards of the whole data center system can be put into practice and, besides mapping the metadata to the service data, can also manage the metadata.
Optionally, as shown in fig. 4, the above data management module 400 further includes: a buried point management submodule 401, a data warehouse management submodule 402 and a variable management submodule 403.
The buried point management submodule 401: the full buried point process involves a plurality of roles, such as business product, front-end and back-end development, testing, and data development, and requires multi-party cooperation. The buried point management submodule 401 therefore standardizes the work that each role is responsible for at each stage. It also facilitates the management, use and updating of buried points, and improves the quality and efficiency of data output.
The data warehouse management submodule 402: the data warehouse is divided into six layers, and each layer involves dozens or even hundreds of data tables. The data warehouse management submodule 402 manages the metadata of each table and records the lineage among the data tables, which reduces the cost of using the data and improves development efficiency.
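Recording which tables a given table was derived from can be sketched with a small registry. The class name, method names and table names below are hypothetical; the sketch only shows the idea of storing direct sources per table and walking them to find everything upstream.

```python
from collections import defaultdict

class LineageRegistry:
    """Records, for each table, the tables it was directly derived from."""

    def __init__(self):
        self._sources = defaultdict(set)

    def record(self, table, sources):
        """Register that `table` is built from the given source tables."""
        self._sources[table].update(sources)

    def upstream(self, table):
        """Return every table that `table` depends on, directly or indirectly."""
        seen = set()
        stack = list(self._sources.get(table, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._sources.get(current, ()))
        return seen

# Hypothetical tables spread across the warehouse layers.
registry = LineageRegistry()
registry.record("basic.orders", ["access.order_logs"])
registry.record("mid.orders_by_city", ["basic.orders", "dim.city"])
registry.record("theme.sales", ["mid.orders_by_city", "basic.orders"])
sales_upstream = registry.upstream("theme.sales")
```

A developer who plans to change `access.order_logs` could query the registry in the reverse direction in a fuller version; this sketch covers only the upstream walk.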
The variable management submodule 403: in the whole data circulation system, the production of data, its extraction, transformation, storage and warehousing, and its final visualization and use all run automatically. The variable management submodule 403 manages all variables of the data circulation system, for example adding, deleting, modifying and querying variables such as channel numbers and sources. This spares developers from repeatedly modifying code, reduces code redundancy, avoids introducing new problems, and further frees up research and development manpower.
By mapping the metadata to the service data, the data management module provided by this optional implementation enables developers to effectively determine conditions such as the distribution of data in the data center system, and ensures the reliability of data management in the data center system.
In some optional implementations of this embodiment, the business object includes: business products, engineering objects, policy objects; the application strategy of the data application module to the business product comprises the following steps: constructing and displaying a business product report based on the processing data and the business indexes; the application strategy of the data application module to the engineering object comprises the following steps: determining key indexes based on the service indexes, constructing a key index alarm rule, and extracting and displaying abnormal data through the alarm rule and the processing data; the application strategy of the strategy object by the data application module comprises the following steps: and constructing a small-flow test platform based on the processed data and the service indexes, and feeding back the test result of the test platform.
In this embodiment, the processing data is obtained after performing processing such as updating, deleting, or data cleaning on the access data packet, and the processing data may be the data of each layer in the data warehouse or the data finally obtained after processing by the data warehouse.
Optionally, applying the policy to the policy object may further include: and extracting data characteristic engineering based on the processed data and the service indexes to obtain service data characteristics, and verifying the rationality of the architecture of the data center system through the service data characteristics.
In this optional implementation, for the business product, a business BI (Business Intelligence) report can be constructed based on the processing data and the business indexes, and common indexes are displayed in a reasonable and visual manner, so that business personnel can learn the current state of the product in time. Based on time-ordered business data, the next development direction of the business can be analyzed, achieving the goal of data-driven business.
For the engineering object, key indexes can be monitored, and index abnormality alarms and abnormal index extraction can be established, which helps engineering personnel quickly find bugs and repair them in time, and lays a solid foundation for ensuring product quality.
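A key index alarm rule of the kind described for the engineering object might look like the following sketch. The threshold-based rule, the index names `error_rate` and `latency_ms`, and the sample rows are assumptions chosen for illustration; the disclosure does not specify how an alarm rule is expressed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical rule: a key index must stay within [lower, upper]."""

    index_name: str
    lower: float
    upper: float

    def violated_by(self, value: float) -> bool:
        return not (self.lower <= value <= self.upper)


def extract_abnormal(rows: List[Dict], rules: List[AlarmRule]) -> List[Dict]:
    """Return one alert per (row, rule) pair whose index value breaks the rule."""
    abnormal = []
    for row in rows:
        for rule in rules:
            value = row.get(rule.index_name)
            if value is not None and rule.violated_by(value):
                abnormal.append({"row": row, "rule": rule.index_name, "value": value})
    return abnormal


# Assumed thresholds and sample index values.
rules = [AlarmRule("error_rate", 0.0, 0.05), AlarmRule("latency_ms", 0.0, 500.0)]
rows = [
    {"ts": "2022-06-23T10:00", "error_rate": 0.01, "latency_ms": 120.0},
    {"ts": "2022-06-23T10:05", "error_rate": 0.09, "latency_ms": 130.0},
    {"ts": "2022-06-23T10:10", "error_rate": 0.02, "latency_ms": 900.0},
]
alerts = extract_abnormal(rows, rules)
```

Each alert carries the offending row, so the abnormal data can be displayed alongside the rule that fired.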
For the policy object, on the one hand, a small-flow experiment platform is constructed, through which experiment results can be fed back in time, thereby promoting policy iteration. On the other hand, data feature engineering extraction is performed to enrich the dimension data and help improve model accuracy.
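One common way to run a small-flow experiment is to route a small, stable percentage of users to the new policy and compare a metric between groups. The sketch below assumes hash-based bucketing, which the disclosure does not mention; the function names, the experiment name and the sample metric values are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def assign_group(user_id: str, experiment: str, traffic_percent: int) -> str:
    """Deterministically place a user in 'treatment' for roughly traffic_percent of users."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "treatment" if bucket < traffic_percent else "control"


def summarize(events: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Mean metric value per group, used as the fed-back experiment result."""
    totals: Dict[str, Tuple[float, int]] = {}
    for group, value in events:
        running_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (running_sum + value, count + 1)
    return {group: s / n for group, (s, n) in totals.items()}


# Assumed sample data: (group, metric value) pairs collected during the test.
result = summarize([("control", 1.0), ("control", 3.0), ("treatment", 4.0)])
```

Because the assignment depends only on the user identifier and the experiment name, the same user always sees the same policy for the duration of a test.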
The data application module provided by this optional implementation displays application strategies for different business objects. In the product direction, the efficiency and accuracy of data output are improved, multi-dimensional data analysis is supported, and guidance is provided for the positioning and development of the product. In the research and development efficiency direction, abnormal data monitoring is improved and problems are located quickly, which raises research and development efficiency and ensures product quality. In the strategy effect direction, the construction of an experiment management platform improves the iteration efficiency of strategy models.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information all comply with relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 5, the device 500 comprises a computing unit 501 which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 executes the respective methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus or data center system, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A data processing method, the method comprising:
receiving at least one preset type of service data related to at least one industry;
in response to a preset type of service data meeting the corresponding specification standard, obtaining specification data corresponding to that preset type of service data, and fusing the specification data corresponding to the at least one preset type of service data to obtain an access data packet;
storing the access data packet in a pre-constructed data warehouse, processing the access data packet to obtain processing data, and performing calculation and statistics on the processing data to obtain a service index of the at least one industry;
and displaying different application strategies for different business objects based on the processing data and the service index.
2. The method of claim 1, further comprising:
in response to a preset type of service data not meeting the corresponding specification standard, supplementing the service data of that type based on the attributes of the service data of that type.
3. The method of claim 1, wherein the service data comprises: log type data; and wherein the obtaining, in response to a preset type of service data meeting the corresponding specification standard, specification data corresponding to that preset type of service data, and fusing the specification data corresponding to the at least one preset type of service data to obtain an access data packet comprises:
in response to the log type data meeting a 5W standard, detecting whether the precision of the log type data is a preset granularity;
in response to the precision of the log type data being the preset granularity, detecting whether the log type data meets a transmission consistency rule;
in response to the log type data meeting the transmission consistency rule, obtaining specification data, and detecting whether data related to the specification data exists;
and in response to data related to the specification data existing, associating the data related to the specification data with the specification data to obtain the access data packet.
4. The method of claim 3, wherein the service data further comprises: dimension data; and wherein the obtaining, in response to a preset type of service data meeting the corresponding specification standard, specification data corresponding to that preset type of service data, and fusing the specification data corresponding to the at least one preset type of service data to obtain an access data packet further comprises:
determining the richness of the dimension data;
in response to the richness of the dimension data meeting a preset richness value, detecting whether the dimension data meets a preset format;
and in response to the dimension data meeting the preset format, using the dimension data as one of the specification data, and, when the dimension data and the log type data are related, making the dimension data and the log type data use the same key value.
5. The method of claim 4, wherein the obtaining, in response to a preset type of service data meeting the corresponding specification standard, specification data corresponding to that preset type of service data, and fusing the specification data corresponding to the at least one preset type of service data to obtain an access data packet further comprises:
recording the key values of the related dimension data in the log type data.
6. The method according to any one of claims 1-5, wherein the calculating and counting the processed data to obtain the business indicator of the at least one industry comprises:
extracting, converting and loading the processing data to obtain a basic data table;
performing feature extraction and dimension aggregation on the basic data table to obtain an intermediate table;
obtaining a business theme table of the at least one industry based on the basic data table and the intermediate table;
and calculating and displaying the business indexes of the at least one industry based on the basic data table, the intermediate table and the business theme table.
7. The method of claim 6, wherein the exposing different application policies to different business objects based on the processing data and the business indicators comprises:
constructing and displaying a business product report for a business product based on the processing data and the business index;
determining a key index based on the business index aiming at an engineering object, constructing a key index alarm rule, and extracting and displaying abnormal data through the alarm rule and the processing data;
and aiming at the strategy object, constructing a small flow test platform based on the processing data and the service index, and feeding back a test result of the test platform.
8. The method of claim 7, wherein the service data comprises: vehicle industry dimension data and individual behavior data; the access data comprises: vehicle appearance, vehicle type, price, and individual location; the processing data comprises: an association table of individuals at different locations and vehicle appearance, a vehicle and price association table, an individual and price association table, and a vehicle type association table; and the service indexes comprise: a service coverage distribution index and a service coverage trend index.
9. A data processing apparatus, the apparatus comprising:
a receiving unit configured to receive at least one preset type of service data related to at least one industry;
an association unit configured to, in response to a preset type of service data meeting the corresponding specification standard, obtain specification data corresponding to that preset type of service data, and fuse the specification data corresponding to the at least one preset type of service data to obtain an access data packet;
a computing unit configured to store the access data packet in a pre-constructed data warehouse, process the access data packet to obtain processing data, and perform calculation and statistics on the processing data to obtain a service index of the at least one industry;
and a display unit configured to display different application strategies for different business objects based on the processing data and the service index.
10. The apparatus of claim 9, the apparatus further comprising:
a supplementing unit configured to, in response to a preset type of service data not meeting the corresponding specification standard, supplement the service data of that type based on the attributes of the service data of that type.
11. The apparatus of claim 9, wherein the service data comprises: log type data; the association unit is further configured to: in response to the log type data meeting a 5W standard, detect whether the precision of the log type data is a preset granularity; in response to the precision of the log type data being the preset granularity, detect whether the log type data meets a transmission consistency rule; in response to the log type data meeting the transmission consistency rule, obtain specification data, and detect whether data related to the specification data exists; and in response to data related to the specification data existing, associate the data related to the specification data with the specification data to obtain the access data packet.
12. The apparatus of claim 11, wherein the service data further comprises: dimension data; the association unit is further configured to: determine the richness of the dimension data; in response to the richness of the dimension data meeting a preset richness value, detect whether the dimension data meets a preset format; and in response to the dimension data meeting the preset format, use the dimension data as one of the specification data, and, when the dimension data and the log type data are related, make the dimension data and the log type data use the same key value.
13. The apparatus of claim 12, wherein the association unit is further configured to: record the key values of the related dimension data in the log type data.
14. The apparatus according to one of claims 9-13, wherein the computing unit is further configured to:
extracting, converting and loading the processing data to obtain a basic data table;
performing feature extraction and dimension aggregation on the basic data table to obtain an intermediate table;
obtaining a business theme table of the at least one industry based on the basic data table and the intermediate table;
and calculating and displaying the business indexes of the at least one industry based on the basic data table, the intermediate table and the business theme table.
15. The apparatus of claim 14, wherein the display unit is further configured to: for a business product, construct and display a business product report based on the processing data and the business index; for an engineering object, determine a key index based on the business index, construct a key index alarm rule, and extract and display abnormal data through the alarm rule and the processing data; and for a policy object, construct a small-flow test platform based on the processing data and the business index, and feed back a test result of the test platform.
16. A data center system, the system comprising:
a data access module configured to receive at least one preset type of service data related to at least one industry, and, in response to a preset type of service data meeting the corresponding specification standard, obtain specification data corresponding to that preset type of service data, and fuse the specification data corresponding to the at least one preset type of service data to obtain an access data packet;
a data management module configured to store the access data packet in a pre-constructed data warehouse, process the access data packet to obtain processing data, and perform calculation and statistics on the processing data to obtain a service index of the at least one industry;
and a data application module configured to display different application strategies for different business objects based on the processing data and the service index.
17. The system of claim 16, wherein the service data comprises: log type data and dimension data;
the specification standards of the log type data include: whether a 5W standard is met, whether the precision is a preset granularity, whether the transmission meets a consistency rule, and the degree of association with the dimension data;
the specification standards of the dimension data include: whether a preset richness value is met, whether a preset format is met, and the degree of association with the log type data.
18. The system of claim 17, wherein the data warehouse comprises: a data dimension layer, a data access layer, a basic data layer, an intermediate data layer, a data theme layer and a data application layer; the data dimension layer is used for storing and updating the dimension data; the data access layer is used for storing and updating the access data packet output by the data access module;
the data management module further comprises a data warehouse management submodule for controlling the data warehouse, and the data warehouse management submodule extracts, converts and loads the data of the data access layer to obtain a basic data table and stores the basic data table into the basic data layer;
the data warehouse management submodule performs feature extraction and dimension aggregation on the basic data table to obtain an intermediate table, and stores the intermediate table into the intermediate data layer;
the data warehouse management submodule obtains a business theme table of the at least one industry based on the basic data table and the intermediate table, and stores the business theme table into the data theme layer;
and the data warehouse management submodule calculates and displays the service indexes of the at least one industry based on the basic data table, the intermediate table and the business theme table, and stores the service indexes into the data application layer.
19. The system of any of claims 16-18, wherein the data management module is further configured to: receive metadata and map the metadata to the service data.
20. The system of claim 16, wherein the business objects comprise: business products, engineering objects, and policy objects;
the application strategy of the data application module for a business product comprises: constructing and displaying a business product report based on the processing data and the business index;
the application strategy of the data application module for an engineering object comprises: determining key indexes based on the business index, constructing a key index alarm rule, and extracting and displaying abnormal data through the alarm rule and the processing data;
the application strategy of the data application module for a policy object comprises: constructing a small-flow test platform based on the processing data and the business index, and feeding back a test result of the test platform.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-8.
CN202210719522.1A 2022-06-23 2022-06-23 Data processing method and device and data center system Active CN114969161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210719522.1A CN114969161B (en) 2022-06-23 2022-06-23 Data processing method and device and data center system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210719522.1A CN114969161B (en) 2022-06-23 2022-06-23 Data processing method and device and data center system

Publications (2)

Publication Number Publication Date
CN114969161A true CN114969161A (en) 2022-08-30
CN114969161B CN114969161B (en) 2023-09-08

Family

ID=82966156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210719522.1A Active CN114969161B (en) 2022-06-23 2022-06-23 Data processing method and device and data center system

Country Status (1)

Country Link
CN (1) CN114969161B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794827A (en) * 2022-11-29 2023-03-14 广发银行股份有限公司 Data table structure management system and method
CN116226894A (en) * 2023-05-10 2023-06-06 杭州比智科技有限公司 Data security treatment system and method based on meta bin

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631004A (en) * 2015-12-28 2016-06-01 华夏银行股份有限公司 Data processing method and system
WO2020143298A1 (en) * 2019-01-10 2020-07-16 华为技术有限公司 Method, device, and system for implementing service continuity
CN113392646A (en) * 2021-07-07 2021-09-14 上海软中信息技术有限公司 Data relay system, construction method and device
CN113407649A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Data warehouse modeling method and device, electronic equipment and storage medium
CN113987086A (en) * 2021-10-26 2022-01-28 北京百度网讯科技有限公司 Data processing method, data processing device, electronic device, and storage medium

Also Published As

Publication number Publication date
CN114969161B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
CN114969161B (en) Data processing method and device and data center system
CN106570778A (en) Big data-based data integration and line loss analysis and calculation method
CN113872813B (en) Full life cycle management method and system for carrier communication equipment
CN108170832B (en) Monitoring system and monitoring method for heterogeneous database of industrial big data
Ding et al. Massive heterogeneous sensor data management in the Internet of Things
CN112269885A (en) Method, apparatus, device and storage medium for processing data
CN109615172A (en) A kind of method and terminal handling examination data
CN112052134A (en) Service data monitoring method and device
CN112561332A (en) Model management method, model management apparatus, electronic device, storage medium, and program product
CN115457211A (en) Transformer substation management method and system based on digital twins
CN115574867A (en) Mutual inductor fault detection method and device, electronic equipment and storage medium
CN113987086A (en) Data processing method, data processing device, electronic device, and storage medium
CN114756301B (en) Log processing method, device and system
CN116823570A (en) Government work data processing method and device, electronic equipment and storage medium
CN116955856A (en) Information display method, device, electronic equipment and storage medium
CN113407587B (en) Data processing method, device and equipment for online analysis processing engine
CN115767601A (en) 5GC network element automatic nanotube method and device based on multidimensional data
CN114049036A (en) Data computing platform, method, device and storage medium
CN112929198B (en) Local hotspot processing method and device, electronic equipment and storage medium
CN113722141A (en) Method and device for determining delay reason of data task, electronic equipment and medium
CN112817938A (en) General data service construction method and system based on data productization
CN112784129A (en) Pump station equipment operation and maintenance data supervision platform
CN115809256B (en) Public security management integrated information system and visual display method
CN113408920B (en) Migration mode determining method, migration mode determining device, migration mode determining equipment and storage medium
CN115017875B (en) Enterprise information processing method, device, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant