CN112416945A - Data processing method and system based on big data platform and computer equipment - Google Patents

Data processing method and system based on big data platform and computer equipment

Info

Publication number
CN112416945A
Authority
CN
China
Prior art keywords
data
user
data structure
platform
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011418673.0A
Other languages
Chinese (zh)
Inventor
李仁旺
刘俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enyike Beijing Data Technology Co ltd
Original Assignee
Enyike Beijing Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enyike Beijing Data Technology Co ltd filed Critical Enyike Beijing Data Technology Co ltd
Priority to CN202011418673.0A priority Critical patent/CN112416945A/en
Publication of CN112416945A publication Critical patent/CN112416945A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a data processing method, system and computer device based on a big data platform. The processing method comprises the following steps: a data structure acquisition step, in which a unique field is identified in user attribute data and/or user behavior data and the data is arranged into a preset data structure; a data processing step, in which the data structure is sent to a message queue, and the message queue processes the data structure and stores it in a corresponding data table of a storage layer; and a data operation step, in which a query tool operates on the data table through a data query and analysis platform. By constraining the data structure at storage time, the application solves the problem that correlation analysis cannot be performed on user attribute data and event attribute data, so that user attributes and event attributes can be associated in big data processing; it also reduces the complexity of the data structure, relieves the burden on development and operation and maintenance personnel, and facilitates data verification.

Description

Data processing method and system based on big data platform and computer equipment
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data processing method, system and computer device based on a big data platform.
Background
With the advance of technology, intelligence and digitization, more and more user behaviors can be collected and digitally recorded, for example click behavior, purchase behavior, social behavior and entertainment behavior. As more behaviors are recorded, the data volume grows ever larger. Behavior data changes continuously over time and cannot be predicted, whereas basic user attributes, such as the user's gender, birthday and mobile phone number, rarely change. If both types of data are recorded in full every time, the data becomes duplicated and redundant, and as the number of users increases the data grows explosively, which puts great pressure on later use and querying of the data. A scheme that simplifies storage is needed to relieve this pressure.
In the prior art, a data model is selected from two directions. The first considers the application scenario: current big data application scenarios can be classified into OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing). The former mainly targets data analysis and emphasizes timeliness, while the latter mainly targets business transaction processing and emphasizes serviceability. The second considers the data volume, which is determined by the business; because the business is uncertain, the data model is ultimately difficult to select.
The prior art has the following defects: firstly, the stored data structure is not considered and analyzed; and secondly, the correlation analysis of the user attribute and the event attribute cannot be carried out.
At present, no effective solution is provided for the problem that the correlation analysis of the user attribute and the event attribute cannot be performed in the related technology.
Disclosure of Invention
The embodiments of the application provide a data processing method, system and computer device based on a big data platform, applied to a model in which user attributes and event attributes are associated in big data storage. By constraining the data structure at storage time, subsequent analysis and queries can conveniently be carried out through the association relationship.
In a first aspect, an embodiment of the present application provides a data processing method based on a big data platform, including the following steps:
a data structure acquisition step, wherein unique field identification is carried out on user attribute data or/and user behavior data, and the unique field identification is arranged into a preset data structure;
a data processing step, namely sending the data structure to a message queue, wherein the message queue processes the data structure and stores the data structure into a corresponding data table of a storage layer;
and a data operation step, namely, operating the data table by adopting a query tool through a data query and analysis platform.
In some embodiments, the data structure obtaining step specifically includes:
a data acquisition step, wherein the user attribute data or/and the user behavior data are acquired from a data source;
a field identification step, in which a preset identification rule is applied to the USER field in the user attribute data and/or the user behavior data, and an identification result is output;
and a structure arrangement step, in which the data is arranged into a preset JSON data structure according to the identification result.
In some embodiments, the USER field specifically includes a visitor ID, an account ID and a user ID, wherein:
the user ID is generated according to either or both of the account ID and the visitor ID;
the three correspond to one another one to one, and their priority from high to low is: user ID > account ID > visitor ID.
In some embodiments, when the user attribute data and/or the user behavior data contain a new account ID, binding with the current visitor ID is attempted. When the visitor ID exists and is not bound to another account ID, the binding succeeds, and the data identified by the account ID and by the visitor ID use the same user ID. When the visitor ID does not exist or is already bound to another account ID, the binding fails and is not attempted again, and the unbound ID is NULL.
In some embodiments, the data processing step specifically includes:
a structure processing step, wherein the message queue performs ETL processing on the data structure according to a corresponding business logic rule by reading and analyzing a configuration file of a configuration center;
and a data entry step, wherein the message queue creates a corresponding data table in the storage layer and stores the processed data structure into the data table.
In some embodiments, in the step of entering the data into the table, when the unique field already exists in the corresponding data table, the data structure is directly stored in the existing data table.
In a second aspect, an embodiment of the present application provides a data processing system based on a big data platform, where the data processing method according to the first aspect is applied, and includes:
the structure acquisition module is used for carrying out unique field identification on the user attribute data or/and the user behavior data and sorting the unique field identification into a preset data structure;
the data processing module is used for sending the data structure to a message queue, and the message queue is used for processing the data structure and storing the data structure into a corresponding data table of a storage layer;
and the data operation module is used for operating the data table by adopting a query tool through a data query and analysis platform.
In some of these embodiments, the structure acquisition module comprises:
the data acquisition unit is used for acquiring the user attribute data or/and the user behavior data from a data source;
the field identification unit is used for applying a preset identification rule to the USER field in the user attribute data and/or the user behavior data and outputting an identification result;
and the structure arrangement unit is used for arranging the data into a preset JSON data structure according to the identification result.
In some of these embodiments, the data processing module comprises:
the structure processing unit is used for reading and analyzing a configuration file of a configuration center by the message queue and carrying out ETL processing on the data structure according to a corresponding business logic rule;
and the data entry unit, in which the message queue creates a corresponding data table in the storage layer and stores the processed data structure in the data table.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the data processing method based on a big data platform as described in the first aspect is implemented.
Compared with the related art, the data processing method based on a big data platform provided by the embodiments of the application constrains the data structure at storage time. This solves the problem that correlation analysis cannot be performed on user attribute data and event attribute data, so that user attributes and event attributes can be associated in big data processing; it also reduces the complexity of the data structure, relieves the burden on development and operation and maintenance personnel, and facilitates data verification.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a big data platform based data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of data structure acquisition steps according to an embodiment of the present application;
FIG. 3 is a flow chart of data processing steps according to an embodiment of the present application;
FIG. 4 is a flow chart of a data processing method according to a preferred embodiment of the present application;
FIG. 5 is a block diagram of a big data platform based data processing system according to an embodiment of the present application;
FIG. 6 is a hardware structure diagram of a computer device according to an embodiment of the present application.
Description of reference numerals:
1. a structure acquisition module; 2. a data processing module; 3. a data operation module;
11. a data acquisition unit; 12. a field identification unit; 13. a structure arrangement unit;
21. a structure processing unit; 22. a data entry unit; 81. a processor;
83. a communication interface; 80. a bus; 82. a memory.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The data structure comprises a plurality of data properties, including Event, User and data rules. An Event represents one meaningful behavior, or a series of meaningful behaviors, of a user, such as adding a commodity to a shopping cart or browsing a video. An Event mainly comprises two parts of information. One part describes how the behavior occurred: the name of the behavior (What), the user who generated the behavior (Who) and the time at which it occurred (When). The other part describes the properties of the behavior, such as the name of the video watched or the amount paid in a payment event. User describes the status and fixed attributes of each user, such as the user's ID, registration time or cumulative payment amount; the most important of these is the ID, which uniquely identifies the user. The data rules organize and constrain the data, finally forming the structure and style required by the model.
Presto is a distributed SQL query engine open-sourced by Facebook. It is suitable for interactive analytical queries and supports data volumes from gigabytes to petabytes. The architecture of Presto evolved from that of relational databases. Presto stands out among in-memory computing databases in the following respects:
(1) A clear architecture: it is a system that can operate independently, without depending on any other external system. For scheduling, for example, Presto itself monitors the cluster, and scheduling can be completed according to the monitoring information.
(2) Simple data structures: columnar storage with logical rows, into which most data can be easily transformed.
(3) Rich plug-in interfaces, which allow external storage systems to be connected smoothly and custom functions to be added.
The embodiment provides a data processing method based on a big data platform. Fig. 1 is a flowchart of a data processing method based on a big data platform according to an embodiment of the present application, and as shown in fig. 1, the flowchart includes the following steps:
a data structure obtaining step S1, wherein unique field identification is carried out on the user attribute data or/and the user behavior data, and the unique field identification is arranged into a preset data structure;
a data processing step S2, sending the data structure to a message queue, where the message queue processes the data structure and stores it in a corresponding data table of a storage layer;
and a data operation step S3, wherein a query tool is used to operate on the data table through a data query and analysis platform.
Through the steps, a data processing method for a model comprising user attributes and event attributes is provided, unique field identification is carried out on user attribute data or/and user behavior data, and the user attribute data or/and the user behavior data are arranged into a preset data structure; the complexity of a data structure is reduced, the pressure of development and operation and maintenance personnel is reduced, and data verification is facilitated. By storing the USER attributes and event attributes of the same USER field in the same data table, the storage pressure is relieved. The invention adopts the json data structure, reduces the pressure of data transmission, is easy to expand, is convenient for the operation of users and reduces the difficulty of the developers.
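The three steps S1 to S3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the application: the in-process `queue.Queue` stands in for the message queue, a dictionary of lists stands in for the storage layer, and the function names, the raw input layout and the choice to name tables after the `#type` value are all hypothetical.

```python
import json
import queue
from collections import defaultdict


def acquire_structure(raw: dict) -> str:
    """Step S1 (sketch): identify the unique field and arrange the preset structure."""
    kind = "track" if "event_name" in raw else "user"
    record = {"#user_id": raw["user_id"], "#type": kind, "#time": raw["time"]}
    if kind == "track":
        record["#event_name"] = raw["event_name"]
    record["properties"] = raw.get("properties", {})
    return json.dumps(record)


def process(mq: queue.Queue, storage: dict) -> None:
    """Step S2 (sketch): drain the queue and store each record in its table."""
    while not mq.empty():
        record = json.loads(mq.get())
        # Assumption: one table per #type value.
        storage[record["#type"]].append(record)


def operate(storage: dict, user_id: int) -> list:
    """Step S3 (sketch): a stand-in query, listing the events of one user."""
    return [r["#event_name"] for r in storage["track"] if r["#user_id"] == user_id]


# Hypothetical raw inputs: one user attribute record and one behavior record.
raw_inputs = [
    {"user_id": 1, "time": "2020-12-07 10:00:00", "properties": {"gender": "F"}},
    {"user_id": 1, "time": "2020-12-07 10:05:00", "event_name": "add_to_cart",
     "properties": {"sku": "X-100"}},
]

mq = queue.Queue()
storage = defaultdict(list)
for raw in raw_inputs:
    mq.put(acquire_structure(raw))
process(mq, storage)
events_for_user_1 = operate(storage, 1)
```

In this sketch the user attribute record and the event record stay linked only through the shared `#user_id` value, which is the association the method relies on.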
In practical applications, the data structure comprises system preset fields and attributes. The system preset fields comprise a unique field, a distinguishing field and a content field, and the content field comprises the attributes contained in the data information.
The data structure is divided into a user attribute data structure and an event attribute data structure, each comprising system attributes and personal attributes. The system attributes of both data structures are a plurality of system preset fields, comprising an ID field, a distinguishing field (#type), a time field (#time) and #uuid; the system attributes of the event attribute data structure additionally comprise an event field (#event_name). The personal attributes (properties) comprise preset user or event attributes.
The data structure is mainly JSON data, organized by line: one line of JSON data corresponds to one piece of data in the physical sense and, in the data sense, to one behavior generated by a user or one setting of a user attribute. Adopting a JSON data structure reduces the pressure of data transmission and is easy to extend; it is also convenient for users to operate and reduces the difficulty for developers.
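As a hedged illustration of the line-oriented JSON structure described above, the following sketch builds one user attribute line and one event attribute line. The helper name `make_line`, the use of `#user_id` as the ID field, the `user` and `track` values for `#type`, the timestamp format and the sample property names are assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone


def make_line(user_id, record_type, properties, event_name=None):
    """Build one line of JSON: system fields first, then the properties object."""
    record = {
        "#user_id": user_id,            # assumed name of the ID field
        "#type": record_type,           # distinguishing field
        "#time": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
        "#uuid": str(uuid.uuid4()),
    }
    if record_type == "track":
        record["#event_name"] = event_name  # only event records carry this field
    record["properties"] = properties
    # json.dumps without indent emits a single line, matching one record per line.
    return json.dumps(record, ensure_ascii=False)


user_line = make_line(1, "user", {"gender": "F", "register_time": "2020-01-01"})
event_line = make_line(1, "track", {"video_name": "intro", "duration_s": 42},
                       event_name="view_video")
```

Appending such lines to a file or sending them one by one to a queue keeps each record self-contained, so new attributes can be added inside `properties` without changing the system fields.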
See table below for data structures:
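Table 1: User attribute data structure example
[Table 1 appears as images in the original publication: Figure BDA0002821244960000061, Figure BDA0002821244960000071]
Table 2: Event attribute data structure
[Table 2 appears as an image in the original publication: Figure BDA0002821244960000072]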
It should be noted that the line breaks in the above example data are only for convenience of display; the real data contains no line breaks.
The fields at the same level as properties describe the basic information of the piece of data. The value of the #type field distinguishes user attributes from event attributes: it is user for user attribute data and track for event attribute data. The only field through which user attributes and event attributes are associated and bound is #user_id.
The fields inside properties constitute the content of the piece of data, namely the attributes of the event or the user attributes to be set, and are used directly as attributes or analysis objects during analysis.
It should be noted that fields at the same level as properties must begin with "#"; any other attribute that begins with "#" must be placed inside properties.
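These structural rules lend themselves to a simple check before a record is sent on. The validator below is a minimal sketch, assuming that every top-level key other than `properties` must start with "#", that `#type` is either `user` or `track`, and that event records must carry `#event_name`; the function name and the error messages are hypothetical.

```python
import json

ALLOWED_TYPES = {"user", "track"}


def validate_line(line: str) -> list:
    """Return a list of rule violations for one JSON line (empty list means valid)."""
    record = json.loads(line)
    errors = []
    for key in record:
        # Fields at the same level as properties must begin with "#".
        if key != "properties" and not key.startswith("#"):
            errors.append(f"top-level field {key!r} must start with '#'")
    record_type = record.get("#type")
    if record_type not in ALLOWED_TYPES:
        errors.append("#type must be 'user' or 'track'")
    if "#user_id" not in record:
        errors.append("missing #user_id")
    if record_type == "track" and "#event_name" not in record:
        errors.append("event records need #event_name")
    if not isinstance(record.get("properties"), dict):
        errors.append("properties must be a JSON object")
    return errors


good = ('{"#user_id": 1, "#type": "track", "#event_name": "pay", '
        '"properties": {"amount": 30, "#ip": "10.0.0.1"}}')
bad = '{"#user_id": 1, "#type": "track", "channel": "app", "properties": {}}'

good_errors = validate_line(good)
bad_errors = validate_line(bad)
```

Here `good` passes even though `#ip` starts with "#", because that attribute sits inside `properties`; `bad` is rejected for the bare `channel` key and the missing `#event_name`.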
Fig. 2 is a flowchart of a data structure obtaining step according to an embodiment of the present application, and as shown in fig. 2, in some embodiments, the data structure obtaining step S1 specifically includes:
a data acquisition step S11, acquiring user attribute data or/and user behavior data from a data source;
a field identification step S12, in which a preset identification rule is applied to the USER field in the user attribute data and/or the user behavior data, and an identification result is output;
and a structure arrangement step S13, in which the data is arranged into a preset JSON data structure according to the identification result.
In some embodiments, the USER field specifically includes a visitor ID (DISTINCT_ID), an account ID (ACCOUNT_ID) and a user ID (USER_ID), wherein the user ID is generated according to either or both of the account ID and the visitor ID;
the three correspond to one another one to one, and their priority from high to low is: user ID > account ID > visitor ID. In practical applications, the USER_ID is generated by processing the ACCOUNT_ID and DISTINCT_ID data supplied by the upstream system; it can be regarded as a single original ID, and a natural person can be found based on it. The ACCOUNT_ID is the account ID and serves as the user's login ID; the DISTINCT_ID is the visitor ID and serves as the user's ID in the non-logged-in state.
The user ID, the account ID and the visitor ID are in one-to-one correspondence, and one-to-many binding does not occur. The following table is an ID binding example; rows with the same #user_id refer to the same natural person.
Table 3: ID binding examples
#distinct_id    #account_id     #user_id
A               First of all    1
A               Second step     2
B               Second step     2
B               null            2
A               null            1
B               C3              3
In some embodiments, when the obtained user attribute data and/or user behavior data contain a new account ID, binding with the current visitor ID is attempted. When the visitor ID exists and is not bound to another account ID, the binding succeeds, and the data identified by the account ID and by the visitor ID are stored in the data table corresponding to the same user ID. When the visitor ID does not exist or is already bound to another account ID, the binding fails and is not attempted again, and the unbound ID is NULL.
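The binding rule can be sketched as a small resolver that reproduces the #user_id column of Table 3. This is an interpretation, not the application's own algorithm: the class and method names are hypothetical, the account labels `acct_1`, `acct_2` and `acct_3` are stand-ins for the three account IDs in the table, user IDs are assumed to be assigned as consecutive integers, and the handling of an already known but still unbound account ID meeting an unbound visitor ID (third row of Table 3) is inferred from the table.

```python
from typing import Optional


class IdBinder:
    """Sketch of one-to-one binding between visitor IDs and account IDs."""

    def __init__(self):
        self._next_user_id = 1
        self.visitor_to_user = {}     # visitor ID -> user ID
        self.account_to_user = {}     # account ID -> user ID
        self.visitor_to_account = {}  # visitor ID -> bound account ID
        self.account_to_visitor = {}  # account ID -> bound visitor ID

    def _new_user_id(self) -> int:
        user_id = self._next_user_id
        self._next_user_id += 1
        return user_id

    def resolve(self, visitor_id: Optional[str], account_id: Optional[str]) -> int:
        if account_id is None:
            if visitor_id is None:
                raise ValueError("at least one of the two IDs is required")
            # Not logged in: the visitor ID alone determines the user ID.
            if visitor_id not in self.visitor_to_user:
                self.visitor_to_user[visitor_id] = self._new_user_id()
            return self.visitor_to_user[visitor_id]

        user_id = self.account_to_user.get(account_id)
        can_bind = (
            visitor_id is not None
            and visitor_id not in self.visitor_to_account
            and account_id not in self.account_to_visitor
        )
        if can_bind:
            # Binding succeeds: both IDs share one user ID (account takes priority).
            if user_id is None:
                user_id = self.visitor_to_user.get(visitor_id)
            if user_id is None:
                user_id = self._new_user_id()
            self.visitor_to_account[visitor_id] = account_id
            self.account_to_visitor[account_id] = visitor_id
            self.visitor_to_user[visitor_id] = user_id
        elif user_id is None:
            # Binding fails: the new account ID gets its own user ID.
            user_id = self._new_user_id()
        self.account_to_user[account_id] = user_id
        return user_id


binder = IdBinder()
rows = [("A", "acct_1"), ("A", "acct_2"), ("B", "acct_2"),
        ("B", None), ("A", None), ("B", "acct_3")]
user_ids = [binder.resolve(visitor, account) for visitor, account in rows]
```

One simplification worth noting: if a visitor ID that already has an anonymous user ID is later bound to an account that already has a different user ID, this sketch simply lets the account's user ID win and does not merge the earlier records.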
Fig. 3 is a flowchart of data processing steps according to an embodiment of the present application, and as shown in fig. 3, in some embodiments, the data processing step S2 specifically includes:
a structure processing step S21, wherein the message queue performs ETL processing on the data structure according to the corresponding business logic rule by reading and analyzing a configuration file of a configuration center;
and a data entering step S22, wherein the message queue creates a corresponding data table in the storage layer and stores the processed data structure into the data table.
In some embodiments, in the step of entering the data into the table, when the unique field already exists in the corresponding table, the data structure is directly stored in the existing data table.
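Steps S21 and S22 can be illustrated with a configuration-driven transform followed by table entry. The sketch below makes several assumptions: an embedded JSON string stands in for the configuration file of the configuration center, the two rules (dropping and renaming properties) are invented examples of business logic rules, and an in-memory SQLite database stands in for the storage layer. `CREATE TABLE IF NOT EXISTS` approximates the behavior of reusing a table that already exists.

```python
import json
import sqlite3

# Hypothetical configuration; a real deployment would read it from the configuration center.
CONFIG = json.loads("""
{
  "tables": {"user": "user_attributes", "track": "event_attributes"},
  "drop_properties": ["debug"],
  "rename_properties": {"amt": "amount"}
}
""")


def etl(record: dict, config: dict) -> dict:
    """Step S21 (sketch): apply the configured rules to the properties object."""
    props = {}
    for key, value in record["properties"].items():
        if key in config["drop_properties"]:
            continue
        props[config["rename_properties"].get(key, key)] = value
    return {**record, "properties": props}


def enter_table(conn: sqlite3.Connection, record: dict, config: dict) -> None:
    """Step S22 (sketch): create the table if needed, then store the record."""
    table = config["tables"][record["#type"]]  # table names come from trusted config
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(user_id INTEGER, time TEXT, event_name TEXT, properties TEXT)"
    )
    conn.execute(
        f'INSERT INTO "{table}" VALUES (?, ?, ?, ?)',
        (record["#user_id"], record["#time"], record.get("#event_name"),
         json.dumps(record["properties"])),
    )


lines = [
    '{"#user_id": 1, "#type": "user", "#time": "2020-12-07 10:00:00", '
    '"properties": {"gender": "F"}}',
    '{"#user_id": 1, "#type": "track", "#time": "2020-12-07 10:05:00", '
    '"#event_name": "pay", "properties": {"amt": 30, "debug": true}}',
]

conn = sqlite3.connect(":memory:")
for line in lines:
    enter_table(conn, etl(json.loads(line), CONFIG), CONFIG)

event_rows = conn.execute(
    "SELECT user_id, event_name, properties FROM event_attributes"
).fetchall()
user_count = conn.execute("SELECT COUNT(*) FROM user_attributes").fetchone()[0]
```

Keeping the rules in configuration rather than in code means a changed business rule only requires a new configuration file, which matches the role the text gives to the configuration center.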
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
Fig. 4 is a flow chart of a data processing method according to a preferred embodiment of the present application.
In step S301, the data to be stored and analyzed is divided into two types: user attribute data and user behavior data, i.e., event attribute data. The two types of data are associated and bound through "#user_id". The data is arranged into a form conforming to the data structure model and sent to Sync_log (the message queue).
In step S302, Sync_log reads and parses the configuration file of the configuration center and performs ETL processing on the data from the data source according to the corresponding business logic rules; at the same time, a corresponding table is created in Kudu (the storage layer), and the result data is written to storage after processing is finished.
In step S303, after the data has been written to storage, services are provided externally through the corresponding data query and analysis platform, with Presto as the query tool, for example to a report system, a dashboard display or other third-party platforms.
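The kind of association query that the query step enables can be sketched as a join between user attributes and event attributes on the shared user ID. The example below is an assumption-laden stand-in: it uses an in-memory SQLite database instead of Presto over Kudu, and the table names, column names and sample rows are hypothetical. The SQL itself uses only standard constructs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_attributes (user_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE event_attributes (user_id INTEGER, event_name TEXT, amount REAL);
INSERT INTO user_attributes VALUES (1, 'F'), (2, 'M');
INSERT INTO event_attributes VALUES
    (1, 'pay', 30.0), (1, 'pay', 12.5), (2, 'pay', 8.0), (2, 'view_video', NULL);
""")

# Correlate an event attribute (payment amount) with a user attribute (gender).
QUERY = """
SELECT u.gender, COUNT(*) AS payments, SUM(e.amount) AS total_amount
FROM event_attributes AS e
JOIN user_attributes AS u ON u.user_id = e.user_id
WHERE e.event_name = 'pay'
GROUP BY u.gender
ORDER BY u.gender
"""

report = conn.execute(QUERY).fetchall()
```

Because every event row carries the user ID, the user attributes are stored once per user and joined at query time rather than being repeated on each event.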
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The present embodiment also provides a data processing system based on a big data platform, where the system is used to implement the foregoing embodiments and preferred embodiments, and the description of the system that has been already made is omitted. As used hereinafter, the terms "module," "unit," "subunit," and the like may implement a combination of software and/or hardware for a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 5 is a block diagram of a data processing system based on a big data platform according to an embodiment of the present application, and as shown in fig. 5, the system includes:
the structure acquisition module 1, which identifies the unique field in the user attribute data and/or the user behavior data and arranges the data into the preset data structure;
the data processing module 2 is used for sending the data structure to a message queue, and the message queue processes the data structure and stores the data structure into a corresponding data table of a storage layer;
and the data operation module 3 adopts a query tool to operate the data table through the data query and analysis platform.
In some of these embodiments, the structure acquisition module 1 comprises:
the data acquisition unit 11 is used for acquiring the user attribute data or/and the user behavior data from a data source;
the field identification unit 12, which applies a preset identification rule to the USER field in the user attribute data and/or the user behavior data and outputs an identification result;
and the structure arrangement unit 13, which arranges the data into a preset JSON data structure according to the identification result.
In some of these embodiments, the data processing module 2 comprises:
a structure processing unit 21, in which the message queue reads and parses a configuration file of a configuration center and performs ETL processing on the data structure according to the corresponding business logic rules;
and a data entry unit 22, in which the message queue creates a corresponding data table in the storage layer and stores the processed data structure in the data table.
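A minimal sketch of these two units follows, under stated assumptions: Python's in-process `queue.Queue` stands in for the message queue, an in-memory SQLite database stands in for the storage layer, and the configuration keys (`table`, `rename`, `required`) are hypothetical. None of these choices come from the application.

```python
import json
import queue
import sqlite3

# Hypothetical configuration, standing in for a file read from the configuration center.
CONFIG = json.loads(
    '{"table": "user_events", "rename": {"uid": "user_id"}, "required": ["user_id", "event"]}'
)


def etl(message, config):
    """Structure processing (unit 21): parse, rename fields, and validate per the config rules."""
    record = json.loads(message)
    for old, new in config["rename"].items():
        if old in record:
            record[new] = record.pop(old)
    if any(record.get(key) is None for key in config["required"]):
        return None  # reject records that miss a required field
    return {key: record[key] for key in config["required"]}


def enter_into_table(conn, config, row):
    """Data entry (unit 22): create the table if it does not exist, then store the row."""
    table = config["table"]  # trusted config value; never build SQL from user input
    columns = config["required"]
    column_defs = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        [row[c] for c in columns],
    )
    conn.commit()


def drain(mq, conn, config):
    """Process every queued message and return the number of rows stored."""
    stored = 0
    while True:
        try:
            message = mq.get_nowait()
        except queue.Empty:
            return stored
        row = etl(message, config)
        if row is not None:
            enter_into_table(conn, config, row)
            stored += 1
```

The stored rows can then be read back with an ordinary query, for example `conn.execute("SELECT user_id, event FROM user_events")`, which loosely mirrors the role of the data operation module.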
The above modules may be functional modules or program modules, and may be implemented in software or in hardware. When implemented in hardware, the modules may all be located in the same processor, or they may be distributed across different processors in any combination.
In addition, the data processing method described in the embodiment of the present application with reference to fig. 1 may be implemented by a computer device. Fig. 6 is a hardware structure diagram of a computer device according to an embodiment of the present application.
The computer device may comprise a processor 81 and a memory 82 in which computer program instructions are stored.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 82 may include mass storage for data or instructions. By way of example, and not limitation, the memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 82 may include removable or non-removable (fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is non-volatile memory. In particular embodiments, the memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO DRAM), Synchronous DRAM (SDRAM), or the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 realizes any one of the data processing methods in the above-described embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 6, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used to implement communication among the modules, devices, units, and/or equipment in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 80 includes hardware, software, or both, and couples the components of the computer device to each other. The bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. The bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable bus or interconnect is contemplated by the application.
The computer device may execute the identification rule in the embodiment of the present application based on the acquired user attribute data and event attribute data, thereby implementing the data processing method described in conjunction with fig. 1.
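One possible reading of such an identification rule, following the visitor ID and account ID binding recited in claims 3 and 4, is sketched below. The class name, the use of random UUIDs as user IDs, and the treatment of "the visitor ID exists" as "a visitor ID is present in the record" are all assumptions of this sketch and are not stated in the application.

```python
import uuid


class IdentityResolver:
    """Hypothetical resolver mapping visitor IDs and account IDs to a shared user ID."""

    def __init__(self):
        self._user_id = {}        # ("visitor" | "account", id) -> user ID
        self._bound_account = {}  # visitor ID -> account ID bound to it

    def _get_or_create(self, kind, value):
        key = (kind, value)
        if key not in self._user_id:
            self._user_id[key] = uuid.uuid4().hex
        return self._user_id[key]

    def resolve(self, visitor_id=None, account_id=None):
        """Return (user_id, visitor_id, account_id); an unbound ID is returned as None (NULL)."""
        if visitor_id is None and account_id is None:
            raise ValueError("at least one of visitor_id or account_id is required")
        if account_id is None:
            # Visitor-only data: the user ID is generated from the visitor ID.
            return self._get_or_create("visitor", visitor_id), visitor_id, None
        if ("account", account_id) not in self._user_id:
            # New account ID: attempt to bind it to the current visitor ID.
            if visitor_id is not None and visitor_id not in self._bound_account:
                shared = self._get_or_create("visitor", visitor_id)
                self._user_id[("account", account_id)] = shared
                self._bound_account[visitor_id] = account_id
                return shared, visitor_id, account_id
            # Binding failed: the account gets its own user ID and the visitor ID stays NULL.
            return self._get_or_create("account", account_id), None, account_id
        # Known account ID: no new binding is attempted.
        user_id = self._user_id[("account", account_id)]
        bound = visitor_id if self._bound_account.get(visitor_id) == account_id else None
        return user_id, bound, account_id
```

In this sketch a visitor who later logs in keeps the same user ID, while a second account seen on an already bound visitor ID is identified by its account ID alone.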
In addition, in combination with the data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium for implementing the method. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any of the data processing methods in the above embodiments.
The technical features of the embodiments described above may be combined arbitrarily. For the sake of brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of the present specification.
Depending on the actual usage scenario and business, the message queue Sync_log may be replaced or further wrapped by the user according to business requirements, provided that the function of the message queue is still fulfilled.
The results of processing the data structure may also be stored in another database, such as Hive.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
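One way to keep the message queue replaceable, as the paragraph above allows, is to have the rest of the pipeline depend only on a small interface. The sketch below is an assumption-laden illustration in Python: the `MessageQueue` protocol, its `send`/`receive` methods, and the `InMemoryQueue` stand-in are hypothetical and do not describe the interface of Sync_log.

```python
import queue
from typing import Callable, Optional, Protocol


class MessageQueue(Protocol):
    """Hypothetical minimal contract that any replacement queue would have to satisfy."""

    def send(self, message: str) -> None: ...

    def receive(self) -> Optional[str]: ...


class InMemoryQueue:
    """Stand-in implementation backed by the standard library's thread-safe FIFO queue."""

    def __init__(self) -> None:
        self._q: "queue.Queue[str]" = queue.Queue()

    def send(self, message: str) -> None:
        self._q.put(message)

    def receive(self) -> Optional[str]:
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None  # nothing left to consume


def forward_all(source: MessageQueue, handler: Callable[[str], None]) -> int:
    """Pass every pending message to the handler and return how many were handled."""
    count = 0
    while (message := source.receive()) is not None:
        handler(message)
        count += 1
    return count
```

Because `forward_all` depends only on the protocol, a wrapper around another queue product could be swapped in without changing the processing code.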

Claims (10)

1. A data processing method based on a big data platform, characterized by comprising the following steps:
a data structure acquisition step, in which unique-field identification is performed on user attribute data and/or user behavior data, and the identified data is organized into a preset data structure;
a data processing step, in which the data structure is sent to a message queue, and the message queue processes the data structure and stores it in a corresponding data table of a storage layer;
and a data operation step, in which the data table is operated on with a query tool through a data query and analysis platform.
2. The big data platform-based data processing method according to claim 1, wherein the data structure acquisition step specifically comprises:
a data acquisition step, in which the user attribute data and/or the user behavior data are acquired from a data source;
a field identification step, in which a preset identification rule is applied to the USER field in the user attribute data and/or the user behavior data, and an identification condition is output;
and a structure arrangement step, in which the data is arranged into a preset JSON data structure according to the identification condition.
3. The big data platform-based data processing method according to claim 2, wherein the specific content of the USER field includes a visitor ID, an account ID, and a user ID, wherein:
the user ID is generated from either or both of the account ID and the visitor ID;
the three IDs are in one-to-one correspondence, and their priority from high to low is: user ID > account ID > visitor ID.
4. The big data platform-based data processing method according to claim 3, wherein when the user attribute data and/or the user behavior data contains a new account ID, binding with the current visitor ID is attempted; when the visitor ID exists and is not bound to any other account ID, the binding succeeds, and the data identified by the account ID and the visitor ID use the same user ID; and when the visitor ID does not exist or is already bound to another account ID, the binding fails and is not attempted again, and the unbound ID is NULL.
5. The big data platform-based data processing method according to claim 1, wherein the data processing step specifically comprises:
a structure processing step, in which the message queue reads and parses a configuration file of a configuration center and performs ETL processing on the data structure according to the corresponding business logic rules;
and a data entry step, in which the message queue creates a corresponding data table in the storage layer and stores the processed data structure in the data table.
6. The big data platform-based data processing method according to claim 5, wherein in the data entry step, when the unique field already exists in the corresponding data table, the data structure is stored directly in the existing data table.
7. A data processing system based on a big data platform, applying the data processing method of any one of claims 1 to 6, characterized by comprising:
a structure acquisition module, configured to perform unique-field identification on the user attribute data and/or the user behavior data and to organize the identified data into a preset data structure;
a data processing module, configured to send the data structure to a message queue, where the message queue processes the data structure and stores it in a corresponding data table of a storage layer;
and a data operation module, configured to operate on the data table with a query tool through a data query and analysis platform.
8. The big data platform-based data processing system of claim 7, wherein the structure acquisition module comprises:
a data acquisition unit, configured to acquire the user attribute data and/or the user behavior data from a data source;
a field identification unit, configured to apply a preset identification rule to the USER field in the user attribute data and/or the user behavior data and to output an identification condition;
and a structure arrangement unit, configured to arrange the data into a preset JSON data structure according to the identification condition.
9. The big data platform-based data processing system of claim 8, wherein the data processing module comprises:
a structure processing unit, in which the message queue reads and parses a configuration file of a configuration center and performs ETL processing on the data structure according to the corresponding business logic rules;
and a data entry unit, in which the message queue creates a corresponding data table in the storage layer and stores the processed data structure in the data table.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the big data platform based data processing method according to any of claims 1 to 6 when executing the computer program.
CN202011418673.0A 2020-12-07 2020-12-07 Data processing method and system based on big data platform and computer equipment Pending CN112416945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011418673.0A CN112416945A (en) 2020-12-07 2020-12-07 Data processing method and system based on big data platform and computer equipment

Publications (1)

Publication Number Publication Date
CN112416945A true CN112416945A (en) 2021-02-26

Family

ID=74775112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011418673.0A Pending CN112416945A (en) 2020-12-07 2020-12-07 Data processing method and system based on big data platform and computer equipment

Country Status (1)

Country Link
CN (1) CN112416945A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220657A (en) * 2021-05-14 2021-08-06 上海哔哩哔哩科技有限公司 Data processing method and device and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868252A (en) * 2015-12-22 2016-08-17 乐视网信息技术(北京)股份有限公司 User behavior data processing method and apparatus
CN106959949A (en) * 2016-01-08 2017-07-18 中国科学院声学研究所 A kind of data structured processing method for commending system
CN109816410A (en) * 2017-11-21 2019-05-28 北京奇虎科技有限公司 The analysis method and device of advertisement major product audience
CN110825731A (en) * 2019-09-18 2020-02-21 平安科技(深圳)有限公司 Data storage method and device, electronic equipment and storage medium
CN111292164A (en) * 2020-01-21 2020-06-16 上海风秩科技有限公司 Commodity recommendation method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN108133008B (en) Method, device, equipment and storage medium for processing service data in database
CN108897874B (en) Method and apparatus for processing data
JP6168996B2 (en) Content control method, content control apparatus, and program
WO2019062081A1 (en) Salesman profile formation method, electronic device and computer readable storage medium
CN113220657B (en) Data processing method and device and computer equipment
CN109471893B (en) Network data query method, equipment and computer readable storage medium
CN107977678A (en) Method and apparatus for output information
CN112015806A (en) Method and device for storing data by block chain
CN111385294B (en) Data processing method, system, computer device and storage medium
CN108154024A (en) A kind of data retrieval method, device and electronic equipment
CN112416945A (en) Data processing method and system based on big data platform and computer equipment
CN115827903A (en) Violation detection method and device for media information, electronic equipment and storage medium
CN113297453A (en) Network request response method and device, electronic equipment and storage medium
CN117093619A (en) Rule engine processing method and device, electronic equipment and storage medium
CN117171108A (en) Virtual model mapping method and system
CN112507229A (en) Document recommendation method and system and computer equipment
CN112434012A (en) Front-end multistage condition screening method, system, equipment and storage medium based on React
CN115858322A (en) Log data processing method and device and computer equipment
CN113704365A (en) Method, system, device and storage medium for intelligently dividing data subjects
CN110895582A (en) Data processing method and device
CN109299112B (en) Method and apparatus for processing data
CN111611056A (en) Data processing method and device, computer equipment and storage medium
CN113268598A (en) Event context generation method and device, terminal equipment and storage medium
CN112667682A (en) Data processing method, data processing device, computer equipment and storage medium
CN107832464B (en) Data bleaching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination