CN112039968A - Data processing system - Google Patents

Info

Publication number
CN112039968A
Authority
CN
China
Prior art keywords
data
user
viewing
audience
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010865098.2A
Other languages
Chinese (zh)
Inventor
王雪京
李伟男
王鑫
苏超
乔立新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Media Group
Original Assignee
China Media Group
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Media Group
Priority to CN202010865098.2A
Publication of CN112039968A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A data processing system comprising a data calculation scheduling device, a file server, a data analysis engine, an ETL server and a database. The services in the data calculation scheduling device are implemented with the springMVC framework and are used to store collected data on the file server, generate a calculation task from the collected data, add the calculation task to a message queue in the data calculation scheduling device, and send the data to the ETL server and the database. The ETL server processes the data, stores a processing mark in the database and sends the calculation result to an ETL result queue in the data calculation scheduling device. The data analysis engine analyzes the data in the queues of the data calculation scheduling device and sends the calculation result to a calculation result queue in the data calculation scheduling device and to the database. With this scheme, a large amount of data can be processed with high quality while keeping the data accurate.

Description

Data processing system
Technical Field
The present application relates to broadcast television technology, and in particular, to a data processing system.
Background
When analyzing and processing the television service data of a television station, the prior art generally uses the RGui statistical analysis software to collect, analyze, mine and display the data. However, because the volume of viewing data of a television station is very large, RGui analysis consumes a large amount of server memory and is prone to causing server failures through improper memory management.
Problem in the prior art:
the server tends to fail when the volume of viewing data is very large.
Disclosure of Invention
The embodiments of the present application provide a data processing system to solve the above technical problem.
An embodiment of the present application provides a data processing system, including: a data calculation scheduling device, a file server, a data analysis engine, an ETL server and a database, wherein,
the services in the data calculation scheduling device are implemented with the springMVC framework and are used to store the collected data on the file server, generate a calculation task from the collected data, add the calculation task to a message queue in the data calculation scheduling device, and send the data to the ETL server and the database;
the ETL server is used to process the data, store a processing mark in the database and send the calculation result to an ETL result queue in the data calculation scheduling device;
and the data analysis engine analyzes the data in the queues of the data calculation scheduling device and sends the calculation result to a calculation result queue in the data calculation scheduling device and to the database.
The data processing system provided by the embodiment of the present application distributes complex calculation across multiple servers. The data calculation scheduling device, the file server, the data analysis engine, the ETL server and the database cooperate in the calculation, and the indexes are assigned to different modules, so no centralized calculation is needed and the memory requirement on any single server is reduced. With this data processing system, a large amount of data can be processed with high quality while keeping the data accurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a data processing system according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of the relationships among the television analysis indexes in the second embodiment of the present application;
FIG. 4 is a model diagram of index calculation in the second embodiment of the present application.
Detailed Description
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not an exhaustive list. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other where no conflict arises.
Embodiment One
Fig. 1 is a schematic structural diagram of a data processing system according to the first embodiment of the present application.
As shown in the figure, the data processing system includes: a data calculation scheduling device, a file server, a data analysis engine, an ETL server and a database, wherein,
the services in the data calculation scheduling device are implemented with the springMVC framework and are used to store the collected data on the file server, generate a calculation task from the collected data, add the calculation task to a message queue in the data calculation scheduling device, and send the data to the ETL server and the database;
the ETL server is used to process the data, store a processing mark in the database and send the calculation result to an ETL result queue in the data calculation scheduling device;
and the data analysis engine analyzes the data in the queues of the data calculation scheduling device and sends the calculation result to a calculation result queue in the data calculation scheduling device and to the database.
The data processing system provided by the embodiment of the present application distributes complex calculation across multiple servers. The data calculation scheduling device, the file server, the data analysis engine, the ETL server and the database cooperate in the calculation, and the indexes are assigned to different modules, so no centralized calculation is needed and the memory requirement on any single server is reduced. With this data processing system, a large amount of data can be processed with high quality while keeping the data accurate.
In one embodiment, the data calculation scheduling device includes:
a data maintenance module, used to collect viewing data, store the viewing data on the file server, generate a calculation task from the collected viewing data and add the calculation task to a viewing queue in the data calculation scheduling device;
and a viewing module, used to parse the viewing data according to the calculation tasks in the viewing queue, send the parsed viewing data to the ETL server and the database, and send the viewing data calculation results in the ETL result queue to the data analysis engine.
In one embodiment, the data analysis engine is further configured to save the state of each stage to the database while analyzing the data in the ETL result queue.
In one embodiment, the data calculation scheduling device further includes:
a new media module, used to parse, from the collected viewing data and according to preset new media analysis indexes, the new media data related to those indexes, send the parsed new media data to the ETL (extract, transform, load) server and the database, and send the new media data calculation results in a new media queue of the data calculation scheduling device to the data analysis engine.
In one embodiment, the data calculation scheduling device further includes:
a comprehensive evaluation module, used to parse, from the collected viewing data and according to preset comprehensive evaluation analysis indexes, the comprehensive evaluation data related to those indexes, send the parsed comprehensive evaluation data to the ETL server and the database, and send the comprehensive evaluation data calculation results in a comprehensive evaluation queue of the data calculation scheduling device to the data analysis engine.
In one embodiment, the data analysis engine analyzing the data in the queues of the data calculation scheduling device includes:
cleaning the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list in the queues of the data calculation scheduling device;
and calculating index metadata from the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list.
In one embodiment, calculating the index metadata from the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list includes:
grouping and aggregating the viewing sample data by the user field (uid) and the user behavior field (mid) to obtain user viewing behavior data;
grouping the user viewing behavior data by time to obtain per-minute user viewing behavior data;
aggregating the viewing characteristic data by the user field (uid) to obtain user information table data;
joining the user information table data with the per-minute user viewing behavior data to obtain per-minute user viewing behavior and weight data;
associating the per-minute user viewing behavior and weight data with the CSM channel table and the CSM program list to obtain channel per-minute viewer flow detail table data;
and determining the per-minute inflow and outflow of users for each channel from the channel per-minute viewer flow detail table data.
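The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the record layout `(uid, mid, start_minute, end_minute)`, the reading of `mid` as a channel identifier, the sample values, and the function names `per_minute_viewing` and `channel_flow` are all hypothetical, and the join with the CSM program list is left out for brevity.

```python
from collections import defaultdict

# Hypothetical viewing sample records: (uid, mid, start_minute, end_minute), end exclusive.
samples = [
    ("u1", "c1", 0, 3),
    ("u2", "c1", 1, 2),
    ("u1", "c2", 3, 5),
]
weights = {"u1": 1.5, "u2": 2.0}  # stand-in for the user information table: uid -> weight
channels = {"c1": "Channel One", "c2": "Channel Two"}  # stand-in for the CSM channel table


def per_minute_viewing(records):
    """Expand each (uid, mid) viewing interval into one row per minute."""
    rows = set()
    for uid, mid, start, end in records:
        for minute in range(start, end):
            rows.add((uid, mid, minute))
    return rows


def channel_flow(records, weights, channels):
    """Return {(channel_name, minute): (inflow, outflow)} as weighted user counts."""
    rows = per_minute_viewing(records)
    flow = defaultdict(lambda: [0.0, 0.0])
    for uid, mid, minute in rows:
        name = channels[mid]
        weight = weights[uid]
        if (uid, mid, minute - 1) not in rows:
            # The user was not on this channel one minute earlier: inflow at this minute.
            flow[(name, minute)][0] += weight
        if (uid, mid, minute + 1) not in rows:
            # The user is gone one minute later: outflow recorded at the next minute.
            flow[(name, minute + 1)][1] += weight
    return {key: tuple(value) for key, value in flow.items()}


flow = channel_flow(samples, weights, channels)
```

Keeping the per-minute rows in a set makes the "was the user here one minute ago" lookup cheap, which is the core of the inflow and outflow comparison.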
In one embodiment, calculating the index metadata from the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list includes:
grouping and aggregating the viewing sample data by the user field (uid) and the user behavior field (mid) to obtain user viewing behavior data;
aggregating the viewing characteristic data by the user field (uid) to obtain user information table data;
joining the user information table data with the user viewing behavior data to obtain user viewing behavior and weight data;
determining user time-period viewing information, channel daily viewing information and channel ID from the user viewing behavior and weight data;
and calculating the channel's audience rating in the nine major time periods from the user time-period viewing information, and calculating the channel's daily audience size from the channel daily viewing information and the channel ID.
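A minimal sketch of the time-period rating and daily audience size calculation follows. The text does not define the boundaries of the nine major time periods, so the two periods below are placeholders only; the row layout, the `universe` value and the function names are likewise assumptions made for illustration.

```python
# Hypothetical "user viewing behavior and weight" rows:
# (uid, channel_id, start_minute, end_minute, weight), minutes counted from midnight, end exclusive.
rows = [
    ("u1", "c1", 420, 480, 1000.0),
    ("u2", "c1", 450, 510, 2000.0),
    ("u1", "c2", 1200, 1230, 1000.0),
]
universe = 10000.0  # assumed total weighted population

# Placeholder period boundaries; the real nine periods are not given in the text.
periods = [("morning", 360, 540), ("evening", 1080, 1320)]


def overlap(a_start, a_end, b_start, b_end):
    """Minutes shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def period_rating(rows, channel_id, period, universe):
    """Average weighted audience per minute of the period, as a share of the universe."""
    _, p_start, p_end = period
    weighted_minutes = sum(
        weight * overlap(start, end, p_start, p_end)
        for _, cid, start, end, weight in rows
        if cid == channel_id
    )
    return weighted_minutes / ((p_end - p_start) * universe)


def daily_audience_size(rows, channel_id):
    """Sum of the weights of distinct users who watched the channel at all that day."""
    seen = {}
    for uid, cid, _, _, weight in rows:
        if cid == channel_id:
            seen[uid] = weight
    return sum(seen.values())


morning_c1 = period_rating(rows, "c1", periods[0], universe)
size_c1 = daily_audience_size(rows, "c1")
```

With these sample rows, the two users contribute 180,000 weighted minutes to a 180-minute period over a universe of 10,000, giving a rating of 0.1 for the placeholder morning period.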
In one embodiment, calculating the index metadata from the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list includes:
grouping and aggregating the viewing sample data by the user field (uid) and the user behavior field (mid) to obtain user viewing behavior data;
grouping the user viewing behavior data by time to obtain per-minute user viewing behavior data;
aggregating the viewing characteristic data by the user field (uid) to obtain user information table data;
joining the user information table data with the per-minute user viewing behavior data to obtain per-minute user viewing behavior and weight data;
calculating channel per-minute basic data from the per-minute user viewing behavior and weight data and the CSM channel table;
and calculating the channel's per-minute audience rating, viewing duration, per-minute audience composition and per-minute inflow and outflow data from the channel per-minute basic data.
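As a rough illustration of how several indexes can share one "channel per-minute basic data" table, the sketch below derives per-minute rating, per-minute audience composition and viewing duration from the same rows. The row layout, the `age_group` attribute, the `universe` value and the function names are assumptions, not taken from the text; inflow and outflow are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical channel per-minute basic rows: (channel_id, minute, uid, weight, age_group)
base = [
    ("c1", 0, "u1", 1000.0, "15-24"),
    ("c1", 0, "u2", 3000.0, "25-34"),
    ("c1", 1, "u2", 3000.0, "25-34"),
]
universe = 20000.0  # assumed total weighted population


def per_minute_rating(base, universe):
    """{(channel, minute): weighted viewers / universe}."""
    totals = defaultdict(float)
    for cid, minute, _, weight, _ in base:
        totals[(cid, minute)] += weight
    return {key: total / universe for key, total in totals.items()}


def per_minute_composition(base):
    """{(channel, minute): {group: share of that minute's weighted viewers}}."""
    by_group = defaultdict(lambda: defaultdict(float))
    for cid, minute, _, weight, group in base:
        by_group[(cid, minute)][group] += weight
    result = {}
    for key, groups in by_group.items():
        total = sum(groups.values())
        result[key] = {group: weight / total for group, weight in groups.items()}
    return result


def viewing_minutes(base):
    """{(channel, uid): minutes watched}, counting one row as one minute."""
    minutes = defaultdict(int)
    for cid, _, uid, _, _ in base:
        minutes[(cid, uid)] += 1
    return dict(minutes)


ratings = per_minute_rating(base, universe)
composition = per_minute_composition(base)
durations = viewing_minutes(base)
```

Because all three functions read the same base rows, the expensive joins only need to be done once, which matches the idea of a shared basic data table supporting several indexes.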
In one embodiment, calculating the index metadata from the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list includes:
grouping the user viewing behavior data by time to obtain per-minute user viewing behavior data;
aggregating the viewing characteristic data by the user field (uid) to obtain user information table data;
joining the user information table data with the per-minute user viewing behavior data to obtain per-minute user viewing behavior and weight data;
calculating channel per-minute basic data from the per-minute user viewing behavior and weight data and the CSM channel table;
calculating channel daily viewing data from the channel per-minute basic data and the user information table data;
and determining the channel's daily average viewing duration from the channel daily viewing data.
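One possible reading of the last two steps is shown below: the weighted minutes watched on a channel in one day are divided by the total weight of all users in the user information table. The text does not state the denominator, so averaging over the whole panel is an assumption, as are the row layout and the function name `daily_average_minutes`.

```python
# Hypothetical channel per-minute basic rows for one day: (channel_id, minute, uid)
base = [("c1", 0, "u1"), ("c1", 1, "u1"), ("c1", 0, "u2"), ("c2", 5, "u3")]
# Hypothetical user information table: uid -> weight
user_info = {"u1": 1000.0, "u2": 3000.0, "u3": 6000.0}


def daily_average_minutes(base, user_info, channel_id):
    """Weighted minutes watched on the channel divided by the total panel weight."""
    weighted_minutes = sum(user_info[uid] for cid, _, uid in base if cid == channel_id)
    population = sum(user_info.values())
    return weighted_minutes / population


average_c1 = daily_average_minutes(base, user_info, "c1")
```

Here channel c1 accumulates 5,000 weighted minutes over a panel weight of 10,000, so the sketch returns 0.5 minutes per person.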
Embodiment Two
To facilitate implementation of the present application, this embodiment describes a specific example.
By combining SPRINGMVC + NFS + AMQ + ORACLE, the data processing system provided by this embodiment maintains system stability and efficient service processing even when the volume of service data is very large, and can recalculate when the service data is inaccurate.
Fig. 2 is a schematic structural diagram of a data processing system according to a second embodiment of the present application.
As shown in the figure, the data processing system includes a data calculation scheduling module, a file server, a data analysis engine, an ETL server and a database, wherein,
the data calculation scheduling module includes a Web system and an MQ queue set. The Web system includes the UI of the data maintenance module, a viewing module, a new media module, a comprehensive evaluation module and other functional modules. The MQ queue set includes a viewing queue, a new media queue, a comprehensive evaluation queue, a calculation result queue, an ETL result queue and other queues. The file server holds archived files and temporary files, the data analysis engine may include one or more computing nodes, and the database may be an ORACLE database.
To make the charts more attractive, the UI of the data maintenance module uses an e-chart scheme to visualize data. The management back end uses Angular to separate the front end from the back end, and the services in the data maintenance module can obtain viewing data files through RESTful interface services in the springmvc framework.
To keep the calculation stable, an ActiveMQ message queue (including a viewing queue, a new media queue, a comprehensive evaluation queue, a basic result queue, an ETL queue and the like) is added to relieve calculation pressure and keep the servers stable. The queues also separate the many calculation steps from one another, which reduces the memory requirement on each server.
Because television analysis has a great many indexes that depend on and influence one another, and query performance must also be guaranteed, this embodiment classifies the indexes into different modules: the common indexes are placed in the data analysis engine, and each module performs further processing of its own data.
All of the above modules are implemented with springmvc.
In this embodiment, the raw data files are collected and then placed on the file server, which is implemented with NFS. This lets multiple computing module servers share the same file server and ensures that the computing servers work from consistent raw data.
The ETL server is also implemented with springmvc. It cleans the collected data to ensure data quality and remove duplicate data.
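The text does not spell out the cleaning rules, so the following is only a sketch of what such an ETL cleaning pass might look like: it drops malformed records and exact duplicates while preserving order. The field names `uid`, `mid`, `start` and `end` and the validity rules are assumptions chosen for illustration.

```python
def clean_records(records):
    """Drop malformed rows and exact duplicates, keeping first occurrences in order."""
    seen = set()
    cleaned = []
    for record in records:
        uid = record.get("uid")
        mid = record.get("mid")
        start = record.get("start")
        end = record.get("end")
        # Assumed validity rules: identifiers present, interval present and non-empty.
        if not uid or not mid or start is None or end is None or end <= start:
            continue
        key = (uid, mid, start, end)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append({"uid": uid, "mid": mid, "start": start, "end": end})
    return cleaned


raw = [
    {"uid": "u1", "mid": "c1", "start": 0, "end": 5},
    {"uid": "u1", "mid": "c1", "start": 0, "end": 5},   # exact duplicate
    {"uid": "", "mid": "c1", "start": 0, "end": 5},     # missing user id
    {"uid": "u2", "mid": "c1", "start": 7, "end": 3},   # end before start
]
cleaned = clean_records(raw)
```

Note the explicit `is None` checks: a start minute of 0 is valid and must not be treated as missing.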
An ORACLE database is used as the database. Some data redundancy is kept to make queries efficient, and complex business calculations are processed on the relational data.
The business data processing flow includes the following steps:
1. A user enters raw data (e.g., viewing files) into the system through the Web system.
2. The data calculation scheduling module saves the file to NFS (the file server) as an archived file.
3. Because the calculation load is large, the viewing calculation task is added to the viewing queue to keep service performance stable.
4. The viewing module receives the task and starts parsing the data.
5. The parsed raw data is sent to the ORACLE database and the ETL server.
6. The ETL service cleans and processes the data and stores an ETL processing mark in the ORACLE database.
7. The ETL calculation result message is sent to the ETL result queue.
8. The viewing module receives the message from the ETL result queue.
9. The viewing module sends the data to the data analysis engine for analysis.
10. During the analysis, the state of each stage is saved in the ORACLE database.
11. The data analysis engine sends the calculation result to the calculation result queue.
12. The calculation result is stored in the database.
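The twelve steps can be pictured with a single-process sketch in which Python's standard `queue.Queue` stands in for the ActiveMQ queues and plain dictionaries stand in for NFS and ORACLE. This is a simplified illustration rather than the described implementation: the function names, the file name `sample.csv` and the toy "analysis" (a row count) are hypothetical, and steps 8 and 9 are collapsed into the analysis function.

```python
import queue

viewing_queue = queue.Queue()       # stand-in for the viewing queue
etl_result_queue = queue.Queue()    # stand-in for the ETL result queue
calc_result_queue = queue.Queue()   # stand-in for the calculation result queue
file_server = {}                    # stand-in for NFS archived files
database = {"raw": [], "etl_marks": {}, "stages": [], "results": {}}  # stand-in for ORACLE


def submit(file_name, rows):
    """Steps 1-3: archive the file and enqueue a viewing calculation task."""
    file_server[file_name] = rows
    viewing_queue.put({"task_id": file_name})


def viewing_module_parse():
    """Steps 4-5: take a task, parse the raw data and store it in the database."""
    task = viewing_queue.get()
    rows = file_server[task["task_id"]]
    database["raw"].extend(rows)
    return task["task_id"], rows


def etl_service(task_id, rows):
    """Steps 6-7: clean the rows, record an ETL mark, publish an ETL result message."""
    cleaned = sorted(set(rows))
    database["etl_marks"][task_id] = "done"
    etl_result_queue.put({"task_id": task_id, "rows": cleaned})


def analysis_engine():
    """Steps 8-12: consume the ETL result, record stage states, publish and store the result."""
    message = etl_result_queue.get()
    database["stages"].append((message["task_id"], "analysis_started"))
    result = {"row_count": len(message["rows"])}
    database["stages"].append((message["task_id"], "analysis_finished"))
    calc_result_queue.put({"task_id": message["task_id"], "result": result})
    database["results"][message["task_id"]] = result
    return result


submit("sample.csv", [("u1", "c1"), ("u1", "c1"), ("u2", "c1")])
task_id, rows = viewing_module_parse()
etl_service(task_id, rows)
outcome = analysis_engine()
```

Because each stage only talks to its neighbours through a queue, a slow stage delays the queue rather than holding the whole data set in one server's memory, which is the stability argument made for the message queues.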
Fig. 3 is a schematic diagram of the relationships among the television analysis indexes in the second embodiment of the present application.
As shown in the figure, after the data is collected and cleaned, four kinds of raw data are available in this embodiment: viewing sample data, viewing characteristic data, a CSM channel table (dictionary) and a CSM program list. Intermediate data are derived by aggregating these raw data tables, finally yielding the index metadata.
For example, the algorithm for the channel per-minute inflow and outflow detail index includes:
i. grouping and aggregating the "viewing sample data" table by the user (uid) and user behavior (mid) fields to calculate "user viewing behavior";
ii. grouping "user viewing behavior" by time to obtain "user viewing behavior per minute" data;
iii. aggregating the "viewing characteristic data" by the user field (uid) to obtain "user information" table data;
iv. joining the "user information" table with the "user viewing behavior per minute" table on user (uid) to obtain the "user viewing behavior and weight per minute" table;
v. associating the "user viewing behavior and weight per minute" table with the "CSM channel table (dictionary code)" and the "CSM program list" to obtain the "channel per minute viewer flow details" table data;
vi. observing the channel per-minute inflow and outflow details in real time through the "channel per minute viewer flow details" table.
Channel per-minute audience rating, viewing duration, channel per-minute audience composition, channel per-minute inflow and outflow data and other indexes: "user viewing behavior" is calculated from the "viewing sample data" and analyzed to obtain "user viewing behavior per minute"; further analysis yields "user viewing behavior and weight per minute", which is associated with the "CSM channel table (dictionary code)" to calculate the "channel per minute basic data" table that supports each of these indexes.
Channel daily average viewing duration index: "user information" is obtained by analyzing the "viewing characteristic data"; associating it with the "channel per minute basic data" from the steps above yields this index.
Channel audience rating in the nine major time periods: the "user viewing behavior" and "user information" data obtained in the steps above are associated with the CSM channel table to obtain the "user viewing behavior and weight" table; aggregating it by time yields the "user time-period viewing information" table that supports this index.
Channel daily audience rating, channel daily audience size and similar indexes: the "user viewing behavior and weight" table from the process above is aggregated by day to obtain the "channel daily viewing information and channel ID" table that supports these indexes.
Fig. 4 shows a model diagram of index calculation in the second embodiment of the present application.
As shown in the figure, audience rating is the core data. Per-capita viewing minutes and arrival rate can be calculated by grouping and aggregating the rating data, and these three kinds of data serve as the basic business data from which the various business indexes are derived.
Audience rating: the proportion of the overall population watching a channel or program in a particular time period. The index is in effect the number of viewers and their viewing duration averaged evenly over the length of the time period (or program). When the audience is restricted to a subset of the overall population (e.g., ages 10-14), the rating is called the target audience rating. It is an important basis for scheduling and adjusting programs and a main index for program evaluation. The algorithm formulas are as follows:
[Formula images BDA0002649472540000091 and BDA0002649472540000092 are not reproduced in this text]
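The formula images themselves are not available in this text. One form consistent with the verbal definition above (viewers and viewing duration averaged over the length of the period) is sketched here; it is an assumption, not the patent's own formula. The symbols are introduced for illustration: w_i is the number of persons (in thousands) represented by sample user i, t_i is the minutes that user watched within the period, T is the period length in minutes, and N is the overall population (in thousands).

```latex
\text{Rating}_{(000)} = \frac{\sum_{i} w_i \, t_i}{T},
\qquad
\text{Rating}_{\%} = \frac{\sum_{i} w_i \, t_i}{T \cdot N} \times 100\%
```

For a target audience rating, the same expressions would apply with the sums and N restricted to the target group.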
Arrival rate: the total number of people (000), or the percentage (%) of the overall television population, who meet the arrival condition in a particular time period. In one embodiment, the arrival condition is "watched for at least 1 minute". The algorithm formulas are as follows:
[Formula images BDA0002649472540000093 and BDA0002649472540000094 are not reproduced in this text]
The arrival rate is an index accumulated over time. It counts the number (or proportion) of people who watched a certain channel or column (or who can be covered by a certain advertising plan) in a particular time period, and reflects the size and breadth of the audience reached.
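Since the formula images are missing, the following is a hedged reconstruction from the verbal definition, assuming the "at least 1 minute" arrival condition. The symbols are illustrative: w_i is the number of persons (in thousands) represented by sample user i, t_i is the minutes that user watched in the period, the bracket is 1 when the condition holds and 0 otherwise, and N is the overall television population (in thousands).

```latex
\text{Arrival}_{(000)} = \sum_{i} w_i \, \mathbf{1}[\, t_i \ge 1 \,],
\qquad
\text{Arrival}_{\%} = \frac{\sum_{i} w_i \, \mathbf{1}[\, t_i \ge 1 \,]}{N} \times 100\%
```

Unlike a per-minute average, each person counts at most once here no matter how long they watched, which is why the index accumulates over the period.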
Per-capita viewing minutes: the ratio of average daily viewing time (in minutes) to the overall television population, which may be calculated for a particular channel or time period. The algorithm formula is as follows:
[Formula image BDA0002649472540000095 is not reproduced in this text]
Per-capita viewing minutes spreads the total viewing time of the viewers evenly over the overall population, not over the viewing population only.
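With the formula image unavailable, a form consistent with the description (total viewing time divided by the overall population) can be written as below. This is an assumed reconstruction; w_i is the number of persons represented by sample user i, t_i is that user's viewing minutes for the channel or period in question, and N is the overall television population in the same unit as w_i.

```latex
\text{Per-capita viewing minutes} = \frac{\sum_{i} w_i \, t_i}{N}
```

For example, if viewers representing 5,000 weighted minutes sit within a population of 1,000, the result is 5 minutes per person, even if most of the population watched nothing.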
Other indexes can be calculated by following the connecting lines in the business calculation model. For example:
Market share: the number of people watching a certain channel or program in a particular time period as a percentage of the number of people watching television in the same period, that is, the audience rating of the channel in that period as a percentage of the total audience rating of all channels. The calculation formula is as follows:
[Formula image BDA0002649472540000101 is not reproduced in this text]
Market share is the proportion of all people watching television at that time (the total audience) who are watching a given channel (program). The larger the value, the stronger the market competitiveness of the channel (program) in that period.
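To make the relationship between these indexes concrete, the sketch below computes rating, arrival rate, per-capita viewing minutes and market share from one small per-minute table. The data, the weights, the 60-minute period and all function names are hypothetical; the definitions follow the verbal descriptions in the text, not the missing formula images.

```python
# Hypothetical per-minute viewing rows (uid, channel_id, minute), unique per user and minute.
rows = (
    [("u1", "c1", m) for m in range(30)]
    + [("u2", "c1", m) for m in range(10)]
    + [("u3", "c2", m) for m in range(60)]
)
weights = {"u1": 100.0, "u2": 200.0, "u3": 300.0, "u4": 400.0}  # persons represented
universe = sum(weights.values())  # assumed overall television population
period_minutes = 60


def weighted_minutes(rows, weights, channel=None):
    """Sum of weight x minutes watched, for one channel or for all channels."""
    return sum(weights[uid] for uid, cid, _ in rows if channel in (None, cid))


def rating(rows, weights, channel):
    """Average weighted audience per minute as a share of the universe."""
    return weighted_minutes(rows, weights, channel) / (period_minutes * universe)


def arrival_rate(rows, weights, channel, min_minutes=1):
    """Share of the universe that watched the channel for at least min_minutes."""
    minutes = {}
    for uid, cid, _ in rows:
        if cid == channel:
            minutes[uid] = minutes.get(uid, 0) + 1
    reached = sum(weights[uid] for uid, count in minutes.items() if count >= min_minutes)
    return reached / universe


def per_capita_minutes(rows, weights, channel):
    """Weighted viewing minutes spread over the whole universe."""
    return weighted_minutes(rows, weights, channel) / universe


def market_share(rows, weights, channel):
    """Channel's weighted minutes as a share of all channels' weighted minutes."""
    return weighted_minutes(rows, weights, channel) / weighted_minutes(rows, weights)


share_c1 = market_share(rows, weights, "c1")
```

In this toy data, channel c1 has a market share of 5000/23000, which equals its rating divided by the sum of the ratings of c1 and c2, matching the second definition of market share given above.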
The above calculation is only described by taking some indexes as examples, and other indexes may be provided with corresponding calculation programs according to actual needs, which is not described herein again.
In one implementation, 10 physical servers may be deployed: 2 (8 cores, 16G, 500G) for the data upload and presentation service, 2 (8 cores, 16G, 500G) for the original file storage service, 2 (16 cores, 64G, 500G) for the message queue service, 2 (16 cores, 64G, 500G) for the intermediate result calculation service, and 2 for the database service.
1. The JAVA-WEB services and the front-end UI are deployed on 2 physical machines that act as active and standby for each other, so that when one goes down the other keeps the service available.
2. The NFS cluster is deployed on 2 physical machines, one primary and one standby, so that if the data on one is lost the other still holds a copy.
3. The AMQ service cluster is deployed on 2 physical machines that act as active and standby for each other, so that the service is provided without interruption.
4. 2 intermediate result calculation services are deployed; they need high-performance servers to support complex calculation.
5. 2 database services are deployed for data storage.
The system is monitored: when data are inconsistent, the Web monitoring page raises a red alarm. The principle is as follows: when the original data, the ActiveMQ production data, the ActiveMQ consumer group data and the data in Oracle are inconsistent, the data can be judged to be lost or inaccurate. In that case, the user can click "retrieve data" on the Web front-end page to recalculate the data. The service flow is as follows: the data, the monitoring information, the intermediate table data and the Oracle data are deleted in that order, the acquisition task is then re-executed, and the front end consumes the data directly. Once it has been manually confirmed that no data is lost, the recalculation is considered successful.
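The monitoring rule and the recalculation flow might be sketched as follows. The text does not say how consistency is measured, so comparing record counts across the four stages is an assumption, and the store names, the `reacquire` callback and the sample numbers are hypothetical.

```python
def check_consistency(counts):
    """Return 'red' if the record counts of the stages differ, otherwise 'green'."""
    return "green" if len(set(counts.values())) == 1 else "red"


def recalculate(stores, reacquire):
    """Clear each store in the order given in the text, then re-run acquisition."""
    for name in ("data", "monitoring", "intermediate", "oracle"):
        stores[name].clear()
    reacquire(stores)


# Hypothetical counts gathered from each stage for one task.
counts = {"raw": 100, "mq_produced": 100, "mq_consumed": 98, "oracle": 98}
status_before = check_consistency(counts)

stores = {"data": [1], "monitoring": [1], "intermediate": [1], "oracle": [1]}


def reacquire(stores):
    """Stand-in for re-running the acquisition task end to end."""
    for name in stores:
        stores[name].extend(range(100))


recalculate(stores, reacquire)
status_after = check_consistency({name: len(rows) for name, rows in stores.items()})
```

In this sketch the mismatch between produced and consumed counts triggers the red status, and a full clear followed by re-acquisition brings all stages back to the same count. The manual confirmation step described in the text is not modeled.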
The viewing data service calculation and processing system provided by the embodiments of the present application has the following advantages:
1. Calculation stability
Complex calculation is distributed across multiple servers. The existing RGui technology has to process the calculation centrally, which places a high memory requirement on a single server and easily causes memory overflow and thus calculation failure. Java's memory management is far superior to that of the R language, which ensures the stability of the calculation.
2. More attractive data visualization
The embodiments of the present application use the e-chart plug-in, which produces more attractive charts than RGui.
3. Clearer index calculation
Many of the indexes depend on one another. The embodiments of the present application classify the indexes and distribute them to different modules for calculation, and extract the common indexes, which reduces calculation and development pressure and makes the calculation process and method for every index simpler and clearer. RGui cannot easily provide these functions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the present application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A data processing system, comprising: a data calculation scheduling device, a file server, a data analysis engine, an ETL server and a database, wherein,
the service in the data calculation scheduling device is realized through a springMVC framework and is used for storing the acquired data to a file server, generating a calculation task according to the acquired data, adding the calculation task to a message queue in the data calculation scheduling device, and sending the data to an ETL server and a database;
the ETL server is used for processing the data, storing the processing marks to a database and sending the calculation results to an ETL result queue in the data calculation scheduling device;
and the data analysis engine analyzes the data in the queue of the data calculation scheduling device and sends the calculation result to a calculation result queue and a database in the data calculation scheduling device.
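The data flow of claim 1 can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: plain Python containers stand in for the file server, the database and the queues, the "processing" and "analysis" steps are placeholders (dropping empty rows and summing), and all class, function and field names are hypothetical.

```python
from collections import deque


class Scheduler:
    """Stand-in for the data calculation scheduling device and its queues."""

    def __init__(self):
        self.file_server = {}            # stand-in for the file server
        self.database = []               # stand-in for the database
        self.task_queue = deque()        # message queue of calculation tasks
        self.etl_result_queue = deque()  # ETL result queue
        self.result_queue = deque()      # calculation result queue

    def ingest(self, name, rows):
        # Store the acquired data, then generate and enqueue a calculation task.
        self.file_server[name] = rows
        self.task_queue.append({"task": name, "rows": rows})


def etl_server(scheduler):
    # Process one task, store a processing mark, and publish the ETL result.
    task = scheduler.task_queue.popleft()
    cleaned = [row for row in task["rows"] if row is not None]
    scheduler.database.append(("etl_done", task["task"]))
    scheduler.etl_result_queue.append({"task": task["task"], "rows": cleaned})


def analysis_engine(scheduler):
    # Analyze one ETL result and send it to the result queue and the database.
    item = scheduler.etl_result_queue.popleft()
    result = {"task": item["task"], "total": sum(item["rows"])}
    scheduler.result_queue.append(result)
    scheduler.database.append(("result", result))
    return result
```

Calling `ingest`, then `etl_server`, then `analysis_engine` moves one batch through the three stages in the order the claim describes.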
2. The data processing system of claim 1, wherein the data computation scheduler comprises:
the data maintenance module is used for collecting viewing data, storing the viewing data to the file server, generating a calculation task according to the collected viewing data and adding the calculation task into a viewing queue in the data calculation scheduling device;
and the audience rating module is used for analyzing the viewing data according to the calculation tasks in the viewing queue, sending the analyzed viewing data to the ETL server and the database, and sending the calculation results of the viewing data in the ETL result queue to the data analysis engine.
3. The data processing system of claim 1, wherein the data analysis engine is further configured to save the stage states to a database during the analysis of the data in the ETL result queue.
4. The data processing system of claim 2, wherein the data computation scheduler further comprises:
and the new media module is used for analyzing new media data related to the new media analysis indexes in the audience data according to the acquired audience data and preset new media analysis indexes, transmitting the analyzed new media data to an ETL (extract transform load) server and a database, and transmitting a new media data calculation result in a new media queue of the data calculation scheduling device to a data analysis engine.
5. The data processing system of claim 2, wherein the data computation scheduler further comprises:
and the comprehensive evaluation module is used for analyzing comprehensive evaluation data related to the comprehensive evaluation analysis indexes in the audience data according to the acquired audience data and preset comprehensive evaluation analysis indexes, transmitting the analyzed comprehensive evaluation data to an ETL server and a database, and transmitting a comprehensive evaluation data calculation result in a comprehensive evaluation queue of the data calculation scheduling device to a data analysis engine.
6. The data processing system of claim 1, wherein the data analysis engine analyzes the data in the queue of the data computation scheduler, comprising:
cleaning the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list in the queue of the data calculation scheduling device;
and calculating according to the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list to obtain index metadata.
7. The data processing system of claim 6, wherein calculating according to the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list to obtain the index metadata comprises:
grouping and aggregating the viewing sample data by the user field uid and the user behavior field mid, and calculating to obtain user viewing behavior data;
grouping the user viewing behavior data by time to obtain per-minute data of the user viewing behavior;
aggregating by the user field uid according to the viewing characteristic data to obtain user information table data;
joining the user information table data with the per-minute data of the user viewing behavior to obtain per-minute data of the user viewing behavior with weights;
associating the per-minute data of the user viewing behavior with weights with the CSM channel table and the CSM program list to obtain channel per-minute audience flow detail table data;
and determining the per-minute inflow and outflow of users of the channel according to the channel per-minute audience flow detail table data.
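The steps of claim 7 can be illustrated with a minimal sketch. It assumes that each cleaned viewing record is a tuple `(uid, channel, start_minute, end_minute)` with an exclusive end minute, and that the user information table reduces to a per-user weight; the claims do not specify the record layout or how the mid field maps to a channel, and the function names are hypothetical.

```python
from collections import defaultdict


def per_minute_presence(records):
    """Expand viewing records to {(channel, minute): set of uids}."""
    presence = defaultdict(set)
    for uid, channel, start, end in records:
        for minute in range(start, end):  # end minute is exclusive
            presence[(channel, minute)].add(uid)
    return presence


def inflow_outflow(records, weights, channel, minute):
    """Weighted users entering and leaving `channel` at `minute`.

    Inflow: users present at `minute` but not at `minute - 1`.
    Outflow: users present at `minute - 1` but not at `minute`.
    """
    presence = per_minute_presence(records)
    now = presence.get((channel, minute), set())
    before = presence.get((channel, minute - 1), set())
    inflow = sum(weights[uid] for uid in now - before)
    outflow = sum(weights[uid] for uid in before - now)
    return inflow, outflow
```

For example, if user u2 stops watching channel A at minute 2 while user u3 starts, the inflow at minute 2 is the weight of u3 and the outflow is the weight of u2.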
8. The data processing system of claim 6, wherein calculating according to the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list to obtain the index metadata comprises:
grouping and aggregating the viewing sample data by the user field uid and the user behavior field mid, and calculating to obtain user viewing behavior data;
aggregating by the user field uid according to the viewing characteristic data to obtain user information table data;
joining the user information table data with the user viewing behavior data to obtain user viewing behavior and weight data;
determining user time-period viewing information, channel daily viewing information and a channel ID according to the user viewing behavior and weight data;
and calculating the audience rating of the channel in the nine major time periods according to the user time-period viewing information, and calculating the daily audience scale of the channel according to the channel daily viewing information and the channel ID.
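The two outputs of claim 8 can be illustrated with a minimal sketch. The claims do not give formulas, so the sketch assumes a common definition: the rating for a period is the weighted viewing minutes divided by the product of the total weight of all users and the period length, and the daily audience scale is the summed weight of distinct users who watched the channel. Records are assumed to be `(uid, channel, start_minute, end_minute)` tuples with an exclusive end minute and no overlap for the same user on the same channel; the boundaries of the nine time periods are not stated in the source and are passed in by the caller. All names are hypothetical.

```python
def period_rating(records, weights, channel, period_start, period_end):
    """Average weighted audience share of `channel` over [period_start, period_end)."""
    universe = sum(weights.values())
    length = period_end - period_start
    weighted_minutes = 0.0
    for uid, ch, start, end in records:
        if ch != channel:
            continue
        # Minutes of this record that fall inside the period.
        overlap = min(end, period_end) - max(start, period_start)
        if overlap > 0:
            weighted_minutes += weights[uid] * overlap
    return weighted_minutes / (universe * length)


def daily_audience_scale(records, weights, channel):
    """Summed weight of distinct users with any viewing on `channel`."""
    viewers = {uid for uid, ch, start, end in records if ch == channel and end > start}
    return sum(weights[uid] for uid in viewers)
```

Computing the rating for each of the nine periods then amounts to calling `period_rating` once per period with that period's start and end minutes.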
9. The data processing system of claim 6, wherein calculating according to the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list to obtain the index metadata comprises:
grouping and aggregating the viewing sample data by the user field uid and the user behavior field mid, and calculating to obtain user viewing behavior data;
grouping the user viewing behavior data by time to obtain per-minute data of the user viewing behavior;
aggregating by the user field uid according to the viewing characteristic data to obtain user information table data;
joining the user information table data with the per-minute data of the user viewing behavior to obtain per-minute data of the user viewing behavior with weights;
calculating channel per-minute basic data according to the per-minute data of the user viewing behavior with weights and the CSM channel table;
and calculating the per-minute audience rating, the viewing duration, the per-minute audience composition and the per-minute inflow and outflow data of the channel according to the channel per-minute basic data.
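Two of the per-minute indexes in claim 9 can be illustrated with a minimal sketch. It assumes that the user information table maps each uid to a weight and a demographic group, that the per-minute rating is the weighted audience divided by the total weight of all users, and that the composition is each group's share of the weighted audience; none of these definitions is given in the claims, and the function name and data layout are hypothetical.

```python
from collections import defaultdict


def minute_rating_and_composition(records, users, channel, minute):
    """Return (rating, composition) for `channel` at `minute`.

    records: iterable of (uid, channel, start_minute, end_minute), end exclusive.
    users:   {uid: (weight, group)}, standing in for the user information table.
    """
    universe = sum(weight for weight, _ in users.values())
    watching = {
        uid for uid, ch, start, end in records
        if ch == channel and start <= minute < end
    }
    by_group = defaultdict(float)
    for uid in watching:
        weight, group = users[uid]
        by_group[group] += weight
    audience = sum(by_group.values())
    rating = audience / universe
    # Share of each group within the channel's audience for this minute.
    composition = {g: w / audience for g, w in by_group.items()} if audience else {}
    return rating, composition
```

Evaluating this function for every minute of the day and every channel would give the per-minute rating and composition series that the claim refers to.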
10. The data processing system of claim 1, wherein calculating according to the viewing sample data, the viewing characteristic data, the CSM channel table and the CSM program list to obtain the index metadata comprises:
grouping the user viewing behavior data by time to obtain per-minute data of the user viewing behavior;
aggregating by the user field uid according to the viewing characteristic data to obtain user information table data;
joining the user information table data with the per-minute data of the user viewing behavior to obtain per-minute data of the user viewing behavior with weights;
calculating channel per-minute basic data according to the per-minute data of the user viewing behavior with weights and the CSM channel table;
calculating daily viewing data of the channel according to the channel per-minute basic data and the user information table data;
and determining the average daily viewing duration of the channel according to the daily viewing data of the channel.
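The final step of claim 10 can be illustrated with a minimal sketch. It assumes that the average daily viewing duration is the weighted total of minutes watched on the channel divided by the weighted number of users who watched it; the claims do not define the average, and the function name and record layout `(uid, channel, start_minute, end_minute)` with an exclusive end minute are hypothetical.

```python
def average_daily_viewing_minutes(records, weights, channel):
    """Weighted average minutes watched per viewer of `channel` in one day."""
    minutes_by_user = {}
    for uid, ch, start, end in records:
        if ch == channel and end > start:
            minutes_by_user[uid] = minutes_by_user.get(uid, 0) + (end - start)
    weighted_viewers = sum(weights[uid] for uid in minutes_by_user)
    if weighted_viewers == 0:
        return 0.0  # nobody watched the channel
    weighted_minutes = sum(weights[uid] * m for uid, m in minutes_by_user.items())
    return weighted_minutes / weighted_viewers
```

Dividing by the total weight of all users instead of the weight of the channel's viewers would give a per-capita figure; which of the two the application intends is not stated.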

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010865098.2A CN112039968A (en) 2020-08-25 2020-08-25 Data processing system

Publications (1)

Publication Number Publication Date
CN112039968A (en) 2020-12-04

Family

ID=73581425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010865098.2A Pending CN112039968A (en) 2020-08-25 2020-08-25 Data processing system

Country Status (1)

Country Link
CN (1) CN112039968A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408912A (en) * 2021-06-23 2021-09-17 中央广播电视总台 Auditing system and electronic device for television station
CN113408912B (en) * 2021-06-23 2023-12-19 中央广播电视总台 Audit system for television station and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201204