CN111427914A - Data obtaining method and device - Google Patents

Info

Publication number
CN111427914A
Authority
CN
China
Prior art keywords
data
content
attribute information
content attribute
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010199200.XA
Other languages
Chinese (zh)
Other versions
CN111427914B (en)
Inventor
顾伟
付元宝
王玉东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010199200.XA priority Critical patent/CN111427914B/en
Publication of CN111427914A publication Critical patent/CN111427914A/en
Application granted granted Critical
Publication of CN111427914B publication Critical patent/CN111427914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the invention provides a data acquisition method and a data acquisition device, wherein the method comprises the following steps: acquiring behavior data of a user aiming at content data to be accessed; inquiring whether content attribute information of content data to be accessed is stored in data caching equipment, wherein the data caching equipment is used for storing the content attribute information of hot content data and the content attribute information of newly added content data; if so, obtaining the content attribute information of the content data to be accessed from the data cache equipment; if not, the content attribute information of the content data to be accessed is obtained from a full information base, wherein the full information base is used for storing the content attribute information of each content data stored in the content database. By applying the scheme provided by the embodiment of the invention to obtain data, the data obtaining efficiency can be improved.

Description

Data obtaining method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data obtaining method and apparatus.
Background
In order to provide better services to users, service providers typically analyze the behavior of users accessing content data such as TV dramas, movies, and entertainment programs, so as to identify the patterns in that behavior.
In the prior art, when analyzing the behavior of a user, the behavior of the user is usually analyzed in combination with content attribute information of content data accessed by the user. In order to implement data analysis, a service provider generally stores attribute information of various content data in a special database, so that when content attribute information of content data accessed by a user is required, the content attribute information of the content data accessed by the user can be obtained by querying the special database.
However, as service providers offer users more and more content data, the database stores more and more content attribute information, and querying the required content attribute information takes longer. In addition, because a service provider has a very large number of users, the server must query a large amount of content attribute information for user-accessed content data within a short time. For these reasons, obtaining content attribute information by querying the database in the manner provided by the prior art is inefficient.
Disclosure of Invention
The embodiment of the invention aims to provide a data acquisition method and a data acquisition device so as to improve the data acquisition efficiency. The specific technical scheme is as follows:
In a first aspect, the present invention provides a data obtaining method, comprising:
acquiring behavior data of a user aiming at content data to be accessed;
inquiring whether content attribute information of the content data to be accessed is stored in a data cache device, wherein the data cache device is used for storing the content attribute information of hot content data and the content attribute information of newly added content data, the hot content data is content data of which the inquiry times are greater than the preset times within a first preset time, and the newly added content data is content data newly added in a content database within a second preset time;
if so, obtaining the content attribute information of the content data to be accessed from the data cache equipment;
if not, obtaining the content attribute information of the content data to be accessed from a full information base, wherein the full information base is used for storing the content attribute information of each content data stored in the content database.
In an embodiment of the present invention, the method further includes:
splicing the obtained content attribute information;
adding the spliced content attribute information to a message queue corresponding to the data analysis service;
and performing data analysis on the obtained content attribute information according to the sequence of the information stored in the message queue.
In an embodiment of the present invention, after obtaining the content attribute information of the content data to be accessed from the full information base, the method further includes:
storing the obtained content attribute information into the data caching device.
In an embodiment of the invention, a validity period of the data stored in the data caching device is a preset third duration.
In a second aspect, the present invention provides a data acquisition apparatus, the apparatus comprising:
the behavior data acquisition module is used for acquiring behavior data of the user aiming at the content data to be accessed;
the information query module is used for querying whether content attribute information of the content data to be accessed is stored in a data cache device, wherein the data cache device is used for storing the content attribute information of hot content data and the content attribute information of newly added content data, the hot content data is content data of which the query times are greater than the preset times within a first preset time length, and the newly added content data is content data newly added to a content database within a second preset time length; if yes, triggering a first information acquisition module; if not, triggering a second information acquisition module;
the first information obtaining module is configured to obtain content attribute information of the content data to be accessed from the data caching device;
the second information obtaining module is configured to obtain content attribute information of the content data to be accessed from a full-size information base, where the full-size information base is configured to store the content attribute information of each content data stored in the content database.
In an embodiment of the present invention, the apparatus further includes:
the information splicing module is used for splicing the obtained content attribute information;
the information adding module is used for adding the spliced content attribute information to a message queue corresponding to the data analysis service;
and the data analysis module is used for carrying out data analysis on the obtained content attribute information according to the sequence of the information stored in the message queue.
In an embodiment of the present invention, the apparatus further includes:
and the information storage module is used for storing the obtained content attribute information into the data caching device after the second information obtaining module obtains the content attribute information.
In an embodiment of the invention, a validity period of the data stored in the data caching device is a preset third duration.
In a third aspect, an embodiment of the present invention provides a server, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor configured to implement the method steps of the first aspect when executing the program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps described in the first aspect.
As can be seen from the above, when data is obtained by applying the scheme provided in this embodiment, behavior data of a user for content data to be accessed is obtained, whether content attribute information of the content data to be accessed is stored in a data cache device is queried, and if the content attribute information of the content data to be accessed is queried, the content attribute information is obtained from the data cache device; and if the content attribute information of the content data to be accessed is not inquired, acquiring the content attribute information from the full information base. Because the content attribute information stored in the data cache device is the content attribute information of the popular content data and the content attribute information of the newly added content data, compared with the prior art, the data amount stored in the data cache device is small, the time required for inquiring the content attribute information of the content data to be accessed is short, and the data acquisition efficiency is improved.
When the content attribute information is not found in the data caching device, it is obtained from the full information base. The content attribute information stored in the full information base is the content attribute information of each content data stored in the content database. Therefore, the content attribute information of the content data to be accessed can always be obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a first data obtaining method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a second data obtaining method according to an embodiment of the present invention;
Fig. 3 is a process diagram of a data obtaining method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a first data obtaining apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a second data obtaining apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Because obtaining data is inefficient in the prior art when a service provider queries content attribute information from a database, embodiments of the present invention provide a data obtaining method and apparatus to solve this technical problem.
In one embodiment of the present invention, there is provided a data obtaining method including:
acquiring behavior data of a user aiming at content data to be accessed;
inquiring whether content attribute information of the content data to be accessed is stored in the data cache device, wherein the data cache device is used for storing the content attribute information of hot content data and the content attribute information of newly added content data, the hot content data is the content data of which the inquiry times are greater than the preset times in a first preset time period, and the newly added content data is the newly added content data in a content database in a second preset time period;
if so, obtaining the content attribute information of the content data to be accessed from the data cache equipment;
if not, obtaining the content attribute information of the content data to be accessed from a full information base, wherein the full information base is used for storing the content attribute information of each content data stored in the content database.
Referring to fig. 1, fig. 1 is a schematic flow chart of a first data obtaining method according to an embodiment of the present invention, where the method includes:
S101, obtaining behavior data of the user for the content data to be accessed.
The behavior data of the user for the content data to be accessed can be understood as data on the user's operations on the content data to be accessed, for example: data generated when the user clicks a news page, or data generated when the user plays a video on a video playback page.
The behavior data may carry behavior type data, behavior duration data, an identifier of content data to be accessed, and the like.
Specifically, the behavior data of the user may be obtained from the behavior data recorded by the client. For example: the client can obtain the behavior data of the user when the user accesses a news page by means such as event tracking (embedded tracking points).
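As a minimal sketch of the behavior data described above, the record below carries the three items the description names (behavior type, behavior duration, and the identifier of the content data to be accessed). The class name, field names, and the raw dictionary keys are assumptions made for illustration; the source does not specify a wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorData:
    """One behavior record reported by a client (field names are hypothetical)."""
    behavior_type: str       # e.g. "click" or "play"
    duration_seconds: float  # how long the behavior lasted
    content_id: str          # identifier of the content data to be accessed


def parse_behavior(raw: dict) -> BehaviorData:
    """Build a BehaviorData record from a raw client event (assumed key names)."""
    return BehaviorData(
        behavior_type=str(raw["type"]),
        duration_seconds=float(raw["duration"]),
        content_id=str(raw["content_id"]),
    )


# Hypothetical event recorded by a client when a user plays a video.
event = parse_behavior({"type": "play", "duration": 42, "content_id": "movie_x"})
```

The `content_id` field is what a later lookup step would use as the query key.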
S102, inquiring whether the content attribute information of the content data to be accessed is stored in the data cache device, if so, executing S103, and if not, executing S104.
The data caching device may be an electronic device based on Redis (Remote Dictionary Server). Redis is a data storage system with good query performance but relatively small storage capacity.
The data caching device is used for storing content attribute information of hot content data and content attribute information of newly added content data.
Specifically, the content data may include: movies, television shows, entertainment programs, etc. The content attribute information of the content data may be understood as: information of individual content attributes of the content data.
For example: for movie X, the content attribute information of the video corresponding to movie X may be as shown in table 1 below.
TABLE 1
Duration | Showing time | Lead actor | Director
90 min | 2019-01-01 | Star A1, Star A2 | Director B
In table 1, "duration", "showing time", "lead actor", and "director" are each a content attribute of movie X; "90 min" is the information for the content attribute duration, "2019-01-01" is the information for the content attribute showing time, "Star A1, Star A2" is the information for the content attribute lead actor, and "Director B" is the information for the content attribute director. Thus, "90 min, 2019-01-01, Star A1, Star A2, Director B" is the content attribute information of movie X.
The hot content data is content data with the query times larger than the preset times within the first preset time length.
The first preset duration may be one day, one week, etc. The preset number of times may be 1000, etc. For example: assume that the first preset duration is one day, specifically from 2019-01-01 00:00 to 2019-01-02 00:00, that the preset number of times is 1000, and that the content data include TV drama A and TV drama B. Statistics show that TV drama A was queried 2000 times between 2019-01-01 00:00 and 2019-01-02 00:00, and that TV drama B was queried 900 times in the same period. Since 2000 is greater than 1000 and 900 is less than 1000, TV drama A is hot content data and TV drama B is not hot content data.
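The threshold rule in this example can be sketched as follows. This is only an illustration under stated assumptions: the query log is assumed to be a list of (content identifier, query time) pairs, and the function and variable names are hypothetical rather than taken from the source.

```python
from collections import Counter
from datetime import datetime


def hot_content_ids(query_log, window_start, window_end, preset_times):
    """Return ids queried more than preset_times within [window_start, window_end)."""
    counts = Counter(
        content_id
        for content_id, queried_at in query_log
        if window_start <= queried_at < window_end
    )
    return {content_id for content_id, n in counts.items() if n > preset_times}


# Reproduce the example: drama A queried 2000 times, drama B 900 times in one day.
start = datetime(2019, 1, 1, 0, 0)
end = datetime(2019, 1, 2, 0, 0)
noon = datetime(2019, 1, 1, 12, 0)
log = [("drama_a", noon)] * 2000 + [("drama_b", noon)] * 900

hot = hot_content_ids(log, start, end, preset_times=1000)
print(hot)  # {'drama_a'}
```

Only the content data whose count strictly exceeds the preset number of times is treated as hot, which matches the "greater than" wording of the definition.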
When the content attribute information of the hot content data is obtained, the content attribute information of the hot content data within the preset first duration may be loaded into the data cache device in a cold loading manner, that is, in an offline loading manner.
The newly added content data is newly added content data in the content database within a second preset time length.
Specifically, the content database is used for storing content data, and the content data in the content database may be added in real time.
Because the newly added content data in the content database is added in real time, in order to avoid the data volume of the newly added content data cached in the data caching device from being excessive, the newly added content is the content data newly added in the content database within the second preset time period.
Specifically, the second preset duration may be one day, one week, etc. For example: assuming that the second preset duration is one day, specifically from 2020-01-01 00:00 to 2020-01-02 00:00, the newly added content data is the content data added to the content database between 2020-01-01 00:00 and 2020-01-02 00:00.
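A minimal sketch of this time-window rule is shown below. It assumes that the time at which each content data was added to the content database is available as a mapping from content identifier to timestamp; that mapping and all names in the code are hypothetical.

```python
from datetime import datetime, timedelta


def newly_added_ids(added_at, now, second_preset_duration):
    """Return ids of content added within the last second_preset_duration."""
    cutoff = now - second_preset_duration
    return {cid for cid, t in added_at.items() if cutoff <= t <= now}


# Hypothetical addition times recorded for two pieces of content data.
added_at = {
    "show_new": datetime(2020, 1, 1, 8, 0),
    "show_old": datetime(2019, 12, 25, 8, 0),
}
now = datetime(2020, 1, 2, 0, 0)

recent = newly_added_ids(added_at, now, timedelta(days=1))
print(recent)  # {'show_new'}
```

Bounding the window keeps the set of newly added content data, and therefore the amount of data held in the cache, from growing without limit.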
Specifically, the data caching device may monitor content data in the preset content database to obtain newly added content data in the preset content database.
Users are likely to request the content attribute information of hot content data, and, because users are interested in newly added content data, they are also likely to request the content attribute information of newly added content data. Therefore, the content attribute information of the hot content data and of the newly added content data stored in the data caching device can meet the needs of most users.
Specifically, in querying the data cache device, the content attribute information may be determined according to an identifier of the content data to be accessed.
The data caching device stores the content attribute information of hot content data and of newly added content data. However, the user may request the content attribute information of cold content data, that is, content data whose number of queries does not exceed the preset number of times within the first preset duration. Therefore, the content attribute information of the content data to be accessed may not be found in the data caching device. In this case, S104 is executed.
In an embodiment of the present invention, in S102, the validity period of the data stored in the data caching device may be a preset third duration.
Specifically, the preset third duration may be one day, two days, one week, etc. When data has been stored in the data caching device for longer than the preset third duration, the stored data may be discarded and stored anew.
In this way, by setting a validity period for the data stored in the data caching device, the stored data can be cleared periodically, so that the data caching device holds the content attribute information of the latest content data and of the hot content data, which improves the data obtaining efficiency.
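The validity period can be illustrated with a small in-memory cache that discards entries older than a configured duration. This is a sketch under assumptions, not the implementation of the embodiment: the class, its method names, and the injectable clock are hypothetical. A Redis-based device could rely on Redis's own per-key expiration (for example the `EXPIRE` command) instead of tracking timestamps by hand.

```python
import time


class ExpiringCache:
    """In-memory cache whose entries expire after validity_seconds (illustrative)."""

    def __init__(self, validity_seconds, clock=time.monotonic):
        self._validity = validity_seconds
        self._clock = clock      # injectable so the example can control time
        self._entries = {}       # content_id -> (stored_at, attribute_info)

    def put(self, content_id, attribute_info):
        self._entries[content_id] = (self._clock(), attribute_info)

    def get(self, content_id):
        entry = self._entries.get(content_id)
        if entry is None:
            return None
        stored_at, attribute_info = entry
        if self._clock() - stored_at > self._validity:
            # Validity period exceeded: discard the entry and report a miss.
            del self._entries[content_id]
            return None
        return attribute_info


# Use a fake clock so the example does not need to sleep for a day.
fake_now = [0.0]
cache = ExpiringCache(validity_seconds=24 * 3600, clock=lambda: fake_now[0])
cache.put("movie_x", {"director": "Director B"})

fresh = cache.get("movie_x")      # still within the validity period
fake_now[0] = 24 * 3600 + 1       # one second past the validity period
expired = cache.get("movie_x")    # entry has been discarded
```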
S103, obtaining the content attribute information of the content data to be accessed from the data caching device.
S104, obtaining the content attribute information of the content data to be accessed from the full information base.
The full amount information base may be a distributed data storage system based on an Hbase table, and the full amount information base is used to store content attribute information of each content data stored in the content database.
Specifically, the full-size information base may be synchronized with the content database in real time, and when the content data stored in the content database is updated, the content attribute information of each content data stored in the full-size information base is also updated accordingly. In this way, it is possible to ensure that the content attribute information stored in the total information base is the content attribute information of each piece of content data stored in the content database.
In addition, because the probability that the user generally accesses the hot data or the latest data is high, and the content attribute information of the hot data or the latest data is stored in the data caching device, the possibility of obtaining the content attribute information of the content data to be accessed by the user from the data caching device is high, and compared with the prior art, the data obtaining efficiency is improved.
Since the full-size information base is used for storing the content attribute information of each piece of content data stored in the content database, when the content attribute information of the piece of content data to be accessed is not queried in the data cache device in S102, the content attribute information is stored in the full-size information base, and therefore, the content attribute information of the piece of content data to be accessed can be obtained from the full-size information base.
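The branch between S103 and S104 can be sketched as follows. Plain dictionaries stand in for the data caching device and the full information base; these stand-ins, the function name, and the returned source label are assumptions for illustration and do not reflect the Redis or HBase client APIs.

```python
def get_content_attributes(content_id, cache, full_information_base):
    """Return (attribute_info, source) for the content data to be accessed."""
    info = cache.get(content_id)          # S102: is it stored in the cache?
    if info is not None:
        return info, "cache"              # S103: obtain from the caching device
    # S104: the full information base holds attribute information for every
    # content data in the content database, so the lookup is expected to succeed.
    return full_information_base[content_id], "full_information_base"


# Hypothetical data: drama A is hot (cached); drama B is cold (not cached).
cache = {"drama_a": {"director": "Director B"}}
full_information_base = {
    "drama_a": {"director": "Director B"},
    "drama_b": {"director": "Director C"},
}

hit = get_content_attributes("drama_a", cache, full_information_base)
miss = get_content_attributes("drama_b", cache, full_information_base)
```

Here `hit` is served from the cache and `miss` falls through to the full information base, so the caller obtains attribute information in both cases.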
Specifically, the above-mentioned full information base may also store the content attribute information according to the manner in which the data caching device stores the content attribute information in S102, and therefore, details are not described here again.
Similarly, when content attribute information is queried from the full information base, it may be queried in the same manner as the data caching device is queried in S102. Therefore, the details are not described here again.
As can be seen from the above, when data is obtained by applying the scheme provided in this embodiment, behavior data of a user for content data to be accessed is obtained, whether content attribute information of the content data to be accessed is stored in a data cache device is queried, and if the content attribute information of the content data to be accessed is queried, the content attribute information is obtained from the data cache device; and if the content attribute information of the content data to be accessed is not inquired, acquiring the content attribute information from the full information base. Because the content attribute information stored in the data cache device is the content attribute information of the popular content data and the content attribute information of the newly added content data, compared with the prior art, the data amount stored in the data cache device is small, the time required for inquiring the content attribute information of the content data to be accessed is short, and the data acquisition efficiency is improved.
When the content attribute information is not found in the data caching device, it is obtained from the full information base. The content attribute information stored in the full information base is the content attribute information of each content data stored in the content database. Therefore, the content attribute information of the content data to be accessed can always be obtained.
In an embodiment of the present invention, after the content attribute information of the content data to be accessed is obtained from the full-size information base in S104, the following steps may be further included.
And storing the obtained content attribute information into the data caching device.
After the content attribute information is obtained from the full information base, the content attribute information obtained from the full information base may need to be used within a certain time, so that the obtained content attribute information may be stored in the data cache device in order to enable the content attribute information to be queried from the data cache device later. Therefore, the data query time can be saved, and the data acquisition efficiency is improved.
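A minimal sketch of this write-back step follows. The counting wrapper around the full information base exists only to make the saved query visible in the example; it, the function name, and the dictionary used as the cache are hypothetical.

```python
class CountingStore:
    """Dictionary-backed stand-in for the full information base that counts reads."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def __getitem__(self, content_id):
        self.reads += 1
        return self._data[content_id]


def get_and_cache(content_id, cache, full_information_base):
    """On a cache miss, read from the full information base and store the result."""
    info = cache.get(content_id)
    if info is None:
        info = full_information_base[content_id]
        cache[content_id] = info   # store so that later queries hit the cache
    return info


cache = {}
store = CountingStore({"drama_b": {"director": "Director C"}})

first = get_and_cache("drama_b", cache, store)    # miss: reads the full base
second = get_and_cache("drama_b", cache, store)   # hit: served from the cache
print(store.reads)  # 1
```

The second request does not touch the full information base, which is the saving in query time that the paragraph above describes.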
In an embodiment of the present invention, after the data caching device obtains the content attribute information of the popular content data and the content attribute information of the newly added content data in S102, the content attribute information may be stored in the following two ways.
In the first mode, the content data to which each piece of content attribute information belongs is classified, and each classification includes content attribute information of the content data corresponding to the classification.
For example: assume that the content data obtained in the data caching device are content data 1 and content data 2, that the content attributes of both content data 1 and content data 2 include content attribute a, content attribute b, and content attribute c, that "xx, yy, and zz" is the content attribute information of content data 1, and that "XX, YY, and ZZ" is the content attribute information of content data 2. The content attribute information may be stored in the storage forms shown in table 2 and table 3.
TABLE 2
Figure BDA0002418738190000101
TABLE 3
Figure BDA0002418738190000102
Table 2 stores the content attribute information of the content data 1, and table 3 stores the content attribute information of the content data 2.
In the second way, the content data is classified according to the identifier of each content attribute, and each classification includes the content attribute information of each content data of the content attribute corresponding to the classification.
For example: following the examples described in tables 2 and 3, the content attribute information of content data 1 and content data 2 may be stored in the storage forms shown in tables 4, 5, and 6.
TABLE 4
Figure BDA0002418738190000103
TABLE 5
Content attribute b | yy-content data 1 | YY-content data 2
TABLE 6
Content attribute c | zz-content data 1 | ZZ-content data 2
Table 4 stores the information of content attribute a for content data 1 and content data 2; Table 5 stores the information of content attribute b for content data 1 and content data 2; Table 6 stores the information of content attribute c for content data 1 and content data 2.
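The two storage modes can be contrasted with nested dictionaries. This is an illustrative sketch only: it assumes, following tables 5 and 6, that the lowercase values belong to content data 1 and the uppercase values to content data 2, and the function names are hypothetical.

```python
# Attribute information keyed first by content data (first mode, tables 2 and 3).
by_content = {
    "content data 1": {"a": "xx", "b": "yy", "c": "zz"},
    "content data 2": {"a": "XX", "b": "YY", "c": "ZZ"},
}


def regroup_by_attribute(records):
    """Convert the per-content layout into the per-attribute layout (second mode)."""
    layout = {}
    for content_id, attributes in records.items():
        for name, value in attributes.items():
            layout.setdefault(name, {})[content_id] = value
    return layout


# Second mode (tables 4 to 6): one group per content attribute.
by_attribute = regroup_by_attribute(by_content)
print(by_attribute["b"])  # {'content data 1': 'yy', 'content data 2': 'YY'}
```

The first layout returns all attributes of one content data in a single lookup, while the second returns one attribute across all content data, so the better choice depends on which query is more common.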
Referring to fig. 2, fig. 2 is a schematic flow chart of a second data obtaining method according to an embodiment of the present invention. After the above S103 or S104, the following S105-S107 may also be included.
S105, splicing the obtained content attribute information.
The obtained content attribute information may be information of a plurality of content attributes of the content data, and the obtained content attribute information may be spliced to obtain content attribute information of the complete content data.
For example: following the examples corresponding to tables 2 and 3, assuming that the obtained content attribute information is content attribute information xx and content attribute information yy, the obtained content attribute information is spliced, and the spliced content attribute information may be: content attribute information xx (content attribute a information of content data 1) - content attribute information yy (content attribute b information of content data 1), where the content in parentheses after each piece of content attribute information indicates which content attribute of which content data the information belongs to.
S106: Add the spliced content attribute information to the message queue corresponding to the data analysis service.
After the spliced content attribute information is obtained, it may be distributed according to the data each data analysis service requires and added to the message queue corresponding to that service.
Specifically, the spliced content attribute information may be added in chronological order, for example sequentially by query time, splicing time, or the like.
S107: Perform data analysis on the obtained content attribute information in the order in which the information is stored in the message queue.
In this way, the spliced content attribute information is added to the message queue corresponding to each data analysis service, and each service can perform data analysis on the content attribute information stored in its message queue.
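Steps S106 and S107 can be sketched with one first-in, first-out queue per data analysis service. This is an illustration under stated assumptions: the service names, the `(timestamp, record)` layout, and the upper-casing used as a placeholder for real analysis are all hypothetical, and an in-process `deque` stands in for an actual message queue system.

```python
from collections import deque

# Hypothetical registry: one FIFO queue per data analysis service.
queues = {"service_1": deque(), "service_2": deque()}


def enqueue(spliced_records, services):
    """S106: append records, ordered by timestamp, to each service's queue."""
    for timestamp, record in sorted(spliced_records):
        for service in services:
            queues[service].append((timestamp, record))


def analyze(service):
    """S107: consume one queue in stored order (the analysis is a placeholder)."""
    results = []
    while queues[service]:
        _, record = queues[service].popleft()
        results.append(record.upper())  # stand-in for real data analysis
    return results


enqueue([(2, "b-info"), (1, "a-info")], ["service_1"])
print(analyze("service_1"))
```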
In addition, as steps S101 to S107 show, the execution subject of the embodiment of the present invention not only receives behavior data of users, that is, user traffic, but also queries, splices, and distributes content attribute information for the received traffic. The execution subject can therefore be regarded as a traffic proxy and may be referred to as a traffic proxy layer.
In particular, when the amount of content attribute information is large, the traffic proxy layer takes over the work of splicing content attribute information and distributing traffic, so that neither the data cache device nor the full information base has to splice content attribute information, which improves data obtaining efficiency.
The data obtaining method provided by the embodiment of the present invention is described below using a specific example.
Referring to fig. 3, fig. 3 is a process block diagram of a data obtaining method according to an embodiment of the present invention.
Fig. 3 includes a Redis cache layer, an HBase table, a content database, a traffic proxy layer, behavior data of a user, and message queue 1, message queue 2, ..., message queue n.
The content database is used for storing content data. The Redis cache layer and the HBase table may obtain content attribute information of the content data from the content database.
The Redis cache layer is used for storing content attribute information of hot content data and of newly added content data. The Redis cache layer is the data cache device in S102.
The HBase table is used for storing content attribute information of each content data stored in the content database. The HBase table is the full information base in S104.
The behavior data of the user is the user's behavior data for the content data to be accessed.
Message queue 1, message queue 2, ..., message queue n each correspond to a data analysis service, which performs data analysis on the content attribute information stored in its message queue.
The traffic proxy layer is the execution subject of the embodiment of the present invention. After obtaining behavior data of a user for content data to be accessed, the traffic proxy layer queries the Redis cache layer for the content attribute information of that content data. If the information is not found there, the traffic proxy layer obtains it from the HBase table. The obtained content attribute information is then spliced, and the spliced content attribute information is added to each message queue.
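The flow of fig. 3 can be sketched end to end as follows. This is a minimal sketch, not the actual implementation: plain dictionaries stand in for the Redis cache layer and the HBase table, `deque` objects stand in for the message queues, and the names `handle_behavior`, `video_1`, and `video_2` are hypothetical.

```python
from collections import deque

# In-memory stand-ins (assumptions, not real Redis or HBase clients).
redis_cache = {"video_1": {"a": "xx"}}
hbase_table = {
    "video_1": {"a": "xx"},
    "video_2": {"a": "XX", "b": "YY"},
}
message_queues = [deque(), deque()]  # one per data analysis service


def handle_behavior(event):
    """Look up attributes (cache first), splice them, and add them to each queue."""
    content_id = event["content_id"]
    attributes = redis_cache.get(content_id)
    source = "cache"
    if attributes is None:  # cache miss: fall back to the full information base
        attributes = hbase_table[content_id]
        source = "full"
    spliced = "-".join(attributes[key] for key in sorted(attributes))
    for queue in message_queues:
        queue.append((content_id, spliced))
    return source, spliced


print(handle_behavior({"user": "u1", "content_id": "video_2"}))
```

Keeping the lookup, splicing, and fan-out in one function mirrors the role the text assigns to the traffic proxy layer, so that neither store has to do the splicing itself.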
Corresponding to the data obtaining method, the embodiment of the invention also provides a data obtaining device.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a first data obtaining apparatus according to an embodiment of the present invention, where the apparatus includes modules 401 to 404.
A behavior data obtaining module 401, configured to obtain behavior data of a user for content data to be accessed;
an information query module 402, configured to query whether content attribute information of content data to be accessed is stored in a data cache device, where the data cache device is configured to store content attribute information of hot content data and content attribute information of newly added content data, the hot content data is content data whose query frequency is greater than a preset frequency within a first preset duration, and the newly added content data is content data newly added to a content database within a second preset duration; if yes, triggering a first information acquisition module; if not, triggering a second information acquisition module;
a first information obtaining module 403, configured to obtain content attribute information of content data to be accessed from the data caching device;
a second information obtaining module 404, configured to obtain content attribute information of the content data to be accessed from a full information base, where the full information base is used to store content attribute information of each content data stored in the content database.
As can be seen from the above, when data is obtained by applying the scheme provided in this embodiment, behavior data of a user for content data to be accessed is obtained, and the data cache device is queried for the content attribute information of the content data to be accessed. If the content attribute information is found, it is obtained from the data cache device; if it is not found, it is obtained from the full information base. Because the data cache device stores only the content attribute information of hot content data and of newly added content data, the amount of data stored in the data cache device is small compared with the prior art, the time required to query the content attribute information of the content data to be accessed is short, and data obtaining efficiency is improved.
When the content attribute information is not found in the data cache device, it is obtained from the full information base. The full information base stores the content attribute information of each content data stored in the content database, so the content attribute information of the content data to be accessed can always be obtained.
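The rule for what the data cache device holds (hot content data or newly added content data) can be sketched as a simple predicate. The function names, the use of numeric timestamps in seconds, and the default thresholds below are assumptions for illustration; the text only states that the durations and the query count are preset.

```python
def is_hot(query_timestamps, now, first_duration, preset_count):
    """True if the queries within the last `first_duration` exceed `preset_count`."""
    recent = [t for t in query_timestamps if now - t <= first_duration]
    return len(recent) > preset_count


def is_new(added_at, now, second_duration):
    """True if the content data was added within the last `second_duration`."""
    return now - added_at <= second_duration


def belongs_in_cache(query_timestamps, added_at, now,
                     first_duration=3600, preset_count=2, second_duration=86400):
    """Hypothetical admission rule: cache hot or newly added content data."""
    return (is_hot(query_timestamps, now, first_duration, preset_count)
            or is_new(added_at, now, second_duration))


print(belongs_in_cache([100, 200, 300], added_at=0, now=400))
```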
Referring to fig. 5, fig. 5 is a schematic structural diagram of a second data obtaining apparatus according to an embodiment of the present invention, where the apparatus further includes modules 405 to 407.
An information splicing module 405, configured to splice the obtained content attribute information;
an information adding module 406, configured to add the spliced content attribute information to a message queue corresponding to the data analysis service;
and a data analysis module 407, configured to perform data analysis on the obtained content attribute information in the order in which the information is stored in the message queue.
In this way, the spliced content attribute information is added to the message queue corresponding to each data analysis service, and each service can perform data analysis on the content attribute information stored in its message queue.
In an embodiment of the present invention, the apparatus further includes: an information storage module, configured to store the obtained content attribute information into the data caching device after the second information obtaining module 404 obtains the content attribute information.
After the content attribute information is obtained from the full information base, it may be needed again within a certain period. The obtained content attribute information may therefore be stored in the data cache device so that later queries can be served from the data cache device, which saves query time and improves data obtaining efficiency.
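The write-back behavior of the information storage module can be sketched as follows. The dictionaries `cache` and `full_base` and the counter `stats` are hypothetical stand-ins used only to show that a second query for the same content data no longer reaches the full information base.

```python
cache = {}                                # stand-in for the data caching device
full_base = {"content data 1": "xx-yy"}   # stand-in for the full information base
stats = {"full_base_reads": 0}


def get_attributes(content_id):
    """Return attribute information, filling the cache after a full-base read."""
    if content_id in cache:
        return cache[content_id]
    stats["full_base_reads"] += 1
    value = full_base[content_id]
    cache[content_id] = value  # write back so later queries hit the cache
    return value


get_attributes("content data 1")
get_attributes("content data 1")
print(stats["full_base_reads"])
```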
In an embodiment of the invention, the validity period of the data stored in the data caching device is a preset third duration.
In this way, by setting a validity period for the data stored in the data caching device, expired data can be cleared periodically, so that the data caching device holds the content attribute information of the newest and most popular content data, which improves data obtaining efficiency.
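A validity period can be sketched as a cache that discards entries older than a fixed duration. The class `ExpiringCache` and its injectable `clock` parameter are illustrative assumptions; the text does not say which expiry mechanism is used, although Redis does support per-key expiration (for example through its EXPIRE command), which would be one natural fit.

```python
import time


class ExpiringCache:
    """Cache whose entries expire after a fixed validity period (in seconds)."""

    def __init__(self, validity_seconds, clock=time.monotonic):
        self.validity_seconds = validity_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, stored_at)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.validity_seconds:
            del self._entries[key]  # expired: clear the entry
            return None
        return value


demo = ExpiringCache(validity_seconds=60)
demo.put("content data 1", "xx-yy")
print(demo.get("content data 1"))
```

This sketch expires entries lazily on read; a periodic sweep would match the "cleared periodically" wording more literally, at the cost of a background task.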
Corresponding to the data obtaining method, the embodiment of the invention also provides a server.
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in fig. 6, the server includes a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 communicate with one another through the communication bus 604;
a memory 603 for storing a computer program;
the processor 601 is configured to implement the data obtaining method provided in the embodiment of the present invention when executing the program stored in the memory 603.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In still another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the data obtaining method provided by the embodiment of the present invention.
In another embodiment provided by the present invention, a computer program product containing instructions is also provided, which when executed on a computer causes the computer to implement the data obtaining method provided by the embodiment of the present invention.
As can be seen from the above, when data is obtained by applying the scheme provided in this embodiment, behavior data of a user for content data to be accessed is obtained, and the data cache device is queried for the content attribute information of the content data to be accessed. If the content attribute information is found, it is obtained from the data cache device; if it is not found, it is obtained from the full information base. Because the data cache device stores only the content attribute information of hot content data and of newly added content data, the amount of data stored in the data cache device is small compared with the prior art, the time required to query the content attribute information of the content data to be accessed is short, and data obtaining efficiency is improved.
When the content attribute information is not found in the data cache device, it is obtained from the full information base. The full information base stores the content attribute information of each content data stored in the content database, so the content attribute information of the content data to be accessed can always be obtained.
The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, wireless, microwave).
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "includes," "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; the same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the embodiments of the apparatus, the server, and the computer-readable storage medium are described briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method of data acquisition, the method comprising:
acquiring behavior data of a user aiming at content data to be accessed;
inquiring whether content attribute information of the content data to be accessed is stored in a data cache device, wherein the data cache device is used for storing the content attribute information of hot content data and the content attribute information of newly added content data, the hot content data is content data of which the inquiry times are greater than the preset times within a first preset time, and the newly added content data is content data newly added in a content database within a second preset time;
if so, obtaining the content attribute information of the content data to be accessed from the data cache equipment;
if not, obtaining the content attribute information of the content data to be accessed from a full information base, wherein the full information base is used for storing the content attribute information of each content data stored in the content database.
2. The method of claim 1, further comprising:
splicing the obtained content attribute information;
adding the spliced content attribute information to a message queue corresponding to the data analysis service;
and performing data analysis on the obtained content attribute information according to the sequence of the information stored in the message queue.
3. The method according to claim 1, further comprising, after the obtaining the content attribute information of the content data to be accessed from the full information base:
storing the obtained content attribute information into the data caching device.
4. The method according to any one of claims 1 to 3, wherein the validity period of the stored data in the data caching device is a preset third duration.
5. A data acquisition apparatus, characterized in that the apparatus comprises:
the behavior data acquisition module is used for acquiring behavior data of the user aiming at the content data to be accessed;
the information query module is used for querying whether content attribute information of the content data to be accessed is stored in a data cache device, wherein the data cache device is used for storing the content attribute information of hot content data and the content attribute information of newly added content data, the hot content data is content data of which the query times are greater than the preset times within a first preset time length, and the newly added content data is content data newly added to a content database within a second preset time length; if yes, triggering a first information acquisition module; if not, triggering a second information acquisition module;
the first information obtaining module is configured to obtain content attribute information of the content data to be accessed from the data caching device;
the second information obtaining module is configured to obtain content attribute information of the content data to be accessed from a full information base, where the full information base is configured to store the content attribute information of each content data stored in the content database.
6. The apparatus of claim 5, further comprising:
the information splicing module is used for splicing the obtained content attribute information;
the information adding module is used for adding the spliced content attribute information to a message queue corresponding to the data analysis service;
and the data analysis module is used for carrying out data analysis on the obtained content attribute information according to the sequence of the information stored in the message queue.
7. The apparatus of claim 5, further comprising:
and the information storage module is used for storing the obtained content attribute information into the data caching device after the second information obtaining module obtains the content attribute information.
8. The apparatus according to any one of claims 5 to 7, wherein the validity period of the stored data in the data caching device is a preset third duration.
9. A server, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 4 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 4.
CN202010199200.XA 2020-03-20 2020-03-20 Data acquisition method and device Active CN111427914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010199200.XA CN111427914B (en) 2020-03-20 2020-03-20 Data acquisition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010199200.XA CN111427914B (en) 2020-03-20 2020-03-20 Data acquisition method and device

Publications (2)

Publication Number Publication Date
CN111427914A true CN111427914A (en) 2020-07-17
CN111427914B CN111427914B (en) 2024-04-19

Family

ID=71548272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010199200.XA Active CN111427914B (en) 2020-03-20 2020-03-20 Data acquisition method and device

Country Status (1)

Country Link
CN (1) CN111427914B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064985A (en) * 2021-01-19 2022-02-18 广州骏伯网络科技有限公司 Advertisement putting method, device, computer equipment, storage medium and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132958A (en) * 2016-12-01 2018-06-08 阿里巴巴集团控股有限公司 A kind of multi-level buffer data storage, inquiry, scheduling and processing method and processing device
CN110795457A (en) * 2019-09-24 2020-02-14 苏宁云计算有限公司 Data caching processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111427914B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
US10839038B2 (en) Generating configuration information for obtaining web resources
US8819035B2 (en) Providing search results based on keyword detection in media content
US11394796B2 (en) Dynamic and static data of metadata objects
US8150974B2 (en) Character differentiation system generating session fingerprint using events associated with subscriber ID and session ID
US9264774B2 (en) Seamless multi-channel TV everywhere sign-in
CN110324680B (en) Video pushing method and device, server, client and storage medium
US10158918B2 (en) Bookmarking prospective media content on computer network
CN105163142B (en) A kind of user preference determines method, video recommendation method and system
CN107301215B (en) Search result caching method and device and search method and device
US20120054295A1 (en) Method and apparatus for providing or acquiring the contents of a network resource for a mobile device
US20170078361A1 (en) Method and System for Collecting Digital Media Data and Metadata and Audience Data
US11423096B2 (en) Method and apparatus for outputting information
WO2015096609A1 (en) Method and system for creating inverted index file of video resource
CN110620828A (en) File pushing method, system, device, electronic equipment and medium
CN111488377A (en) Data query method and device, electronic equipment and storage medium
CN106557584A (en) A kind of web site collection method and device
CN110708402A (en) Accessible resource display method and device and resource access system
CN111427914B (en) Data acquisition method and device
CN112069386B (en) Request processing method, device, system, terminal and server
CN111190861B (en) Hot spot file management method, server and computer readable storage medium
CN110460885B (en) Multimedia file playing method and device, server and client equipment
CN112860432A (en) Process management method, device and server
US20140372361A1 (en) Apparatus and method for providing subscriber big data information in cloud computing environment
CN110753268B (en) Page card data generation method and device and electronic equipment
CN112491939A (en) Multimedia resource scheduling method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant