CN103366018B - A kind of micro-blog information grasping means and device - Google Patents

Info

Publication number
CN103366018B
Authority
CN
China
Prior art keywords
user
microblog
grabbing
captured
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310334946.7A
Other languages
Chinese (zh)
Other versions
CN103366018A (en
Inventor
韩中腾
崔世起
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
People's Data Management Beijing Co ltd
Original Assignee
PEOPLE SEARCH NETWORK AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PEOPLE SEARCH NETWORK AG
Priority to CN201310334946.7A
Publication of CN103366018A
Application granted
Publication of CN103366018B

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A microblog information capturing method and device. The method includes: obtaining a microblog user to be captured and determining the type of that user; if the user is an active user, calculating a capture period for the user and capturing microblog information at capture time points predicted from that period; if the user is an inactive user, obtaining the capture state of the user and the remaining capture-user quota, and capturing the user's microblog information if the capture state indicates that capture can be performed and the remaining quota is not zero. By handling different types of users differently, the invention allocates and uses capture resources reasonably, improves resource utilization, allows each capture pass to obtain more microblog information, and thereby improves capture efficiency.

Description

Microblog information capturing method and device
Technical Field
The invention relates to the technical field of networks, in particular to a microblog information capturing method and device.
Background
With the increasing popularization of microblogs, the amount of microblog users is steadily increasing, and the amount of information contained in microblogs released by tens of millions of users every day is non-trivial. In order to extract news hotspots from numerous microblogs issued by users, or analyze interests of the users according to the microblogs issued by the users to carry out microblog marketing, microblog information issued by the users needs to be timely and comprehensively captured.
At present, information capture is mainly performed by calling the API (application programming interface) of a microblog platform. However, out of considerations such as maintenance cost and information retention, each major microblog platform limits both the number and the frequency of capture calls; in other words, capture resources are limited. How to use these limited capture resources to rapidly acquire more effective microblog information is of considerable practical importance.
Disclosure of Invention
The microblog information capturing method and device provided by the invention aim to obtain as much effective microblog information as possible using limited capture resources.
Therefore, the embodiment of the invention provides the following technical scheme:
a microblog information capturing method comprises the following steps:
acquiring microblog users to be captured, and judging the types of the microblog users to be captured;
if the microblog user to be captured is an active user, computing a capturing period of the microblog user to be captured, and predicting a capturing time point according to the capturing period to capture microblog information;
and if the microblog user to be captured is an inactive user, obtaining the capture state of the microblog user to be captured and the remaining capture-user quota, and if the capture state indicates that microblog information capture can be performed and the remaining capture-user quota is not zero, capturing the microblog information of the microblog user to be captured.
Preferably, the acquiring the microblog user to be captured includes:
selecting at least one authenticated user as a seed user, and adding the seed user as an unprocessed user to a user list;
judging whether the unprocessed user has a subordinate user:
if yes, acquiring a subordinate user of the unprocessed user, adding the subordinate user to the user list, and setting the state of the unprocessed user as processed; taking the subordinate user as an unprocessed user, and continuing to execute the step of judging whether the unprocessed user has the subordinate user;
if not, the status of the unprocessed user is set to processed.
Preferably, the acquiring the subordinate user of the unprocessed user includes:
acquiring the subordinate user through the user relationship network of the unprocessed user; or,
capturing the users who comment on and/or forward the microblogs posted by the unprocessed user, and taking them as the subordinate users.
Preferably, the determining the type of the microblog user to be captured includes:
determining the user activity according to the frequency with which the microblog user to be captured posts microblogs;
judging the type of the microblog user to be captured according to a preset active value and the user activity, and if the user activity is not smaller than the preset active value, judging that the microblog user to be captured is an active user; otherwise, judging that the microblog user to be captured is an inactive user.
Preferably, the determining the user activity according to the frequency with which the microblog user to be captured posts microblogs includes:
calculating the average posting interval of the user according to the microblogs posted by the microblog user to be captured;
and searching for the liveness corresponding to the average posting interval from a preset database.
A microblog information capturing device, the device comprising:
the first acquisition unit is used for acquiring microblog users to be captured;
the first judging unit is used for judging the type of the microblog user to be captured, which is acquired by the first acquiring unit;
the calculating unit is used for calculating the grabbing period of the microblog user to be grabbed when the first judging unit judges that the microblog user to be grabbed is the active user;
the grabbing unit is used for predicting grabbing time points according to the grabbing period to carry out microblog information grabbing;
the second acquisition unit is used for acquiring the grabbing state of the microblog user to be grabbed and the amount of the remaining grabbing users when the first judgment unit judges that the microblog user to be grabbed is the inactive user;
the grabbing unit is further configured to grab the microblog information from the microblog user to be grabbed when the grabbing state indicates that the microblog information grabbing can be performed and the remaining grabbing user amount is not zero.
Preferably, the first acquiring unit includes:
a selecting unit, configured to select at least one authenticated user as a seed user and add the seed user as an unprocessed user to a user list;
a second judging unit configured to judge whether the unprocessed user has a subordinate user:
a third acquisition unit configured to acquire a subordinate user of the unprocessed user when the second judgment unit judges that the unprocessed user has the subordinate user,
an adding unit configured to add the subordinate user to the user list, and set a state of the unprocessed user as processed; taking the subordinate user as an unprocessed user, and informing the second judging unit to continuously judge whether the unprocessed user has the subordinate user;
a setting unit configured to set a state of the unprocessed user as processed when the second determination unit determines that the unprocessed user does not have a subordinate user.
Preferably, the third obtaining unit is specifically configured to obtain the subordinate user through the user relationship network of the unprocessed user; or,
the third obtaining unit is specifically configured to capture the users who comment on and/or forward the microblogs posted by the unprocessed user, and take them as the subordinate users.
Preferably, the first judging unit includes:
the determining unit is used for determining the user activity according to the frequency with which the microblog user to be captured posts microblogs;
the judging subunit is used for judging the type of the microblog user to be captured according to a preset active value and the user activity, and if the user activity is not smaller than the preset active value, judging that the microblog user to be captured is an active user; otherwise, judging that the microblog user to be captured is an inactive user.
Preferably, the calculation unit includes:
the calculating subunit is used for calculating the average posting interval of the user according to the microblogs posted by the microblog user to be captured;
and the searching unit is used for searching the liveness corresponding to the average posting interval from a preset database.
In the microblog information capturing method and device of the invention, as many microblog users to be captured as possible are first mined to serve as processing objects, and the processing objects are then classified according to their activity. If a processing object is an active user, the behavior characteristics of that user's microblog posting are statistically analyzed and a capture period is set accordingly, so that capture time points can be predicted from the capture period and information can be captured in a targeted manner. If a processing object is an inactive user, whether to capture information is decided according to the user's current capture state and the current remaining capture-user quota. By processing different types of users differently, the method and device allocate and use capture resources reasonably, improve resource utilization, allow each capture pass to obtain more microblog information, and improve information capture efficiency.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them.
FIG. 1 is a flow chart of a microblog information capturing method according to the invention;
FIG. 2 is a flowchart of acquiring a microblog user to be captured in the present invention;
FIG. 3 is a flow chart of determining a user type in the present invention;
FIG. 4 is a flow chart of determining user liveness in the present invention;
FIG. 5 is a schematic diagram of a microblog information capturing device according to the invention;
FIG. 6 is a schematic diagram of a first acquisition unit in the present invention;
FIG. 7 is a schematic diagram of a first determining unit according to the present invention;
FIG. 8 is a schematic diagram of a computing unit in the present invention.
Detailed Description
In order that those skilled in the art may better understand the scheme of the invention, the embodiments of the invention are described in detail below with reference to the accompanying drawings and implementation modes.
In order to extract news hotspots or analyze user interests, microblog information posted by users should be captured in a timely and comprehensive manner. In the prior art, however, each major microblog platform limits the number and frequency of information captures. If information is captured in the same way for different types of microblog users, for example for active users who post, forward and comment on microblogs every day and for inactive users who seldom log in, capture resources are clearly allocated and used unreasonably, and the efficiency of microblog information capture is low. In order to improve capture efficiency and make full use of limited capture resources to obtain more effective microblog information rapidly and accurately, the present microblog information capturing scheme is provided. The scheme analyzes the type of each microblog user to be captured and processes users of different types differently. A specific implementation process of the invention is explained below.
Referring to fig. 1, a flowchart illustrating a microblog information capturing method according to the present invention is shown, and may include:
step 101, obtaining microblog users to be captured, and judging the types of the microblog users to be captured.
Considering that each major microblog platform limits the capture resources available each day, different capture schemes should be formulated for different types of users if more effective microblog information is to be captured with these limited resources.
Firstly, acquiring microblog users to be captured, namely mining the microblog users to determine as many information capturing objects as possible. As an implementation manner for acquiring the microblog user to be captured in this step, the implementation manner may be embodied as a flowchart shown in fig. 2, and may include:
step 201, selecting at least one authenticated user as a seed user, and adding the seed user as an unprocessed user to a user list.
Step 202, judging whether the unprocessed user has a subordinate user, if yes, executing step 203, and if not, executing step 205.
Step 203, obtaining the subordinate user of the unprocessed user, adding the subordinate user to the user list, and setting the state of the unprocessed user as processed.
Step 204, taking the subordinate user as an unprocessed user, and returning to step 202.
Step 205, the status of the unprocessed user is set to processed.
Microblog users can be roughly divided into two types: authenticated users and ordinary users. In order to mine as many microblog users as possible, the seed users are chosen from authenticated users who have large influence and a complex user relationship network. As one way of determining the seed users, they can be captured from the microblog platform's celebrity hall page, for example by taking the top 100 users in the influence ranking or the popularity ranking as seed users. Alternatively, authenticated users under a particular category can be captured according to marketing needs; for example, if a travel product is to be promoted, authenticated users under the travel category can be captured as seed users. The invention does not limit the specific manner of determining seed users from among authenticated users.
After the seed users are determined, the seed users can be added to a user list as unprocessed users, whether the unprocessed users have subordinate users is judged, and the following processing is performed:
(1) If the unprocessed user has no subordinate user, the unprocessed user is a bottom-level node, indicating that the microblog users directly or indirectly related to the seed user have now all been mined, and the state of the unprocessed user can be marked directly as processed.
(2) If the unprocessed user has subordinate users, the unprocessed user is not a bottom-level node, and recursive mining should continue on the basis of those subordinate users. In this case the following processing may be performed:
a. marking the state of the unprocessed user as processed;
b. adding the subordinate users of the unprocessed user to the user list;
c. marking the state of each subordinate user as unprocessed, so that recursive mining can continue from it.
After these three actions, the user list again contains unprocessed users, so the flow returns to step 202: each of these subordinate users is treated as an unprocessed user, it is judged whether that user has subordinate users, and the corresponding processing is performed according to the result, which is not repeated here.
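The mining flow of steps 201 to 205 can be sketched as follows. This is a minimal illustration, not the patented implementation: it uses an iterative queue in place of recursion, and the names `mine_users` and `get_subordinates` are hypothetical. The check that skips users already in the list is an added assumption (the text does not discuss cycles in the relationship network), included so that the loop terminates.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

UNPROCESSED, PROCESSED = "unprocessed", "processed"


def mine_users(seeds: Iterable[str],
               get_subordinates: Callable[[str], List[str]]) -> Dict[str, str]:
    """Expand seed users into a user list by following subordinate users."""
    user_list: Dict[str, str] = {}
    queue = deque()
    # Step 201: add each seed user to the user list as an unprocessed user.
    for seed in seeds:
        if seed not in user_list:
            user_list[seed] = UNPROCESSED
            queue.append(seed)
    while queue:
        user = queue.popleft()
        # Steps 202-204: add any subordinate users as new unprocessed users.
        for sub in get_subordinates(user):
            if sub not in user_list:  # assumption: skip users already listed
                user_list[sub] = UNPROCESSED
                queue.append(sub)
        # Steps 203/205: in either branch the current user becomes processed.
        user_list[user] = PROCESSED
    return user_list


# Toy relationship data (hypothetical); "b" points back to "seed" to show
# that the loop still terminates.
graph = {"seed": ["a", "b"], "a": ["b", "c"], "b": ["seed"], "c": []}
result = mine_users(["seed"], lambda u: graph.get(u, []))
```

In practice `get_subordinates` would wrap whatever platform call returns the subordinate users; here it is a plain dictionary lookup.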
It should be noted that the present invention provides two implementation manners for acquiring the subordinate users of the unprocessed user, which are explained below separately.
(1) Acquiring the subordinate users through the user relationship network of the unprocessed user.
The user relationship network describes the relationships between microblog users. It is generally represented as a node graph, in which nodes represent microblog users and a line connecting two nodes represents a relationship between those users. On a microblog platform, user A can follow user B and receive the microblogs posted by user B that interest user A; user A is then a follower (fan) of user B, and correspondingly user B is a followee of user A.
As one way of acquiring the user relationship network, the API of the microblog open platform can be called to obtain the following list and the follower list of a given unprocessed user. Since the users in the following list and the follower list are mined via the unprocessed user, they can be referred to as subordinate users of the unprocessed user.
(2) Capturing the users who comment on and/or forward the microblogs posted by the unprocessed user, and taking them as the subordinate users.
Even when user A and user B have no follow relationship in either direction, user A may still forward and/or comment on a microblog posted by user B. In that case user A and user B can be considered to be associated, and user A can also be regarded as a subordinate user of user B. Therefore, as another way of acquiring subordinate users, the users who forward and/or comment on the microblogs posted by the unprocessed user can be captured.
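The two ways of obtaining subordinate users can be sketched as two small functions over data that has already been fetched. All names and data shapes here are hypothetical; no real platform API is called, and the dictionaries stand in for whatever the platform returns.

```python
from typing import Dict, List, Set


def subordinates_from_relations(user: str,
                                following: Dict[str, List[str]],
                                followers: Dict[str, List[str]]) -> Set[str]:
    """Way (1): union of the user's following list and follower list."""
    return set(following.get(user, [])) | set(followers.get(user, []))


def subordinates_from_interactions(user: str,
                                   posts_by_user: Dict[str, List[str]],
                                   interactions: Dict[str, List[str]]) -> Set[str]:
    """Way (2): users who commented on or forwarded the user's microblogs.

    `interactions` maps a post ID to the IDs of the users who commented on
    or forwarded that post (an assumed data shape).
    """
    subs: Set[str] = set()
    for post_id in posts_by_user.get(user, []):
        subs.update(interactions.get(post_id, []))
    subs.discard(user)  # a user is not their own subordinate
    return subs


# Toy data (hypothetical).
following = {"B": ["X"]}
followers = {"B": ["Y"]}
posts_by_user = {"B": ["p1", "p2"]}
interactions = {"p1": ["A", "B"], "p2": ["A", "Z"]}

by_relation = subordinates_from_relations("B", following, followers)
by_interaction = subordinates_from_interactions("B", posts_by_user, interactions)
```

The text presents the two ways as alternatives; taking the union of both result sets would be a further design choice that the source does not state.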
The microblog users mined in the manner described above are the processing objects of the invention, namely the microblog users to be captured. In order to process them differently, their types must be identified. In the invention, microblog users are divided into active users and inactive users; active users are few in number and inactive users are many. The invention provides a different processing mode for each type: active users are processed according to step 102 and inactive users according to step 103, as explained below.
The implementation for determining the user type is described later with reference to FIG. 3.
Step 102, if the microblog user to be captured is an active user, calculating a capture period of the microblog user to be captured, and capturing microblog information at capture time points predicted from the capture period.
As described above, the number of active users is small, but the amount of microblog information provided by the active users is large, and according to this characteristic, we can analyze the behavior characteristics of each active user issuing a microblog one by one, set a corresponding capture cycle for the active user according to the behavior characteristics, and then capture information in a targeted manner according to the capture time points predicted by the capture cycle (that is, the time points at which the user may issue a microblog).
It should be noted that the grabbing period determined for the active user may be a fixed period or a variable period.
That is, for a given active user, the average interval between microblog posts within a unit of time (such as an hour, a day or a week) can be obtained by statistically analyzing the user's historical microblogs; a fixed capture period is calculated from this average interval, and capture time points are predicted according to the fixed capture period. Here the average posting interval per unit of time can be understood as the behavior characteristic of the user.
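A fixed capture period derived from the average posting interval might look like the sketch below. It assumes, for illustration only, that the capture period is taken to be equal to the average interval; the text says only that the period is calculated "based on" it. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def average_posting_interval(post_times: List[datetime]) -> timedelta:
    """Mean gap between consecutive posts in the user's history."""
    if len(post_times) < 2:
        raise ValueError("at least two posts are needed to compute an interval")
    ordered = sorted(post_times)
    # The sum of consecutive gaps equals last minus first.
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


def predict_capture_times(last_capture: datetime,
                          period: timedelta,
                          count: int) -> List[datetime]:
    """Next `count` capture time points, spaced by a fixed period."""
    return [last_capture + period * i for i in range(1, count + 1)]


# Hypothetical posting history for one active user.
history = [
    datetime(2013, 8, 1, 8, 0),
    datetime(2013, 8, 1, 10, 0),
    datetime(2013, 8, 1, 14, 0),
]
period = average_posting_interval(history)          # 3 hours
upcoming = predict_capture_times(history[-1], period, 2)
```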
Alternatively, for a given active user, the busy periods and idle periods in which the user posts microblogs within a unit of time (such as an hour, a day or a week) can be obtained by statistically analyzing the user's historical microblogs; different capture periods are then set for the busy periods and the idle periods, and information is captured with a variable period. For example, if statistics show that an active user frequently posts microblogs at lunchtime, while riding the subway, or in the evening, these time periods can be defined as busy periods; if the user rarely posts microblogs during working hours and during night-time rest hours, those time periods can be defined as idle periods. This yields the behavior characteristic of the user's posting within a day, a capture period for that day can be set accordingly, and the capture time points for the same day of the following week can be predicted from the set capture period.
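The variable-period idea can be sketched by bucketing historical posts by hour of day and choosing a shorter capture period in busy hours. The threshold `min_posts` and the two period lengths are illustrative assumptions, and the function names are hypothetical; the text does not specify how busy and idle periods are delimited.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Set


def busy_hours(post_times: List[datetime], min_posts: int) -> Set[int]:
    """Hours of the day in which the user posted at least `min_posts` times."""
    counts = Counter(t.hour for t in post_times)
    return {hour for hour, n in counts.items() if n >= min_posts}


def capture_period(when: datetime, busy: Set[int],
                   busy_period: timedelta,
                   idle_period: timedelta) -> timedelta:
    """Short period during busy hours, long period during idle hours."""
    return busy_period if when.hour in busy else idle_period


# Hypothetical history: three posts around lunchtime, one mid-afternoon.
history = [
    datetime(2013, 8, 1, 12, 5),
    datetime(2013, 8, 1, 12, 40),
    datetime(2013, 8, 2, 12, 15),
    datetime(2013, 8, 2, 15, 30),
]
busy = busy_hours(history, min_posts=2)
lunch = capture_period(datetime(2013, 8, 8, 12, 0), busy,
                       timedelta(minutes=10), timedelta(hours=2))
afternoon = capture_period(datetime(2013, 8, 8, 15, 0), busy,
                           timedelta(minutes=10), timedelta(hours=2))
```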
It should be noted that, when determining the capture period, the factors that may affect its length include at least: the weight of each historical microblog, the influence of the user (which can be represented by the number of fans and the number of mentions), the quality of the microblogs the user posts (which can be represented by the number of times they are forwarded), and the capture resources (which are limited by the platform being captured). These are not described in detail here.
Step 103, if the microblog user to be captured is an inactive user, obtaining the capture state of the microblog user to be captured and the remaining capture-user quota, and if the capture state indicates that microblog information capture can be performed and the remaining capture-user quota is not zero, capturing the microblog information of the microblog user to be captured.
As described above, the number of inactive users is large, and the amount of microblog information provided by the part of users is small, and if the information capture is performed according to a certain capture period (fixed period or variable period) in the step 102, not only is the waste of capture resources caused, but also the captured information may be limited, so that another capture scheme for the inactive users is provided.
First, a capture interval, for example 2 months, is set; it determines the current capture state of an inactive user. During the capture interval the user's capture state is "not capturable", and when the interval has elapsed the state becomes "capturable". For example, suppose information is captured for a certain inactive user on June 12 (which may be regarded as the capture starting point of that user). When it is judged on June 13 whether the user needs to be captured, it is known that the user's microblog information was captured only the previous day, so there is no need to capture it again for now; that is, the user's capture state on June 13 is "not capturable". The judgment is repeated day by day (other time units may of course be used; the invention is not limited in this respect) until, on August 12, the 2-month interval has elapsed and the user's capture state is judged to be "capturable", at which point the next capture is performed.
Second, a capture-user quota is set according to the API permissions. It limits the daily capture ceiling, that is, how many inactive users can be captured each day, for example ten million inactive users.
Once these two parameters are set, it can be judged whether information capture can currently be performed for the microblog user to be captured. The specific process is as follows: judge whether the capture state of the user is "capturable"; if so, judge whether the current remaining capture-user quota is zero; if it is not zero, the user can be captured, and the remaining quota is decremented by 1 when the capture is performed, so that the judgment for subsequent inactive users remains accurate.
That is, for an inactive user, if the capture state is "not capturable" or the current remaining capture-user quota is zero, no information capture is performed for that user.
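The decision for inactive users can be sketched as a small scheduler that checks the capture state and the remaining daily quota. The class and field names are hypothetical. The 2-month interval of the example is approximated as 61 days, which is the number of days from June 12 to August 12; the year 2013 is chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class InactiveUser:
    user_id: str
    last_capture: date  # the user's capture starting point


class InactiveCaptureScheduler:
    def __init__(self, interval: timedelta, daily_quota: int) -> None:
        self.interval = interval
        self.remaining = daily_quota  # remaining capture-user quota for today

    def is_capturable(self, user: InactiveUser, today: date) -> bool:
        """Capture state: capturable once the capture interval has elapsed."""
        return today - user.last_capture >= self.interval

    def try_capture(self, user: InactiveUser, today: date) -> bool:
        """Capture only if the state allows it and the quota is not zero."""
        if not self.is_capturable(user, today) or self.remaining == 0:
            return False
        self.remaining -= 1        # decrement the quota by 1 for this capture
        user.last_capture = today  # the actual fetch would happen here
        return True


scheduler = InactiveCaptureScheduler(timedelta(days=61), daily_quota=1)
u1 = InactiveUser("u1", date(2013, 6, 12))
u2 = InactiveUser("u2", date(2013, 6, 12))

too_early = scheduler.try_capture(u1, date(2013, 6, 13))    # interval not elapsed
captured = scheduler.try_capture(u1, date(2013, 8, 12))     # capturable, quota 1
quota_spent = scheduler.try_capture(u2, date(2013, 8, 12))  # quota now zero
```

Resetting `remaining` at the start of each day is left out of the sketch.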
It should be noted that, because the capture-user quota is limited, the microblog information of some inactive users whose capture state is "capturable" may not be captured when due. For this reason, different capture intervals or capture starting points can be set so that the inactive users are processed in a staggered manner. In this way the limited capture resources can handle as many inactive users as possible, improving both resource utilization and the efficiency of capturing effective information.
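One simple way to stagger capture starting points is to spread users round-robin over the days of one capture interval, so that roughly the same number of users becomes capturable each day. The round-robin rule and the function name are assumptions for illustration; the text does not prescribe a particular staggering scheme.

```python
from datetime import date, timedelta
from typing import Dict, List


def staggered_start_points(user_ids: List[str],
                           first_day: date,
                           interval_days: int) -> Dict[str, date]:
    """Assign each user a capture starting point within one interval."""
    return {
        uid: first_day + timedelta(days=index % interval_days)
        for index, uid in enumerate(user_ids)
    }


starts = staggered_start_points(["u1", "u2", "u3", "u4"],
                                date(2013, 6, 12), interval_days=3)
```

With three days in the interval, the fourth user wraps around to the first day.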
Referring to fig. 3, a flow of determining a user type according to the present invention is shown, which may include:
step 301, determining the user activity according to the frequency with which the microblog user to be captured posts microblogs.
Step 302, judging the type of the microblog user to be captured according to a preset active value and the user activity, and if the user activity is not less than the preset active value, judging that the microblog user to be captured is an active user; otherwise, judging that the microblog user to be captured is an inactive user.
The method determines the activity of a user mainly according to whether the user posts microblogs and how frequently. If the user has posted no microblogs, the user is directly defined as an inactive user; if the user has posted microblogs, the user's liveness is determined according to the posting frequency, which can be implemented by the process shown in FIG. 4, comprising the following steps:
step 401, calculating the average posting interval of the user according to the microblogs posted by the microblog user to be captured;
step 402, searching for liveness corresponding to the average posting interval from a preset database.
The embodiment mainly represents the posting frequency of the user through the posting intervals, and further reflects the activity of the user. During specific implementation, a database storing correspondence between posting intervals and liveness can be established, and after the posting intervals of the users are obtained through calculation, the corresponding liveness can be determined through a table look-up mode. It should be noted that, the posting intervals and the liveness can be in one-to-one correspondence, that is, one posting interval corresponds to one liveness; alternatively, the posting interval and the liveness may be many-to-one, that is, a plurality of posting intervals correspond to one liveness, and the liveness may be regarded as an activity level, which is not limited in this disclosure.
After the user activity is obtained, comparing the user activity with a preset activity value, and if the user activity is smaller than the preset activity value, judging that the user is an inactive user; and if the user activity is greater than or equal to the preset activity value, judging that the user is an active user.
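The table lookup and threshold comparison can be sketched as follows, using the many-to-one variant in which a range of posting intervals maps to one liveness level. The table contents, the level values and the preset active value of 3 are invented for illustration, and an in-memory list stands in for the preset database.

```python
import bisect
from typing import Optional

# (upper bound of the average posting interval in hours, liveness level);
# hypothetical values standing in for the preset database.
LIVENESS_TABLE = [(1, 5), (6, 4), (24, 3), (168, 2)]
LOWEST_LIVENESS = 1  # for intervals longer than every bound in the table


def liveness(avg_interval_hours: Optional[float]) -> int:
    """Look up the liveness level for an average posting interval."""
    if avg_interval_hours is None:  # the user has never posted
        return 0
    bounds = [bound for bound, _ in LIVENESS_TABLE]
    index = bisect.bisect_left(bounds, avg_interval_hours)
    if index < len(LIVENESS_TABLE):
        return LIVENESS_TABLE[index][1]
    return LOWEST_LIVENESS


def classify(avg_interval_hours: Optional[float], preset_active: int = 3) -> str:
    """Active if liveness is not less than the preset active value."""
    if liveness(avg_interval_hours) >= preset_active:
        return "active"
    return "inactive"
```

For example, a user who posts on average every 24 hours maps to level 3 and is classified as active under these assumed values, while a user who has never posted is classified as inactive.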
Correspondingly, the present invention further provides a microblog information capturing device, referring to fig. 5, which shows a schematic diagram of the microblog information capturing device according to the present invention, and the device may include:
a first obtaining unit 501, configured to obtain a microblog user to be captured;
a first determining unit 502, configured to determine the type of the microblog user to be captured that is obtained by the first obtaining unit;
a calculating unit 503, configured to calculate a capture cycle of the to-be-captured microblog user when the first determining unit determines that the to-be-captured microblog user is an active user;
a capturing unit 504, configured to predict capture time points according to the capture period and capture microblog information at those time points;
a second obtaining unit 505, configured to obtain, when the first determining unit determines that the microblog user to be captured is an inactive user, a capture state of the microblog user to be captured and a remaining capture user amount;
the grabbing unit 504 is further configured to grab the microblog information from the microblog user to be grabbed when the grabbing state indicates that the microblog information grabbing can be performed, and the amount of the remaining grabbing users is not zero.
Referring to fig. 6, a schematic diagram of a first obtaining unit in the present invention is shown, which may include:
a selecting unit 601, configured to select at least one authenticated user as a seed user, and add the seed user as an unprocessed user to a user list;
a second determining unit 602, configured to determine whether the unprocessed user has a subordinate user:
a third acquiring unit 603 configured to acquire a subordinate user of the unprocessed user when the second judging unit judges that the unprocessed user has a subordinate user,
an adding unit 604, configured to add the subordinate user to the user list, and set the state of the unprocessed user to processed; and to take the subordinate user as an unprocessed user and notify the second determining unit 602 to continue determining whether the unprocessed user has a subordinate user;
a setting unit 605, configured to set the status of the unprocessed user as processed when the second determination unit determines that the unprocessed user does not have a subordinate user.
The third obtaining unit may obtain the subordinate user in either of the following two ways: obtaining the subordinate user through the user relationship network of the unprocessed user; or capturing the users who comment on and/or forward the microblogs posted by the unprocessed user and taking those users as subordinate users.
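The seed-based expansion of the user list can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: `get_subordinates` is a hypothetical callable standing in for either lookup method (relationship network, or commenters and forwarders), and the duplicate check and the `max_users` cap are additions of this sketch, not part of the text, included so that the loop terminates on cyclic relationship networks.

```python
from collections import deque


def collect_users(seed_users, get_subordinates, max_users=1000):
    """Expand seed users into a user list, tracking processed state."""
    user_list = []   # users in the order they were added
    status = {}      # user -> "unprocessed" or "processed"
    queue = deque()

    for seed in seed_users:
        if seed not in status:
            status[seed] = "unprocessed"
            user_list.append(seed)
            queue.append(seed)

    while queue:
        user = queue.popleft()
        # An empty result means the user has no subordinate users.
        for sub in get_subordinates(user):
            if sub not in status and len(user_list) < max_users:
                status[sub] = "unprocessed"
                user_list.append(sub)
                queue.append(sub)
        # With or without subordinates, the user is now processed.
        status[user] = "processed"

    return user_list, status
```

For example, with a toy relationship network `{"a": ["b", "c"], "b": ["c", "a"], "c": []}` and seed `"a"`, the sketch visits each user once and marks all three as processed.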
Referring to FIG. 7, a schematic diagram of the first determining unit in the present invention is shown, which may include:
a determining unit 701, configured to determine the user activity according to the frequency at which the microblog user to be captured posts microblogs;
a determining subunit 702, configured to determine the type of the microblog user to be captured according to a preset active value and the user activity: if the user activity is not less than the preset active value, the microblog user to be captured is determined to be an active user; otherwise, the microblog user to be captured is determined to be an inactive user.
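The threshold comparison can be sketched in a few lines of Python. The names are hypothetical, and measuring activity as "posts per day" is an assumption of this sketch; the text only says that activity is determined from posting frequency.

```python
def posting_frequency(post_count: int, days: float) -> float:
    """User activity measured as posts per day (an assumed unit)."""
    if days <= 0:
        raise ValueError("observation window must be positive")
    return post_count / days


def classify_user(activity: float, preset_active_value: float) -> str:
    # "Not less than" the preset active value means active,
    # so equality counts as active.
    return "active" if activity >= preset_active_value else "inactive"
```

Under these assumptions, a user with 30 posts in 10 days has an activity of 3.0 and is classified as active against a preset active value of 2.0.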
Referring to FIG. 8, a schematic diagram of the calculating unit in the present invention is shown, which may include:
a calculating subunit 801, configured to calculate the average posting interval of the microblog user to be captured according to the microblogs posted by that user;
a searching unit 802, configured to search a preset database for the activity level corresponding to the average posting interval.
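The two subunits of FIG. 8 can be sketched as follows. The preset table, its interval bounds, and its activity levels are invented for illustration; the text does not disclose the contents of the preset database. The average interval is computed as the span between the first and last post divided by the number of gaps, which equals the mean of the consecutive gaps.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical preset table: (upper bound of average interval in seconds,
# activity level). The values are illustrative only.
PRESET_TABLE = [(3600, 5), (6 * 3600, 4), (24 * 3600, 3), (7 * 24 * 3600, 2)]
LOWEST_LEVEL = 1  # used when the interval exceeds every bound


def average_posting_interval(post_times):
    """Mean gap in seconds between consecutive posts, or None if < 2 posts."""
    if len(post_times) < 2:
        return None
    ordered = sorted(post_times)
    span = (ordered[-1] - ordered[0]).total_seconds()
    return span / (len(ordered) - 1)


def lookup_activity(avg_interval_seconds, table=PRESET_TABLE):
    """Return the level of the first row whose bound is >= the interval."""
    bounds = [bound for bound, _ in table]
    i = bisect_left(bounds, avg_interval_seconds)
    return table[i][1] if i < len(table) else LOWEST_LEVEL
```

A sorted table with binary search is one simple design choice; a real preset database could equally be a keyed store queried by interval range.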
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments, without departing from the scope of the technical solution of the present invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the scope of protection of the technical solution of the present invention.

Claims (6)

1. A microblog information capturing method, characterized by comprising the following steps:
acquiring a microblog user to be captured, and judging the type of the microblog user to be captured;
wherein judging the type of the microblog user to be captured comprises: calculating the average posting interval of the user according to the microblogs posted by the microblog user to be captured;
searching a preset database for the liveness corresponding to the average posting interval;
if the microblog user to be captured is an active user, calculating a capture period of the microblog user to be captured, and predicting a capture time point according to the capture period to capture microblog information;
if the microblog user to be captured is an inactive user, obtaining the capture state of the microblog user to be captured and the remaining capture user amount, and if the capture state indicates that microblog information capture can be performed and the remaining capture user amount is not zero, capturing microblog information of the microblog user to be captured;
the acquiring of the microblog user to be captured includes:
selecting at least one authenticated user as a seed user, and adding the seed user as an unprocessed user to a user list;
judging whether the unprocessed user has a subordinate user:
if yes, acquiring a subordinate user of the unprocessed user, adding the subordinate user to the user list, and setting the state of the unprocessed user as processed; taking the subordinate user as an unprocessed user, and continuing to execute the step of judging whether the unprocessed user has the subordinate user;
if not, setting the state of the unprocessed user as processed;
wherein, if the microblog user to be captured is an inactive user, obtaining the capture state of the microblog user to be captured and the remaining capture user amount, and, if the capture state indicates that microblog information capture can be performed and the remaining capture user amount is not zero, capturing microblog information of the microblog user to be captured comprises:
setting a capture interval representing the current capture state of an inactive user, wherein the capture state of the user is unavailable for capture during the capture interval and becomes available for capture when the capture interval has elapsed;
setting, according to the limits of the API permission, a daily upper limit on the number of users that can be captured;
judging whether the capture state of the microblog user to be captured is available for capture; if so, further judging whether the current remaining capture user amount is zero; if it is not zero, determining that the microblog user to be captured can be captured, and reducing the remaining capture user amount by 1 when the information is captured;
and setting different capture intervals or capture starting points for the microblog users to be captured, so that the inactive users are processed in a staggered manner.
2. The method of claim 1, wherein said obtaining the subordinate users of the unprocessed user comprises:
acquiring the subordinate user through the user relationship network of the unprocessed user; or,
capturing the users who comment on and/or forward the microblogs posted by the unprocessed user, and taking those users as the subordinate users.
3. The method according to claim 1, wherein the determining the type of the microblog user to be captured comprises:
determining the user activity according to the frequency at which the microblog user to be captured posts microblogs;
judging the type of the microblog user to be captured according to a preset active value and the user activity, and if the user activity is not smaller than the preset active value, judging that the microblog user to be captured is an active user; otherwise, judging that the microblog user to be captured is an inactive user.
4. A microblog information capturing device, characterized by comprising:
the first acquisition unit is used for acquiring microblog users to be captured;
the first judging unit is used for judging the type of the microblog user to be captured, which is acquired by the first acquiring unit;
the calculating unit is used for calculating the grabbing period of the microblog user to be grabbed when the first judging unit judges that the microblog user to be grabbed is the active user;
the grabbing unit is used for predicting grabbing time points according to the grabbing period to carry out microblog information grabbing;
the second acquisition unit is used for acquiring the grabbing state of the microblog user to be grabbed and the amount of the remaining grabbing users when the first judgment unit judges that the microblog user to be grabbed is the inactive user;
the grabbing unit is further configured to grab microblog information from the microblog users to be grabbed when the grabbing state indicates that the microblog information grabbing can be performed and the remaining grabbing user amount is not zero;
wherein the first acquisition unit includes:
a selecting unit configured to select at least one authenticated user as a seed user and add the seed user as an unprocessed user to a user list;
a second judging unit configured to judge whether the unprocessed user has a subordinate user;
a third acquisition unit configured to acquire a subordinate user of the unprocessed user when the second judging unit judges that the unprocessed user has the subordinate user;
an adding unit configured to add the subordinate user to the user list, set the state of the unprocessed user as processed, take the subordinate user as an unprocessed user, and notify the second judging unit to continue judging whether the unprocessed user has the subordinate user;
a setting unit configured to set the state of the unprocessed user as processed when the second judging unit judges that the unprocessed user does not have a subordinate user;
wherein, if the microblog user to be captured is an inactive user, obtaining the capture state of the microblog user to be captured and the remaining capture user amount, and, if the capture state indicates that microblog information capture can be performed and the remaining capture user amount is not zero, capturing microblog information of the microblog user to be captured comprises:
setting a capture interval representing the current capture state of an inactive user, wherein the capture state of the user is unavailable for capture during the capture interval and becomes available for capture when the capture interval has elapsed;
setting, according to the limits of the API permission, a daily upper limit on the number of users that can be captured;
judging whether the capture state of the microblog user to be captured is available for capture; if so, further judging whether the current remaining capture user amount is zero; if it is not zero, determining that the microblog user to be captured can be captured, and reducing the remaining capture user amount by 1 when the information is captured;
setting different capture intervals or capture starting points for the microblog users to be captured, so that the inactive users are processed in a staggered manner;
the calculation unit includes:
the calculating subunit is used for calculating the average posting interval of the users according to the microblogs issued by the microblog users to be captured;
and the searching unit is used for searching the liveness corresponding to the average posting interval from a preset database.
5. The apparatus of claim 4,
the third obtaining unit is specifically configured to obtain the subordinate user through the user relationship network of the unprocessed user;
or,
the third obtaining unit is specifically configured to capture the users who comment on and/or forward the microblogs posted by the unprocessed user, and to take those users as the subordinate users.
6. The apparatus according to claim 4, wherein the first judging unit includes:
the determining unit is used for determining the user activity according to the frequency at which the microblog user to be captured posts microblogs;
the judging subunit is used for judging the type of the microblog user to be captured according to a preset active value and the user activity, and if the user activity is not smaller than the preset active value, judging that the microblog user to be captured is an active user;
otherwise, judging that the microblog user to be captured is an inactive user.
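The inactive-user handling recited in claims 1 and 4 (capture interval, daily upper limit, decrementing the remaining amount, and staggered starting points) can be sketched as below. All names are hypothetical, times are plain integer seconds for simplicity, and spreading the starting points evenly across one interval is only one possible staggering policy assumed for this sketch.

```python
from dataclasses import dataclass


@dataclass
class InactiveUser:
    user_id: str
    interval: int   # capture interval in seconds
    next_due: int   # time at which the state becomes "available for capture"


class InactiveScheduler:
    def __init__(self, daily_limit: int):
        # Daily upper limit on captured users, e.g. derived from API limits.
        self.daily_limit = daily_limit
        self.remaining = daily_limit

    def reset_day(self) -> None:
        self.remaining = self.daily_limit

    def try_capture(self, user: InactiveUser, now: int) -> bool:
        if now < user.next_due:
            return False              # still inside the capture interval
        if self.remaining == 0:
            return False              # daily amount exhausted
        self.remaining -= 1           # reduce the remaining amount by 1
        user.next_due = now + user.interval
        return True


def stagger(user_ids, interval: int, start: int = 0):
    """Give users evenly spaced starting points within one interval."""
    n = len(user_ids)
    return [InactiveUser(uid, interval, start + (i * interval) // n)
            for i, uid in enumerate(user_ids)]
```

With two users and an interval of 100 seconds, the sketch makes them due at times 0 and 50, so their captures do not contend for the same moment or the same slice of the daily amount.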
CN201310334946.7A 2013-08-02 2013-08-02 A kind of micro-blog information grasping means and device Active CN103366018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310334946.7A CN103366018B (en) 2013-08-02 2013-08-02 A kind of micro-blog information grasping means and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310334946.7A CN103366018B (en) 2013-08-02 2013-08-02 A kind of micro-blog information grasping means and device

Publications (2)

Publication Number Publication Date
CN103366018A CN103366018A (en) 2013-10-23
CN103366018B true CN103366018B (en) 2017-11-03

Family

ID=49367359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310334946.7A Active CN103366018B (en) 2013-08-02 2013-08-02 A kind of micro-blog information grasping means and device

Country Status (1)

Country Link
CN (1) CN103366018B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605670B (en) * 2013-10-29 2017-03-29 北京奇虎科技有限公司 A kind of method and apparatus for determining the crawl frequency of network resource point
CN104767628A (en) * 2014-01-06 2015-07-08 中兴通讯股份有限公司 User experience quality evaluating method and device
CN104778177A (en) * 2014-01-13 2015-07-15 北大方正集团有限公司 Data processing method and device
CN105243122A (en) * 2015-09-29 2016-01-13 浪潮电子信息产业股份有限公司 Social software based data acquisition method and apparatus
CN106656727B (en) * 2015-10-29 2019-12-10 中国电信股份有限公司 Method and device for processing user information in social network
CN107193828B (en) * 2016-03-14 2021-08-24 百度在线网络技术(北京)有限公司 Novel webpage crawling method and device
CN109857968A (en) * 2019-01-24 2019-06-07 北京亿幕信息技术有限公司 Report form generation method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609460B (en) * 2012-01-13 2015-02-04 中国科学院计算技术研究所 Method and system for microblog data acquisition
CN102663101B (en) * 2012-04-13 2015-10-28 北京交通大学 A kind of user gradation sort algorithm based on Sina's microblogging
CN102708176B (en) * 2012-05-08 2013-12-04 山东大学 Microblog data mining method based on active users
CN103116605B (en) * 2013-01-17 2016-02-10 上海交通大学 A kind of microblog hot event real-time detection method based on monitoring subnet and system
CN103150333B (en) * 2013-01-26 2016-01-13 安徽博约信息科技有限责任公司 Opinion leader identification method in microblog media
CN103150353A (en) * 2013-02-18 2013-06-12 人民搜索网络股份公司 Method and device for acquiring microblog information
CN103150374B (en) * 2013-03-11 2017-02-08 中国科学院信息工程研究所 Method and system for identifying abnormal microblog users

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
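User data mining based on the Sina Weibo open platform (基于新浪微博开放平台的用户数据挖掘); Zhou Xin (周鑫) et al.; 《中国科技论文在线》 (Sciencepaper Online); 2012-11-09; pp. 1-7 *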

Also Published As

Publication number Publication date
CN103366018A (en) 2013-10-23


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191226

Address after: 755000 block B, Zhongguancun Industrial Park, Shapotou District, Zhongwei City, Ningxia Hui Autonomous Region

Patentee after: People's data management (Zhongwei) Co., Ltd

Address before: 100020, Beijing, Chaoyang District, East Third Ring Road, No. 1 global financial center, West Tower, 16 floor

Patentee before: People Search Network AG

CP03 Change of name, title or address

Address after: 100026 room 370, 3 / F, building 15, 2 Jintai West Road, Chaoyang District, Beijing

Patentee after: People's data management (Beijing) Co.,Ltd.

Address before: 755000 block B, Zhongguancun Industrial Park, Shapotou District, Zhongwei City, Ningxia Hui Autonomous Region

Patentee before: People's data management (Zhongwei) Co.,Ltd.
