CN106681992B - Method and device for managing website login information - Google Patents


Info

Publication number
CN106681992B
Authority
CN
China
Prior art keywords
login information
time threshold
preset time
invalid
failure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510745533.7A
Other languages
Chinese (zh)
Other versions
CN106681992A (en)
Inventor
崔志伸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510745533.7A
Publication of CN106681992A
Application granted
Publication of CN106681992B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for managing website login information in the field of Internet technology. It addresses a prior-art problem: when a crawler program determines that certain login information is invalid, it discards that login information, which must then be processed manually, making the management of website login information inefficient. The method mainly comprises: acquiring locally stored invalid login information; determining whether the failure duration of the login information is greater than a preset time threshold corresponding to the login information; and, if it is, restoring the login information to valid login information. The method is mainly suited to scenarios in which a crawler program crawls web pages using login credentials.

Description

Method and device for managing website login information
Technical Field
The invention relates to the technical field of internet, in particular to a method and a device for managing website login information.
Background
A web crawler is a program that automatically collects information from the World Wide Web according to certain rules. In practice, a crawler program often encounters websites whose page content can be crawled only with a login credential. In that case, before crawling such a website, the crawler sends login information (a login account and a password) to the website server; the server verifies the login information against its verification rules and, if verification passes, returns a login credential to the crawler, which then uses the credential to crawl the website's page content. Login information is therefore a prerequisite for the crawler to obtain login credentials.
In practice, however, login credentials often cannot be obtained because the login information used has become invalid. Failures of login information fall mainly into three cases: (1) permanent failure; (2) temporary failure, after which the login information becomes usable again once a certain period has elapsed; and (3) a credential request that fails for network or other reasons and is mistaken by the crawler for a failure of the login information. In the prior art, when the crawler learns that certain login information is invalid, it discards that login information; a person must then decide whether the discarded login information can be reused and, if so, add it back to the crawler. The whole process of handling invalid login information is therefore cumbersome and requires manual intervention, which makes managing website login information inefficient.
Disclosure of Invention
In view of the above technical problem, the present invention provides a method and an apparatus for managing website login information that solve the prior-art problem that, when a crawler determines that certain login information is invalid, that login information is discarded and must then be processed manually, making the management of website login information inefficient.
In one aspect, the present invention provides a method for managing website login information, wherein the method comprises:
acquiring locally stored invalid login information;
determining whether the failure duration of the login information is greater than a preset time threshold corresponding to the login information; and
if the failure duration is greater than the preset time threshold, restoring the login information to valid login information.
In another aspect, the present invention provides an apparatus for managing website login information, the apparatus comprising:
an acquisition unit, configured to acquire locally stored invalid login information;
a judging unit, configured to determine whether the failure duration of the login information acquired by the acquisition unit is greater than a preset time threshold corresponding to the login information; and
a recovery unit, configured to restore the login information to valid login information when the judging unit determines that the failure duration is greater than the preset time threshold.
With the above technical solution, after the crawler program determines that certain login information is invalid, the method and apparatus store that login information locally, check it, determine whether its failure duration is greater than the preset time threshold corresponding to it, and restore it to valid login information once the failure duration exceeds the threshold. Throughout this recovery process, the crawler neither discards invalid login information nor requires manual handling of it, which improves the efficiency of managing website login information.
The foregoing is only an overview of the technical solutions of the present invention. Embodiments of the present invention are described below so that the technical means of the invention can be understood more clearly and so that the above and other objects, features, and advantages of the invention become more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating a method for managing website login information according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an apparatus for managing website login information according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating another apparatus for managing website login information according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the present invention provides a method for managing website login information, comprising the following steps:
101. Acquire locally stored invalid login information.
Specifically, when the crawler program cannot obtain a login credential using certain login information, it determines that the login information is invalid and stores locally both the invalid login information and the time point at which it was determined to be invalid (the initial failure time point). Whether a given piece of login information is invalid can then be determined directly from whether it has a corresponding initial failure time point.
In addition, to acquire invalid login information more efficiently, the crawler program may add a failure flag to login information when it determines that the login information is invalid, so that whether the login information currently being examined is invalid can be determined quickly from the flag.
It should be noted that login information for a website generally consists of a login account and a password; the login information in the embodiments of the present invention therefore mainly refers to a login account and password.
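As a minimal sketch of the local record that steps 101 and 102 rely on, the Python below assumes one record per piece of login information holding the account, password, website, the initial failure time point, and an optional failure flag. The `LoginInfo` class and its field names are hypothetical; the patent does not prescribe a storage layout.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LoginInfo:
    # Hypothetical record layout; field names are illustrative only.
    account: str
    password: str
    site: str
    failed_at: Optional[datetime] = None  # initial failure time point (None while valid)
    failure_flag: bool = False            # optional flag for fast lookup


def mark_failed(info: LoginInfo, now: datetime) -> None:
    """Called when no login credential could be obtained with this login information."""
    info.failed_at = now
    info.failure_flag = True


def is_invalid_by_time_point(info: LoginInfo) -> bool:
    # Basic criterion: a recorded initial failure time point means "invalid".
    return info.failed_at is not None


def is_invalid_by_flag(info: LoginInfo) -> bool:
    # Faster criterion when the crawler maintains a failure flag.
    return info.failure_flag
```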
102. Determine whether the failure duration of the login information is greater than the preset time threshold corresponding to the login information.
The preset time threshold is a recovery period used to determine whether invalid login information can become valid again. In practice, the time each website takes to restore the validity of invalid login information often differs, so the preset time threshold in this step may differ from one piece of login information to another.
Specifically, a recovery period correspondence table may be stored locally. The table records at least the correspondence between login information and preset time thresholds, and may also record information such as the address of the corresponding website. When acquired invalid login information is to be evaluated, the preset time threshold corresponding to it is looked up in the locally stored table, and the determination is made against that threshold.
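The recovery period correspondence table can be pictured as a simple mapping. In the sketch below, keying the table by (account, website address) and the `DEFAULT_THRESHOLD` fallback for entries missing from the table are assumptions for illustration; the text only requires that each piece of login information map to its own preset time threshold.

```python
from datetime import timedelta
from typing import Dict, Tuple

# Hypothetical recovery period correspondence table:
# (login account, website address) -> preset time threshold.
RECOVERY_PERIODS: Dict[Tuple[str, str], timedelta] = {
    ("alice", "https://example.com"): timedelta(hours=10),
    ("bob", "https://example.org"): timedelta(hours=2),
}

# Assumed fallback for login information missing from the table (not from the patent).
DEFAULT_THRESHOLD = timedelta(hours=24)


def lookup_threshold(account: str, site: str) -> timedelta:
    """Return the preset time threshold recorded for this login information."""
    return RECOVERY_PERIODS.get((account, site), DEFAULT_THRESHOLD)
```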
In addition, because step 101 records the initial failure time point together with the invalid login information, the failure duration of the login information can be calculated from that time point and compared with the corresponding preset time threshold to determine whether the login information can be restored to validity.
It should be noted that in most cases the crawler developer does not know the actual recovery period set by each website, so the preset time threshold in this step may be estimated empirically.
103. If the failure duration is greater than the preset time threshold, restore the login information to valid login information.
When the determination shows that the failure duration of the login information currently being checked is greater than the preset time threshold, this indicates that the website server has restored the validity of the login information, that is, a login credential can be successfully obtained with it. On the crawler side, however, the login information is still recorded as invalid, so the crawler must restore it to valid login information before it will use it again. There are various ways to let the crawler recognize the login information as valid; for example, a valid flag may be added to the login information.
In addition, in practical applications, whether locally stored invalid login information can be restored to validity may be checked either in real time or periodically.
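Steps 101 to 103 can be combined into a single recovery pass, sketched below under the assumption that records are kept in a list and thresholds in a dictionary keyed by account. Calling the hypothetical `recover_expired` function on a timer corresponds to the periodic option; calling it before every credential request approximates the real-time option.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class LoginInfo:
    account: str
    password: str
    failed_at: Optional[datetime] = None  # initial failure time point
    valid: bool = True


def recover_expired(records: List[LoginInfo],
                    thresholds: Dict[str, timedelta],
                    now: datetime) -> List[str]:
    """One pass of steps 101-103; returns the accounts restored to valid."""
    restored = []
    for info in records:
        # Step 101: only invalid login information is considered.
        if info.valid or info.failed_at is None:
            continue
        # Step 102: compare the failure duration with this account's threshold.
        if now - info.failed_at > thresholds[info.account]:
            # Step 103: restore the login information on the crawler side.
            info.valid = True
            info.failed_at = None
            restored.append(info.account)
    return restored
```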
With the method for managing website login information provided by this embodiment, after the crawler program determines that certain login information is invalid, that login information is stored locally and checked; the method determines whether its failure duration is greater than the corresponding preset time threshold and, if so, restores it to valid login information. Throughout this process, the crawler neither discards invalid login information nor requires manual handling of it, which improves the efficiency of managing website login information.
Further, as mentioned in step 101, when the crawler determines that certain login information is invalid, it may add a failure flag to that login information so that the login information can later be identified quickly as invalid. When invalid login information carries a failure flag, step 101 may be refined as follows: traverse the locally stored login information; determine whether the current login information carries a failure flag; if it does, determine that the current login information is invalid login information; if it does not, determine that it is not invalid login information. The failure flag is a flag added to login information when the crawler program determines that the login information is invalid.
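The flag-based refinement of step 101 amounts to a filtered traversal. The sketch below assumes flags are stored as a set of strings on each record; `FAILURE_FLAG`, `LoginInfo`, and `collect_invalid` are hypothetical names. Only the flag is consulted, matching the rule that the initial failure time point is ignored when flags are in use.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set

FAILURE_FLAG = "failed"  # hypothetical flag value


@dataclass
class LoginInfo:
    account: str
    flags: Set[str] = field(default_factory=set)


def collect_invalid(stored: Iterable[LoginInfo]) -> List[LoginInfo]:
    """Refined step 101: traverse stored login information and keep flagged entries."""
    invalid = []
    for info in stored:                 # traverse locally stored login information
        if FAILURE_FLAG in info.flags:  # only the flag is consulted, not the time point
            invalid.append(info)
    return invalid
```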
It should be noted that, in the above embodiment, whether login information is invalid can be determined directly from whether it has a corresponding initial failure time point. However, when the crawler adds failure flags to invalid login information, it uses only the failure flag to decide whether login information is invalid and does not consider whether a corresponding initial failure time point exists.
Further, when the failure duration of the login information is determined to be greater than the preset time threshold, it may be concluded that a login credential can be successfully obtained with that login information. To prevent the crawler from continuing to treat it as invalid, the login information must therefore be restored to valid login information. Specifically, the failure flag carried by the login information may be changed to a valid flag, or the failure flag may simply be deleted.
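Both restoration options described above, replacing the failure flag with a valid flag or deleting it, fit in one small function. This is a sketch assuming the same hypothetical set-of-strings flag representation; the flag values are illustrative.

```python
from dataclasses import dataclass, field
from typing import Set

FAILURE_FLAG = "failed"  # hypothetical flag values
VALID_FLAG = "valid"


@dataclass
class LoginInfo:
    account: str
    flags: Set[str] = field(default_factory=set)


def restore(info: LoginInfo, add_valid_flag: bool = True) -> None:
    """Crawler-side restoration: replace the failure flag, or simply delete it."""
    info.flags.discard(FAILURE_FLAG)
    if add_valid_flag:
        info.flags.add(VALID_FLAG)
```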
Further, in practice, when the crawler determines that certain login information is invalid, it stores that login information locally and records the corresponding initial failure time point, from which the failure duration is later calculated. Determining whether the failure duration of the login information is greater than its preset time threshold may therefore be implemented as follows: obtain the initial failure time point of the login information from local storage; calculate the failure duration from the initial failure time point; look up the preset time threshold corresponding to the login information in local storage; and determine whether the failure duration is greater than the preset time threshold.
For example, suppose the initial failure time point of the login information is 18:25:04 on October 1, 2015, and the current time is 07:50:04 on October 2, 2015. From these two time points the crawler calculates a failure duration of 13 hours and 25 minutes. The preset time threshold corresponding to the login information is 10 hours, so the failure duration exceeds the threshold and the crawler can restore the login information to validity.
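The arithmetic of this example can be checked with Python's standard `datetime` module. The timestamps assume the example's time points read 18:25:04 on 1 October 2015 and 07:50:04 on 2 October 2015, which is the reading consistent with the stated 13 hours and 25 minutes.

```python
from datetime import datetime, timedelta

initial_failure = datetime(2015, 10, 1, 18, 25, 4)  # initial failure time point
now = datetime(2015, 10, 2, 7, 50, 4)               # current time point
threshold = timedelta(hours=10)                     # preset time threshold

failure_duration = now - initial_failure
print(failure_duration)              # 13:25:00
print(failure_duration > threshold)  # True, so the login information may be restored
```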
Further, because the preset time threshold may be estimated empirically, it may deviate from the actual recovery period and, in particular, may be shorter than it. In that case the following problem can occur: the crawler determines that the failure duration of certain login information exceeds the preset time threshold and restores it to valid login information, although the actual recovery period has not yet elapsed. If the crawler then requests a login credential with that login information, the request fails, the crawler marks the login information as invalid again, and crawling efficiency drops. To solve this problem, an embodiment of the present invention proposes the following solution:
if, within a preset time period after the login information is restored to valid login information, the login information changes from valid back to invalid, the preset time threshold corresponding to the login information is increased according to a first preset algorithm.
The first preset algorithm may, for example, multiply the current preset time threshold by a factor (such as 2), or add a fixed value (such as 5 minutes) to it.
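A minimal sketch of the first preset algorithm and its trigger follows. The function names, the default factor and step (taken from the examples above), and the `watch_period` argument standing in for the "preset time period" are illustrative assumptions, not values fixed by the patent.

```python
from datetime import datetime, timedelta


def raise_threshold(current: timedelta, mode: str = "multiply",
                    factor: float = 2.0,
                    step: timedelta = timedelta(minutes=5)) -> timedelta:
    """First preset algorithm: multiply by a factor or add a fixed value."""
    if mode == "multiply":
        return current * factor
    if mode == "add":
        return current + step
    raise ValueError(f"unknown mode: {mode}")


def on_refailure(threshold: timedelta, restored_at: datetime,
                 refailed_at: datetime, watch_period: timedelta) -> timedelta:
    """Raise the threshold only if the login information failed again soon after restoration."""
    if refailed_at - restored_at <= watch_period:
        return raise_threshold(threshold)
    return threshold
```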
Conversely, the preset time threshold may be much larger than the actual recovery period, which reduces how often the login information can be used. To solve this problem, an embodiment of the present invention proposes the following solution:
after the login information is restored to valid login information, the preset time threshold corresponding to it is adjusted according to a preset adjustment rule to obtain an optimal time threshold. The optimal time threshold has two properties: when invalid login information is restored to validity based on it, a login credential can be successfully obtained with the login information; but if it were lowered once more according to a second preset algorithm in the preset adjustment rule, restoring validity based on the lowered threshold would result in a failed credential request.
Specifically, the preset adjustment rule may be as follows. (1) Lower the preset time threshold corresponding to the login information according to the second preset algorithm. (2) If, after one or more lowerings, restoring the login information to validity with the lowered threshold leads for the first time to a failed credential request, stop lowering and record the threshold at which this first failure occurred (hereinafter the first time threshold). Then raise the first time threshold one or more times according to the first preset algorithm until restoring validity with the raised threshold allows a login credential to be obtained successfully. (3) Repeat steps (1) and (2) until the following situation occurs: while the raised threshold is being lowered again, every lowered threshold still allows a login credential to be obtained after validity is restored, but lowering the most recent threshold once more would produce a value less than or equal to the largest first time threshold recorded so far, that is, a value at which a credential request is already known to fail. At that point the crawler determines that the most recent threshold cannot be lowered any further and takes it as the final preset time threshold, that is, the optimal time threshold.
Because lowering the preset time threshold in large steps could quickly push it below the actual recovery period, the second preset algorithm uses a small adjustment step. A common approach is to subtract a fixed value, for example 2 minutes, from the current preset time threshold.
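The second preset algorithm, combined with the stop test of step (3), can be sketched as a single function. The name `lower_threshold`, the 2-minute default step (from the example above), and the guard that keeps the threshold positive are assumptions for illustration.

```python
from datetime import timedelta
from typing import List, Optional


def lower_threshold(current: timedelta,
                    failed_thresholds: List[timedelta],
                    step: timedelta = timedelta(minutes=2)) -> Optional[timedelta]:
    """Second preset algorithm with the stop test of step (3).

    Returns the lowered threshold, or None when `current` should be kept
    as the optimal time threshold.
    """
    candidate = current - step
    if candidate <= timedelta(0):
        return None  # assumed guard: a threshold must stay positive
    if failed_thresholds and candidate <= max(failed_thresholds):
        return None  # lowering would reach a value already known to fail
    return candidate
```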
For example, suppose the preset time threshold is 8 hours and the second preset algorithm subtracts 0.5 hours from the current threshold. The crawler restores the invalid login information to valid login information once its failure duration exceeds 8 hours. If the crawler can then successfully obtain a login credential with the login information, the threshold is lowered according to the second preset algorithm, to 8 - 0.5 = 7.5 hours. If, after a long period (for example, one month), the login information becomes invalid again for some reason, 7.5 hours is now the threshold used to decide whether it can be restored: once its failure duration exceeds 7.5 hours, the crawler restores it to valid login information. If a credential can still be obtained, the threshold is lowered again according to the second preset algorithm, to 7 hours.
Suppose the initial threshold is lowered 6 times and, because none of these lowered thresholds is too small, a login credential is successfully obtained each time validity is restored with them. After the 7th lowering, however, restoring validity with the lowered threshold of 4.5 hours does not yield a login credential; that is, 4.5 hours is shorter than the actual recovery period. The threshold after the 7th lowering therefore has to be raised according to the first preset algorithm. If the first preset algorithm multiplies the current threshold by 2, the raised threshold is 4.5 × 2 = 9 hours.
With 9 hours as the preset time threshold, a login credential can be successfully obtained after the login information is restored to validity, so the threshold is lowered again according to the second preset algorithm. While it is lowered from 9 hours down to 5 hours, a login credential is successfully obtained each time validity is restored with the lowered threshold. When the crawler tries to lower 5 hours once more, the result would be 4.5 hours, which is already known to be shorter than the actual recovery period. The crawler therefore sets 5 hours as the final preset time threshold and makes no further adjustments.
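The worked example can be reproduced with a small simulation of the preset adjustment rule. This sketch works in hours and assumes a hypothetical website whose actual recovery period is 4.75 hours (any value above 4.5 and at most 5 gives the same trace); restoring validity after `threshold` hours is modelled as succeeding exactly when `threshold >= actual_recovery`. Each loop iteration stands for a separate failure-and-restore cycle, which in practice may be weeks apart.

```python
from typing import List, Tuple


def tune_threshold(initial: float, step: float, factor: float,
                   actual_recovery: float) -> Tuple[float, List[float]]:
    """Simulate the preset adjustment rule; all values are in hours."""
    threshold = initial
    first_thresholds: List[float] = []  # thresholds at which a credential request failed
    trace = [threshold]
    while True:
        if threshold >= actual_recovery:
            # Credential obtained: try lowering (second preset algorithm).
            candidate = threshold - step
            if candidate <= 0 or (first_thresholds and candidate <= max(first_thresholds)):
                return threshold, trace  # cannot be lowered further: optimal threshold
            threshold = candidate
        else:
            # Credential request failed: record it and raise (first preset algorithm).
            first_thresholds.append(threshold)
            threshold = threshold * factor
        trace.append(threshold)


final, trace = tune_threshold(initial=8.0, step=0.5, factor=2.0, actual_recovery=4.75)
print(final)  # 5.0
```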
Thus, by raising and lowering the preset time threshold in this way, the threshold obtained after several adjustments comes closer to the actual recovery period.
Furthermore, permanently invalid login information never regains validity on the website server side, no matter how long it has been invalid. Therefore, if, after the crawler restores login information to valid login information, the login information changes from valid back to invalid within the preset time period several times in a row, the crawler determines that the login information is permanently invalid and discards it.
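Detecting permanent failure only requires counting consecutive quick re-failures. In the sketch below, `RefailureTracker` is a hypothetical helper, and `limit=3` is an assumed value for "several times in a row", which the text leaves open. A restoration that holds beyond the preset time period resets the count.

```python
from dataclasses import dataclass


@dataclass
class RefailureTracker:
    """Counts consecutive quick re-failures after restoration for one piece of login information."""
    limit: int = 3        # hypothetical value for "several times in a row"
    consecutive: int = 0

    def record(self, refailed_within_period: bool) -> bool:
        """Return True when the login information should be discarded as permanently invalid."""
        if refailed_within_period:
            self.consecutive += 1
        else:
            self.consecutive = 0  # a restoration that held breaks the streak
        return self.consecutive >= self.limit
```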
Further, corresponding to the above method embodiment, another embodiment of the present invention provides an apparatus for managing website login information. As shown in FIG. 2, the apparatus includes an acquisition unit 21, a judging unit 22, and a recovery unit 23, wherein:
the acquisition unit 21 is configured to acquire locally stored invalid login information;
the judging unit 22 is configured to determine whether the failure duration of the login information acquired by the acquisition unit 21 is greater than a preset time threshold corresponding to the login information; and
the recovery unit 23 is configured to restore the login information to valid login information when the judging unit 22 determines that the failure duration is greater than the preset time threshold.
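One way to picture the three units of FIG. 2 is as small cooperating classes over a shared local store. The sketch below is hypothetical throughout: the store maps each login account to its initial failure time point (None meaning valid), and `run_once` wires units 21, 22, and 23 together for a single pass.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical local store: login account -> initial failure time point (None = valid).
Store = Dict[str, Optional[datetime]]


class AcquisitionUnit:  # unit 21
    def __init__(self, store: Store) -> None:
        self.store = store

    def acquire(self) -> List[str]:
        """Return accounts whose login information is currently invalid."""
        return [acct for acct, failed_at in self.store.items() if failed_at is not None]


class JudgingUnit:  # unit 22
    def __init__(self, store: Store, thresholds: Dict[str, timedelta]) -> None:
        self.store = store
        self.thresholds = thresholds

    def exceeds(self, account: str, now: datetime) -> bool:
        failed_at = self.store[account]
        if failed_at is None:
            return False
        return now - failed_at > self.thresholds[account]


class RecoveryUnit:  # unit 23
    def __init__(self, store: Store) -> None:
        self.store = store

    def recover(self, account: str) -> None:
        self.store[account] = None  # clearing the time point marks the login information valid


def run_once(store: Store, thresholds: Dict[str, timedelta], now: datetime) -> List[str]:
    acquisition = AcquisitionUnit(store)
    judging = JudgingUnit(store, thresholds)
    recovery = RecoveryUnit(store)
    restored = []
    for account in acquisition.acquire():
        if judging.exceeds(account, now):
            recovery.recover(account)
            restored.append(account)
    return restored
```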
With the apparatus for managing website login information provided by this embodiment, after the crawler program determines that certain login information is invalid, that login information is stored locally and checked; the apparatus determines whether its failure duration is greater than the corresponding preset time threshold and, if so, restores it to valid login information. Throughout this process, the crawler neither discards invalid login information nor requires manual handling of it, which improves the efficiency of managing website login information.
Further, as shown in FIG. 3, the acquisition unit 21 includes:
a traversal module 211, configured to traverse locally stored login information;
a judging module 212, configured to determine whether the current login information carries a failure flag, the failure flag being a flag added to login information when the crawler determines that the login information is invalid; and
a determining module 213, configured to determine that the current login information is invalid login information when the judging module 212 determines that it carries the failure flag.
Further, the recovery unit 23 is configured to change the failure flag carried by the login information into a valid flag.
Further, as shown in FIG. 3, the judging unit 22 includes:
an obtaining module 221, configured to obtain the initial failure time point corresponding to the login information from local storage;
a calculating module 222, configured to calculate the failure duration of the login information from the initial failure time point obtained by the obtaining module 221;
a searching module 223, configured to look up the preset time threshold corresponding to the login information in local storage; and
a determining module 224, configured to determine whether the failure duration calculated by the calculating module 222 is greater than the preset time threshold found by the searching module 223.
Further, as shown in fig. 3, the apparatus further includes:
and an adjusting unit 24, configured to, after the login information is restored to the valid login information, if the login information is changed from the valid login information to the invalid login information within a preset time period, increase a preset time threshold corresponding to the login information according to a first preset algorithm.
Further, the adjusting unit 24 is further configured to adjust the preset time threshold corresponding to the login information according to a preset adjustment rule, so as to obtain an optimal time threshold;
the optimal time threshold satisfies the following: if it is reduced according to a second preset algorithm in the preset adjustment rule, login information whose validity is restored on the basis of the reduced preset time threshold can no longer be used to successfully apply for login credentials.
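One way to read the "optimal time threshold" is as the smallest threshold on the reduction path that still lets restored login information obtain credentials. The sketch below assumes the "second preset algorithm" is a fixed decrement `step` and treats `login_succeeds_after` as a hypothetical probe callback; in practice each probe would mean a real wait followed by a credential request, which the patent does not detail.

```python
from typing import Callable


def find_optimal_threshold(current: float, step: float, minimum: float,
                           login_succeeds_after: Callable[[float], bool]) -> float:
    """Reduce the threshold until a further reduction would make recovery fail.

    login_succeeds_after(t) is a hypothetical probe reporting whether login
    information restored after a t-second wait can still successfully apply
    for login credentials.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    best = current
    while True:
        candidate = best - step  # assumed "second preset algorithm": fixed decrement
        if candidate < minimum or not login_succeeds_after(candidate):
            return best  # the next reduction would fail (or leave the allowed range)
        best = candidate


# Simulated site that only accepts the account again after a 900-second cool-down.
print(find_optimal_threshold(3600.0, 300.0, 0.0, lambda t: t >= 900.0))  # 900.0
```

A linear decrement mirrors the wording "reduced according to a second preset algorithm"; because every probe is costly in a live crawler, a bisection over the interval between a known-failing and a known-working threshold would be a reasonable alternative under the same assumptions.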
This apparatus embodiment corresponds to the method embodiment. For ease of reading, the details of the method embodiment are not repeated here one by one, but it should be understood that the apparatus of this embodiment can implement all of the content of the method embodiment.
The device for managing website login information comprises a processor and a memory. The acquisition unit, the judging unit, the recovery unit, and so on are stored in the memory as program units, and the processor executes these program units to realize the corresponding functions.
The processor comprises a kernel, which calls the corresponding program unit from the memory. One or more kernels may be provided, and the efficiency with which the crawler program manages website login information can be improved by adjusting the kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps:
acquiring locally stored invalid login information;
judging whether the failure duration of the login information is greater than a preset time threshold corresponding to the login information;
and if the failure duration is greater than the preset time threshold, restoring the login information to valid login information.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (12)

1. A method for managing website login information, the method being applied to a crawler, the method comprising:
acquiring locally stored invalid login information;
judging whether the failure duration of the login information is greater than a preset time threshold corresponding to the login information, the preset time threshold being a recovery period used to judge whether the invalid login information can regain validity;
and if the failure duration is greater than the preset time threshold, restoring the login information to valid login information.
2. The method of claim 1, wherein acquiring locally stored invalid login information comprises:
traversing locally stored login information;
judging whether the current login information carries a failure mark, wherein the failure mark is a mark added to the login information when the crawler determines that the login information has become invalid;
and if the current login information carries the failure mark, determining that the current login information is invalid login information.
3. The method of claim 2, wherein restoring the login information to valid login information comprises:
changing the failure mark carried by the login information into a valid mark.
4. The method according to claim 1, wherein judging whether the failure duration of the login information is greater than the preset time threshold corresponding to the login information comprises:
acquiring an initial failure time point corresponding to the login information from local storage;
calculating the failure duration of the login information according to the initial failure time point;
looking up the preset time threshold corresponding to the login information in local storage;
and judging whether the failure duration is greater than the preset time threshold.
5. The method of claim 1, further comprising:
if the login information changes from valid login information back to invalid login information within a preset time period starting from when it was restored to valid login information, increasing the preset time threshold corresponding to the login information according to a first preset algorithm.
6. The method of claim 5, wherein after restoring the login information to valid login information, the method further comprises:
adjusting the preset time threshold corresponding to the login information according to a preset adjustment rule to obtain an optimal time threshold;
wherein, if the optimal time threshold is reduced according to a second preset algorithm in the preset adjustment rule, login information whose validity is restored on the basis of the reduced preset time threshold can no longer be used to successfully apply for login credentials.
7. An apparatus for managing website login information, the apparatus being applied to a crawler program, the apparatus comprising:
the acquisition unit is used for acquiring locally stored invalid login information;
the judging unit is used for judging whether the failure duration of the login information acquired by the acquisition unit is greater than a preset time threshold corresponding to the login information, the preset time threshold being a recovery period used to judge whether the invalid login information can regain validity;
and the recovery unit is used for restoring the login information to valid login information when the judgment result of the judging unit is that the failure duration is greater than the preset time threshold.
8. The apparatus of claim 7, wherein the acquisition unit comprises:
the traversing module is used for traversing the locally stored login information;
the judging module is used for judging whether the current login information carries a failure mark, wherein the failure mark is a mark added to the login information when the crawler determines that the login information has become invalid;
and the determining module is used for determining that the current login information is invalid login information when the judgment result of the judging module is that the current login information carries the failure mark.
9. The apparatus according to claim 8, wherein the recovery unit is configured to change a failure flag carried by the login information to a valid flag.
10. The apparatus according to claim 7, wherein the judging unit comprises:
the acquisition module is used for acquiring an initial failure time point corresponding to the login information from local storage;
the calculation module is used for calculating the failure duration of the login information according to the initial failure time point acquired by the acquisition module;
the searching module is used for looking up the preset time threshold corresponding to the login information in local storage;
and the judging module is used for judging whether the failure duration obtained by the calculating module is greater than the preset time threshold searched by the searching module.
11. A storage medium, comprising a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the method for managing website login information according to any one of claims 1 to 6.
12. A processor for executing a program, wherein the program executes the method for managing website login information according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510745533.7A CN106681992B (en) 2015-11-05 2015-11-05 Method and device for managing website login information

Publications (2)

Publication Number Publication Date
CN106681992A (en) 2017-05-17
CN106681992B (en) 2020-12-01

Family

ID=58857757

Country Status (1)

Country Link
CN (1) CN106681992B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968760A (en) * 2018-09-30 2020-04-07 北京国双科技有限公司 Webpage data crawling method and device, and webpage login method and device
CN110138719B (en) * 2019-03-05 2022-05-27 北京车和家信息技术有限公司 Network security detection method and device and electronic equipment

Citations (12)

Publication number Priority date Publication date Assignee Title
CN101047504A (en) * 2006-03-29 2007-10-03 腾讯科技(深圳)有限公司 Network log-in authorization method and authorization system
CN102073678A (en) * 2010-12-03 2011-05-25 厦门市美亚柏科信息股份有限公司 System and method for analyzing information of websites
CN102214011A (en) * 2010-04-09 2011-10-12 北京搜狗科技发展有限公司 Method and device for initiating input method remote calculation request
CN103034632A (en) * 2011-09-29 2013-04-10 北京神州泰岳软件股份有限公司 Information transmitting method and a system
CN203301755U (en) * 2013-04-18 2013-11-20 浙江金之路信息科技有限公司 Wireless network transmission and coverage apparatus based on multi-network integration technology
CN103457738A (en) * 2013-08-30 2013-12-18 优视科技有限公司 Method and system for login processing based on browser
CN103544193A (en) * 2012-07-17 2014-01-29 北京千橡网景科技发展有限公司 Method and apparatus for recognizing network robot
CN103744907A (en) * 2013-12-26 2014-04-23 三星电子(中国)研发中心 Information publishing method and device
CN103886248A (en) * 2014-04-08 2014-06-25 国家电网公司 Website weak password detecting method
CN103984719A (en) * 2014-05-12 2014-08-13 浪潮电子信息产业股份有限公司 Method for acquiring by using crawler to simulate login
CN104320309A (en) * 2014-11-17 2015-01-28 上海斐讯数据通信技术有限公司 Automatic testing system and method for automatic redial function
CN104811449A (en) * 2015-04-21 2015-07-29 深信服网络科技(深圳)有限公司 Base collision attack detecting method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN103179134A (en) * 2013-04-19 2013-06-26 中国建设银行股份有限公司 Single sign on method and system based on Cookie and application server thereof

Similar Documents

Publication Publication Date Title
CN113840012B (en) Block chain-based screen recording evidence obtaining method and system and electronic equipment
EP3312733A2 (en) Method, system and server of removing a distributed caching object
JP2017532649A5 (en)
WO2017096968A1 (en) Log uploading method and apparatus
JP2019537115A (en) Method, apparatus and system for detecting abnormal user behavior
US20160034861A1 (en) Method and apparatus of controlling network payment
CN110968760A (en) Webpage data crawling method and device, and webpage login method and device
CN106681992B (en) Method and device for managing website login information
CN109165112B (en) Fault recovery method, system and related components of metadata cluster
CN110599229A (en) Hundred million-level flow advertisement real-time processing method, storage medium, electronic equipment and system
CN106611118B (en) Method and device for applying login credentials
CN106657422B (en) Method, device and system for crawling website page and storage medium
US20200334358A1 (en) Method for detecting computer virus, computing device, and storage medium
CN112579623A (en) Method, device, storage medium and equipment for storing data
CN106911636B (en) Method and device for detecting whether backdoor program exists in website
CN110889065B (en) Page stay time determination method, device and equipment
CN108228613B (en) Data reading method and device
RU2624558C2 (en) Method, terminal and server for file fields adjustment
CN113992378B (en) Security monitoring method and device, electronic equipment and storage medium
CN111125087A (en) Data storage method and device
CN111240750B (en) Awakening method and device for target application program
CN113342579A (en) Data restoration method and device
CN108234196B (en) Fault detection method and device
CN109561123B (en) Token caching method and device
CN107544968B (en) Method and device for determining website availability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant