CN114386059A - Webpage text confusion anti-crawler method and device, electronic equipment and storage medium - Google Patents

Webpage text confusion anti-crawler method and device, electronic equipment and storage medium

Info

Publication number
CN114386059A
Authority
CN
China
Prior art keywords
picture
text
data
address
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111536063.5A
Other languages
Chinese (zh)
Inventor
王斌
史忠伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing 58 Information Technology Co Ltd
Original Assignee
Beijing 58 Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing 58 Information Technology Co Ltd filed Critical Beijing 58 Information Technology Co Ltd
Priority to CN202111536063.5A priority Critical patent/CN114386059A/en
Publication of CN114386059A publication Critical patent/CN114386059A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a webpage text confusion anti-crawler method and device, electronic equipment and a storage medium, and relates to the technical field of website page security. The method comprises the following steps: generating corresponding picture data according to text data corresponding to the webpage text, and storing the picture data to a first server; encrypting a picture address corresponding to the picture data returned by the first server to obtain an encrypted picture address, and storing the encrypted picture address into a preset database of the second server; and when a text rendering request corresponding to the webpage text sent by the user terminal is received, returning the encrypted picture address to the user terminal. Therefore, the problem that the webpage text visible in the webpage in the related technology cannot avoid the crawler to acquire the webpage data can be solved.

Description

Webpage text confusion anti-crawler method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of website page security, in particular to a webpage text confusion anti-crawler method and device, electronic equipment and a storage medium.
Background
In the prior art, page data is subject to crawling by a crawler, and the crawler can crawl related page data according to a page DOM structure, so that great loss is caused to application services.
In the process of implementing the invention, the applicant finds that page data in the related art at least face the following problems:
1) Font anti-crawler: the web page replaces data in the Document Object Model (DOM) by means of a font file so that the data cannot be crawled directly. However, a crawler can analyze the page, find the mapping rule between the font and the data, and thereby crack the data crawled from the page.
2) Picture anti-crawler: key data in the page is converted into pictures and mixed with normal characters, so that a crawler program cannot obtain the visible character content directly. However, the crawler may acquire the picture address, download the picture, and then identify the picture content by OCR (Optical Character Recognition).
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a web page text obfuscation anti-crawler method and apparatus, an electronic device, and a storage medium, which aim to solve the problem in the related art that web page text visible in a web page cannot be protected from crawlers acquiring the web page data.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a web page text obfuscation anti-crawler method, which is applied to a second server, and the method includes: generating corresponding picture data according to text data corresponding to the webpage text, and storing the picture data to a first server; encrypting a picture address corresponding to the picture data returned by the first server to obtain an encrypted picture address, and storing the encrypted picture address into a preset database of the second server; and when a text rendering request corresponding to the webpage text sent by the user terminal is received, returning the encrypted picture address to the user terminal.
Further, generating corresponding picture data according to text data corresponding to the webpage text, including: generating a corresponding identification code according to the text data; and if the identification code does not exist in the second server, generating the picture data according to the text data.
Further, if the identification code does not exist in the second server, generating the picture data according to the text data, including: generating a corresponding picture byte stream according to the text data; and adding noise into the picture byte stream to obtain the picture data.
Further, when the text rendering request corresponding to the webpage text sent by the user terminal is received, after the encrypted picture address is returned to the user terminal, the method further includes: sending a picture data request based on the encrypted picture address in the user terminal, wherein the picture data request is used for requesting the picture data; intercepting the picture data request, and repositioning the encrypted picture address to obtain the picture address; requesting the picture data based on the picture address.
Further, intercepting the picture data request, and repositioning the encrypted picture address to obtain the picture address, including: and decrypting the encrypted picture address through a preset script in the user terminal to obtain the picture address.
In a second aspect, an embodiment of the present invention further provides a web page text obfuscation anti-crawler apparatus, applied to a second server, where the apparatus includes: the processing module is used for generating corresponding picture data according to the text data corresponding to the webpage text and storing the picture data to the first server; the encryption module is used for encrypting a picture address corresponding to the picture data returned by the first server to obtain an encrypted picture address, and storing the encrypted picture address into a preset database of the second server; and the sending module is used for returning the encrypted picture address to the user terminal when receiving a text rendering request corresponding to the webpage text sent by the user terminal.
Further, the processing module comprises: the first processing submodule is used for generating a corresponding identification code according to the text data; and the second processing submodule is used for generating the picture data according to the text data if the identification code does not exist in the second server.
Further, the second processing sub-module includes: the conversion unit is used for generating a corresponding picture byte stream according to the text data; and the processing unit is used for adding noise into the picture byte stream to obtain the picture data.
Further, the apparatus also includes: a first request module, configured to send, in the user terminal, a picture data request based on the encrypted picture address after the encrypted picture address has been returned to the user terminal in response to a text rendering request corresponding to the web page text, where the picture data request is used to request the picture data; an intercepting module, configured to intercept the picture data request and relocate the encrypted picture address to obtain the picture address; and a request module, configured to request the picture data based on the picture address.
Further, the intercepting module includes: and the decryption unit is used for decrypting the encrypted picture address through a preset script in the user terminal to obtain the picture address.
In a third aspect, an embodiment of the present invention additionally provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the web page text obfuscation anti-crawler method as described in the previous first aspect.
In a fourth aspect, the present invention provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the web page text obfuscation anti-crawler method according to the first aspect.
In the embodiments of the invention, corresponding picture data is generated according to the text data corresponding to a web page text, and the picture data is stored to a first server; the picture address corresponding to the picture data returned by the first server is encrypted to obtain an encrypted picture address, which is stored in a preset database of the second server; and when a text rendering request corresponding to the web page text is received from the user terminal, the encrypted picture address is returned to the user terminal. The text data is obfuscated by converting it into picture data, and the storage address of the picture data is then encrypted and hidden, which makes it harder for a crawler to crawl the web page data and raises the cost for black-market operators to counter the protection. This addresses the problem in the related art that web page text visible in a web page cannot be protected from crawlers acquiring the web page data.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without inventive labor.
FIG. 1 is a diagram illustrating a hardware application scenario in an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for obfuscating a webpage text against a crawler according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another page text obfuscation anti-crawler method in an embodiment of the invention;
FIG. 4 is a flow chart of a method for rendering a web page in an embodiment of the invention;
FIG. 5 is a schematic structural diagram of a web page text obfuscation anti-crawler apparatus in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
A crawler is a program or script that automatically captures information according to certain rules. In the prior art, web page data is exposed to crawlers, which can crawl the relevant web page data according to the web page's DOM structure, causing great loss to application services.
In order to solve the above problem, in an embodiment of the present invention, a web page text confusion anti-crawler method is provided to improve anti-web crawler capability of a web page text. Fig. 1 is a schematic diagram of a hardware application scenario of a web page text obfuscation anti-crawler method according to an embodiment of the present invention. The scenario includes a first server 10, a second server 20, and a user terminal 30, where the first server 10, the second server 20, and the user terminal 30 have a network connection, and can implement transmission of network data.
In this embodiment, the second server 20 receives the text data of a web page text sent by the first server 10, generates corresponding picture data according to that text data, and stores the picture data in the first server 10. The second server 20 then encrypts the storage address of the picture data and stores the encrypted picture address in a preset database of the second server 20.
Then, when the user terminal 30 requests a target web page text of the target web page, it sends a text rendering request to the second server 20. After receiving the text rendering request, the second server 20 determines the web page text requested and returns the encrypted picture address of that web page text to the user terminal 30. The picture data corresponding to the web page text cannot be acquired directly from the encrypted picture address, so the actual storage address of the picture for the target web page text remains encrypted and hidden.
According to the embodiment of the invention, the text data is obfuscated by converting it into picture data, and the storage address of the picture data is then encrypted and hidden, which makes it harder for a crawler to crawl the web page data and raises the cost for black-market operators to counter the protection. This addresses the problem in the related art that web page text visible in a web page cannot be protected from crawlers acquiring the web page data.
Specifically, according to an embodiment of the present invention, there is provided a web page text obfuscation anti-crawler method applied to a second server, as shown in fig. 2, the method may specifically include the following steps:
S202, generating corresponding picture data according to text data corresponding to the webpage text, and storing the picture data to a first server;
In this embodiment, each web page corresponds to one or more web page texts, and each web page text corresponds to a set of text data. Corresponding picture data is generated according to the text data corresponding to the web page text, and the content displayed by the picture data is the same as that of the text data. Converting the text data corresponding to the web page text into a picture obfuscates the text data. When the web page text as a whole is converted into a picture in a preset format, processing such as mosaic or watermarking may also be applied to the picture.
Optionally, in this embodiment, the corresponding picture data is generated according to text data corresponding to a webpage text, which includes but is not limited to: generating a corresponding identification code according to the text data; and if the identification code does not exist in the second server, generating picture data according to the text data.
Specifically, in order to realize fast rendering and reading of the web page text, each web page text needs to be encoded, for example, a corresponding identification code is generated according to text data corresponding to the web page text.
Then, it is determined whether the identification code exists in the database of the second server.
If the identification code exists, this indicates that the text data of the web page text and the corresponding picture data have already been stored, and the text data does not need to be converted into picture data again.
If the identification code corresponding to the web page text does not exist, this indicates that the web page text has not previously been stored in the database of the second server; the text data of the web page text is then obfuscated by converting it into corresponding picture data.
In a specific application scenario, the identification code of the text data of the web page text is a unique identification code, for example, each web page text corresponds to one text key. Then, picture data corresponding to the text data is generated based on the text key. And then storing the picture data to the first server, and storing the text key of the picture data in a database of the second server. In this embodiment, according to a preset algorithm or model, a unique identification code is generated for text data corresponding to a web page text, the identification code corresponding to each web page text is fixed, and if a plurality of web page texts exist in a target web page, each text corresponds to a group of text data and corresponds to one identification code.
Through the embodiment, the corresponding identification code is generated according to the text data, and under the condition that the identification code corresponding to the text data does not exist in the database, the picture data is generated according to the text data, so that text confusion of the webpage text is realized.
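The identification-code check described above can be sketched as follows. This is a minimal illustration only: the source does not name the "preset algorithm", so SHA-256 is an assumption, and `PictureIndex`, `ensure_picture` and `fake_render` are hypothetical names standing in for the second server's database and the picture generation step.

```python
import hashlib


def identification_code(text: str) -> str:
    """Derive a fixed identification code (text key) from the text data."""
    # Assumption: SHA-256 of the UTF-8 bytes stands in for the "preset algorithm".
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PictureIndex:
    """Hypothetical in-memory stand-in for the second server's database."""

    def __init__(self):
        self._known = {}   # identification code -> picture data
        self.renders = 0   # how many times a picture was actually generated

    def ensure_picture(self, text, render):
        """Generate picture data only if the identification code is not stored yet."""
        key = identification_code(text)
        if key not in self._known:
            self._known[key] = render(text)
            self.renders += 1
        return key


def fake_render(text):
    # Placeholder for the real text-to-picture conversion.
    return b"PICTURE:" + text.encode("utf-8")


index = PictureIndex()
k1 = index.ensure_picture("Phone: 010-1234", fake_render)
k2 = index.ensure_picture("Phone: 010-1234", fake_render)  # same text, no second render
```

Because the code is derived deterministically from the text, the same web page text always maps to the same key, which is what allows the repeated conversion to be skipped.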
Optionally, in this embodiment, if the identification code does not exist, the image data is generated according to the text data, which includes but is not limited to: generating a corresponding picture byte stream according to the text data; noise is added to the picture byte stream to obtain picture data.
In an actual application scene, after text data is converted into picture data, a picture crawler program can acquire a picture address of the picture data, then download the picture data based on the picture address, then recognize picture content through an OCR (optical character recognition), and further complete text content crawling of a webpage text.
In contrast, in this embodiment, in the process of converting text data into picture data, a corresponding picture byte stream is generated according to the text data, and then noise is added to the picture byte stream, so as to improve the OCR character recognition difficulty of the picture data.
Specifically, the noise is added to the picture byte stream, and the pixel points in the picture byte stream may be modified according to a preset rule or at random, or the gray scale, color, brightness, and the like of the whole picture in the picture byte stream are adjusted, so as to improve the OCR character recognition difficulty of the picture data. For example, the RGB values of the image pixels are adjusted.
By the embodiment, the noise is added into the picture byte stream, so that the OCR character recognition difficulty of the picture data is increased, the recognition accuracy is reduced, and the anti-crawling capability of the text data of the webpage text is further improved.
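One way to picture the noise step is the sketch below. It assumes the picture has already been decoded into a grid of RGB tuples (the source works on a "picture byte stream" and does not specify a format), and the function name `add_noise` and its parameters `ratio` and `max_shift` are hypothetical.

```python
import random


def add_noise(pixels, width, height, ratio=0.05, max_shift=60, seed=None):
    """Return a copy of the pixel grid with a fraction of pixels randomly perturbed."""
    rng = random.Random(seed)
    noisy = [list(row) for row in pixels]      # copy rows so the input is untouched
    count = int(width * height * ratio)        # number of pixel positions to disturb
    for _ in range(count):
        x = rng.randrange(width)
        y = rng.randrange(height)
        # Shift each RGB channel by a random amount, clamped to the 0..255 range.
        noisy[y][x] = tuple(
            max(0, min(255, c + rng.randint(-max_shift, max_shift)))
            for c in noisy[y][x]
        )
    return noisy


width, height = 40, 10
white = [[(255, 255, 255)] * width for _ in range(height)]
noisy = add_noise(white, width, height, ratio=0.1, seed=7)
changed = sum(
    1 for y in range(height) for x in range(width) if noisy[y][x] != white[y][x]
)
```

The `ratio` and `max_shift` values trade legibility for OCR resistance; suitable values would have to be tuned against real pictures and are not given in the source.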
S204, encrypting a picture address corresponding to the picture data returned by the first server to obtain an encrypted picture address, and storing the encrypted picture address into a preset database of the second server;
That is, the picture address corresponding to the picture data returned by the first server is encrypted to obtain an encrypted picture address, and the encrypted picture address is then stored in a preset database of the second server.
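The source does not say which encryption algorithm is used for the picture address. The sketch below is therefore only a toy reversible scheme built from the Python standard library (an HMAC-SHA256 keystream XORed with the URL, then base64-encoded); the key `SECRET`, the example URL, and the function names are all hypothetical, and a real deployment would more likely use an established authenticated cipher.

```python
import base64
import hashlib
import hmac
import os

SECRET = b"demo-shared-secret"  # hypothetical key shared by the server and the preset script


def _keystream(key, nonce, length):
    """Build `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_address(url, key=SECRET):
    """Turn a picture address into an opaque, URL-safe token."""
    nonce = os.urandom(8)  # fresh nonce, so the same URL yields different tokens
    data = url.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return base64.urlsafe_b64encode(nonce + cipher).decode("ascii")


def decrypt_address(token, key=SECRET):
    """Recover the picture address from a token produced by encrypt_address."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    nonce, cipher = raw[:8], raw[8:]
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return plain.decode("utf-8")


address = "https://img.example.com/p/abc123.png"
token = encrypt_address(address)   # this is what would be stored in the preset database
```

Only the token is stored and handed out, so the real storage address never appears in the page; `decrypt_address` corresponds to what the preset script does on the user terminal.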
And S206, when receiving a text rendering request corresponding to the webpage text sent by the user terminal, returning the encrypted picture address to the user terminal.
Specifically, when the webpage text needs to be rendered in the preset application of the user terminal, a text rendering request is sent to the second server according to the identification code of the webpage text or other information, and the text rendering request carries information such as the identification code of the webpage text needing to be rendered.
Then, after receiving the text rendering request, the second server determines the encrypted picture address corresponding to the web page text according to the text rendering request, and returns the encrypted picture address to the user terminal.
In practical application, even if a tool such as a web crawler manages by some means to obtain the encrypted picture address returned by the second server, it cannot decrypt that address and therefore cannot obtain the picture data corresponding to the web page text. The storage address of the picture data is thus encrypted and hidden, which makes it harder for the crawler to crawl the web page data.
Optionally, in this embodiment, after returning the encrypted picture address to the user terminal when receiving a text rendering request corresponding to a webpage text sent by the user terminal, the method further includes, but is not limited to: in the user terminal, sending a picture data request based on the encrypted picture address, wherein the picture data request is used for requesting picture data; intercepting a picture data request, and repositioning an encrypted picture address to obtain a picture address; picture data is requested based on the picture address.
Specifically, when the user terminal accesses the target web page, the target web page performs text rendering of the web page text and sends a text rendering request to the second server. After receiving the text rendering request sent by the user terminal, the second server returns the encrypted picture address corresponding to the picture data to the user terminal.
Then, the preset application of the user terminal sends a picture data request according to the encrypted picture address, where the picture data request is used to request the corresponding picture data. The user terminal intercepts the picture data request, extracts the encrypted picture address from it and decrypts it, requests the picture data based on the decrypted picture address, and uses the returned picture data to render the web page text of the target web page.
Further optionally, in this embodiment, the image data request is intercepted, and the encrypted image address is relocated to obtain an image address, which includes but is not limited to: and decrypting the encrypted picture address to obtain the picture address through a preset script in the user terminal.
It should be noted that the preset script in the user terminal carries a preset encryption algorithm, and the preset script is loaded while the user's web page is being rendered. In the second server, the storage address of the picture data returned by the first server is encrypted based on the preset encryption algorithm to obtain the encrypted picture address. When the user terminal sends a picture data request based on the encrypted picture address, the preset script intercepts the encrypted picture address and decrypts it based on the preset encryption algorithm to obtain the picture address.
In one example, a preset script, the JSSDK, is integrated into a web page of a preset application of the user terminal, and when the web page renders web page text, the picture data request of the web page is intercepted through the JSSDK. The address URL of the picture data requested by the web page is then obtained and decrypted, a picture data request is sent to the first server using the decrypted URL to acquire the picture data, and the picture data is rendered into the web page. Because the front-end SDK code is obfuscated, a crawler developer does not know the encryption logic of the picture address and cannot reuse the picture download procedure.
Through this embodiment, the picture data request is intercepted, the picture address is obtained by relocating the encrypted picture address in the picture data request, and the picture data is requested based on that picture address. Encrypting the storage address URL of the picture data protects the picture data URL, so that black-market operators cannot obtain the picture data by copying the URL and downloading it.
Specifically, a first picture address URL corresponding to target picture data in the picture data request is obtained, at the moment, the first picture address URL is encrypted, a second picture address URL is obtained by decrypting the first picture address URL, then the picture data request is generated based on the second picture address URL, and the picture data request is sent to the first server through the second picture address URL so as to request for obtaining the target picture data.
As a preferred implementation, in this embodiment, the encrypted picture address of the picture data in the preset database is updated at preset intervals. Specifically, according to a preset algorithm or a preset model, the picture address corresponding to each piece of picture data stored in the preset database is updated periodically, which makes it harder for a crawler to succeed with a database collision (brute-force matching) attack.
According to this embodiment, corresponding picture data is generated according to the text data corresponding to the web page text, and the picture data is stored in the first server; the picture address corresponding to the picture data returned by the first server is encrypted to obtain an encrypted picture address, which is stored in a preset database of the second server; and when a text rendering request corresponding to the web page text is received from the user terminal, the encrypted picture address is returned to the user terminal. The text data is obfuscated by converting it into picture data, and the storage address of the picture data is then encrypted and hidden, which makes it harder for a crawler to crawl the web page data and raises the cost for black-market operators to counter the protection. This addresses the problem in the related art that web page text visible in a web page cannot be protected from crawlers acquiring the web page data.
Example two
The web page text obfuscation anti-crawler method provided by the embodiment of the present invention is described in detail below. As shown in FIG. 3, the method may specifically include the following steps:
S301, receiving text data of a webpage text;
Specifically, the first server pushes data to a message pipeline of the second server through a service link, and the picture service of the second server takes the text data from the message pipeline.
S302, judging whether an identification code corresponding to the text data exists in the database;
Specifically, the picture service of the second server generates the identification code (key) according to a preset rule and the text data, and then judges whether this unique key already exists in the database of the second server. If yes, the text data conversion ends; otherwise, S303 is executed.
S303, generating picture data according to the text data;
Specifically, if the key does not exist in the database, the text data is converted into a picture byte stream, noise is added to the picture byte stream to obtain the picture data, and the picture data is then stored in the first server.
S304, encrypting the picture address;
Specifically, the picture address corresponding to the picture data returned by the first server is received, encrypted, and then stored in the database.
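Steps S301 to S304 can be sketched end to end as below. Everything here is an illustrative assumption rather than the patented implementation: `queue.Queue` stands in for the message pipeline, a dict for the second server's database, `FirstServer` for the picture store, SHA-256 for the preset key rule, and `placeholder_encrypt` (plain base64, which is not encryption) for the unspecified encryption step.

```python
import base64
import hashlib
import queue


class FirstServer:
    """Hypothetical picture store that returns an address for each stored picture."""

    def __init__(self):
        self.pictures = {}

    def store(self, key, picture):
        address = f"https://img.example.com/{key[:12]}.png"
        self.pictures[address] = picture
        return address


def process_messages(pipeline, database, first_server, render, encrypt):
    """Drain the message pipeline, following S301 to S304."""
    while not pipeline.empty():
        text = pipeline.get()                                    # S301: take text data
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()   # S302: identification code
        if key in database:
            continue                                             # already converted
        picture = render(text)                                   # S303: text -> picture
        address = first_server.store(key, picture)               #       store on first server
        database[key] = encrypt(address)                         # S304: keep encrypted address


def placeholder_encrypt(address):
    # Placeholder only: base64 is an encoding, not encryption.
    return base64.urlsafe_b64encode(address.encode("utf-8")).decode("ascii")


pipeline = queue.Queue()
for text in ["Price: 3200", "Phone: 010-1234", "Price: 3200"]:
    pipeline.put(text)

database = {}
first_server = FirstServer()
process_messages(
    pipeline,
    database,
    first_server,
    render=lambda t: b"IMG:" + t.encode("utf-8"),
    encrypt=placeholder_encrypt,
)
```

The duplicate "Price: 3200" message is skipped at S302, so only two pictures are generated and stored for the three messages.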
In addition, a web page rendering method provided by the embodiment of the present invention is described in detail below. As shown in FIG. 4, the method may specifically include the following steps:
S401, initiating a web page rendering request of a web page;
Specifically, the user terminal sends a web page rendering request to the second server to request the text data of the target web page text.
S402, receiving the encrypted picture address returned by the second server;
S403, initiating a picture data request based on the encrypted picture address;
S404, a preset plug-in JSSDK is integrated into the web page, and the picture data request of the web page is intercepted through the JSSDK;
Specifically, the picture data request is intercepted through the preset plug-in JSSDK of the web page.
S405, decrypting the encrypted picture address in the picture data request to obtain a target picture address;
S406, generating a target picture data request according to the target picture address to acquire the picture data corresponding to the web page;
S407, web page rendering is carried out according to the picture data returned by the first server.
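The client-side flow in S401 to S407 can be modelled as a small simulation. In the source the interception is done by a JavaScript SDK in the page; the Python below only mirrors the order of the steps, and `SecondServer`, `FirstServer`, `intercept_and_fetch` and the base64 placeholder for decryption are all hypothetical.

```python
import base64


def decrypt_address(token):
    # Placeholder for the preset script's decryption; base64 is not encryption.
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


class SecondServer:
    """Hypothetical server that answers rendering requests with encrypted addresses."""

    def __init__(self, encrypted_by_key):
        self.encrypted_by_key = encrypted_by_key

    def render_request(self, key):
        return self.encrypted_by_key[key]


class FirstServer:
    """Hypothetical picture store addressed by the real (decrypted) picture address."""

    def __init__(self, pictures):
        self.pictures = pictures

    def get(self, address):
        return self.pictures[address]


def intercept_and_fetch(encrypted_address, first_server):
    """Mimic the preset script: intercept the request, decrypt, then fetch."""
    target = decrypt_address(encrypted_address)   # S405: recover the target address
    return first_server.get(target)               # S406: request the picture data


address = "https://img.example.com/abc.png"
token = base64.urlsafe_b64encode(address.encode("utf-8")).decode("ascii")
second = SecondServer({"key-1": token})
first = FirstServer({address: b"IMG-BYTES"})

received = second.render_request("key-1")         # S401 and S402
picture = intercept_and_fetch(received, first)    # S403 to S406; S407 would render it
```

A client without the decryption step only ever sees `received`, which is not a key in the first server's store, so requesting it directly would fail.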
Through this embodiment, the text data is obfuscated by converting it into picture data, and the storage address of the picture data is then encrypted and hidden, which makes it harder for a crawler to crawl the web page data and raises the cost for black-market operators to counter the protection. This addresses the problem in the related art that web page text visible in a web page cannot be protected from crawlers acquiring the web page data.
EXAMPLE III
The embodiment of the invention provides a webpage text confusion anti-crawler device.
Referring to fig. 5, a schematic structural diagram of a web page text obfuscation anti-crawler apparatus according to an embodiment of the present invention is shown, where the apparatus is applied to a second server.
The web page text obfuscation anti-crawler apparatus of the embodiment of the invention comprises: a processing module 50, an encryption module 52, and a sending module 54.
The functions of the modules and the interaction relationship between the modules are described in detail below.
The processing module 50 is configured to generate corresponding picture data according to text data corresponding to a web page text, and store the picture data to the first server;
the encryption module 52 is configured to encrypt a picture address corresponding to the picture data returned by the first server to obtain an encrypted picture address, and store the encrypted picture address in a preset database of the second server;
and a sending module 54, configured to return the encrypted picture address to the user terminal when receiving a text rendering request corresponding to the webpage text sent by the user terminal.
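As a non-limiting sketch, the cooperation of the processing module 50, the encryption module 52 and the sending module 54 may be modelled as below. The class names, the in-memory dictionaries standing in for the first server and the preset database, the `text_id` key and the address format are all hypothetical. The `placeholder_render` and `placeholder_encrypt` functions are deliberately trivial stand-ins (the latter is an encoding, NOT real encryption); an actual implementation would inject a real rasterizer and a real cipher.

```python
import base64
from typing import Callable, Dict


class FirstServer:
    """Hypothetical stand-in for the picture store (first server)."""

    def __init__(self) -> None:
        self._pictures: Dict[str, bytes] = {}

    def store(self, picture: bytes) -> str:
        # Return the address under which the picture was stored.
        address = f"https://img.example.invalid/{len(self._pictures)}.png"
        self._pictures[address] = picture
        return address


def placeholder_render(text: str) -> bytes:
    """Stand-in for text-to-picture conversion; not a real image encoder."""
    return text.encode("utf-8")


def placeholder_encrypt(address: str) -> str:
    """Stand-in for address encryption; base64 is an encoding, not a cipher."""
    return base64.urlsafe_b64encode(address.encode("utf-8")).decode("ascii")


class SecondServer:
    """Sketch of the apparatus: modules 50, 52 and 54 as methods."""

    def __init__(
        self,
        first_server: FirstServer,
        render: Callable[[str], bytes] = placeholder_render,
        encrypt: Callable[[str], str] = placeholder_encrypt,
    ) -> None:
        self._first_server = first_server
        self._render = render
        self._encrypt = encrypt
        self._database: Dict[str, str] = {}  # text_id -> encrypted address

    def publish(self, text_id: str, text: str) -> None:
        # Processing module 50: generate picture data and store it remotely.
        picture = self._render(text)
        address = self._first_server.store(picture)
        # Encryption module 52: encrypt the returned address and persist it.
        self._database[text_id] = self._encrypt(address)

    def handle_render_request(self, text_id: str) -> str:
        # Sending module 54: only the encrypted address leaves the server.
        return self._database[text_id]
```

For example, after `server.publish("listing-1", "Phone: 555-0100")`, a call to `server.handle_render_request("listing-1")` returns the stored token rather than the plain text or the plain picture address.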
Optionally, in this embodiment, the processing module 50 includes:
the first processing submodule is used for generating a corresponding identification code according to the text data;
and the second processing submodule is used for generating the picture data according to the text data if the identification code does not exist in the second server.
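The two submodules above may be illustrated by the following minimal sketch. The embodiment does not state here how the identification code is computed, so the use of a SHA-256 digest is an assumption, as are the names `identification_code` and `PictureCache`; reusing the previously generated picture when the code already exists is likewise assumed behaviour for this sketch.

```python
import hashlib
from typing import Callable, Dict


def identification_code(text: str) -> str:
    """First processing submodule: derive a code from the text (assumed SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PictureCache:
    """Second processing submodule: generate a picture only for unseen codes."""

    def __init__(self, generate: Callable[[str], bytes]) -> None:
        self._generate = generate
        self._known: Dict[str, bytes] = {}  # identification code -> picture data
        self.generated = 0  # number of pictures actually generated

    def picture_for(self, text: str) -> bytes:
        code = identification_code(text)
        if code not in self._known:
            # The code does not exist yet, so the picture data is generated.
            self._known[code] = self._generate(text)
            self.generated += 1
        return self._known[code]
```

Keying on a digest of the text means identical text appearing on many pages is converted once, which keeps the picture generation cost proportional to the amount of distinct text.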
Optionally, in this embodiment, the second processing sub-module includes:
the conversion unit is used for generating a corresponding picture byte stream according to the text data;
and the processing unit is used for adding noise into the picture byte stream to obtain the picture data.
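The noise-adding step of the processing unit may be sketched as follows, under the assumption that the picture byte stream is a sequence of raw 8-bit grayscale pixel values (noise applied directly to an already compressed format such as PNG would corrupt the file). The salt-and-pepper noise model, the `ratio` and `seed` parameters and the function name `add_noise` are illustrative choices, not taken from the embodiment.

```python
import random
from typing import Optional


def add_noise(pixels: bytes, ratio: float = 0.02, seed: Optional[int] = None) -> bytes:
    """Set a fraction `ratio` of the pixels to pure black (0) or white (255)."""
    rng = random.Random(seed)
    noisy = bytearray(pixels)
    amount = int(len(noisy) * ratio)
    # Pick distinct pixel positions so exactly `amount` pixels are overwritten.
    for index in rng.sample(range(len(noisy)), amount):
        noisy[index] = rng.choice((0, 255))
    return bytes(noisy)
```

A small ratio leaves the text legible to a human reader while making the picture less regular for optical character recognition; the appropriate value would need to be tuned against the fonts and picture sizes actually used.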
Optionally, in this embodiment, the apparatus further includes:
a first request module, configured to send, in the user terminal, a picture data request based on the encrypted picture address after the encrypted picture address has been returned to the user terminal in response to the text rendering request corresponding to the web page text, where the picture data request is used to request the picture data;
an interception module, configured to intercept the picture data request and relocate the encrypted picture address to obtain the picture address;
and a second request module, configured to request the picture data based on the picture address.
Optionally, in this embodiment, the intercepting module includes:
and the decryption unit is used for decrypting the encrypted picture address through a preset script in the user terminal to obtain the picture address.
Moreover, in the embodiment of the invention, corresponding picture data is generated according to text data corresponding to the web page text, and the picture data is stored in the first server; a picture address corresponding to the picture data returned by the first server is encrypted to obtain an encrypted picture address, and the encrypted picture address is stored in a preset database of the second server; and when a text rendering request corresponding to the web page text sent by the user terminal is received, the encrypted picture address is returned to the user terminal. The text data is obfuscated by converting it into picture data, and the storage address of the picture data is then encrypted and hidden, which increases the difficulty for a crawler to crawl the web page data and raises the cost that black-market (illicit industry) operators must bear to counter the protection. The problem in the related art that web page text visible in a web page cannot be protected from acquisition by crawlers is thereby solved.
EXAMPLE IV
Preferably, an embodiment of the present invention further provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the web page text obfuscation anti-crawler method described above and can achieve the same technical effect. To avoid repetition, the details are not described here again.
The embodiment of the invention also provides a storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the web page text obfuscation anti-crawler method embodiment described above and can achieve the same technical effect; to avoid repetition, the details are not described here again. The storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A web page text obfuscation anti-crawler method, applied to a second server, the method comprising:
generating corresponding picture data according to text data corresponding to the webpage text, and storing the picture data to a first server;
encrypting a picture address corresponding to the picture data returned by the first server to obtain an encrypted picture address, and storing the encrypted picture address into a preset database of the second server;
and when a text rendering request corresponding to the webpage text sent by the user terminal is received, returning the encrypted picture address to the user terminal.
2. The method of claim 1, wherein generating corresponding picture data from text data corresponding to web page text comprises:
generating a corresponding identification code according to the text data;
and if the identification code does not exist in the second server, generating the picture data according to the text data.
3. The method of claim 2, wherein generating the picture data according to the text data if the identification code does not exist in the second server comprises:
generating a corresponding picture byte stream according to the text data;
and adding noise into the picture byte stream to obtain the picture data.
4. The method according to claim 1, wherein after returning the encrypted picture address to the user terminal when the text rendering request corresponding to the webpage text sent by the user terminal is received, the method further comprises:
sending a picture data request based on the encrypted picture address in the user terminal, wherein the picture data request is used for requesting the picture data;
intercepting the picture data request, and relocating the encrypted picture address to obtain the picture address;
requesting the picture data based on the picture address.
5. The method of claim 4, wherein intercepting the picture data request and relocating the encrypted picture address to obtain the picture address comprises:
and decrypting the encrypted picture address through a preset script in the user terminal to obtain the picture address.
6. A web page text obfuscation anti-crawler apparatus applied to a second server, the apparatus comprising:
the processing module is used for generating corresponding picture data according to the text data corresponding to the webpage text and storing the picture data to the first server;
the encryption module is used for encrypting a picture address corresponding to the picture data returned by the first server to obtain an encrypted picture address, and storing the encrypted picture address into a preset database of the second server;
and the sending module is used for returning the encrypted picture address to the user terminal when receiving a text rendering request corresponding to the webpage text sent by the user terminal.
7. The apparatus of claim 6, wherein the processing module comprises:
the first processing submodule is used for generating a corresponding identification code according to the text data;
and the second processing submodule is used for generating the picture data according to the text data if the identification code does not exist in the second server.
8. The apparatus of claim 7, wherein the second processing sub-module comprises:
the conversion unit is used for generating a corresponding picture byte stream according to the text data;
and the processing unit is used for adding noise into the picture byte stream to obtain the picture data.
9. The apparatus of claim 6, further comprising:
a first request module, configured to, after returning the encrypted picture address to the user terminal when receiving a text rendering request corresponding to the web page text sent by the user terminal, send, in the user terminal, a picture data request based on the encrypted picture address, where the picture data request is used to request the picture data;
an interception module, configured to intercept the picture data request and relocate the encrypted picture address to obtain the picture address;
and a second request module, configured to request the picture data based on the picture address.
10. The apparatus of claim 9, wherein the intercepting module comprises:
and the decryption unit is used for decrypting the encrypted picture address through a preset script in the user terminal to obtain the picture address.
11. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the web page text obfuscation anti-crawler method as claimed in any one of claims 1 to 5.
12. A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the web page text obfuscation anti-crawler method according to any one of claims 1 to 5.
CN202111536063.5A 2021-12-15 2021-12-15 Webpage text confusion anti-crawler method and device, electronic equipment and storage medium Pending CN114386059A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536063.5A CN114386059A (en) 2021-12-15 2021-12-15 Webpage text confusion anti-crawler method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536063.5A CN114386059A (en) 2021-12-15 2021-12-15 Webpage text confusion anti-crawler method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114386059A true CN114386059A (en) 2022-04-22

Family

ID=81197237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536063.5A Pending CN114386059A (en) 2021-12-15 2021-12-15 Webpage text confusion anti-crawler method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114386059A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896531A (en) * 2022-04-27 2022-08-12 北京聚通达科技股份有限公司 Image processing method and device, electronic equipment and storage medium
CN116932854A (en) * 2023-09-14 2023-10-24 百鸟数据科技(北京)有限责任公司 Webpage information anticreeper method, device, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN102184351B (en) Content reading system and content reading method
CN114386059A (en) Webpage text confusion anti-crawler method and device, electronic equipment and storage medium
CN113806806B (en) Desensitization and restoration method and system for webpage screenshot
CN111460503B (en) Data sharing method, device, equipment and storage medium
CN112529586B (en) Transaction information management method, device, equipment and storage medium
CN109379351B (en) Two-dimensional code encryption method, storage medium, equipment and system
US8656157B2 (en) Method for sending and receiving an encrypted message and a system thereof
CN112199622A (en) Page jump method, system and storage medium
CN115484086B (en) Cloud mobile phone screen sharing method, electronic equipment and computer readable storage medium
CN112035827A (en) Cipher data processing method, device, equipment and readable storage medium
CN113793245A (en) Image encryption method, image decryption device, electronic device, and medium
CN114115903A (en) Method and device for reinforcing small program and operating small program
CN111368322B (en) File decryption method and device, electronic equipment and storage medium
CN111460502B (en) Data sharing method, device, equipment and storage medium
Khan et al. A novel combination of information confidentiality and data hiding mechanism
US11088824B2 (en) Method and apparatus for use in information processing
CN107729345B (en) Website data processing method and device, website data processing platform and storage medium
CN116431948A (en) Picture loading method and device, electronic equipment and storage medium
CN111666466A (en) Method, system, apparatus and computer-readable storage medium for preventing crawler
CN113946862A (en) Data processing method, device and equipment and readable storage medium
CN112182603B (en) Anti-crawler method and device
CN109145645B (en) Method for protecting short message verification code in android mobile phone
CN111131270B (en) Data encryption and decryption method and device, electronic equipment and storage medium
CN110061949B (en) Method and device for acquiring information
CN111353133B (en) Image processing method, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination