CN111125605B

CN111125605B - Page element acquisition method and device

Info

Publication number: CN111125605B
Application number: CN201911407963.2A
Authority: CN
Inventors: 徐彦卿; 周梦席
Original assignee: Beijing Chuangxin Journey Network Technology Co ltd
Current assignee: Mafengwo Guizhou Tourism Group Co ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2022-07-29
Anticipated expiration: 2039-12-31
Also published as: CN111125605A

Abstract

The embodiments of the application provide a page element acquisition method and device. The method comprises: acquiring an html file of a target webpage; obtaining, according to the html file, first identification information corresponding to each page element in the html file, wherein the first identification information comprises the type of each page element and a first identifier, the first identifier uniquely indicates the page element, and the first identifier is based on the html format; and processing the first identifier according to a target format to obtain second identification information based on the target format, wherein the second identification information comprises the type of each page element, the first identifier, and a first preset character corresponding to the first identifier. Because the html-format identifiers of the target webpage are acquired automatically and converted into target-format identifiers, the efficiency and accuracy of acquiring target-format page element identifiers are improved.

Description

Page element acquisition method and device

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for acquiring page elements.

Background

A web page is a basic element constituting a website. A web page may include display areas composed of one or more page tags, i.e., HyperText Markup Language (HTML) tags; these are called page elements, such as text, hyperlinks, buttons, input boxes, drop-down boxes, and the like. After a web page is designed, a tester usually tests its User Interface (UI), and the web page is improved according to the test results. During automated UI testing, the format of a page element needs to be converted from the html format to the Cascading Style Sheets (CSS) format in order to generate test cases.

In the prior art, the format of a page element is converted from the html format to the CSS format as follows: using a CSS selector or XPath, a tester finds the page elements required by the test, which are in html format, and manually rewrites them in CSS format. Acquiring page elements in the CSS format this way is therefore inefficient and error-prone.

Disclosure of Invention

The embodiment of the application provides a page element obtaining method and device, and the efficiency and the accuracy of obtaining information of page elements based on a target format are improved.

In a first aspect, an embodiment of the present application provides a page element obtaining method, which is applied to a server, and includes:

acquiring an html file of a target webpage;

according to the html file, acquiring first identification information corresponding to each page element in the html file, wherein the first identification information comprises: the type of the page element and a first identifier, the first identifier is used for uniquely indicating the page element, and the first identifier is based on the html format;

processing the first identifier according to a target format to obtain second identifier information based on the target format, wherein the second identifier information comprises: the page element type, the first identifier and a first preset character corresponding to the first identifier.

Optionally, the method further includes:

if N page elements in the html file have no first identifier but have a first attribute, acquiring first attribute information corresponding to each element in the N page elements according to the html file, wherein the first attribute information comprises: a type of each of the N page elements and the first attribute, the first attribute being used to indicate each of the N page elements, and the first attribute being based on an html format;

processing the first attribute according to a target format to obtain second identification information based on the target format, wherein the second identification information comprises: the type of the page element, the first attribute and a second preset character corresponding to the first attribute.

Optionally, before the processing the first attribute according to the target format and obtaining the second identification information based on the target format, the method further includes:

if the first attributes of the multiple page elements are the same, respectively acquiring second attributes of parent page elements of the multiple page elements;

processing the first attribute according to the target format to obtain the second identification information based on the target format comprises:

processing the second attribute according to the target format to obtain second identification information based on the target format, wherein the second identification information comprises: the type of the page element, the second attribute, a second preset character corresponding to the second attribute, and the generation number of the page element relative to the parent page element;

processing the second attribute and the first attribute according to the target format to obtain second identification information based on the target format, wherein the second identification information comprises: the type of each page element, the second attribute, the first attribute, a second preset character corresponding to the second attribute, and a second preset character corresponding to the first attribute.

Optionally, the method further includes:

storing second identification information of the target page;

and generating a file according to the stored second identification information of the target page.

Optionally, before generating a file according to the stored second identification information of the target page, the method further includes:

selecting second identification information corresponding to a needed page element from the stored second identification information of the target page;

generating a file according to the stored second identification information of the target page, including:

and generating a file according to the selected second identification information.

Optionally, the obtaining of the html file of the target webpage includes:

and acquiring the URL of the target webpage, and acquiring the html file according to the URL.

In a second aspect, an embodiment of the present application provides a page element obtaining apparatus, which is applied to a server, and includes:

the acquisition module is used for acquiring the html file of the target webpage;

a first processing module, configured to obtain, according to the html file, first identification information corresponding to each page element in the html file, where the first identification information includes: the type of the page element and a first identifier, the first identifier is used for uniquely indicating the page element, and the first identifier is based on the html format;

A second processing module, configured to process the first identifier according to a target format, to obtain second identifier information based on the target format, where the second identifier information includes: the page element type, the first identifier and a first preset character corresponding to the first identifier.

Optionally, the first processing module is further configured to:

the second processing module is further configured to process the first attribute according to a target format to obtain second identification information based on the target format, where the second identification information includes: the type of the page element, the first attribute and a second preset character corresponding to the first attribute.

Optionally, the first processing module is further configured to, if the first attributes of the multiple page elements are the same, respectively obtain second identifiers of parent page elements of the multiple page elements;

The second processing module is configured to, when processing the first attribute according to a target format and obtaining second identification information based on the target format, specifically:

processing the second identifier according to a target format to obtain second identification information based on the target format, wherein the second identification information includes: the type of the page element, the second identifier, a first preset character corresponding to the second identifier, and the generation number of the page element relative to the parent page element.

Optionally, the first processing module is further configured to, if the first attributes of the multiple page elements are the same, respectively obtain second attributes of parent page elements of the multiple page elements;

processing the second attribute according to a target format to obtain second identification information based on the target format, wherein the second identification information comprises: the type of the page element, the second attribute, a second preset character corresponding to the second attribute, and the generation number of the page element relative to the parent page element.

Optionally, the apparatus further comprises: the storage module and the third processing module;

the storage module is used for storing second identification information of the target page;

and the third processing module is used for generating a file according to the stored second identification information of the target page.

Optionally, before the third processing module generates a file according to the stored second identification information of the target page, the third processing module is further configured to:

when the third processing module generates a file according to the stored second identification information of the target page, the third processing module is specifically configured to:

Optionally, when the obtaining module obtains the html file of the target webpage, the obtaining module is specifically configured to:

In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor and memory;

the memory stores computer-executable instructions; the at least one processor executes computer-executable instructions stored by the memory to perform the method of any one of the first aspect of the embodiments of the present application.

In a fourth aspect, the present application provides a computer-readable storage medium, in which program instructions are stored, and when the program instructions are executed by a processor, the method according to any one of the first aspect of the embodiments of the present invention is implemented.

In a fifth aspect, this application embodiment provides a program product, which includes a computer program, where the computer program is stored in a readable storage medium, and the computer program can be read by at least one processor of an electronic device from the readable storage medium, and the computer program is executed by the at least one processor to enable the electronic device to implement the method according to any one of the first aspect of the application embodiment.

The embodiments of the application provide a method and a device for acquiring page elements. A server acquires the html file of a target webpage; according to the html file, it acquires first identification information corresponding to each page element in the html file, wherein the first identification information comprises the type of each page element and a first identifier, the first identifier uniquely indicates the page element, and the first identifier is based on the html format; it then processes the first identifier according to the target format to obtain second identification information based on the target format, wherein the second identification information comprises the type of each page element, the first identifier, and a first preset character corresponding to the first identifier. Because the html-format identifiers of the target webpage are acquired automatically and converted into target-format identifiers, a tester no longer needs to look up the html-format identifiers in the html file of the target webpage and then manually rewrite them as target-format identifiers. The efficiency and accuracy of obtaining target-format page element identifiers are thereby improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.

Fig. 1 is an application scenario diagram provided in an embodiment of the present application;

Fig. 2 is a flowchart of a page element obtaining method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of an interface provided in an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a page element obtaining apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 6 is a block diagram of the page element obtaining apparatus 20 according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is an application scenario diagram provided in an embodiment of the present application. As shown in Fig. 1, the terminal devices 101 and 104 communicate with the server 105. For example, when a tester requests the target-format page elements of a webpage through the terminal devices 101 and 104, the server 105 obtains the target-format page elements of the webpage by using the method shown in the following embodiments.

Fig. 2 is a flowchart of a page element obtaining method according to an embodiment of the present application, and as shown in fig. 2, the method according to the embodiment of the present application is applied to a server. The method of the embodiment may include:

S201, obtaining the html file of the target webpage.

In this embodiment, a web page includes many page elements, for example, input boxes, buttons, drop-down boxes, hyperlinks, and the like. When a UI test is performed on the web page, these page elements need to be tested; for an input box, for example, the rules governing what content the input box accepts need to be tested. Therefore, when a UI test of a web page is performed, it must be possible to find the page elements to be tested.

The web page displayed in front of the user is designed by the developer through source code, for example, through HTML code. When a developer designs a web page through HTML code, page elements in the web page are in HTML format, and when a UI test is performed on the web page, the HTML format needs to be converted into other formats, such as CSS format. In the embodiment of the present application, the target format is the CSS format for example.

In the embodiment of the application, when the html format is converted into the target format, for example, the CSS format, a web page to be tested (i.e., a target web page) needs to be opened by a browser, and a developer mode is entered to obtain an html file of the target web page. The html file includes identification information of each page element on the target web page, for example, the type of the page element, the identification of the page element, and the attribute of the page element. The types of the page elements can be classified into characters, hyperlinks, buttons, input boxes, drop-down boxes and the like.

Optionally, one possible implementation of obtaining the html file is: acquiring the Uniform Resource Locator (URL) of the target webpage, and acquiring the html file according to the URL. The user enters the URL of the target webpage through the terminal device; the server obtains the URL entered by the user and automatically opens the developer mode of the target webpage according to that URL, thereby obtaining the html file of the target webpage.
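As a rough illustration of obtaining the html file from a URL, the sketch below downloads the document with Python's standard library. This is a simplification and an assumption: the embodiment describes opening the browser's developer mode, whereas this sketch performs a plain request and therefore would not see content rendered by scripts. The function name fetch_html is hypothetical.

```python
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the document at `url` and return it as text.

    Hypothetical helper: a plain request, not the developer-mode
    capture described in the embodiment.
    """
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A page that builds its elements dynamically would need a real browser session instead; the plain request is used here only to keep the example self-contained.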

S202, acquiring first identification information corresponding to each page element in the html file according to the html file.

Wherein the first identification information includes: the type and the first identification of each page element, the first identification is used for uniquely indicating each page element, and the first identification is based on html format.

In this embodiment, after the html file is obtained, the content of the html file is automatically analyzed, and information of each page element in the html file is obtained. For example, according to the generation rule of the CSS Selector, the content of the html file is parsed, each page element is located, and the information of each page element is acquired.

When a UI test of the web page is performed, it must be possible to find the page element to be tested, and the identifier of a page element uniquely identifies that page element. Therefore, according to the generation rule of the CSS Selector, the identifier of the page element may be acquired first. For example, for a page element on the target web page, i.e. a form, the corresponding code in the html file is:

<form action="/login/" method="post" id="_j_login_form">

According to the code corresponding to the page element form in the html file, the identification information of the page element can be obtained. The identification information corresponding to the page element form includes: the type of the page element, which is form, and the first identifier, which is the value of the id of the page element form, i.e., _j_login_form. The page element form can be determined by its identifier _j_login_form.
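The extraction of the first identification information described above can be sketched with Python's standard html.parser module. This is a minimal sketch, not the implementation of the embodiment; the class name IdCollector and the function name collect_first_identifiers are hypothetical.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect (element type, id) pairs for every start tag that has an id."""

    def __init__(self) -> None:
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        element_id = dict(attrs).get("id")
        if element_id:
            self.found.append((tag, element_id))


def collect_first_identifiers(html: str):
    """Return the type and html-format identifier of each element with an id."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.found
```

For the form example above, collect_first_identifiers returns a single pair whose type is form and whose identifier is _j_login_form.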

S203, processing the first identification according to the target format to obtain second identification information based on the target format.

Wherein the second identification information includes: the type of each page element, the first identifier, and a first preset character corresponding to the first identifier.

In this embodiment, for each piece of acquired identification information, the format of the identifier of the page element is converted according to the target format. For example, a preset character corresponding to the identifier is added before the html-format identifier: in the CSS format, the symbol "#" must precede the identifier of a page element, so adding "#" before the acquired first identifier yields the CSS-format identifier. For example, for the form identifier value _j_login_form obtained in S202, the corresponding CSS format is #_j_login_form. The second identification information based on the CSS format is then obtained automatically, together with the type of the page element in the first identification information. The corresponding page element is uniquely identified by the second identification information.
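The conversion in S203 amounts to prefixing the identifier with the preset character. A minimal sketch follows; the function name id_to_css and the dictionary layout are hypothetical, and the sketch assumes the id contains no characters that would need escaping in a CSS selector.

```python
ID_PREFIX = "#"  # the "first preset character" for CSS id selectors


def id_to_css(element_type: str, element_id: str) -> dict:
    """Build second identification information from an html-format id."""
    return {
        "type": element_type,
        "identifier": element_id,
        "selector": f"{ID_PREFIX}{element_id}",
    }
```

Keeping the element type and the original identifier next to the selector mirrors the three components the text lists for the second identification information.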

In this embodiment, the server obtains the html file of the target webpage; according to the html file, it obtains first identification information corresponding to each page element in the html file, wherein the first identification information comprises the type of each page element and a first identifier, the first identifier uniquely indicates the page element, and the first identifier is based on the html format; it then processes the first identifier according to the target format to obtain second identification information based on the target format, wherein the second identification information comprises the type of each page element, the first identifier, and a first preset character corresponding to the first identifier. Because the html-format identifiers of the target webpage are acquired automatically and converted into target-format identifiers, a tester no longer needs to look up the html-format identifiers in the html file of the target webpage and then manually rewrite them as target-format identifiers. The efficiency and accuracy of obtaining target-format page element identifiers are thereby improved.

Optionally, not every page element on the target web page has an identifier; that is, in the html file some page elements have no identifier, so those page elements cannot be identified by an identifier. In that case, a page element may instead be identified by its attributes. Accordingly, the method of the present application further comprises:

S301, if N page elements in the html file do not have the first identifier but have the first attribute, acquiring first attribute information corresponding to each of the N page elements according to the html file.

Wherein the first attribute information includes: the type and the first attribute of each page element in the N page elements, the first attribute is used for indicating each page element in the N page elements, and the first attribute is based on the html format.

In this embodiment, for any page element that has no identifier but does have an attribute, the attribute of the page element may be obtained according to the generation rule of the CSS Selector. For example, for a page element on the target web page, i.e. an input box (input), the corresponding code in the html file is:

<input data-v-7afcdb40="" data-v-54cdc180="" type="text" placeholder="please fill your mobile phone number" data-name="reserve-person-phone" class="reserve-person-cell-input">

According to the code corresponding to the page element input in the html file, it can be known that the page element input does not have a corresponding id value, but the page element input has a corresponding attribute (class), and the value of the class is a reserve-person-cell-input, so that the value of the class of the page element input can be obtained. Thus, the attribute information of the page element is obtained, and the attribute information corresponding to the page element input includes: the type of the page element is input, and the first attribute is the value of class of the page element input, namely, reserve-person-cell-input. The page element input can be determined through the attribute of the page element input, namely, reserve-person-cell-input.

S302, processing the first attribute according to the target format to obtain second identification information based on the target format.

Wherein the second identification information includes: the type of each page element, the first attribute and a second preset character corresponding to the first attribute.

In this embodiment, after the attribute information of the page element is obtained, the format of the attribute of the page element is converted according to the target format. For example, a preset character corresponding to the attribute is added before the html-format attribute: in the CSS format, the symbol "." must precede the attribute of a page element, so adding "." before the acquired first attribute yields the CSS-format attribute. For example, for the attribute value of input obtained in S301, the corresponding CSS format is .reserve-person-cell-input. The second identification information based on the CSS format is then obtained automatically, together with the type of the page element in the first attribute information. The corresponding page element is identified by the second identification information. This avoids the situation in which page elements without an identifier (id) cannot be accurately determined.
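Steps S301 and S302 can be sketched together: collect the class of each element that has no id, then prefix it with ".". The names ClassCollector and class_to_css are hypothetical. Chaining several space-separated class names into one selector is an assumption added here; the text only shows a single class value.

```python
from html.parser import HTMLParser

CLASS_PREFIX = "."  # the "second preset character" for CSS class selectors


class ClassCollector(HTMLParser):
    """Collect (element type, class value) for start tags that lack an id."""

    def __init__(self) -> None:
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            return  # elements with an id are handled through the id path
        class_value = attributes.get("class")
        if class_value:
            self.found.append((tag, class_value))


def class_to_css(element_type: str, class_value: str) -> dict:
    """Build second identification information from an html-format class."""
    # Assumption: several space-separated class names are chained together.
    selector = "".join(CLASS_PREFIX + name for name in class_value.split())
    return {"type": element_type, "attribute": class_value, "selector": selector}
```

For the input example above, the collector yields the pair (input, reserve-person-cell-input), and class_to_css turns it into the selector .reserve-person-cell-input.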

Unlike the value of the identifier (id), which is unique to each page element, the value of the attribute (class) may be shared by several page elements. Therefore, optionally, before S302, the method further includes:

S303, if the first attributes of multiple page elements are the same, respectively acquiring second identifiers of the parent page elements of the multiple page elements.

Wherein the second identifier is used for uniquely identifying the parent page element.

In this embodiment, after obtaining the value of class of the page element of the target web page, it is determined whether there is a case where the value of class is the same, and if there is a case where the values of class of at least two page elements are the same, the identifier of the parent-level page element of each of all the page elements having the same value of class is obtained from the html file, and is recorded as the second identifier. For example, if the page element is an input box and the input box 1 further includes an input box 11 and a drop-down box 12, the input box 1 is a parent page element of the input box 11 and the drop-down box 12.
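The check for shared class values can be sketched with a counter over the collected attribute information. The function name duplicated_classes and the (type, class value) pair layout are hypothetical.

```python
from collections import Counter


def duplicated_classes(elements):
    """Return the class values that occur on at least two page elements.

    `elements` is an iterable of (element type, class value) pairs.
    """
    counts = Counter(class_value for _, class_value in elements)
    return {value for value, count in counts.items() if count > 1}
```

Only elements whose class value appears in the returned set would need the parent lookup of S303; the others are already distinguishable by class alone.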

Accordingly, one possible implementation manner of S302 is: and processing the second identifier according to the target format to acquire second identifier information based on the target format.

Wherein the second identification information includes: the type of the page element, the second identifier, the first preset character corresponding to the second identifier, and the generation number of the page element relative to the parent page element.

In this embodiment, after the second identifier of the parent page element of the page element is obtained, the format of the identifier of the parent page element is converted according to the target format. For example, if the attribute value of the page element input is reserve-person-cell-input, its parent page element is form, and the identifier value of form is _j_login_form, then according to the CSS format, the second identification information is: #_j_login_form input:nth-child(1), where nth-child(1) indicates that the page element input belongs to the first generation below the parent page element form. After the second identification information is determined, the page element is determined from input in the second identification information, the parent page element of the page element input is determined to be form according to nth-child(1), and the page element form is uniquely determined according to #_j_login_form. In this way, when multiple page elements have the same class value, their respective parent page elements are determined, and each page element can thus be uniquely determined.
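The string format of this parent-based second identification information can be sketched as follows. The function name selector_via_parent_id is hypothetical. Note that in standard CSS, :nth-child(n) selects an element by its position among its siblings; the sketch only reproduces the format given in the text, where the number is described as a generation.

```python
def selector_via_parent_id(parent_id: str, element_type: str, generation: int) -> str:
    """Format '#<parent id> <type>:nth-child(<generation>)' as in the text."""
    if generation < 1:
        raise ValueError("generation must be a positive integer")
    return f"#{parent_id} {element_type}:nth-child({generation})"
```

Calling selector_via_parent_id("_j_login_form", "input", 1) gives the string shown in the example above.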

When the parent page element of the page element has no corresponding identifier value, optionally, before S302, the method further includes:

S304, if the first attributes of multiple page elements are the same, respectively obtaining second attributes of the parent page elements of the multiple page elements.

In this embodiment, after the class values of the page elements of the target web page are obtained, it is determined whether any class values are the same. If the class values of at least two page elements are the same, the identifier of the parent page element of each page element having that class value is looked up in the html file; if the parent page element of a page element has no identifier but has an attribute, the attribute value of that parent page element is obtained. For example, if the page element is an input box and input box 1 further includes input box 11 and input box 12, then input box 1 is the parent page element of input box 11 and input box 12. The type of the parent page element may also be the same as the type of the page element. For example, for the page element input11, its class has the value reserve-person-cell-input_11, and the class of the parent page element input1 has the value reserve-person-cell-input_1.

Accordingly, one possible implementation of S302 is: processing the second attribute according to the target format to obtain second identification information based on the target format, wherein the second identification information comprises: the type of each page element, the second attribute, a second preset character corresponding to the second attribute, and the generation number of the page element relative to the parent page element.

In this embodiment, after the second attribute of the parent page element of the page element is obtained, the format of the attribute of the parent page element is converted according to the target format. For example, if the attribute value of the page element input11 is reserve-person-cell-input_11, its parent page element is input1, and the class value of input1 is reserve-person-cell-input_1, then according to the CSS format, the second identification information is: .reserve-person-cell-input_1 input:nth-child(1), where nth-child(1) indicates that the page element input11 belongs to the first generation below the parent page element input1. After the second identification information is determined, the type of the page element is determined to be an input box from input in the second identification information, the parent page element of the page element is determined to be input1 according to nth-child(1), and the page element input1 is determined according to .reserve-person-cell-input_1, so that the page element input11 is uniquely determined according to the second identification information. In this way, when multiple page elements have the same class value, their respective parent page elements are determined, and each page element can thus be uniquely determined.

It should be noted that the attribute values of the parent page elements of at least two page elements may also be the same. In that case, the identifier value of the next parent page element up is obtained; if that parent page element has no identifier value, its attribute value is obtained instead; and this is repeated until the second identification information can uniquely identify the page element. The second identification information records the generation number of the page element relative to the last parent page element acquired.
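The repeated climb through parent page elements can be sketched as a loop over ancestors. Everything here is an illustrative assumption: the Node class, the function name anchor_selector, and the ambiguous_classes parameter (class values already known to be shared) are hypothetical, and the generation counter follows the text's description rather than the standard CSS meaning of :nth-child.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    """Hypothetical minimal page element: type, optional id/class, parent."""
    tag: str
    id: Optional[str] = None
    cls: Optional[str] = None
    parent: Optional["Node"] = None


def anchor_selector(node: Node, ambiguous_classes: Set[str]) -> Optional[str]:
    """Climb parents until one has an id or a class that is not shared."""
    generation = 0
    ancestor = node.parent
    while ancestor is not None:
        generation += 1
        if ancestor.id:
            # An id is unique, so the climb can stop here.
            return f"#{ancestor.id} {node.tag}:nth-child({generation})"
        if ancestor.cls and ancestor.cls not in ambiguous_classes:
            return f".{ancestor.cls} {node.tag}:nth-child({generation})"
        ancestor = ancestor.parent
    return None  # no usable parent page element was found
```

The identifier is checked before the class at each level because an id is unique by definition, which matches the order of preference in the text.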

Optionally, on the basis of any of the above embodiments, the method further includes:

S204, storing second identification information of the target page.

In this embodiment, the obtained second identification information of the page element in the target page is saved.

S205, generating a file according to the stored second identification information of the target page.

In this embodiment, for any page element, the stored second identification information of the target page is obtained. If the second identification information is an identifier of the page element, a variable name is defined for the identifier, and the identifier is represented by the variable name. For example, if the identification value of the page element form is _j_logic_form, the variable name defined for it may be logic_form. The variable name can be defined in any way. If the second identification information is an attribute of the page element, a variable name is defined for the attribute, and the attribute is represented by the variable name. The variable names corresponding to different page elements are different.

After a variable name is defined for the identifier or attribute of each page element, the file is generated; the generated file may be a yaml file. The yaml file is then placed into an automated testing framework for UI automated testing.
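A minimal sketch of such file generation is given below, using only the Python standard library. The layout (one "variable: selector" pair per line) and the function name render_yaml are assumptions for illustration; the embodiment does not prescribe a particular yaml schema.

```python
import json
from typing import Dict


def render_yaml(selectors_by_variable: Dict[str, str]) -> str:
    """Render a flat mapping of variable name -> CSS selector as yaml text.

    Values are written as double-quoted strings (JSON string syntax, which
    yaml 1.2 also accepts), because an unquoted value starting with "#" would
    be read as a yaml comment.
    """
    lines = []
    for variable, selector in selectors_by_variable.items():
        if not variable.isidentifier():
            # Keep keys to identifier-like names for simplicity.
            raise ValueError(f"unsuitable variable name: {variable!r}")
        lines.append(f"{variable}: {json.dumps(selector)}")
    return "\n".join(lines) + "\n"


yaml_text = render_yaml({"logic_form": "#_j_logic_form"})
```

Because the input is a dictionary, each variable name can appear only once, which matches the requirement that different page elements receive different variable names.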

Optionally, before S205, the method further includes: selecting second identification information corresponding to a needed page element from the stored second identification information of the target page;

accordingly, one possible implementation manner of S205 is: and generating a file according to the selected second identification information.

In this embodiment, as shown in fig. 3, the second identification information stored in the storage medium is displayed on the interface, and a variable is defined for each identifier or attribute by clicking the define-variable control. Then, the second identification information of the required page elements is selected through a selection box on the interface, and a yaml file is generated according to the selected second identification information. Here, "input" in fig. 3 represents, for example, an input box, "button" represents, for example, a button, "a" represents, for example, a link, "div" may define a partition or a section in a document, and "p" represents, for example, an input box for placing a picture in a web page.

When second identification information corresponding to the page element of the target page is displayed on the interface, the type of the page element in each second identification information is displayed, and the identification value or the attribute value is based on the CSS format.

In addition, when the second identification information of the required page element is selected through the selection box on the interface, the selection can be filtered by page element type. For example, when a page element type displayed at the upper left of the interface is clicked, the second identification information of the page elements of that type is displayed below it on the interface.
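The selection step can be sketched as a simple filter over the stored entries. The StoredEntry record and the select_entries function are hypothetical names introduced for illustration only; the actual interface logic is not specified in this embodiment.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class StoredEntry(NamedTuple):
    element_type: str  # e.g. "input", "button", "a", "div", "p"
    variable: str      # variable name defined for the identifier or attribute
    selector: str      # identifier or attribute in CSS format


def select_entries(
    entries: Iterable[StoredEntry],
    element_type: str,
    wanted_variables: Optional[Set[str]] = None,
) -> List[StoredEntry]:
    """Keep entries of one element type, optionally narrowed to ticked variables."""
    selected = [entry for entry in entries if entry.element_type == element_type]
    if wanted_variables is not None:
        selected = [entry for entry in selected if entry.variable in wanted_variables]
    return selected


stored = [
    StoredEntry("form", "logic_form", "#_j_logic_form"),
    StoredEntry("input", "first_input", ".reserve-person-cell-input_1 input:nth-child(1)"),
    StoredEntry("input", "second_input", ".reserve-person-cell-input_2 input:nth-child(1)"),
]
inputs_only = select_entries(stored, "input")
```

The filtered list would then be passed to the file generation of S205 in place of the full stored set.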

Fig. 4 is a schematic structural diagram of a page element obtaining apparatus provided in an embodiment of the present application. The apparatus is applied to a server. As shown in fig. 4, the apparatus in this embodiment may include: an obtaining module 41, a first processing module 42 and a second processing module 43. Optionally, the apparatus may further include: a storage module 44 and a third processing module 45. Wherein,

an obtaining module 41, configured to obtain an html file of a target web page;

the first processing module 42 is configured to obtain, according to the html file, first identification information corresponding to each page element in the html file, where the first identification information includes: the type and a first identifier of the page element, wherein the first identifier is used for uniquely indicating the page element and is based on an html format;

a second processing module 43, configured to process the first identifier according to the target format, to obtain second identification information based on the target format, where the second identification information includes: the type of the page element, the first identifier and a first preset character corresponding to the first identifier.

Optionally, the first processing module 42 is further configured to: if the first identifier does not exist in the N page elements in the html file but the first attribute exists in the N page elements in the html file, acquiring first attribute information corresponding to each element in the N page elements according to the html file, wherein the first attribute information comprises: the type and the first attribute of each page element in the N page elements, the first attribute is used for indicating each page element in the N page elements, and the first attribute is based on an html format;

the second processing module 43 is further configured to process the first attribute according to the target format, and obtain second identification information based on the target format, where the second identification information includes: the page element type, the first attribute and a second preset character corresponding to the first attribute.

Optionally, the first processing module 42 is further configured to, if the first attributes of the multiple page elements are the same, respectively obtain second identifiers of parent page elements of the multiple page elements;

the second processing module 43, when processing the first attribute according to the target format to obtain the second identification information based on the target format, is specifically configured to: process the second identifier according to the target format to obtain second identification information based on the target format, wherein the second identification information comprises: the type of the page element, the second identifier, the first preset character corresponding to the second identifier, and the generation of the page element under the parent page element.

Optionally, the first processing module 42 is further configured to, if the first attributes of the multiple page elements are the same, respectively obtain second attributes of parent page elements of the multiple page elements;

the second processing module 43, when processing the first attribute according to the target format to obtain the second identification information based on the target format, is specifically configured to: process the second attribute according to the target format to obtain second identification information based on the target format, wherein the second identification information comprises: the type of the page element, the second attribute, a second preset character corresponding to the second attribute, and the generation of the page element under the parent page element.

Optionally, the storage module 44 is configured to store second identification information of the target page;

and the third processing module 45 is configured to generate a file according to the stored second identification information of the target page.

Optionally, before the third processing module 45 generates the file according to the second identification information of the stored target page, the third processing module is further configured to: selecting second identification information corresponding to the needed page element from the stored second identification information of the target page;

when the third processing module 45 generates the file according to the stored second identification information of the target page, the third processing module is specifically configured to: and generating a file according to the selected second identification information.

Optionally, when the obtaining module 41 obtains the html file of the target webpage, the obtaining module is specifically configured to: and acquiring the URL of the target webpage, and acquiring the html file according to the URL.

The apparatus of this embodiment may be configured to implement the technical solution of any one of the above-mentioned method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
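As a rough illustration of how the obtaining module 41, the first processing module 42 and the second processing module 43 might cooperate, consider the Python sketch below. All function and class names are hypothetical, and the "#" and "." prefixes are the standard CSS id and class prefixes, assumed here to correspond to the first and second preset characters.

```python
import urllib.request
from html.parser import HTMLParser


def obtain_html(url: str, timeout: float = 10.0) -> str:
    """Role of the obtaining module 41: fetch the html file behind a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _FirstProcessing(HTMLParser):
    """Role of the first processing module 42: collect (type, kind, value)."""

    def __init__(self) -> None:
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if attributes.get("id"):
            # Prefer the identifier when the page element has one.
            self.found.append((tag, "id", attributes["id"]))
        elif classes:
            # Otherwise fall back to the (first) class attribute value.
            self.found.append((tag, "class", classes[0]))


# Assumed preset characters: the standard CSS prefixes for id and class.
PRESET_CHARACTER = {"id": "#", "class": "."}


def second_processing(found):
    """Role of the second processing module 43: html values to CSS format."""
    return [(tag, PRESET_CHARACTER[kind] + value) for tag, kind, value in found]


def page_elements(html_text: str):
    """Run the first and second processing steps on html text."""
    parser = _FirstProcessing()
    parser.feed(html_text)
    parser.close()
    return second_processing(parser.found)
```

A call such as page_elements(obtain_html(url)) would chain the three roles; page elements with neither an id nor a class are skipped in this sketch, and duplicate class values are not resolved here.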

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic device of this embodiment may include: at least one processor 51 and a memory 52. Fig. 5 shows an electronic device with one processor as an example, wherein,

the memory 52 is configured to store programs. In particular, the program may include program code comprising computer operating instructions. The memory 52 may comprise a Random Access Memory (RAM) and may also include a non-volatile memory (e.g., at least one disk memory).

The processor 51 is configured to execute the computer-executable instructions stored in the memory 52 to implement the page element obtaining method in any of the above embodiments.

The processor 51 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement the embodiments of the present Application.

Alternatively, in a specific implementation, if the memory 52 and the processor 51 are implemented independently, the memory 52 and the processor 51 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The buses may be divided into address buses, data buses, control buses, etc., but do not represent only one bus or one type of bus.

Alternatively, in a specific implementation, if the memory 52 and the processor 51 are integrated on one chip, the memory 52 and the processor 51 may communicate with each other through an internal interface.

The electronic device described above in this embodiment may be configured to execute the technical solutions shown in the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.

Fig. 6 is a block diagram of the page element obtaining apparatus 20 according to an embodiment of the present application. For example, the apparatus 20 may be provided as a server. Referring to fig. 6, the apparatus 20 includes a processing component 21, which further includes one or more processors, and memory resources, represented by memory 22, for storing instructions, such as applications, that are executable by the processing component 21. The application programs stored in memory 22 may include one or more modules that each correspond to a set of instructions. Furthermore, the processing component 21 is configured to execute instructions to perform the page element obtaining method shown in any of the embodiments described above.

The apparatus 20 may also include a power component 23 configured to perform power management of the apparatus 20, a wired or wireless network interface 24 configured to connect the apparatus 20 to a network, and an input/output (I/O) interface 25. The apparatus 20 may operate based on an operating system stored in the memory 22, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media capable of storing program codes, such as Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disk, and the like.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A page element acquisition method is applied to a server, and comprises the following steps:

acquiring an html file of a target webpage;

according to the html file, acquiring first identification information corresponding to each page element in the html file, wherein the first identification information comprises: the type and the first identifier of the page element, the first identifier is used for uniquely indicating the page element, and the first identifier is based on an html format;

processing the first identifier according to a target format to obtain second identifier information based on the target format, wherein the second identifier information comprises: the type of the page element, the first identifier and a first preset character corresponding to the first identifier;

the obtaining of the first identification information corresponding to each page element in the html file according to the html file includes: analyzing the html file according to a generating rule of a CSS Selector, positioning each page element, and acquiring first identification information corresponding to each page element;

if the first attributes of the multiple page elements are the same, respectively acquiring second identifiers of parent page elements of the multiple page elements; processing the second identifier according to the target format to obtain second identification information based on the target format, where the second identification information further includes: the second identifier, a first preset character corresponding to the second identifier, and the generation of the page element under the parent page element;

if the first attributes of the multiple page elements are the same and when the parent page elements of the multiple page elements do not have corresponding identification values, respectively acquiring second attributes of the parent page elements of the multiple page elements; processing the second attribute according to a target format to obtain second identification information based on the target format, where the second identification information further includes: the second attribute and a second preset character corresponding to the second attribute.

2. The method of claim 1, further comprising:

the second identification information further includes: the first attribute and a second preset character corresponding to the first attribute.

3. The method of claim 1 or 2, further comprising:

Storing second identification information of the target page;

4. The method according to claim 3, wherein before generating the file according to the second identification information of the stored target page, the method further comprises:

selecting second identification information corresponding to the needed page element from the stored second identification information of the target page;

5. The method of claim 1, wherein obtaining the html file for the target web page comprises:

acquiring a uniform resource locator (URL) of the target webpage, and acquiring the html file according to the URL.

6. A page element acquisition device, which is applied to a server, the device comprising:

the first acquisition module is used for acquiring the html file of the target webpage;

a first processing module, configured to obtain, according to the html file, first identification information corresponding to each page element in the html file, where the first identification information includes: the type and the first identifier of the page element, the first identifier is used for uniquely indicating the page element, and the first identifier is based on an html format;

A second processing module, configured to process the first identifier according to a target format to obtain second identifier information based on the target format, where the second identifier information includes: the type of the page element, the first identifier and a first preset character corresponding to the first identifier;

the first processing module is specifically configured to: analyzing the html file according to a generating rule of a CSS Selector, positioning each page element, and acquiring first identification information corresponding to each page element;

a first processing module further configured to: if N page elements in the html file have no first identifier but have a first attribute, acquiring first attribute information corresponding to each element in the N page elements according to the html file, wherein the first attribute information comprises: a type of each of the N page elements and the first attribute, the first attribute being used to indicate each of the N page elements, and the first attribute being based on an html format;

the first processing module is further configured to obtain second identifiers of parent page elements of the multiple page elements respectively if the first attributes of the multiple page elements are the same;

the second processing module, when processing the first attribute according to the target format to obtain second identification information based on the target format, is specifically configured to:

process the second identifier according to a target format to obtain second identification information based on the target format, wherein the second identification information further includes: the second identifier, a first preset character corresponding to the second identifier, and the generation of the page element under the parent page element;

the first processing module is further configured to obtain second attributes of parent page elements of the multiple page elements respectively if the first attributes of the multiple page elements are the same and if the parent page elements of the multiple page elements do not have corresponding identification values;

the second processing module, when processing the first attribute according to the target format to obtain second identification information based on the target format, is specifically configured to: process the second attribute according to the target format to obtain second identification information based on the target format, wherein the second identification information further comprises: the second attribute and a second preset character corresponding to the second attribute.

7. An electronic device, comprising: a memory for storing program instructions and at least one processor for calling program instructions in the memory to perform the page element retrieval method of any one of claims 1 to 5.

8. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program; the computer program, when executed, implementing the page element obtaining method of any of claims 1-5.