CN117992689A

CN117992689A - Method and system for improving webpage performance by repeatedly utilizing webpage response result

Info

Publication number: CN117992689A
Application number: CN202410396567.9A
Authority: CN
Inventors: 吴浩然; 朱晶晶; 易航
Original assignee: Guancheng Information Technology Suzhou Co ltd
Current assignee: Guancheng Information Technology Suzhou Co ltd
Priority date: 2024-04-03
Filing date: 2024-04-03
Publication date: 2024-05-07
Anticipated expiration: 2044-04-03
Also published as: CN117992689B

Abstract

The invention provides a method and a system for improving webpage performance by reusing webpage response results, and relates to the technical field of webpages. The method comprises: identifying cacheable content in a target webpage, determining the resource type of each cacheable content, and determining the content change condition of the cacheable content under each resource type, wherein the resource types include static resources and dynamic resources; setting, by configuring HTTP cache headers, a cache duration for each cacheable content that matches its content change condition, to obtain a configured target webpage, wherein the cache duration of static resources is longer than that of dynamic resources; and dynamically distributing access requests initiated by users for the configured target webpage to a target server according to the actual load and actual response time of the servers under a load balancing mechanism, and providing the cacheable content of the target webpage to the user through a target CDN node matched to the user.

Description

Method and system for improving webpage performance by repeatedly utilizing webpage response result

Technical Field

The invention relates to a webpage technology, in particular to a method and a system for improving webpage performance by recycling webpage response results.

Background

At present, the method for improving the webpage performance mainly comprises the following three optimization strategies:

1. Image and multimedia optimization

Strategy description: images and videos are typically the largest resources in a web page. Techniques such as format compression, lazy loading of pictures, and responsive pictures can significantly reduce the size and loading time of these resources.

The technical disadvantage of this strategy is that over-compression may lose image quality. At the same time, additional development and testing effort is required for complex image processing and multimedia format compatibility.

2. Using Content Delivery Network (CDN)

Strategy description: a CDN distributes content across servers worldwide, enabling users to obtain content quickly from geographically closer locations and reducing latency.

The technical disadvantage is that, while a CDN may increase access speed and extend the global reach of a website, it may involve higher costs and may affect the availability of the website when the CDN provider has problems.

3. Front-end resource optimization (e.g., compression and merging of JavaScript and CSS files)

Strategy description: several CSS or JavaScript files are combined into one file and compressed to reduce the number and total volume of HTTP requests.

A technical disadvantage of this approach is that it may cause a single file to become very large, affecting parsing and execution time. Furthermore, improper merging may cause render blocking at page loading.

Each strategy has its applicable scenario, but also has certain limitations. In the actual optimization process, it is often necessary to combine various strategies and make custom adjustments for specific websites and user groups.

Disclosure of Invention

The embodiments of the invention provide a method and a system for improving webpage performance by reusing webpage response results, which can solve at least some of the problems in the prior art.

In a first aspect of an embodiment of the present invention,

The method for improving the webpage performance by recycling the webpage response result comprises the following steps:

Identifying cacheable contents in a target webpage, determining the resource type of each cacheable content, and respectively determining the content change condition of the cacheable content under each resource type; wherein the resource types include static resources and dynamic resources;

Setting, by configuring HTTP cache headers, a cache duration for each cacheable content that matches its content change condition, to obtain a configured target webpage; the cache duration of static resources is longer than that of dynamic resources;

Dynamically distributing a user initiated access request to a configured target webpage to a target server according to the actual load and the actual response time of the server adopting a load balancing mechanism, and providing cacheable content of the target webpage for the user through a target CDN node matched with the user; and the target CDN node is at least cached with the static resources of the target webpage.

In an alternative embodiment of the present invention,

Identifying cacheable content in the target webpage, determining the resource type to which each cacheable content belongs, and respectively determining the content change condition of the cacheable content under each resource type, wherein the method comprises the following steps:

Listing all loading resources in the target webpage by using a website analysis tool, and dividing each loading resource into the static resources or the dynamic resources according to whether the loading resources are frequently changed or not; wherein the static resources include: an image, a CSS style sheet, and a JavaScript file; the dynamic resources include: API response, user-specific content;

And analyzing the update frequency and the update mode of each type of loading resource in the log record of the target webpage obtained by a version control system or a manual recording mode, and establishing a cache duration strategy reference corresponding to the update frequency and the update mode.

In an alternative embodiment of the present invention,

Setting a cache time length matched with a corresponding content change condition for each cacheable content by configuring an HTTP cache header to obtain a configured target webpage, wherein the method comprises the following steps:

When the Cache-Control header is used, static resources having the first change frequency in the content change condition are configured as Cache-Control: public, max-age=XXXX, where XXXX is the number of seconds for which the corresponding resource should be cached, and dynamic resources having the second change frequency in the content change condition are configured as Cache-Control: no-cache or Cache-Control: no-store, so that the corresponding content always obtains the latest version from the corresponding server; the first change frequency is a change frequency lower than a preset frequency, and the second change frequency is a change frequency higher than the preset frequency;

When the Expires header is used, an Expires header with a future time is set for each cacheable resource having the first change frequency in the content change condition, so that the corresponding cacheable resource need not be re-requested before that future time;

When the Last-Modified header is used, a Last-Modified header is added to each cacheable resource as a timestamp marking its last modification time;

When the ETag header is used, a unique ETag value identifying a specific version is generated for each cacheable resource, and is used to determine whether the version of the corresponding cacheable resource has changed;

Setting, according to the cache policy derived in the above manner for the HTTP headers including Cache-Control, Expires, Last-Modified and ETag, the cache duration matched with the content change condition of each cacheable content; the cache duration of static resources is longer than that of dynamic resources.

In an alternative embodiment of the present invention,

The method further comprises the steps of:

Periodically reviewing and adjusting the caching strategy according to actual changes in the cacheable content of the target website and the access patterns of users.

In an alternative embodiment of the present invention,

The dynamic allocation of the user initiated access request to the configured target webpage to the target server according to the actual load and the actual response time of the server adopting the load balancing mechanism comprises the following steps:

Selecting a load balancer matched with actual demands from a software layer and a hardware layer respectively, installing the selected load balancer software on the server, and performing basic configuration including defining a server pool, designating monitoring and health checking parameters on the load balancer;

Adding all server addresses to be balanced to the load balancer configuration for which the basic configuration has been completed, setting a different weight for each server, and distributing requests according to the processing capacity and traffic of the servers, so that the configured servers are added to a preset pool; performing health checks on the configured servers, and removing any server that fails the health check from the preset pool until it returns to normal;

Selecting, from load balancing strategies including round-robin (polling), least connection, and resource-based allocation, a target load balancing strategy suited to the actual situation, and dynamically allocating the user's access requests for the configured target webpage to the target server according to the actual load and actual response time of the servers under the target load balancing strategy.

In an alternative embodiment of the present invention,

The method further comprises the steps of:

Identifying, through in-depth analysis of the content structure and key elements of the target webpage, the key content with the greatest influence on the user access experience; wherein the key content includes: body text and core images;

Detecting changes in the key content using hashing or other fingerprinting techniques, triggering a cache refresh only when the key content changes, and keeping the parts of the target webpage other than the key content unchanged.

In an alternative embodiment of the present invention,

The method further comprises the steps of:

Using edge computing technology to distribute data processing and content caching in advance to the network edge close to the user's location, and caching the cacheable content that makes up the webpage on edge nodes close to the user's geographic location;

Dynamically adjusting the cache size by identifying the performance and storage capacity of the device that initiated the access request, so as to preferentially allocate more cache space to high-frequency access resources and to cache more content when the remaining storage space is sufficient.

In a second aspect of an embodiment of the present invention,

The system for improving webpage performance by recycling webpage response results comprises:

The first unit is used for identifying cacheable contents in the target webpage, determining the resource type of each cacheable content, and respectively determining the content change condition of the cacheable contents under each resource type; wherein the resource types include static resources and dynamic resources;

A second unit, configured to set a cache duration matched with a corresponding content change condition for each cacheable content by configuring an HTTP cache header, so as to obtain a configured target web page; the caching time length of the static resource is longer than that of the dynamic resource;

The third unit is used for dynamically distributing the access request initiated by the user to the configured target webpage to the target server according to the actual load and the actual response time of the server adopting the load balancing mechanism, and providing the cacheable content of the target webpage for the user through a target CDN node matched with the user; and the target CDN node is at least cached with the static resources of the target webpage.

In a third aspect of an embodiment of the present invention,

There is provided an electronic device including:

A processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.

In a fourth aspect of an embodiment of the present invention,

There is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.

According to the invention, cacheable content in the target webpage is accurately identified and an appropriate cache duration is set for it, so that the number of requests the server must process for repeated content can be significantly reduced and the loading speed of the webpage is increased. Because static resources have a longer cache duration, a user revisiting the webpage can quickly load the content from the local cache or from a nearby CDN node, which reduces data transmission time. By adopting a load balancing mechanism, user requests are dynamically distributed according to the actual load and response time of the servers, which effectively balances the load on each server, improves processing efficiency, reduces the risk of single-point overload, prolongs the service life of the servers, and reduces operating cost. By providing static resources through the target CDN node matched to the user, the user obtains data from the geographically nearest node, which further reduces the loading time of webpage content. This optimization not only increases the response speed of the webpage but also improves the overall access experience, especially for users who visit the webpage repeatedly.

Drawings

FIG. 1 is a flow chart of a method for improving webpage performance by reusing webpage response results according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a system for improving webpage performance by reusing webpage response results according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

Fig. 1 is a flow chart of a method for improving web page performance by reusing web page response results according to an embodiment of the present invention, as shown in fig. 1, the method includes:

S101, identifying cacheable contents in a target webpage, determining the resource type of each cacheable content, and respectively determining the content change condition of the cacheable contents under each resource type; wherein the resource types include static resources and dynamic resources;

S102, setting, by configuring HTTP cache headers, a cache duration for each cacheable content that matches its content change condition, to obtain a configured target webpage; the cache duration of static resources is longer than that of dynamic resources;

S103, dynamically distributing a user initiated access request to the configured target webpage to a target server according to the actual load and the actual response time of the server adopting a load balancing mechanism, and providing cacheable content of the target webpage for the user through a target CDN node matched with the user; and the target CDN node is at least cached with the static resources of the target webpage.

In an alternative embodiment of the present invention,

The method further comprises the steps of:

Periodically reviewing and adjusting the caching strategy according to actual changes in the cacheable content of the target website and the access patterns of users.

Further, the overall framework of the technical solution of the application may comprise:

Identifying cacheable content:

Web page content is analyzed to identify which content is static or changes infrequently, such as pictures, CSS, and JavaScript files.

Judging the change frequency of the content: the update frequency of each resource is determined so that the caching strategy can be set reasonably.

The application also provides a specific embodiment, in particular:

Data collection: all loaded resources, including pictures, CSS files, JavaScript files, fonts, etc., are listed using a website analysis tool (e.g., Chrome DevTools).

Static resource identification: the resources are classified into two types, static and dynamic. Static resources typically include files that are not frequently changed, such as images, CSS style sheets, JavaScript files, and the like.

Dynamic resource identification: resources that are frequently updated or that change with each user request, such as API responses and user-specific content, are identified.
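
The static/dynamic classification described above can be sketched as follows. This is only a minimal illustration, assuming that resources are classified by URL path; the extension set, the /api/ prefix, and the function name classify_resource are hypothetical and are not taken from the application.

```python
import posixpath
from urllib.parse import urlparse

# Assumed extension list for static resources (images, CSS, JavaScript, fonts).
STATIC_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".css", ".js", ".woff2"}


def classify_resource(url: str) -> str:
    """Return "static" or "dynamic" for a loaded resource URL."""
    path = urlparse(url).path
    # Assumption: API responses live under /api/ and are treated as dynamic.
    if path.startswith("/api/"):
        return "dynamic"
    ext = posixpath.splitext(path)[1].lower()
    # Anything without a known static extension is conservatively dynamic.
    return "static" if ext in STATIC_EXTENSIONS else "dynamic"
```

In practice the classification would also draw on the change log described in the text, since a file extension alone does not show how often a resource changes.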

Creating a resource change log: for the main resources of the website, the update history is recorded, either by a version control system or manually.

Analyzing the update pattern: the update frequency and pattern of each resource are analyzed from the historical log data. For example, some pictures may rarely change, while the responses of some APIs may change every day.

Setting a cache policy reference: for static resources with a low update frequency, a longer cache time can be set. For dynamic resources that are frequently updated, a shorter cache time should be set, or a more complex cache policy should be used, such as cache invalidation based on content changes.
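
One possible way to turn a change log into a cache policy reference is sketched below. The 30-day window and the thresholds (no changes: one year; up to four changes: one day; more: always revalidate) are assumptions chosen for illustration; the application does not specify them, and the function names are hypothetical.

```python
from datetime import date


def updates_in_last_30_days(change_dates: list[date], today: date) -> int:
    """Count logged changes in the 30 days up to and including `today`."""
    return sum(1 for d in change_dates if 0 <= (today - d).days < 30)


def recommend_max_age(change_dates: list[date], today: date) -> int:
    """Map the observed update frequency to a cache duration in seconds."""
    n = updates_in_last_30_days(change_dates, today)
    if n == 0:
        return 365 * 24 * 3600  # no recent change: cache for one year
    if n <= 4:
        return 24 * 3600        # occasional changes: cache for one day
    return 0                    # frequent changes: always revalidate
```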

Implementing client-side caching: HTTP cache headers such as Cache-Control, Expires, Last-Modified, and ETag are configured to control the client (browser) cache. A reasonable cache duration is set according to the update frequency of the content. Specifically:

Configuring the HTTP cache headers;

Using the Cache-Control header;

Basic setting: for static resources, set Cache-Control: public, max-age=XXXX, where XXXX is the time (in seconds) for which the resource should be cached.

Dynamic content setting: for dynamic content, use Cache-Control: no-cache or Cache-Control: no-store to ensure that the content always obtains the latest version from the server.
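
The two Cache-Control settings above can be expressed as a small helper. This is a sketch only: the function name cache_control_value and the sensitive flag, used here to choose between no-cache and no-store, are hypothetical additions.

```python
def cache_control_value(resource_type: str, max_age_seconds: int = 0,
                        sensitive: bool = False) -> str:
    """Build a Cache-Control header value for a static or dynamic resource."""
    if resource_type == "static":
        return f"public, max-age={max_age_seconds}"
    if resource_type == "dynamic":
        # no-store forbids storing the response at all; no-cache allows it to
        # be stored but requires revalidation with the server before reuse.
        return "no-store" if sensitive else "no-cache"
    raise ValueError(f"unknown resource type: {resource_type!r}")
```

For example, cache_control_value("static", 31536000) yields the value for a one-year cache, matching the long-term caching of static resources discussed later in the text.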

Applying the Expires header;

Setting the expiration time: for resources that are not updated often, an Expires header with a future date can be set, so that the browser knows the resource need not be re-requested before that date.
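
An Expires value must be formatted as an HTTP date in GMT. The sketch below uses the Python standard library for the formatting; the function name expires_value is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_value(now: datetime, lifetime: timedelta) -> str:
    """Format now + lifetime as an HTTP date for the Expires header."""
    # usegmt=True requires an aware datetime whose tzinfo is timezone.utc.
    expiry = now.astimezone(timezone.utc) + lifetime
    return format_datetime(expiry, usegmt=True)
```

When both headers are present, a max-age directive in Cache-Control takes precedence over Expires, so the two should be kept consistent.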

Using a Last-Modified header;

A Last-Modified header is added to each resource to mark the time the resource was last modified. This allows the browser to verify, on a subsequent access, whether the resource has been updated.
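
The verification mentioned above is normally done with a conditional request: the browser sends the stored date back in an If-Modified-Since header, and the server may answer 304 Not Modified. A minimal server-side sketch, with hypothetical function names and without handling of malformed header values, might look like this:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def _to_http_precision(mtime: datetime) -> datetime:
    # HTTP dates have one-second resolution, so drop sub-second precision.
    return mtime.astimezone(timezone.utc).replace(microsecond=0)


def last_modified_value(mtime: datetime) -> str:
    """Format a modification time for the Last-Modified header."""
    return format_datetime(_to_http_precision(mtime), usegmt=True)


def is_not_modified(if_modified_since: str, mtime: datetime) -> bool:
    """True if the server may answer 304 Not Modified for this request."""
    client_time = parsedate_to_datetime(if_modified_since)
    return _to_http_precision(mtime) <= client_time
```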

Utilizing an ETag header;

Unique identification: a unique ETag value (an identifier for a specific version of a resource) is generated for each resource to help the browser determine whether the resource has changed.
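
One common way to obtain such a value is to hash the response body. The sketch below assumes a truncated SHA-256 digest as the ETag and a simplified If-None-Match comparison that ignores weak validators (the W/ prefix); both choices and both function names are illustrative assumptions.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag (a quoted string) from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """True if the client's cached version is current (server may send 304)."""
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    return "*" in candidates or current_etag in candidates
```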

Analyzing the resource update frequency: the approach is data-driven, i.e., the cache durations of different resource types are determined according to how often the resources are updated.

Static resource cache duration: long-term caching. A longer cache duration (e.g., one year) may be set for static resources that do not change frequently, such as pictures, CSS, and JavaScript files.

Dynamic content caching strategy: short-term caching or no caching. A short cache duration, or no caching at all, is set for frequently changed content to ensure that the user always gets up-to-date content.

Updating and adjusting the cache strategy: continuous optimization. The caching strategy is reviewed periodically and adjusted according to changes in website content and user access patterns.

Load balancing is used;

Load balancers, such as Nginx or HAProxy, are deployed. Requests are dynamically allocated according to server load and response time.

Specifically, a load balancer is selected. Software selection: an appropriate load balancer, such as Nginx or HAProxy, is selected as required. Hardware considerations: for high-traffic websites, dedicated hardware load balancers are considered.

Installation and configuration;

Installing the software: the selected load balancer software is installed on a server.

Basic configuration: the load balancer is set up, including defining a server pool and specifying parameters for monitoring and health checks.

Setting a server pool;

Adding a server to the pool;

Server list: all server addresses to be balanced are added to the load balancer configuration.

Weight distribution: if necessary, a different weight is set for each server, and requests are distributed according to its processing capacity and traffic.

Configuring health checks;

Checking mechanism: a health checking mechanism is implemented to ensure that all requests are sent to healthy servers.

Automatic removal: if a server fails the health check, it is automatically removed from the pool until it returns to normal.
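
The pool and health-check behaviour described above can be illustrated with a small in-memory model. This is not how Nginx or HAProxy are configured; the class ServerPool and the probe callback are hypothetical and only show the remove-and-readmit logic.

```python
from typing import Callable


class ServerPool:
    """Tracks which servers are currently eligible to receive requests."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)

    def run_health_checks(self, probe: Callable[[str], bool]) -> None:
        """Remove servers that fail the probe; re-admit those that pass again."""
        for server in self.servers:
            if probe(server):
                self.healthy.add(server)
            else:
                self.healthy.discard(server)

    def available(self) -> list[str]:
        """Healthy servers, in their configured order."""
        return [s for s in self.servers if s in self.healthy]
```

A real probe would typically issue a request to a health endpoint on each server; here it is left as a caller-supplied function so the sketch stays self-contained.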

Load distribution strategies;

Policy selection: an appropriate load balancing policy is selected, such as round-robin (polling), least connection, or resource-based allocation.

Dynamic adjustment: request distribution is dynamically adjusted according to the real-time load and response time of the servers.
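
Of the policies listed, least connection is the simplest to state in code. The sketch below assumes the balancer keeps a count of active connections per server; the function name and the data structure are hypothetical.

```python
def pick_least_connections(active: dict[str, int]) -> str:
    """Return the server with the fewest active connections."""
    if not active:
        raise ValueError("no servers available")
    # min() returns the first minimum, so ties go to the earliest-added server.
    return min(active, key=active.get)
```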

Taking round-robin polling as an example:

Implementing a basic polling mechanism. Server list configuration: all servers participating in load balancing are listed, ensuring that each server is able to handle requests.

Setting the polling logic: the polling logic is set in the load balancer so that each new request is assigned to the next server in sequence, ensuring that all servers receive requests in turn.
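
The polling logic amounts to cycling through the server list. A minimal sketch, with a hypothetical class name and a fixed server list assumed:

```python
from itertools import cycle


class RoundRobin:
    """Assigns each new request to the next server in sequence."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("server list must not be empty")
        self._cycle = cycle(servers)

    def next_server(self) -> str:
        return next(self._cycle)
```

With three servers, five consecutive calls to next_server return the first, second, third, first, and second server, so every server receives requests in turn.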

Monitoring the performance of the servers;

Deploying a performance monitoring tool: performance monitoring tools are deployed to track the load and response time of each server, and performance data is collected and analyzed periodically. Real-time load and response time recording: the real-time load (e.g., CPU and memory usage) and response time of each server are recorded to provide data support for dynamic adjustment.

Dynamically adjusting request allocation;

Load assessment: the load and response time of each server are periodically evaluated to determine whether any server is overloaded or under low load.

Weight adjustment: a weighting mechanism is implemented in the polling logic. For servers with better performance or lower load, the frequency with which they are assigned requests is increased; for servers with higher load, it is reduced.

Automatic adjustment mechanism;

An automated system is implemented that adjusts server weights according to real-time data. The system can respond quickly to load changes and adjust request distribution in real time.
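
The application does not give a formula for deriving weights from load and response time. The sketch below assumes one simple scoring rule, free CPU capacity divided by response time, purely for illustration; the function name and the input format are hypothetical.

```python
def compute_weights(metrics: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map server -> (cpu_load in [0, 1], response_time_ms) to normalized weights."""
    raw = {}
    for server, (cpu_load, response_ms) in metrics.items():
        # Assumed scoring rule: free capacity divided by response time.
        raw[server] = max(1.0 - cpu_load, 0.0) / max(response_ms, 1.0)
    total = sum(raw.values())
    if total == 0:
        # Every server is saturated: fall back to equal weights.
        return {s: 1.0 / len(raw) for s in raw}
    return {s: w / total for s, w in raw.items()}
```

The resulting weights sum to 1 and could be passed to a weighted selection such as random.choices(list(weights), weights=list(weights.values())), so that lightly loaded, fast servers receive requests more often.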

Monitoring performance;

Monitoring tool: a monitoring tool is used to track the performance and response time of each server. Data analysis: the data is analyzed and the load balancing policy is optimized to ensure the most efficient resource utilization.

Applying a CDN (content delivery network);

Selecting a CDN provider: a CDN suited to the website content and the target user group is selected. Caching static resources: static resources of the website, such as pictures, CSS, and JavaScript files, are cached at CDN nodes.

Optimizing the caching strategy;

Implementing intelligent cache refresh: a content-aware cache refresh strategy is adopted so that the cache is refreshed only when key content changes. Adjusting the caching strategy: the caching strategy is continuously adjusted and optimized according to website access patterns and user feedback.

Optimizing the caching strategy further includes:

A content-aware intelligent cache refresh mechanism. Method logic: specific changes to content are detected using hashing or other fingerprinting techniques, and the cache is refreshed only when key content (e.g., body text, core pictures) changes.

Technical description: by analyzing the structure and key elements of the webpage content in depth, the parts with the greatest influence on the user experience can be identified. Changes in these parts trigger a cache refresh, while other parts remain unchanged, which optimizes cache usage efficiency.

Specifically:

Generating an initial hash value; a hash value is generated for each key content (e.g., text segment, picture). Common hash algorithms such as MD5, SHA-1, etc. are used.

Storing the hash value; the generated hash value is stored in a database or a cache system. It is ensured that after each content update, the corresponding hash value is updated.

Monitoring content changes;

Hash values are recalculated on a periodic basis (or based on specific events) for the key content. The new and old hash values are compared to determine if the content has changed.

Identifying the change;

if the hash value changes, the content is marked as updated. If the hash value is unchanged, the current cache state is kept unchanged.

Cache refresh decision. Formulating a refresh strategy: a cache refresh policy is determined for when critical content is updated; either a full refresh or a partial refresh may be selected. Implementing cache updating: according to the policy, the cache is refreshed only when a change in critical content is detected, which avoids unnecessary cache refreshes caused by changes to unrelated content.
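
The generate, store, recompute, and compare steps above can be sketched as follows. The text names MD5 and SHA-1 as common algorithms; this sketch substitutes SHA-256, and it uses an in-memory dictionary in place of the database or cache system. The function names and the key names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Hash one piece of key content (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(content).hexdigest()


def changed_keys(stored: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """Update `stored` in place and return the keys whose content hash changed."""
    changed = []
    for key, content in current.items():
        digest = fingerprint(content)
        if stored.get(key) != digest:
            stored[key] = digest  # keep the stored hash in step with the content
            changed.append(key)
    return changed
```

A caller would refresh the cache only when the returned list is non-empty, and could refresh just the listed keys for a partial refresh or the whole page for a full refresh.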

Cache optimization combined with edge computing;

Method logic: content is cached on edge nodes near the user's geographic location to reduce data transmission time and improve response speed.

The edge computing technique enables data processing and content caching to be distributed at the edge of the network, close to where the user is located. This can significantly reduce latency and speed up content loading, especially for those services that have geographic location dependencies.
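
How a user is matched to an edge node is not specified in the application. The sketch below assumes great-circle distance as the only criterion, which is a simplification; the node names and coordinates used in any call are hypothetical inputs.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * 6371.0 * asin(min(1.0, sqrt(a)))


def nearest_node(user: tuple[float, float],
                 nodes: dict[str, tuple[float, float]]) -> str:
    """Return the name of the edge node closest to the user's (lat, lon)."""
    return min(nodes, key=lambda name: haversine_km(*user, *nodes[name]))
```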

Adaptive cache size management;

Method logic: the cache size is dynamically adjusted according to the storage capacity and performance of the user equipment, and more cache space is preferentially allocated to high-frequency access resources.

The cache policy may be dynamically adjusted by identifying the performance and storage capacity of the user device. This adaptive approach means that on devices with fewer resources the system optimizes its use of cache space, while on devices with more resources content can be cached more aggressively.

Specifically:

Device performance and storage capacity assessment. Collecting device information: hardware information of the user equipment, such as CPU performance, RAM size, and storage space, is acquired. JavaScript or server logic is used to evaluate and collect this information.

Evaluating the storage capacity: the maximum space available for caching is determined from the storage space of the device. Different cache limits may be set for different types of devices.

User access pattern analysis. Tracking the access frequency: the frequency with which the user accesses each resource is recorded through log analysis or client scripts, and high-frequency and low-frequency access resources are identified.

Behavior pattern recognition: user behavior patterns are identified using a data analysis tool, distinguishing resources with short-term high-frequency access from resources with long-term stable access.

Dynamically adjusting the cache strategy;

Formulating cache rules: flexible caching rules are formulated according to device performance and the user's access pattern. More cache space is allocated to high-frequency access resources.

Adaptive adjustment: the state of the device and the behavior of the user are monitored in real time, and the cache size and strategy are dynamically adjusted. Cache allocation is optimized to ensure the best performance and user experience.
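
One way to give high-frequency resources priority within a limited cache is to evict the least frequently accessed entry first. The sketch below assumes a byte budget fixed at construction time and a simple access counter; the class name and eviction rule are illustrative and are not prescribed by the application.

```python
from typing import Optional


class FrequencyAwareCache:
    """Byte-budgeted cache that evicts the least frequently accessed entry first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._data: dict[str, bytes] = {}
        self._hits: dict[str, int] = {}

    def get(self, key: str) -> Optional[bytes]:
        if key in self._data:
            self._hits[key] += 1
            return self._data[key]
        return None

    def put(self, key: str, value: bytes) -> None:
        self._data.pop(key, None)  # drop any older copy of this key
        if len(value) > self.max_bytes:
            self._hits.pop(key, None)
            return  # larger than the whole budget: do not cache
        self._hits.setdefault(key, 0)
        while self._size() + len(value) > self.max_bytes:
            victim = min(self._data, key=self._hits.__getitem__)
            del self._data[victim]
            del self._hits[victim]
        self._data[key] = value

    def _size(self) -> int:
        return sum(len(v) for v in self._data.values())
```

The budget passed as max_bytes would be chosen per device, so a device with more storage simply holds more entries before eviction begins.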

Cache efficiency and performance monitoring: the hit rate and load time of the cache are checked regularly, and the influence of the caching strategy on webpage performance is analyzed.

Adjusting the strategy: the cache rules and size are adjusted according to the monitoring data, ensuring that the caching strategy is effective without excessively occupying the resources of the user equipment.
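
The hit rate referred to above is the share of lookups served from the cache. A minimal counter, with a hypothetical class name, might be:

```python
class CacheStats:
    """Counts cache hits and misses and reports the hit rate."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self) -> float:
        """Hits divided by total lookups; 0.0 when nothing was recorded."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```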

Actual case:

Device-based cache restrictions;

For high-end devices (e.g., RAM ≥ 4 GB, high-speed CPU): the maximum cache limit is set to 200 MB.

For mid-range devices (e.g., RAM ≥ 2 GB, medium-speed CPU): the maximum cache limit is set to 100 MB.

For low-end devices (e.g., RAM < 2 GB, low-speed CPU): the maximum cache limit is set to 50 MB.
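
The three tiers above translate directly into a lookup. The sketch below uses RAM alone to pick the tier and ignores CPU speed, which is a simplifying assumption; the function name is hypothetical.

```python
MB = 1024 * 1024


def cache_limit_bytes(ram_gb: float) -> int:
    """Maximum cache size for a device, chosen by its RAM tier."""
    if ram_gb >= 4:
        return 200 * MB  # high-end device
    if ram_gb >= 2:
        return 100 * MB  # mid-range device
    return 50 * MB       # low-end device
```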

Resource caching based on access frequency;

High-frequency access resources (the number of accesses per day exceeds a certain threshold, e.g., 10): the cache time is extended to one week.

Medium-frequency access resources (2-10 accesses per week): the cache time is set to 3 days.

Low-frequency access resources (fewer than 2 accesses per week): the cache time is set to 1 day, or the resource is not cached.
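
These rules can be sketched as a function of daily and weekly access counts. The text leaves two points open, and this sketch fills them with labeled assumptions: resources accessed more than 10 times per week but not more than 10 times per day are treated as medium frequency, and low-frequency resources are cached for 1 day rather than not cached. The function name is hypothetical.

```python
DAY = 24 * 3600


def cache_seconds(accesses_per_day: float, accesses_per_week: float) -> int:
    """Cache time in seconds, chosen by how often the resource is accessed."""
    if accesses_per_day > 10:
        return 7 * DAY  # high frequency: one week
    if accesses_per_week >= 2:
        return 3 * DAY  # medium frequency (assumed to cover the unspecified range)
    return 1 * DAY      # low frequency (assumed: cache for 1 day)
```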

Special resource treatment;

For large media files (e.g., video, large picture): even for high frequency access, the occupied storage space needs to be considered, and the cache time can be reduced appropriately.

Fig. 2 is a schematic structural diagram of a system for improving web page performance by reusing web page response results according to an embodiment of the present invention, as shown in fig. 2, where the system includes:

In a third aspect of an embodiment of the present invention,

There is provided an electronic device including:

A processor;

a memory for storing processor-executable instructions;

In a fourth aspect of an embodiment of the present invention,

The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A method for improving webpage performance by recycling webpage response results is characterized by comprising the following steps:

2. The method according to claim 1, wherein the identifying cacheable content in the target web page, determining a resource type to which each cacheable content belongs, and determining a content change condition of the cacheable content under each resource type, respectively, includes:

3. The method according to claim 2, wherein the setting a cache duration matched with the corresponding content change condition for each cacheable content by configuring an HTTP cache header, to obtain the configured target web page, includes:

4. A method according to claim 3, characterized in that the method further comprises:

5. The method according to claim 4, wherein dynamically allocating user initiated access requests to configured target web pages to target servers based on actual load and actual response time of servers employing load balancing mechanism comprises:

6. The method according to claim 1, wherein the method further comprises:

7. The method according to claim 1, wherein the method further comprises:

8. A system for improving web page performance by reusing web page response results, for implementing the method of any of claims 1-7, comprising:

9. An electronic device, comprising:

A processor;

a memory for storing processor-executable instructions;

Wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.