JP2015528930A

JP2015528930A - Automatic extraction system and extraction method for website internal structure

Info

Publication number: JP2015528930A
Application number: JP2015514895A
Authority: JP
Inventors: キム、ヨンシク
Original assignee: ヴィヴァンスカンパニー、リミテッド
Priority date: 2012-05-29
Filing date: 2013-05-14
Publication date: 2015-10-01
Anticipated expiration: 2033-05-14
Also published as: WO2013180410A1; JP6044008B2; KR101235139B1

Abstract

The present invention relates to a system for automatically extracting the internal structure of a website. The system comprises at least one measuring device (terminal) that measures the internal structure of the website. The measuring device empties the DNS cache of the terminal's operating system and the browser cache, connects to the website, and thereby acquires information on every web server that responds. It acquires detailed information exchanged between the browser and the web servers either through browser event information or by capturing the network packets between the browser and the web servers. The measuring device comprises a control unit that controls acquisition of information on all web servers that respond when the website is accessed; a web browser that connects to the website; a browser driver that drives the web browser and acquires information sent to and received from the web servers through the web browser's event information; and a packet capture module that collects, from packet information, detailed information that the web browser does not provide as event information. [Selected drawing] FIG. 3

Description

The present invention relates to a system and method for automatically extracting the internal structure of a website. More specifically, it relates to a system and method that actively extract and visualize a website's structure so that the internal structure corresponding to the physical location (component, domain, server, etc.) of the various content provided through the website can be grasped easily.

Online service websites such as portal sites (Naver, Daum, etc.) and community sites (Facebook, Cyworld, etc.) are generally served through many domains and servers. For example, the portal site Naver exposes a single URL, www.naver.com, to the user, but internally its web components are served from more than ten domains, including www.naver.com, ics.naver.com, nv1.ad.naver.com, nv2.ad.naver.com, static.naver.com, and imgshopping.naver.com. Physical servers are assigned to each domain, which makes the structure complex.

In addition, as the use of external infrastructure such as CDN (Contents Delivery Network) services and cloud computing grows alongside a site's own infrastructure, it is becoming increasingly difficult even for internal operators to understand the structure of their website.

Furthermore, many online service websites combine (Mesh-Up) their own content with external advertising content and real-time information from SNS (Social Network Service) sites such as Facebook and Twitter (third-party content).

FIG. 1 illustrates the general composition of a web page and is the most basic information for diagramming the actual internal structure of a website. A web page 100 is composed of many components 110 for displaying content on the screen, such as HTML files, CSS files, image files, and JavaScript files. The component URL, which is the address of each component, corresponds to one DNS domain 120, and one DNS domain corresponds to one or more web servers 130. FIG. 2 shows the components 110 together with detailed examples of the media types 115 defined in Internet standard RFC 2046.
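The page, component, and domain relationship of FIG. 1 can be illustrated with a minimal Python sketch. It assumes that the DNS domain of a component is simply the hostname of its component URL. The function name `group_components_by_domain` and the URL paths are hypothetical and chosen only for illustration; only the domain names come from the example above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_components_by_domain(component_urls):
    """Map each DNS domain (URL hostname) to the component URLs it serves."""
    by_domain = defaultdict(list)
    for url in component_urls:
        host = urlsplit(url).hostname  # None when the URL has no network location
        if host is not None:
            by_domain[host].append(url)
    return dict(by_domain)


# Hypothetical component URLs for one page; the paths are made up.
page_components = [
    "http://www.naver.com/index.html",
    "http://static.naver.com/css/main.css",
    "http://static.naver.com/js/main.js",
    "http://nv1.ad.naver.com/banner.png",
]
domains = group_components_by_domain(page_components)
```

Resolving each domain to its one or more web server IPs is a separate step that this sketch leaves out.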

Conventionally, such internal structure diagrams have been created and maintained by hand by the operator. Manual work, however, cannot accurately and promptly reflect a service structure that changes constantly or server infrastructure that is added, removed, or modified at any time.

Moreover, as the use of external content and infrastructure increases, such as CDN services, cloud computing, and third-party content, it has become almost impossible for a website operator to understand and manage the internal structure of the entire website.

To solve the above problems, an object of the present invention is to provide a website structure extraction method that automatically extracts the internal structure of a website by connecting directly to the website through browsers (or modules that simulate a browser) on a plurality of terminals located at actual user endpoints, and by collecting and analyzing the data exchanged between the browser and the web servers.

Another object of the present invention is to provide an extraction method that presents the automatically extracted internal structure as a graph that the operator can readily understand.

Accordingly, an object of the present invention is to provide a system and method that extract and visualize the physical internal structure of a website so that administrators can manage and maintain it efficiently.

To achieve the above objects, the present invention comprises at least one measuring device (terminal) that measures the internal structure of a website. The measuring device empties the DNS cache of the terminal's operating system and the browser cache, connects to the website, and thereby acquires information on every web server that responds. It acquires detailed information exchanged between the browser and the web servers either through browser event information or by capturing the network packets between the browser and the web servers.

The measuring device comprises a control unit that controls acquisition of information on all web servers that respond when the website is accessed; a web browser that connects to the website; a browser driver that drives the web browser and acquires information sent to and received from the web servers through the web browser's event information; and a packet capture module that collects, through packet information, detailed information that the web browser does not provide as event information.

The measuring device is a PC or a portable terminal equipped with an operating system.

The web browser may be implemented as a web browser simulator.

The measuring device further includes a communication unit for communicating with an external system, and the system further includes a collection and analysis server that collects and visualizes the website internal structure information measured by the measuring device.

The collection and analysis server includes a communication unit that connects to the communication unit of the measuring device; a storage unit that stores the information transmitted through the communication unit; an analysis unit that extracts website internal structure information through additional analysis and statistics on the information stored in the storage unit; a visualization unit that visualizes, as a graph, the website internal structure that exists in data form; and a GUI that displays the visualized website internal structure.

The visualization unit visualizes the correlations among the website, its domains, and its servers (server IPs) using nodes that have a name and a color (Named Color Node) and lines that have a weight (Weighted Line).

The method includes a first step of initializing all DNS cache information in the operating system (OS) of the measuring device and all cache information of the web browser; a second step of navigating to the website through the web browser (or a browser simulator); a third step of hooking the internal event information of the web browser until the onload event, which marks the end of all navigation for the page, is received; and a fourth step of acquiring and storing, through the event hooking, detailed information on all components that make up the web page.

The method further includes a fifth step of additionally acquiring, through packet capture where necessary, information that cannot be obtained through the event hooking.

The component information includes the domain, component name, download time, component size, media type, and web server IP.

The method further includes a sixth step in which, upon receipt of the onload event, which indicates that all navigation of the web page has finished, the component information stored up to that point is transmitted (470) to the collection and analysis server and the measuring device waits until the next measurement cycle.

The present invention, configured and operating as described above, can automatically extract the internal structure of today's complex and constantly changing websites and can present it in a visualized form that can be understood at a glance. Website operators can thereby eliminate the inaccuracies of managing the internal structure by hand, can manage even external third-party elements that were difficult to track manually, and can automatically keep track of an internal structure that changes over time.

As a result, website operators can run their websites stably and effectively.

FIG. 1 is a block diagram of a general web page.
FIG. 2 shows general media types and their classification.
FIG. 3 is a schematic block diagram of the system for automatically extracting a website internal structure according to the present invention.
FIG. 4 is a detailed block diagram of the measuring device of the system for automatically extracting a website internal structure according to the present invention.
FIG. 5 is a flowchart of the steps by which the measuring device according to the present invention extracts website internal structure information.
FIG. 6 shows the data structure of the website information acquired and stored by the measuring device according to the present invention.
FIG. 7 shows the data structure used to represent the website internal structure according to the present invention.
FIG. 8 shows a specific method of visualizing the website internal structure according to the present invention.

Hereinafter, preferred embodiments of the method for automatically extracting a website internal structure according to the present invention are described in detail with reference to the accompanying drawings.

The method for automatically extracting a website internal structure according to the present invention uses at least one measuring device (terminal) that measures the internal structure of a website. The measuring device empties the DNS cache of the terminal's operating system and the browser cache, connects to the website, and thereby acquires information on every web server that responds. It acquires detailed information exchanged between the browser and the web servers either through browser event information or by capturing the network packets between the browser and the web servers.

The website internal structure extraction system according to the present invention aims to actively detect and visualize the internal structure of a website, that is, the collection of components (URLs), DNS domains, web servers (IP addresses), and so on that physically make up a single website.

FIG. 3 is a schematic block diagram of the system for automatically extracting a website internal structure according to the present invention. As shown in FIG. 3, the invention comprises, on the Internet 200, a plurality of measuring devices 220 that acquire site configuration information for a target website 210, and a collection and analysis server 230 that collects and analyzes the information acquired by the measuring devices to generate and visualize the final website internal structure. A plurality of measuring devices is used because some websites are configured internally so that different web servers respond depending on the user's location.
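The reason for using several measuring devices can be made concrete with a small Python sketch: each device may see different server IPs for the same domain, and the server combines them. This is only an illustration under assumptions. The function `merge_observations`, the device labels, and the domain are hypothetical, and the addresses come from the RFC 5737 documentation ranges.

```python
def merge_observations(observations):
    """Union the server IPs seen for each domain across all measuring devices.

    `observations` maps a device label to {domain: set of server IPs}.
    """
    merged = {}
    for per_domain in observations.values():
        for domain, ips in per_domain.items():
            merged.setdefault(domain, set()).update(ips)
    return merged


# Made-up observations from two devices in different locations.
observations = {
    "device-a": {"www.example.com": {"192.0.2.10"}},
    "device-b": {"www.example.com": {"198.51.100.20"}},
}
merged = merge_observations(observations)
```

With a single device, only one of the two addresses in this made-up example would ever be recorded.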

The website 210 includes not only wired websites accessed from wired terminals such as PCs but also mobile websites (or apps) accessed from wireless terminals such as smartphones. The measuring device 220 is implemented on wired terminals such as PCs or servers and on wireless terminals such as smartphones. The measuring device connects to the website periodically (e.g., every 10 minutes) to acquire the website's configuration information.

FIG. 4 shows the configuration of the measuring device and the collection and analysis server in more detail.

The measuring device 220 is a terminal that extracts the internal structure of a website. It can be an ordinary PC or a portable terminal (mobile phone, tablet, etc.), and it connects to the website and extracts information on the web servers that respond.

Specifically, the measuring device 220 comprises a control unit 300 responsible for controlling the overall measurement process; a communication unit 310 that receives from the collection and analysis server the information on the website to be measured and the measurement period, and transmits the information acquired by measurement to the collection and analysis server; a web browser 330 that actually connects to the website; and a browser driver 320 that drives the web browser and acquires information sent to and received from the web servers through the web browser's event information.

The web browser 330 may be an actual web browser or a web browser simulator. For more detailed information that the web browser does not provide as event information, the packets sent and received between the web browser and the web servers can be used through the packet capture module 340.

The collection and analysis server comprises a communication unit 350 that conveys control information to the measuring devices and collects the information they acquire; a storage unit 360 that stores the collected information; an analysis unit 370 that extracts website internal structure information through additional analysis and statistics on the information accumulated in the storage unit; a visualization unit 380 that visualizes, as a graph, the website internal structure information that exists in data form; and a GUI 390 that presents the visualized website internal structure to the actual user.

FIG. 5 shows in more detail the procedure by which the measuring device acquires website internal structure information. The measuring device obtains the target site information and the measurement period from the collection and analysis server (410) and begins the actual measurement. The first step of the measurement is initialization (420): all DNS cache information in the operating system (OS) of the measuring device and all cache information of the web browser are initialized so that the information on a complex website can be acquired without omission. After initialization, the device starts navigating to the website through the web browser (or browser simulator) (430) and hooks the internal event information of the web browser (450) until it receives the onload event, which marks the end of all navigation for the page. Through the event hooking, it acquires and stores detailed information on all components that make up the web page (460). Information that cannot be obtained through event hooking can, if necessary, be acquired additionally through packet capture. When the onload event is received, all navigation of the web page has finished, so the device transmits the component information stored up to that point to the collection and analysis server (470) and waits until the next measurement cycle (480). As shown in FIG. 6, the transmitted component information includes information from which the internal structure of the website can be determined (domain, component name, download time, component size, media type, web server IP, etc.). At the next measurement cycle, the same steps are repeated starting from initialization.
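The sequence of FIG. 5 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the patented implementation: the three callables passed to `run_measurement`, the event dictionary shape, and the field names of `ComponentRecord` are all hypothetical. How the caches are actually cleared and how browser events are hooked depends on the platform and browser, so those parts are left to the caller.

```python
from dataclasses import dataclass


@dataclass
class ComponentRecord:
    """One per-component row, using the fields listed for FIG. 6."""
    domain: str
    component_name: str
    download_time_s: float
    size_bytes: int
    media_type: str
    server_ip: str


def run_measurement(clear_caches, navigate, send):
    """Run one measurement pass: initialize, navigate, hook events, send.

    All three arguments are hypothetical callables supplied by the caller:
      clear_caches() empties the OS DNS cache and the browser cache (420),
      navigate() starts navigation and yields browser events (430, 450),
      send(records) forwards the records to the collection server (470).
    """
    clear_caches()
    records = []
    for event in navigate():
        if event["type"] == "onload":
            break  # the page has finished loading; stop hooking
        if event["type"] == "component":
            records.append(ComponentRecord(**event["data"]))  # step 460
    send(records)
    return records


# Demo with stand-in callables; a real device would drive a browser here.
def fake_navigate():
    yield {"type": "component", "data": {
        "domain": "static.example.com", "component_name": "main.css",
        "download_time_s": 0.12, "size_bytes": 2048,
        "media_type": "text/css", "server_ip": "192.0.2.10"}}
    yield {"type": "onload"}
    yield {"type": "component", "data": {}}  # never reached: after onload


sent = []
records = run_measurement(lambda: None, fake_navigate, sent.extend)
```

Waiting for the next measurement cycle (480) is omitted here; a caller could wrap `run_measurement` in a timed loop using the period received in step 410.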

FIG. 7 shows a more detailed example of the data structure in which the analysis unit 370 of the collection and analysis server stores the result of aggregating and analyzing the FIG. 6 data collected from the measuring devices, in a form from which the website internal structure can be extracted. The FIG. 6 data are aggregated periodically (e.g., every hour or every day) by server IP and stored in the form shown in FIG. 7. The main fields are the server IP, the domain corresponding to that server IP, the statistics time, the website name, the media type, ComponentCount (the number of occurrences of the server IP), and the download speed. The download speed (Download speed) is computed from the FIG. 6 data by the following formula. [Formula not reproduced in the source text]

ComponentCount in FIG. 7 is an important indicator of how many components a server serves, and the download speed is an important indicator of the server's service speed.
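The per-server aggregation can be sketched in Python as below. Because the download speed formula itself is not available in this text, the sketch assumes that speed is total component size divided by total download time for each server IP. That definition is an assumption for illustration, not the formula of the patent, and the function name and record keys are likewise hypothetical.

```python
from collections import defaultdict


def summarize_by_server(records):
    """Aggregate per-component rows into one summary per server IP.

    Each record is a dict with "server_ip", "size_bytes" and "download_time_s".
    """
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "seconds": 0.0})
    for r in records:
        t = totals[r["server_ip"]]
        t["count"] += 1
        t["bytes"] += r["size_bytes"]
        t["seconds"] += r["download_time_s"]
    summary = {}
    for ip, t in totals.items():
        # ASSUMPTION: speed = total bytes / total seconds (bytes per second).
        # The source formula is not reproduced, so this is a stand-in.
        speed = t["bytes"] / t["seconds"] if t["seconds"] > 0 else 0.0
        summary[ip] = {"component_count": t["count"], "download_speed": speed}
    return summary


# Made-up rows in the shape of the FIG. 6 data.
rows = [
    {"server_ip": "192.0.2.10", "size_bytes": 1000, "download_time_s": 0.5},
    {"server_ip": "192.0.2.10", "size_bytes": 3000, "download_time_s": 1.5},
    {"server_ip": "198.51.100.20", "size_bytes": 600, "download_time_s": 0.5},
]
summary = summarize_by_server(rows)
```

In this example the first server has a component count of 2 and a speed of 2000 bytes per second under the assumed definition.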

FIG. 8 illustrates a more detailed example of how the visualization unit 380 of the collection and analysis server renders the data stored in the form of FIG. 7 as a graph of the website internal structure, composed of nodes that have a name and a color (Named Color Node) and lines that have a weight (Weighted Line).

So that the website internal structure measured by the measuring devices can be monitored easily by eye, the extracted information is processed by the collection and analysis server and rendered in various ways.

The central first-level node 500 represents the website name, the second-level nodes 510 represent the domains that make up the website, and the third-level nodes 530 represent the server IPs corresponding to each domain. Second- and third-level nodes are each drawn in a color determined by the media types the node serves. In more detail, media types are grouped, as in FIG. 5, into the three base types most used on websites (Text, Application, Image), and, as shown at 520 in FIG. 8, each base type is assigned one of the three primary colors of light: Text is red, Application is green, and Image is blue. A node that has several base types is drawn in the mixture of the corresponding colors. For example, a domain or server IP that serves both Text (red) and Application (green) is drawn in yellow, the mixture of red and green. The size of a third-level node's circle is proportional to ComponentCount in FIG. 7; the larger the circle, the more components that server IP has served. The thickness of the line connecting a second-level node to a third-level node is proportional to the Download Speed of the third-level server, so a thicker line indicates a server that delivers at a higher speed.
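The coloring and sizing rules can be sketched in Python as follows. The sketch assumes 8-bit RGB values, treats the top-level part of a media type string as its base type, and ignores media types outside the three base types; how the patent groups other types is not stated here. The scale factors `k` are arbitrary, and all function names are hypothetical.

```python
BASE_COLORS = {
    "text": (255, 0, 0),         # red
    "application": (0, 255, 0),  # green
    "image": (0, 0, 255),        # blue
}


def base_type(media_type):
    """Return the base type of a media type string, or None if unmapped."""
    top = media_type.split("/", 1)[0].strip().lower()
    return top if top in BASE_COLORS else None


def node_color(media_types):
    """Additively mix the base colors of every media type a node serves."""
    r = g = b = 0
    for mt in media_types:
        kind = base_type(mt)
        if kind is not None:
            cr, cg, cb = BASE_COLORS[kind]
            r, g, b = max(r, cr), max(g, cg), max(b, cb)
    return (r, g, b)


def node_radius(component_count, k=2.0):
    """Circle radius proportional to ComponentCount (k is an arbitrary scale)."""
    return k * component_count


def line_width(download_speed, k=0.001):
    """Line thickness proportional to download speed (k is an arbitrary scale)."""
    return k * download_speed


# A server that serves HTML and JavaScript mixes red and green into yellow.
color = node_color(["text/html", "application/javascript"])
```

A node serving all three base types would come out white under this additive mixing, which follows from the primary-color scheme but is not stated explicitly in the text.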

The present invention configured in this way extracts web server information to obtain the internal structure of a website, so that the structure can be understood at a glance, and by presenting it in visualized form it makes existing management practices considerably more efficient.

The present invention has been described and illustrated above with reference to preferred embodiments that exemplify its principles, but it is not limited to the exact configuration and operation shown and described. Rather, those skilled in the art will understand that numerous changes and modifications can be made to the invention without departing from the spirit and scope of the appended claims. Accordingly, all such appropriate changes, modifications, and equivalents are to be regarded as falling within the scope of the present invention.

Claims

A system for automatically extracting a website internal structure, comprising at least one measuring device (terminal) that measures the internal structure of a website,
wherein the measuring device empties the DNS cache of the terminal's operating system and the browser cache, connects to the website, and thereby acquires information on every web server that responds, and acquires detailed information exchanged between the browser and the web servers either through browser event information or by capturing the network packets between the browser and the web servers.

The system for automatically extracting a website internal structure according to claim 1, wherein the measuring device comprises:
a control unit that controls acquisition of information on all web servers that respond when the website is accessed;
a web browser that connects to the website, and a browser driver that drives the web browser and acquires information sent to and received from the web servers through the web browser's event information; and
a packet capture module that collects, through packet information, detailed information that the web browser does not provide as event information.

The system for automatically extracting a website internal structure according to claim 1, wherein the measuring device is a PC or a portable terminal equipped with an operating system.

The system for automatically extracting a website internal structure according to claim 2, wherein the web browser can be implemented as a web browser simulator.

The system for automatically extracting a website internal structure according to claim 1, wherein the measuring device further includes a communication unit for communicating with an external system,
the system further comprising a collection and analysis server that collects and visualizes the website internal structure information measured by the measuring device.

The system for automatically extracting a website internal structure according to claim 4, wherein the collection and analysis server comprises:
a communication unit that connects to the communication unit of the measuring device;
a storage unit that stores the information transmitted through the communication unit;
an analysis unit that extracts website internal structure information through additional analysis and statistics on the information stored in the storage unit;
a visualization unit that visualizes, as a graph, the website internal structure that exists in data form; and
a GUI that displays the visualized website internal structure.

The system for automatically extracting a website internal structure according to claim 6, wherein the visualization unit visualizes the correlations among the website, domains, and servers (server IPs) using nodes that have a name and a color (Named Color Node) and lines that have a weight (Weighted Line).

A method for automatically extracting a website internal structure, comprising:
a first step of initializing all DNS cache information in the operating system (OS) of a measuring device and all cache information of a web browser;
a second step of navigating to a website through the web browser (or a browser simulator);
a third step of hooking the internal event information of the web browser until the onload event, which marks the end of all navigation for the page, is received; and
a fourth step of acquiring and storing, through the event hooking, detailed information on all components that make up the web page.

The method for automatically extracting a website internal structure according to claim 8, further comprising a fifth step of additionally acquiring, through packet capture where necessary, information that cannot be obtained through the event hooking.

The method for automatically extracting a website internal structure according to claim 8, wherein the component information comprises the domain, component name, download time, component size, media type, and web server IP.

The method for automatically extracting a website internal structure according to claim 8, further comprising a sixth step of, upon receipt of the onload event, which indicates that all navigation of the web page has finished, transmitting the component information stored up to that point to the collection and analysis server and waiting until the next measurement cycle.