JP5490082B2

JP5490082B2 - Internet site information analysis method and apparatus

Info

Publication number: JP5490082B2
Application number: JP2011277095A
Authority: JP
Inventors: 雅和堀; 恭平川添
Original assignee: Intec Inc Japan
Current assignee: Intec Inc Japan
Priority date: 2011-12-19
Filing date: 2011-12-19
Publication date: 2014-05-14
Anticipated expiration: 2027-11-02
Also published as: JP2012059295A

Description

The present invention relates to an Internet site information analysis method and apparatus that analyze information published on Internet Web sites and acquire and provide trend information and the like.

On the Internet, an enormous amount of information published by a wide variety of people is stored on Web sites, and the amount continues to grow. Here, a Web site means an information source such as a bulletin board (BBS: Bulletin Board System), a homepage, or a weblog (Web Log), commonly called a blog.

In recent years, there have been active efforts to obtain new trend information by analyzing articles accumulated on Web sites. Services using a variety of techniques are already offered, for example: reputation analysis, which analyzes how positive or negative the content of an opinion is; techniques that evaluate, in time series, trends in the appearance frequency and degree of attention (burst degree) of a given keyword; and search engine optimization, which causes a specific Web site to be displayed near the top of the results when a search engine is used.

For example, Patent Document 1 discloses a trend prediction apparatus that accesses Web sites to collect text information (reputation information) at predetermined intervals, quantifies the usage of the collected keywords, and monitors the quantified usage. It thereby selects, from the extracted keywords, those that will become trend keywords, predicts in real time the trend keywords likely to be used in search engines in the near future, and provides information related to those trend keywords. Compared with obtaining trend keywords by analyzing the actual usage record of search keywords entered into a search engine, this approach offers better real-time performance.

Patent Document 1: JP 2006-227965 A

However, the trend prediction apparatus of Patent Document 1 analyzes, for each keyword under analysis, the positivity or negativity of the text information containing that keyword, the keyword's frequency of occurrence, and the like, and judges whether to select that keyword itself as a trend keyword. It cannot capture in concrete terms the changes in contributors' interests that spread out around the keyword.

A correlation analysis technique has also been proposed as a method of collecting information surrounding a specific keyword, and services such as associative search, which obtains other keywords associated with a given keyword, are available. As with Patent Document 1, however, these cannot capture the actual changes in contributors' interests that spread out around the keyword.

There is also a demand to know, in real time, which Web sites are active and yield useful information, but no method or service that meets this demand has been proposed, and none has yet been put into practical use.

The present invention was made in view of the above background art. It provides an analysis apparatus and an analysis method that, based on the large amount of site information published on the Internet, make it possible to learn effectively and accurately the meaning, background, and tendencies of that information. In particular, its object is to provide an Internet site information analysis method and apparatus that perform Web site activity analysis, which extracts in real time the active Web sites from which useful information can be obtained, and co-occurrence information analysis, which captures the actual changes in contributors' interests spreading out around a given keyword, so that new trends can be identified accurately from the articles accumulated on Web sites.

The present invention is an Internet site information analysis method in which a computer system automatically accesses Web sites on the Internet, collects their text information, and analyzes it, wherein the computer system performs processing comprising: an information collection step of collecting the text information and the update date information of each Web site; a word division step of dividing the text information into words; a related information post count calculation step of extracting, from the resulting word group, words identical or similar to a predetermined keyword and calculating the number of pieces of text information containing those words as the related information post count; a related information posting rate calculation step of calculating, for each Web site, the ratio of the related information post count to the number of pieces of text information collected from that Web site as the related information posting rate; an update frequency calculation step of calculating the update frequency of each Web site based on the reference date of the analysis and the update date information; and a site activity analysis output step of outputting a relative comparison of the related information posting rate and the update frequency of each Web site.

In addition to the above steps, the method may include: an information collection step of collecting the text information of the Web sites each time a predetermined period elapses; a word division step of dividing the text information into words; a survey target information extraction step of extracting, from the word group, text information containing a word identical or similar to the keyword under investigation; a co-occurrence keyword extraction step of extracting co-occurrence keywords, which are keywords other than that keyword among the words constituting the survey target information; a co-occurrence keyword score calculation step of calculating a score for each co-occurrence keyword based on the frequency with which it appears in the survey target information; a sorting step of sorting the co-occurrence keywords in order of their scores to create a co-occurrence information list; and a co-occurrence information analysis output step of outputting, in time series, the co-occurrence information lists obtained for each predetermined period.

The present invention is also an Internet site information analysis apparatus that is constituted by a computer system and that accesses Web sites on the Internet through the computer system, collects their text information, and analyzes it, the apparatus comprising: information collection means for collecting the text information of the Web sites; word division means for dividing the text information into words; related information post count calculation means for extracting, from the resulting word group, words identical or similar to a predetermined keyword and calculating the number of pieces of text information containing those words as the related information post count; related information posting rate calculation means for calculating, for each Web site, the ratio of the related information post count to the number of pieces of text information collected from that Web site as the related information posting rate; update date information collection means for collecting the update date of each Web site; update frequency calculation means for calculating the update frequency of each Web site based on the reference date of the analysis and the update date information; and site activity analysis output means for outputting a relative comparison of the related information posting rate and the update frequency of each Web site.

The activity analysis output means plots the related information posting rate and the update frequency on a two-dimensional graph and outputs the graph.

In addition to the above configuration, the apparatus may include: information collection means for collecting the text information of the Web sites each time a predetermined period elapses; word division means for dividing the text information into words; survey target information extraction means for extracting, from the word group, text information containing a word identical or similar to the keyword under investigation; co-occurrence keyword extraction means for extracting co-occurrence keywords, which are the words constituting the survey target information other than words identical or similar to the keyword under investigation; co-occurrence keyword score calculation means for calculating a score for each co-occurrence keyword based on the frequency with which it appears in the survey target information; sorting means for sorting the co-occurrence keywords in order of their scores to create a co-occurrence information list; and co-occurrence information analysis output means for outputting, in time series, the co-occurrence information lists obtained for each predetermined period.

According to the present invention, the enormous amount of information published and accumulated on Web sites can be analyzed, and accurate trend information can be obtained easily.

In particular, according to the inventions of claims 1 to 3, site activity analysis, which calculates for each Web site the posting rate of information related to a predetermined keyword and the update frequency, makes it easy to identify Web sites that are actively disseminating information. Collecting information only from those Web sites, which merit attention as information sources, yields useful trend information efficiently.

Furthermore, co-occurrence information analysis, which analyzes in time series the changes in other keywords that co-occur with a predetermined keyword, also makes it possible to capture the actual changes in contributors' interests spreading out around that keyword.

FIG. 1 is a diagram showing the configuration of an entire network system in which an embodiment of the Internet site information analysis apparatus of the present invention is deployed.
FIG. 2 is a flowchart of site activity analysis, which is a first embodiment of the Internet site information analysis method of the present invention.
FIG. 3 is a diagram showing an example of text information and update frequency calculation in this embodiment.
FIG. 4 is a flowchart of the step that calculates the number of related article posts in this embodiment.
FIG. 5 is a diagram showing an example of a list of related article posting rate calculation results in this embodiment.
FIG. 6 is a graph showing an example of the output format of site activity analysis results in this embodiment.
FIG. 7 is a flowchart of co-occurrence information analysis, which is another embodiment of the Internet site information analysis method.
FIG. 8 is a diagram showing an example of extracting survey target information from text information in the other embodiment.
FIG. 9 is a flowchart of the step that calculates the score of each co-occurrence keyword in the other embodiment.
FIG. 10 is a diagram showing an example of the output format of co-occurrence information analysis results in the other embodiment.

An embodiment of a network system in which the Internet site information analysis apparatus 10 of the present invention is deployed will now be described with reference to FIG. 1. In this network system, the following are arranged on the Internet: Web sites 12, on which many people publish information such as impressions and opinions; a crawler 14, which periodically collects text information in RSS (Rich Site Summary) format from designated Web sites; an article database 16, which stores the information collected by the crawler 14 in database form; an analyzer 18, which has programs for performing the analyses described later; an evaluation expression dictionary group database 20, in which evaluation expressions corresponding to the evaluation axes to be analyzed and their evaluation scores are set; an analysis result database 22, which stores the analysis results; and a portal server 24, to which the personal computer of a user 26 is connected and which extracts the desired analysis results from the analysis result database 22 and delivers them to the user 26.

An Internet site information analysis method that performs site activity analysis according to one embodiment of the present invention will now be described with reference to FIGS. 2 to 6. First, an overview is given using the flow shown in FIG. 2. In step S210, text information and update date information are collected from each Web site. In the Internet site information analysis apparatus 10, the crawler 14, the article database 16, and the site activity analysis analyzer 18b serve as the information collection means and the update date information acquisition means. Next, in step S220, the collected text information is broken down into words (parts of speech). In the Internet site information analysis apparatus 10, the site activity analysis analyzer 18b serves as the word division means. Then, in step S230, the number of posts of related information is calculated, where related information is text information whose divided word group contains a word identical or similar to a predetermined keyword. In the Internet site information analysis apparatus 10, the site activity analysis analyzer 18b serves as the related information post count calculation means. Further, in step S240, the related information posting rate, which is the ratio of the related information post count to the number of pieces of text information collected from that Web site, is calculated and the result is stored. In the Internet site information analysis apparatus 10, the site activity analysis analyzer 18b and the site activity analysis result database 22b serve as the related information posting rate calculation means.

Step S250 calculates the update frequency of each Web site based on the reference date of the analysis and the update date information, and stores the result. In the Internet site information analysis apparatus 10, the site activity analysis analyzer 18b and the site activity analysis result database 22b serve as the update frequency calculation means.

Step S260, in response to a request from the user 26, plots each Web site on a graph according to its two calculated values, the related information posting rate and the update frequency, and outputs the graph. In the Internet site information analysis apparatus 10, the site activity analysis result database 22b and the site activity analysis display framework 24b of the portal server 24 serve as the site analysis output means.

Next, each step of the site activity analysis is described in detail. As shown in FIG. 3, each Web site holds multiple pieces of text information and information on the date on which the Web site was last updated. In step S210, for Web site 1 for example, the text information a1 and a2 and the update date information "last updated: September 11" are collected. In step S220, this text information is then broken down into words (parts of speech) such as nouns, adjectives, and verbs.

The processing of step S230 is described in more detail with reference to FIG. 4. In step S231, when the predetermined keyword to be investigated is given, a group of words similar to that keyword is extracted using, for example, a thesaurus, which is a kind of synonym dictionary. In step S232, for each Web site, the text information that contains the keyword or any of the similar words, that is, the related information, is extracted. In step S233, the extracted pieces of related text information are counted and the count is accumulated. In step S234, a decision step, it is determined whether the calculation has been completed for all Web sites. If NO, steps S232 to S233 are repeated for the next Web site. When the result becomes YES, step S230 ends and processing moves on to step S240.

Thus, in step S230, when the keyword "car" is given, for example, step S231 extracts colloquial names, abbreviations, formal names, and other words such as "kei car", "hybrid car", and "automobile" as similar words. The content of the related information can therefore be analyzed, and the number of posts calculated, without omission.
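The counting loop of steps S231 to S234 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `THESAURUS` table, the function names, and the sample posts are hypothetical, and each post is assumed to have already been split into words in step S220.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the thesaurus consulted in step S231.
THESAURUS: Dict[str, Set[str]] = {
    "car": {"kei car", "hybrid car", "automobile"},
}


def expand_keyword(keyword: str) -> Set[str]:
    """Step S231: the keyword itself plus its similar words."""
    return {keyword} | THESAURUS.get(keyword, set())


def count_related_posts(
    sites: Dict[str, List[List[str]]], keyword: str
) -> Dict[str, int]:
    """Steps S232 to S234: count, per Web site, the posts that contain
    the keyword or any similar word. Each post is a list of words."""
    terms = expand_keyword(keyword)
    counts: Dict[str, int] = {}
    for site, posts in sites.items():  # repeat until every site is processed
        counts[site] = sum(1 for words in posts if terms & set(words))
    return counts


# Hypothetical sample data: two sites with pre-tokenized posts.
sample_sites = {
    "site1": [["hybrid car", "fuel", "economy"], ["weather", "rain"]],
    "site2": [["travel", "hotel"]],
}
related_counts = count_related_posts(sample_sites, "car")
```

Matching on an expanded term set rather than the bare keyword is what lets a post that mentions only "hybrid car" count as related to "car".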

In step S240, the posting rate of related information is calculated for each Web site, and the result is stored in the site activity analysis result database 22b. The related information posting rate is obtained by division, with the total number of pieces of text information collected from the Web site as the denominator and the number of pieces of related information for the predetermined keyword among them as the numerator. For example, Web site 1 shown in FIG. 3 has two pieces of text information, a1 and a2, of which one, a1, which contains the word "hybrid car", is related information for the keyword "car". The related information posting rate for the keyword "car" on Web site 1 is therefore calculated as 0.5. In this way, the related information posting rate is calculated for each keyword and each Web site, and the results are stored systematically in the site activity analysis result database 22b, as in the list of FIG. 5.
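The division in step S240 is small enough to sketch directly. The function name is hypothetical, and returning 0.0 for a site with no collected posts is an assumption, since the source does not say how that case is handled.

```python
def related_post_rate(total_posts: int, related_posts: int) -> float:
    """Related information posting rate: related posts / all collected posts."""
    if total_posts == 0:
        # Assumption: a site with no collected posts gets a rate of 0.0.
        return 0.0
    return related_posts / total_posts


# Web site 1 in the example: 2 posts collected, 1 related to "car".
site1_rate = related_post_rate(total_posts=2, related_posts=1)
```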

In step S250, the update frequency of each Web site is calculated, and the result is stored in the site activity analysis result database 22b. In the calculation example of FIG. 3, 1 is added to the difference between the reference date of the analysis and the last update date of the Web site, and the reciprocal of the result is defined as the update frequency. Under this definition, Web site 1 has an update frequency of 1.0, because the reference date and the last update date are both September 11 (the same day). For Web site 2, the same calculation gives an update frequency of 0.011. In other words, the frequently updated Web site 1 has a high update frequency value, while Web site 2, which has been left without updates for a long time, has a low one.
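The update frequency defined above is 1 / (d + 1), where d is the difference between the reference date and the last update date. A minimal sketch follows, assuming d is measured in days, which the source does not state explicitly. The year 2011 and the June 13 date are hypothetical values chosen for illustration; the source gives only "September 11" for Web site 1 and the value 0.011 for Web site 2.

```python
from datetime import date


def update_frequency(reference_date: date, last_update: date) -> float:
    """Reciprocal of (difference between the two dates + 1).
    Assumption: the difference is counted in days."""
    days = (reference_date - last_update).days
    if days < 0:
        raise ValueError("last_update must not be after reference_date")
    return 1.0 / (days + 1)


reference = date(2011, 9, 11)  # hypothetical year

# Updated on the reference date itself: 1 / (0 + 1) = 1.0
freq_same_day = update_frequency(reference, date(2011, 9, 11))

# Hypothetical site last updated 90 days earlier: 1 / 91, about 0.011
freq_stale = update_frequency(reference, date(2011, 6, 13))
```

Adding 1 before taking the reciprocal avoids division by zero for a site updated on the reference date and caps the value at 1.0.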

In step S260, the relative position of each Web site is shown on a graph whose horizontal axis is the related information posting rate assigned in step S240 for the predetermined keyword and whose vertical axis is the update frequency assigned in step S250. The graph is provided as site activity analysis information in PULL form, in which the user 26 retrieves the information as needed. For example, as shown in FIG. 6, the Web sites in the upper right of the graph carry a large amount of information about "car" and are updated frequently, so the graph indicates that accessing Web sites that are actively disseminating information, such as Web sites 1, 2, and 5, is likely to yield useful information about "car". Conversely, the Web sites in the lower left of the graph carry little information about "car" and are updated infrequently, so the graph indicates that accessing a Web site with low activity, such as Web site 8, is unlikely to yield useful information about "car".
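The reading of the graph described above (upper right is active, lower left is inactive) can be approximated in code. Note that the source only plots the sites for relative comparison and defines no thresholds. The cut-off values of 0.5 and the labels below are purely illustrative assumptions.

```python
def classify_site(
    posting_rate: float,
    update_freq: float,
    rate_threshold: float = 0.5,  # hypothetical cut-off, not from the source
    freq_threshold: float = 0.5,  # hypothetical cut-off, not from the source
) -> str:
    """Label a site by the region of the (posting rate, update frequency)
    plane it falls in."""
    high_rate = posting_rate >= rate_threshold
    high_freq = update_freq >= freq_threshold
    if high_rate and high_freq:
        return "active"    # upper right: much related content, fresh
    if not high_rate and not high_freq:
        return "inactive"  # lower left: little related content, stale
    return "mixed"         # the other two regions


labels = {
    "fresh_and_relevant": classify_site(0.8, 1.0),
    "stale_and_irrelevant": classify_site(0.1, 0.011),
    "relevant_but_stale": classify_site(0.8, 0.011),
}
```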

With the Internet site information analysis method of this embodiment described above, which performs site activity analysis (steps S210 to S260), Web sites that are actively disseminating information can be identified easily. Collecting information only from those Web sites, which merit attention as information sources, yields useful trend information efficiently.

Next, an embodiment of an Internet site information analysis method that performs co-occurrence information analysis is described with reference to FIGS. 7 to 10. First, an overview is given using the flow shown in FIG. 7. In step S310, text information is collected from each Web site each time a predetermined period elapses. In the Internet site information analysis apparatus 10, the crawler 14, the article database 16, and the co-occurrence information analysis analyzer 18c serve as the information collection means. Next, in step S320, the collected text information is broken down into words (parts of speech). In the Internet site information analysis apparatus 10, the co-occurrence information analysis analyzer 18c serves as the word division means. In step S330, the survey target information is extracted, that is, the text information whose divided word group contains a word identical or similar to a predetermined keyword. In the Internet site information analysis apparatus 10, the co-occurrence information analysis analyzer 18c serves as the survey target information extraction means. Then, in step S340, the co-occurrence keywords are extracted, that is, the words constituting the extracted survey target information other than words identical or similar to the predetermined keyword. In the Internet site information analysis apparatus 10, the co-occurrence information analysis analyzer 18c serves as the co-occurrence keyword extraction means.

In step S350, a score is calculated for each co-occurrence keyword based on the frequency with which the extracted co-occurrence keyword appears in the survey target information. In the Internet site information analysis apparatus 10, the co-occurrence information analysis analyzer 18c serves as the co-occurrence keyword score calculation means. Next, in step S360, a co-occurrence information list, in which the co-occurrence keywords are sorted in order of their scores, is created and stored. In the Internet site information analysis apparatus 10, the co-occurrence information analysis analyzer 18c and the co-occurrence information analysis result database 22c serve as the sorting means. Then, in step S370, in response to a request from the user 26, the co-occurrence information lists created each time the predetermined period elapses are output in time series. In the Internet site information analysis apparatus 10, the co-occurrence information analysis result database 22c and the co-occurrence information analysis display framework 24c of the portal server 24 serve as the co-occurrence information analysis output means.
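Steps S360 and S370 amount to ranking the scored keywords and arranging the per-period rankings in chronological order. The sketch below assumes that the scores have already been computed in step S350 (the scoring formula is not reproduced here), that a higher score ranks first, and that periods are labeled with sortable strings. All names and sample scores are hypothetical.

```python
from typing import Dict, List, Tuple

Ranking = List[Tuple[str, float]]


def build_cooccurrence_list(scores: Dict[str, float]) -> Ranking:
    """Step S360: sort co-occurrence keywords by score.
    Assumption: descending order, highest score first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def as_time_series(scores_by_period: Dict[str, Dict[str, float]]) -> List[Tuple[str, Ranking]]:
    """Step S370: one ranked list per collection period, in period order."""
    return [
        (period, build_cooccurrence_list(scores_by_period[period]))
        for period in sorted(scores_by_period)
    ]


# Hypothetical scores for two collection periods.
series = as_time_series({
    "2011-10": {"Company A": 0.2, "performance": 0.5},
    "2011-09": {"Company A": 0.4, "performance": 0.9, "Company B": 0.1},
})
```

Comparing the rankings of consecutive periods is what exposes a shift in the words that accompany the keyword.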

Next, each step of the co-occurrence information analysis is described in detail. As shown in FIG. 8, each Web site holds multiple pieces of text information. In step S310, for Web site 1 for example, the text information a1 and a2 is collected. In step S320, this text information is then broken down into words (parts of speech) such as nouns, adjectives, and verbs.

In step S330, when the predetermined keyword to be investigated is given, a group of words similar to that keyword is extracted using, for example, a thesaurus, which is a kind of synonym dictionary, and the text information that contains the keyword or any of the similar words, that is, the survey target information, is extracted. In the example of FIG. 8, when the keyword "digital camera" is given, colloquial names, abbreviations, formal names, and other words such as "digicam", "digital still camera", and "digital video camera" are extracted as similar words. The text information a1, a2, and b2, which contains "digital camera" or one of its similar words, is then extracted as the survey target information. In this way, the survey target information that ought to be investigated can be extracted without omission.

In step S340, the words other than those that are the same as or similar to the predetermined keyword, that is, the co-occurrence keywords, are extracted from the words making up the extracted survey target information. For text information a1, for example, "Company A", "Company B", and "performance" are the co-occurrence keywords. If variant notations such as "(株)A", "株式会社A", "A", and "A社" (all ways of writing the name of Company A) are extracted as separate co-occurrence keywords, and there is no problem in treating them all as synonymous with "Company A", they may be merged into a single co-occurrence keyword before proceeding to the next step.
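A minimal sketch of step S340, including the merging of variant company names, might look like this. The `ALIASES` table, the function name `cooccurrence_keywords`, and the word list for a1 are assumptions made for illustration; in practice the alias table would have to be prepared separately.

```python
# Hypothetical alias table: variant notations that all denote Company A.
ALIASES = {
    "(株)A": "Company A",
    "株式会社A": "Company A",
    "A": "Company A",
    "A社": "Company A",
}


def cooccurrence_keywords(words, keyword_terms, aliases):
    """Drop the keyword and its similar words, then merge aliases into one name."""
    return {aliases.get(word, word) for word in words if word not in keyword_terms}


# The keyword "digital camera" and its similar words (excluded from the result).
keyword_terms = {"digital camera", "dejikame", "digital still camera", "digital video camera"}
# Hypothetical word list for text information a1, with two notations of Company A.
a1_words = ["digital camera", "株式会社A", "A社", "Company B", "performance"]

print(sorted(cooccurrence_keywords(a1_words, keyword_terms, ALIASES)))
# ['Company A', 'Company B', 'performance']
```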

The processing of step S350 will now be described in more detail with reference to the flow of FIG. 9. In step S351, the total number of pieces of text information collected is counted. In the example of FIG. 8, if information is collected only from Web sites 1 and 2, the total number of pieces of text information is 5. In step S352, the number of pieces of text information that qualify as survey target information is counted. In the example of FIG. 8, of the 5 pieces of text information, 3 are survey target information for the keyword "digital camera". In step S353, the number of pieces of text information containing the same word as a co-occurrence keyword extracted in step S340 is counted for each co-occurrence keyword. In the example of FIG. 8, of the 5 pieces of text information, 3 contain the co-occurrence keyword "Company A". In step S354, the number of pieces of survey target information containing the same word as a co-occurrence keyword extracted in step S340 is counted for each co-occurrence keyword. In the example of FIG. 8, of the 3 pieces of survey target information for "digital camera", 2 contain the co-occurrence keyword "Company A".
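The four counts of steps S351 to S354 can be sketched as below. The word sets in `DOCUMENTS` are hypothetical, chosen only so that the counts match the values stated for FIG. 8 (5, 3, 3, and 2); the function name `count_for_score` is also an assumption.

```python
# Hypothetical word sets for the five texts of Web sites 1 and 2.
DOCUMENTS = {
    "a1": {"digital camera", "Company A", "Company B", "performance"},
    "a2": {"dejikame", "Company A", "price"},
    "b1": {"LCD television", "Company A"},
    "b2": {"digital video camera", "Company C"},
    "b3": {"LCD television", "Company B"},
}
# The keyword "digital camera" together with its similar words.
KEYWORD_TERMS = {"digital camera", "dejikame", "digital still camera", "digital video camera"}


def count_for_score(documents, keyword_terms, co_keyword):
    """Return the counts of steps S351, S352, S353, and S354, in that order."""
    all_texts = list(documents.values())
    targets = [words for words in all_texts if words & keyword_terms]
    n_all = len(all_texts)                                        # S351
    n_target = len(targets)                                       # S352
    n_co_all = sum(co_keyword in words for words in all_texts)    # S353
    n_co_target = sum(co_keyword in words for words in targets)   # S354
    return n_all, n_target, n_co_all, n_co_target


print(count_for_score(DOCUMENTS, KEYWORD_TERMS, "Company A"))  # (5, 3, 3, 2)
```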

In step S355, the score of each co-occurrence keyword is calculated. The score is preferably defined as follows: the product of the count of step S354 and the count of step S351 is divided by the product of the count of step S352 and the count of step S353, and the base-2 logarithm of the quotient is taken; that is, score = log2((S354 × S351) / (S352 × S353)). In the example of FIG. 8, the score of the co-occurrence keyword "Company A" is calculated from the count of step S351 (5), the count of step S352 (3), the count of step S353 (3), and the count of step S354 (2), giving log2((2 × 5) / (3 × 3)) = log2(10/9) ≈ 0.152. Then, in decision step S356, it is determined whether the calculation has been completed for all co-occurrence keywords. If NO, steps S353 to S355 are repeated for the next co-occurrence keyword; when the result becomes YES, step S350 ends.
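The score of step S355 can be checked numerically with a short sketch. The function name `cooccurrence_score` and its parameter names are assumptions introduced for illustration. As a side observation, this expression has the same form as pointwise mutual information computed from document counts.

```python
import math


def cooccurrence_score(n_all, n_target, n_co_all, n_co_target):
    """Base-2 log of (S354 x S351) / (S352 x S353).

    n_all       -- count of step S351 (all collected texts)
    n_target    -- count of step S352 (survey target texts)
    n_co_all    -- count of step S353 (texts containing the co-occurrence keyword)
    n_co_target -- count of step S354 (survey target texts containing it)
    """
    return math.log2((n_co_target * n_all) / (n_target * n_co_all))


# Counts for the co-occurrence keyword "Company A" in the FIG. 8 example.
print(round(cooccurrence_score(5, 3, 3, 2), 3))  # 0.152
```

Because co-occurrence keywords are extracted from the survey target information itself, the count of step S354 is at least 1, so the logarithm is always defined. A score above 0 indicates that the keyword appears in the survey target information more often than its overall frequency would suggest.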

Note that step S350 of this embodiment is characterized in that elements of the chain of correlations are also incorporated into the score calculation. In principle, if there is a correlation between the keyword "digital camera" and the co-occurrence keyword "Company A", a correlation should also be considered to exist between "Company A" and the group of keywords other than "digital camera" that co-occur with "Company A". However, following the chain of correlations makes the amount of computation enormous, so such processing has generally not been performed. In this embodiment, by including the count of step S351 and the count of step S353 in the formula, not only the strength of the correlation between "digital camera" and "Company A" but also, for example, the strength of the correlation between "LCD television" and "Company A" is reflected, in relative terms, in each score.

In step S360, a co-occurrence information list, in which the co-occurrence keywords are sorted in order of their scores, is created and stored in the co-occurrence information analysis result database 22c. In step S370, in response to a request from the user 26, the co-occurrence information lists created each time a predetermined period elapses are presented in time series and provided as co-occurrence information analysis output on a pull basis (the user 26 retrieves the information as needed). FIG. 10 shows an example in which scores were calculated for all Web sites, including Web sites 1 and 2. Looking at the keyword "digital camera", for example, the co-occurrence keyword "Product W" had a low score and was unranked in July 2007, but had risen to 2nd place by August 2007. This shows that, in the field of "digital camera", "Product W" is becoming the center of posters' attention. Meanwhile, the co-occurrence keyword "Company B" ranked 2nd in July 2007 but had dropped to 5th by August 2007. This shows that posters are paying less attention to "Company B".
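The sorting of step S360 and the period-to-period comparison of step S370 can be sketched as follows. The score values for the two months are invented for illustration and are not taken from FIG. 10; the function names `rank_by_score` and `rank_changes` are assumptions, as is the choice to sort in descending order of score.

```python
def rank_by_score(scores):
    """Map each co-occurrence keyword to its rank (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {keyword: position for position, keyword in enumerate(ordered, start=1)}


def rank_changes(previous_scores, current_scores):
    """Pair each current keyword's previous rank (None if unranked) with its current rank."""
    previous = rank_by_score(previous_scores)
    current = rank_by_score(current_scores)
    return {keyword: (previous.get(keyword), rank) for keyword, rank in current.items()}


# Hypothetical scores for two consecutive periods.
july = {"Company A": 1.2, "Company B": 0.9, "Company C": 0.5}
august = {"Company A": 1.1, "Product W": 1.0, "Company C": 0.6, "Company B": 0.3}

changes = rank_changes(july, august)
print(changes["Product W"])  # (None, 2)
print(changes["Company B"])  # (2, 4)
```

Storing one ranked list per period and comparing them on request matches the pull-style output described above: nothing is recomputed until the user asks for the time series.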

According to the co-occurrence information analysis described above (steps S310 to S370), analyzing in time series the changes in the other keywords that co-occur with a predetermined keyword makes it possible to grasp, as they actually are, the changes in posters' interests as they spread around that keyword.

The present invention is not limited to the above embodiment. The Internet site information analysis apparatus 10 may be configured to include, in addition to the analysis apparatus or analysis method having the means or steps for the site activity analysis described above, the means or steps for the co-occurrence information analysis described above.

The display frame of the analysis result output in step S260 only needs to make the relative relationship among the multiple characteristic values of each data item visually recognizable, and is not limited to the graph images exemplified in this embodiment. It should be designed for greater visual impact, for example by using logarithmic scales on the graph axes or by adding a legend and overlaying multiple analysis results.

Likewise, the formula for the related posting rate defined in step S240, the formula for the update frequency defined in step S250, and the formula for the co-occurrence keyword score defined in step S350 are not limited to those of the above embodiment, provided they are defined in view of the objects under investigation and the particular circumstances of each field. For the co-occurrence keyword score defined in step S350, for example, a different formula may be defined so that the aspect to be analyzed in detail shows up clearly as a characteristic value, such as by changing the base of the logarithm or by substituting the square of a particular count.
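One way such variants could be parameterized is sketched below. This is purely illustrative: the text does not say which count would be squared, so applying the exponent to the count of step S354 is an assumption, as are the function name `variant_score` and the parameters `base` and `co_target_power`.

```python
import math


def variant_score(n_all, n_target, n_co_all, n_co_target, base=2.0, co_target_power=1):
    """Score with an adjustable log base and an adjustable power on the S354 count.

    With the defaults this reduces to log2((S354 x S351) / (S352 x S353)).
    """
    ratio = (n_co_target ** co_target_power * n_all) / (n_target * n_co_all)
    return math.log(ratio, base)


# Counts for "Company A" in the FIG. 8 example: S351=5, S352=3, S353=3, S354=2.
default = variant_score(5, 3, 3, 2)                        # same as the embodiment
base10 = variant_score(5, 3, 3, 2, base=10.0)              # logarithm base changed
squared = variant_score(5, 3, 3, 2, co_target_power=2)     # S354 count squared
print(round(default, 3), round(base10, 3), round(squared, 3))
```

Changing the base only rescales every score by a constant factor and leaves the ranking unchanged, whereas changing an exponent can reorder the co-occurrence keywords.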

The series of processing operations of the above embodiment may be built as a program, installed on a server computer used as the Internet site information analysis apparatus 10, and executed by a control means such as a CPU; the program may also be distributed over a network. Alternatively, the program may be stored in a hard disk device connected to any of the various computers used as the Internet site information analysis apparatus 10, or in a portable storage medium such as a flexible disk or CD-ROM, and then installed on the computer and executed.

DESCRIPTION OF SYMBOLS
10 Internet site information analysis apparatus
12 Web site
14 Crawler
16 Article database
18 Analyzer
20 Evaluation expression dictionary group database
22 Analysis result database
24 Portal server
26 User

Claims

1. An Internet site information analysis method in which a computer system automatically accesses Web sites existing on the Internet, collects their text information, and analyzes it, the method being characterized in that the computer system performs processing comprising:
an information collecting step of collecting the text information and update date information of each Web site;
a word dividing step of dividing the text information into words;
a related information posting count calculating step of extracting, from the resulting group of words, words that are the same as or similar to a predetermined keyword, and calculating the number of pieces of text information containing such a word as a related information posting count;
a related information posting rate calculating step of calculating, for each Web site, the ratio of the related information posting count to the number of pieces of text information collected from that Web site, as a related information posting rate;
an update frequency calculating step of calculating an update frequency of each Web site based on a reference date for the analysis and the update date information; and
a site activity analysis output step of outputting the related information posting rate and the update frequency of each Web site in relative comparison.

2. An Internet site information analysis apparatus that is constituted by a computer system and that, by means of the computer system, accesses Web sites existing on the Internet, collects their text information, and analyzes it, the apparatus being characterized by comprising:
information collecting means for collecting the text information of the Web sites;
word dividing means for dividing the text information into words;
related information posting count calculating means for extracting, from the resulting group of words, words that are the same as or similar to a predetermined keyword, and calculating the number of pieces of text information containing such a word as a related information posting count;
related information posting rate calculating means for calculating, for each Web site, the ratio of the related information posting count to the number of pieces of text information collected from that Web site, as a related information posting rate;
update date information collecting means for collecting the update dates of each Web site;
update frequency calculating means for calculating an update frequency of each Web site based on a reference date for the analysis and the update date information; and
site activity analysis output means for outputting the related information posting rate and the update frequency of each Web site in relative comparison.

3. The Internet site information analysis apparatus according to claim 2, wherein the site activity analysis output means outputs the related information posting rate and the update frequency represented in a two-dimensional graph.