CN103838797A - Method for optimizing mobile search engine - Google Patents

Info

Publication number
CN103838797A
Authority
CN
China
Prior art keywords
wml
search engine
stu
mobile
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210491498.7A
Other languages
Chinese (zh)
Inventor
李勇
郑世超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DALIAN LINGDONG TECHNOLOGY DEVELOPMENT Co Ltd
Original Assignee
DALIAN LINGDONG TECHNOLOGY DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DALIAN LINGDONG TECHNOLOGY DEVELOPMENT Co Ltd
Priority to CN201210491498.7A
Publication of CN103838797A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for optimizing a mobile search engine. The method comprises the following steps: designing a mobile search engine framework, building a URL list, building a translator, and designing a WAP interface. In view of the current state of mobile search engines, a mobile module is added to an existing Internet search engine framework, which provides a way to build a mobile search engine from HTML resources. The HTML web pages captured by web spiders are processed centrally: topic information is extracted from them, converted into WML pages that a mobile phone can recognize, and stored in a WML snapshot library. When a user clicks a result entry to view a specific web page, the system does not link directly to that page on the Internet but to the corresponding WML snapshot, which meets the user's mobile search needs. In practice, this approach has been used to build a mobile search engine for the life-services field that covers catering, entertainment, and yellow-page information for nearly 40 cities in China.

Description

A method for optimizing a mobile search engine
Technical field
The present invention relates to mobile Internet technology, and in particular to a method for optimizing a mobile search engine.
Background
A search engine is a system that uses computer programs to gather information from the Internet according to a certain strategy, organizes and processes that information, provides a retrieval service, and displays the information relevant to a user's query. With advances in wireless communication and the spread of mobile phones, mobile Internet access is becoming the trend. To let users look up everyday information (clothing, food, lodging, transport) anytime and anywhere, building a mobile search engine has become a focus of mobile network applications. Mobile access is constrained by the handset and by transmission bandwidth: only a minority of smartphones support plain HTML, while most phones recognize only the markup languages of the WAP protocol, such as WML or xHTML. Most information on the web, however, is expressed in HTML, and WAP resources are limited, so a mobile search engine that crawls only WAP pages cannot provide enough information. How to overcome this limitation, so that mobile clients can also search the vast amount of information that originates in HTML, has become one of the main problems of mobile search. The usual way to browse HTML pages on a phone is to add a WAP gateway: when the phone requests an HTML page, the gateway first fetches the page, converts it into the corresponding WML, and sends the result to the phone. This is also the common path for extending a general-purpose search engine into a mobile one. Real-time translation of this kind, however, places high demands on the gateway's performance and bandwidth.
In view of the current state of mobile search engines, the present invention adds a mobile module to the framework of an existing Internet search engine and proposes a way to build a mobile search engine from HTML resources. The HTML pages captured by the web spider are processed centrally and translated into web page snapshots in WML, which meets the user's mobile search needs. A mobile search engine built with this technique does not need a real-time translation gateway and can easily extend an existing search engine system. In practice, this approach has been used to build a mobile search engine for the life-services field that covers catering, entertainment, and yellow-page information for nearly 40 cities in China.
Summary of the invention
In view of the current state of mobile search engines, a method for optimizing a mobile search engine is proposed, comprising the following steps:
A. Design the mobile search engine framework
The search engine framework likewise consists of four parts (crawler, indexer, searcher, and user interface) and additionally has a mobile module that makes it a mobile search engine.
The mobile module comprises three parts:
Translator: converts the HTML pages captured by the spider into WML pages;
WML snapshot library: stores the converted WML pages;
WAP interface: the user interface accessed from a mobile phone;
B. Build the URL list
The captured web pages are stored in the web page library, and all hyperlinks on those pages are stored in the URL list;
C. Build the translator
The translator has three parts: directory page filtering, topic information filtering, and translation;
C.1 Directory page filtering
Directory pages are filtered out first and are not translated. The nature of a page is judged from the ratio of its number of body-text paragraphs to its number of links, and the page is stored in the index database;
C.2 Topic information filtering
The topic-related part of the web page is extracted using the STU-DOM tree model, which does not depend on the information source.
The table, tr, div and tbody tag nodes of the web page are taken as block nodes, and whether a block is kept or discarded is decided by its local correlativity (Local Correlativity) and its contextual correlativity (Contextual Correlativity). The local correlativity is determined by the links and the content within the block, and can be expressed as:
$$\mathrm{LinkCount}(STU_i) = \sum_{j=1}^{N} \mathrm{LinkCount}(STUC_{ij})$$
$$\mathrm{ContentLength}(STU_i) = \sum_{j=1}^{N} \mathrm{ContentLength}(STUC_{ij})$$
$$\mathrm{LocalCorrelativity}(STU_i) = \frac{\mathrm{LinkCount}(STU_i)}{\mathrm{ContentLength}(STU_i)}$$
where ContentLength and LinkCount denote the number of words and the number of links in a block, respectively, and $STUC_{ij}$ denotes the j-th sub-block of $STU_i$;
The contextual correlativity is determined by the links within the block and the content of its parent block, and can be expressed as:
$$\mathrm{ContextualCorrelativity}(STU_i) = \frac{\mathrm{LinkCount}(STU_i)}{\mathrm{ContentLength}(STU_{P_i})}$$
where $STU_{P_i}$ denotes the parent node of $STU_i$;
In this design, the local correlativity threshold is 2 and the contextual correlativity threshold is 70;
C.3 Convert HTML to WML
When an HTML block is converted, the elements that WML cannot handle, such as the style, front and script tags, are removed first. A mapping table between HTML tags and WML tags is then built, and the HTML is converted according to this table into WML that a mobile phone can read.
Text that cannot be shown on a single phone screen is paginated, and the result is stored in the WML snapshot library;
D. Design the WAP interface
The WAP interface is the human-computer query interface on the mobile phone. It is designed in WML or xHTML. Content on WAP pages is kept as concise as possible: a result list page contains at most ten entries.
Compared with the prior art, the present invention has the following beneficial effects:
1. The invention overcomes the limitation described above, so that mobile clients can also search the vast amount of information that originates in HTML, which gives mobile search a much broader information base.
2. A mobile search engine built with this technique does not need a real-time translation gateway, which avoids the high performance and bandwidth demands on such a gateway, and it can easily extend an existing search engine system.
Brief description of the drawings
The invention has two drawings:
Fig. 1 is a framework diagram of the mobile search engine system.
Fig. 2 is a schematic diagram of the mobile search interface.
Embodiment
A. Design the mobile search engine framework
Like a general search engine system, this framework consists of four parts (crawler, indexer, searcher, and user interface). A mobile module is added to extend it into a mobile search engine. The mobile module comprises three parts:
Translator: converts the HTML pages captured by the spider into WML pages;
WML snapshot library: stores the converted WML pages;
WAP interface: the user interface accessed from a mobile phone.
The basic framework is shown in Fig. 1.
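As a rough illustration of the WML snapshot library, the following sketch keeps converted pages in memory, keyed by the URL of the source HTML page. The class name `SnapshotLibrary` and its `put` and `get` methods are hypothetical and do not come from the patent text; a real system would presumably use persistent storage.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SnapshotLibrary:
    """In-memory stand-in for the WML snapshot library (hypothetical API)."""

    pages: dict[str, list[str]] = field(default_factory=dict)

    def put(self, source_url: str, wml_pages: list[str]) -> None:
        # One source URL may map to several WML pages after pagination.
        self.pages[source_url] = list(wml_pages)

    def get(self, source_url: str, page_no: int = 0) -> str | None:
        # Return the stored snapshot page, or None if absent or out of range.
        stored = self.pages.get(source_url)
        if stored is None or not 0 <= page_no < len(stored):
            return None
        return stored[page_no]
```

Keying snapshots by source URL lets the WAP interface resolve a search hit to its stored WML page without fetching the original HTML page.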
B. Build the URL list
The process starts with the web spider, which starts automatically at regular intervals and crawls Internet sites. The captured web pages are stored in the web page library, and all hyperlinks on those pages are stored in the URL list.
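The storage step can be sketched with Python's standard `html.parser`. The names `LinkCollector` and `store_page`, the use of a dict as the web page library, and the removal of duplicate URLs are assumptions made for illustration; the text only states that pages and their hyperlinks are stored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def store_page(url, html, page_library, url_list):
    """Store a captured page and append its hyperlinks to the URL list."""
    page_library[url] = html
    collector = LinkCollector(url)
    collector.feed(html)
    for link in collector.links:
        if link not in url_list:  # assumption: skip URLs already queued
            url_list.append(link)
```

A crawler loop would then take the next entry from `url_list`, fetch it, and call `store_page` again.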
C. Build the translator
A mobile search engine must return concise query results to the user quickly and directly. Among the pages captured by the spider, however, some have no topic at all, and even pages with a topic usually carry a large amount of unrelated information. Translating pages directly is therefore unsuitable. To fit the characteristics of mobile search, the translator is designed in three parts: directory page filtering, topic information filtering, and translation.
C.1 Directory page filtering
Directory pages are filtered out first and are not translated. The nature of a page is judged from the ratio of its number of body-text paragraphs to its number of links, and the page is stored in the index database. The indexer segments the captured web documents into words, computes a weight for each word from its positions and frequency in the page, and stores the segmentation results in the index database.
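The paragraph-to-link ratio test might look like the sketch below. The cut-off `min_ratio=0.5`, the decision to count only `<p>` elements that contain text, and the names `PageStats` and `is_directory_page` are all assumptions; the text does not give a numeric ratio or say how paragraphs are counted.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Counts non-empty <p> paragraphs and <a> links on a page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0
        self.links = 0
        self._in_p = False
        self._p_has_text = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
        elif tag == "p":
            self._close_paragraph()  # tolerate an unclosed previous <p>
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_paragraph()

    def handle_data(self, data):
        if self._in_p and data.strip():
            self._p_has_text = True

    def _close_paragraph(self):
        if self._in_p and self._p_has_text:
            self.paragraphs += 1
        self._in_p = False
        self._p_has_text = False

    def close(self):
        super().close()
        self._close_paragraph()


def is_directory_page(html, min_ratio=0.5):
    """Treat a page as a directory page when it has few paragraphs per link.

    min_ratio is a hypothetical cut-off, not a value from the patent.
    """
    stats = PageStats()
    stats.feed(html)
    stats.close()
    if stats.links == 0:
        return False
    return stats.paragraphs / stats.links < min_ratio
```

Pages for which `is_directory_page` returns true would be skipped by the translator.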
C.2 Topic information filtering
The topic-related part of the web page is extracted using the STU-DOM tree model, which does not depend on the information source. Tag nodes such as <table>, <tr>, <div> and <tbody> are taken as block nodes, and whether a block is kept or discarded is decided by its local correlativity (Local Correlativity) and its contextual correlativity (Contextual Correlativity). The local correlativity is determined by the links and the content within the block, and can be expressed as:
$$\mathrm{LinkCount}(STU_i) = \sum_{j=1}^{N} \mathrm{LinkCount}(STUC_{ij})$$
$$\mathrm{ContentLength}(STU_i) = \sum_{j=1}^{N} \mathrm{ContentLength}(STUC_{ij})$$
$$\mathrm{LocalCorrelativity}(STU_i) = \frac{\mathrm{LinkCount}(STU_i)}{\mathrm{ContentLength}(STU_i)}$$
where ContentLength and LinkCount denote the number of words and the number of links in a block, respectively, and $STUC_{ij}$ denotes the j-th sub-block of $STU_i$.
The contextual correlativity is determined by the links within the block and the content of its parent block, and can be expressed as:
$$\mathrm{ContextualCorrelativity}(STU_i) = \frac{\mathrm{LinkCount}(STU_i)}{\mathrm{ContentLength}(STU_{P_i})}$$
where $STU_{P_i}$ denotes the parent node of $STU_i$.
In this design, the local correlativity threshold is 2 and the contextual correlativity threshold is 70. Topic information is extracted from the HTML page on this basis.
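The two measures can be computed over a small block tree as in the following sketch. The `Block` class and its field names are hypothetical, and counting a block's own direct links and text in addition to its sub-blocks is an assumption made so that leaf blocks have defined values. The text does not say in which units the thresholds 2 and 70 apply or in which direction they are compared, so the sketch stops at computing the two ratios.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Block:
    """One STU node, e.g. a <table>, <tr>, <div> or <tbody> block."""

    own_links: int = 0    # links directly in this block (assumption)
    own_length: int = 0   # words directly in this block (assumption)
    children: list[Block] = field(default_factory=list)
    parent: Optional[Block] = field(default=None, repr=False)

    def add(self, child: Block) -> Block:
        child.parent = self
        self.children.append(child)
        return child


def link_count(block: Block) -> int:
    # LinkCount(STU_i): sum over all sub-blocks STUC_ij, plus own links.
    return block.own_links + sum(link_count(c) for c in block.children)


def content_length(block: Block) -> int:
    # ContentLength(STU_i): sum over all sub-blocks STUC_ij, plus own text.
    return block.own_length + sum(content_length(c) for c in block.children)


def local_correlativity(block: Block) -> float:
    length = content_length(block)
    return link_count(block) / length if length else float("inf")


def contextual_correlativity(block: Block) -> Optional[float]:
    if block.parent is None:
        return None  # the root block has no parent to compare against
    length = content_length(block.parent)
    return link_count(block) / length if length else float("inf")
```

For a navigation block with 8 links and 20 words inside a page of 500 words in total, the local value is 8/20 = 0.4 and the contextual value is 8/500 = 0.016.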
C.3 Convert HTML to WML
When an HTML block is converted, the elements that WML cannot handle, such as the <style>, <front> and <script> tags, are removed first. A mapping table between HTML tags and WML tags is then built, and the HTML is converted according to this table into WML that a mobile phone can read, so that the topic information becomes a WML page the phone can recognize. Long text that cannot be shown on a single phone screen must also be paginated, and the result is stored in the WML snapshot library.
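A minimal sketch of the three steps (drop unsupported elements, map tags, paginate) is given below. The entries of `TAG_MAP`, the 200-character page size, the card id `c`, and all function names are assumptions; the text does not list the mapping table or a screen size. The output also omits the XML prolog and DOCTYPE that a complete WML deck would carry.

```python
from html import escape
from html.parser import HTMLParser

DROPPED = {"style", "front", "script"}  # elements named in the text
# Hypothetical mapping table from HTML block tags to WML tags.
TAG_MAP = {"p": "p", "div": "p", "li": "p", "td": "p", "h1": "big", "h2": "big"}


class BlockExtractor(HTMLParser):
    """Splits a page into (wml_tag, text) blocks, skipping dropped elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._tag = "p"
        self._skip = 0

    def _flush(self, next_tag="p"):
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append((self._tag, text))
        self._buf, self._tag = [], next_tag

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED:
            self._skip += 1
        elif tag in TAG_MAP:
            self._flush(TAG_MAP[tag])

    def handle_endtag(self, tag):
        if tag in DROPPED:
            self._skip = max(0, self._skip - 1)
        elif tag in TAG_MAP:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def paginate(blocks, max_chars=200):
    """Group blocks into pages of at most max_chars characters of text."""
    pages, current, size = [], [], 0
    for tag, text in blocks:
        for start in range(0, len(text), max_chars):
            piece = text[start:start + max_chars]
            if current and size + len(piece) > max_chars:
                pages.append(current)
                current, size = [], 0
            current.append((tag, piece))
            size += len(piece)
    if current:
        pages.append(current)
    return pages


def render(page):
    parts = []
    for tag, text in page:
        inner = escape(text) if tag == "p" else f"<{tag}>{escape(text)}</{tag}>"
        parts.append(f"<p>{inner}</p>")
    return '<wml><card id="c">' + "".join(parts) + "</card></wml>"


def to_wml(html, max_chars=200):
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return [render(page) for page in paginate(parser.blocks, max_chars)]
```

Each string returned by `to_wml` is one paginated snapshot page, which is the unit that would be written to the WML snapshot library.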
D. Design the WAP interface
The WAP interface is the human-computer query interface on the mobile phone. It is designed in WML or xHTML. Content on WAP pages is kept as concise as possible: a result list page contains at most ten entries. When a user submits a query through the WAP interface, the searcher first segments the input into words and retrieves all records that contain the query terms. It ranks the records by computed page weight and relevance, applies set operations, and finally extracts a summary of each page to return to the user. When the user clicks a record to view a specific page, however, the system, unlike an Internet search engine, does not link directly to that page on the Internet but to the corresponding WML snapshot.
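The query flow can be sketched with a tiny inverted index. Whitespace splitting stands in for word segmentation, summed term frequency stands in for the page weight and relevance calculation, and the `/snapshot/<id>.wml` link pattern is invented for illustration; none of these details come from the text.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 10  # the text caps a result list page at ten entries


class MiniIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(Counter)
        self.summaries = {}

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term][doc_id] += 1
        self.summaries[doc_id] = text[:40]  # crude summary: leading text

    def search(self, query, page=0):
        terms = query.lower().split()
        if not terms:
            return []
        # Set operation: keep documents that contain every query term.
        hits = set.intersection(
            *(set(self.postings.get(term, {})) for term in terms)
        )
        # Rank by summed term frequency, then by id for a stable order.
        ranked = sorted(
            hits,
            key=lambda d: (-sum(self.postings[t][d] for t in terms), d),
        )
        start = page * PAGE_SIZE
        return [
            # Link to the stored WML snapshot, not to the live HTML page.
            {"summary": self.summaries[d], "link": f"/snapshot/{d}.wml"}
            for d in ranked[start:start + PAGE_SIZE]
        ]
```

Returning snapshot links instead of original URLs is what removes the need for a real-time translation gateway at click time.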
Following the design method of the present invention, a mobile search engine for the life-services field, www.zhaocha.mobi, was developed. It is an improvement built on the original Internet search engine www.zhaocha.com.cn; the result is shown in Fig. 2.

Claims (1)

1. A method for optimizing a mobile search engine, characterized by comprising the following steps:
A. Design the mobile search engine framework
The search engine framework likewise consists of four parts (crawler, indexer, searcher, and user interface) and additionally has a mobile module that makes it a mobile search engine.
The mobile module comprises three parts:
Translator: converts the HTML pages captured by the spider into WML pages;
WML snapshot library: stores the converted WML pages;
WAP interface: the user interface accessed from a mobile phone;
B. Build the URL list
The captured web pages are stored in the web page library, and all hyperlinks on those pages are stored in the URL list;
C. Build the translator
The translator has three parts: directory page filtering, topic information filtering, and translation;
C.1 Directory page filtering
Directory pages are filtered out first and are not translated. The nature of a page is judged from the ratio of its number of body-text paragraphs to its number of links, and the page is stored in the index database;
C.2 Topic information filtering
The topic-related part of the web page is extracted using the STU-DOM tree model, which does not depend on the information source.
The table, tr, div and tbody tag nodes of the web page are taken as block nodes, and whether a block is kept or discarded is decided by its local correlativity (Local Correlativity) and its contextual correlativity (Contextual Correlativity). The local correlativity is determined by the links and the content within the block, and can be expressed as:
$$\mathrm{LinkCount}(STU_i) = \sum_{j=1}^{N} \mathrm{LinkCount}(STUC_{ij})$$
$$\mathrm{ContentLength}(STU_i) = \sum_{j=1}^{N} \mathrm{ContentLength}(STUC_{ij})$$
$$\mathrm{LocalCorrelativity}(STU_i) = \frac{\mathrm{LinkCount}(STU_i)}{\mathrm{ContentLength}(STU_i)}$$
where ContentLength and LinkCount denote the number of words and the number of links in a block, respectively, and $STUC_{ij}$ denotes the j-th sub-block of $STU_i$;
The contextual correlativity is determined by the links within the block and the content of its parent block, and can be expressed as:
$$\mathrm{ContextualCorrelativity}(STU_i) = \frac{\mathrm{LinkCount}(STU_i)}{\mathrm{ContentLength}(STU_{P_i})}$$
where $STU_{P_i}$ denotes the parent node of $STU_i$;
In this design, the local correlativity threshold is 2 and the contextual correlativity threshold is 70;
C.3 Convert HTML to WML
When an HTML block is converted, the elements that WML cannot handle, such as the style, front and script tags, are removed first. A mapping table between HTML tags and WML tags is then built, and the HTML is converted according to this table into WML that a mobile phone can read.
Text that cannot be shown on a single phone screen is paginated, and the result is stored in the WML snapshot library;
D. Design the WAP interface
The WAP interface is the human-computer query interface on the mobile phone. It is designed in WML or xHTML. Content on WAP pages is kept as concise as possible: a result list page contains at most ten entries.
CN201210491498.7A 2012-11-27 2012-11-27 Method for optimizing mobile search engine Pending CN103838797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210491498.7A CN103838797A (en) 2012-11-27 2012-11-27 Method for optimizing mobile search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210491498.7A CN103838797A (en) 2012-11-27 2012-11-27 Method for optimizing mobile search engine

Publications (1)

Publication Number Publication Date
CN103838797A true CN103838797A (en) 2014-06-04

Family

ID=50802306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210491498.7A Pending CN103838797A (en) 2012-11-27 2012-11-27 Method for optimizing mobile search engine

Country Status (1)

Country Link
CN (1) CN103838797A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106802914A (en) * 2016-12-06 2017-06-06 中国电子科技集团公司第三十二研究所 Heuristic multi-feature rule set webpage blocking method
CN107807937A (en) * 2016-09-09 2018-03-16 阿里巴巴集团控股有限公司 A kind of website SEO processing methods, apparatus and system
CN108062338A (en) * 2016-11-09 2018-05-22 北京国双科技有限公司 A kind of method and device of the homing capability of the evaluation function page
CN113641884A (en) * 2021-08-10 2021-11-12 南方电网数字电网研究院有限公司 Semantic-based power metering data processing method and device and computer equipment
CN113835740A (en) * 2021-11-29 2021-12-24 山东捷瑞数字科技股份有限公司 Search engine optimization-oriented automatic front-end code repairing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908071A (en) * 2010-08-10 2010-12-08 厦门市美亚柏科信息股份有限公司 Method and device thereof for improving search efficiency of search engine
CN102156742A (en) * 2011-04-19 2011-08-17 北京神州数码思特奇信息技术股份有限公司 Method and middleware for supporting structured document display with own browser of mobile phone
CN102325225A (en) * 2011-09-20 2012-01-18 北京鹏润鸿途科技有限公司 Method and device for playing video of mobile phone website

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
汲业 et al.: "Design and Implementation of a Mobile Search Engine", Computer Applications and Software *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107807937A (en) * 2016-09-09 2018-03-16 阿里巴巴集团控股有限公司 A kind of website SEO processing methods, apparatus and system
CN107807937B (en) * 2016-09-09 2021-11-30 阿里巴巴集团控股有限公司 Website SEO processing method, device and system
CN108062338A (en) * 2016-11-09 2018-05-22 北京国双科技有限公司 A kind of method and device of the homing capability of the evaluation function page
CN108062338B (en) * 2016-11-09 2020-06-19 北京国双科技有限公司 Method and device for evaluating navigation capability of function page
CN106802914A (en) * 2016-12-06 2017-06-06 中国电子科技集团公司第三十二研究所 Heuristic multi-feature rule set webpage blocking method
CN113641884A (en) * 2021-08-10 2021-11-12 南方电网数字电网研究院有限公司 Semantic-based power metering data processing method and device and computer equipment
CN113835740A (en) * 2021-11-29 2021-12-24 山东捷瑞数字科技股份有限公司 Search engine optimization-oriented automatic front-end code repairing method
CN113835740B (en) * 2021-11-29 2022-02-22 山东捷瑞数字科技股份有限公司 Search engine optimization-oriented automatic front-end code repairing method

Similar Documents

Publication Publication Date Title
CN102930059B (en) Method for designing focused crawler
RU2522103C2 (en) Update notification method and browser
CN102043834B (en) Method for realizing searching by utilizing client and search client
CN101291304B (en) Transplantable network information sharing method
CN102708174B (en) Method and device for displaying rich media information in browser
CN102521251A (en) Method for directly realizing personalized search, device for realizing method, and search server
CN102760151B (en) Implementation method of open source software acquisition and searching system
CN104063454A (en) Search push method and device for mining user demands
CN103428076A (en) Method and device for transmitting information to multi-type terminals or applications
CN101097578A (en) Network resource searching method and system
CN101908071A (en) Method and device thereof for improving search efficiency of search engine
CN101599089A (en) The automatic search of update information on content of video service website and extraction system and method
CN103309884A (en) User behavior data collecting method and system
CN103838797A (en) Method for optimizing mobile search engine
CN102521232B (en) Distributed acquisition and processing system and method of internet metadata
CN102117331B (en) Video search method and system
CN102193798B (en) Method for automatically acquiring Open application programming interface (API) based on Internet
CN102722501A (en) Search engine and realization method thereof
CN102722499A (en) Search engine and implementation method thereof
CN104252348A (en) Webpage access statistics method and device based on browser
CN103389972A (en) Method and device for obtaining text based on really simple syndication (RSS)
CN104090923A (en) Method and device for displaying rich media information in browser
CN103970800A (en) Method and system for extracting and processing webpage related keywords
CN100504877C (en) Method and device for collecting web page action
CN101008946A (en) Search method of Chinese mobile communication information and device thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140604