WO2015120713A1 - Method and apparatus for acquiring entry, computer storage medium and device - Google Patents

Info

Publication number
WO2015120713A1
Authority
WO
WIPO (PCT)
Prior art keywords
eyeball
area
interest
word
candidate
Prior art date
Application number
PCT/CN2014/085481
Other languages
French (fr)
Chinese (zh)
Inventor
陈晓昕
吴先超
肖日新
Original Assignee
百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Publication of WO2015120713A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0236 Character input methods using selection techniques to select from displayed items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements

Definitions

  • The present invention relates to input method technology, and in particular to a method, apparatus, computer storage medium, and device for acquiring a term.
  • An input method is the encoding method used to enter characters into a terminal.
  • The client of input method software can use its loaded dictionary, that is, the lexicon and the word frequencies contained in it, to present a ranked list of candidate terms to the user and thus make input easier.
  • In the prior art, a server periodically collects terms and their frequency of use, that is, their word frequency, to update various professional dictionaries, for example, by recognizing newly appearing terms as new words and adding them to the dictionary, or by identifying frequently used terms as hot words.
  • An aspect of the present invention provides a method for obtaining an entry, including:
  • selecting at least one candidate term as a new word and/or a hot word. In a possible implementation of the above aspect, performing the tracking operation on the user's eyeball to obtain the user's region of interest includes: acquiring video information of the eyeball;
  • determining, based on the location area and the movable area of the eyeball, the region of interest of the eyeball as the region of interest of the user.
  • The attention condition includes at least one of an attention duration and an attention frequency.
  • Selecting at least one candidate term as a new word and/or a hot word includes:
  • determining a candidate term that does not appear in the pre-configured input method dictionary as a new word.
  • In another possible implementation of the above aspect, selecting at least one candidate term as a new word and/or a hot word includes:
  • determining a candidate term that appears in the pre-configured input method dictionary as a candidate hot word; determining a heat value of the candidate hot word according to its word frequency; and determining a candidate hot word whose heat value is greater than or equal to a heat threshold as a hot word.
  • a tracking unit, configured to perform a tracking operation on the user's eyeball to obtain the user's region of interest;
  • an acquiring unit, configured to acquire text information in the region of interest;
  • a word-cutting unit, configured to perform a word-cutting operation on the text information to obtain candidate terms; and
  • a selecting unit, configured to select at least one candidate term as a new word and/or a hot word.
  • The region of interest of the eyeball is determined, based on the location area and the movable area of the eyeball, as the region of interest of the user.
  • If the region of interest of the eyeball satisfies the attention condition, it is determined to be the region of interest of the user.
  • The attention condition includes at least one of an attention duration and an attention frequency.
  • The selecting unit is specifically configured to:
  • determine a candidate term that does not appear in the pre-configured input method dictionary as a new word.
  • determine a candidate term that appears in the pre-configured input method dictionary as a candidate hot word; determine the heat value of the candidate hot word according to its word frequency; and determine a candidate hot word whose heat value is greater than or equal to the heat threshold as a hot word.
  • In another aspect, a computer storage medium is provided. The computer storage medium is encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the following operations:
  • The present invention further provides a device comprising at least one processor, a memory, and at least one computer program, the at least one computer program being stored in the memory, executed by the at least one processor, and including instructions for performing the following operations:
  • In the embodiments of the present invention, a tracking operation is performed on the user's eyeball to obtain the user's region of interest; text information in the region of interest is then acquired, and a word-cutting operation is performed on it to obtain candidate terms, so that at least one candidate term can be selected as a new word and/or a hot word. Because candidate terms are acquired from the text the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be recognized in time from this text, which improves the timeliness of term acquisition.
  • FIG. 1 is a schematic flowchart of a method for acquiring an entry according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an apparatus for acquiring an entry according to another embodiment of the present invention.
  • The terminals involved in the embodiments of the present invention may include, but are not limited to, a mobile phone, a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a personal computer (PC), an MP3 player, an MP4 player, and the like.
  • FIG. 1 is a schematic flowchart of a method for acquiring an entry according to an embodiment of the present invention; the method proceeds as shown in FIG. 1.
  • Specifically, the tracking operation on the user's eyeball may be performed over a user interface.
  • The user interface may be a web page displayed by the terminal, or an application document displayed by the terminal, such as an email, a WORD document, a TXT document, or a PDF document; the present invention imposes no particular limitation on this.
  • The execution body of steps 101 to 104 may be a recognition apparatus, which may be located in a local client for offline recognition, or in a server on the network side for online recognition; alternatively, some of its functions may be located in the client and others in the server, for combined offline and online recognition. This embodiment imposes no limitation on this.
  • The client may be an input method application installed on the terminal, or a web page in a browser; any form of existence that can provide terms for recognizing new words and/or hot words is acceptable, and this embodiment imposes no limitation on this.
  • In this embodiment, a tracking operation is performed on the user's eyeball to obtain the user's region of interest, text information in the region of interest is acquired, and a word-cutting operation is performed on the text information to obtain candidate terms, so that at least one candidate term can be selected as a new word and/or a hot word. Because candidate terms are acquired from the text the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be recognized in time from this text, improving the timeliness of term acquisition.
  • In addition, with the technical solution provided by the present invention, the recognized new words and/or hot words can promptly be used to update the various professional dictionaries loaded by the input method, that is, the input method dictionary, which further improves the accuracy of the input method dictionary.
  • Specifically, video information of the eyeball may be acquired.
  • The video information of the eyeball may consist of multiple image frames and may be captured by a camera. A location area of the eyeball is then determined based on the video information. Next, a movable path of the eyeball is determined based on the video information, and a movable area of the eyeball is determined according to the movable path. Because the range of motion of the human eye lies within a fixed interval, the movable path of the eyeball can be determined from the video information.
  • The movable path may be an exact value or a range of motion.
  • From the movable path, the area reachable along that path can be further calculated; this is the movable area of the eyeball. Finally, the region of interest of the eyeball can be determined according to the location area and the movable area of the eyeball, and taken as the region of interest of the user.
  • Specifically, the area within the movable area where the eyeball stays may be determined as the region of interest of the eyeball. If the region of interest of the eyeball satisfies the attention condition, it is determined to be the region of interest of the user.
  • The attention condition may include, but is not limited to, at least one of an attention duration and an attention frequency.
  • For example, if the eyeball stays in an area within its movable area for 3 seconds or longer, that region of interest of the eyeball may be determined to be the user's region of interest.
  • Alternatively, if the attention-frequency condition is satisfied, the region of interest of the eyeball may likewise be determined to be the user's region of interest.
  • Text information in the region of interest may be obtained using any of various existing text recognition methods; no particular limitation is imposed on this.
  • For example, a partial screenshot of the region of interest may be taken from the user interface containing the text information, and text recognition may then be performed on the screenshot to obtain the text information in the region of interest.
  • Alternatively, location information of the region of interest may be acquired, and the text information corresponding to that location may be determined as the text information in the region of interest.
  • Specifically, the word-cutting (word segmentation) operation on the obtained text information may be performed using any of various existing methods, for example, a segmentation method based on string matching, a segmentation method based on understanding, or a statistical segmentation method; the present invention imposes no particular limitation on this.
  • For details of these word-cutting methods, refer to the related prior art; they are not described again here.
  • Specifically, a candidate term that does not appear in the pre-configured input method dictionary may be determined as a new word.
  • For example, each candidate term obtained by the word-cutting operation may be examined; if it does not appear in the pre-configured input method dictionary, it can be determined as a new word.
  • The pre-configured input method dictionary may reside on a server on the network side or on the local client; this embodiment imposes no particular limitation on this.
  • Specifically, a candidate term that appears in the pre-configured input method dictionary may be determined as a candidate hot word. A heat value of the candidate hot word is then determined according to the word frequency with which it appears, and candidate hot words whose heat value is greater than or equal to the heat threshold can be determined as hot words.
  • For example, each candidate term obtained by the word-cutting operation may be examined; if it already appears in the pre-configured input method dictionary, it can be marked as a candidate hot word. The word frequency of the candidate hot word within a specified time range may then be obtained according to the input method dictionary, and its heat value determined from that word frequency. Finally, candidate hot words whose heat value is greater than or equal to the heat threshold can be determined as hot words.
  • Specifically, the heat value of a candidate hot word may be determined according to: heat value of the candidate hot word = (average score of all candidate hot words × average word frequency of all candidate hot words + score of the candidate hot word × total word frequency of the candidate hot word in the total statistical time) / (average word frequency of all candidate hot words + total word frequency of the candidate hot word in the total statistical time), where
  • score of the candidate hot word = word frequency of the candidate hot word in the most recent unit statistical time / total word frequency of the candidate hot word in the total statistical time.
  • For example, suppose there are four candidate hot words, hot word A, hot word B, hot word C, and hot word D, with a unit statistical time of one day and a total statistical time of two days.
  • Their word frequencies on 2013-12-18 and 2013-12-19 are shown in the following table:
  • Based on the historical data for 2013-12-18 and 2013-12-19, the scores of the four candidate hot words are 0.74, 0.52, 0.8, and 0.82, respectively.
  • For candidate hot word A, for example, the total word frequency is 135 and the score is 0.74.
  • Applying the calculation formula described above, the heat values of these four candidate hot words can be obtained; sorted from largest to smallest, they are as follows:
  • Candidate hot word D appears most frequently, so it also ranks highest. If its heat value is not less than the preset heat threshold, candidate hot word D can be determined as a hot word.
  • The local input method dictionary, or the input method dictionary in the cloud (network side), can further be updated with these terms; this embodiment imposes no particular limitation on this.
  • Specifically, the local or cloud input method dictionary may be updated using at least one of statistical N-gram information and Npos information. For details, refer to the related prior art; they are not described here.
  • The technical solution provided in this embodiment can not only identify new words and/or hot words for a single user, but can also collate and analyze the recognition results of multiple users to obtain new words and/or hot words for multiple users.
  • In addition, the selected new words and/or hot words may be given special presentation; for example, icon characters may be added to them, or they may be presented at particular candidate positions.
  • In this embodiment, a tracking operation is performed on the user's eyeball to obtain the user's region of interest, the text information in the region of interest is then acquired, and a word-cutting operation is performed on the text information to obtain candidate terms, so that at least one candidate term can be selected as a new word and/or a hot word. Because the candidate terms are acquired from the text the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be recognized in time from this text, improving the timeliness of term acquisition.
  • In addition, with the technical solution provided by the present invention, the recognized new words and/or hot words can promptly be used to update the various professional dictionaries loaded by the input method, that is, the input method dictionary, which further improves the accuracy of the input method dictionary.
  • The term acquiring apparatus of this embodiment may include a tracking unit 21, an acquiring unit 22, a word-cutting unit 23, and a selecting unit 24.
  • The tracking unit 21 is configured to perform a tracking operation on the user's eyeball to obtain the user's region of interest; the acquiring unit 22 is configured to acquire text information in the region of interest; the word-cutting unit 23 is configured to perform a word-cutting operation on the text information to obtain candidate terms; and the selecting unit 24 is configured to select at least one candidate term as a new word and/or a hot word.
  • The term acquiring apparatus provided in this embodiment may be located in a local client for offline recognition, or in a server on the network side for online recognition; alternatively, some of its functions may be located in the client and others in the server, for combined offline and online recognition. This embodiment imposes no limitation on this.
  • The client may be an input method application installed on the terminal, or a web page in a browser; any form of existence that can provide terms for recognizing new words and/or hot words is acceptable, and this embodiment imposes no limitation on this.
  • In this embodiment, the tracking unit performs a tracking operation on the user's eyeball to obtain the user's region of interest, the acquiring unit acquires the text information in the region of interest, and the word-cutting unit performs a word-cutting operation on the text information to obtain candidate terms, so that the selecting unit can select at least one candidate term as a new word and/or a hot word. Because the candidate terms are acquired from the text the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be recognized in time from this text, improving the timeliness of term acquisition.
  • Specifically, the tracking unit 21 may perform the tracking operation on the user's eyeball over a user interface.
  • The user interface may be a web page displayed by the terminal, or an application document displayed by the terminal, such as an email, a WORD document, a TXT document, or a PDF document; the present invention imposes no particular limitation on this.
  • Specifically, the tracking unit 21 may be configured to: acquire video information of the eyeball; determine a location area of the eyeball according to the video information; determine a movable path of the eyeball according to the video information, and determine a movable area of the eyeball according to the movable path; and determine the region of interest of the eyeball according to the location area and the movable area of the eyeball, as the region of interest of the user.
  • The video information of the eyeball may consist of multiple image frames and may be captured by a camera. Because the range of motion of the human eye lies within a fixed interval, the movable path of the eyeball can be determined from the video information. The movable path may be an exact value or a range of motion. From the movable path, the area reachable along that path can be further calculated; this is the movable area of the eyeball.
  • Specifically, the tracking unit 21 may determine the area within the movable area where the eyeball stays as the region of interest of the eyeball; if the region of interest of the eyeball satisfies the attention condition, the tracking unit 21 determines it to be the region of interest of the user.
  • The attention condition may include, but is not limited to, at least one of an attention duration and an attention frequency. For example, if the eyeball stays in an area within its movable area for 3 seconds or longer, the tracking unit 21 may determine that region of interest of the eyeball to be the user's region of interest.
  • Alternatively, if the attention-frequency condition is satisfied, the tracking unit 21 may likewise determine the region of interest of the eyeball to be the region of interest of the user.
  • The acquiring unit 22 may obtain the text information in the region of interest using any of various existing text recognition methods; no particular limitation is imposed on this.
  • For example, the acquiring unit 22 may take a partial screenshot of the region of interest from the user interface containing the text information, and then perform text recognition on the screenshot to obtain the text information in the region of interest.
  • Alternatively, the acquiring unit 22 may acquire location information of the region of interest and determine the text information corresponding to that location as the text information in the region of interest.
  • The word-cutting unit 23 may perform the word-cutting operation on the obtained text information using any of various existing word-cutting methods, for example, a segmentation method based on string matching, a segmentation method based on understanding, or a statistical segmentation method; the present invention imposes no particular limitation on this.
  • Specifically, the selecting unit 24 may be configured to determine a candidate term that does not appear in the pre-configured input method dictionary as a new word.
  • For example, the selecting unit 24 may examine each candidate term obtained by the word-cutting operation; if it does not appear in the pre-configured input method dictionary, it can be determined as a new word.
  • The pre-configured input method dictionary may reside on a server on the network side or on the local client; this embodiment imposes no particular limitation on this.
  • Specifically, the selecting unit 24 may be configured to determine a candidate term that appears in the pre-configured input method dictionary as a candidate hot word; determine the heat value of the candidate hot word according to its word frequency; and determine a candidate hot word whose heat value is greater than or equal to the heat threshold as a hot word.
  • For example, the selecting unit 24 may examine each candidate term obtained by the word-cutting operation; if it already appears in the pre-configured input method dictionary, it may be marked as a candidate hot word. The selecting unit 24 then obtains, according to the input method dictionary, the word frequency of the candidate hot word within a specified time range and determines its heat value from that word frequency; finally, candidate hot words whose heat value is greater than or equal to the heat threshold are determined as hot words.
  • Specifically, the heat value of a candidate hot word may be determined according to: heat value of the candidate hot word = (average score of all candidate hot words × average word frequency of all candidate hot words + score of the candidate hot word × total word frequency of the candidate hot word in the total statistical time) / (average word frequency of all candidate hot words + total word frequency of the candidate hot word in the total statistical time), where
  • score of the candidate hot word = word frequency of the candidate hot word in the most recent unit statistical time / total word frequency of the candidate hot word in the total statistical time.
  • The term acquiring apparatus provided by this embodiment may further use these terms to update the local input method dictionary or the input method dictionary in the cloud (network side); this embodiment imposes no particular limitation on this.
  • Specifically, the local or cloud input method dictionary may be updated using at least one of statistical N-gram information and Npos information.
  • The technical solution provided in this embodiment can not only identify new words and/or hot words for a single user, but can also collate and analyze the recognition results of multiple users to obtain new words and/or hot words for multiple users.
  • The term acquiring apparatus provided by this embodiment may further give special presentation to the selected new words and/or hot words; for example, icon characters may be added to them, or they may be presented at particular candidate positions.
  • In this embodiment, the tracking unit performs a tracking operation on the user's eyeball to obtain the user's region of interest, the acquiring unit acquires the text information in the region of interest, and the word-cutting unit performs a word-cutting operation on the text information to obtain candidate terms, so that the selecting unit can select at least one candidate term as a new word and/or a hot word. Because the candidate terms are acquired from the text the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be recognized in time from this text, improving the timeliness of term acquisition.
  • In addition, the newly recognized new words and/or hot words can be used in time to update the various professional dictionaries loaded by the input method, that is, the input method dictionary, which further improves the accuracy of the input method dictionary.
  • The disclosed system, apparatus, and method may be implemented in other ways.
  • For example, the apparatus embodiments described above are merely illustrative.
  • For example, the division into units is merely a division by logical function;
  • in actual implementation there may be other ways of division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • Components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • The above integrated unit, when implemented in the form of a software functional unit, may be stored in a computer-readable storage medium.
  • The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
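The eye-tracking steps above (location area, movable area, and an attention condition on dwell time or visit frequency) can be illustrated with a minimal sketch. It assumes gaze points have already been estimated from the eyeball video and mapped to screen coordinates (that estimation is not shown), treats the movable area as a screen rectangle, and divides it into square grid cells. The names `GazeSample` and `regions_of_interest`, the 100-pixel cell size, and the visit-count option are illustrative assumptions; only the 3-second default comes from the example in the text.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

Rect = tuple[float, float, float, float]  # (left, top, right, bottom) in screen pixels


class GazeSample(NamedTuple):
    t: float  # timestamp in seconds
    x: float  # gaze point on screen, pixels
    y: float


def regions_of_interest(
    samples: list[GazeSample],
    movable_area: Rect,
    cell: float = 100.0,
    min_dwell: float = 3.0,
    min_visits: Optional[int] = None,
) -> list[Rect]:
    """Return grid cells of the movable area that satisfy the attention condition."""
    left, top, right, bottom = movable_area
    dwell: dict[tuple[int, int], float] = defaultdict(float)  # seconds spent per cell
    visits: dict[tuple[int, int], int] = defaultdict(int)     # separate entries per cell
    prev = None
    for cur, nxt in zip(samples, samples[1:]):
        if not (left <= cur.x < right and top <= cur.y < bottom):
            prev = None  # gaze outside the movable area; the next entry is a new visit
            continue
        key = (int((cur.x - left) // cell), int((cur.y - top) // cell))
        dwell[key] += nxt.t - cur.t  # attribute the interval to the cell it started in
        if key != prev:
            visits[key] += 1
        prev = key
    hits = [
        k for k in dwell
        if dwell[k] >= min_dwell or (min_visits is not None and visits[k] >= min_visits)
    ]
    return [
        (left + c * cell, top + r * cell, left + (c + 1) * cell, top + (r + 1) * cell)
        for c, r in sorted(hits)
    ]
```

A cell passes if its accumulated dwell time reaches `min_dwell` or, when `min_visits` is set, if the gaze enters it at least that many times. The grid keeps the bookkeeping simple; a production tracker might instead cluster fixations.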
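For the location-based way of acquiring text in the region of interest, one possible sketch intersects the region with the bounding boxes of text already laid out on the user interface. The `TextBox` layout records are a hypothetical stand-in for whatever the page or document renderer exposes; the screenshot-plus-text-recognition variant is not shown.

```python
from typing import NamedTuple

Rect = tuple[float, float, float, float]  # (left, top, right, bottom)


class TextBox(NamedTuple):
    text: str
    left: float
    top: float
    right: float
    bottom: float


def overlaps(box: TextBox, roi: Rect) -> bool:
    """True if the text box and the region share any area."""
    left, top, right, bottom = roi
    return box.left < right and left < box.right and box.top < bottom and top < box.bottom


def text_in_region(boxes: list[TextBox], roi: Rect) -> str:
    """Concatenate, in reading order (top to bottom, then left to right), the text of overlapping boxes."""
    hits = sorted((b for b in boxes if overlaps(b, roi)), key=lambda b: (b.top, b.left))
    # Join without separators, which suits Chinese text; other scripts may need spaces.
    return "".join(b.text for b in hits)
```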
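Forward maximum matching is one common string-matching segmentation method of the kind listed above; the sketch is illustrative only, and the lexicon and four-character window are assumptions. A purely dictionary-driven segmenter splits strings it does not know into single characters, so for new words to surface, the segmentation lexicon would have to be broader than the input method dictionary that is checked afterwards, or a statistical method would be used instead.

```python
def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: always take the longest lexicon match."""
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a lexicon entry or a single character matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words
```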
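The new-word test and the heat-value formula for candidate hot words can be sketched as follows. Two readings are assumptions, since the text does not pin them down: the average score is taken as the unweighted mean of the candidates' scores, and the average word frequency as the mean of their total word frequencies over the total statistical time. The function names and the threshold value are hypothetical.

```python
def split_candidates(candidates: list[str], ime_dictionary: set[str]) -> tuple[list[str], list[str]]:
    """Split candidate terms into new words (absent from the dictionary) and candidate hot words (present)."""
    new_words: list[str] = []
    candidate_hot: list[str] = []
    for term in dict.fromkeys(candidates):  # drop duplicates, keep first-seen order
        (candidate_hot if term in ime_dictionary else new_words).append(term)
    return new_words, candidate_hot


def heat_values(freqs: dict[str, tuple[int, int]]) -> dict[str, float]:
    """freqs maps each candidate hot word to (frequency in the most recent unit
    statistical time, total frequency over the total statistical time)."""
    if not freqs:
        return {}
    # score = recent frequency / total frequency (0 if the word never appeared)
    scores = {w: (recent / total if total else 0.0) for w, (recent, total) in freqs.items()}
    avg_score = sum(scores.values()) / len(scores)                     # assumed: unweighted mean
    avg_freq = sum(total for _, total in freqs.values()) / len(freqs)  # assumed: mean total frequency
    heat = {}
    for w, (_, total) in freqs.items():
        denom = avg_freq + total
        heat[w] = (avg_score * avg_freq + scores[w] * total) / denom if denom else 0.0
    return heat


def select_hot_words(freqs: dict[str, tuple[int, int]], threshold: float) -> list[str]:
    """Candidate hot words whose heat value is >= threshold, hottest first."""
    heat = heat_values(freqs)
    return sorted((w for w, h in heat.items() if h >= threshold), key=lambda w: -heat[w])
```

The formula is a frequency-weighted blend of a word's own score and the average score, so a candidate with a small total word frequency is pulled toward the average; a handful of recent occurrences alone does not make it a hot word.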
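The text does not say how recognition results from multiple users are organized and analyzed. One simple possibility, shown purely as an assumption, keeps a term only when it was recognized for at least a minimum number of distinct users; `aggregate_across_users` and `min_users` are hypothetical names.

```python
from collections import defaultdict


def aggregate_across_users(per_user: dict[str, set[str]], min_users: int = 2) -> list[str]:
    """Keep terms recognized for at least `min_users` distinct users, most widely shared first."""
    supporters: dict[str, set[str]] = defaultdict(set)
    for user, terms in per_user.items():
        for term in terms:
            supporters[term].add(user)
    shared = [t for t, users in supporters.items() if len(users) >= min_users]
    # Sort by number of supporting users (descending), then by term for a stable order.
    return sorted(shared, key=lambda t: (-len(supporters[t]), t))
```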
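Updating the dictionary with statistical N-gram information is left to the prior art. As an assumed illustration of the N-gram part only (Npos information is not modeled), segmented sentences can be folded into unigram and bigram counts, from which a maximum-likelihood bigram probability follows. The function names are hypothetical.

```python
from collections import Counter


def update_ngram_stats(unigrams: Counter, bigrams: Counter, sentences: list[list[str]]) -> None:
    """Fold segmented sentences into unigram and bigram counts in place."""
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs


def bigram_probability(unigrams: Counter, bigrams: Counter, prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
```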
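Special presentation of new and hot words, as described above, could look like the following sketch, which moves the highlighted words to a fixed candidate position and prefixes an icon character. The marker `★`, the slot index, and the function name `present` are arbitrary choices for illustration.

```python
def present(candidates: list[str], highlighted: set[str], marker: str = "★", slot: int = 1) -> list[str]:
    """Move highlighted (new or hot) words to position `slot` and prefix them with `marker`."""
    special = [marker + c for c in candidates if c in highlighted]
    normal = [c for c in candidates if c not in highlighted]
    slot = min(slot, len(normal))  # clamp so short candidate lists still work
    return normal[:slot] + special + normal[slot:]
```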

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method and apparatus for acquiring an entry. A tracking operation is carried out on eyeballs of a user to obtain a region of interest of the user; then text information in the region of interest is acquired; and a word segmentation operation is carried out on the text information to obtain candidate entries, so that at least one candidate entry can be selected to serve as a new word and/or hot word. Text information the user is interested in is extracted from a region focused by the user in the current reading behavior, and an acquisition operation of a candidate entry is carried out by means of the text information, so that a new word and/or hot word can be identified in time on the basis of the text information, and the entry acquisition timeliness can be improved.
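The four stages in the abstract can be wired together as a small pipeline in which each stage is a pluggable function, mirroring the tracking, acquiring, word-cutting, and selecting units described for the apparatus. This is a structural sketch under assumed names (`EntryAcquirer`, `Region`); the lambdas in the usage lines are stubs standing in for real eye tracking, text recognition, segmentation, and dictionary lookups.

```python
from dataclasses import dataclass
from typing import Callable

Region = tuple[float, float, float, float]  # hypothetical (left, top, right, bottom) screen rectangle


@dataclass
class EntryAcquirer:
    """Chains the four stages; each stage is a pluggable callable."""
    track: Callable[[], list[Region]]         # eyeball tracking -> regions of interest
    read_text: Callable[[Region], str]        # region -> text displayed there
    segment: Callable[[str], list[str]]       # text -> candidate entries
    select: Callable[[list[str]], list[str]]  # candidates -> new and/or hot words

    def run(self) -> list[str]:
        candidates: list[str] = []
        for region in self.track():
            candidates.extend(self.segment(self.read_text(region)))
        return self.select(candidates)


# Usage with stub stages:
acquirer = EntryAcquirer(
    track=lambda: [(0, 0, 100, 50)],
    read_text=lambda region: "台风 海燕",
    segment=str.split,
    select=lambda words: [w for w in words if w not in {"台风"}],
)
print(acquirer.run())  # ['海燕']
```

Keeping each stage behind a plain callable lets the same pipeline run entirely on the client, entirely on the server, or split between them, which matches the deployment options the description leaves open.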

Description

Method and apparatus for acquiring entry, computer storage medium and device
This application claims priority to Chinese patent application No. 201410047094.8, filed on February 11, 2014 and entitled "Method and apparatus for acquiring entry".
Technical Field
The present invention relates to input method technology, and in particular to a method and apparatus for acquiring an entry, a computer storage medium and a device.
Background
An input method is an encoding scheme used to enter characters into a terminal. Different languages, countries and regions have many different input methods, for example Sogou Pinyin, Baidu Input and QQ Pinyin. Generally, the client of input method software uses a loaded dictionary (the lexicon) and the word frequencies it contains to present a ranked list of candidate entries to the user, making input easier. In the prior art, to meet users' input needs, a server periodically collects entries and their usage frequency (word frequency) in order to update various specialized dictionaries, for example by recognizing newly appearing entries as new words and adding them to the dictionary, or by recognizing frequently used entries as hot words.
However, in some cases new words and/or hot words emerge in large numbers, for example with the appearance of internet slang such as 酱紫 (for 这样子, "like this"), 表 (for 不要, "don't") and 杯具 (for 悲剧, "tragedy"), or with sudden events such as Typhoon Haiyan. Existing technical solutions cannot recognize these new words and/or hot words in time to update the various specialized dictionaries loaded by the input method (the input method dictionary), which reduces the timeliness of entry acquisition.
Summary of the Invention
Aspects of the present invention provide a method and an apparatus for acquiring an entry, a computer storage medium and a device, so as to improve the timeliness of entry acquisition. One aspect of the present invention provides a method for acquiring an entry, comprising:
tracking the user's eyeballs to obtain the user's region of interest;
acquiring text information within the region of interest;
performing a word segmentation operation on the text information to obtain candidate entries; and
selecting at least one candidate entry as a new word and/or a hot word.
In the aspect above and any possible implementation, an implementation is further provided in which tracking the user's eyeballs to obtain the user's region of interest comprises:
acquiring video information of the eyeballs;
determining a position region of the eyeballs according to the video information;
determining a movable path of the eyeballs according to the video information, and determining a movable region of the eyeballs according to the movable path; and
determining an attention region of the eyeballs according to the position region and the movable region of the eyeballs, as the user's region of interest.
In the aspect above and any possible implementation, an implementation is further provided in which determining the attention region of the eyeballs according to the position region and the movable region of the eyeballs, as the user's region of interest, comprises:
determining the part of the position region of the eyeballs that lies within the movable region of the eyeballs as the attention region of the eyeballs; and
if the attention region of the eyeballs satisfies an attention condition, determining the attention region as the user's region of interest.
In the aspect above and any possible implementation, an implementation is further provided in which the attention condition comprises at least one of an attention duration and an attention frequency.
In the aspect above and any possible implementation, an implementation is further provided in which selecting at least one candidate entry as a new word and/or a hot word comprises:
determining a candidate entry that does not appear in a pre-configured input method dictionary as a new word.
In the aspect above and any possible implementation, an implementation is further provided in which selecting at least one candidate entry as a new word and/or a hot word comprises:
determining a candidate entry that appears in the pre-configured input method dictionary as a candidate hot word; determining a heat value of the candidate hot word according to the word frequency with which it appears; and determining a candidate hot word whose heat value is greater than or equal to a heat threshold as a hot word.
Another aspect of the present invention provides an apparatus for acquiring an entry, comprising:
a tracking unit, configured to track the user's eyeballs to obtain the user's region of interest;
an acquisition unit, configured to acquire text information within the region of interest;
a word segmentation unit, configured to perform a word segmentation operation on the text information to obtain candidate entries; and
a selection unit, configured to select at least one candidate entry as a new word and/or a hot word.
In the aspect above and any possible implementation, an implementation is further provided in which the tracking unit is specifically configured to:
acquire video information of the eyeballs;
determine a position region of the eyeballs according to the video information;
determine a movable path of the eyeballs according to the video information, and determine a movable region of the eyeballs according to the movable path; and
determine an attention region of the eyeballs according to the position region and the movable region of the eyeballs, as the user's region of interest.
In the aspect above and any possible implementation, an implementation is further provided in which the tracking unit is specifically configured to:
determine the part of the position region of the eyeballs that lies within the movable region of the eyeballs as the attention region of the eyeballs; and
if the attention region of the eyeballs satisfies an attention condition, determine the attention region as the user's region of interest.
In the aspect above and any possible implementation, an implementation is further provided in which the attention condition comprises at least one of an attention duration and an attention frequency.
In the aspect above and any possible implementation, an implementation is further provided in which the selection unit is specifically configured to:
determine a candidate entry that does not appear in the pre-configured input method dictionary as a new word.
In the aspect above and any possible implementation, an implementation is further provided in which the selection unit is specifically configured to:
determine a candidate entry that appears in the pre-configured input method dictionary as a candidate hot word; determine a heat value of the candidate hot word according to the word frequency with which it appears; and determine a candidate hot word whose heat value is greater than or equal to the heat threshold as a hot word.
Another aspect of the present invention provides a computer storage medium encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the following operations:
tracking the user's eyeballs to obtain the user's region of interest;
acquiring text information within the region of interest;
performing a word segmentation operation on the text information to obtain candidate entries; and
selecting at least one candidate entry as a new word and/or a hot word.
Another aspect of the present invention provides a device comprising at least one processor, a memory and at least one computer program, the at least one computer program being stored in the memory and executed by the at least one processor, and comprising instructions for performing the following operations:
tracking the user's eyeballs to obtain the user's region of interest;
acquiring text information within the region of interest;
performing a word segmentation operation on the text information to obtain candidate entries; and
selecting at least one candidate entry as a new word and/or a hot word.
It can be seen from the above technical solutions that, in the embodiments of the present invention, the user's eyeballs are tracked to obtain the user's region of interest, the text information within that region is acquired, and word segmentation is performed on the text information to obtain candidate entries, so that at least one candidate entry can be selected as a new word and/or a hot word. Because candidate entries are acquired from text the user is interested in, extracted from the region the user focuses on during the current reading activity, new words and/or hot words can be identified from this text in time, which improves the timeliness of entry acquisition.
In addition, with the technical solutions provided by the present invention, the various specialized dictionaries loaded by the input method (the input method dictionary) can be updated promptly with the identified new words and/or hot words, further improving the accuracy of the input method dictionary.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for acquiring an entry according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an apparatus for acquiring an entry according to another embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, a mobile phone, a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a personal computer (PC), an MP3 player, an MP4 player, and the like.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
FIG. 1 is a schematic flowchart of a method for acquiring an entry according to an embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps.
101. Track the user's eyeballs to obtain the user's region of interest.
Optionally, in one possible implementation of this embodiment, in 101 the user's eyeballs may be tracked on a user interface. The user interface may be a World Wide Web (Web) page displayed by the terminal, or an application document displayed by the terminal, for example an email, a WORD document, a TXT document or a PDF document; the present invention imposes no particular limitation on this.
102. Acquire text information within the region of interest.
103. Perform a word segmentation operation on the text information to obtain candidate entries.
104. Select at least one candidate entry as a new word and/or a hot word.
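Steps 101 to 104 form a simple pipeline. The sketch below is a hypothetical illustration, not the applicant's implementation: the function name `acquire_entries`, the rectangle representation of a region and the toy dictionary are all assumptions. Each step is passed in as a plain function, so a real eye tracker, text extractor or segmenter could be plugged in, on the client or on the server.

```python
from typing import Callable, List, Set, Tuple

# Assumed representation of a screen region: (left, top, right, bottom) in pixels.
Region = Tuple[int, int, int, int]


def acquire_entries(
    track: Callable[[], Region],              # step 101: eye tracking -> region of interest
    read_text: Callable[[Region], str],       # step 102: text inside that region
    segment: Callable[[str], List[str]],      # step 103: word segmentation -> candidate entries
    select: Callable[[List[str]], Set[str]],  # step 104: keep new words and/or hot words
) -> Set[str]:
    """Chain steps 101-104 (hypothetical helper). Each step is injected,
    so any step could run on the client, on the server, or be split."""
    region = track()
    text = read_text(region)
    candidates = segment(text)
    return select(candidates)


if __name__ == "__main__":
    ime_dictionary = {"台风", "登陆"}  # toy input method dictionary
    found = acquire_entries(
        track=lambda: (0, 0, 100, 20),
        read_text=lambda region: "台风海燕登陆",
        segment=lambda text: ["台风", "海燕", "登陆"],
        select=lambda cands: {c for c in cands if c not in ime_dictionary},
    )
    print(found)  # {'海燕'}
```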
It should be noted that steps 101 to 104 may be executed by a recognition apparatus, which may be located in a local client for offline recognition, or in a server on the network side for online recognition; alternatively, some functions may be located in the client and others in the server for combined offline and online recognition. This embodiment imposes no limitation on this.
It can be understood that the client may be an input method application installed on the terminal, or a web page in a browser; any concrete form that can acquire entries and thereby identify new words and/or hot words is acceptable, and this embodiment imposes no limitation on this.
In this way, the user's eyeballs are tracked to obtain the user's region of interest, the text information within that region is acquired, and word segmentation is performed on the text information to obtain candidate entries, so that at least one candidate entry can be selected as a new word and/or a hot word. Because candidate entries are acquired from text the user is interested in, extracted from the region the user focuses on during the current reading activity, new words and/or hot words can be identified from this text in time, which improves the timeliness of entry acquisition.
In addition, with the technical solution provided by the present invention, the various specialized dictionaries loaded by the input method (the input method dictionary) can be updated promptly with the identified new words and/or hot words, further improving the accuracy of the input method dictionary.
Optionally, in one possible implementation of this embodiment, in 101 video information of the eyeballs may be acquired. The video information may consist of a number of image frames and may be captured by a camera. A position region of the eyeballs is then determined from the video information. Next, a movable path of the eyeballs is determined from the video information, and a movable region of the eyeballs is determined from the movable path. Because the angular range through which a human eyeball can move lies within a fixed interval, the movable path of the eyeballs can be determined from the video information; this path may be an exact value or a range of motion. From the movable path, the area reachable along it can then be calculated, and this reachable area is the movable region of the eyeballs. Finally, the attention region of the eyeballs is determined from the position region and the movable region of the eyeballs, and serves as the user's region of interest.
Specifically, the part of the position region of the eyeballs that lies within the movable region of the eyeballs may be determined as the attention region of the eyeballs. If the attention region satisfies an attention condition, it is determined to be the user's region of interest.
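If both regions are treated as axis-aligned rectangles in screen coordinates (an assumption; the text does not fix their shape), "the part of the position region that lies within the movable region" is simply the rectangle intersection. A minimal sketch with hypothetical names:

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle in screen coordinates (assumed representation)."""
    left: float
    top: float
    right: float
    bottom: float


def attention_region(position: Rect, movable: Rect) -> Optional[Rect]:
    """Part of the eyeball position region lying inside the movable region;
    None when the two regions do not overlap."""
    left = max(position.left, movable.left)
    top = max(position.top, movable.top)
    right = min(position.right, movable.right)
    bottom = min(position.bottom, movable.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


if __name__ == "__main__":
    print(attention_region(Rect(50, 50, 150, 150), Rect(0, 0, 100, 100)))
    # Rect(left=50, top=50, right=100, bottom=100)
```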
The attention condition may include, but is not limited to, at least one of an attention duration and an attention frequency.
For example, if the attention region of the eyeballs stays within the movable region of the eyeballs for 3 seconds or longer, the attention region may be determined to be the user's region of interest.
As another example, if the attention region of the eyeballs stays within the movable region of the eyeballs 2 or more times, or at a rate of 2 or more times per minute, the attention region may be determined to be the user's region of interest.
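Both example conditions can be evaluated from a time-ordered stream of gaze samples, each recording whether the gaze was inside the attention region. The function name and the sample format below are hypothetical; the defaults of 3 seconds and 2 visits are the example values given above.

```python
from typing import List, Optional, Tuple


def meets_attention_condition(
    samples: List[Tuple[float, bool]],  # (timestamp in seconds, gaze inside region?), time-ordered
    min_dwell_s: float = 3.0,           # attention-duration condition
    min_visits: int = 2,                # attention-frequency condition
) -> bool:
    """True if the longest continuous stay reaches min_dwell_s
    or the region is entered at least min_visits times."""
    longest = 0.0
    visits = 0
    start: Optional[float] = None  # timestamp at which the current stay began
    prev_inside = False
    for t, inside in samples:
        if inside and not prev_inside:  # gaze has just entered the region
            visits += 1
            start = t
        if inside and start is not None:
            longest = max(longest, t - start)
        prev_inside = inside
    return longest >= min_dwell_s or visits >= min_visits


if __name__ == "__main__":
    print(meets_attention_condition([(0.0, True), (1.0, True), (3.2, True), (4.0, False)]))  # True
```

The per-minute variant could be checked in the same loop by dividing `visits` by the sampled duration in minutes.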
Optionally, in one possible implementation of this embodiment, in 102 any of various existing text recognition methods may be used to acquire the text information within the region of interest; the present invention imposes no particular limitation on this.
For example, a partial screenshot of the area enclosed by the region of interest on a user interface containing text may be captured, and text recognition may then be performed on that screenshot to obtain the text information within the region of interest.
As another example, position information of the region of interest may be acquired, and the corresponding text information determined from that position information, as the text information within the region of interest.
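For the position-based variant, assume (hypothetically) that the renderer exposes each piece of text with its on-screen bounding box, as a browser DOM or document viewer can. The text within the region of interest is then the text of the boxes whose centre falls inside it, read top to bottom and left to right. A minimal sketch:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom); assumed layout format


def text_in_region(boxes: List[Tuple[str, Box]], roi: Box) -> str:
    """Concatenate, in reading order, the text of layout boxes whose centre lies in the ROI."""
    left, top, right, bottom = roi
    picked = []
    for text, (bl, bt, br, bb) in boxes:
        cx, cy = (bl + br) / 2, (bt + bb) / 2
        if left <= cx <= right and top <= cy <= bottom:
            picked.append((bt, bl, text))
    picked.sort()  # top-to-bottom, then left-to-right
    return "".join(text for _, _, text in picked)


if __name__ == "__main__":
    layout = [("台风", (0, 0, 40, 20)), ("海燕", (40, 0, 80, 20)), ("登陆", (200, 0, 240, 20))]
    print(text_in_region(layout, (0, 0, 100, 30)))  # 台风海燕
```

Joining with an empty string suits Chinese text; for space-delimited languages a space separator would be used instead.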
For a detailed description of text recognition, refer to the related prior art; details are not repeated here.
Optionally, in one possible implementation of this embodiment, in 103 any of various existing word segmentation methods may be used to segment the acquired text information, for example a string-matching-based method, an understanding-based method, or a statistics-based method; the present invention imposes no particular limitation on this. For a detailed description of word segmentation methods, refer to the related prior art; details are not repeated here.
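As an illustration of the string-matching family, forward maximum matching scans the text left to right and takes the longest dictionary word at each position. This textbook technique is chosen for the example and is not necessarily the segmenter the applicant uses; the dictionary and the `max_len` parameter are assumptions.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word (up to max_len characters), else a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


if __name__ == "__main__":
    print(forward_max_match("台风海燕登陆", {"台风", "海燕", "登陆"}))  # ['台风', '海燕', '登陆']
    print(forward_max_match("这是杯具", {"这是"}))                      # ['这是', '杯', '具']
```

The second call shows a limitation relevant to this application: pure dictionary matching breaks an unseen word such as 杯具 into single characters, so for new-word discovery a statistical method, or a post-step that merges runs of single characters, would likely be needed.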
Optionally, in one possible implementation of this embodiment, in 104 a candidate entry that does not appear in a pre-configured input method dictionary may be determined as a new word.
Specifically, each candidate entry obtained by the word segmentation operation may be examined. If a candidate entry does not appear in the pre-configured input method dictionary, it may be determined to be a new word.
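The new-word rule is a membership test against the dictionary. A minimal sketch, with the input method dictionary modelled (as an assumption) as an in-memory set and a hypothetical function name:

```python
from typing import Iterable, List, Set


def select_new_words(candidates: Iterable[str], ime_dictionary: Set[str]) -> List[str]:
    """Return candidate entries absent from the input method dictionary,
    de-duplicated and in first-seen order."""
    seen: Set[str] = set()
    new_words: List[str] = []
    for word in candidates:
        if word not in ime_dictionary and word not in seen:
            seen.add(word)
            new_words.append(word)
    return new_words


if __name__ == "__main__":
    print(select_new_words(["台风", "海燕", "台风", "酱紫"], {"台风"}))  # ['海燕', '酱紫']
```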
It should be noted that the pre-configured input method dictionary may be located on a server on the network side or on the local client; this embodiment imposes no particular limitation on this.
Optionally, in one possible implementation of this embodiment, in 104 a candidate entry that appears in the pre-configured input method dictionary may be determined as a candidate hot word. A heat value of the candidate hot word is then determined according to the word frequency with which it appears, and a candidate hot word whose heat value is greater than or equal to a heat threshold is determined to be a hot word.
Specifically, each candidate entry obtained by the word segmentation operation may be examined. If a candidate entry already appears in the pre-configured input method dictionary, it may be marked as a candidate hot word. Then the word frequency of the candidate hot word within a specified time range may be obtained from the input method dictionary, and the heat value of the candidate hot word determined from that word frequency. Finally, a candidate hot word whose heat value is greater than or equal to the heat threshold may be determined to be a hot word.
For example, the heat value of a candidate hot word may be determined by the following formula:
heat value of the candidate hot word = (average score of all candidate hot words * average word frequency of all candidate hot words + score of the candidate hot word * total word frequency of the candidate hot word over the total statistical time) / (average word frequency of all candidate hot words + total word frequency of the candidate hot word over the total statistical time),
where
score of the candidate hot word = word frequency of the candidate hot word in the most recent statistical unit / total word frequency of the candidate hot word over the total statistical time.
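The inputs to the heat formula, the per-unit word frequency and the score, can be derived from a log of when a candidate hot word was seen. A sketch assuming one-day statistical units; the function names and the occurrence log are hypothetical, and the counts are not taken from the patent:

```python
from collections import Counter
from datetime import date
from typing import Iterable, List


def daily_counts(occurrences: Iterable[date], days: List[date]) -> List[int]:
    """Word frequency of one candidate hot word in each statistical unit (here: one day)."""
    per_day = Counter(occurrences)
    return [per_day[d] for d in days]


def hot_word_score(counts: List[int]) -> float:
    """Score = frequency in the most recent unit / total frequency over the whole period."""
    total = sum(counts)
    return counts[-1] / total if total else 0.0


if __name__ == "__main__":
    days = [date(2013, 12, 18), date(2013, 12, 19)]
    log = [days[0], days[1], days[1], days[1]]  # hypothetical: 1 hit, then 3 hits
    counts = daily_counts(log, days)
    print(counts, hot_word_score(counts))  # [1, 3] 0.75
```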
The above implementation process is described in detail below with a specific example. Assume there are four candidate hot words, namely candidate hot words A, B, C and D, that the statistical unit is one day, and that the total statistical time is two days. Their word frequencies on 2013-12-18 and 2013-12-19 are shown in the following table:
[Table: word frequencies of candidate hot words A, B, C and D on 2013-12-18 and 2013-12-19 (image not reproduced)]
From the data in the table, the scores of the four candidate hot words, calculated from the historical data of 2013-12-18 and 2013-12-19, are 0.74, 0.52, 0.8 and 0.82, respectively. The formula can be read as follows: imagine that 320 people have each already given every word a score of 0.72, and that candidate hot word A receives scores from an additional 135 people, each of whom gives it 0.74. According to the formula above, the heat values of the four candidate hot words are:
A: (0.72*320 + 0.74*135) / (320 + 135) ≈ 0.726
B: (0.72*320 + 0.52*290) / (320 + 290) ≈ 0.625
C: (0.72*320 + 0.8*5) / (320 + 5) ≈ 0.721
D: (0.72*320 + 0.82*850) / (320 + 850) ≈ 0.793
Sorted by heat value in descending order:
D > A > C > B
As can be seen, candidate hot word D appears most frequently and therefore ranks highest. If its heat value is not less than the preset heat threshold, candidate hot word D may be determined to be a hot word.
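The heat formula can be checked against this example. The sketch assumes per-word total frequencies of 135, 290, 5 and 850, which are the values used in the calculations above (the table itself is not reproduced); averaging them gives the 320, and averaging the four scores gives the 0.72, that appear in those calculations. The function names and the threshold of 0.75 are arbitrary choices for illustration.

```python
from typing import Dict, List, Tuple


def heat_values(stats: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """stats maps each candidate hot word to (score, total frequency over the whole period).

    heat = (avg_score * avg_freq + score * freq) / (avg_freq + freq): the word's own
    score is pulled toward the average score, with the average frequency as prior weight.
    """
    n = len(stats)
    avg_score = sum(score for score, _ in stats.values()) / n
    avg_freq = sum(freq for _, freq in stats.values()) / n
    return {
        word: (avg_score * avg_freq + score * freq) / (avg_freq + freq)
        for word, (score, freq) in stats.items()
    }


def select_hot_words(stats: Dict[str, Tuple[float, int]], threshold: float) -> List[str]:
    """Rank candidates by heat value (highest first) and keep those at or above the threshold."""
    heat = heat_values(stats)
    ranked = sorted(heat, key=lambda w: heat[w], reverse=True)
    return [w for w in ranked if heat[w] >= threshold]


if __name__ == "__main__":
    example = {"A": (0.74, 135), "B": (0.52, 290), "C": (0.80, 5), "D": (0.82, 850)}
    for word, value in heat_values(example).items():
        print(word, round(value, 3))  # A 0.726, B 0.625, C 0.721, D 0.793
    print(select_hot_words(example, threshold=0.75))  # ['D']
```

Writing the heat value as a weighted average makes the smoothing visible: C has a higher raw score (0.80) than A but only 5 occurrences, so its heat value stays close to the 0.72 average and it ranks below A, whereas D's 850 occurrences let its own score dominate.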
It can be understood that, after new words and/or hot words are identified, these entries may further be used to update the local input method dictionary or the input method dictionary in the cloud (network side); this embodiment imposes no particular limitation on this. Specifically, at least one of the collected Ngram information and Npos information may be used to update the local or cloud input method dictionary. For a detailed description, refer to the related prior art; details are not repeated here.
It can be understood that the technical solution provided in this embodiment can not only identify new words and/or hot words for a single user, but can also aggregate and analyze the recognition results of multiple users to obtain new words and/or hot words for those users.
Optionally, in one possible implementation of this embodiment, after 104 the selected new words and/or hot words may further be displayed in a special way. For example, an icon may be added to these new words and/or hot words, or they may be displayed at a special candidate position.
In this embodiment, the user's eyeballs are tracked to obtain the user's region of interest, the text information within that region is acquired, and word segmentation is performed on the text information to obtain candidate entries, so that at least one candidate entry can be selected as a new word and/or a hot word. Because candidate entries are acquired from text the user is interested in, extracted from the region the user focuses on during the current reading activity, new words and/or hot words can be identified from this text in time, which improves the timeliness of entry acquisition.
In addition, with the technical solution provided by the present invention, the various specialized dictionaries loaded by the input method (the input method dictionary) can be updated promptly with the identified new words and/or hot words, further improving the accuracy of the input method dictionary.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions; however, a person skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
FIG. 2 is a schematic structural diagram of an apparatus for acquiring an entry according to another embodiment of the present invention. As shown in FIG. 2, the apparatus of this embodiment may include a tracking unit 21, an acquisition unit 22, a word segmentation unit 23 and a selection unit 24. The tracking unit 21 is configured to track the user's eyeballs to obtain the user's region of interest; the acquisition unit 22 is configured to acquire text information within the region of interest; the word segmentation unit 23 is configured to perform a word segmentation operation on the text information to obtain candidate entries; and the selection unit 24 is configured to select at least one candidate entry as a new word and/or a hot word.
It should be noted that the entry acquisition apparatus provided in this embodiment may be located in a local client for offline recognition, in a network-side server for online recognition, or partly in the client and partly in the server for combined offline and online recognition; this embodiment does not limit this.
It can be understood that the client may be an input method application installed on a terminal, or a web page in a browser; any concrete form that can acquire entries and thereby identify new words and/or hot words is acceptable, and this embodiment does not limit this.
In this way, the tracking unit tracks the user's eyeballs to obtain the user's region of interest, the obtaining unit obtains the text information in that region, and the word segmentation unit segments the text information into candidate entries, so that the selecting unit can select at least one candidate entry as a new word and/or a hot word. Because the candidate entries are acquired from text the user is interested in, extracted from the region the user attends to during the current reading behavior, new words and/or hot words can be identified from this text in a timely manner, which improves the timeliness of entry acquisition.
In addition, with the technical solution provided by the present invention, the identified new words and/or hot words can be used promptly to update the various specialized dictionaries loaded by the input method (that is, the input method dictionaries), which further improves the accuracy of the input method's dictionaries.
Optionally, in one possible implementation of this embodiment, the tracking unit 21 may track the user's eyeballs on a user interface. The user interface may be a World Wide Web (Web) page displayed by the terminal, or an application document displayed by the terminal, such as an email, a WORD document, a TXT document, or a PDF document; the present invention does not particularly limit this.
Optionally, in one possible implementation of this embodiment, the tracking unit 21 may be configured to: obtain video information of the eyeball; determine the position area of the eyeball according to the video information; determine the movable path of the eyeball according to the video information, and determine the movable area of the eyeball according to the movable path; and determine the attention area of the eyeball according to the position area and the movable area of the eyeball, as the user's region of interest.
The video information of the eyeball may consist of a number of image frames and may be captured with a camera. Because the range of motion of the human eyeball lies within a fixed interval, the movable path of the eyeball can be determined from the video information. The movable path may be an exact value or a range of motion. From the movable path, the reachable area can then be calculated; this reachable area is the movable area of the eyeball.
Specifically, the tracking unit 21 may determine the part of the eyeball's position area that lies within the eyeball's movable area as the eyeball's attention area; if the attention area satisfies an attention condition, the tracking unit 21 determines the attention area to be the user's region of interest.
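The text does not specify the geometry, so the following sketch assumes that both the position area and the movable area are axis-aligned rectangles in screen coordinates, and approximates the reachable (movable) area by the bounding box of the gaze points tracked across video frames. Under those assumptions, the attention area is simply the rectangle intersection. The function names and sample coordinates are illustrative only.

```python
from typing import Iterable, Optional, Tuple

# (left, top, right, bottom) in screen coordinates; an assumed representation.
Box = Tuple[float, float, float, float]


def movable_area(gaze_points: Iterable[Tuple[float, float]]) -> Box:
    """Approximate the reachable area as the bounding box of the tracked path."""
    xs, ys = zip(*gaze_points)
    return (min(xs), min(ys), max(xs), max(ys))


def attention_area(position: Box, movable: Box) -> Optional[Box]:
    """Part of the position area lying inside the movable area, or None if disjoint."""
    left = max(position[0], movable[0])
    top = max(position[1], movable[1])
    right = min(position[2], movable[2])
    bottom = min(position[3], movable[3])
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)


path = [(120, 300), (480, 310), (470, 360), (130, 355)]  # sample gaze positions
area = movable_area(path)                               # (120, 300, 480, 360)
print(attention_area((400, 320, 600, 340), area))       # (400, 320, 480, 340)
```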
The attention condition may include, but is not limited to, at least one of attention time and attention frequency. For example, if the eyeball's attention area stays within the eyeball's movable area for 3 seconds or longer, the tracking unit 21 may determine the attention area to be the user's region of interest.
As another example, if the eyeball's attention area stays within the eyeball's movable area 2 or more times, or 2 or more times per minute, the tracking unit 21 may determine the attention area to be the user's region of interest.
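A minimal check of the attention condition, using the example thresholds above (3 seconds; 2 visits; 2 visits per minute). The text does not say whether the 3 seconds must be one continuous stay, so this sketch assumes cumulative dwell time; the constant names and the `(enter, leave)` visit format are hypothetical.

```python
from typing import List, Optional, Tuple

# Example thresholds from the text; treat them as tunable assumptions.
MIN_DWELL_SECONDS = 3.0      # attention time
MIN_VISITS = 2               # attention frequency, as an absolute count
MIN_VISITS_PER_MINUTE = 2.0  # attention frequency, as a rate


def meets_attention_condition(
    visits: List[Tuple[float, float]],
    window_seconds: Optional[float] = None,
) -> bool:
    """Check the attention condition for one attention area.

    visits: (enter_time, leave_time) pairs in seconds, one per stay of the
    attention area inside the movable area.
    window_seconds: if given, frequency is measured as visits per minute over
    this observation window instead of as an absolute count.
    """
    total_dwell = sum(leave - enter for enter, leave in visits)
    if total_dwell >= MIN_DWELL_SECONDS:
        return True
    if window_seconds:
        return len(visits) / (window_seconds / 60.0) >= MIN_VISITS_PER_MINUTE
    return len(visits) >= MIN_VISITS


print(meets_attention_condition([(0.0, 3.5)]))                # True (dwell time)
print(meets_attention_condition([(0.0, 0.5), (10.0, 10.5)]))  # True (2 visits)
print(meets_attention_condition([(0.0, 1.0)]))                # False
```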
Optionally, in one possible implementation of this embodiment, the obtaining unit 22 may use any of various text recognition methods in the prior art to obtain the text information in the region of interest; the present invention does not particularly limit this.
For example, the obtaining unit 22 may capture a partial screenshot of the portion of the text-bearing user interface enclosed by the region of interest, and then perform text recognition on this partial screenshot to obtain the text information in the region of interest.
As another example, the obtaining unit 22 may obtain the position information of the region of interest and determine, according to this position information, the corresponding text information as the text information in the region of interest.
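The position-based variant can be sketched as follows, assuming the user interface can report each text run together with its on-screen bounding box (for example from a page layout or document renderer). The `layout` list is made-up sample data, and the screenshot-plus-recognition variant is not shown. Runs whose boxes overlap the region of interest are concatenated in layout order.

```python
from typing import List, Tuple

# (left, top, right, bottom) in screen coordinates; an assumed representation.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def text_in_region(layout: List[Tuple[str, Box]], region: Box) -> str:
    """Concatenate, in layout order, the text runs whose boxes overlap the region."""
    return "".join(text for text, box in layout if overlaps(box, region))


# Hypothetical layout of one line of Chinese text, one run per word.
layout = [
    ("眼球", (0, 0, 40, 20)),
    ("跟踪", (40, 0, 80, 20)),
    ("技术", (80, 0, 120, 20)),
]
print(text_in_region(layout, (30, 5, 90, 15)))  # 眼球跟踪技术
```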
For a detailed description of text recognition, refer to the related content in the prior art; details are not repeated here.
Optionally, in one possible implementation of this embodiment, the word segmentation unit 23 may use any of various word segmentation methods in the prior art to segment the obtained text information, for example a string-matching-based method, an understanding-based method, or a statistics-based method; the present invention does not particularly limit this. For a detailed description of word segmentation methods, refer to the related content in the prior art; details are not repeated here.
Optionally, in one possible implementation of this embodiment, the selecting unit 24 may be configured to determine candidate entries that do not appear in a pre-configured input method dictionary as new words.
Specifically, the selecting unit 24 may take any candidate entry obtained by the word segmentation operation; if that candidate entry does not appear in the pre-configured input method dictionary, it may be determined to be a new word.
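As one concrete instance of the string-matching family mentioned above, the sketch below uses greedy forward maximum matching and then applies the new-word rule. It assumes the segmentation lexicon is separate from, and larger than, the input method dictionary (otherwise a dictionary-driven segmenter could never emit an out-of-dictionary word); the lexicons, the `max_len` of 4 and the choice to ignore single-character tokens are all illustrative assumptions.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching; unmatched characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def new_words(candidates: List[str], ime_dictionary: Set[str]) -> List[str]:
    """Multi-character candidates absent from the input method dictionary, deduplicated."""
    found = []
    for word in candidates:
        if len(word) > 1 and word not in ime_dictionary and word not in found:
            found.append(word)
    return found


seg_lexicon = {"眼球", "跟踪", "输入法", "大数据"}  # hypothetical segmentation lexicon
ime_dictionary = {"眼球", "跟踪", "输入法"}         # hypothetical input method dictionary
tokens = forward_max_match("眼球跟踪大数据", seg_lexicon)
print(tokens)                             # ['眼球', '跟踪', '大数据']
print(new_words(tokens, ime_dictionary))  # ['大数据']
```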
It should be noted that the pre-configured input method dictionary may be configured on a network-side server or on the local client; this embodiment does not particularly limit this.
Optionally, in one possible implementation of this embodiment, the selecting unit 24 may be configured to: determine candidate entries that appear in the pre-configured input method dictionary as candidate hot words; determine the heat value of each candidate hot word according to the word frequency with which it appears; and determine candidate hot words whose heat value is greater than or equal to a heat threshold as hot words.
Specifically, the selecting unit 24 may take any candidate entry obtained by the word segmentation operation; if that candidate entry already appears in the pre-configured input method dictionary, it may be marked as a candidate hot word. Then, according to the input method dictionary, the word frequency of the candidate hot word within a specified time range may be obtained, and the heat value of the candidate hot word determined from this frequency. Finally, candidate hot words whose heat value is greater than or equal to the heat threshold may be determined to be hot words.
For example, the heat value of a candidate hot word may be determined according to the following formula:
heat value of the candidate hot word = (average score of all candidate hot words × average word frequency of all candidate hot words + score of the candidate hot word × total word frequency of the candidate hot word over the total statistical period) / (average word frequency of all candidate hot words + total word frequency of the candidate hot word over the total statistical period)
where:
score of the candidate hot word = word frequency of the candidate hot word in the most recent unit statistical period / total word frequency of the candidate hot word over the total statistical period.
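The heat formula can be computed directly. In the sketch below, each candidate hot word carries two counts, its frequency in the most recent unit period and its total frequency over the whole statistical period, both assumed to come from the input method dictionary's statistics. The averages are taken over all candidate hot words; the sample counts and the threshold of 0.3 are made up for illustration.

```python
from typing import Dict, List, Tuple

# candidate hot word -> (frequency in the most recent unit period,
#                        total frequency over the whole statistical period)
Counts = Dict[str, Tuple[int, int]]


def heat_values(counts: Counts) -> Dict[str, float]:
    """Heat value of every candidate hot word, following the formula above."""
    # score = recent-period frequency / total frequency (zero-total words are skipped)
    scores = {w: recent / total for w, (recent, total) in counts.items() if total > 0}
    if not scores:
        return {}
    avg_score = sum(scores.values()) / len(scores)
    avg_freq = sum(counts[w][1] for w in scores) / len(scores)
    return {
        w: (avg_score * avg_freq + scores[w] * counts[w][1]) / (avg_freq + counts[w][1])
        for w in scores
    }


def hot_words(counts: Counts, threshold: float) -> List[str]:
    """Candidate hot words whose heat value is >= the heat threshold."""
    return sorted(w for w, h in heat_values(counts).items() if h >= threshold)


counts = {"大数据": (30, 40), "输入法": (5, 100), "眼动": (2, 10)}
print({w: round(h, 3) for w, h in heat_values(counts).items()})
# {'大数据': 0.519, '输入法': 0.144, '眼动': 0.311}
print(hot_words(counts, threshold=0.3))  # ['大数据', '眼动']
```

Read this way, the formula has the shape of a weighted (Bayesian-style) average: each word's own score is pulled toward the average score, and the pull is stronger for words with a small total frequency, so a handful of recent occurrences of a rare word does not by itself make it a hot word.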
For details, refer to the related content in the embodiment corresponding to FIG. 1; details are not repeated here.
It can be understood that, after identifying new words and/or hot words, the entry acquisition apparatus provided in this embodiment may further use these entries to update the local input method dictionary or the input method dictionary in the cloud (network side); this embodiment does not particularly limit this. Specifically, the local or cloud input method dictionary may be updated using at least one of the collected Ngram information and Npos information. For details, refer to the related content in the prior art; details are not repeated here.
It can be understood that the technical solution provided in this embodiment can not only identify new words and/or hot words for a single user, but can also consolidate and analyze the recognition results of multiple users to obtain new words and/or hot words for those users as a group.
Optionally, in one possible implementation of this embodiment, the entry acquisition apparatus may further give the selected new words and/or hot words a special presentation. For example, an icon may be added to these new words and/or hot words, or they may be displayed in a special candidate position.
In this embodiment, the tracking unit tracks the user's eyeballs to obtain the user's region of interest, the obtaining unit obtains the text information in that region, and the word segmentation unit segments the text information into candidate entries, so that the selecting unit can select at least one candidate entry as a new word and/or a hot word. Because the candidate entries are acquired from text the user is interested in, extracted from the region the user attends to during the current reading behavior, new words and/or hot words can be identified from this text in a timely manner, which improves the timeliness of entry acquisition.
In addition, with the technical solution provided by the present invention, the identified new words and/or hot words can be used promptly to update the various specialized dictionaries loaded by the input method (that is, the input method dictionaries), which further improves the accuracy of the input method's dictionaries.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
If the integrated unit is implemented in the form of a software functional unit, it may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for acquiring entries, comprising:
performing a tracking operation on a user's eyeballs to obtain a region of interest of the user;
obtaining text information in the region of interest;
performing a word segmentation operation on the text information to obtain candidate entries; and
selecting at least one candidate entry as a new word and/or a hot word.
2. The method according to claim 1, wherein performing the tracking operation on the user's eyeballs to obtain the region of interest of the user comprises:
obtaining video information of the eyeball;
determining a position area of the eyeball according to the video information;
determining a movable path of the eyeball according to the video information, and determining a movable area of the eyeball according to the movable path; and
determining an attention area of the eyeball according to the position area of the eyeball and the movable area of the eyeball, as the region of interest of the user.
3. The method according to claim 2, wherein determining the attention area of the eyeball according to the position of the eyeball and the movable area of the eyeball, as the region of interest of the user, comprises:
determining the part of the position area of the eyeball that lies within the movable area of the eyeball as the attention area of the eyeball; and
if the attention area of the eyeball satisfies an attention condition, determining the attention area of the eyeball to be the region of interest of the user.
4. The method according to claim 3, wherein the attention condition comprises at least one of attention time and attention frequency.
5. The method according to any one of claims 1 to 4, wherein selecting at least one candidate entry as a new word and/or a hot word comprises:
determining candidate entries that do not appear in a pre-configured input method dictionary as new words.
6. The method according to any one of claims 1 to 4, wherein selecting at least one candidate entry as a new word and/or a hot word comprises:
determining candidate entries that appear in a pre-configured input method dictionary as candidate hot words;
determining a heat value of each candidate hot word according to the word frequency with which the candidate hot word appears; and
determining candidate hot words whose heat value is greater than or equal to a heat threshold as hot words.
7. An apparatus for acquiring entries, comprising:
a tracking unit, configured to perform a tracking operation on a user's eyeballs to obtain a region of interest of the user;
an obtaining unit, configured to obtain text information in the region of interest;
a word segmentation unit, configured to perform a word segmentation operation on the text information to obtain candidate entries; and
a selecting unit, configured to select at least one candidate entry as a new word and/or a hot word.
8. The apparatus according to claim 7, wherein the tracking unit is specifically configured to:
obtain video information of the eyeball;
determine a position area of the eyeball according to the video information;
determine a movable path of the eyeball according to the video information, and determine a movable area of the eyeball according to the movable path; and
determine an attention area of the eyeball according to the position area of the eyeball and the movable area of the eyeball, as the region of interest of the user.
9. The apparatus according to claim 8, wherein the tracking unit is specifically configured to:
determine the part of the position area of the eyeball that lies within the movable area of the eyeball as the attention area of the eyeball; and
if the attention area of the eyeball satisfies an attention condition, determine the attention area of the eyeball to be the region of interest of the user.
10. The apparatus according to claim 9, wherein the attention condition comprises at least one of attention time and attention frequency.
11. The apparatus according to any one of claims 7 to 10, wherein the selecting unit is specifically configured to:
determine candidate entries that do not appear in a pre-configured input method dictionary as new words.
12. The apparatus according to any one of claims 7 to 10, wherein the selecting unit is specifically configured to:
determine candidate entries that appear in a pre-configured input method dictionary as candidate hot words;
determine a heat value of each candidate hot word according to the word frequency with which the candidate hot word appears; and
determine candidate hot words whose heat value is greater than or equal to a heat threshold as hot words.
13. A computer storage medium encoded with a computer program, wherein the program, when executed by one or more computers, causes the one or more computers to perform the following operations:
performing a tracking operation on a user's eyeballs to obtain a region of interest of the user;
obtaining text information in the region of interest;
performing a word segmentation operation on the text information to obtain candidate entries; and
selecting at least one candidate entry as a new word and/or a hot word.
14. A device, comprising at least one processor, a memory, and at least one computer program, wherein the at least one computer program is stored in the memory and executed by the at least one processor, and the computer program comprises instructions for performing the following operations:
performing a tracking operation on a user's eyeballs to obtain a region of interest of the user;
obtaining text information in the region of interest;
performing a word segmentation operation on the text information to obtain candidate entries; and
selecting at least one candidate entry as a new word and/or a hot word.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410047094.8 2014-02-11
CN201410047094.8A CN103823849A (en) 2014-02-11 2014-02-11 Method and device for acquiring entries

Publications (1)

Publication Number Publication Date
WO2015120713A1 true WO2015120713A1 (en) 2015-08-20

Family

ID=50758913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/085481 WO2015120713A1 (en) 2014-02-11 2014-08-29 Method and apparatus for acquiring entry, computer storage medium and device

Country Status (2)

Country Link
CN (1) CN103823849A (en)
WO (1) WO2015120713A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107359A1 (en) * 2016-10-18 2018-04-19 Smartisan Digital Co., Ltd. Text processing method and device

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823849A (en) * 2014-02-11 2014-05-28 百度在线网络技术(北京)有限公司 Method and device for acquiring entries
CN105335416B (en) * 2014-08-05 2018-11-02 佳能株式会社 Method for extracting content, contents extraction device and the system for contents extraction
CN104484453B (en) * 2014-12-30 2018-01-26 北京元心科技有限公司 Determine the method and device of Webpage hot spot region
CN106257378A (en) * 2016-07-19 2016-12-28 北京新美互通科技有限公司 A kind of emoticon input method and device
CN105677018A (en) * 2015-12-28 2016-06-15 宇龙计算机通信科技(深圳)有限公司 Screen capture method and apparatus, and electronic device
CN108108017B (en) * 2017-12-11 2021-03-23 维沃移动通信有限公司 Search information processing method and mobile terminal
CN109325223B (en) * 2018-07-24 2023-08-25 阿里巴巴(中国)有限公司 Article recommendation method and device and electronic equipment
CN110377701B (en) * 2019-07-02 2022-02-11 北京奇艺世纪科技有限公司 Hot word processing method and device, electronic equipment and storage medium
CN112748809A (en) * 2019-10-16 2021-05-04 北京搜狗科技发展有限公司 Input method entry display method and device
CN111626035B (en) * 2020-04-08 2022-09-02 华为技术有限公司 Layout analysis method and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101364220A (en) * 2007-11-21 2009-02-11 上海埃帕信息科技有限公司 Method for generating word frequency database based on user personality
CN101567004A (en) * 2009-02-06 2009-10-28 浙江大学 English text automatic abstracting method based on eye tracking
CN102163377A (en) * 2010-02-24 2011-08-24 英特尔公司 Facial tracking electronic reader
CN103336576A (en) * 2013-06-28 2013-10-02 优视科技有限公司 Method and device for operating browser based on eye-movement tracking
CN103345305A (en) * 2013-07-22 2013-10-09 百度在线网络技术(北京)有限公司 Method and device for controlling candidate words of mobile terminal input method and mobile terminal
CN103823849A (en) * 2014-02-11 2014-05-28 百度在线网络技术(北京)有限公司 Method and device for acquiring entries

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101119334A (en) * 2007-09-21 2008-02-06 腾讯科技(深圳)有限公司 Method, system and equipment for obtaining neology
CN103300815B (en) * 2012-03-15 2015-05-13 凹凸电子(武汉)有限公司 Eyeball focus determination method, device and system
CN103440038B (en) * 2013-08-28 2016-06-15 中国人民大学 A kind of information acquisition system based on eye recognition and application thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107359A1 (en) * 2016-10-18 2018-04-19 Smartisan Digital Co., Ltd. Text processing method and device
US10489047B2 (en) * 2016-10-18 2019-11-26 Beijing Bytedance Network Technology Co Ltd. Text processing method and device

Also Published As

Publication number Publication date
CN103823849A (en) 2014-05-28

Similar Documents

Publication Publication Date Title
WO2015120713A1 (en) Method and apparatus for acquiring entry, computer storage medium and device
US9716765B2 (en) Information push method and apparatus
JP6114403B2 (en) Method and apparatus for providing input candidate item corresponding to input character string
KR102197364B1 (en) Mobile video search
US9965569B2 (en) Truncated autosuggest on a touchscreen computing device
CN105183761B (en) Sensitive word replacing method and device
US20140067818A1 (en) Pushing specific content to a predetermined webpage
CN106095845B (en) Text classification method and device
EP3872652B1 (en) Method and apparatus for processing video, electronic device, medium and product
CN109325223B (en) Article recommendation method and device and electronic equipment
US20140184514A1 (en) Input processing method and apparatus
US10002610B2 (en) Presentation supporting device, presentation supporting method, and computer-readable recording medium
WO2022042401A1 (en) Multimedia content publishing method and apparatus, electronic device, and storage medium
EP2828769A1 (en) Method and system for predicting words in a message
US11080348B2 (en) System and method for user-oriented topic selection and browsing
KR20150048751A (en) Feature-based candidate selection
CN103678460B (en) For identifying the method and system for the non-text elements for being suitable to be communicated in multi-language environment
US20150154287A1 (en) Method for providing recommend information for mobile terminal browser and system using the same
WO2016138349A1 (en) Systems and methods of structuring reviews with auto-generated tags
JP6419969B2 (en) Method and apparatus for providing image presentation information
JP2024064941A (en) Display method, device, pen-type electronic dictionary, electronic device, and storage medium
CN115248890B (en) User interest portrait generation method and device, electronic equipment and storage medium
JP6457058B1 (en) Intellectual property system, intellectual property support method and intellectual property support program
CN109753646B (en) Article attribute identification method and electronic equipment
WO2015109902A1 (en) Personalized information processing method, device and apparatus, and nonvolatile computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14882332

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14882332

Country of ref document: EP

Kind code of ref document: A1