WO2015120713A1

WO2015120713A1 - Method and apparatus for acquiring entry, computer storage medium and device

Info

Publication number: WO2015120713A1
Application number: PCT/CN2014/085481
Authority: WO
Inventors: 陈晓昕; 吴先超; 肖日新
Original assignee: 百度在线网络技术（北京）有限公司
Priority date: 2014-02-11
Filing date: 2014-08-29
Publication date: 2015-08-20
Also published as: CN103823849A

Abstract

A method and apparatus for acquiring an entry. A tracking operation is performed on the eyeballs of a user to obtain a region of interest of the user; text information in the region of interest is then acquired; and a word segmentation operation is performed on the text information to obtain candidate entries, so that at least one candidate entry can be selected to serve as a new word and/or a hot word. Because the text information that the user is interested in is extracted from the region the user focuses on during the current reading behavior, and candidate entries are acquired from that text information, new words and/or hot words can be identified promptly and the timeliness of entry acquisition is improved.

Description

Method and apparatus for acquiring entry, computer storage medium and device

This application claims priority to Chinese patent application No. 201410047094.8, filed on February 11, 2014 and entitled "Method and apparatus for acquiring entry".

Technical Field

The present invention relates to input method technology, and in particular to a method and apparatus for acquiring an entry, a computer storage medium, and a device.

Background

An input method is an encoding method used to enter characters into a terminal. Different languages, countries, and regions have many different input methods, for example the Sogou Pinyin input method, the Baidu input method, and the QQ Pinyin input method. In general, the client of input method software uses a loaded dictionary (that is, a lexicon) and the word frequencies contained in that dictionary to present a ranked list of candidate entries to the user and so make input easier. In the prior art, to meet users' input needs, a server periodically collects entries and their usage frequencies (word frequencies) in order to update various specialized dictionaries. For example, a newly appearing entry is identified as a new word and added to a dictionary, or a frequently used entry is identified as a hot word.

However, in some situations new words and/or hot words emerge in large numbers. Examples include Internet slang such as 酱紫 (for 这样子, "like this"), 表 (for 不要, "don't"), and 杯具 (for 悲剧, "tragedy"), and sudden events such as Typhoon Haiyan. Existing technical solutions cannot identify these new words and/or hot words promptly enough to update the specialized dictionaries loaded by the input method (the input method dictionaries), which reduces the timeliness of entry acquisition.

Summary of the Invention

Aspects of the present invention provide a method and apparatus for acquiring an entry, a computer storage medium, and a device, which are used to improve the timeliness of entry acquisition.

One aspect of the present invention provides a method for acquiring an entry, including:

performing a tracking operation on the eyeballs of a user to obtain a region of interest of the user;

acquiring text information in the region of interest;

performing a word segmentation operation on the text information to obtain candidate entries; and

selecting at least one candidate entry to serve as a new word and/or a hot word.

In the aspect described above and any possible implementation, an implementation is further provided in which performing the tracking operation on the eyeballs of the user to obtain the region of interest of the user includes:

acquiring video information of the eyeballs;

determining a position region of the eyeballs according to the video information;

determining a movable path of the eyeballs according to the video information, and determining a movable region of the eyeballs according to the movable path; and

determining an attention region of the eyeballs according to the position region of the eyeballs and the movable region of the eyeballs, to serve as the region of interest of the user.

In the aspect described above and any possible implementation, an implementation is further provided in which determining the attention region of the eyeballs according to the position region of the eyeballs and the movable region of the eyeballs, to serve as the region of interest of the user, includes:

determining the part of the position region of the eyeballs that lies within the movable region of the eyeballs as the attention region of the eyeballs; and

if the attention region of the eyeballs satisfies an attention condition, determining the attention region of the eyeballs to be the region of interest of the user.

In the aspect described above and any possible implementation, an implementation is further provided in which the attention condition includes at least one of an attention duration and an attention frequency.

In the aspect described above and any possible implementation, an implementation is further provided in which selecting at least one candidate entry to serve as a new word and/or a hot word includes:

determining a candidate entry that does not appear in a preconfigured input method dictionary as a new word.

In the aspect described above and any possible implementation, an implementation is further provided in which selecting at least one candidate entry to serve as a new word and/or a hot word includes:

determining a candidate entry that appears in a preconfigured input method dictionary as a candidate hot word;

determining a heat value of the candidate hot word according to the word frequency with which the candidate hot word appears; and

determining a candidate hot word whose heat value is greater than or equal to a heat threshold as a hot word.

Another aspect of the present invention provides an apparatus for acquiring an entry, including:

a tracking unit, configured to perform a tracking operation on the eyeballs of a user to obtain a region of interest of the user;

an acquiring unit, configured to acquire text information in the region of interest;

a word segmentation unit, configured to perform a word segmentation operation on the text information to obtain candidate entries; and

a selecting unit, configured to select at least one candidate entry to serve as a new word and/or a hot word.

In the aspect described above and any possible implementation, an implementation is further provided in which the tracking unit is specifically configured to:

acquire video information of the eyeballs;

determine a position region of the eyeballs according to the video information;

determine a movable path of the eyeballs according to the video information, and determine a movable region of the eyeballs according to the movable path; and

determine an attention region of the eyeballs according to the position region of the eyeballs and the movable region of the eyeballs, to serve as the region of interest of the user.

In the aspect described above and any possible implementation, an implementation is further provided in which the tracking unit is specifically configured to:

determine the part of the position region of the eyeballs that lies within the movable region of the eyeballs as the attention region of the eyeballs; and

if the attention region of the eyeballs satisfies an attention condition, determine the attention region of the eyeballs to be the region of interest of the user.

In the aspect described above and any possible implementation, an implementation is further provided in which the attention condition includes at least one of an attention duration and an attention frequency.

In the aspect described above and any possible implementation, an implementation is further provided in which the selecting unit is specifically configured to:

determine a candidate entry that does not appear in a preconfigured input method dictionary as a new word.

In the aspect described above and any possible implementation, an implementation is further provided in which the selecting unit is specifically configured to:

determine a candidate entry that appears in a preconfigured input method dictionary as a candidate hot word;

determine a heat value of the candidate hot word according to the word frequency with which the candidate hot word appears; and

determine a candidate hot word whose heat value is greater than or equal to a heat threshold as a hot word.

Another aspect of the present invention provides a computer storage medium encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the following operations:

performing a tracking operation on the eyeballs of a user to obtain a region of interest of the user;

acquiring text information in the region of interest;

performing a word segmentation operation on the text information to obtain candidate entries; and

selecting at least one candidate entry to serve as a new word and/or a hot word.

Another aspect of the present invention provides a device including at least one processor, a memory, and at least one computer program, where the at least one computer program is stored in the memory and executed by the at least one processor, and the computer program includes instructions for performing the following operations:

performing a tracking operation on the eyeballs of a user to obtain a region of interest of the user;

acquiring text information in the region of interest;

performing a word segmentation operation on the text information to obtain candidate entries; and

selecting at least one candidate entry to serve as a new word and/or a hot word.

As can be seen from the foregoing technical solutions, in the embodiments of the present invention a tracking operation is performed on the eyeballs of a user to obtain a region of interest of the user, text information in the region of interest is then acquired, and a word segmentation operation is performed on the text information to obtain candidate entries, so that at least one candidate entry can be selected to serve as a new word and/or a hot word. Because candidate entries are acquired from text information that the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be identified promptly on the basis of this text information, which improves the timeliness of entry acquisition.

In addition, with the technical solution provided by the present invention, the identified new words and/or hot words can be used promptly to update the specialized dictionaries loaded by the input method (the input method dictionaries), which further improves the accuracy of the input method dictionaries.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for acquiring an entry according to an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of an apparatus for acquiring an entry according to another embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, a mobile phone, a personal digital assistant (Personal Digital Assistant, PDA), a wireless handheld device, a tablet computer (Tablet Computer), a personal computer (Personal Computer, PC), an MP3 player, and an MP4 player.

In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

FIG. 1 is a schematic flowchart of a method for acquiring an entry according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps.

101. Perform a tracking operation on the eyeballs of a user to obtain a region of interest of the user.

Optionally, in a possible implementation of this embodiment, in 101 the tracking operation may be performed on the user's eyeballs on a user interface. The user interface may be a World Wide Web (Web) page displayed by the terminal, or an application document displayed by the terminal, for example an email, a WORD document, a TXT document, or a PDF document; the present invention imposes no particular limitation on this.

102. Acquire text information in the region of interest.

103. Perform a word segmentation operation on the text information to obtain candidate entries.

104. Select at least one candidate entry to serve as a new word and/or a hot word.

It should be noted that 101 to 104 may be executed by an identification apparatus. The apparatus may be located in a local client to perform offline identification, or in a network-side server to perform online identification. Alternatively, some of its functions may be located in the client and others in the server to perform combined offline and online identification. This embodiment imposes no limitation on this.

It can be understood that the client may be an input method application installed on the terminal, or a web page in a browser. Any objectively existing form that can implement entry acquisition so as to identify new words and/or hot words is acceptable; this embodiment imposes no limitation on this.

In this way, a tracking operation is performed on the user's eyeballs to obtain the region of interest of the user, text information in the region of interest is then acquired, and a word segmentation operation is performed on the text information to obtain candidate entries, so that at least one candidate entry can be selected to serve as a new word and/or a hot word. Because candidate entries are acquired from text information that the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be identified promptly on the basis of this text information, which improves the timeliness of entry acquisition.

In addition, with the technical solution provided by the present invention, the identified new words and/or hot words can be used promptly to update the specialized dictionaries loaded by the input method (the input method dictionaries), which further improves the accuracy of the input method dictionaries.

Optionally, in a possible implementation of this embodiment, in 101 video information of the eyeballs may be acquired. The video information may consist of several image frames and may be captured with a camera. A position region of the eyeballs is then determined according to the video information. Next, a movable path of the eyeballs is determined according to the video information, and a movable region of the eyeballs is determined according to the movable path. Because the angular range of human eyeball movement lies within a fixed interval, the movable path corresponding to the eyeballs can be determined from the video information. The movable path may be an exact value or a range of motion. From the movable path, the area reachable along that path can be further calculated; this reachable area is the movable region of the eyeballs. Finally, an attention region of the eyeballs can be determined according to the position region of the eyeballs and the movable region of the eyeballs, to serve as the region of interest of the user.

Specifically, the part of the position region of the eyeballs that lies within the movable region of the eyeballs may be determined as the attention region of the eyeballs. If the attention region of the eyeballs satisfies an attention condition, the attention region of the eyeballs is determined to be the region of interest of the user.
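Taking the part of the eyeball's position region that lies inside its movable region can be sketched as a rectangle intersection. This is only an illustration under an assumption the text does not make: both regions are modeled here as axis-aligned rectangles in screen coordinates. The names `Rect` and `attention_region` are hypothetical.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle in screen coordinates (an assumed representation)."""
    left: float
    top: float
    right: float
    bottom: float


def attention_region(position: Rect, movable: Rect) -> Optional[Rect]:
    """Return the part of `position` that lies inside `movable`.

    Returns None when the two rectangles do not overlap.
    """
    left = max(position.left, movable.left)
    top = max(position.top, movable.top)
    right = min(position.right, movable.right)
    bottom = min(position.bottom, movable.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


if __name__ == "__main__":
    print(attention_region(Rect(0, 0, 10, 10), Rect(5, 5, 20, 20)))
```

A real tracker would more likely produce irregular regions, in which case a polygon or pixel-mask intersection would take the place of the rectangle arithmetic.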

The attention condition may include, but is not limited to, at least one of an attention duration and an attention frequency.

For example, if the attention region of the eyeballs stays within the movable region of the eyeballs for 3 seconds or longer, the attention region of the eyeballs may be determined to be the region of interest of the user.

As another example, if the number of times the attention region of the eyeballs stays within the movable region of the eyeballs is greater than or equal to 2, or the rate of such stays is greater than or equal to 2 per minute, the attention region of the eyeballs may be determined to be the region of interest of the user.
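The two example conditions (a stay of at least 3 seconds, or at least 2 stays) can be checked with a few lines of code. The sketch below assumes that each stay is recorded as a (start, end) pair in seconds and that "stay time" means the longest single stay; neither detail is specified in the text, and the function and constant names are hypothetical. The per-minute variant is omitted.

```python
from typing import List, Tuple

MIN_DWELL_SECONDS = 3.0  # example duration threshold taken from the text
MIN_STAYS = 2            # example count threshold taken from the text


def satisfies_attention_condition(stays: List[Tuple[float, float]]) -> bool:
    """Check the attention condition for one attention region.

    `stays` holds one (start_time, end_time) pair, in seconds, for each
    stay recorded for the region (an assumed input format).
    """
    # Longest single stay; 0.0 when nothing was recorded.
    longest = max((end - start for start, end in stays), default=0.0)
    return longest >= MIN_DWELL_SECONDS or len(stays) >= MIN_STAYS


if __name__ == "__main__":
    print(satisfies_attention_condition([(0.0, 3.5)]))
    print(satisfies_attention_condition([(0.0, 1.0)]))
```

Because the text allows "at least one of" duration and frequency, the two checks are combined with `or`; an implementation that requires both would use `and` instead.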

Optionally, in a possible implementation of this embodiment, in 102 any of the text recognition methods in the prior art may be used to acquire the text information in the region of interest; the present invention imposes no particular limitation on this.

For example, a partial screenshot of the area that the region of interest encloses on a user interface containing text information may be captured, and text recognition may then be performed on the partial screenshot to obtain the text information in the region of interest.

As another example, position information of the region of interest may be acquired, and the corresponding text information may be determined according to the position information, to serve as the text information in the region of interest.
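The position-based variant can be illustrated as a lookup over text blocks whose on-screen bounds are already known. This is a sketch under assumptions: the rendering layer is assumed to expose each text block together with a bounding rectangle, which the text does not state, and `Rect`, `TextBox`, and `text_in_region` are hypothetical names.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in screen coordinates (an assumed representation)."""
    left: float
    top: float
    right: float
    bottom: float


class TextBox(NamedTuple):
    """A block of displayed text and where it is drawn (assumed to be available)."""
    bounds: Rect
    text: str


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share a non-empty area."""
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def text_in_region(region: Rect, boxes: List[TextBox]) -> str:
    """Concatenate, in display order, the text of every box that overlaps `region`."""
    return "".join(box.text for box in boxes if overlaps(box.bounds, region))


if __name__ == "__main__":
    page = [
        TextBox(Rect(0, 0, 100, 20), "台风海燕登陆"),
        TextBox(Rect(0, 40, 100, 60), "其他内容"),
    ]
    print(text_in_region(Rect(10, 5, 50, 15), page))
```

Unlike the screenshot approach, this lookup needs no text recognition step, but it only works where the application can report text positions.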

In addition, for a detailed description of text recognition, refer to the related content in the prior art; details are not repeated here.

Optionally, in a possible implementation of this embodiment, in 103 any of the word segmentation methods in the prior art may be used to perform the word segmentation operation on the acquired text information, for example a segmentation method based on string matching, a segmentation method based on understanding, or a segmentation method based on statistics; the present invention imposes no particular limitation on this. For a detailed description of word segmentation methods, refer to the related content in the prior art; details are not repeated here.
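As one concrete instance of a string-matching segmentation method, the sketch below implements forward maximum matching. The choice of this particular algorithm, the toy lexicon, and the `max_len` window are assumptions made for illustration; the text leaves the segmentation method open.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Segment `text` by repeatedly taking the longest lexicon word at the cursor.

    When no lexicon word of length 2..max_len matches, a single character
    is emitted so that the scan always advances.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(forward_max_match("台风海燕登陆", {"台风", "海燕", "登陆"}))
```

A purely lexicon-driven segmenter such as this one splits unknown strings into single characters, so it cannot by itself surface an unseen multi-character word; a statistics-based method would be the more natural choice when the goal is to discover new words.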

Optionally, in a possible implementation of this embodiment, in 104 a candidate entry that does not appear in a preconfigured input method dictionary may be determined as a new word.

Specifically, any one of the candidate entries obtained by the word segmentation operation may be taken. If that candidate entry does not appear in the preconfigured input method dictionary, it may be determined as a new word.
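The dictionary check for new words amounts to a set-membership test. In this minimal sketch the input method dictionary is assumed to be available as an in-memory set of strings, and `select_new_words` is a hypothetical name.

```python
from typing import Iterable, List, Set


def select_new_words(candidates: Iterable[str], dictionary: Set[str]) -> List[str]:
    """Return the candidate entries that are absent from `dictionary`.

    Order of first appearance is kept and repeated candidates are reported once.
    """
    seen: Set[str] = set()
    new_words: List[str] = []
    for entry in candidates:
        if entry not in dictionary and entry not in seen:
            seen.add(entry)
            new_words.append(entry)
    return new_words


if __name__ == "__main__":
    print(select_new_words(["台风", "海燕", "台风", "酱紫"], {"台风"}))
```

If the dictionary lives on a network-side server rather than in the client, the membership test would become a remote query, but the selection logic stays the same.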

It should be noted that the preconfigured input method dictionary may be configured on a network-side server or on the local client; this embodiment imposes no particular limitation on this.

Optionally, in a possible implementation of this embodiment, in 104 a candidate entry that appears in the preconfigured input method dictionary may be determined as a candidate hot word. A heat value of the candidate hot word is then determined according to the word frequency with which the candidate hot word appears. A candidate hot word whose heat value is greater than or equal to a heat threshold may then be determined as a hot word.

Specifically, any one of the candidate entries obtained by the word segmentation operation may be taken. If that candidate entry already appears in the preconfigured input method dictionary, it may be marked as a candidate hot word. Then, the word frequency with which the candidate hot word appears within a specified time range may be obtained according to the input method dictionary, and the heat value of the candidate hot word may be determined according to that word frequency. Finally, a candidate hot word whose heat value is greater than or equal to the heat threshold may be determined as a hot word.

For example, the heat value of a candidate hot word may be determined according to the following formula:

heat value of the candidate hot word = (average score of all candidate hot words * average word frequency of all candidate hot words + score of the candidate hot word * total word frequency of the candidate hot word within the total statistics period) / (average word frequency of all candidate hot words + total word frequency of the candidate hot word within the total statistics period)

where

score of the candidate hot word = word frequency of the candidate hot word within the most recent unit statistics period / total word frequency of the candidate hot word within the total statistics period.

The foregoing implementation process is described in detail below with reference to a specific example. Suppose there are four candidate hot words, namely candidate hot word A, candidate hot word B, candidate hot word C, and candidate hot word D, the unit statistics period is one day, and the total statistics period is two days. The word frequencies observed on the two days 2013-12-18 and 2013-12-19 are shown in the following table:

From the data in the table, the scores of the four candidate hot words calculated from the historical data of 2013-12-18 and 2013-12-19 are 0.74, 0.52, 0.8, and 0.82, respectively. At this point, it can be assumed that 320 people have already given every word a score of 0.72 in advance, and that candidate hot word A has additionally been scored by 135 people, each of whom gave 0.74. Here 0.72 is the average of the four scores and 320 is the average of the four total word frequencies (135, 290, 5, and 850). According to the formula described above, the heat values of the four candidate hot words are:

A: (0.72*320+0.74*135)/(320+135)=0.726

B: (0.72*320+0.52*290)/(320+290)=0.625

C: (0.72*320+0.8*5)/(320+5)=0.721

D: (0.72*320+0.82*850)/(320+850)=0.793

Sorted by heat value in descending order:

D>A>C>B
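The worked example can be reproduced with a short script. The heat value here is a weighted average that pulls each word's score toward the overall average, with the average word frequency acting as the weight of that prior. The function and variable names below are hypothetical; the numbers are the ones used in the example.

```python
def heat_value(avg_score: float, avg_freq: float, score: float, total_freq: float) -> float:
    """Heat value of one candidate hot word, following the formula in the text."""
    return (avg_score * avg_freq + score * total_freq) / (avg_freq + total_freq)


# (score, total word frequency in the total statistics period) per candidate hot word
candidates = {
    "A": (0.74, 135),
    "B": (0.52, 290),
    "C": (0.80, 5),
    "D": (0.82, 850),
}

# Average score and average total word frequency over all candidate hot words.
avg_score = sum(score for score, _ in candidates.values()) / len(candidates)
avg_freq = sum(freq for _, freq in candidates.values()) / len(candidates)

heat = {
    name: heat_value(avg_score, avg_freq, score, freq)
    for name, (score, freq) in candidates.items()
}
ranking = sorted(heat, key=heat.get, reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(name, round(heat[name], 3))
```

Candidate hot word C shows why the prior matters: its raw score of 0.8 is high, but with only 5 occurrences its heat value stays close to the average, so it ranks below A.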

It can be seen that candidate hot word D appears most frequently and therefore ranks highest. If the heat value of this candidate hot word is not less than the preset heat threshold, candidate hot word D may be determined to be a hot word.

It can be understood that, after new words and/or hot words are identified, these entries may further be used to update the local input method dictionary, or to update the input method dictionary in the cloud (on the network side); this embodiment imposes no particular limitation on this. Specifically, at least one of the collected Ngram information and Npos information may be used to update the local input method dictionary or the cloud input method dictionary. For details, refer to the related content in the prior art; details are not repeated here.

It can be understood that the technical solution provided in this embodiment can identify new words and/or hot words not only for a single user. It can also be applied to multiple users, with the identification results of those users collated and analyzed to obtain new words and/or hot words for the group.

Optionally, in a possible implementation of this embodiment, after 104 the selected new words and/or hot words may further be presented in a special way. For example, an icon marker may be added to these new words and/or hot words, or they may be presented at a special candidate position.

In this embodiment, a tracking operation is performed on the user's eyeballs to obtain the region of interest of the user, text information in the region of interest is then acquired, and a word segmentation operation is performed on the text information to obtain candidate entries, so that at least one candidate entry can be selected to serve as a new word and/or a hot word. Because candidate entries are acquired from text information that the user is interested in, extracted from the region the user focuses on during the current reading behavior, new words and/or hot words can be identified promptly on the basis of this text information, which improves the timeliness of entry acquisition.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of action combinations. However, a person skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not detailed in one embodiment, refer to the related descriptions of the other embodiments.

FIG. 2 is a schematic structural diagram of an apparatus for acquiring an entry according to another embodiment of the present invention. As shown in FIG. 2, the apparatus of this embodiment may include a tracking unit 21, an acquiring unit 22, a word segmentation unit 23, and a selecting unit 24. The tracking unit 21 is configured to perform a tracking operation on the user's eyeballs to obtain the user's region of interest; the acquiring unit 22 is configured to acquire text information in the region of interest; the word segmentation unit 23 is configured to perform a word segmentation operation on the text information to obtain candidate entries; and the selecting unit 24 is configured to select at least one candidate entry as a new word and/or hot word.
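The division into four units can be pictured as a simple pipeline. The following is only a minimal sketch: the class name `EntryAcquirer`, the callable signatures, and the stub units are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Region = Tuple[float, float, float, float]  # hypothetical (left, top, right, bottom)


@dataclass
class EntryAcquirer:
    """Hypothetical wiring of units 21 to 24; each unit is a plain callable."""
    track: Callable[[], Region]               # tracking unit 21
    acquire: Callable[[Region], str]          # acquiring unit 22
    segment: Callable[[str], List[str]]       # word segmentation unit 23
    select: Callable[[List[str]], List[str]]  # selecting unit 24

    def run(self) -> List[str]:
        region = self.track()
        text = self.acquire(region)
        candidates = self.segment(text)
        return self.select(candidates)


# Stub units, for illustration only.
known_words = {"input", "method"}
acquirer = EntryAcquirer(
    track=lambda: (0.0, 0.0, 100.0, 20.0),
    acquire=lambda region: "eye tracking input method",
    segment=lambda text: text.split(),
    select=lambda cands: [c for c in cands if c not in known_words],
)
result = acquirer.run()
```

Because each unit is an independent callable, any of them could in principle run on the client or on the server, which matches the flexible deployment the embodiment describes.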

It should be noted that the apparatus for acquiring an entry provided in this embodiment may be located in a local client to perform offline recognition, or in a server on the network side to perform online recognition, or may have some functions in the client and some in the server to perform combined offline and online recognition; this embodiment does not limit this.

In this way, the tracking unit performs a tracking operation on the user's eyeballs to obtain the user's region of interest, the acquiring unit then acquires the text information in the region of interest, and the word segmentation unit performs a word segmentation operation on the text information to obtain candidate entries, so that the selecting unit can select at least one candidate entry as a new word and/or hot word. Because the candidate entries are acquired from text information of interest to the user, extracted from the region on which the user's current reading behavior is focused, new words and/or hot words can be identified in time on the basis of this text information, which improves the timeliness of entry acquisition.

In addition, with the technical solution provided by the present invention, the identified new words and/or hot words can be used in time to update the various professional dictionaries loaded by the input method, that is, the input method dictionaries, which can further effectively improve the accuracy of the input method's dictionaries.

Optionally, in a possible implementation of this embodiment, the tracking unit 21 may specifically perform the tracking operation on the user's eyeballs on a user interface. The user interface may be a World Wide Web (Web) page displayed by the terminal, or an application document displayed by the terminal, for example, an email, a WORD document, a TXT document, or a PDF document; the present invention does not particularly limit this.

Optionally, in a possible implementation of this embodiment, the tracking unit 21 may be specifically configured to acquire video information of the eyeball; determine a position area of the eyeball according to the video information; determine a movable path of the eyeball according to the video information, and determine a movable area of the eyeball according to the movable path; and determine an attention area of the eyeball according to the position area of the eyeball and the movable area of the eyeball, to serve as the user's region of interest.

The video information of the eyeball may consist of several image frames and may be captured with a camera. Because the angular range of motion of the human eyeball lies within a fixed interval, the movable path of the eyeball can be determined from the video information. The movable path may be an exact value or a range of motion. From the movable path, the reachable area based on that path can further be calculated; this reachable area is the movable area of the eyeball.

Specifically, the tracking unit 21 may determine the portion of the position area of the eyeball that lies within the movable area of the eyeball as the attention area of the eyeball; and, if the attention area of the eyeball satisfies an attention condition, determine the attention area of the eyeball to be the user's region of interest.
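Taking "the portion of the position area that lies within the movable area" as an intersection, the step can be sketched as follows. This sketch assumes that both areas are axis-aligned rectangles in screen coordinates; the specification does not fix a shape, and `Rect` and `intersect` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Assumed representation: an axis-aligned rectangle in screen coordinates."""
    left: float
    top: float
    right: float
    bottom: float


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


position_area = Rect(50, 40, 250, 120)  # where the eyeball is looking (assumed values)
movable_area = Rect(0, 0, 200, 100)     # area reachable along the movable path
attention_area = intersect(position_area, movable_area)
```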

The attention condition may include, but is not limited to, at least one of an attention time and an attention frequency. For example, if the dwell time of the attention area of the eyeball within the movable area of the eyeball is greater than or equal to 3 seconds, the tracking unit 21 may determine that the attention area of the eyeball is the user's region of interest.

Or, as another example, if the number of times the attention area of the eyeball stays within the movable area of the eyeball is greater than or equal to 2, or 2 per minute, the tracking unit 21 may determine that the attention area of the eyeball is the user's region of interest.
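The two example conditions above (a dwell time of at least 3 seconds, or at least 2 stays) can be combined as in the following sketch. Representing each stay as a `(start, end)` pair of timestamps, and joining the two conditions with a logical OR, are assumptions made for illustration; the specification only requires at least one of attention time and attention frequency.

```python
from typing import List, Tuple

MIN_DWELL_SECONDS = 3.0  # example threshold from the text
MIN_DWELL_COUNT = 2      # example threshold from the text


def satisfies_attention_condition(dwells: List[Tuple[float, float]]) -> bool:
    """dwells: assumed (start, end) timestamps, in seconds, of each stay
    of the attention area within the movable area."""
    longest = max((end - start for start, end in dwells), default=0.0)
    return longest >= MIN_DWELL_SECONDS or len(dwells) >= MIN_DWELL_COUNT


one_long_stay = satisfies_attention_condition([(0.0, 3.5)])
two_short_stays = satisfies_attention_condition([(0.0, 1.0), (5.0, 6.0)])
one_short_stay = satisfies_attention_condition([(0.0, 1.0)])
```

A per-minute frequency condition would additionally need a sliding time window, which is omitted here.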

Optionally, in a possible implementation of this embodiment, the acquiring unit 22 may use any of the text recognition methods in the prior art to acquire the text information in the region of interest; the present invention does not particularly limit this.

For example, the acquiring unit 22 may capture a partial screenshot of the area that the region of interest encloses on the user interface containing text information, and then perform text recognition on the captured partial screenshot to obtain the text information in the region of interest.

Or, as another example, the acquiring unit 22 may acquire position information of the region of interest and, according to the position information, determine the corresponding text information to serve as the text information in the region of interest.
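The position-based variant can be sketched as a lookup against the layout of the displayed page. The sketch assumes that the user interface can report a bounding box for each run of text; `TextRun`, `overlaps`, and `text_in_region` are hypothetical names, and real layout information would come from the rendering component of the page or document.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # assumed (left, top, right, bottom)


class TextRun(NamedTuple):
    box: Box
    text: str


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def text_in_region(region: Box, layout: List[TextRun]) -> str:
    """Concatenate, in layout order, the text runs that overlap the region."""
    return "".join(run.text for run in layout if overlaps(run.box, region))


layout = [
    TextRun((0, 0, 100, 20), "first line "),
    TextRun((0, 20, 100, 40), "second line"),
    TextRun((0, 40, 100, 60), " third line"),
]
text = text_in_region((10, 10, 90, 30), layout)
```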

In addition, for a detailed description of text recognition, refer to related content in the prior art; details are not repeated here.

Optionally, in a possible implementation of this embodiment, the word segmentation unit 23 may use any of the word segmentation methods in the prior art to perform the word segmentation operation on the acquired text information, for example, a segmentation method based on string matching, a segmentation method based on understanding, or a segmentation method based on statistics; the present invention does not particularly limit this. For a detailed description of word segmentation methods, refer to related content in the prior art; details are not repeated here.

Optionally, in a possible implementation of this embodiment, the selecting unit 24 may be specifically configured to determine candidate entries that do not appear in the preconfigured input method dictionary as new words.

Specifically, the selecting unit 24 may take any one of the candidate entries obtained by the word segmentation operation; if that candidate entry does not appear in the preconfigured input method dictionary, it may be determined to be a new word.
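As one concrete instance of string-matching segmentation followed by new-word selection, the sketch below uses forward maximum matching. It is only an illustration under stated assumptions: the specification does not prescribe this particular algorithm, and the segmentation lexicon is assumed to be a separate, broader word list than the input method dictionary (otherwise a purely dictionary-based segmenter could not produce multi-character words that the dictionary lacks). All names and sample data are hypothetical.

```python
from typing import Iterable, List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word, falling back to a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def find_new_words(candidates: Iterable[str], ime_dictionary: Set[str]) -> List[str]:
    """Keep candidates absent from the input method dictionary, without repeats."""
    seen: Set[str] = set()
    new_words: List[str] = []
    for word in candidates:
        if word not in ime_dictionary and word not in seen:
            seen.add(word)
            new_words.append(word)
    return new_words


segmenter_lexicon = {"眼球", "跟踪", "输入", "输入法"}  # assumed broad word list
ime_dictionary = {"眼球", "输入法"}                    # assumed current dictionary
candidates = forward_max_match("眼球跟踪输入法", segmenter_lexicon)
new_words = find_new_words(candidates, ime_dictionary)
```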

Optionally, in a possible implementation of this embodiment, the selecting unit 24 may be specifically configured to determine candidate entries that appear in the preconfigured input method dictionary as candidate hot words; determine a heat value of each candidate hot word according to the word frequency with which the candidate hot word appears; and determine candidate hot words whose heat value is greater than or equal to a heat threshold as hot words.

Specifically, the selecting unit 24 may take any one of the candidate entries obtained by the word segmentation operation; if that candidate entry already appears in the preconfigured input method dictionary, it may be marked as a candidate hot word. Then, according to the input method dictionary, the word frequency with which the candidate hot word appears within a specified time range may be obtained, and the heat value of the candidate hot word determined from that word frequency. Finally, candidate hot words whose heat value is greater than or equal to the heat threshold may be determined to be hot words.

Score of a candidate hot word = word frequency of the candidate hot word in the most recent unit statistical period / total word frequency of the candidate hot word over the total statistical period.
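The score formula and the threshold comparison can be sketched as follows, assuming that the score is used directly as the heat value. The threshold of 0.5, the sample frequencies, and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def heat_score(recent_frequency: int, total_frequency: int) -> float:
    """Frequency in the most recent unit period divided by the total frequency."""
    if total_frequency <= 0:
        return 0.0  # assumed convention for entries with no recorded usage
    return recent_frequency / total_frequency


def select_hot_words(
    frequencies: Dict[str, Tuple[int, int]], heat_threshold: float
) -> List[str]:
    """frequencies maps each candidate hot word to (recent, total) counts."""
    return [
        word
        for word, (recent, total) in frequencies.items()
        if heat_score(recent, total) >= heat_threshold
    ]


frequencies = {"alpha": (30, 40), "beta": (5, 100)}  # assumed sample counts
hot_words = select_hot_words(frequencies, heat_threshold=0.5)
```

With these sample counts, "alpha" scores 0.75 and is kept, while "beta" scores 0.05 and is dropped.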

For a detailed description, refer to the related content in the embodiment corresponding to FIG. 1; details are not repeated here.

It can be understood that, after identifying new words and/or hot words, the apparatus for acquiring an entry provided in this embodiment may further use these entries to update the local input method dictionary, or to update the input method dictionary in the cloud (network side); this embodiment does not particularly limit this. Specifically, at least one of the collected Ngram statistics and Npos statistics may be used to update the local input method dictionary or the cloud input method dictionary. For a detailed description, refer to related content in the prior art; details are not repeated here.
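The specification does not say how the Ngram statistics are gathered. One common way, shown here purely as an assumption, is to count adjacent word sequences in the segmented text; `count_ngrams` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def count_ngrams(words: List[str], n: int = 2) -> Counter:
    """Count every run of n adjacent words; empty if there are fewer than n words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


bigrams = count_ngrams(["new", "word", "new", "word", "list"])
```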

It can be understood that the technical solution provided in this embodiment can identify new words and/or hot words not only for a single user but also for multiple users, by effectively collating and analyzing the recognition results of the multiple users to obtain new words and/or hot words for those users.
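How the results of multiple users are collated is left open by the specification. One simple possibility, given here only as an assumption, is to keep the entries that were identified for at least a minimum number of distinct users; `aggregate_across_users` and `min_users` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def aggregate_across_users(
    per_user_results: Dict[str, List[str]], min_users: int = 2
) -> List[str]:
    """Return, in sorted order, the entries identified for at least min_users users."""
    user_counts: Counter = Counter()
    for words in per_user_results.values():
        user_counts.update(set(words))  # count each user at most once per entry
    return sorted(word for word, count in user_counts.items() if count >= min_users)


per_user_results = {
    "user1": ["a", "b"],
    "user2": ["b", "c"],
    "user3": ["b", "a", "a"],
}
shared_entries = aggregate_across_users(per_user_results, min_users=2)
```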

Optionally, in a possible implementation of this embodiment, the apparatus for acquiring an entry provided in this embodiment may further give the selected new words and/or hot words special presentation. For example, an icon marker may be added to these new words and/or hot words; or, as another example, these new words and/or hot words may be presented at special candidate positions.

In this embodiment, the tracking unit performs a tracking operation on the user's eyeballs to obtain the user's region of interest, the acquiring unit then acquires the text information in the region of interest, and the word segmentation unit performs a word segmentation operation on the text information to obtain candidate entries, so that the selecting unit can select at least one candidate entry as a new word and/or hot word. Because the candidate entries are acquired from text information of interest to the user, extracted from the region on which the user's current reading behavior is focused, new words and/or hot words can be identified in time on the basis of this text information, which improves the timeliness of entry acquisition.

In addition, with the technical solution provided by the present invention, the identified new words and/or hot words can be used in time to update the various professional dictionaries loaded by the input method, that is, the input method dictionaries, which can further effectively improve the accuracy of the input method's dictionaries.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form. Components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for acquiring an entry, characterized by comprising:

performing a tracking operation on a user's eyeballs to obtain the user's region of interest;

acquiring text information in the region of interest;

performing a word segmentation operation on the text information to obtain candidate entries;

selecting at least one candidate entry as a new word and/or hot word.

2. The method according to claim 1, characterized in that performing the tracking operation on the user's eyeballs to obtain the user's region of interest comprises:

acquiring video information of the eyeball;

determining a position area of the eyeball according to the video information;

determining a movable path of the eyeball according to the video information, and determining a movable area of the eyeball according to the movable path;

determining an attention area of the eyeball according to the position area of the eyeball and the movable area of the eyeball, to serve as the user's region of interest.

3. The method according to claim 2, characterized in that determining the attention area of the eyeball according to the position of the eyeball and the movable area of the eyeball, to serve as the user's region of interest, comprises:

determining the portion of the position area of the eyeball that lies within the movable area of the eyeball as the attention area of the eyeball;

if the attention area of the eyeball satisfies an attention condition, determining the attention area of the eyeball to be the user's region of interest.

4. The method according to claim 3, characterized in that the attention condition includes at least one of an attention time and an attention frequency.

5. The method according to any one of claims 1 to 4, characterized in that selecting at least one candidate entry as a new word and/or hot word comprises:

determining candidate entries that do not appear in a preconfigured input method dictionary as new words.

6. The method according to any one of claims 1 to 4, characterized in that selecting at least one candidate entry as a new word and/or hot word comprises:

determining candidate entries that appear in a preconfigured input method dictionary as candidate hot words; determining a heat value of each candidate hot word according to the word frequency with which the candidate hot word appears; and determining candidate hot words whose heat value is greater than or equal to a heat threshold as hot words.

7. An apparatus for acquiring an entry, characterized by comprising:

a tracking unit, configured to perform a tracking operation on a user's eyeballs to obtain the user's region of interest;

an acquiring unit, configured to acquire text information in the region of interest;

a word segmentation unit, configured to perform a word segmentation operation on the text information to obtain candidate entries;

a selecting unit, configured to select at least one candidate entry as a new word and/or hot word.

8. The apparatus according to claim 7, characterized in that the tracking unit is specifically configured to

acquire video information of the eyeball;

determine a movable path of the eyeball according to the video information, and determine a movable area of the eyeball according to the movable path; and

9. The apparatus according to claim 8, characterized in that the tracking unit is specifically configured to

determine the portion of the position area of the eyeball that lies within the movable area of the eyeball as the attention area of the eyeball; and

10. The apparatus according to claim 9, characterized in that the attention condition includes at least one of an attention time and an attention frequency.

11. The apparatus according to any one of claims 7 to 10, characterized in that the selecting unit is specifically configured to

12. The apparatus according to any one of claims 7 to 10, characterized in that the selecting unit is specifically configured to

determine candidate entries that appear in a preconfigured input method dictionary as candidate hot words; determine a heat value of each candidate hot word according to the word frequency with which the candidate hot word appears; and determine candidate hot words whose heat value is greater than or equal to a heat threshold as hot words.

13. A computer storage medium encoded with a computer program, characterized in that the program, when executed by one or more computers, causes the one or more computers to perform the following operations:

14. A device, comprising at least one processor, a memory, and at least one computer program, the at least one computer program being stored in the memory and executed by the at least one processor, characterized in that the computer program includes instructions for performing the following operations: performing a tracking operation on a user's eyeballs to obtain the user's region of interest; acquiring text information in the region of interest;