CN111401083B - Name identification method and device, storage medium and processor - Google Patents


Info

Publication number
CN111401083B
Authority
CN
China
Prior art keywords
interactive text
word segmentation
user
name
call database
Legal status
Active
Application number
CN201910002379.2A
Other languages
Chinese (zh)
Other versions
CN111401083A (en)
Inventor
徐光伟
李辰
包祖贻
刘恒友
李林琳
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201910002379.2A
Publication of CN111401083A
Application granted
Publication of CN111401083B

Landscapes

  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a name identification method and device, a storage medium, and a processor. The method comprises the following steps: during communication among a plurality of users, acquiring the interactive text of the communication; acquiring a call database, wherein the call database stores the user names of at least one user group; and identifying user names in the interactive text based on a universal call database and the call database, wherein the universal call database stores a plurality of universal user names. The invention solves the technical problem of low name recognition accuracy caused by poor handling of special names in real-time communication.

Description

Name identification method and device, storage medium and processor
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and apparatus for identifying a name, a storage medium, and a processor.
Background
Real-time communication translation is an innovative feature of current Internet instant-messaging software. It can reduce the communication barrier between people who speak different languages, and Chinese-English translation is its most widely used form. Machine translation based on machine learning has been applied successfully in many translation scenarios, and real-time communication translation also uses this technology. In Chinese-English translation within real-time communication, however, Chinese personal names are difficult to translate, particularly because different enterprises may use their own flower-name (nickname) or alias systems, and translation models handle such special names poorly. For example, "Liu Zong says today is a holiday", where "Zong" is a title for a senior manager, is translated as "Liu always said today is a holiday", and "Liu Zong" is not recognized as a name.
The current mainstream solution treats Chinese name recognition as a separate pre-processing module: Chinese names are recognized first and then translated directly according to pinyin or corresponding conversion rules. Chinese name recognition is one of the important basic tasks in Chinese natural language processing; it requires locating the positions of Chinese names in a text. Existing methods mainly use supervised machine learning with word-level sequence labeling to train a Chinese name recognition model on a fixed, manually annotated data set. However, supervised learning often generalizes poorly when the application domain shifts. For example, the annotated data set used for training may be general news text, while the actual application scenario is real-time communication inside an enterprise that uses a flower-name or alias system very different from ordinary Chinese personal names, so the model's recognition ability is weak.
Therefore, translation models generally handle these special names poorly.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a name identification method and device, a storage medium and a processor, which are used for at least solving the technical problem of low accuracy of name identification caused by poor special name processing effect in real-time communication.
According to an aspect of an embodiment of the present invention, there is provided a name identification method, including: in the process of communication of a plurality of users, acquiring interactive text in the process of communication; acquiring a call database, wherein the call database stores the user name of at least one user group; and identifying the user names in the interactive text based on a universal call database and the call database, wherein a plurality of universal user names are stored in the universal call database.
Further, based on the universal call database and the call database, identifying the user name in the interactive text includes: performing word segmentation on the interactive text based on the call database and the universal call database to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name; and identifying the user name in the interactive text from the at least one candidate user name by combining the context information of the interactive text.
Further, performing word segmentation on the interactive text based on the universal call database and the call database to obtain a word segmentation result includes: performing word segmentation on the interactive text with a forward maximum matching algorithm according to the universal call database to obtain a first word segmentation result set; performing word segmentation on the interactive text with a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select the word segmentation result with the highest probability from the first and second word segmentation result sets as the word segmentation result of the interactive text.
Further, identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text includes: classifying the at least one candidate user name with a classification model combined with context features, so as to determine whether the at least one candidate user name is a user name in the interactive text.
Further, the classification model includes a convolutional neural network model.
Further, after identifying the user name in the interactive text, the method further comprises: translating the identified user name in the interactive text by adopting a user name rule to obtain a first translation result; translating texts except the recognized user names in the interactive text by adopting a machine translation model to obtain a second translation result; and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
Further, acquiring the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to that user group; if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call databases of the at least one user group.
According to another aspect of the embodiments of the present invention, a name recognition apparatus is further provided, including: a first acquisition unit, configured to acquire the interactive text of the communication during communication among a plurality of users; a second acquisition unit, configured to acquire a call database, where the call database stores the user names of at least one user group; and an identification unit, configured to identify the user names in the interactive text based on a universal call database and the call database, where the universal call database stores a plurality of universal user names.
Further, the identification unit includes: a processing module, configured to perform word segmentation on the interactive text according to the call database and the universal call database to obtain a word segmentation result, where the word segmentation result includes at least one candidate user name; and an identification module, configured to identify the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
Further, the processing module includes: a first processing sub-module, configured to perform word segmentation on the interactive text with a forward maximum matching algorithm according to the universal call database to obtain a first word segmentation result set; a second processing sub-module, configured to perform word segmentation on the interactive text with a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and a determining sub-module, configured to use a word segmentation model to select the word segmentation result with the highest probability from the first and second word segmentation result sets as the word segmentation result of the interactive text.
Further, the identification module includes: an identification sub-module, configured to classify the at least one candidate user name with a classification model combined with context features, so as to determine whether the at least one candidate user name is a user name in the interactive text.
Further, the classification model includes a convolutional neural network model.
Further, the apparatus further comprises: the first translation unit is used for translating the user names in the identified interactive text by adopting a user name rule after the user names in the interactive text are identified, so as to obtain a first translation result; the second translation unit is used for translating texts except the recognized user names in the interactive text by adopting a machine translation model to obtain a second translation result; and the determining unit is used for obtaining the translation result of the interactive text in the communication process according to the first translation result and the second translation result.
Further, the second acquisition unit includes: the first acquisition module is used for acquiring a call database corresponding to the user group under the condition that the plurality of users belong to the same user group; the second obtaining module is configured to obtain a call database corresponding to each user group under the condition that the plurality of users belong to a plurality of user groups, obtain a plurality of call databases, and use the plurality of call databases as call databases of the at least one user group.
According to an aspect of an embodiment of the present invention, there is provided a storage medium including a stored program, where the program, when executed, controls a device in which the storage medium is located to perform the method for identifying a name of any one of the above.
According to an aspect of an embodiment of the present invention, there is provided a processor, configured to execute a program, where the program executes the method for identifying a name according to any one of the foregoing.
In the embodiments of the present invention, a call database is combined with a universal call database: during communication among a plurality of users, the interactive text of the communication is acquired; a call database storing the user names of at least one user group is acquired; and user names in the interactive text are identified based on the call database and the universal call database, which stores a plurality of universal user names. This achieves accurate identification of user names, improves the accuracy of recognizing special names during real-time communication, and thereby solves the technical problem of low name recognition accuracy caused by poor handling of special names in real-time communication.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of identifying names according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method of identifying names according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a name recognition device according to an embodiment of the present invention; and
fig. 5 is a block diagram of an alternative computer terminal according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms used in describing the embodiments of the present application are explained as follows:
machine translation: machine-intelligent translation using machine learning algorithms.
Supervised learning: training a machine learning model on a manually annotated data set.
CNN: convolutional neural networks.
RNN: a recurrent neural network.
Self-Attention: the self-attention mechanism, a neural network structure.
End-to-end neural network: also called sentence-to-sentence, is a network structure based on sentence learning, and is widely used for language models and machine translation.
Example 1
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a method of identifying names, it being noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in an order other than that shown or described herein.
The method embodiment provided in Embodiment 1 of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware block diagram of a computer terminal (or mobile device) for implementing the name identification method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, …, 102n), which may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer terminal may further include a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. Those of ordinary skill in the art will understand that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the electronic device described above. For example, the computer terminal 10 may include more or fewer components than shown in fig. 1, or have a configuration different from that shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuits described above may be referred to generally herein as "data processing circuits". The data processing circuit may be embodied in whole or in part as software, hardware, firmware, or any combination thereof. Furthermore, the data processing circuit may be a single stand-alone processing module, or may be incorporated, in whole or in part, into any of the other elements of the computer terminal 10 (or mobile device). As referred to in the embodiments of the present application, the data processing circuit serves as a kind of processor control (e.g., selection of a variable-resistor termination path connected to the interface).
In the above-described operating environment, the present application provides a method for identifying names as shown in fig. 2. Fig. 2 is a flowchart of a name recognition method according to a first embodiment of the present invention.
Step 201: During communication among a plurality of users, acquire the interactive text of the communication.
For example, during communication between user A and user B, the interactive text exchanged between user A and user B is acquired.
It should be noted that the communication between the users may be instant messaging.
Step 202: Acquire a call database, where the call database stores the user names of at least one user group.
The user group may be a user group within an enterprise. For example, in enterprise A each employee has a flower name (an internal nickname): employee A's flower name is Lan Ni and employee B's flower name is Orange. The call database may contain the flower names of all employees of the enterprise.
Step 203: Identify the user names in the interactive text based on a universal call database and the call database, where the universal call database stores a plurality of universal user names.
The universal call database stores common user names collected from historical data, for example Xiaoming, Xiaohong, father, mother, and girl.
And identifying the user name in the real-time communication process of the user based on the call database and the universal call database.
For example, suppose the interactive text reads "Feixiang is late today", where "Feixiang" (literally "flying") is a flower name contained in the call database. "Feixiang" in this interactive text is then identified as a user's name rather than being read as the ordinary word "flying".
Optionally, in the name identification method provided in the embodiments of the present application, identifying the user names in the interactive text based on the call database and the universal call database includes: performing word segmentation on the interactive text based on the call database and the universal call database to obtain a word segmentation result, where the word segmentation result includes at least one candidate user name; and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
For example, if the users' interactive text is "Feixiang is late today", segmenting it yields tokens such as "today", "Feixiang" ("flying"), and "late". The candidate user name in this segmentation result is "Feixiang", and combining it with the context information of the interactive text confirms that "Feixiang" is a user name in the interactive text.
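The two-stage flow in this example (dictionary lookup to produce candidates, then a context decision) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the input is assumed to be already segmented, the Chinese sentence 飞翔今天迟到了 is a hypothetical rendering of the example, and `not_sentence_final` is a placeholder rule standing in for the context classifier described later.

```python
from typing import Callable, List, Set


def identify_user_names(
    tokens: List[str],
    call_db: Set[str],
    universal_db: Set[str],
    is_name_in_context: Callable[[List[str], int], bool],
) -> List[str]:
    """Two-stage identification: dictionary candidates, then a context check."""
    names = []
    for i, tok in enumerate(tokens):
        # Stage 1: a segmented token is a candidate if either database contains it.
        if tok not in call_db and tok not in universal_db:
            continue
        # Stage 2: the context check decides whether this occurrence is really a name.
        if is_name_in_context(tokens, i):
            names.append(tok)
    return names


# Hypothetical placeholder for the context classifier: accept a candidate
# only if it is not the last token of the sentence.
def not_sentence_final(tokens: List[str], i: int) -> bool:
    return i + 1 < len(tokens)


tokens = ["飞翔", "今天", "迟到", "了"]  # hypothetical segmented input
print(identify_user_names(tokens, {"飞翔"}, {"小明", "小红"}, not_sentence_final))
```

Passing the context check in as a callable keeps the dictionary stage independent of whichever classifier is eventually used.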
Through the above steps, a call database is combined with a universal call database: during communication among a plurality of users, the interactive text of the communication is acquired; a call database storing the user names of at least one user group is acquired; and user names in the interactive text are identified based on the call database and the universal call database, which stores a plurality of universal user names. This achieves accurate identification of user names, improves the accuracy of recognizing special names during real-time communication, and thereby solves the technical problem of low name recognition accuracy caused by poor handling of special names in real-time communication.
Optionally, in the name identification method provided in the embodiments of the present application, performing word segmentation on the interactive text based on the call database and the universal call database to obtain a word segmentation result includes: performing word segmentation on the interactive text with a forward maximum matching algorithm according to the universal call database to obtain a first word segmentation result set; performing word segmentation on the interactive text with a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select the word segmentation result with the highest probability from the first and second word segmentation result sets as the word segmentation result of the interactive text.
In this scheme, the forward maximum matching algorithm lists possible segmentation paths of the text according to the universal call database, giving the first word segmentation result set, and lists possible segmentation paths according to the call database, giving the second word segmentation result set. A word segmentation model then selects the path with the highest probability among all candidate segmentation paths (that is, the first and second word segmentation result sets) as the word segmentation result.
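The two forward-maximum-matching passes and the model-based choice between them can be sketched as follows. Assumptions: each pass matches against a general vocabulary plus the respective name database (a name-only dictionary would split everything else into single characters), the word segmentation model is replaced by a simple unigram log-probability score, and `max_len`, the vocabularies, and the probabilities are hypothetical values chosen for illustration.

```python
import math
from typing import Dict, List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching: at each position take the longest vocab word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def unigram_log_prob(tokens: List[str], probs: Dict[str, float], unk: float = 1e-6) -> float:
    """Stand-in segmentation model: sum of unigram log-probabilities."""
    return sum(math.log(probs.get(t, unk)) for t in tokens)


def segment(text: str, general_vocab: Set[str], universal_db: Set[str],
            call_db: Set[str], probs: Dict[str, float]) -> List[str]:
    # First result set: general vocabulary plus the universal call database.
    first = forward_max_match(text, general_vocab | universal_db)
    # Second result set: general vocabulary plus the group's call database.
    second = forward_max_match(text, general_vocab | call_db)
    # Keep whichever path the scoring model finds more probable.
    return max([first, second], key=lambda path: unigram_log_prob(path, probs))


text = "飞翔今天迟到了"  # hypothetical input: "Feixiang is late today"
probs = {"飞翔": 0.01, "今天": 0.05, "迟到": 0.01, "了": 0.1}
result = segment(text, {"今天", "迟到"}, {"小明", "小红"}, {"飞翔"}, probs)
print(result)  # ['飞翔', '今天', '迟到', '了']
```

Without the call database the first pass splits the flower name into two single characters, each scored with the unknown-word penalty, so the name-aware path wins here.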
Optionally, in the name identification method provided in the embodiments of the present application, identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text includes: classifying the at least one candidate user name with a classification model combined with context features, so as to determine whether the at least one candidate user name is a user name in the interactive text.
In the above scheme, the at least one candidate user name may be identified using a CNN (convolutional neural network) classifier that combines the context features to determine whether the at least one candidate user name is a user name in the interactive text.
It should be noted that the CNN (convolutional neural network) or RNN (recurrent neural network) classifier combined with context features is a classification model trained on supervised personal name recognition data to recognize personal name words. The model generalizes over the contexts in which personal names appear, so it is effective at deciding whether a customized personal name candidate is actually used as a name.
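A context-window CNN classifier of the kind described can be sketched in pure Python to show the data flow: context window, 1-D convolution, ReLU, max-pooling over positions, and a logistic output. This toy model is untrained and is not the classifier in the text: the character embeddings are pseudo-random vectors seeded by code point, the filter weights are random, and the window size, kernel width, filter count, and 0.5 threshold are all assumptions. In practice the weights would be learned from supervised name recognition data.

```python
import math
import random
from typing import List

EMBED_DIM = 4     # assumed embedding size
KERNEL = 3        # assumed convolution width (characters)
NUM_FILTERS = 2   # assumed number of convolution filters


def embed(ch: str) -> List[float]:
    # Hypothetical embedding: a fixed pseudo-random vector seeded by the code point.
    rng = random.Random(ord(ch))
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


class TinyCNNClassifier:
    """Untrained toy CNN: context window -> conv -> ReLU -> max-pool -> sigmoid."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        # filters[f][k][d]: weight of filter f at kernel offset k, embedding dim d.
        self.filters = [
            [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for _ in range(KERNEL)]
            for _ in range(NUM_FILTERS)
        ]
        self.bias = [0.0] * NUM_FILTERS
        self.out_w = [rng.uniform(-1.0, 1.0) for _ in range(NUM_FILTERS)]
        self.out_b = 0.0

    def predict(self, text: str, start: int, end: int, window: int = 2) -> float:
        """Score in (0, 1) that text[start:end] is a name; meaningful only after training."""
        # Candidate plus up to `window` characters on each side, space-padded.
        left = text[max(0, start - window):start].rjust(window, " ")
        right = text[end:end + window].ljust(window, " ")
        seq = [embed(c) for c in left + text[start:end] + right]
        pooled = []
        for f, kernel in enumerate(self.filters):
            acts = []
            # 1-D convolution: slide the kernel over character positions.
            for i in range(len(seq) - KERNEL + 1):
                s = self.bias[f] + sum(
                    kernel[k][d] * seq[i + k][d]
                    for k in range(KERNEL)
                    for d in range(EMBED_DIM)
                )
                acts.append(max(0.0, s))  # ReLU
            pooled.append(max(acts))      # max-pool over positions
        z = self.out_b + sum(w * p for w, p in zip(self.out_w, pooled))
        return 1.0 / (1.0 + math.exp(-z))  # logistic output


clf = TinyCNNClassifier(seed=0)
p = clf.predict("飞翔今天迟到了", 0, 2)  # hypothetical candidate at positions 0-1
is_name = p >= 0.5  # assumed decision threshold
```

Max-pooling over positions makes the score independent of where in the window a name-indicating pattern occurs, which is why CNNs are a common choice for this kind of short-context classification.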
Optionally, in the method for identifying a name provided in the embodiment of the present application, after identifying a user name in the interactive text, the method further includes: translating the identified user name in the interactive text by adopting a user name rule to obtain a first translation result; translating texts except the recognized user names in the interactive text by adopting a machine translation model to obtain a second translation result; and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
In real-time communication among multiple users, the communication content needs to be translated from Chinese into English. After the user name in the interactive text is identified, it is translated according to the user name rules. For example, if the interactive text means "Feixiang says today is a holiday" and the name (literally "flying") is identified as a user name, the name rules transliterate it as "Feixiang", while the machine translation model translates the rest of the sentence as "said today is a holiday"; combining the two gives "FeiXiang said today is a holiday", which is the translation result of the interactive text. The scheme can be applied to Chinese-English translation in real-time communication and solves the problem that translation models translate Chinese names (especially customized names) poorly across different user scenarios.
It should be noted that most Chinese names can be translated from Chinese to English directly by the rule of converting Chinese characters into pinyin; if the customized name dictionary contains special forms such as "Liu Zong" or "Li Dong", the corresponding rules are used for translation. The machine translation model may be a Self-Attention-based end-to-end neural network translation model trained on a large-scale Chinese-English parallel corpus.
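The split-translate-merge step can be sketched as follows. The name rules and the machine translation model are passed in as callables; the dictionary stubs in the usage example are hypothetical stand-ins for both, and the Chinese sentence 飞翔说今天放假 is an assumed rendering of the example. Joining pieces with a single space is a simplification suited to English output.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets of a recognized name


def translate_with_names(text: str, name_spans: List[Span],
                         translate_name: Callable[[str], str],
                         translate_text: Callable[[str], str]) -> str:
    pieces: List[str] = []
    cursor = 0
    for start, end in sorted(name_spans):
        if cursor < start:
            # Text between names goes to the machine translation model.
            pieces.append(translate_text(text[cursor:start]))
        # Recognized names go to the name rules (first translation result).
        pieces.append(translate_name(text[start:end]))
        cursor = end
    if cursor < len(text):
        pieces.append(translate_text(text[cursor:]))
    # Reassemble the translated pieces in their original order.
    return " ".join(p for p in pieces if p)


NAME_RULES = {"飞翔": "Feixiang"}                    # hypothetical name-rule output
MT_STUB = {"说今天放假": "said today is a holiday"}   # hypothetical MT output
result = translate_with_names(
    "飞翔说今天放假", [(0, 2)],
    translate_name=lambda s: NAME_RULES.get(s, s),
    translate_text=lambda s: MT_STUB.get(s, s),
)
print(result)  # Feixiang said today is a holiday
```

Translating fragments separately deprives the translation model of sentence context; a common alternative, not described in the text, is to replace each name with a placeholder, translate the whole sentence, and substitute the rule-based translations back.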
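Rule-based name translation with a fallback to pinyin might look like the sketch below. Both tables are hypothetical and tiny: a real system would need a complete character-to-pinyin dictionary, and the English renderings chosen for 总 ("Zong") and 董 ("Dong") are assumptions, since the text does not state what the corresponding rules produce.

```python
from typing import Dict

# Hypothetical, tiny character-to-pinyin table (a real system needs a full one).
PINYIN: Dict[str, str] = {"刘": "liu", "李": "li", "飞": "fei", "翔": "xiang"}

# Hypothetical special-form rules: surname + honorific suffix.
SUFFIX_RULES: Dict[str, str] = {"总": "President", "董": "Chairman"}


def translate_name(name: str) -> str:
    # Special forms such as 刘总 ("Liu Zong") or 李董 ("Li Dong").
    if len(name) == 2 and name[0] in PINYIN and name[1] in SUFFIX_RULES:
        return f"{SUFFIX_RULES[name[1]]} {PINYIN[name[0]].capitalize()}"
    # Default rule: join the pinyin syllables and capitalise the result.
    return "".join(PINYIN.get(ch, ch) for ch in name).capitalize()


print(translate_name("刘总"))  # President Liu
print(translate_name("飞翔"))  # Feixiang
```

Checking the special-form rules before the pinyin fallback mirrors the order in the text: pinyin is the default, and the customized rules override it only for listed forms.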
Optionally, in the name identification method provided in the embodiments of the present application, acquiring the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to that user group; if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call databases of the at least one user group.
For example, user A and user B communicate in real time and may or may not belong to the same user group. If user A and user B belong to the same enterprise, the enterprise has a customized call database covering both of them, and that call database is acquired. If they do not belong to the same user group, customized call database A of user A's group and customized call database B of user B's group are acquired respectively, and both are used as the customized call databases for identifying user names in the interactive text of the real-time communication between user A and user B.
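Acquiring the call database for one or several user groups can be sketched as below. The mappings `user_group` and `group_db` are hypothetical data structures, and merging the per-group databases into a single set is one possible reading of "using the plurality of call databases"; keeping them as separate dictionaries would also fit the text.

```python
from typing import Dict, Iterable, Set


def get_call_database(user_ids: Iterable[str], user_group: Dict[str, str],
                      group_db: Dict[str, Set[str]]) -> Set[str]:
    """Merge the call databases of every user group the participants belong to."""
    # One group if all participants share it; several groups otherwise.
    groups = {user_group[u] for u in user_ids}
    merged: Set[str] = set()
    for g in groups:
        merged |= group_db.get(g, set())  # a group without a database adds nothing
    return merged


user_group = {"A": "corp1", "B": "corp1", "C": "corp2"}  # hypothetical memberships
group_db = {"corp1": {"飞翔"}, "corp2": {"小橙"}}          # hypothetical call databases
print(get_call_database(["A", "B"], user_group, group_db))  # {'飞翔'}
```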
As shown in fig. 3, when a plurality of users communicate by text in real time, each enterprise can compile an enterprise-customized personal name dictionary (corresponding to the call database above) from its internal flower-name or alias system, and the customized personal name recognition module uses this dictionary to recognize customized personal names in the text.
The recognized Chinese names are translated directly by rules such as Chinese-character-to-pinyin conversion, while the rest of the text, excluding the recognized Chinese names, is still translated by the ordinary machine translation model. Finally, the two translated parts are combined according to their positions in the text to obtain the final machine translation result.
The embodiments of the present application provide a customized Chinese name recognition method, so that different users can personalize a name recognition model suited to themselves. In translation scenarios within real-time communication, personal names can then be identified accurately, which improves the accuracy of text translation during communication.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, an apparatus for implementing the above name identification method is also provided. As shown in fig. 4, the apparatus includes: a first acquisition unit 300, a second acquisition unit 301, and an identification unit 302.
Specifically, the first acquisition unit 300 is configured to acquire, while a plurality of users are communicating, the interactive text produced during the communication;
the second acquisition unit 301 is configured to acquire a call database, where the call database stores the user names of at least one user group;
the identification unit 302 is configured to identify the user names in the interactive text according to the call database and a generic call database, where the generic call database stores a plurality of generic user names.
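To make the division of labor among units 300, 301 and 302 concrete, here is a minimal Python sketch under stated assumptions. The class and method names (`NameIdentificationApparatus`, `first_acquisition_unit`, and so on) are hypothetical, both databases are modeled as in-memory sets, and the identification unit uses a naive longest-first substring lookup in place of the segmentation and context-classification steps of this embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class NameIdentificationApparatus:
    """Hypothetical sketch of units 300, 301 and 302; not the patented implementation."""

    # Assumed in-memory stand-ins: user group id -> names, plus the generic names.
    group_call_db: dict[str, set[str]] = field(default_factory=dict)
    generic_call_db: set[str] = field(default_factory=set)

    def first_acquisition_unit(self, messages: list[str]) -> str:
        # Unit 300: gather the interactive text produced during the conversation.
        return "\n".join(messages)

    def second_acquisition_unit(self, group_ids: list[str]) -> set[str]:
        # Unit 301: collect the user names stored for the participating group(s).
        names: set[str] = set()
        for gid in group_ids:
            names |= self.group_call_db.get(gid, set())
        return names

    def identification_unit(self, text: str, call_db: set[str]) -> list[str]:
        # Unit 302: naive longest-first lookup against both databases, skipping
        # any name that is a substring of a longer name already matched.
        found: list[str] = []
        for name in sorted(call_db | self.generic_call_db, key=len, reverse=True):
            if name in text and not any(name in longer for longer in found):
                found.append(name)
        return found


app = NameIdentificationApparatus(
    group_call_db={"team-a": {"老王", "小李"}}, generic_call_db={"张伟"}
)
text = app.first_acquisition_unit(["老王，明天见", "张伟也来吗？"])
print(sorted(app.identification_unit(text, app.second_acquisition_unit(["team-a"]))))
```

The group name 老王 comes from the call database and 张伟 from the generic call database; 小李 is stored but does not occur in the text, so it is not reported.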
With the name identification apparatus provided in the embodiment of the present application, while a plurality of users are communicating, the first acquisition unit 300 acquires the interactive text produced during the communication; the second acquisition unit 301 acquires a call database that stores the user names of at least one user group; and the identification unit 302 identifies the user names in the interactive text based on the call database and a generic call database that stores a plurality of common user names. This achieves accurate identification of user names and improves the accuracy of identifying specific names during real-time communication, thereby solving the technical problem of low name recognition accuracy caused by poor handling of specific names in real-time communication.
Optionally, in the name identification apparatus provided in the embodiment of the present application, the identification unit 302 includes: a processing module, configured to perform word segmentation on the interactive text according to the call database and the generic call database to obtain a word segmentation result, where the word segmentation result includes at least one candidate user name; and an identification module, configured to identify the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
Optionally, in the name identification apparatus provided in the embodiment of the present application, the identification module includes: a first processing sub-module, configured to perform word segmentation on the interactive text using a forward maximum matching algorithm according to the generic call database to obtain a first word segmentation result set; a second processing sub-module, configured to perform word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and a determining sub-module, configured to use a word segmentation model to select, from the first and second word segmentation result sets, the result with the highest probability as the word segmentation result of the interactive text.
Optionally, in the name identification apparatus provided in the embodiment of the present application, the identification module includes: an identification sub-module, configured to classify the at least one candidate user name using a classification model that incorporates contextual features, so as to determine whether each candidate user name is a user name in the interactive text.
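The two-pass segmentation can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes each "result set" holds a single forward-maximum-matching segmentation, a maximum word length of 4 characters, and an add-one-smoothed unigram model as a stand-in for the unspecified word segmentation model. The function names and the frequency table are hypothetical.

```python
import math


def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon entry,
    falling back to a single character when nothing matches."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def unigram_log_prob(tokens: list[str], freq: dict[str, int]) -> float:
    """Add-one-smoothed unigram log-probability (assumed stand-in for the segmentation model)."""
    total = sum(freq.values())
    vocab = len(freq) + 1  # +1 reserves probability mass for unseen tokens
    return sum(math.log((freq.get(t, 0) + 1) / (total + vocab)) for t in tokens)


def segment(text: str, generic_db: set[str], call_db: set[str], freq: dict[str, int]) -> list[str]:
    first = forward_max_match(text, generic_db)  # first result set (generic call database)
    second = forward_max_match(text, call_db)    # second result set (call database)
    # Keep whichever segmentation the model scores as more probable.
    return max([first, second], key=lambda seg: unigram_log_prob(seg, freq))


freq = {"老王": 5, "开": 3, "会": 3, "了": 10, "老": 2, "王开": 1}
print(segment("老王开会了", generic_db={"王开"}, call_db={"老王"}, freq=freq))
# ['老王', '开', '会', '了']
```

Here the generic pass splits the text as 老 / 王开 / 会 / 了, while the pass over the group's call database keeps 老王 together; the unigram model scores the second split higher, so the group-specific name survives as a candidate.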
Optionally, in the name identification apparatus provided in the embodiment of the present application, the classification model includes a convolutional neural network model.
Optionally, in the name identification apparatus provided in the embodiment of the present application, the apparatus further includes: a first translation unit, configured to, after the user names in the interactive text are identified, translate the identified user names using user-name translation rules to obtain a first translation result; a second translation unit, configured to translate the text other than the identified user names in the interactive text using a machine translation model to obtain a second translation result; and a determining unit, configured to obtain the translation result of the interactive text in the communication process according to the first translation result and the second translation result.
Optionally, in the name identification apparatus provided in the embodiment of the present application, the second acquisition unit 301 includes: a first acquisition module, configured to acquire the call database corresponding to a user group when the plurality of users belong to the same user group; and a second acquisition module, configured to acquire, when the plurality of users belong to a plurality of user groups, the call database corresponding to each user group, thereby obtaining a plurality of call databases, and to use the plurality of call databases as the call database of the at least one user group.
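A convolutional classifier over the candidate's context can be illustrated with a toy forward pass: per-character features, one 1-D convolution filter with ReLU, global max-pooling, and a logistic output. Everything here is an assumption for illustration: the two-dimensional character features (the hypothetical `CUE_CHARS` table plus a candidate flag), the window size, and the hand-set filter weights stand in for embeddings and parameters that a real model would learn from labeled data. The function name `name_probability` is hypothetical.

```python
import math

# Hypothetical 2-dim character features: [context-cue score, is-candidate flag].
# A real model would learn character embeddings instead of using this table.
CUE_CHARS = {"叫": 1.0, "找": 1.0, "给": 0.8, "和": 0.6}


def embed(chars: list[str], cand_mask: list[int]) -> list[list[float]]:
    return [[CUE_CHARS.get(c, 0.0), float(m)] for c, m in zip(chars, cand_mask)]


def conv1d_relu(seq: list[list[float]], kernel: list[list[float]], bias: float) -> list[float]:
    """'Valid' 1-D convolution with a single filter, followed by ReLU."""
    width = len(kernel)
    out = []
    for i in range(len(seq) - width + 1):
        s = bias + sum(
            kernel[j][d] * seq[i + j][d] for j in range(width) for d in range(len(seq[i + j]))
        )
        out.append(max(0.0, s))
    return out


def name_probability(text: str, start: int, end: int, window: int = 3) -> float:
    """Score the candidate text[start:end] using its surrounding context window."""
    lo, hi = max(0, start - window), min(len(text), end + window)
    chars = list(text[lo:hi])
    mask = [1 if start <= lo + k < end else 0 for k in range(len(chars))]
    x = embed(chars, mask)
    # Hand-set (untrained) width-2 filter: fires when a cue character is
    # immediately followed by a character of the candidate.
    kernel = [[1.0, 0.0], [0.0, 1.0]]
    pooled = max(conv1d_relu(x, kernel, bias=-1.0), default=0.0)  # global max-pooling
    return 1.0 / (1.0 + math.exp(-(4.0 * pooled - 1.0)))           # logistic output layer


print(round(name_probability("我去找老王开会", 3, 5), 3))  # 0.953
print(round(name_probability("老王开会了", 1, 3), 3))      # 0.269
```

In the first call the cue character 找 directly precedes the candidate 老王, so the filter fires and the score exceeds 0.5; the spurious candidate 王开 in the second call has no cue and is rejected. A practical model would use many learned filters of several widths rather than one fixed filter.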
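One possible way to combine the two translation results is to mask the identified names with placeholders, pass the masked text through the machine translation model, and then substitute the rule-based name translations back in. The patent does not specify the merging procedure, so this placeholder strategy is an assumption; `NAME_RULES`, `translate_with_names` and `toy_mt` are hypothetical stand-ins for real user-name translation rules and a real machine translation model.

```python
from typing import Callable

# Hypothetical user-name translation rules (e.g. curated pinyin renderings).
NAME_RULES = {"老王": "Lao Wang", "小李": "Xiao Li"}


def translate_with_names(text: str, names: list[str], mt: Callable[[str], str]) -> str:
    placeholders: dict[str, str] = {}
    masked = text
    # Replace longer names first so a short name cannot split a longer one.
    for k, name in enumerate(sorted(names, key=len, reverse=True)):
        tag = f"__NAME{k}__"
        placeholders[tag] = NAME_RULES.get(name, name)  # first translation result
        masked = masked.replace(name, tag)
    translated = mt(masked)                             # second translation result
    for tag, rendered in placeholders.items():          # merge the two results
        translated = translated.replace(tag, rendered)
    return translated


def toy_mt(s: str) -> str:
    # Stand-in for a real machine translation model; it leaves placeholders untouched.
    return s.replace("明天见", "see you tomorrow").replace("，", ", ")


print(translate_with_names("老王，明天见", ["老王"], toy_mt))  # Lao Wang, see you tomorrow
```

The approach relies on the translation model passing placeholders through unchanged; a production system would need to verify that assumption for its chosen model.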
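The group-dependent acquisition performed by the two modules can be sketched as below. The user-to-group mapping, the per-group name sets, and the choice to represent "the plurality of call databases" as the union of their name sets are all assumptions made for illustration; the function name `acquire_call_database` is hypothetical.

```python
def acquire_call_database(
    users: list[str],
    user_group: dict[str, str],
    group_dbs: dict[str, set[str]],
) -> set[str]:
    """Hypothetical sketch: return the names the identification step should consult."""
    groups = {user_group[u] for u in users}
    if len(groups) == 1:
        # All users belong to the same group: use that group's call database.
        (only_group,) = groups
        return set(group_dbs.get(only_group, set()))
    # Users span several groups: fetch every group's database and use them together
    # (modeled here as the union of their name sets).
    merged: set[str] = set()
    for g in groups:
        merged |= group_dbs.get(g, set())
    return merged


user_group = {"alice": "sales", "bob": "sales", "chen": "rnd"}
group_dbs = {"sales": {"老王"}, "rnd": {"小李", "张工"}}
print(acquire_call_database(["alice", "bob"], user_group, group_dbs))  # {'老王'}
```

Returning a copy in the single-group case keeps callers from accidentally mutating the stored group database.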
It should be noted that the first acquisition unit 300, the second acquisition unit 301, and the identification unit 302 correspond to steps S201 to S203 in Example 1. These units implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Example 1. It should also be noted that the above units may run as part of the apparatus in the computer terminal 10 provided in Example 1.
Example 3
An embodiment of the present invention may provide a computer terminal, which may be any computer terminal in a group of computer terminals. Optionally, in this embodiment, the computer terminal may be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of network devices in a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the name identification method in the application program: while a plurality of users are communicating, acquiring the interactive text produced during the communication; acquiring a call database, where the call database stores the user names of at least one user group; and identifying the user names in the interactive text based on the call database and a generic call database, where the generic call database stores a plurality of generic user names.
The computer terminal may further execute program code for the following steps of the name identification method in the application program: identifying the user names in the interactive text based on the call database and the generic call database includes: performing word segmentation on the interactive text based on the call database and the generic call database to obtain a word segmentation result, where the word segmentation result includes at least one candidate user name; and identifying the user names in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
The computer terminal may further execute program code for the following steps of the name identification method in the application program: performing word segmentation on the interactive text based on the call database and the generic call database to obtain a word segmentation result includes: performing word segmentation on the interactive text using a forward maximum matching algorithm according to the generic call database to obtain a first word segmentation result set; performing word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select, from the first and second word segmentation result sets, the result with the highest probability as the word segmentation result of the interactive text.
The computer terminal may further execute program code for the following steps of the name identification method in the application program: identifying the user names in the interactive text from the at least one candidate user name in combination with the context information of the interactive text includes: classifying the at least one candidate user name using a classification model that incorporates contextual features, so as to determine whether each candidate user name is a user name in the interactive text.
The computer terminal may further execute program code for the following step of the name identification method in the application program: the classification model includes a convolutional neural network model.
The computer terminal may further execute program code for the following steps of the name identification method in the application program: after the user names in the interactive text are identified, the method further includes: translating the identified user names using user-name translation rules to obtain a first translation result; translating the text other than the identified user names in the interactive text using a machine translation model to obtain a second translation result; and obtaining the translation result of the interactive text in the communication process based on the first and second translation results.
The computer terminal may further execute program code for the following steps of the name identification method in the application program: acquiring the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to that user group; if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call database of the at least one user group.
Optionally, fig. 4 is a structural block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 4, the computer terminal A may include one or more processors (only one is shown) and a memory.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the name identification method and apparatus in the embodiments of the present invention. The processor runs the software programs and modules stored in the memory to perform various functional applications and data processing, that is, to implement the above name identification method. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, which may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call, through the transmission device, the information and the application program stored in the memory to perform the following steps: while a plurality of users are communicating, acquiring the interactive text produced during the communication; acquiring a call database, where the call database stores the user names of at least one user group; and identifying the user names in the interactive text based on the call database and a generic call database, where the generic call database stores a plurality of generic user names.
The storage medium is further configured to store program code for performing the following steps: identifying the user names in the interactive text based on the call database and the generic call database includes: performing word segmentation on the interactive text based on the call database and the generic call database to obtain a word segmentation result, where the word segmentation result includes at least one candidate user name; and identifying the user names in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
The storage medium is further configured to store program code for performing the following steps: performing word segmentation on the interactive text based on the call database and the generic call database to obtain a word segmentation result includes: performing word segmentation on the interactive text using a forward maximum matching algorithm according to the generic call database to obtain a first word segmentation result set; performing word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select, from the first and second word segmentation result sets, the result with the highest probability as the word segmentation result of the interactive text.
The storage medium is further configured to store program code for performing the following steps: identifying the user names in the interactive text from the at least one candidate user name in combination with the context information of the interactive text includes: classifying the at least one candidate user name using a classification model that incorporates contextual features, so as to determine whether each candidate user name is a user name in the interactive text.
The storage medium is further configured to store program code for performing the following step: the classification model includes a convolutional neural network model.
The storage medium is further configured to store program code for performing the following steps: after the user names in the interactive text are identified, the method further includes: translating the identified user names using user-name translation rules to obtain a first translation result; translating the text other than the identified user names in the interactive text using a machine translation model to obtain a second translation result; and obtaining the translation result of the interactive text in the communication process based on the first and second translation results.
The storage medium is further configured to store program code for performing the following steps: acquiring the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to that user group; if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call database of the at least one user group.
The embodiment of the present invention provides a name identification scheme that combines a call database with a generic call database: while a plurality of users are communicating, the interactive text produced during the communication is acquired; a call database storing the user names of at least one user group is acquired; and the user names in the interactive text are identified based on the call database and a generic call database storing a plurality of generic user names. This achieves accurate identification of user names and improves the accuracy of identifying specific names during real-time communication, thereby solving the technical problem of low name recognition accuracy caused by poor handling of specific names in real-time communication.
Those skilled in the art will appreciate that the structure shown in fig. 4 is merely illustrative. The computer terminal may also be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (MID), a PAD, or another terminal device. Fig. 4 does not limit the structure of the electronic device. For example, the computer terminal 10 may include more or fewer components (e.g., network interfaces, display devices) than shown in fig. 4, or have a configuration different from that shown in fig. 4.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods in the above embodiments may be completed by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Example 4
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the name identification method provided in Example 1.
Optionally, in this embodiment, the storage medium may be located in any computer terminal of a computer terminal group in a computer network, or in any mobile terminal of a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: while a plurality of users are communicating, acquiring the interactive text produced during the communication; acquiring a call database, where the call database stores the user names of at least one user group; and identifying the user names in the interactive text based on the call database and a generic call database, where the generic call database stores a plurality of generic user names.
The storage medium is further configured to store program code for performing the following steps: identifying the user names in the interactive text based on the call database and the generic call database includes: performing word segmentation on the interactive text based on the call database and the generic call database to obtain a word segmentation result, where the word segmentation result includes at least one candidate user name; and identifying the user names in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
The storage medium is further configured to store program code for performing the following steps: performing word segmentation on the interactive text based on the call database and the generic call database to obtain a word segmentation result includes: performing word segmentation on the interactive text using a forward maximum matching algorithm according to the generic call database to obtain a first word segmentation result set; performing word segmentation on the interactive text using a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set; and using a word segmentation model to select, from the first and second word segmentation result sets, the result with the highest probability as the word segmentation result of the interactive text.
The storage medium is further configured to store program code for performing the following steps: identifying the user names in the interactive text from the at least one candidate user name in combination with the context information of the interactive text includes: classifying the at least one candidate user name using a classification model that incorporates contextual features, so as to determine whether each candidate user name is a user name in the interactive text.
The storage medium is further configured to store program code for performing the following step: the classification model includes a convolutional neural network model.
The storage medium is further configured to store program code for performing the following steps: after the user names in the interactive text are identified, the method further includes: translating the identified user names using user-name translation rules to obtain a first translation result; translating the text other than the identified user names in the interactive text using a machine translation model to obtain a second translation result; and obtaining the translation result of the interactive text in the communication process based on the first and second translation results.
The storage medium is further configured to store program code for performing the following steps: acquiring the call database includes: if the plurality of users belong to the same user group, acquiring the call database corresponding to that user group; if the plurality of users belong to a plurality of user groups, acquiring the call database corresponding to each user group to obtain a plurality of call databases, and using the plurality of call databases as the call database of the at least one user group.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the above embodiments of the present invention, each embodiment is described with its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented as software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (14)

1. A method for identifying a name, comprising:
in the process of communication of a plurality of users, acquiring interactive text in the process of communication;
acquiring a call database, wherein the call database stores the user name of at least one user group;
identifying user names in the interactive text based on a universal call database and the call database, wherein a plurality of universal user names are stored in the universal call database;
wherein obtaining the call database comprises:
if the plurality of users belong to the same user group, acquiring a call database corresponding to the user group;
if the plurality of users belong to a plurality of user groups, acquiring a call database corresponding to each user group to obtain a plurality of call databases, and taking the plurality of call databases as the call database of the at least one user group.
2. The method of claim 1, wherein identifying the user name in the interactive text based on a universal call database and the call database comprises:
based on a universal call database and the call database, performing word segmentation on the interactive text to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name;
and identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
3. The method for identifying a name according to claim 2, wherein performing word segmentation on the interactive text based on the universal call database and the call database to obtain a word segmentation result comprises:
performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the universal call database to obtain a first word segmentation result set;
performing word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set;
and selecting a word segmentation result with highest probability from the first word segmentation result set and the second word segmentation result set by adopting a word segmentation model as a word segmentation result of the interactive text.
4. The method of claim 2, wherein identifying the user name in the interactive text from the at least one candidate user name in combination with the contextual information of the interactive text comprises:
identifying the at least one candidate user name by using a classification model that incorporates contextual features, so as to determine whether the at least one candidate user name is the user name in the interactive text.
5. The method of claim 4, wherein the classification model comprises a convolutional neural network model.
6. The method of identifying a name according to claim 1, further comprising:
after the user names in the interactive text are identified, translating the identified user names in the interactive text by using user name rules to obtain a first translation result;
translating texts except the recognized user names in the interactive text by adopting a machine translation model to obtain a second translation result;
and obtaining a translation result of the interactive text in the communication process based on the first translation result and the second translation result.
7. A name recognition apparatus, comprising:
the first acquisition unit is used for acquiring interactive texts in the communication process of a plurality of users;
a second acquisition unit, configured to acquire a call database, where the call database stores the user names of at least one user group;
the identification unit is used for identifying the user names in the interactive text according to a universal call database and the call database, wherein the universal call database stores a plurality of universal user names;
Wherein the second acquisition unit includes:
the first acquisition module is used for acquiring a call database corresponding to the user group under the condition that the plurality of users belong to the same user group;
the second obtaining module is configured to obtain a call database corresponding to each user group under the condition that the plurality of users belong to a plurality of user groups, obtain a plurality of call databases, and use the plurality of call databases as call databases of the at least one user group.
8. The apparatus for recognizing a name according to claim 7, wherein the recognition unit includes:
the processing module is used for performing word segmentation on the interactive text according to the call database and the universal call database to obtain a word segmentation result, wherein the word segmentation result comprises at least one candidate user name;
and the identification module is used for identifying the user name in the interactive text from the at least one candidate user name in combination with the context information of the interactive text.
9. The name recognition device of claim 8, wherein the recognition module comprises:
the first processing sub-module is used for performing word segmentation on the interactive text by adopting a forward maximum matching algorithm according to the universal call database to obtain a first word segmentation result set;
The second processing sub-module is used for carrying out word segmentation processing on the interactive text by adopting a forward maximum matching algorithm according to the call database to obtain a second word segmentation result set;
and the determining submodule is used for selecting the word segmentation result with the highest probability from the first word segmentation result set and the second word segmentation result set by adopting a word segmentation model as the word segmentation result of the interactive text.
10. The name recognition device of claim 8, wherein the recognition module comprises:
the identification sub-module is used for identifying the at least one candidate user name by using a classification model that incorporates contextual features, so as to determine whether the at least one candidate user name is the user name in the interactive text.
11. The apparatus for identifying names according to claim 10, wherein the classification model comprises a convolutional neural network model.
12. The apparatus for identifying a name according to claim 7, wherein the apparatus further comprises:
the first translation unit is used for, after the user names in the interactive text are identified, translating the identified user names by using user name rules to obtain a first translation result;
The second translation unit is used for translating texts except the recognized user names in the interactive text by adopting a machine translation model to obtain a second translation result;
and the determining unit is used for obtaining the translation result of the interactive text in the communication process according to the first translation result and the second translation result.
13. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the method of identifying a name according to any one of claims 1 to 6.
14. A processor, characterized in that the processor is adapted to run a program, wherein the program, when run, performs the method of identifying names according to any one of claims 1 to 6.
CN201910002379.2A 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor Active CN111401083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910002379.2A CN111401083B (en) 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910002379.2A CN111401083B (en) 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor

Publications (2)

Publication Number Publication Date
CN111401083A (en) 2020-07-10
CN111401083B (en) 2023-05-02

Family

ID=71430188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910002379.2A Active CN111401083B (en) 2019-01-02 2019-01-02 Name identification method and device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN111401083B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859970B (en) * 2020-07-23 2022-05-17 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for processing information

Citations (15)

Publication number Priority date Publication date Assignee Title
CN101093478A (en) * 2007-07-25 2007-12-26 中国科学院计算技术研究所 Method and system for identifying Chinese full name based on Chinese shortened form of entity
CN101101599A (en) * 2007-06-20 2008-01-09 精实万维软件(北京)有限公司 Method for extracting advertisement main information from web page
CN101727441A (en) * 2009-12-25 2010-06-09 北京工业大学 Evaluating method and evaluating system targeting Chinese name identifying system
CN103514165A (en) * 2012-06-15 2014-01-15 佳能株式会社 Method and device for identifying persons mentioned in conversation
CN103544139A (en) * 2012-07-13 2014-01-29 江苏新瑞峰信息科技有限公司 Forward word segmentation method and device based on Chinese retrieval
WO2014172428A2 (en) * 2013-04-19 2014-10-23 Google Inc. Name recognition
CN105320645A (en) * 2015-09-24 2016-02-10 天津海量信息技术有限公司 Recognition method for Chinese company name
CN105630765A (en) * 2015-12-21 2016-06-01 浙江万里学院 Place name address identifying method
CN105868184A (en) * 2016-05-10 2016-08-17 大连理工大学 Chinese name recognition method based on recurrent neural network
CN106055560A (en) * 2016-05-18 2016-10-26 上海申腾信息技术有限公司 Method for collecting data of word segmentation dictionary based on statistical machine learning method
CN107844477A (en) * 2017-10-25 2018-03-27 西安影视数据评估中心有限公司 A kind of extracting method and device of this person names of movie and television play
CN108255806A (en) * 2017-12-22 2018-07-06 北京奇艺世纪科技有限公司 A kind of name recognition methods and device
CN108491373A (en) * 2018-02-01 2018-09-04 北京百度网讯科技有限公司 A kind of entity recognition method and system
CN108595435A (en) * 2018-05-03 2018-09-28 鹏元征信有限公司 A kind of organization names identifying processing method, intelligent terminal and storage medium
CN108694168A (en) * 2018-05-11 2018-10-23 深圳云之家网络有限公司 A kind of address processing method and processing device, computer installation and readable storage medium storing program for executing

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7552053B2 (en) * 2005-08-22 2009-06-23 International Business Machines Corporation Techniques for aiding speech-to-speech translation
US7925507B2 (en) * 2006-07-07 2011-04-12 Robert Bosch Corporation Method and apparatus for recognizing large list of proper names in spoken dialog systems

Non-Patent Citations (2)

Title
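张小衡 (Zhang Xiaoheng), 王玲玲 (Wang Lingling). Recognition and analysis of Chinese organization names. Journal of Chinese Information Processing (中文信息学报), 1997, (04), full text. *
贾品贵 (Jia Pingui), 杨一平 (Yang Yiping), 卢朋 (Lu Peng). Research on Chinese personal name recognition based on statistical methods. Computer Engineering and Applications (计算机工程与应用), 2006, (31), full text. *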

Also Published As

Publication number Publication date
CN111401083A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111523306A (en) Text error correction method, device and system
CN111310440B (en) Text error correction method, device and system
CN114757176A (en) Method for obtaining target intention recognition model and intention recognition method
CN110348012B (en) Method, device, storage medium and electronic device for determining target character
CN106649294A (en) Training of classification models and method and device for recognizing subordinate clauses of classification models
CN110209781A (en) A kind of text handling method, device and relevant device
KR20150117914A (en) Language learning system by a plurality of Users
CN115858741A (en) Intelligent question answering method and device suitable for multiple scenes and storage medium
CN111401083B (en) Name identification method and device, storage medium and processor
CN113705164A (en) Text processing method and device, computer equipment and readable storage medium
CN110929519B (en) Entity attribute extraction method and device
CN110134920A (en) Draw the compatible display methods of text, device, terminal and computer readable storage medium
CN111291561B (en) Text recognition method, device and system
Tannert et al. FlowchartQA: the first large-scale benchmark for reasoning over flowcharts
CN116701604A (en) Question and answer corpus construction method and device, question and answer method, equipment and medium
CN114372383B (en) Scene fast switching method and system based on VR simulation scene
CN112084766A (en) Text processing method and device, storage medium and processor
CN110232328A (en) A kind of reference report analytic method, device and computer readable storage medium
CN106855854A (en) A kind of recognition methods of english information and device
CN111860526B (en) Image-based question judging method and device, electronic equipment and computer storage medium
CN111898387B (en) Translation method and device, storage medium and computer equipment
CN110956034B (en) Word acquisition method and device and commodity search method
CN110929504B (en) Statement diagnosis method, device and system
CN113705194A (en) Extraction method and electronic equipment for short
CN111428005A (en) Standard question and answer pair determining method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant