CN113807082B - Target user determining method and device for determining target user - Google Patents


Publication number
CN113807082B
Authority
CN
China
Prior art keywords
corpus
determining
user
rule expression
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010544461.0A
Other languages
Chinese (zh)
Other versions
CN113807082A (en)
Inventor
张小川
孙琨
李洋
谢本银
居梦月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN202010544461.0A
Publication of CN113807082A
Application granted
Publication of CN113807082B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the application disclose a target user determining method, a target user determining apparatus, and a device for determining a target user. One embodiment of the method comprises: receiving a user-defined rule expression that is written in a custom grammar and used for corpus matching; acquiring corpus generated by candidate users; and matching the corpus against the rule expression, and determining target users among the candidate users based on the matching result. The embodiment determines target users from how the rule expression matches the users' corpus, which widens the selection range of target users and reduces the labor cost of determining them.

Description

Target user determining method and device for determining target user
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a target user determining method and device and a device for determining a target user.
Background
In many scenarios, it is necessary to actively recommend functions, services, information, etc. to users satisfying certain conditions, or to acquire relevant information (such as corpus, etc.) of users satisfying certain conditions for analysis, model training, etc. In performing these operations, it is necessary to first determine the target user.
In the prior art, target users can be determined by manually assigning user tags to users and then querying for users with specific tags under given query conditions. However, for users who have not been assigned a tag, it cannot be determined whether they are target users, which narrows the selection range of target users. In addition, when user tags are assigned in this way, dedicated code must be written for each tag to inspect user information and judge whether each user satisfies the tag, so the labor cost is high.
Disclosure of Invention
The embodiments of the application provide a target user determining method, a target user determining apparatus, and a device for determining a target user, which widen the selection range of target users while reducing the labor cost of determining them.
In a first aspect, an embodiment of the present application provides a target user determining method, the method including: receiving a user-defined rule expression that is written in a custom grammar and used for corpus matching; acquiring corpus generated by candidate users; and matching the corpus against the rule expression, and determining target users among the candidate users based on the matching result.
In a second aspect, an embodiment of the present application provides a target user determining apparatus, including: a receiving unit configured to receive a user-defined rule expression that is written in a custom grammar and used for corpus matching; an acquisition unit configured to acquire corpus generated by candidate users; and a determining unit configured to match the corpus against the rule expression and determine target users among the candidate users based on the matching result.
In a third aspect, embodiments of the present application provide a device for determining a target user, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for: receiving a user-defined rule expression that is written in a custom grammar and used for corpus matching; acquiring corpus generated by candidate users; and matching the corpus against the rule expression, and determining target users among the candidate users based on the matching result.
In a fourth aspect, embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in the first aspect above.
The target user determining method, apparatus, and device provided by the embodiments of the application receive a user-defined rule expression, acquire the corpus generated by candidate users, and match the corpus against the rule expression, so that target users among the candidate users are determined based on the matching result, where the rule expression is written in a custom grammar and used for corpus matching. Because a rule expression can pick out corpus that meets specific conditions, and a user's corpus reflects that user's type, preferences, and so on, target users can be selected effectively by matching the rule expression against user corpus. This process can determine target users without relying on user tags, which widens the selection range of target users. At the same time, by setting a rule expression, users can be screened without writing dedicated code, which reduces the labor cost of determining target users.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of one embodiment of a target user determining method according to the present application;
FIG. 2 is a flow chart of yet another embodiment of a target user determining method according to the present application;
FIG. 3 is a schematic diagram of one embodiment of a target user determining apparatus according to the present application;
FIG. 4 is a schematic diagram of a device for determining a target user according to the present application;
FIG. 5 is a schematic diagram of a server according to some embodiments of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Referring to FIG. 1, a flow 100 of one embodiment of a target user determining method according to the present application is shown. The target user determining method may run on a variety of electronic devices, including but not limited to servers, smartphones, tablets, laptop computers, and desktop computers.
Various client applications, such as input method applications, instant messaging applications, and social applications, can be installed on the electronic device. The input method application mentioned in the embodiments of the application is software for text input; it may also be referred to as an input method editor (Input Method Editor), input method software, an input method platform, an input method framework, or an input method system. With an input method application, the user can conveniently enter the desired character or character string into the electronic device. The input method application can support multiple input methods and input modes. An input method is an encoding scheme used to enter various symbols into electronic devices such as computers and mobile phones. For example, in addition to common Chinese input methods (such as the pinyin, wubi, zhuyin, phonetic, and handwriting input methods), input methods for other languages (such as English, Japanese hiragana, and Korean input methods) may be supported. The input modes may include, but are not limited to, encoded input and voice input. Neither the language type nor the input mode of the input method is limited here.
The target user determining method in this embodiment may include the following steps:
Step 101, receiving a user-defined rule expression.
In this embodiment, the execution subject of the target user determination method (e.g., the electronic device described above) may receive a user-defined rule expression. The rule expression can be written in custom grammar and can be used for corpus matching. The execution logic of the rule expression can be implemented by a common computer programming language such as Java, C or C++.
In this embodiment, the rule expression may characterize the screening conditions, and different rule expressions may characterize different screening conditions. The process of matching the rule expression with a certain corpus is a process of detecting whether the corpus meets the screening condition represented by the rule expression. Because the corpus is generated by the user, the rule expression is matched with the corpus, and the target user meeting certain conditions can be screened out. The grammar of the rule expression can be set as required, and the embodiment is not limited to the grammar.
In some alternative implementations of the present embodiment, the rule expression may include at least one sub-rule expression. Different sub-rule expressions may be separated by logical symbols, with different logical symbols indicating different logical relationships. The logical symbols may include, but are not limited to, "&", "|", "!", "(", and ")". Here "&" represents logical AND, "|" represents logical OR, "!" represents logical NOT, and "(" and ")" are used in pairs to set the precedence of sub-rule expressions.
As one example, if the rule expression is (sub-rule expression 1) & (sub-rule expression 2) | (sub-rule expression 3), two types of users are selected: users who satisfy both sub-rule expression 1 and sub-rule expression 2, and users who satisfy sub-rule expression 3.
As yet another example, if the rule expression is (sub-rule expression 1) & ((sub-rule expression 2) | (sub-rule expression 3)), the users selected are those who satisfy sub-rule expression 1 and also satisfy at least one of sub-rule expression 2 and sub-rule expression 3.
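The two examples differ only in how the sub-rule expressions are grouped. The following is a minimal sketch of that difference, assuming each sub-rule expression has already been evaluated to a Boolean for a given user and assuming "&" binds more tightly than "|", as the reading of the first example implies. The function names are hypothetical and are not part of the described method.

```python
def example_one(s1: bool, s2: bool, s3: bool) -> bool:
    """(s1) & (s2) | (s3): users matching both s1 and s2, or matching s3."""
    return (s1 and s2) or s3


def example_two(s1: bool, s2: bool, s3: bool) -> bool:
    """(s1) & ((s2) | (s3)): users matching s1 and at least one of s2, s3."""
    return s1 and (s2 or s3)


# A user who satisfies only sub-rule expression 3 is selected by the first
# rule expression but not by the second.
print(example_one(False, False, True))  # True
print(example_two(False, False, True))  # False
```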
In some alternative implementations of the present embodiment, a sub-rule expression may include a scene rule expression and a vocabulary rule expression, separated by a preset symbol. The preset symbol may be a colon ":" or the like, and is different from the logical symbols described above.
As one example, the sub-rule expression "scene rule expression 1:vocabulary rule expression 1" selects the users who generated corpus satisfying vocabulary rule expression 1 in the scene indicated by scene rule expression 1.
As yet another example, when the rule expression includes a plurality of sub-rule expressions, such as (scene rule expression 1:vocabulary rule expression 1) & (scene rule expression 2:vocabulary rule expression 2), the users selected are those who satisfy both conditions: they generated corpus satisfying vocabulary rule expression 1 in the scene indicated by scene rule expression 1, and they generated corpus satisfying vocabulary rule expression 2 in the scene indicated by scene rule expression 2.
It should be noted that both the scene rule expression and the vocabulary rule expression may be empty. An empty expression imposes no restriction. If the vocabulary rule expression is empty and the scene rule expression is not, the users who generated any corpus in the scene indicated by the scene rule expression are selected.
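As an illustration of the colon-separated form, the sketch below splits a sub-rule expression into its scene part and its vocabulary part and allows either part to be empty. The function name is hypothetical, and treating a colon-free string as "empty scene part" is an assumption made here, not something the description specifies.

```python
def split_sub_rule(sub_rule: str) -> tuple[str, str]:
    """Split 'scene:vocabulary' at the first colon; either part may be empty."""
    scene, sep, vocab = sub_rule.partition(":")
    if not sep:
        # Assumption: with no colon present, the scene part is taken as empty.
        return "", sub_rule.strip()
    return scene.strip(), vocab.strip()


print(split_sub_rule("app1:word1"))  # ('app1', 'word1')
print(split_sub_rule("app1:"))       # ('app1', '') -> any corpus from app1
print(split_sub_rule(":word1"))      # ('', 'word1') -> word1 in any scene
```

Splitting at the first colon keeps any later colon (for instance inside a regular expression) in the vocabulary part.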
In some optional implementations of the present embodiment, the scene rule expression, when not empty, may include at least one of: at least one scene identifier, at least one scene package identifier.
The scene identifier may be used to indicate the scene in which a corpus was generated; for example, corpus generated in different applications corresponds to different scenes. A scene identifier such as "app1" tests whether there is a sentence in the corpus that appeared in the app1 environment, where the app1 environment may indicate a certain application environment. A scene package identifier indicates a scene package, which may include one or more scene identifiers. A scene package identifier such as "{appBag}" tests whether a sentence appeared in at least one of the scenes defined by appBag.
When the scene rule expression includes at least two scene identifiers, the scene identifiers are separated by a separator symbol (e.g., a comma ","), which denotes a logical OR. As an example, the scene rule expression "app1,app2" tests whether there is a sentence in the corpus that appeared in the environment of app1 or app2. Similarly, when a scene rule expression includes at least two scene package identifiers, they are separated by a separator symbol (e.g., a comma ","), which denotes a logical OR.
It will be appreciated that when the scene rule expression includes both at least one scene identifier and at least one scene package identifier, these may likewise be separated by the separator symbol (e.g., a comma ","), which denotes a logical OR.
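The scene test described above can be sketched as follows. This is only an illustration under stated assumptions: the contents of the scene package appBag, the `SCENE_PACKAGES` table, and the function name are all hypothetical, and an empty scene rule expression is treated as "no restriction".

```python
# Hypothetical scene package definitions; the real contents are not given.
SCENE_PACKAGES = {"appBag": {"app3", "app4"}}


def scene_matches(scene_rule: str, scene: str) -> bool:
    """Return True if `scene` satisfies a comma-separated scene rule expression."""
    if not scene_rule.strip():
        return True  # an empty scene rule expression imposes no restriction
    for item in scene_rule.split(","):
        item = item.strip()
        if item.startswith("{") and item.endswith("}"):
            # Scene package identifier: any scene in the package is accepted.
            if scene in SCENE_PACKAGES.get(item[1:-1].strip(), set()):
                return True
        elif item == scene:
            return True
    return False


print(scene_matches("app1,app2", "app2"))      # True
print(scene_matches("app1,{appBag}", "app4"))  # True
print(scene_matches("app1", "app3"))           # False
```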
In some alternative implementations of the present embodiment, the vocabulary rule expression may include at least one of: at least one word, at least one word package identifier, at least one regular expression. The word package indicated by a word package identifier may include one or more words. The cases of words, word packages, and regular expressions are described in turn below.
When the vocabulary rule expression is a single word, such as word1, it tests whether there is a sentence in the corpus that contains word1.
When the vocabulary rule expression includes at least two words, the words may be separated by target symbols, with different target symbols indicating different logical relationships. For example, the target symbols may include a plus sign "+", a minus sign "-", and a separator symbol (e.g., a comma ","), where "+" denotes logical AND, "-" denotes logical NOT, and the separator symbol denotes logical OR.
For example, the vocabulary rule expression word2+word3-word4 tests whether there is a sentence in the corpus in which word2 and word3 both appear but word4 does not.
For another example, the vocabulary rule expression word1,word2 tests whether there is a sentence in the corpus in which at least one of word1 and word2 appears.
Combined with a scene rule expression, the rule expression or sub-rule expression app1,app2:word1,word2 selects the users whose corpus generated in the scene app1 or app2 contains word1 or word2.
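A minimal sketch of the "+", "-", and "," operators on words follows. The description does not state how these operators combine, so this sketch assumes the comma has the lowest precedence: the expression is a list of comma-separated alternatives, each a chain of "+" and "-" terms. It also assumes plain substring containment and words that contain no "+", "-", "," or whitespace. The function name is hypothetical.

```python
import re


def vocab_matches(vocab_rule: str, sentence: str) -> bool:
    """Test one sentence against a rule such as 'word2+word3-word4' or 'word1,word2'."""
    for alternative in vocab_rule.split(","):
        # 'word2+word3-word4' -> [('', 'word2'), ('+', 'word3'), ('-', 'word4')]
        terms = re.findall(r"([+-]?)\s*([^+\-\s,]+)", alternative)
        # A '-' term must be absent from the sentence; any other term must be present.
        if terms and all((word in sentence) != (sign == "-") for sign, word in terms):
            return True
    return False


print(vocab_matches("word2+word3-word4", "word2 then word3"))   # True
print(vocab_matches("word2+word3-word4", "word2 word3 word4"))  # False
print(vocab_matches("word1,word2", "only word2 here"))          # True
```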
When the vocabulary rule expression includes a word package, it may be written in the form {s}{min-max}. Here min is the minimum number of times that words in the word package must occur in the user's corpus; when min is empty, it defaults to 1. max is the maximum number of times that words in the word package may occur in the user's corpus; when max is empty, it defaults to 0, which indicates no upper limit. s indicates the word package and may take two forms. In one form, s is a word package identifier (e.g., a name); for example, {A1}{2-} selects the users in whose corpus the words of word package A1 occur at least 2 times. In the other form, s is a series of words separated by a separator symbol (e.g., a comma ","); for example, {word2,word3}{0-5} selects the users in whose corpus the words in {word2,word3} occur no more than 5 times.
Combined with a scene rule expression, the rule expression or sub-rule expression app1:{word2,word3}{2-},{word4,word5}{0-5} selects the users for whom, in the corpus generated in the app1 scene, the words in {word2,word3} occur at least 2 times in total, or the words in {word4,word5} occur no more than 5 times in total.
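The frequency test for a single {s}{min-max} term can be sketched as below. The word package table `WORD_PACKAGES`, its contents, and the function name are hypothetical; occurrences are counted by plain substring counting, which is an assumption of this sketch rather than the counting method of the described embodiment.

```python
import re

# Hypothetical word package definitions; the real contents are not given.
WORD_PACKAGES = {"A1": ["word2", "word3"]}

PACKAGE_RULE = re.compile(r"\{\s*([^{}]*?)\s*\}\s*\{\s*(\d*)\s*-\s*(\d*)\s*\}")


def package_matches(rule: str, corpus: list[str]) -> bool:
    """Test a user's corpus against one term of the form {s}{min-max}."""
    m = PACKAGE_RULE.fullmatch(rule.strip())
    if m is None:
        raise ValueError(f"not a word package rule: {rule!r}")
    s, lo, hi = m.groups()
    # s is either a package name or an inline comma-separated word list.
    words = WORD_PACKAGES.get(s) or [w.strip() for w in s.split(",")]
    minimum = int(lo) if lo else 1  # empty min defaults to 1
    maximum = int(hi) if hi else 0  # empty max defaults to 0: no upper limit
    total = sum(sentence.count(w) for sentence in corpus for w in words)
    return total >= minimum and (maximum == 0 or total <= maximum)


corpus = ["word2 word3", "word2"]  # three occurrences in total
print(package_matches("{A1}{2-}", corpus))             # True
print(package_matches("{word2,word3}{0-2}", corpus))   # False
```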
When the vocabulary rule expression includes a regular expression, a preset symbol (e.g., "/") may be placed before and after the regular expression. For example, app1:/I.*buy a car/ selects the users who, in the app1 scene, entered text matching the regular expression "I.*buy a car". Here "I.*buy" matches the longest string that starts with "I" and ends with "buy". For example, if the corpus is "I do not want to buy a car", the whole string "I do not want to buy a car" is matched.
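The regular-expression case can be illustrated with Python's standard `re` module, using an English pattern analogous to the example. This is a sketch only: the function name is hypothetical, and the regular-expression dialect used by the described embodiment is not specified, so Python syntax is an assumption.

```python
import re


def regex_rule_matches(vocab_rule: str, sentence: str) -> bool:
    """Return True if the '/'-delimited pattern is found in the sentence."""
    if len(vocab_rule) < 2 or not (vocab_rule.startswith("/") and vocab_rule.endswith("/")):
        raise ValueError("a regular-expression rule must be wrapped in '/'")
    return re.search(vocab_rule[1:-1], sentence) is not None


print(regex_rule_matches("/I.*buy a car/", "I do not want to buy a car"))  # True

# Greedy ".*" extends to the last "buy"; the lazy ".*?" stops at the first one.
text = "I buy and buy again"
print(re.search(r"I.*buy", text).group(0))   # I buy and buy
print(re.search(r"I.*?buy", text).group(0))  # I buy
```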
It will be appreciated that the rule expressions, sub-rule expressions, logical symbols representing logical relationships of different sub-rule expressions, scene rule expressions, vocabulary rule expressions, and forms, relationships, etc. of the contents in the expressions in the above alternative implementations may all be implemented in other ways, which are not limited to the above description and examples, and the present embodiment is not limited thereto.
After customizing the grammar of the rule expression, a user (such as a technician, developer, etc.) can flexibly customize the required rule expression through the grammar to determine the target user meeting the required condition. Therefore, the target users can be flexibly and conveniently searched and screened, code writing is not needed, and labor cost and learning cost can be greatly reduced.
Step 102, obtaining corpus generated by candidate users.
In this embodiment, the execution subject may acquire the corpus generated by candidate users. The candidate users may be some or all of the users of the relevant system or platform. The corpus generated by a candidate user may include, but is not limited to, historical sentences that the candidate user entered, sent, or committed to the screen.
Step 103, matching the corpus with the rule expression, and determining target users in the candidate users based on the matching result.
In this embodiment, since the rule expression may represent the screening condition, the process of matching the rule expression with a certain corpus is a process of detecting whether the corpus satisfies the screening condition represented by the rule expression. Because the corpus is generated by the user, the rule expression is matched with the corpus, and the target user meeting certain conditions can be screened out.
In practice, when the corpus of each candidate user is matched against the rule expression, either exact matching or fuzzy matching may be used, which is not limited in this embodiment. Exact matching may refer to detecting whether the corpus itself satisfies the screening condition indicated by the rule expression. Fuzzy matching may refer to first expanding the corpus and then detecting whether the expanded corpus satisfies the screening condition indicated by the rule expression. If the corpus of a candidate user, or the expanded corpus, satisfies the screening condition indicated by the rule expression, the candidate user can be considered a target user. Otherwise, the candidate user can be considered not to be a target user.
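The difference between the two matching modes can be sketched as follows. The description does not say how a corpus is expanded, so the synonym table `SYNONYMS`, the `expand` helper, and the function names are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable

# Hypothetical expansion table; the actual expansion method is not specified.
SYNONYMS = {"automobile": "car"}


def expand(sentence: str) -> str:
    """Append synonym replacements to the sentence (one possible expansion)."""
    extra = [SYNONYMS[w] for w in sentence.split() if w in SYNONYMS]
    return " ".join([sentence, *extra])


def select_target_users(
    corpora: dict[str, list[str]],
    matches: Callable[[str], bool],
    fuzzy: bool = False,
) -> list[str]:
    """Return the candidate users whose (optionally expanded) corpus matches."""
    targets = []
    for user, sentences in corpora.items():
        candidates = [expand(s) for s in sentences] if fuzzy else sentences
        if any(matches(s) for s in candidates):
            targets.append(user)
    return targets


corpora = {"u1": ["I want a car"], "u2": ["I want an automobile"], "u3": ["hello"]}
rule = lambda s: "car" in s.split()  # stand-in for a parsed rule expression
print(select_target_users(corpora, rule))              # ['u1']
print(select_target_users(corpora, rule, fuzzy=True))  # ['u1', 'u2']
```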
In some optional implementations of this embodiment, the executing entity may further obtain a target tag associated with the rule expression, where the target tag may be customized by a user sending the rule expression. After determining the target users, the target labels may be added to each target user. Therefore, the operation of automatically adding the labels to the users based on the corpus of the users can be realized, and compared with the mode of manually analyzing the user data to add the labels to the users, the convenience of the label adding operation and the accuracy of the added labels can be improved.
In some optional implementation manners of this embodiment, after the target user is determined, information such as the number of people, frequency, etc. of the target user may also be counted, and the information may be returned to the user (such as a technician, a developer, etc.) who sends the rule expression, so that the user may perform operations such as data statistics.
The method provided by this embodiment of the application receives a user-defined rule expression, acquires the corpus generated by candidate users, and matches the corpus against the rule expression, so that target users among the candidate users are determined based on the matching result, where the rule expression is written in a custom grammar and used for corpus matching. Because a rule expression can pick out corpus that meets specific conditions, and a user's corpus reflects that user's type, preferences, and so on, target users can be selected effectively by matching the rule expression against user corpus. This process can determine target users without relying on user tags, which widens the selection range of target users. At the same time, by setting a rule expression, users can be screened without writing dedicated code, which reduces the labor cost of determining target users.
With further reference to fig. 2, a flow 200 of yet another embodiment of a target user determination method is shown. The process 200 of the target user determination method includes the steps of:
Step 201, receiving a user-defined rule expression.
Step 202, obtaining corpus generated by candidate users.
Step 201 to step 202 in this embodiment can refer to step 101 to step 102 in the corresponding embodiment of fig. 1, and are not described herein.
Step 203, matching the corpus with the sub-rule expressions in the rule expression, and determining the score of each candidate user based on the matching result and the logical relationship indicated by the logical symbols in the rule expression.
In this embodiment, the rule expression may include at least one sub-rule expression, with different sub-rule expressions separated by logical symbols and different logical symbols indicating different logical relationships. The execution subject can use exact matching or fuzzy matching to match the corpus with the sub-rule expressions in the rule expression, and determine the scores of candidate users based on the matching result and the logical relationship indicated by the logical symbols in the rule expression.
In some scenarios, the rule expression contains only one sub-rule expression, such as app1:word1, and the corpus of each candidate user can be matched directly against the rule expression, using either exact matching or fuzzy matching. If the corpus of a candidate user matches the rule expression, that is, the corpus of the candidate user includes word1 generated in app1, the score of the candidate user may be set to a first value (e.g., 1). If there is no match, the score of the candidate user may be set to a second value (e.g., 0).
Note that the score may be set otherwise based on the matching condition. For example, if a corpus is expanded and then can be matched with the rule expression, that is, the corpus can only meet the rule expression by fuzzy matching, but cannot meet the rule expression by exact matching, the score of the user who generates the corpus can be set to a third value (e.g., 0.8).
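A minimal sketch of this three-level scoring for a single sub-rule expression follows, using the example values 1, 0.8, and 0 from the text. The `matches` and `expand` callables and the function name are hypothetical stand-ins for the matching and corpus-expansion steps, whose details are not specified here.

```python
from typing import Callable


def score_sub_rule(
    corpus: list[str],
    matches: Callable[[str], bool],
    expand: Callable[[str], str],
) -> float:
    """Score one user's corpus against a single sub-rule expression."""
    if any(matches(s) for s in corpus):
        return 1.0  # first value: satisfied by exact matching
    if any(matches(expand(s)) for s in corpus):
        return 0.8  # third value: satisfied only after expansion (fuzzy)
    return 0.0      # second value: not satisfied


matches = lambda s: "word1" in s
expand = lambda s: s.replace("w1", "word1")  # hypothetical expansion

print(score_sub_rule(["has word1"], matches, expand))  # 1.0
print(score_sub_rule(["has w1"], matches, expand))     # 0.8
print(score_sub_rule(["nothing"], matches, expand))    # 0.0
```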
In other scenarios, the rule expression contains at least two sub-rule expressions, with different sub-rule expressions connected by logical symbols. In this case, the corpus of each candidate user may first be matched against the sub-rule expressions, and then the score of each candidate user may be determined based on the logical relationship between the sub-rule expressions.
As one example, the rule expression is (app1:word1) & (app2:word2). The corpus of each candidate user may be matched against each of the sub-rule expressions (app1:word1) and (app2:word2). For a given candidate user, if the corpus of the candidate user matches both sub-rule expressions, that is, the corpus includes word1 generated in the app1 scene and word2 generated in the app2 scene, the score of the candidate user may be set to a first value (e.g., 1). If the corpus fails to match at least one of the sub-rule expressions, the score of the candidate user may be set to a second value (e.g., 0).
Similarly, in this case the score may also be set based on how the match was obtained. For example, if a corpus matches every sub-rule expression in (app1:word1) & (app2:word2) only after being expanded, that is, the corpus satisfies all the sub-rule expressions by fuzzy matching but does not satisfy them all by exact matching, the score of the user who generated the corpus may be set to a third value (e.g., 0.8).
In practice, for sub-rule expressions joined by a logical AND (such as "&"), to reduce the amount of computation, once a corpus is found not to satisfy the former sub-rule expression, it need not be matched against the latter one. Taking the rule expression in the above example, once a corpus is found not to satisfy the sub-rule expression (app1:word1), it is no longer necessary to check whether the corpus matches the sub-rule expression (app2:word2), which improves processing efficiency.
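The early exit for "&" can be sketched as below. Each sub-rule expression is represented by a hypothetical callable that returns 1.0 (exact match), 0.8 (fuzzy match only), or 0.0 (no match). Taking the minimum level as the overall score is an assumption of this sketch that reproduces the example values in the text; it is not stated as the general rule.

```python
from typing import Callable, Sequence

SubRule = Callable[[], float]  # hypothetical: returns 1.0, 0.8 or 0.0


def score_and(sub_rules: Sequence[SubRule]) -> float:
    """Score sub-rule expressions joined by '&', stopping at the first miss."""
    score = 1.0
    for sub_rule in sub_rules:
        level = sub_rule()
        if level == 0.0:
            return 0.0  # later sub-rule expressions are never evaluated
        score = min(score, level)
    return score


evaluated = []


def traced(name: str, level: float) -> SubRule:
    """Build a stand-in sub-rule that records when it is evaluated."""
    def run() -> float:
        evaluated.append(name)
        return level
    return run


print(score_and([traced("app1:word1", 0.0), traced("app2:word2", 1.0)]))  # 0.0
print(evaluated)  # ['app1:word1']
```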
As yet another example, the rule expression is (app1:word1) | (app2:word2). The corpus of each candidate user may be matched against each of the sub-rule expressions (app1:word1) and (app2:word2). For a given candidate user, each time the corpus of the candidate user is found to match a sub-rule expression, the score of the candidate user is increased by a certain value.
Taking an increment of 1 as an example: if the corpus of a candidate user includes word1 generated in the app1 scene and word2 generated in the app2 scene, the candidate user's score is 2. If the corpus includes word1 generated in the app1 scene but not word2 generated in the app2 scene, the score is 1. If the corpus does not include word1 generated in the app1 scene but includes word2 generated in the app2 scene, the score is 1. If the corpus includes neither word1 generated in the app1 scene nor word2 generated in the app2 scene, the score is 0.
It should be noted that the increment may be determined based on how the match was obtained and is not limited to a fixed value. For example, if a corpus matches the sub-rule expression (app1:word1) in the above example only after being expanded, that is, it satisfies that sub-rule expression by fuzzy matching but not by exact matching, the score of the candidate user who generated the corpus can be increased by 0.8. If the corpus also satisfies the other sub-rule expression (app2:word2) by exact matching, the score can be increased by a further 1, giving the candidate user a final score of 1.8.
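The additive scoring for "|" reduces to summing the per-sub-rule increments. The sketch below assumes each sub-rule expression has already been assigned an increment of 1.0 (exact match), 0.8 (fuzzy match only), or 0.0 (no match); the function name and the dictionary layout are hypothetical.

```python
def score_or(increments: dict[str, float]) -> float:
    """Score sub-rule expressions joined by '|' by adding their increments."""
    return sum(increments.values())


# (app1:word1) matched only by fuzzy matching, (app2:word2) matched exactly.
print(score_or({"app1:word1": 0.8, "app2:word2": 1.0}))  # 1.8

# With a fixed increment of 1, the four cases give 2, 1, 1 and 0.
print(score_or({"app1:word1": 1.0, "app2:word2": 1.0}))  # 2.0
print(score_or({"app1:word1": 0.0, "app2:word2": 0.0}))  # 0.0
```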
The manner of determining the score of the user is not limited to the above description and examples, and various score rules may be set as needed.
In some alternative implementations of the present embodiment, the rule expression may first be converted to a tree structure when determining the score of the candidate user. Wherein leaf nodes of the tree structure are sub-regular expressions in the regular expression, and non-leaf nodes of the tree structure are logical symbols in the regular expression. And then matching the corpus with leaf nodes of the tree structure, and determining the scores of candidate users based on the matching result and the logic relationship indicated by non-leaf nodes of the tree structure. By parsing the regular expressions into a tree structure, it may be convenient to determine the logical relationship and order of the sub-regular expressions.
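One way to build such a tree is a small recursive-descent parser, sketched below. The grammar is an assumption: "!" binds tightest, then "&", then "|", and leaf text is assumed to contain none of the logical symbols. For brevity the sketch evaluates the tree to a Boolean instead of a score; a score-based variant would replace the Boolean combination with the scoring rules. All names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    text: str  # a sub-rule expression such as "app1:word1"


@dataclass
class Node:
    op: str  # "&", "|" or "!"
    children: list


Tree = Union[Leaf, Node]
TOKEN = re.compile(r"\s*([&|!()]|[^&|!()]+)")


def parse(rule: str) -> Tree:
    """Parse a rule expression into a tree of Leaf and Node objects."""
    tokens = [t.strip() for t in TOKEN.findall(rule) if t.strip()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> Tree:
        node = parse_and()
        while peek() == "|":
            take()
            node = Node("|", [node, parse_and()])
        return node

    def parse_and() -> Tree:
        node = parse_unary()
        while peek() == "&":
            take()
            node = Node("&", [node, parse_unary()])
        return node

    def parse_unary() -> Tree:
        tok = take() if peek() is not None else None
        if tok == "!":
            return Node("!", [parse_unary()])
        if tok == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        if tok is None or tok in ("&", "|", ")"):
            raise ValueError("expected a sub-rule expression")
        return Leaf(tok)

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def evaluate(tree: Tree, leaf_matches: Callable[[str], bool]) -> bool:
    """Combine leaf results according to the logical symbols in the tree."""
    if isinstance(tree, Leaf):
        return leaf_matches(tree.text)
    results = (evaluate(child, leaf_matches) for child in tree.children)
    if tree.op == "&":
        return all(results)
    if tree.op == "|":
        return any(results)
    return not next(results)


tree = parse("(app1:word1) & ((app2:word2) | (app3:word3))")
satisfied = {"app1:word1", "app3:word3"}  # leaves this user's corpus matches
print(evaluate(tree, satisfied.__contains__))  # True
```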
In some alternative implementations of the present embodiment, the sub-rule expressions may include a scene rule expression and a vocabulary rule expression. The scene rule expression and the vocabulary rule expression may be separated by a preset symbol. The vocabulary rule expression includes at least one of: at least one vocabulary, at least one vocabulary package identifier, and at least one regular expression. The vocabulary package indicated by the vocabulary package identifier may include one or more vocabularies.
In this case, the execution subject may match the corpus with a sub-rule expression in the rule expression through the following sub-steps S11 to S14:
Sub-step S11: determine the vocabulary related to the vocabulary rule expression in the sub-rule expression as the target vocabulary, and perform word segmentation on the corpus to obtain a word segmentation result.
The vocabulary related to the vocabulary rule expression comprises the vocabulary in the vocabulary rule expression and the vocabulary in the vocabulary package indicated by the vocabulary package identifier in the vocabulary rule expression. Various existing word segmentation methods may be adopted, and segmentation may preferentially be performed based on the target vocabulary, so that words in the corpus that are the same as the target vocabulary remain complete and independent. The corpora of different candidate users may correspond to different word segmentation results.
Sub-step S12: detect whether the word segmentation result contains the target vocabulary, and generate a first detection result.
The first detection result here may indicate whether the word segmentation result includes the target vocabulary. For example, if the target word is "word1", and the word1 is also included in the word segmentation result of the corpus of a certain candidate user, the first detection result may indicate that the word segmentation result includes the target word.
It should be noted that there may be two or more target vocabularies, for example when the vocabulary rule expression is "word1+word2", "word1, word2" or "word1-word2". The first detection result in this case may include two detection results: one indicating whether the word segmentation result includes the target vocabulary word1, and one indicating whether the word segmentation result includes the target vocabulary word2.
Sub-step S13: detect whether the corpus contains sentences consistent with the syntax indicated by the regular expression, and generate a second detection result.
The second detection result here may indicate whether the corpus contains sentences consistent with the syntax indicated by the regular expression. For example, the regular expression /I.*buy/ matches the longest string that starts with "I" and ends with "buy". If a certain corpus is "I did not want to buy a car", the regular expression matches a string in that corpus, and the second detection result may indicate that the corpus contains sentences consistent with the syntax indicated by the regular expression. If a certain corpus is "I like eating fruit", the second detection result may indicate that the corpus does not contain sentences consistent with the syntax indicated by the regular expression.
Sub-step S14: determine a matching result of the corpus and the sub-rule expression based on the first detection result, the second detection result, and the scene in which the corpus was generated.
As an example, the lexical rule expression is "word1+word2,/me. For a certain corpus, if a first detection result of the corpus indicates that "word1" and "word2" are included in the corpus, and a second detection result of the corpus indicates that sentences consistent with the syntax indicated by regular expressions "/i.
It should be noted that, when the vocabulary and the vocabulary package identifier are not included in the vocabulary rule expression and only the regular expression is included, the above-mentioned sub-step S12 may not be performed, and the matching result with the sub-rule expression may be determined directly based on the second detection result in the sub-step S14. Similarly, when the vocabulary or the vocabulary package identifier is included in the vocabulary rule expression, but the regular expression is not included, the above-described sub-step S13 may not be performed, and the matching result with the sub-rule expression may be determined directly based on the first detection result in the sub-step S14.
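Sub-steps S11 to S14 can be sketched as below. The sketch rests on several assumptions that are not part of the method: `SubRule`, `Corpus`, `segment` and `matches` are hypothetical names, a simple word tokenizer stands in for a real word segmenter, all target vocabularies are required to be present, and the scene check is a plain equality test.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubRule:
    scene: str                                       # scene rule expression (one scene id)
    words: List[str] = field(default_factory=list)   # target vocabularies
    regex: Optional[str] = None                      # regular expression, if any


@dataclass
class Corpus:
    scene: str  # scene in which the corpus was generated
    text: str


def segment(text: str) -> List[str]:
    """Stand-in for a real word segmenter: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def matches(rule: SubRule, corpus: Corpus) -> bool:
    # S11: segment the corpus.
    tokens = segment(corpus.text)
    # S12: first detection result; trivially true when there are no target words.
    first = all(word.lower() in tokens for word in rule.words)
    # S13: second detection result; skipped when there is no regular expression.
    second = rule.regex is None or re.search(rule.regex, corpus.text) is not None
    # S14: combine both detection results with the scene of the corpus.
    return corpus.scene == rule.scene and first and second


rule = SubRule(scene="app1", words=["car"], regex=r"I.*buy")
hit = matches(rule, Corpus("app1", "I did not want to buy a car"))
miss = matches(rule, Corpus("app1", "I like eating fruit"))
wrong_scene = matches(rule, Corpus("app2", "I did not want to buy a car"))
```

Because `.*` is greedy, `re.search(r"I.*buy", ...)` returns the longest string that starts with "I" and ends with "buy", which is the behaviour described for the second detection result.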
In some optional implementations of this embodiment, the execution subject may alternatively match the corpus with a sub-rule expression in the rule expression through the following sub-steps S21 to S24:
Sub-step S21: determine the vocabulary related to the vocabulary rule expression in the sub-rule expression as the target vocabulary, and perform word segmentation on the corpus to obtain a word segmentation result. For sub-step S21, refer to sub-step S11 above; it is not described again here.
Sub-step S22: based on a pre-trained vocabulary similarity calculation model, detect whether the word segmentation result contains similar vocabularies of the target vocabulary, and generate a third detection result.
The third detection result herein may indicate whether the word segmentation result includes similar words of the target word. The similar vocabulary of the target vocabulary refers to the vocabulary with the similarity larger than a certain preset value with the target vocabulary. The vocabulary similarity calculation model can be obtained by training an existing model (such as a word2vec model) by adopting a machine learning method. When training the vocabulary similarity calculation model, scene input can be added to distinguish vocabulary similarity under different scenes, for example, the similarity of word1 input under app1 and word1 input under app2 is not 1.
Alternatively, based on the pre-trained vocabulary similarity calculation model, a third detection result may be generated by: first, first scene information is determined based on a scene rule expression among the sub-rule expressions. The first scenario information may represent a scenario that needs to be satisfied as indicated by the scenario rule expression. Then, scene information of the corpus is obtained, and may be referred to as second scene information. The second scene information is the scene generated by the corpus. And then, taking the first scene information and the target vocabulary as first input information, taking the second scene information and the words in the word segmentation result as second input information, and inputting the first input information and the second input information into a pre-trained vocabulary similarity calculation model to obtain the first similarity of the target vocabulary and the words in the word segmentation result. The first similarity of the target vocabulary and the words in the word segmentation result is the similarity of the first input information and the second input information. Here, the first similarity of the target vocabulary and the words in the word segmentation result may be determined one by a traversal manner. Finally, based on the first similarity, whether the word segmentation result contains similar words of the target word or not can be determined, and a third detection result is generated. For example, for a certain target word, if the word segmentation result includes a word having a similarity greater than a certain preset value, the third detection result may indicate that the word segmentation result includes a similar word of the target word.
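The traversal described above can be sketched as follows. The pre-trained vocabulary similarity calculation model is not available here, so a tiny hand-written table of vectors keyed by (scene, word) and a cosine similarity stand in for it; the table `EMBEDDINGS`, the function names, and the threshold 0.9 are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

SceneWord = Tuple[str, str]  # (scene information, word)

# Toy stand-in for a pre-trained, scene-aware model: one vector per (scene, word).
EMBEDDINGS: Dict[SceneWord, Tuple[float, float]] = {
    ("app1", "car"): (1.0, 0.0),
    ("app1", "automobile"): (0.9, 0.1),
    ("app2", "car"): (0.6, 0.8),
    ("app1", "fruit"): (0.0, 1.0),
}


def similarity(first_input: SceneWord, second_input: SceneWord) -> float:
    """Cosine similarity of the two inputs; 0.0 if either input is unknown."""
    a = EMBEDDINGS.get(first_input)
    b = EMBEDDINGS.get(second_input)
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def third_detection_result(
    rule_scene: str,
    target_word: str,
    corpus_scene: str,
    tokens: List[str],
    threshold: float = 0.9,  # hypothetical preset value
) -> bool:
    """True if any segmented word is similar enough to the target vocabulary."""
    first_input = (rule_scene, target_word)
    return any(
        similarity(first_input, (corpus_scene, token)) > threshold
        for token in tokens
    )


same_scene = third_detection_result("app1", "car", "app1", ["automobile", "fruit"])
other_scene = third_detection_result("app1", "car", "app2", ["car"])
```

In this toy table the same word under different scenes has a similarity below 1, mirroring the point that scene input lets the model distinguish vocabulary similarity across scenes.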
Sub-step S23: based on a pre-trained sentence meaning similarity calculation model, detect whether the corpus contains sentences similar to the syntax indicated by the regular expression, and generate a fourth detection result.
The fourth detection result here may indicate whether the corpus contains sentences similar to the syntax indicated by the regular expression. A sentence similar to the syntax indicated by the regular expression is a sentence whose similarity to that syntax is greater than a certain preset value. The sentence meaning similarity calculation model may be obtained by training with a machine learning method, for example by applying transfer learning to an existing model such as BERT (Bidirectional Encoder Representations from Transformers, a bidirectional encoder based on the Transformer architecture).
Optionally, based on the pre-trained sentence meaning similarity calculation model, the fourth detection result may be obtained as follows. First, first scene information is determined based on the scene rule expression in the sub-rule expression. Then, second scene information of the corpus is obtained. Next, the first scene information and the regular expression are taken as third input information, the second scene information and the sentences in the corpus are taken as fourth input information, and the third input information and the fourth input information are input into the pre-trained sentence meaning similarity calculation model to obtain a second similarity between the regular expression and the sentences in the corpus. Finally, based on the second similarity, it is determined whether the corpus contains sentences similar to the syntax indicated by the regular expression, and the fourth detection result is generated. For example, for a regular expression in the vocabulary rule expression, if a certain corpus contains a sentence whose similarity to the syntax indicated by the regular expression is greater than a certain preset value, the fourth detection result may indicate that the corpus contains sentences similar to the syntax indicated by the regular expression.
It should be noted that, when the vocabulary and the vocabulary package identifier are not included in the vocabulary rule expression and only the regular expression is included, the above-mentioned sub-step S22 may not be performed, and the matching result with the sub-rule expression may be determined directly based on the fourth detection result in the sub-step S24. Similarly, when the vocabulary or the vocabulary package identifier is included in the vocabulary rule expression, but the regular expression is not included, the above-described sub-step S23 may not be performed, and the matching result with the sub-rule expression may be determined directly based on the third detection result in the sub-step S24.
In some alternative implementations of the present embodiment, when the vocabulary rule expression includes at least two vocabularies, the different vocabularies are separated by target symbols, and different target symbols indicate different logical relationships.
In some optional implementations of the present embodiment, the scene rule expression includes at least one of: at least one scene identifier and at least one scene package identifier, wherein the scene package indicated by the scene package identifier comprises one or more scene identifiers; and, when the scene rule expression includes at least two scene identifiers, the different scene identifiers are separated by a separation symbol.
Step 204, determining target users in the candidate users based on the scores of the candidate users.
In this embodiment, the execution subject may determine the target user among the candidate users in various ways based on the score of the candidate users.
As an example, a preset number of users may be selected from the candidate users in order of high-to-low scores as target users.
As yet another example, a user among the candidate users whose score is higher than a preset value (e.g., 0) may be determined as the target user.
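Both selection strategies can be sketched in a few lines. The function names `top_n` and `above_threshold` and the sample scores are hypothetical and serve only to illustrate the two examples.

```python
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Select a preset number of users in order of score from high to low."""
    return sorted(scores, key=lambda user: scores[user], reverse=True)[:n]


def above_threshold(scores: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Select every user whose score is higher than the preset value."""
    return [user for user, value in scores.items() if value > threshold]


candidate_scores = {"u1": 2.0, "u2": 0.0, "u3": 1.8, "u4": 1.0}
best_two = top_n(candidate_scores, 2)
positive = above_threshold(candidate_scores)
```

The first strategy fixes the size of the target group, while the second fixes the minimum score and lets the group size vary.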
As can be seen from fig. 2, compared with the embodiment corresponding to fig. 1, the flow 200 of the target user determination method in this embodiment specifies the syntax of the rule expression. That is, a rule expression may include at least one sub-rule expression, different sub-rule expressions being separated by logical symbols, and different logical symbols indicating different logical relationships. The flow also involves the steps of matching the corpus with the sub-rule expressions in the rule expression, determining the scores of candidate users based on the matching results and the logical relationships indicated by the logical symbols in the rule expression, and finally determining target users based on the scores. A simple and effective way of setting rule expressions and an effective way of parsing them are thereby provided. Because the rule expression is intuitive, the learning cost is low, and the labor cost of determining a target user is greatly reduced.
With further reference to fig. 3, as an implementation of the method shown in the above figures, the present application provides an embodiment of a target user determining apparatus, which corresponds to the method embodiment shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 3, the target user determining apparatus 300 of the present embodiment includes: a receiving unit 301 configured to receive a rule expression customized by a user, where the rule expression is written in a custom grammar and is used for corpus matching; an obtaining unit 302 configured to obtain a corpus generated by a candidate user; and a determining unit 303 configured to match the corpus with the rule expression, and determine a target user among the candidate users based on a matching result.
In some optional implementations of this embodiment, the apparatus further includes: an adding unit configured to: acquiring a target label associated with the rule expression, wherein the target label is customized by a user who sends the rule expression; and adding the target label for the target user.
In some alternative implementations of this embodiment, the rule expressions include at least one sub-rule expression, different ones of the sub-rule expressions being separated by logical symbols, different logical symbols indicating different logical relationships.
In some optional implementations of this embodiment, the determining unit 303 is further configured to: match the corpus with the sub-rule expressions in the rule expression, and determine the scores of the candidate users based on the matching result and the logical relationships indicated by the logical symbols in the rule expression; and determine target users among the candidate users based on the scores of the candidate users.
In some optional implementations of this embodiment, the determining unit 303 is further configured to: converting the rule expression into a tree structure, wherein leaf nodes of the tree structure are sub-rule expressions in the rule expression, and non-leaf nodes of the tree structure are logic symbols in the rule expression; and matching the corpus with leaf nodes of the tree structure, and determining the scores of the candidate users based on the matching result and the logic relationship indicated by non-leaf nodes of the tree structure.
In some optional implementations of this embodiment, the determining unit 303 is further configured to: selecting a preset number of users from the candidate users as target users according to the order of the scores from high to low; or determining the user with the score higher than a preset value from the candidate users as a target user.
In some optional implementations of this embodiment, the sub-rule expressions include a scene rule expression and a vocabulary rule expression, the scene rule expression and the vocabulary rule expression being separated by a preset symbol.
In some optional implementations of this embodiment, the vocabulary rule expression includes at least one of: at least one vocabulary, at least one vocabulary package identifier, and at least one regular expression, wherein the vocabulary package indicated by the vocabulary package identifier comprises one or more vocabularies.
In some optional implementations of this embodiment, the determining unit 303 is further configured to: determine the vocabulary related to the vocabulary rule expression in the sub-rule expression as a target vocabulary, and segment the corpus to obtain a word segmentation result; detect whether the word segmentation result contains the target vocabulary, and generate a first detection result; detect whether the corpus contains sentences consistent with the syntax indicated by the regular expression, and generate a second detection result; and determine a matching result of the corpus and the sub-rule expression based on the first detection result, the second detection result, and the scene in which the corpus was generated.
In some optional implementations of this embodiment, the determining unit 303 is further configured to: determine the vocabulary related to the vocabulary rule expression in the sub-rule expression as a target vocabulary, and segment the corpus to obtain a word segmentation result; detect, based on a pre-trained vocabulary similarity calculation model, whether the word segmentation result contains similar vocabularies of the target vocabulary, and generate a third detection result; detect, based on a pre-trained sentence meaning similarity calculation model, whether the corpus contains sentences similar to the syntax indicated by the regular expression, and generate a fourth detection result; and determine a matching result of the corpus and the sub-rule expression based on the third detection result and the fourth detection result.
In some optional implementations of this embodiment, the determining unit 303 is further configured to: determining first scene information based on a scene rule expression in the sub-rule expressions; acquiring second scene information of the corpus; the first scene information and the target vocabulary are used as first input information, words in the second scene information and the word segmentation result are used as second input information, and the first input information and the second input information are input into a pre-trained vocabulary similarity calculation model to obtain first similarity of the target vocabulary and the words in the word segmentation result; and determining whether the word segmentation result contains similar words of the target word or not based on the first similarity, and generating a third detection result.
In some optional implementations of this embodiment, the determining unit 303 is further configured to: determining first scene information based on a scene rule expression in the sub-rule expressions; acquiring second scene information of the corpus; taking the first scene information and the regular expression as third input information, taking sentences in the second scene information and the corpus as fourth input information, and inputting the third input information and the fourth input information into a pre-trained sentence meaning similarity calculation model to obtain second similarity of the regular expression and sentences in the corpus; and determining whether sentences similar to the syntax indicated by the regular expression are contained in the corpus based on the second similarity, and generating a fourth detection result.
In some alternative implementations of this embodiment, when the vocabulary rule expression includes at least two vocabularies, the different vocabularies are separated by a target symbol, and the different target symbols are used to indicate different logical relationships.
In some optional implementations of this embodiment, the above-described scene rule expression includes at least one of: at least one scene identifier and at least one scene package identifier, wherein the scene package indicated by the scene package identifier comprises one or more scene identifiers; and when the scene rule expression comprises at least two scene identifiers, the different scene identifiers are separated by a separation symbol.
According to the device provided by the embodiment of the application, the user-defined rule expression is received, the corpus generated by the candidate user is obtained, and the corpus is matched with the rule expression, so that the target user in the candidate user is determined based on the matching result, wherein the rule expression is written by using a user-defined grammar and is used for corpus matching. Since the rule expression can screen out some corpora meeting specific conditions, the user corpora can reflect the types, favorites and the like of the users, and therefore the target users can be effectively selected in a mode that the rule expression is matched with the user corpora. The process can determine the target user without the user tag, and improves the selection range of the target user. Meanwhile, through setting the rule expression, the user screening can be performed without writing professional codes, and the labor cost in the process of determining the target user is reduced.
Fig. 4 is a block diagram illustrating an apparatus 400 for input, which apparatus 400 may be a smart terminal or a server, according to an exemplary embodiment. For example, apparatus 400 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 4, apparatus 400 may include one or more of the following components: a processing component 402, a memory 404, a power supply component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and a communication component 416.
The processing component 402 generally controls the overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 may include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
Memory 404 is configured to store various types of data to support operations at apparatus 400. Examples of such data include instructions for any application or method operating on the apparatus 400, contact data, phonebook data, messages, pictures, videos, and the like. The memory 404 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 406 provides power to the various components of the apparatus 400. The power supply components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 400.
The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only a boundary of a touch or a sliding action but also a duration and a pressure related to the touch or the sliding operation. In some embodiments, the multimedia component 408 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 400 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 410 is configured to output and/or input audio signals. For example, the audio component 410 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 further includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 414 includes one or more sensors for providing status assessment of various aspects of the apparatus 400. For example, the sensor assembly 414 may detect the on/off state of the apparatus 400 and the relative positioning of components, such as the display and keypad of the apparatus 400. The sensor assembly 414 may also detect a change in position of the apparatus 400 or of one component of the apparatus 400, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in temperature of the apparatus 400. The sensor assembly 414 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 414 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate communication between the apparatus 400 and other devices in a wired or wireless manner. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 416 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as memory 404, including instructions executable by processor 420 of apparatus 400 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Fig. 5 is a schematic diagram of a server in some embodiments of the application. The server 500 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 522 (e.g., one or more processors), memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) that store applications 542 or data 544. The memory 532 and the storage medium 530 may be transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processing unit 522 may be configured to communicate with the storage medium 530 and execute, on the server 500, the series of instruction operations in the storage medium 530.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A non-transitory computer-readable storage medium storing instructions which, when executed by a processor of an apparatus (smart terminal or server), cause the apparatus to perform a target user determination method, the method comprising: receiving a user-defined rule expression, which is written in a custom grammar and is used for corpus matching; acquiring a corpus generated by candidate users; and matching the corpus with the rule expression, and determining target users among the candidate users based on a matching result.
Optionally, the apparatus is configured so that one or more programs, executed by one or more processors, include instructions for: obtaining a target label associated with the rule expression, wherein the target label is customized by the user who sends the rule expression; and adding the target label to the target user.
Optionally, the rule expression includes at least one sub-rule expression, different sub-rule expressions being separated by logical symbols, different logical symbols indicating different logical relationships.
Optionally, the matching the corpus with the rule expression and determining a target user among the candidate users based on a matching result includes: matching the corpus with the sub-rule expressions in the rule expression, and determining the score of the candidate user based on a matching result and the logical relationships indicated by the logical symbols in the rule expression; and determining a target user among the candidate users based on the scores of the candidate users.
Optionally, the matching the corpus with the sub-rule expressions in the rule expression, and determining the score of the candidate user based on the matching result and the logical relationships indicated by the logical symbols in the rule expression, includes: converting the rule expression into a tree structure, wherein leaf nodes of the tree structure are sub-rule expressions in the rule expression, and non-leaf nodes of the tree structure are logical symbols in the rule expression; and matching the corpus with the leaf nodes of the tree structure, and determining the scores of the candidate users based on the matching result and the logical relationships indicated by the non-leaf nodes of the tree structure.
Optionally, the determining, based on the scores of the candidate users, a target user among the candidate users includes: selecting a preset number of users from the candidate users according to the order of the scores from high to low, and taking the users as target users; or determining the user with the score higher than a preset value from the candidate users as a target user.
Optionally, the sub-rule expression includes a scene rule expression and a vocabulary rule expression, and the scene rule expression and the vocabulary rule expression are separated by a preset symbol.
Optionally, the vocabulary rule expression includes at least one of the following: at least one word, at least one word package identifier, and at least one regular expression, wherein the word package indicated by a word package identifier comprises one or more words.
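A sub-rule expression of this shape might be parsed as in the sketch below. The concrete syntax is assumed purely for illustration: `:` as the preset symbol between the scene part and the vocabulary part, `,` between items, a leading `#` for a word package identifier, and `/.../` around a regular expression. The `WORD_PACKAGES` table and all names are hypothetical. Because items are split on `,`, this simplified sketch does not support regular expressions that themselves contain a comma.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical word packages, keyed by word package identifier.
WORD_PACKAGES: Dict[str, Set[str]] = {"complaint": {"refund", "broken", "late"}}


@dataclass
class SubRule:
    scenes: List[str]
    words: Set[str] = field(default_factory=set)
    patterns: List[re.Pattern] = field(default_factory=list)


def parse_sub_rule(text: str) -> SubRule:
    # Split the scene rule expression from the vocabulary rule expression.
    scene_part, vocab_part = text.split(":", 1)
    rule = SubRule(scenes=[s.strip() for s in scene_part.split(",") if s.strip()])
    for item in (i.strip() for i in vocab_part.split(",")):
        if not item:
            continue
        if item.startswith("#"):
            # Word package identifier: expand to the words it contains.
            rule.words |= WORD_PACKAGES[item[1:]]
        elif len(item) >= 2 and item.startswith("/") and item.endswith("/"):
            # Regular expression item.
            rule.patterns.append(re.compile(item[1:-1]))
        else:
            # Plain word.
            rule.words.add(item)
    return rule
```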
Optionally, matching the corpus with a sub-rule expression in the rule expression includes: determining the words involved in the vocabulary rule expression of the sub-rule expression as target words, and performing word segmentation on the corpus to obtain a word segmentation result; detecting whether the word segmentation result contains the target words, and generating a first detection result; detecting whether the corpus contains sentences consistent with the syntax indicated by the regular expression, and generating a second detection result; and determining a matching result of the corpus and the sub-rule expression based on the first detection result, the second detection result, and the scene in which the corpus was generated.
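The exact-matching variant can be sketched as follows. Two simplifications are assumed here and are not taken from the application: the `segment` function is a naive stand-in for a real word segmenter, and the way the three signals are combined (the scene must match, and either detection result must be positive) is one plausible reading, since this passage does not fix the combination rule. All names are hypothetical.

```python
import re
from typing import Iterable, List, Set


def segment(corpus: str) -> List[str]:
    # Naive stand-in for word segmentation: lowercase word tokens.
    return re.findall(r"\w+", corpus.lower())


def match_sub_rule(
    corpus: str,
    corpus_scene: str,
    scenes: Set[str],
    target_words: Set[str],
    patterns: Iterable[str],
) -> bool:
    tokens = set(segment(corpus))
    # First detection result: does the segmentation contain a target word?
    first = bool(tokens & target_words)
    # Second detection result: does the corpus match any regular expression?
    second = any(re.search(p, corpus) for p in patterns)
    # Scene check: was the corpus generated in a scene named by the rule?
    scene_ok = corpus_scene in scenes
    return scene_ok and (first or second)
```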
Optionally, matching the corpus with a sub-rule expression in the rule expression includes: determining the words involved in the vocabulary rule expression of the sub-rule expression as target words, and performing word segmentation on the corpus to obtain a word segmentation result; detecting, based on a pre-trained word similarity calculation model, whether the word segmentation result contains words similar to the target words, and generating a third detection result; detecting, based on a pre-trained sentence meaning similarity calculation model, whether the corpus contains sentences similar to the syntax indicated by the regular expression, and generating a fourth detection result; and determining a matching result of the corpus and the sub-rule expression based on the third detection result and the fourth detection result.
Optionally, detecting, based on the pre-trained word similarity calculation model, whether the word segmentation result contains words similar to the target words, and generating the third detection result, includes: determining first scene information based on the scene rule expression in the sub-rule expression; acquiring second scene information of the corpus; taking the first scene information and a target word as first input information, taking the second scene information and a word in the word segmentation result as second input information, and inputting the first input information and the second input information into the pre-trained word similarity calculation model to obtain a first similarity between the target word and the word in the word segmentation result; and determining, based on the first similarity, whether the word segmentation result contains words similar to the target words, and generating the third detection result.
Optionally, detecting, based on the pre-trained sentence meaning similarity calculation model, whether the corpus contains sentences similar to the syntax indicated by the regular expression, and generating the fourth detection result, includes: determining first scene information based on the scene rule expression in the sub-rule expression; acquiring second scene information of the corpus; taking the first scene information and the regular expression as third input information, taking the second scene information and a sentence in the corpus as fourth input information, and inputting the third input information and the fourth input information into the pre-trained sentence meaning similarity calculation model to obtain a second similarity between the regular expression and the sentence in the corpus; and determining, based on the second similarity, whether the corpus contains sentences similar to the syntax indicated by the regular expression, and generating the fourth detection result.
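The similarity-based variant pairs each text with its scene information before it is passed to a model. The sketch below shows only that data flow. The pre-trained models are not available here, so `toy_similarity` (character-level string similarity from `difflib`, zeroed when the scenes differ) is a stand-in and not the models of the application. The threshold of 0.8, the rule that either detection result suffices, and all names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (scene information, text)
Similarity = Callable[[Pair, Pair], float]


def toy_similarity(a: Pair, b: Pair) -> float:
    # Stand-in for a pre-trained model: 0.0 across different scenes,
    # otherwise a character-level similarity ratio between the two texts.
    if a[0] != b[0]:
        return 0.0
    return SequenceMatcher(None, a[1], b[1]).ratio()


def has_similar(
    rule_scene: str,
    rule_text: str,
    corpus_scene: str,
    candidates: Iterable[str],
    model: Similarity = toy_similarity,
    threshold: float = 0.8,
) -> bool:
    first_input = (rule_scene, rule_text)
    return any(model(first_input, (corpus_scene, c)) >= threshold for c in candidates)


def match_by_similarity(
    rule_scene: str,
    target_words: Iterable[str],
    regex_text: str,
    corpus_scene: str,
    tokens: Iterable[str],
    sentences: Iterable[str],
) -> bool:
    tokens = list(tokens)
    # Third detection result: any token similar to any target word.
    third = any(has_similar(rule_scene, w, corpus_scene, tokens) for w in target_words)
    # Fourth detection result: any sentence similar to the regular expression text.
    fourth = has_similar(rule_scene, regex_text, corpus_scene, sentences)
    return third or fourth
```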
Optionally, when the vocabulary rule expression includes at least two words, different words are separated by target symbols, and different target symbols indicate different logical relationships.
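For illustration only, suppose the target symbol `+` means that all joined words must appear and the target symbol `,` means that any one group suffices. These two symbols and the name `vocab_matches` are assumptions, not symbols given in this passage.

```python
from typing import Set


def vocab_matches(vocab_rule: str, tokens: Set[str]) -> bool:
    """Evaluate a vocabulary rule: ',' separates OR groups, '+' joins AND words."""
    return any(
        all(word.strip() in tokens for word in group.split("+"))
        for group in vocab_rule.split(",")
    )
```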
Optionally, the scene rule expression includes at least one of: at least one scene identifier and at least one scene packet identifier, wherein the scene packet indicated by a scene packet identifier comprises one or more scene identifiers; and when the scene rule expression includes at least two scene identifiers, different scene identifiers are separated by a separation symbol.
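Expanding a scene rule expression into a flat set of scene identifiers can be sketched as follows. The separation symbol `,`, the leading `@` that marks a scene packet identifier, the `SCENE_PACKETS` table, and the name `expand_scenes` are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical scene packets, keyed by scene packet identifier.
SCENE_PACKETS: Dict[str, Set[str]] = {"social": {"chat", "forum"}}


def expand_scenes(scene_rule: str, sep: str = ",") -> Set[str]:
    scenes: Set[str] = set()
    for item in scene_rule.split(sep):
        item = item.strip()
        if not item:
            continue
        if item.startswith("@"):
            # Scene packet identifier: expand to the scene identifiers it holds.
            scenes |= SCENE_PACKETS[item[1:]]
        else:
            # Plain scene identifier.
            scenes.add(item)
    return scenes
```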
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the application.
The method for determining a target user, the apparatus for determining a target user, and the device for determining a target user provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (8)

1. A method of determining a target user, the method comprising:
receiving a user-defined rule expression that is written in a user-defined grammar and is used for corpus matching;
acquiring a corpus generated by candidate users;
matching the corpus with the rule expression, and determining a target user among the candidate users based on a matching result;
wherein the matching the corpus with the rule expression, and determining a target user among the candidate users based on a matching result, comprises:
matching the corpus with sub-rule expressions in the rule expression, and determining scores of the candidate users based on the matching result and the logical relationships indicated by logical symbols in the rule expression;
determining a target user among the candidate users based on the scores of the candidate users;
wherein the matching the corpus with sub-rule expressions in the rule expression, and determining scores of the candidate users based on the matching result and the logical relationships indicated by logical symbols in the rule expression, comprises:
converting the rule expression into a tree structure, wherein leaf nodes of the tree structure are the sub-rule expressions in the rule expression, and non-leaf nodes of the tree structure are the logical symbols in the rule expression;
and matching the corpus with the leaf nodes of the tree structure, and determining the scores of the candidate users based on the matching result and the logical relationships indicated by the non-leaf nodes of the tree structure.
2. The method according to claim 1, wherein the method further comprises:
obtaining a target label associated with the rule expression, wherein the target label is customized by a user who sends the rule expression;
and adding the target label for the target user.
3. The method of claim 1, wherein the rule expression comprises at least one sub-rule expression, different ones of the sub-rule expressions being separated by logical symbols, different logical symbols indicating different logical relationships.
4. The method of claim 1, wherein the determining a target user of the candidate users based on the scores of the candidate users comprises:
selecting a preset number of users from the candidate users in descending order of score as target users; or
determining users whose scores are higher than a preset value among the candidate users as target users.
5. The method of claim 1, wherein the sub-rule expression comprises a scene rule expression and a vocabulary rule expression, the scene rule expression and the vocabulary rule expression being separated by a preset symbol.
6. An apparatus for determining a target user, the apparatus comprising:
a receiving unit configured to receive a user-defined rule expression that is written in a user-defined grammar and is used for corpus matching;
an acquisition unit configured to acquire a corpus generated by candidate users;
a determining unit configured to match the corpus with the rule expression, and determine a target user among the candidate users based on a matching result;
wherein the determining unit is further configured to: match the corpus with sub-rule expressions in the rule expression, and determine scores of the candidate users based on the matching result and the logical relationships indicated by logical symbols in the rule expression; and determine a target user among the candidate users based on the scores of the candidate users;
and wherein the determining unit is further configured to: convert the rule expression into a tree structure, wherein leaf nodes of the tree structure are the sub-rule expressions in the rule expression, and non-leaf nodes of the tree structure are the logical symbols in the rule expression; and match the corpus with the leaf nodes of the tree structure, and determine the scores of the candidate users based on the matching result and the logical relationships indicated by the non-leaf nodes of the tree structure.
7. An apparatus for determining a target user, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
receiving a user-defined rule expression that is written in a user-defined grammar and is used for corpus matching;
acquiring a corpus generated by candidate users;
matching the corpus with the rule expression, and determining a target user among the candidate users based on a matching result;
wherein the matching the corpus with the rule expression, and determining a target user among the candidate users based on a matching result, comprises:
matching the corpus with sub-rule expressions in the rule expression, and determining scores of the candidate users based on the matching result and the logical relationships indicated by logical symbols in the rule expression;
determining a target user among the candidate users based on the scores of the candidate users;
wherein the matching the corpus with sub-rule expressions in the rule expression, and determining scores of the candidate users based on the matching result and the logical relationships indicated by logical symbols in the rule expression, comprises:
converting the rule expression into a tree structure, wherein leaf nodes of the tree structure are the sub-rule expressions in the rule expression, and non-leaf nodes of the tree structure are the logical symbols in the rule expression;
and matching the corpus with the leaf nodes of the tree structure, and determining the scores of the candidate users based on the matching result and the logical relationships indicated by the non-leaf nodes of the tree structure.
8. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.
CN202010544461.0A 2020-06-15 2020-06-15 Target user determining method and device for determining target user Active CN113807082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010544461.0A CN113807082B (en) 2020-06-15 2020-06-15 Target user determining method and device for determining target user

Publications (2)

Publication Number Publication Date
CN113807082A CN113807082A (en) 2021-12-17
CN113807082B true CN113807082B (en) 2024-07-09

Family

ID=78944361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010544461.0A Active CN113807082B (en) 2020-06-15 2020-06-15 Target user determining method and device for determining target user

Country Status (1)

Country Link
CN (1) CN113807082B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7779049B1 (en) * 2004-12-20 2010-08-17 Tw Vericept Corporation Source level optimization of regular expressions
CN109545202A (en) * 2018-11-08 2019-03-29 广东小天才科技有限公司 Method and system for adjusting corpus with semantic logic confusion

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376847A (en) * 2018-08-31 2019-02-22 深圳壹账通智能科技有限公司 User's intension recognizing method, device, terminal and computer readable storage medium
CN109388700A (en) * 2018-10-26 2019-02-26 广东小天才科技有限公司 Intention identification method and system
CN109271492A (en) * 2018-11-16 2019-01-25 广东小天才科技有限公司 Automatic generation method and system of corpus regular expression
CN110516175B (en) * 2019-08-29 2022-05-17 秒针信息技术有限公司 Method, device, equipment and medium for determining user label

Also Published As

Publication number Publication date
CN113807082A (en) 2021-12-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant