CN111259158B - Text classification method, device and medium - Google Patents


Info

Publication number
CN111259158B
Authority
CN
China
Prior art keywords
word, target text, sample, words, determining
Prior art date
Legal status
Active
Application number
CN202010114084.7A
Other languages
Chinese (zh)
Other versions
CN111259158A (en)
Inventor
鲁骁
孟二利
王斌
Current Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202010114084.7A
Publication of CN111259158A
Application granted
Publication of CN111259158B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

Disclosed herein are a text classification method, apparatus, and medium. The method comprises: determining the composition mode of a target text according to a dictionary; generating a word vector of the target text according to the word vector generation method corresponding to that composition mode; and classifying the target text according to the word vector of the target text and a sample word mapping set. The sample word mapping set comprises a plurality of subsets; each subset comprises one-to-one mappings between a plurality of sample words and word vectors; sample words contained in different subsets belong to different categories; and all word vectors in the sample word mapping set have the same dimension. The dictionary is a subset of all sample words in the sample word mapping set. Based on the dictionary and the sample word mapping set, the method maps target texts of different lengths onto a vector space of the same dimension, unifying them at the representation layer, so that target texts of different lengths can be classified by the same classification algorithm, which can effectively improve classification accuracy.

Description

Text classification method, device and medium
Technical Field
The disclosure relates to the field of text classification, and in particular relates to a text classification method, device and medium.
Background
At present, text processing services often need a large number of rule texts to manage content; these rule texts include keywords, phrases, sentences, regular expressions, and the like. Each rule text must also maintain corresponding classification information, which indicates the category of the data filtered out by that rule text. This lets operators carry out subsequent processing of the data, including log statistics, data reports, rule error correction, and routing data of different categories through different processing channels. The accuracy of rule text classification therefore largely determines the effectiveness of subsequent data processing flows and, in turn, the operating efficiency of the service system.
Currently, rule texts can be classified by two methods: manual classification and automatic classification.
Manual classification depends on service personnel's understanding of the rule table and the category system. It requires unified classification standards, and personnel must be retrained whenever the rule table is updated or the category system is adjusted. In practice, several service personnel often maintain the rules at the same time, and classification errors caused by their inconsistent understanding of the standards are common. During service operation, operators manually update the rule table after discovering data to be filtered through user feedback, log tracking, system inspection, and other channels; after long accumulation, the rule texts grow large, making manual classification impractical.
Automatic classification is generally implemented by extracting text features from the rule table through text analysis and then applying a suitable classification algorithm. Short texts and long texts, however, require different feature extraction methods. Because the rule table has a complex structure and contains keywords, phrases, sentences, regular expressions, and other forms of text, a unified method is hard to achieve, and rule texts of different lengths yield inconsistent feature representations, which hurts classification accuracy. Moreover, traditional classification methods require a certain amount of manually annotated data, from which dictionary-based features are extracted. Since the rule table differs from long articles and its word distribution is very sparse, most words are not covered by the annotated training data. Whenever a new keyword appears in the rule table, an unregistered (out-of-vocabulary) word arises, the feature representation of the rule text fails, and the accuracy of the classification algorithm suffers directly.
In the related art, the category of an unregistered word may be determined by computing the similarity between the context of the unregistered word and the context of each category. This approach relies on a synonym dictionary and requires the synonyms to carry category attributes; in practice, category systems are complex and no corresponding synonym dictionary is available. Furthermore, the context information used by the related art is applicable to service scenarios with continuous sentences, articles, and the like, but not to rule texts, which are mostly keywords and phrase fragments without corresponding context.
Disclosure of Invention
To overcome the problems in the related art, a text classification method, apparatus, and medium are provided herein.
According to a first aspect of embodiments herein, there is provided a text classification method comprising:
determining the composition mode of the target text according to a dictionary;
generating a word vector of the target text according to the word vector generation method corresponding to the composition mode; and
classifying the target text according to the word vector of the target text and a sample word mapping set;
wherein the sample word mapping set comprises a plurality of subsets, each subset comprises one-to-one mappings between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and all word vectors in the sample word mapping set have the same dimension; the dictionary is a subset of all sample words in the sample word mapping set.
In another embodiment, determining the composition mode of the target text according to the dictionary includes: when the target text is determined to be a sample word in the dictionary, determining that the composition mode of the target text is a first mode;
the word vector generation method corresponding to the first mode includes: querying the one-to-one mappings in the sample word mapping set to determine the word vector corresponding to the target text.
In another embodiment, determining the composition mode of the target text according to the dictionary includes: determining that the target text is not a sample word in the dictionary and, when the length of the target text is greater than or equal to a set length, determining that the composition mode of the target text is a second mode;
the word vector generation method corresponding to the second mode includes: performing a word segmentation operation on the target text to obtain at least one effective composition word, selecting the effective composition words belonging to the dictionary, querying the one-to-one mappings in the sample word mapping set to determine the word vector corresponding to each selected effective composition word, and determining the word vector corresponding to the target text according to the word vectors of the selected effective composition words.
In another embodiment, determining the composition mode of the target text according to the dictionary includes: determining that the target text is not a sample word in the dictionary and, when the length of the target text is smaller than the set length, determining that the composition mode of the target text is a third mode;
the word vector generation method corresponding to the third mode includes: performing at least one sliding-window split on the target text, where different sliding windows have different window lengths, selecting the unit words belonging to the dictionary from the unit words obtained by each sliding-window split, querying the one-to-one mappings in the sample word mapping set to determine the word vectors corresponding to all selected unit words, and determining the word vector corresponding to the target text according to the word vectors of all selected unit words.
In another embodiment, the sliding windows used in the at least one sliding-window split have N distinct lengths chosen from the M lengths ranging from 1 character to M characters, where M is an integer greater than 1 and N is less than or equal to M.
In another embodiment, classifying the target text according to the word vector of the target text and the sample word mapping set includes: calculating the similarity between the word vector of the target text and the word vectors in the sample word mapping set, and determining the category to which the target text belongs according to the similarity.
According to a second aspect of embodiments herein, there is provided a text classification apparatus comprising:
the first determining module is used for determining the composition mode of the target text according to a dictionary;
the generating module is used for generating the word vector of the target text according to the word vector generation method corresponding to the composition mode;
the classification module is used for classifying the target text according to the word vector of the target text and a sample word mapping set;
wherein the sample word mapping set comprises a plurality of subsets, each subset comprises one-to-one mappings between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and all word vectors in the sample word mapping set have the same dimension; the dictionary is a subset of all sample words in the sample word mapping set.
In one embodiment, the first determining module includes:
the second determining module is used for determining that the composition mode of the target text is a first mode when the target text is a sample word in the dictionary;
the generation module comprises:
the first execution module is used for generating the word vector of the target text according to the word vector generation method corresponding to the first mode;
the first execution module includes:
and the first query module is used for querying the one-to-one mapping relation in the sample word mapping set to determine the word vector corresponding to the target text.
In one embodiment, the first determining module includes:
a third determining module, configured to determine that the target text is not a sample word in the dictionary and, when the length of the target text is greater than or equal to a set length, to determine that the composition mode of the target text is a second mode;
the generation module comprises:
the second execution module is used for generating the word vector of the target text according to the word vector generation method corresponding to the second mode;
the second execution module includes:
the word segmentation module is used for performing a word segmentation operation on the target text to obtain at least one effective composition word;
the first selection module is used for selecting the effective composition words belonging to the dictionary from the effective composition words;
the second query module is used for querying the one-to-one mappings in the sample word mapping set to determine the word vectors corresponding to the selected effective composition words;
and the fourth determining module is used for determining the word vector corresponding to the target text according to the word vector corresponding to each selected effective composition word.
In one embodiment, the first determining module includes:
a fourth determining module, configured to determine that the target text is not a sample word in the dictionary and, when the length of the target text is less than the set length, to determine that the composition mode of the target text is a third mode;
the generation module comprises:
the third execution module is used for generating the word vector of the target text according to the word vector generation method corresponding to the third mode;
the third execution module includes:
the splitting module is used for performing at least one sliding-window split on the target text, where different sliding windows have different window lengths;
the second selection module is used for selecting the unit words belonging to the dictionary from the unit words obtained by each sliding-window split;
the third query module is used for querying the one-to-one mappings in the sample word mapping set to determine the word vectors corresponding to all selected unit words;
and the fifth determining module is used for determining the word vector corresponding to the target text according to the word vectors corresponding to all the selected unit words.
In one embodiment, the sliding windows used in the at least one sliding-window split have N distinct lengths chosen from the M lengths ranging from 1 character to M characters, where M is an integer greater than 1 and N is less than or equal to M.
In one embodiment, the classification module comprises:
the computing module is used for computing the similarity between the word vector of the target text and the word vectors in the sample word mapping set;
and the determining module is used for determining the category to which the target text belongs according to the similarity.
According to a third aspect of embodiments herein, there is provided a non-transitory computer-readable storage medium having instructions stored thereon which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform a text classification method, the method comprising:
determining the composition mode of the target text according to a dictionary;
generating a word vector of the target text according to the word vector generation method corresponding to the composition mode;
classifying the target text according to the word vector of the target text and a sample word mapping set;
wherein the sample word mapping set comprises a plurality of subsets, each subset comprises one-to-one mappings between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and all word vectors in the sample word mapping set have the same dimension; the dictionary is a subset of all sample words in the sample word mapping set.
The technical solutions provided by the embodiments herein may have the following beneficial effect: based on the dictionary and the sample word mapping set, target texts of different lengths are mapped onto a vector space of the same dimension, unifying them at the representation layer, so that target texts of different lengths can be classified by the same classification algorithm, which can effectively improve classification accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent herewith and together with the description, serve to explain the principles herein.
FIG. 1 is a flow chart illustrating a method of text classification according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating a method of generating a word vector corresponding to the second mode in step S13 of FIG. 1, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating a method of generating a word vector corresponding to the third mode in step S13 of FIG. 1, according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating a method of text classification according to an exemplary embodiment;
FIG. 5 is a block diagram of a text classification device according to an exemplary embodiment;
FIG. 6 is a block diagram of a text classification device according to an exemplary embodiment;
FIG. 7 is a block diagram of a text classification device according to an exemplary embodiment;
FIG. 8 is a block diagram of a text classification device according to an exemplary embodiment;
fig. 9 is a block diagram illustrating a text classification apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this document; rather, they are merely examples of apparatus and methods consistent with some aspects hereof as detailed in the appended claims.
The methods herein classify target texts, where a target text may be a business rule, business content, technical content, network interaction content, or the like.
Embodiments herein provide a text classification method. Referring to fig. 1, fig. 1 is a flowchart illustrating a text classification method according to an exemplary embodiment. As shown in fig. 1, the text classification method includes:
Step S11, determining the composition mode of the target text according to a dictionary.
Step S12, generating a word vector of the target text according to the word vector generation method corresponding to the composition mode.
Step S13, classifying the target text according to the word vector of the target text and a sample word mapping set.
The sample word mapping set comprises a plurality of subsets, each subset comprises one-to-one mappings between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and all word vectors in the sample word mapping set have the same dimension; the dictionary is a subset of all sample words in the sample word mapping set.
Here, a subset of all sample words in the sample word mapping set means either a part of, or all of, the sample words in the sample word mapping set.
A sample word may contain one word, two words, or more than two words; that is, a sample word is a word or a phrase. The categories in the sample word mapping set follow a preset general category division or are defined in a custom manner.
For example:
The sample word mapping set includes 50 categories, of which the following are examples:
The mobile phone screen category, under which the following one-to-one mappings are included:
touch screen-vector 101, folding screen-vector 102, curved screen-vector 103, flexible screen-vector 104, screen-vector 105, etc.
The communication device category, under which the following one-to-one mappings are included:
mobile terminal-vector 201, mobile phone-vector 202, router-vector 203, base station-vector 204, set-top box-vector 205, etc.
The network hotness category, under which the following one-to-one mappings are included:
traffic-vector 301, click volume-vector 302, forward volume-vector 303, fan count-vector 304, like count-vector 305, etc.
The dictionary includes sample words from the sample word mapping set, for example: touch screen, folding screen, curved screen, flexible screen, mobile terminal, mobile phone, router, base station, traffic, click volume, forward volume, fan count.
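As a concrete (and entirely invented) illustration of the structures above, the sample word mapping set, the dictionary derived from it, and the first-mode lookup might be sketched in Python as follows. The vectors are 4-dimensional placeholders standing in for the opaque labels such as "vector 103"; only the property that all vectors share one dimension matters here.

```python
# Toy sample word mapping set: category -> {sample word: word vector}.
# All names and numbers are illustrative, not taken from the patent.
SAMPLE_WORD_MAPPING = {
    "mobile phone screen": {
        "touch screen":  [0.9, 0.1, 0.0, 0.0],   # stands in for vector 101
        "curved screen": [0.8, 0.2, 0.0, 0.1],   # stands in for vector 103
    },
    "communication device": {
        "router":       [0.1, 0.9, 0.1, 0.0],    # stands in for vector 203
        "base station": [0.0, 0.8, 0.2, 0.1],    # stands in for vector 204
    },
    "network hotness": {
        "traffic":   [0.0, 0.1, 0.9, 0.2],       # stands in for vector 301
        "fan count": [0.1, 0.0, 0.8, 0.3],       # stands in for vector 304
    },
}

# The dictionary is a subset of all sample words in the mapping set;
# in this sketch it simply contains all of them.
DICTIONARY = {word for words in SAMPLE_WORD_MAPPING.values() for word in words}

def lookup_word_vector(word):
    """First mode: the target text is itself a sample word, so its word
    vector comes from a direct one-to-one mapping lookup."""
    for words in SAMPLE_WORD_MAPPING.values():
        if word in words:
            return words[word]
    return None
```

With this layout, a first-mode target text such as "curved screen" resolves to its vector in a single dictionary-membership check plus lookup.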
With this method, target texts of different lengths are mapped, based on the dictionary and the sample word mapping set, onto a vector space of the same dimension, unifying them at the representation layer; target texts of different lengths can thus be classified by the same classification algorithm, which can effectively improve classification accuracy.
Embodiments herein further provide a method for obtaining the sample word mapping set. The sample word mapping set is either a preset set obtained directly, or is built by combining manual intervention with automatic word expansion, as follows:
step 1, manually selecting a plurality of seed words for each category, and selecting a first number (for example, 50) of seed words for each category in order to ensure the classification effect.
And 2, expanding according to the seed words to obtain the paraphrasing words with preset quantity.
And step 3, merging similar words of all seed words in each category, and removing repeated sample words to obtain a sample word set of each category.
And 4, determining word vectors of each sample word in the sample word set of each category, and finally forming a sample mapping set. The vectors of the sample words are set to have the same dimension, for example, the dimension is 40 dimensions.
The expansion in step 2 uses either of the following modes:
In the first mode, the word vector of each seed word is determined, the similarity between the seed word and other candidate words is computed from the word vectors, the candidates are sorted by similarity, and a preset number of the most similar words for each seed word are extracted as expansions.
In the second mode, a synonym word list (such as the published lists Cilin ("word forest") or WordNet) is queried for the synonyms of each seed word, a preset number of synonyms is selected per seed word, and the results are merged and de-duplicated.
The main difference between the two modes lies in how candidate similarity is computed during expansion, and hence in computational complexity. The first mode computes seed-candidate similarity from word vectors; its complexity is higher, but the word vectors used during expansion and those used by the subsequent classification algorithm share the same vector space, giving high consistency. The second mode looks the synonyms up directly in a synonym list; its complexity is low, but the list is manually maintained and does not match the data distribution of the application scenario, so its effect is less stable than the first mode.
For the word vectors of all sample words to have the same dimension under both modes, the word vector determined for each sample word in step 4 must be given the same dimension.
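The first expansion mode, ranking candidate words by word-vector similarity to a seed word, might be sketched as follows. Cosine similarity is assumed here as the measure, although the text does not fix one, and all names and data are illustrative.

```python
def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand_seed_word(seed_vector, candidate_vectors, preset_number):
    """Mode one of step 2: sort candidate words by similarity to the seed
    word's vector and keep the preset number of most similar words."""
    ranked = sorted(
        candidate_vectors,
        key=lambda w: cosine_similarity(seed_vector, candidate_vectors[w]),
        reverse=True,
    )
    return ranked[:preset_number]
```

The mode-two alternative would replace the similarity ranking with a direct table lookup in a synonym list.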
This embodiment also provides a text classification method in which determining the composition mode of the target text according to the dictionary in step S11 of fig. 1 includes: when the target text is determined to be a sample word in the dictionary, determining that the composition mode of the target text is a first mode. The word vector generation method corresponding to the first mode in step S13 of fig. 1 includes: querying the one-to-one mappings in the sample word mapping set to determine the word vector corresponding to the target text.
Determining whether the target text is a sample word in the dictionary includes: comparing the target text with each sample word in the dictionary one by one; if an identical sample word is found, the target text is considered a sample word in the dictionary, and otherwise it is not.
Examples are as follows:
the target text is a curved screen, and each sample word in the dictionary is compared with the target text one by one, so that the target text is matched with the dictionary to be identical, namely the dictionary comprises the curved screen. And determining a word vector corresponding to the target text as a vector 103 according to a one-to-one mapping relation in the query sample word mapping set, and taking the vector 103 as the word vector of the target text.
This embodiment also provides a text classification method in which determining the composition mode of the target text according to the dictionary in step S11 of fig. 1 includes: determining that the target text is not a sample word in the dictionary and, when the length of the target text is greater than or equal to the set length, determining that the composition mode of the target text is a second mode.
Referring to fig. 2, fig. 2 is a flowchart of the word vector generation method corresponding to the second mode in step S13 of fig. 1. As shown in fig. 2, the method includes:
Step S21, performing a word segmentation operation on the target text to obtain at least one effective composition word;
Step S22, selecting the effective composition words belonging to the dictionary from the effective composition words;
Step S23, querying the one-to-one mappings in the sample word mapping set to determine the word vectors corresponding to the selected effective composition words;
Step S24, determining the word vector corresponding to the target text according to the word vector corresponding to each selected effective composition word.
Obtaining at least one effective composition word after the word segmentation operation includes: performing word segmentation on the target text to obtain a set of composition words, and removing invalid words (function words, auxiliary words, and the like) from the set to obtain the effective composition words.
Determining the word vector corresponding to the target text according to the word vector of each selected effective composition word includes: computing the average of the word vectors of the selected effective composition words and taking that average vector as the word vector of the target text; or computing a weighted average of those word vectors and taking the weighted average vector as the word vector of the target text. In the weighted average, different effective composition words carry different weights; for example, sample words that are relatively basic or frequently used are preset with larger weights.
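The plain and weighted averaging just described might look like the following sketch. The function and weight names are hypothetical, and word segmentation is assumed to have already produced the effective composition words.

```python
def second_mode_vector(effective_words, dictionary, word_vectors, weights=None):
    """Second mode: keep the effective composition words found in the
    dictionary, look up their vectors, and return their (optionally
    weighted) average as the target text's word vector."""
    selected = [w for w in effective_words if w in dictionary]
    if not selected:
        return None  # no dictionary word survived; the text leaves this case open
    if weights is None:
        weights = {}  # default: every selected word has weight 1.0
    total = sum(weights.get(w, 1.0) for w in selected)
    dim = len(word_vectors[selected[0]])
    avg = [0.0] * dim
    for w in selected:
        coeff = weights.get(w, 1.0)
        for i, x in enumerate(word_vectors[w]):
            avg[i] += coeff * x
    return [x / total for x in avg]
```

Passing a weights mapping that favours basic or frequent sample words yields the weighted-average variant; omitting it yields the plain average.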
Examples are as follows:
the length is set to be 6 words, and the target text is that the screen of the prototype is a curved screen. After judging that the formation mode of the target text is the second mode, performing word segmentation operation on the target text to obtain a composition word set, wherein the composition word set comprises: prototype, screen, yes, curved screen. Removing invalid words from the composition word set to obtain valid composition words, including: prototype, screen, curved screen. And finally determining the category of the target text according to the composition word as a mobile phone screen.
This embodiment also provides a text classification method in which determining the composition mode of the target text according to the dictionary in step S11 of fig. 1 includes: determining that the target text is not a sample word in the dictionary and, when the length of the target text is smaller than the set length, determining that the composition mode of the target text is a third mode.
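Taken together, the mode decisions of the embodiments above (dictionary hit, long non-dictionary text, short non-dictionary text) reduce to a small dispatch. A sketch follows; the function and mode names are invented for illustration.

```python
def composition_mode(target_text, dictionary, set_length):
    """Decide the composition mode of the target text: a dictionary hit is
    the first mode; otherwise the set length separates the second mode
    (longer texts, handled by word segmentation) from the third mode
    (shorter texts, handled by sliding-window splitting)."""
    if target_text in dictionary:
        return "first"
    if len(target_text) >= set_length:
        return "second"
    return "third"
```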
Referring to fig. 3, fig. 3 is a flowchart of a word vector generating method corresponding to the third mode in step S13 shown in fig. 1 provided in this embodiment, and as shown in fig. 3, the method includes:
step S31, performing at least one sliding-window split on the target text, where window lengths of different sliding windows are different;
step S32, selecting unit words belonging to the dictionary from the unit words obtained after splitting by using each sliding window;
step S33, inquiring a one-to-one mapping relation in the sample word mapping set to determine word vectors corresponding to all selected unit words;
step S34, determining the word vector corresponding to the target text according to the word vectors corresponding to all the selected unit words.
The lengths of the sliding windows used in the at least one sliding-window split are N character lengths out of the M character lengths from 1 character to M characters, where M is an integer greater than 1 and N is less than or equal to M. That is, N kinds of sliding windows are selected from the M kinds of sliding windows whose lengths range from 1 character to M characters. For example: a 1-character sliding window, a 2-character sliding window, and a 3-character sliding window. As another example: a 1-character sliding window and a 3-character sliding window.
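The splitting and selection of steps S31–S32 can be sketched as follows: windows of each chosen length slide over the text one character at a time, and unit words not found in the dictionary are discarded. The text follows the example in this section ("屏幕是曲面屏", "the screen is a curved screen"); the dictionary is hypothetical.

```python
# Sketch of sliding-window splitting: all substrings produced by
# windows of the given lengths, then filtered against the dictionary.
def sliding_split(text, window_lengths):
    """Return every substring produced by sliding windows of the given lengths."""
    return [text[i:i + n]
            for n in window_lengths
            for i in range(len(text) - n + 1)]

DICTIONARY = {"屏幕", "曲面屏"}  # hypothetical: "screen", "curved screen"
units = sliding_split("屏幕是曲面屏", [1, 2, 3])
selected = [u for u in units if u in DICTIONARY]
print(selected)  # ['屏幕', '曲面屏']
```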
Determining the word vector corresponding to the target text according to the word vectors corresponding to all the selected unit words includes: calculating an average vector of the word vectors corresponding to all the selected unit words and taking the average vector as the word vector of the target text, or calculating a weighted average vector of those word vectors and taking the weighted average vector as the word vector of the target text. When the weighted average vector is calculated, different unit words may carry different weights; for example, sample words that are relatively basic or frequently used are preset with larger weights.
Examples are as follows:
the set length is 6 words, and the target text is "the screen is a curved screen". After the formation mode of the target text is judged to be the third mode, the target text is split with a 1-character sliding window, a 2-character sliding window, and a 3-character sliding window respectively. The 1-character sliding window yields the individual characters of the target text as unit words. The 2-character sliding window yields every pair of adjacent characters as a unit word, including "screen". The 3-character sliding window yields every run of three adjacent characters as a unit word, including "curved screen". From the unit words obtained by all the sliding windows, those belonging to the dictionary are selected: screen and curved screen. The category of the target text is finally determined from these unit words as "mobile phone screen".
With the plurality of sliding windows used in the third mode, this method represents third-mode text as vectors in a vector space of the same dimension, so that the subsequent classification method is compatible with the first mode and the second mode.
The embodiment also provides a text classification method. In this method, classifying the target text according to the word vector and the sample mapping set of the target text in step S13 shown in fig. 1 includes: calculating the similarity between the word vector of the target text and the word vectors in the sample mapping set, determining the word vectors in the sample mapping set that satisfy a similarity condition, and determining the category to which the target text belongs according to the categories of those word vectors.
There are various ways to calculate the similarity between the word vector of the target text and the word vectors in the sample mapping set, for example by calculating the distance between them, where the distance may be a Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, Mahalanobis distance, Hamming distance, and the like; the similarity may also be calculated by a classification algorithm.
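A few of the distances listed above can be sketched as follows, applied to two small illustrative vectors (the values are not from the patent).

```python
# Sketch of common vector distances between a target-text vector and a
# sample word vector. Euclidean and Manhattan are the p=2 and p=1
# special cases of the Minkowski distance.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = [1.0, 2.0], [4.0, 6.0]
print(euclidean(a, b))     # 5.0
print(manhattan(a, b))     # 7.0
print(chebyshev(a, b))     # 4.0
print(minkowski(a, b, 1))  # 7.0
```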
The classification algorithm is an algorithm for classifying vectors, such as the K-nearest-neighbor (kNN) classification algorithm. "K nearest neighbors" means that each sample can be represented by its K closest neighboring samples. The core idea of the kNN algorithm is that if most of the k samples nearest to a given sample in the feature space belong to a certain category, then that sample also belongs to that category and shares the characteristics of the samples in it. In making a classification decision, the method determines the category of the sample to be classified only from the categories of the one or more nearest samples; it thus relies on a small number of surrounding neighbors rather than on discriminating class regions.
When the kNN classification algorithm is used, the similarity between the word vector of the target text and each word vector in the sample mapping set is calculated and sorted; the K value in the kNN classification method is set to a preset value (for example, 20); the top-ranked K word vectors are determined from the sorted list, the categories of the sample words corresponding to these K word vectors are determined, and the category occurring most often among the K categories is taken as the category to which the target text belongs.
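The kNN decision just described can be sketched as follows: rank the sample words by distance to the target text's vector, keep the top K, and take a majority vote over their categories. The sample set and categories are hypothetical, and a real system would use the preset K (e.g. 20) over a much larger sample mapping set.

```python
# Sketch of kNN classification over (vector, category) sample pairs.
# Similarity ranking is done here by Euclidean distance (smaller = more
# similar); the samples below are hypothetical.
import math
from collections import Counter

def knn_classify(target, samples, k):
    """samples: list of (vector, category) pairs; returns the majority
    category among the k samples nearest to target."""
    nearest = sorted(samples, key=lambda s: math.dist(target, s[0]))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]  # most frequent category among the K

samples = [([0.0, 0.0], "screen"), ([0.1, 0.0], "screen"),
           ([1.0, 1.0], "battery"), ([0.2, 0.1], "screen")]
print(knn_classify([0.05, 0.0], samples, k=3))  # 'screen'
```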
The following is a detailed description of one embodiment.
Specific examples:
in this embodiment, the target text to be processed is a rule to be processed, for example a processing rule about a mobile terminal.
Referring to fig. 4, fig. 4 is a flow chart of a text classification method in a specific embodiment. As shown in fig. 4, this text classification method includes a sample mapping set generation process S4-1 and a classification process S4-2.

The sample mapping set generation process S4-1 includes steps S411 to S415:
in step S411, a plurality of seed words are selected for each category; 50 seed words are selected per category to ensure the classification effect.
Step S412, the seed words are expanded to obtain similar words: a plurality of (e.g. 10) near-synonyms are obtained for each seed word, and the similar words under each category are used as sample words, yielding a plurality of (e.g. 500) sample words for each category.
When obtaining the near-synonyms, the similarity between each seed word and other candidate words is calculated from their word vectors, the similarities are sorted, and a preset number of the most similar words for each seed word are extracted to realize the expansion.
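The expansion step can be sketched as follows: rank candidate words by cosine similarity to a seed word's vector and keep the top few as near-synonyms. All words and vectors here are hypothetical illustrations, not values from the patent.

```python
# Sketch of seed-word expansion by word-vector similarity (step S412):
# sort candidates by cosine similarity to the seed vector and keep the
# top_n. Candidate words and vectors are hypothetical.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def expand(seed_vec, candidates, top_n):
    """candidates: dict word -> vector; return the top_n most similar words."""
    return sorted(candidates,
                  key=lambda w: cosine(seed_vec, candidates[w]),
                  reverse=True)[:top_n]

candidates = {"display": [0.9, 0.1], "monitor": [0.8, 0.3], "battery": [0.1, 0.9]}
print(expand([1.0, 0.0], candidates, top_n=2))  # ['display', 'monitor']
```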
Step S413, merging the similar words of all the seed words in each category, and removing the repeated sample words to obtain a sample word set of each category.
In step S414, a word vector of each sample word in the sample word set of each category is determined, and finally a sample mapping set is formed, where the dimensions of the vector of each sample word are the same (e.g. 40 dimensions).
In step S415, a dictionary is determined, the dictionary including sample words in the set of sample mappings.
The classification process S4-2 includes steps S421 to S437:
step S421, matching the rule to be processed with each word in the dictionary.
Step S422, it is determined whether the rule to be processed hits the dictionary, i.e., whether the rule to be processed is a word in the dictionary, and if yes, go to step S423, and if no, go to step S425.
In step S423, it is determined that the constituent form of the rule to be processed is a first type constituent form (or referred to as a word type constituent form).
Step S424, a word vector of the rule to be processed is generated; specifically, the one-to-one mapping relation in the sample word mapping set is queried to determine the word vector corresponding to the rule to be processed; then go to step S436.
In step S425, the length of the rule to be processed, i.e. the number of words contained in the rule to be processed, is determined.
Step S426, it is judged whether the length of the rule to be processed is greater than or equal to the preset length (e.g. 6); if yes, go to step S427; if no, go to step S432.
In step S427, it is determined that the constituent form of the rule to be processed is a second type constituent form (or referred to as phrase type constituent form).
Step S428, word segmentation operation is carried out on the rule to be processed to obtain a plurality of effective composition words;
step S429, selecting the effective constituent words belonging to the dictionary from the plurality of effective constituent words.
Step S430, the one-to-one mapping relation in the sample word mapping set is queried to determine the word vectors corresponding to the selected effective composition words.
Step S431, a word vector of the rule to be processed is generated through an averaging operation: the average vector of the word vectors of the selected effective composition words is calculated and taken as the word vector of the rule to be processed; then go to step S436.
In step S432, it is determined that the formation of the rule to be processed is a third type formation (or referred to as an unregistered word type formation).
Step S433, three sliding window type splitting is carried out on the rule to be processed; for example: the first sliding window has a length of 1 character, the second sliding window has a length of 2 characters, and the third sliding window has a length of 3 characters.
And S434, selecting unit words belonging to the dictionary from the unit words split through the three sliding windows, and discarding the unit words not belonging to the dictionary from the split unit words.
In step S435, a word vector of the rule to be processed is generated through a weighted average operation, that is, an average vector of word vectors corresponding to all the selected unit words is calculated, and the average vector is used as the word vector of the rule to be processed.
Step S436, the category of the rule to be processed is determined from its word vector and the sample mapping set using the kNN classification method.
Step S437, outputting the category.
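The dispatch in steps S421–S426 can be sketched as follows: a rule found in the dictionary takes the first (word-type) form; otherwise its length decides between the second (phrase-type) and third (unregistered-word-type) forms. The dictionary and threshold are hypothetical, and length is measured here in characters for illustration, whereas the embodiment counts words.

```python
# Sketch of the mode dispatch (S421-S426). DICTIONARY and SET_LENGTH
# are hypothetical stand-ins for the embodiment's dictionary and preset
# length; len() here counts characters rather than segmented words.
DICTIONARY = {"screen", "battery"}
SET_LENGTH = 6

def formation_mode(rule):
    if rule in DICTIONARY:         # S422 hit -> S423 (word-type form)
        return "first"
    if len(rule) >= SET_LENGTH:    # S426 yes -> S427 (phrase-type form)
        return "second"
    return "third"                 # S426 no -> S432 (unregistered-word form)

print(formation_mode("screen"))              # 'first'
print(formation_mode("curved glass panel"))  # 'second'
print(formation_mode("curvd"))               # 'third'
```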
The embodiment also provides a text classification device. Referring to fig. 5, fig. 5 is a block diagram illustrating a text classification apparatus according to an exemplary embodiment. As shown in fig. 5, the text classification apparatus includes:
a first determining module 501, configured to determine the formation mode of the target text according to the dictionary;
a generating module 502, configured to generate a word vector of the target text according to a word vector generating method corresponding to the configuration mode;
a classification module 503, configured to classify the target text according to the word vector and the sample mapping set of the target text;
the sample word mapping set comprises a plurality of subsets, each subset comprises a one-to-one mapping relation between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and the dimensions of all word vectors in the sample word mapping set are the same; the dictionary is a subset of all sample words in the set of sample word mappings.
The embodiment also provides a text classification device. In this apparatus, the first determining module 501 includes: a second determining module, configured to determine that the formation mode of the target text is the first mode when the target text is determined to be a sample word in the dictionary.

The generating module 502 includes: a first execution module, configured to generate the word vector of the target text according to the word vector generation method corresponding to the first mode.

The first execution module includes: a first query module, configured to query the one-to-one mapping relation in the sample word mapping set to determine the word vector corresponding to the target text.
The embodiment also provides a text classification device. Referring to fig. 6, fig. 6 is a block diagram illustrating a text classification apparatus according to an exemplary embodiment. As shown in fig. 6, the first determining module 501 in this apparatus includes: a third determining module 601, configured to determine that the target text is not a sample word in the dictionary, and to determine that the formation mode of the target text is the second mode when the length of the target text is greater than or equal to the set length.
The generating module 502 includes: a second execution module 602, configured to generate the word vector of the target text according to the word vector generation method corresponding to the second mode.
A second execution module 602, comprising:
the word segmentation module 603 is configured to obtain at least one valid composition word after performing a word segmentation operation on the target text;
a first selection module 604, configured to select valid component words belonging to the dictionary from the valid component words;
a second query module 605, configured to query a one-to-one mapping relationship in the sample word mapping set to determine a word vector corresponding to the selected valid component word;
a fourth determining module 606, configured to determine a word vector corresponding to the target text according to the word vector corresponding to each selected valid component word.
The embodiment also provides a text classification device. Referring to fig. 7, fig. 7 is a block diagram illustrating a text classification apparatus according to an exemplary embodiment. As shown in fig. 7, the first determining module 501 in this apparatus includes: a fourth determining module 701, configured to determine that the target text is not a sample word in the dictionary, and determine that the configuration mode of the target text is a third mode when the length of the target text is less than a set length.
The generating module 502 includes: a third execution module 702, configured to generate the word vector of the target text according to the word vector generation method corresponding to the third mode.
The third execution module 702 includes:
the splitting module 703 is configured to split the target text by at least one sliding window respectively, where window lengths of different sliding windows are different;
a second selection module 704, configured to select a unit word belonging to the dictionary from unit words obtained by splitting using each sliding window;
a third query module 705, configured to query a one-to-one mapping relationship in the sample word mapping set to determine word vectors corresponding to all the selected unit words;
a fifth determining module 706, configured to determine a word vector corresponding to the target text according to the word vectors corresponding to all the selected unit words.
The length of the sliding window used in the at least one sliding window type split is N character lengths from 1 character length to M character length, M being an integer greater than 1, N being less than or equal to M.
The embodiment also provides a text classification device. Referring to fig. 8, fig. 8 is a block diagram illustrating a text classification apparatus according to an exemplary embodiment. As shown in fig. 8, the classification module 503 includes:
a calculating module 801, configured to calculate a similarity between a word vector of the target text and a word vector in the sample mapping set;
a sixth determining module 802, configured to determine, according to the similarity, the category to which the target text belongs.
Embodiments herein also provide a non-transitory computer-readable storage medium storing instructions that, when executed by a processor of a mobile terminal, cause the mobile terminal to perform a text classification method, the method comprising:
determining the formation mode of the target text according to the dictionary;
generating word vectors of the target text according to the word vector generation method corresponding to the formation mode;
classifying the target text according to the word vector and the sample mapping set of the target text;
the sample word mapping set comprises a plurality of subsets, each subset comprises a one-to-one mapping relation between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and the dimensions of all word vectors in the sample word mapping set are the same; the dictionary is a subset of all sample words in the set of sample word mappings.
Fig. 9 is a block diagram illustrating a text classification device 900 according to an exemplary embodiment. For example, apparatus 900 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 9, apparatus 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls overall operations of the apparatus 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 902 can include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operations at the device 900. Examples of such data include instructions for any application or method operating on the device 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 904 may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 906 provides power to the various components of the device 900. Power components 906 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 900.
The multimedia component 908 comprises a screen between the device 900 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 900 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a Microphone (MIC) configured to receive external audio signals when the device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 914 includes one or more sensors for providing status assessment of various aspects of the apparatus 900. For example, the sensor assembly 914 may detect the on/off state of the device 900 and the relative positioning of components such as the display and keypad of the apparatus 900; it may also detect a change in position of the apparatus 900 or of one of its components, the presence or absence of user contact with the apparatus 900, the orientation or acceleration/deceleration of the apparatus 900, and a change in temperature of the apparatus 900. The sensor assembly 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate communication between the apparatus 900 and other devices in a wired or wireless manner. The device 900 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, apparatus 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as a memory 904 including instructions executable by the processor 920 of the apparatus 900 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit herein being indicated by the following claims.

Claims (13)

1. A method of text classification, comprising:
determining the formation mode of the target text according to the dictionary;
generating word vectors of the target text according to the word vector generation method corresponding to the formation mode;
classifying the target text according to the word vector and the sample mapping set of the target text;
the sample word mapping set comprises a plurality of subsets, each subset comprises a one-to-one mapping relation between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and the dimensions of all word vectors in the sample word mapping set are the same; the dictionary is a subset of all sample words in the set of sample word mappings.
2. The text classification method of claim 1,
the method for determining the formation mode of the target text according to the dictionary comprises the following steps: when the target text is determined to be a sample word in the dictionary, determining that the construction mode of the target text is a first mode;
the word vector generation method corresponding to the first mode comprises the following steps: and inquiring a one-to-one mapping relation in the sample word mapping set to determine a word vector corresponding to the target text.
3. The text classification method of claim 1,
the method for determining the formation mode of the target text according to the dictionary comprises the following steps: determining that the target text is not a sample word in the dictionary, and determining that the construction mode of the target text is a second mode when the length of the target text is greater than or equal to a set length;
the word vector generation method corresponding to the second mode comprises the following steps: and obtaining at least one effective composition word after word segmentation operation is carried out on the target text, selecting the effective composition word belonging to the dictionary from the effective composition words, inquiring a one-to-one mapping relation in a sample word mapping set to determine word vectors corresponding to the selected effective composition word, and determining word vectors corresponding to the target text according to the word vectors corresponding to each selected effective composition word.
4. The text classification method of claim 1,
the method for determining the formation mode of the target text according to the dictionary comprises the following steps: determining that the target text is not a sample word in the dictionary, and determining that the formation mode of the target text is a third mode when the length of the target text is smaller than a set length;
The word vector generation method corresponding to the third mode comprises the following steps: and respectively carrying out at least one sliding window type splitting on the target text, wherein window lengths of different sliding windows are different, selecting unit words belonging to the dictionary from unit words obtained by splitting through each sliding window type, inquiring a one-to-one mapping relation in the sample word mapping set to determine word vectors corresponding to all selected unit words, and determining word vectors corresponding to the target text according to the word vectors corresponding to all selected unit words.
5. The text classification method of claim 4,
the length of the sliding window used in the at least one sliding window type split is N character lengths from 1 character length to M character length, M being an integer greater than 1, N being less than or equal to M.
6. The text classification method of claim 1,
classifying the target text using the word vector and sample mapping set of the target text, comprising: and calculating the similarity between the word vector of the target text and the word vector in the sample mapping set, and determining the category to which the target text belongs according to the similarity.
7. A text classification device, comprising:
the first determining module is used for determining the formation mode of the target text according to the dictionary;
the generating module is used for generating the word vector of the target text according to the word vector generating method corresponding to the composition mode;
the classification module is used for classifying the target text according to the word vector and the sample mapping set of the target text;
the sample word mapping set comprises a plurality of subsets, each subset comprises a one-to-one mapping relation between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and the dimensions of all word vectors in the sample word mapping set are the same; the dictionary is a subset of all sample words in the set of sample word mappings.
8. The text classification apparatus of claim 7,
the first determining module includes:
the second determining module is used for determining that the composition mode of the target text is a first mode when the target text is a sample word in the dictionary;
the generation module comprises:
the first execution module is used for generating the word vector of the target text according to the word vector generation method corresponding to the first mode;
The first execution module includes:
and the first query module is used for querying the one-to-one mapping relation in the sample word mapping set to determine the word vector corresponding to the target text.
9. The text classification apparatus of claim 7,
the first determining module includes:
a third determining module, configured to determine that the target text is not a sample word in the dictionary, and determine that a configuration mode of the target text is a second mode when a length of the target text is greater than or equal to a set length;
the generation module comprises:
the second execution module is used for generating the word vector of the target text according to the word vector generation method corresponding to the second mode;
the second execution module includes:
the word segmentation module is used for obtaining at least one effective composition word after carrying out word segmentation operation on the target text;
a first selection module, configured to select an effective component word belonging to the dictionary from the effective component words;
the second query module is used for querying one-to-one mapping relation in the sample word mapping set to determine word vectors corresponding to the selected effective component words;
and the fourth determining module is used for determining the word vector corresponding to the target text according to the word vector corresponding to each selected effective composition word.
10. The text classification apparatus of claim 7,
the first determining module includes:
a fourth determining module, configured to determine that the composition mode of the target text is a third mode when the target text is not a sample word in the dictionary and the length of the target text is less than a set length;
the generation module comprises:
the third execution module is used for generating the word vector of the target text according to the word vector generation method corresponding to the third mode;
the third execution module includes:
the splitting module is used for splitting the target text with at least one sliding window, wherein different sliding windows have different window lengths;
the second selection module is used for selecting unit words belonging to the dictionary from the unit words obtained after splitting by using each sliding window;
the third query module is used for querying the one-to-one mapping relation in the sample word mapping set to determine word vectors corresponding to all selected unit words;
and a fifth determining module, configured to determine a word vector corresponding to the target text according to the word vectors corresponding to all the selected unit words.
11. The text classification apparatus of claim 10,
the lengths of the sliding windows used in the at least one sliding-window split are N character lengths ranging from 1 character length to M character lengths, M being an integer greater than 1 and N being less than or equal to M.
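The sliding-window split of claims 10 and 11 is, in effect, character n-gram generation for every window length from 1 up to M. A minimal sketch (the dictionary filtering and vector lookup of claim 10 would then be applied to the returned unit words):

```python
def sliding_window_units(target_text, max_window):
    """Split the target text with sliding windows of each length from
    1 character to max_window characters (claim 11's lengths 1..M);
    each window position yields one candidate unit word."""
    units = []
    for n in range(1, max_window + 1):
        for i in range(len(target_text) - n + 1):
            units.append(target_text[i:i + n])
    return units
```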
12. The text classification apparatus of claim 7,
the classification module comprises:
the computing module is used for computing the similarity between the word vector of the target text and the word vectors in the sample word mapping set;
and the determining module is used for determining the category to which the target text belongs according to the similarity.
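Claim 12 classifies by similarity but leaves the similarity measure unspecified; cosine similarity with a nearest-neighbour decision is one common reading, sketched here under that assumption:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(target_vector, mapping):
    """Assign the category whose sample word vector is most similar to the
    target text's word vector (a nearest-neighbour reading of claim 12)."""
    best_category, best_score = None, float("-inf")
    for category, subset in mapping.items():
        for vector in subset.values():
            score = cosine_similarity(target_vector, vector)
            if score > best_score:
                best_category, best_score = category, score
    return best_category
```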
13. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed by a processor of a mobile terminal, cause the mobile terminal to perform a text classification method, the method comprising:
determining the composition mode of the target text according to a dictionary;
generating a word vector of the target text according to a word vector generation method corresponding to the composition mode; and
classifying the target text according to the word vector of the target text and the sample word mapping set;
the sample word mapping set comprises a plurality of subsets, each subset comprises one-to-one mapping relations between a plurality of sample words and word vectors, sample words contained in different subsets belong to different categories, and all word vectors in the sample word mapping set have the same dimension; the dictionary is a subset of all sample words in the sample word mapping set.
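The mode determination that the claims dispatch on reduces to two tests: dictionary membership, then length against the set length. A minimal sketch (the default set length of 2 is an assumption for illustration; the patent does not fix its value):

```python
def composition_mode(target_text, dictionary, set_length=2):
    """Determine the composition mode per claims 8-10: first mode when the
    target text is itself a sample word in the dictionary; otherwise second
    mode when its length reaches the set length, third mode when shorter."""
    if target_text in dictionary:
        return "first"
    return "second" if len(target_text) >= set_length else "third"
```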
CN202010114084.7A 2020-02-25 2020-02-25 Text classification method, device and medium Active CN111259158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010114084.7A CN111259158B (en) 2020-02-25 2020-02-25 Text classification method, device and medium

Publications (2)

Publication Number Publication Date
CN111259158A CN111259158A (en) 2020-06-09
CN111259158B true CN111259158B (en) 2023-06-02

Family

ID=70951311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010114084.7A Active CN111259158B (en) 2020-02-25 2020-02-25 Text classification method, device and medium

Country Status (1)

Country Link
CN (1) CN111259158B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797226B (en) 2020-06-30 2024-04-05 北京百度网讯科技有限公司 Conference summary generation method and device, electronic equipment and readable storage medium
CN111611394B (en) * 2020-07-03 2021-09-07 中国电子信息产业集团有限公司第六研究所 Text classification method and device, electronic equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011093925A1 (en) * 2010-02-01 2011-08-04 Alibaba Group Holding Limited Method and apparatus of text classification
CN104820703A (en) * 2015-05-12 2015-08-05 武汉数为科技有限公司 Text fine classification method
CN110334209A (en) * 2019-05-23 2019-10-15 平安科技(深圳)有限公司 File classification method, device, medium and electronic equipment
CN110717040A (en) * 2019-09-18 2020-01-21 平安科技(深圳)有限公司 Dictionary expansion method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郑开雨; 竹翠. Naive Bayes text classification algorithm based on contextual semantics. 计算机与现代化 (Computer and Modernization), 2018, (06), full text. *


Similar Documents

Publication Publication Date Title
CN107102746B (en) Candidate word generation method and device and candidate word generation device
US20170154104A1 (en) Real-time recommendation of reference documents
CN107870677B (en) Input method, input device and input device
CN107305438B (en) Method and device for sorting candidate items
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN111832316B (en) Semantic recognition method, semantic recognition device, electronic equipment and storage medium
CN107844199A (en) A kind of input method, system and the device for input
CN111259158B (en) Text classification method, device and medium
CN110069624B (en) Text processing method and device
CN111831806A (en) Semantic integrity determination method and device, electronic equipment and storage medium
CN107424612B (en) Processing method, apparatus and machine-readable medium
CN111222316B (en) Text detection method, device and storage medium
CN110781689B (en) Information processing method, device and storage medium
CN110780749B (en) Character string error correction method and device
CN112199032A (en) Expression recommendation method and device and electronic equipment
WO2023092975A1 (en) Image processing method and apparatus, electronic device, storage medium, and computer program product
CN108073294B (en) Intelligent word forming method and device for intelligent word forming
CN107291259B (en) Information display method and device for information display
CN106959970B (en) Word bank, processing method and device of word bank and device for processing word bank
CN112987941B (en) Method and device for generating candidate words
CN108345590B (en) Translation method, translation device, electronic equipment and storage medium
CN113589954A (en) Data processing method and device and electronic equipment
CN108227952B (en) Method and system for generating custom word and device for generating custom word
CN107665206B (en) Method and system for cleaning user word stock and device for cleaning user word stock
CN107765884B (en) Sliding input method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 100085 unit C, building C, lin66, Zhufang Road, Qinghe, Haidian District, Beijing

Applicant after: Beijing Xiaomi pinecone Electronic Co.,Ltd.

Address before: 100085 unit C, building C, lin66, Zhufang Road, Qinghe, Haidian District, Beijing

Applicant before: BEIJING PINECONE ELECTRONICS Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant