CN106372063A - Information processing method and device and terminal - Google Patents
- Publication number
- CN106372063A
- Application number
- CN201610940520.XA
- Authority
- CN
- China
- Prior art keywords
- word
- language material
- carry out
- frequency
- synonymous
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Provided are an information processing method and device and a terminal. The information processing method comprises: performing word segmentation on a corpus to be processed to obtain a plurality of words; performing synonym replacement on at least some of the words to obtain a new corpus; and performing keyword extraction on the new corpus to obtain one or more keywords. The technical solution improves the accuracy of keyword extraction.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to an information processing method, device and terminal.
Background
Current keyword extraction mostly relies either on methods based on statistical features (such as word-frequency statistics) or on methods based on text ranking (TextRank). Algorithms based on statistical features are simple to operate. Methods based on text ranking determine the relations between words from their co-occurrence relations.
However, algorithms based on statistical features ignore words that occur infrequently, or sit in unimportant positions in a document, yet are critical to that document. Methods based on text ranking lack semantic understanding, so words on the same topic that do not fall within the same co-occurrence window cannot be associated.
Therefore, how to improve the accuracy of keyword extraction is a problem that urgently needs to be solved.
Summary of the invention
The technical problem solved by the present invention is how to improve the accuracy of keyword extraction.
To solve the above technical problem, an embodiment of the present invention provides an information processing method, comprising:
performing word segmentation on a corpus to be processed to obtain a plurality of words; performing synonym replacement on at least some of the plurality of words to obtain a new corpus; and performing keyword extraction on the new corpus to obtain one or more keywords.
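The three claimed steps compose as a plain pipeline. The sketch below fixes only their order; `process` and the three stage callables are hypothetical names, and the toy stand-ins in the usage lines (whitespace segmentation, a one-entry synonym map, frequency ranking in place of TextRank) are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, List


def process(corpus: str,
            segment: Callable[[str], List[str]],
            replace_synonyms: Callable[[List[str]], List[str]],
            extract_keywords: Callable[[List[str]], List[str]]) -> List[str]:
    words = segment(corpus)                # step 1: word segmentation
    new_corpus = replace_synonyms(words)   # step 2: synonym replacement -> new corpus
    return extract_keywords(new_corpus)    # step 3: keyword extraction on the new corpus


# Toy stand-ins (assumptions): whitespace segmentation, a one-entry synonym map,
# and "two most frequent words" in place of TextRank.
synonym_map = {"automobile": "car"}
keywords = process(
    "car review automobile engine car review",
    segment=str.split,
    replace_synonyms=lambda ws: [synonym_map.get(w, w) for w in ws],
    extract_keywords=lambda ws: [w for w, _ in Counter(ws).most_common(2)],
)
print(keywords)  # ['car', 'review']
```

Without the replacement step, "car" and "review" would tie at two occurrences each; merging "automobile" into "car" lets the synonym contribute to that word's count.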
Optionally, performing synonym replacement on at least some of the plurality of words comprises: determining, according to a preset thesaurus, at least one synonym group among at least some of the plurality of words, where words that are synonyms of each other are placed in the same synonym group; and, for each synonym group, selecting the word with the highest word frequency in the corpus to be processed and replacing the other words with that word, to obtain the new corpus.
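A minimal sketch of this replacement rule, assuming the preset thesaurus is available as a list of synonym groups (sets of words). The function name and the alphabetical tie-break are assumptions; the text does not say which word wins when two words in a group have equal frequency.

```python
from collections import Counter
from typing import Iterable, List, Set


def replace_with_most_frequent(words: List[str],
                               synonym_groups: Iterable[Set[str]]) -> List[str]:
    """Rewrite every member of a synonym group to the member with the highest
    word frequency in `words`; ties go to the alphabetically first word (assumption)."""
    freq = Counter(words)
    mapping = {}
    for group in synonym_groups:
        present = sorted(w for w in group if freq[w] > 0)
        if len(present) < 2:
            continue  # fewer than two members occur: nothing to unify
        target = max(present, key=lambda w: freq[w])  # first maximum in sorted order
        for w in present:
            mapping[w] = target
    return [mapping.get(w, w) for w in words]


words = ["car", "automobile", "car", "vehicle", "engine"]
print(replace_with_most_frequent(words, [{"car", "automobile", "vehicle"}]))
# ['car', 'car', 'car', 'car', 'engine']
```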
Optionally, after performing synonym replacement on at least some of the plurality of words, the method further comprises: filtering the new corpus for use in keyword extraction.
Optionally, before performing synonym replacement on at least some of the plurality of words, the method further comprises: filtering the plurality of words to obtain a plurality of candidate words.
Optionally, performing synonym replacement on at least some of the plurality of words comprises: determining, according to a preset thesaurus, at least one synonym group among the plurality of candidate words, where candidate words that are synonyms of each other are placed in the same synonym group; and, for each synonym group, selecting the candidate word with the highest word frequency in the corpus to be processed and replacing the other candidate words with that candidate word, to obtain the new corpus.
Optionally, the filtering is performed in one or more of the following ways: filtering by part of speech, keeping nouns, adjectives and verbs; and filtering by occurrence count, keeping words whose count exceeds a count threshold.
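Both filters can be sketched over (word, part-of-speech) pairs. The tag names are an assumption: the sketch treats any tag starting with `n`, `v` or `a` as a noun, verb or adjective, following the ICTCLAS-style convention used by common Chinese taggers; the function name and parameters are illustrative.

```python
from collections import Counter
from typing import List, Optional, Tuple

KEEP_POS_PREFIXES = ("n", "v", "a")  # assumed tag scheme: nouns, verbs, adjectives


def filter_words(tagged: List[Tuple[str, str]], by_pos: bool = True,
                 min_count: Optional[int] = None) -> List[str]:
    """Keep nouns/verbs/adjectives and/or words occurring more than `min_count` times."""
    counts = Counter(word for word, _ in tagged)
    kept = []
    for word, pos in tagged:
        if by_pos and not pos.startswith(KEEP_POS_PREFIXES):
            continue  # e.g. prepositions, particles, pronouns
        if min_count is not None and counts[word] <= min_count:
            continue  # count does not exceed the threshold
        kept.append(word)
    return kept


tagged = [("keyword", "n"), ("of", "p"), ("extract", "v"), ("accurate", "a"), ("keyword", "n")]
print(filter_words(tagged))               # ['keyword', 'extract', 'accurate', 'keyword']
print(filter_words(tagged, min_count=1))  # ['keyword', 'keyword']
```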
Optionally, performing keyword extraction on the new corpus comprises: computing statistics on the new corpus to obtain the word frequency and position information of its words in the corpus to be processed; and inputting the new corpus, together with its word frequency and position information, into a TextRank algorithm to extract keywords from the corpus to be processed.
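The text does not define the form of the "position information". The sketch below assumes it is the list of (sentence index, token index) pairs at which each word occurs, collected alongside word frequencies; the names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (sentence index, token index)


def corpus_statistics(sentences: List[List[str]]) -> Tuple[Counter, Dict[str, List[Position]]]:
    """Word frequencies plus every position at which each word occurs."""
    freq: Counter = Counter()
    positions: Dict[str, List[Position]] = defaultdict(list)
    for s_idx, sentence in enumerate(sentences):
        for t_idx, word in enumerate(sentence):
            freq[word] += 1
            positions[word].append((s_idx, t_idx))
    return freq, dict(positions)


freq, positions = corpus_statistics([["car", "review"], ["car", "engine"]])
print(freq["car"], positions["car"])  # 2 [(0, 0), (1, 0)]
```

A per-word position weight (for example, a larger value for words that occur in a title line) could then be derived from these pairs.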
Optionally, the information processing method further comprises: verifying the accuracy of the one or more extracted keywords to obtain a verification result; adjusting the parameters of the TextRank algorithm according to the verification result; and extracting the keywords again using the TextRank algorithm with adjusted parameters, until the verification result of the keywords meets a preset requirement.
Optionally, before word segmentation is performed on the corpus to be processed, the method further comprises: preprocessing the corpus to be processed to obtain the corpus in a uniform format.
Optionally, preprocessing the corpus to be processed comprises: converting the corpus into text format to obtain text data; filtering preset words out of the text data, the preset words being one or more of profanity, sensitive words and stop words; and splitting the filtered text data by punctuation.
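A minimal preprocessing sketch, assuming UTF-8 input for the format conversion and plain substring removal for the preset words; the sentence-ending marks assumed here are the full-width "？", "！" and "。". Function names are illustrative.

```python
import re
from typing import Iterable, List, Union

SENTENCE_END = re.compile(r"[？！。]")  # assumed sentence-ending marks


def to_text(raw: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Format conversion: decode raw bytes into text (UTF-8 is an assumption)."""
    return raw.decode(encoding) if isinstance(raw, bytes) else raw


def preprocess(raw: Union[bytes, str], preset_words: Iterable[str]) -> List[str]:
    text = to_text(raw)
    # Drop preset words (profanity, sensitive words, stop words), longest first so a
    # short entry cannot cut into a longer one before that one is removed.
    for word in sorted(preset_words, key=len, reverse=True):
        text = text.replace(word, "")
    # Split at sentence-ending punctuation; each non-empty piece becomes one line.
    return [line.strip() for line in SENTENCE_END.split(text) if line.strip()]


print(preprocess("关键词提取很重要！我们的方法更准确。", ["的"]))
# ['关键词提取很重要', '我们方法更准确']
```

Removing preset words as substrings before segmentation can also cut characters out of longer words that contain them, which is why stop words are often removed after segmentation instead.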
Optionally, the corpus to be processed is segmented using one or more of the following: dictionary-based bidirectional maximum matching, the Viterbi algorithm, a hidden Markov model (HMM) algorithm, and a conditional random field (CRF) algorithm.
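Dictionary-based bidirectional maximum matching runs forward and backward maximum matching and keeps the better of the two results. The tie-breaking rules below (fewer words, then fewer single-character words, then the backward result) are a widely used heuristic assumed here, not something the text specifies; the toy dictionary is made up and the function names are illustrative.

```python
from typing import List, Set


def forward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_mm(text: str, vocab: Set[str]) -> List[str]:
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_mm(text, vocab, max_len)
    bwd = backward_mm(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd  # fewer words wins

    def singles(ws: List[str]) -> int:
        return sum(len(w) == 1 for w in ws)

    return fwd if singles(fwd) < singles(bwd) else bwd  # then fewer single characters


vocab = {"研究", "研究生", "生命", "起源"}
print(forward_mm("研究生命起源", vocab, 3))   # ['研究生', '命', '起源']
print(bidirectional_mm("研究生命起源", vocab))  # ['研究', '生命', '起源']
```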
To solve the above technical problem, an embodiment of the present invention further discloses an information processing device, comprising:
a word segmentation unit, adapted to perform word segmentation on a corpus to be processed to obtain a plurality of words; a synonym replacement unit, adapted to perform synonym replacement on at least some of the plurality of words to obtain a new corpus; and a keyword extraction unit, adapted to perform keyword extraction on the new corpus to obtain one or more keywords.
Optionally, the synonym replacement unit comprises: a first synonym group determination subunit, adapted to determine, according to a preset thesaurus, at least one synonym group among at least some of the plurality of words, where words that are synonyms of each other are placed in the same synonym group; and a first replacement subunit, adapted to select, for each synonym group, the word with the highest word frequency in the corpus to be processed and to replace the other words with that word, to obtain the new corpus.
Optionally, the information processing device further comprises: a first filtering unit, adapted to filter the new corpus for use in keyword extraction.
Optionally, the information processing device further comprises: a second filtering unit, adapted to filter the plurality of words to obtain a plurality of candidate words.
Optionally, the synonym replacement unit comprises: a second synonym group determination subunit, adapted to determine, according to a preset thesaurus, at least one synonym group among the plurality of candidate words, where candidate words that are synonyms of each other are placed in the same synonym group; and a second replacement subunit, adapted to select, for each synonym group, the candidate word with the highest word frequency in the corpus to be processed and to replace the other candidate words with that candidate word, to obtain the new corpus.
Optionally, the filtering is performed in one or more of the following ways: filtering by part of speech, keeping nouns, adjectives and verbs; and filtering by occurrence count, keeping words whose count exceeds a count threshold.
Optionally, the keyword extraction unit comprises: a statistics subunit, adapted to compute statistics on the new corpus to obtain the word frequency and position information of its words in the corpus to be processed; and an extraction subunit, adapted to input the new corpus, together with its word frequency and position information, into a TextRank algorithm to extract keywords from the corpus to be processed.
Optionally, the information processing device further comprises: a verification unit, adapted to verify the accuracy of the one or more extracted keywords to obtain a verification result; an adjustment unit, adapted to adjust the parameters of the TextRank algorithm according to the verification result; and an extraction unit, adapted to extract the keywords again using the TextRank algorithm with adjusted parameters, until the verification result of the keywords meets a preset requirement.
Optionally, the information processing device further comprises: a preprocessing unit, adapted to preprocess the corpus to be processed to obtain the corpus in a uniform format.
Optionally, the preprocessing unit comprises: a format conversion subunit, adapted to convert the corpus to be processed into text format to obtain text data; a filtering subunit, adapted to filter preset words out of the text data, the preset words being one or more of profanity, sensitive words and stop words; and a splitting subunit, adapted to split the filtered text data by punctuation.
Optionally, the word segmentation unit segments the corpus to be processed using one or more of the following: dictionary-based bidirectional maximum matching, the Viterbi algorithm, a hidden Markov model (HMM) algorithm, and a conditional random field (CRF) algorithm.
To solve the above technical problem, an embodiment of the present invention further discloses a terminal, the terminal comprising the information processing device.
Compared with the prior art, the technical solution of the embodiments of the present invention has the following beneficial effects:
The technical solution of the present invention performs word segmentation on a corpus to be processed to obtain a plurality of words; performs synonym replacement on at least some of the plurality of words to obtain a new corpus; and performs keyword extraction on the new corpus to obtain one or more keywords. Before keyword extraction, the corpus is processed in advance: synonym replacement is applied to at least some of the segmented words, so that keyword extraction can take the semantic features of synonymous vocabulary into account. The contributions of synonyms are thus counted when keywords are determined, which avoids ignoring words that occur infrequently but are critical to the document and improves the accuracy of keyword extraction.
Further, statistics are computed on the new corpus to obtain the word frequency and position information of its words in the corpus to be processed, and the new corpus, together with this word frequency and position information, is input into the TextRank algorithm to extract keywords from the corpus to be processed. Because keyword extraction is based on TextRank, words whose position in a document is unimportant but which are critical to the document are not ignored. At the same time, because the semantic features of synonymous vocabulary are included, words on the same topic that are not in the same window can be associated. Keywords of the corpus to be processed are thus extracted automatically and with high accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of an information processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another information processing method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an information processing device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another information processing device according to an embodiment of the present invention.
Detailed description of the embodiments
As described in the Background, prior-art algorithms based on statistical features ignore words that occur infrequently, or sit in unimportant positions in a document, but are critical to the document. Methods based on text ranking lack semantic understanding, so words on the same topic that are not in the same window cannot be associated.
To make the above objects, features and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an information processing method according to an embodiment of the present invention.
The information processing method shown in Fig. 1 may include the following steps:
Step S101: performing word segmentation on a corpus to be processed to obtain a plurality of words;
Step S102: performing synonym replacement on at least some of the plurality of words to obtain a new corpus;
Step S103: performing keyword extraction on the new corpus to obtain one or more keywords.
In a specific implementation, the corpus to be processed may be one or more texts. First, in step S101, the corpus to be processed is segmented to obtain a plurality of words. Then, in step S102, synonym replacement is performed on at least some of the plurality of words to obtain a new corpus. Finally, in step S103, keyword extraction is performed on the new corpus to obtain one or more keywords.
In the embodiments of the present invention, synonym replacement is executed before keyword extraction. That is, before keyword extraction, the corpus is processed in advance by performing synonym replacement on at least some of the segmented words, so that keyword extraction can take the semantic features of synonymous vocabulary into account. The contributions of synonyms can therefore be counted when the keyword probabilities are determined, which avoids ignoring words that occur infrequently but are critical to the document and thus improves the accuracy of keyword extraction.
In a specific implementation, in step S101 the corpus to be processed may be segmented using one or more of the following: dictionary-based bidirectional maximum matching, the Viterbi algorithm, a hidden Markov model (HMM) algorithm, and a conditional random field (CRF) algorithm.
Those skilled in the art will understand that word segmentation may use any feasible algorithm; the embodiments of the present invention impose no limitation on this.
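In HMM-based segmentation, the Viterbi algorithm is the usual decoder: each character is tagged B, M, E or S (begin, middle or end of a multi-character word, or a single-character word) and the most probable tag sequence is converted back into words. A minimal sketch, assuming the BMES scheme and hand-set toy log-probabilities; a real segmenter would estimate start, transition and emission probabilities from an annotated corpus, and the floor value for unseen characters and the function names are assumptions.

```python
import math
from typing import Dict, List

STATES = "BMES"  # B/M/E = begin/middle/end of a multi-character word, S = single character
ALLOWED_PREV = {"B": "ES", "M": "BM", "E": "BM", "S": "ES"}  # legal predecessor tags
FLOOR = -20.0  # log-probability for a character unseen in a state (assumption)


def viterbi(text: str, start_p: Dict[str, float], trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Most probable BMES tag sequence under an HMM given in log-probabilities."""
    if not text:
        return []
    scores = {s: start_p.get(s, -math.inf) + emit_p[s].get(text[0], FLOOR) for s in STATES}
    paths = {s: [s] for s in STATES}
    for ch in text[1:]:
        new_scores, new_paths = {}, {}
        for s in STATES:
            best, prev = max((scores[p] + trans_p[p].get(s, -math.inf), p)
                             for p in ALLOWED_PREV[s])
            new_scores[s] = best + emit_p[s].get(ch, FLOOR)
            new_paths[s] = paths[prev] + [s]
        scores, paths = new_scores, new_paths
    _, last = max((scores[s], s) for s in "ES")  # a word cannot end on B or M
    return paths[last]


def tags_to_words(text: str, tags: List[str]) -> List[str]:
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in "ES":  # E or S closes a word
            words.append(text[start:i + 1])
            start = i + 1
    return words


log = math.log
start_p = {"B": log(0.5), "S": log(0.5)}
trans_p = {"B": {"M": log(0.3), "E": log(0.7)}, "M": {"M": log(0.3), "E": log(0.7)},
           "E": {"B": log(0.5), "S": log(0.5)}, "S": {"B": log(0.5), "S": log(0.5)}}
emit_p = {"B": {"北": log(0.9)}, "M": {}, "E": {"京": log(0.9)},
          "S": {"我": log(0.9), "爱": log(0.9)}}
text = "我爱北京"
print(tags_to_words(text, viterbi(text, start_p, trans_p, emit_p)))  # ['我', '爱', '北京']
```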
In a specific implementation, step S102 may include the following steps: determining, according to a preset thesaurus, at least one synonym group among at least some of the plurality of words, where words that are synonyms of each other are placed in the same synonym group; and, for each synonym group, selecting the word with the highest word frequency in the corpus to be processed and replacing the other words with that word, to obtain the new corpus. Specifically, the preset thesaurus may be built from the extended edition of the Harbin Institute of Technology Tongyici Cilin (a Chinese thesaurus) together with domain-specific thesauri, or it may be generated from other published synonym dictionaries or from a user-defined synonym dictionary. At least some of the words are traversed against the preset thesaurus to determine the synonym groups, and synonyms are replaced within each group, so that all words in a group are unified into the word with the highest word frequency. For example, if two words that both mean "tomato" belong to the same synonym group and one of them has the higher word frequency in the corpus to be processed, every occurrence of the other word in that group is replaced with it.
Those skilled in the art will understand that the embodiments of the present invention replace synonym groups by "replacing the other words with the word with the highest word frequency"; because the most frequent word itself needs no replacement, this reduces the workload of the replacement operation. In a specific implementation, the words in a synonym group may instead all be replaced with any one synonym in that group, as long as all words in the group end up identical; the embodiments of the present invention impose no limitation on this.
It should be noted that the preset thesaurus may also be any other feasible thesaurus; the embodiments of the present invention impose no limitation on this.
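The extended Tongyici Cilin is commonly distributed as plain-text lines: an eight-character category code whose last character is "=" (synonyms), "#" (related but not equivalent) or "@" (a single, self-contained word), followed by the words of the group. The parser below assumes that layout, keeps only "=" groups, and appends domain-specific groups; the sample lines use made-up codes, and the function names are illustrative.

```python
from typing import Iterable, List, Set


def load_cilin_groups(lines: Iterable[str]) -> List[Set[str]]:
    """Parse '<code> word word ...' lines; keep groups whose 8-character code ends
    in '=' (synonyms) and skip '#' (related) and '@' (single-word) entries."""
    groups = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and len(parts[0]) == 8 and parts[0].endswith("="):
            groups.append(set(parts[1:]))
    return groups


def build_thesaurus(base_lines: Iterable[str],
                    domain_groups: Iterable[Iterable[str]]) -> List[Set[str]]:
    """General thesaurus plus domain-specific synonym groups."""
    return load_cilin_groups(base_lines) + [set(g) for g in domain_groups]


# Made-up sample lines (the codes are not real Cilin entries).
sample_lines = ["Zz01A01= 番茄 西红柿", "Zz01A02# 水果 果品", "Zz01A03@ 蔬菜"]
thesaurus = build_thesaurus(sample_lines, [["手机", "移动电话"]])
print(len(thesaurus))  # 2
```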
In a specific implementation, step S102 may be followed by: filtering the new corpus for use in keyword extraction. Specifically, the filtering may be performed in one or more of the following ways: filtering by part of speech, keeping nouns, adjectives and verbs; and filtering by occurrence count, keeping words whose count exceeds a count threshold. That is, the retained nouns, adjectives, verbs and words whose count exceeds the threshold are relatively likely to be keywords, while words of other parts of speech are very unlikely to be. Keyword extraction therefore only needs to consider the retained words, and the other words can be filtered out during processing to improve execution efficiency. Filtering reduces the computation of the keyword extraction step and thus speeds up keyword extraction.
In another specific embodiment of the present invention, step S102 may be preceded by: filtering the plurality of words to obtain a plurality of candidate words. Specifically, the filtering may be performed in one or more of the following ways: filtering by part of speech, keeping nouns, adjectives and verbs; and filtering by occurrence count, keeping words whose count exceeds a count threshold. As above, the retained words are relatively likely to be keywords while words of other parts of speech are very unlikely to be, so only the retained words need to be considered and the others can be filtered out to improve execution efficiency. By filtering before step S102, this embodiment reduces the computation of both step S102 and step S103 and further speeds up keyword extraction.
In a specific implementation, after filtering, step S102 may include the following steps: determining, according to a preset thesaurus, at least one synonym group among the plurality of candidate words, where candidate words that are synonyms of each other are placed in the same synonym group; and, for each synonym group, selecting the candidate word with the highest word frequency in the corpus to be processed and replacing the other candidate words with it, to obtain the new corpus. Because filtering has already been performed, the amount of data to process during synonym replacement in step S102 is greatly reduced, which improves execution efficiency.
In a specific implementation, step S103 may include the following steps: computing statistics on the new corpus to obtain the word frequency and position information of its words in the corpus to be processed; and inputting the new corpus, together with the word frequency and position information, into the TextRank algorithm to extract keywords from the corpus to be processed. Specifically, the TextRank algorithm may be a position-weighted TextRank algorithm. The classical TextRank algorithm extracts keywords automatically and directly from the text itself by means of the relations between adjacent words; because it needs no training process, it is convenient to apply. Position-weighted TextRank builds on classical TextRank by introducing a weighted combination of a coverage influence, a position influence and a frequency influence to compute the relations between adjacent words. Through the synonym replacement of step S102, the embodiment of the present invention adds semantic features on top of the TextRank algorithm when extracting keywords. This avoids ignoring words whose position in a document is unimportant but which are critical to the document; at the same time, because the semantic features of synonymous vocabulary are included, words on the same topic that are not in the same window can be associated. Keywords of the corpus to be processed are thus extracted automatically and with high accuracy.
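The text names the three influences but not how they are combined. The sketch below assumes one plausible formulation: the transition weight from a word u to a neighbouring word v is a convex combination (α + β + γ = 1) of a coverage term shared equally among u's neighbours, a position term proportional to a per-word position weight P(v), and a frequency term proportional to v's frequency. The default coefficients, the position weights (for example, larger for words that appear in a title) and the function name are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def position_weighted_textrank(words: List[str], position_weight: Dict[str, float],
                               window: int = 2, alpha: float = 0.33, beta: float = 0.34,
                               gamma: float = 0.33, d: float = 0.85,
                               iterations: int = 100) -> Dict[str, float]:
    """TextRank whose transition weight from u to a neighbour v mixes
       coverage  alpha / |N(u)|,
       position  beta  * P(v) / sum of P over N(u),
       frequency gamma * f(v) / sum of f over N(u),
    with alpha + beta + gamma = 1 so that each node's outgoing weights sum to 1.
    Position weights must be positive; unlisted words default to 1.0."""
    freq = Counter(words)
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)

    def pos(v: str) -> float:
        return position_weight.get(v, 1.0)

    weight = {}
    for u, nbrs in graph.items():
        pos_total = sum(pos(k) for k in nbrs)
        freq_total = sum(freq[k] for k in nbrs)
        for v in nbrs:
            weight[(u, v)] = (alpha / len(nbrs) + beta * pos(v) / pos_total
                              + gamma * freq[v] / freq_total)

    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        scores = {v: (1 - d) + d * sum(weight[(u, v)] * scores[u] for u in graph[v])
                  for v in graph}
    return scores


words = ["model", "keyword", "model", "corpus"]
scores = position_weighted_textrank(words, {"keyword": 3.0})  # e.g. "keyword" is in the title
print(scores["keyword"] > scores["corpus"])  # True
```

Because each word's outgoing weights sum to 1, the scores keep summing to the number of words in the graph, as in classical TextRank; raising P("keyword") shifts score from "corpus" to "keyword".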
In a specific implementation, step S101 may be preceded by: preprocessing the corpus to be processed to obtain the corpus in a uniform format. Specifically, the corpus to be processed is converted into text format to obtain text data; preset words, being one or more of profanity, sensitive words and stop words, are filtered out of the text data; and the filtered text data is split by punctuation. More specifically, the filtered text data may be split into lines at sentence-ending punctuation marks such as "？", "！" and "。", and the lines are saved. The preprocessing of this embodiment facilitates the operations of the subsequent steps.
In a specific implementation, the information processing method shown in Fig. 1 may further include the following steps: verifying the accuracy of the one or more extracted keywords to obtain a verification result; adjusting the parameters of the TextRank algorithm according to the verification result; and extracting the keywords again using the TextRank algorithm with adjusted parameters, until the verification result of the keywords meets a preset requirement. By verifying the keyword extraction results and then adjusting the parameters of the TextRank algorithm, the embodiment of the present invention further improves the accuracy of keyword extraction with the adjusted algorithm, so that the adjusted TextRank algorithm can be applied in real application scenarios.
It should be noted that the preset requirement may be an accuracy rate; its specific value may be configured and adapted to the actual application environment, and the embodiments of the present invention impose no limitation on this.
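The text leaves both the accuracy measure and the adjustment strategy open. A minimal sketch, assuming precision against a hand-labelled reference set as the verification result and a grid search over extractor parameters; `extract` is a hypothetical callable that accepts the parameters as keyword arguments, and the other names are illustrative.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


def precision(extracted: List[str], reference: Set[str]) -> float:
    """Verification result: share of extracted keywords found in the reference set."""
    return sum(w in reference for w in extracted) / len(extracted) if extracted else 0.0


def tune(extract: Callable[..., List[str]], reference: Set[str],
         grid: Dict[str, Iterable], target: float) -> Tuple[Optional[Dict], float]:
    """Re-run extraction with adjusted parameters until precision reaches `target`;
    return the first combination that does, otherwise the best one seen."""
    names = list(grid)
    best_params: Optional[Dict] = None
    best_score = -1.0
    for values in product(*(list(grid[n]) for n in names)):
        params = dict(zip(names, values))
        score = precision(extract(**params), reference)
        if score > best_score:
            best_params, best_score = params, score
        if score >= target:
            break  # verification result meets the preset requirement
    return best_params, best_score


# Usage with a stand-in extractor whose only parameter is the number of keywords kept.
def fake_extract(top_k: int) -> List[str]:
    return ["a", "b", "c", "d"][:top_k]


print(tune(fake_extract, {"a", "b"}, {"top_k": [4, 3, 2]}, target=1.0))  # ({'top_k': 2}, 1.0)
```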
In a preferred embodiment, the information processing method may be as shown in Fig. 2, which is a flowchart of another information processing method according to an embodiment of the present invention.
The information processing method shown in Fig. 2 may include the following steps:
Step S201: preprocessing the corpus to be processed;
Step S202: performing word segmentation on the corpus to be processed to obtain a plurality of words;
Step S203: determining whether the part of speech of a word is noun, verb or adjective; if so, proceeding to step S204, otherwise performing no operation;
Step S204: adding the word to the candidate dictionary;
Step S205: building the preset thesaurus;
Step S206: determining whether word wi and word wj are synonyms; if so, proceeding to step S207, otherwise performing no operation;
Step S207: determining whether the word frequency of word wi is greater than that of word wj; if so, proceeding to step S208, otherwise proceeding to step S209;
Step S208: replacing word wj in the corpus to be processed with word wi;
Step S209: replacing word wi in the corpus to be processed with word wj;
Step S210: performing keyword extraction on the replaced corpus using the position-weighted TextRank algorithm.
In a specific implementation, in step S201 the corpus to be processed is unified into text format, invalid formats are filtered out, and unwanted words such as profanity, sensitive words and stop words are removed; the processed corpus is then split into lines at major punctuation marks such as "？", "！" and "。", and the lines are saved.
In a specific implementation, in step S202 a word segmentation engine may be used to segment the text-format corpus. Specifically, the engine may use dictionary-based bidirectional maximum matching, the Viterbi algorithm, a hidden Markov model (HMM) algorithm, or a conditional random field (CRF) algorithm.
Those skilled in the art will understand that the word segmentation engine may use any feasible algorithm; the embodiments of the present invention impose no limitation on this.
In a specific implementation, because nouns, adjectives and verbs are relatively likely to be keywords while words of other parts of speech are very unlikely to be, keyword extraction only needs to consider words of these parts of speech. Therefore, in step S203 the segmented words may be filtered by part of speech, and in step S204 words that are nouns, adjectives or verbs are added to the candidate dictionary. Words that are not nouns, adjectives or verbs are not considered, i.e. no operation is performed on them.
It can be understood that this embodiment of the present invention filters by part of speech. In practice, step S203 may instead filter by occurrence count, and in step S204 words whose count exceeds the count threshold are added to the candidate dictionary.
In a specific implementation, in step S205 a preset thesaurus is built for determining which words in the candidate dictionary are synonyms. Specifically, the preset thesaurus may be built from the Tongyici Cilin thesaurus and domain-specific thesauri.
In another specific embodiment of the present invention, step S205 may instead be executed before step S201. That is, the preset thesaurus is built in advance, before the information processing method is run, to reduce the workload of the subsequent steps.
In a specific implementation, step S206 determines whether word wi and word wj are synonyms. If they are, step S207 determines whether the word frequency of wi is greater than that of wj. If it is, step S208 replaces wj in the corpus to be processed with wi; if the word frequency of wi is less than that of wj, step S209 replaces wi in the corpus to be processed with wj. That is, words wi and wj that are synonyms of each other are unified into the one with the higher word frequency. If wi and wj are not synonyms, they are not considered, i.e. no operation is performed.
Furthermore, steps S206 to S209 form a loop whose object is the candidate dictionary. The loop (steps S206 to S209) is repeated over the candidate dictionary until all of its words have been traversed. At that point, all words in the candidate dictionary that are synonyms of each other have been replaced, and the method can proceed to the next step (step S210).
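Steps S206 to S209 can be sketched as a loop over every pair of candidate words. `is_synonym` is a hypothetical predicate standing in for the preset-thesaurus lookup, and word frequencies are recounted on the current corpus before each comparison so that earlier replacements are taken into account (an assumption; the text does not say when frequencies are computed). As in step S207, a tie falls through to step S209. The function name is illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List


def unify_candidates(corpus: List[str], candidates: List[str],
                     is_synonym: Callable[[str, str], bool]) -> List[str]:
    """For every pair (wi, wj) of candidate words that are synonyms, replace the
    lower-frequency word in the corpus with the higher-frequency one."""
    corpus = list(corpus)
    for wi, wj in combinations(candidates, 2):
        if not is_synonym(wi, wj):
            continue                                            # S206: no operation
        freq = Counter(corpus)                                  # counts after earlier replacements
        if freq[wi] > freq[wj]:                                 # S207
            corpus = [wi if w == wj else w for w in corpus]     # S208: wj -> wi
        else:
            corpus = [wj if w == wi else w for w in corpus]     # S209: wi -> wj
    return corpus


group = {"car", "automobile", "auto"}
same_group = lambda x, y: x in group and y in group  # stand-in thesaurus lookup
corpus = ["car", "automobile", "car", "engine", "auto"]
print(unify_candidates(corpus, ["car", "automobile", "auto", "engine"], same_group))
# ['car', 'car', 'car', 'engine', 'car']
```

Counting pairs makes the loop quadratic in the size of the candidate dictionary; grouping synonyms first, as in the group-based variant of step S102, avoids that.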
Words of these parts of speech (nouns, verbs and adjectives) are relatively likely to be keywords of a text, while words of other parts of speech are very unlikely to be, so considering only words of these parts of speech directly improves the execution efficiency of the program.
In a specific implementation, in step S210 keyword extraction is performed on the replaced corpus using the position-weighted TextRank algorithm to obtain one or more keywords.
Before keyword extraction, the embodiment of the present invention processes the corpus in advance by performing synonym replacement on at least some of the segmented words, so that keyword extraction can take the semantic features of synonymous vocabulary into account. The contributions of synonyms can therefore be counted when the keyword probabilities are determined, which avoids ignoring words that occur infrequently but are critical to the document and thus improves the accuracy of keyword extraction. In addition, because keyword extraction is based on the TextRank algorithm, words whose position in a document is unimportant but which are critical to the document are not ignored; and because the semantic features of synonymous vocabulary are included, words on the same topic that are not in the same window can be associated. Keywords of the corpus to be processed are thus extracted automatically and with high accuracy.
In a specific implementation, steps s201 to s210 shown in Fig. 2 can serve as the process of building a keyword extraction model. That is, executing steps s201 to s210 establishes a keyword extraction model that can perform keyword extraction on a corpus. To further improve extraction accuracy, the model can be optimized further after step s210. Specifically, accuracy verification is performed on the keyword extraction result to obtain a verification result. The parameters of the TextRank algorithm are adjusted according to the verification result. The keywords are then extracted again using the TextRank algorithm with the adjusted parameters, until the verification result of the keywords meets a preset requirement. At that point the keyword extraction model is stable and can be used directly to extract document keywords in practical application scenarios.
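The verify, adjust and re-extract loop can be sketched as a simple search over TextRank parameters. Everything here is an assumption for illustration. Accuracy is measured as the share of extracted keywords found in a manually labelled gold set. The parameters tried are the co-occurrence window and the damping factor. `extract` is any keyword extractor that accepts those two keyword arguments. The names `keyword_precision` and `tune_parameters` are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple


def keyword_precision(predicted: List[str], gold: Set[str]) -> float:
    """Assumed accuracy measure: fraction of extracted keywords in the gold set."""
    if not predicted:
        return 0.0
    return sum(1 for k in predicted if k in gold) / len(predicted)


def tune_parameters(extract: Callable[..., List[str]],
                    labelled_docs: Sequence[Tuple[List[str], Set[str]]],
                    windows: Iterable[int] = (2, 3, 5),
                    dampings: Iterable[float] = (0.85, 0.7),
                    required_accuracy: float = 0.6
                    ) -> Tuple[Optional[dict], float]:
    """Verify -> adjust parameters -> re-extract, stopping once the preset
    requirement (mean precision >= required_accuracy) is met."""
    if not labelled_docs:
        raise ValueError("need at least one labelled document to verify against")
    best_params, best_accuracy = None, -1.0
    for window, damping in product(windows, dampings):
        accuracy = sum(
            keyword_precision(extract(tokens, window=window, damping=damping), gold)
            for tokens, gold in labelled_docs
        ) / len(labelled_docs)
        if accuracy > best_accuracy:
            best_params, best_accuracy = {"window": window, "damping": damping}, accuracy
        if accuracy >= required_accuracy:
            break                              # preset requirement met
    return best_params, best_accuracy
```

If no setting meets the requirement, the best setting found is returned, so the caller can decide whether to widen the search.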
For specific implementations of this embodiment, refer to the information processing method shown in Fig. 1; details are not repeated here.
Fig. 3 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present invention.
The information processing apparatus 30 shown in Fig. 3 may include a word segmentation unit 301, a synonym replacement unit 302 and a keyword extraction unit 303.
The word segmentation unit 301 is adapted to perform word segmentation on the corpus to be processed to obtain a plurality of words;
the synonym replacement unit 302 is adapted to perform synonym replacement on at least a portion of the plurality of words to obtain a new corpus;
the keyword extraction unit 303 is adapted to perform keyword extraction on the new corpus to obtain one or more keywords.
In a specific implementation, the word segmentation unit 301 may segment the corpus to be processed using one or more of the following: dictionary-based bidirectional matching, the Viterbi algorithm, a hidden Markov model (HMM) algorithm and a conditional random field (CRF) algorithm.
Those skilled in the art will understand that word segmentation may use any implementable algorithm; the embodiment of the present invention does not limit this.
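The first segmentation option listed above (dictionary-based bidirectional matching) is assumed here to mean bidirectional maximum matching, sketched below. Forward and backward maximum matching are both run, and the result with fewer words is kept. On a tie, the result with fewer single-character words wins, and backward matching is preferred otherwise. These tie-breaking rules are a common heuristic, not taken from the text, and the function names are hypothetical.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right matching: longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:   # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Greedy right-to-left matching: longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_max_match(text: str, dictionary: Set[str]) -> List[str]:
    max_len = max([1] + [len(w) for w in dictionary])
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if len(fwd) != len(bwd):                     # fewer words preferred
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(ws: List[str]) -> int:
        return sum(1 for w in ws if len(w) == 1)

    return fwd if singles(fwd) < singles(bwd) else bwd   # tie: prefer backward


# "研究生命起源" ("study the origin of life") with a toy dictionary:
# forward gives 研究生/命/起源, backward gives 研究/生命/起源; backward wins.
print(bidirectional_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"}))
```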
In a specific implementation, the synonym replacement unit 302 may include a first synonym group determination subunit (not shown) and a first replacement subunit (not shown).
The first synonym group determination subunit is adapted to determine, according to a preset thesaurus, at least one synonym group among at least a portion of the plurality of words, wherein mutually synonymous words are placed in the same synonym group. The first replacement subunit is adapted to select, for each synonym group, the word with the highest word frequency in the corpus to be processed and to replace the other words with that word, to obtain the new corpus. Specifically, the preset thesaurus can be built from the extended edition of the Harbin Institute of Technology "Tongyici Cilin" synonym thesaurus together with domain-specific thesauri. At least a portion of the plurality of words is traversed according to the preset thesaurus to determine the synonym groups. Synonym replacement is then performed within each group, so that all words in a group are unified to the one with the highest word frequency. For example, two different words for "tomato" may belong to the same synonym group; if the first has the higher word frequency in the corpus to be processed, every occurrence of the second is replaced with the first.
Those skilled in the art will understand that, when replacing within a synonym group, the embodiment of the present invention replaces the other words with the word with the highest word frequency. This reduces the workload of the replacement operation, because the fewest occurrences need to be rewritten. In a specific implementation, the words in a synonym group may instead be replaced with any synonym in that group, as long as all words in the group end up identical; the embodiment of the present invention does not limit this.
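Group-wise replacement as performed by the first replacement subunit might be written as below. The thesaurus is passed in as a list of word sets, a hypothetical stand-in for Tongyici Cilin plus domain thesauri. Frequency ties are broken by first appearance in the corpus, which is an assumption the text does not make. `replace_by_synonym_groups` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def replace_by_synonym_groups(tokens: List[str],
                              synonym_groups: Iterable[Set[str]]) -> List[str]:
    """Unify each synonym group to its most frequent member in the corpus."""
    freq = Counter(tokens)
    mapping: Dict[str, str] = {}
    for group in synonym_groups:
        present = [w for w in group if freq[w] > 0]   # only members that occur
        if len(present) < 2:
            continue                                   # nothing to unify
        first_pos = {w: tokens.index(w) for w in present}
        # Highest frequency wins; ties go to the word seen first (assumption).
        winner = max(present, key=lambda w: (freq[w], -first_pos[w]))
        for w in present:
            if w != winner:
                mapping[w] = winner
    return [mapping.get(t, t) for t in tokens]
```

A single pass builds the replacement mapping and a second pass rewrites the corpus, so each token is rewritten at most once.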
It should be noted that the preset thesaurus may also be any other implementable thesaurus; the embodiment of the present invention does not limit this.
In a specific implementation, the information processing apparatus 30 may further include a first screening unit (not shown) adapted to screen the new corpus for keyword extraction. Specifically, the first screening unit may screen in one or more of the following ways: by part of speech, retaining nouns, adjectives and verbs; and by frequency, retaining words whose frequency is greater than a frequency threshold. Nouns, adjectives and verbs, and words whose frequency exceeds the threshold, have a relatively high probability of being keywords, whereas other words have very little chance of being keywords. Keyword extraction therefore only needs to consider the retained words, and the others can be filtered out during information processing, which improves execution efficiency. Screening reduces the computation required by the keyword extraction step and thereby speeds up keyword extraction.
Specifically, the first screening unit may output the screened new corpus to the keyword extraction unit 303. Because screening is performed in advance, the amount of data the keyword extraction unit 303 must process during keyword extraction is greatly reduced, improving execution efficiency.
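Screening by part of speech and by frequency can be sketched as follows. Three things here are assumptions: the tagset (noun tags start with `n`, adjective tags with `a`, verb tags with `v`, as in many Chinese POS taggers), the choice to apply the enabled filters one after another, and the hypothetical name `screen_words`.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Assumed tagset: nouns n*, adjectives a*, verbs v*.
KEEP_POS_PREFIXES = ("n", "a", "v")


def screen_words(tagged: List[Tuple[str, str]], by_pos: bool = True,
                 min_freq: Optional[int] = None) -> List[str]:
    """tagged: (word, pos_tag) pairs. Returns the words that survive screening."""
    kept = list(tagged)
    if by_pos:                                   # part-of-speech screening
        kept = [(w, p) for w, p in kept if p.startswith(KEEP_POS_PREFIXES)]
    if min_freq is not None:                     # frequency screening
        freq = Counter(w for w, _ in tagged)     # counted over the full input
        kept = [(w, p) for w, p in kept if freq[w] > min_freq]
    return [w for w, _ in kept]
```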
In a specific implementation, the keyword extraction unit 303 may include a statistics subunit (not shown) and an extraction subunit (not shown). The statistics subunit is adapted to compute statistics on the new corpus to obtain the word frequency and position information, within the corpus to be processed, of the words in the new corpus. The extraction subunit is adapted to input the new corpus together with its word frequency and position information into the TextRank algorithm to perform keyword extraction on the corpus to be processed.
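The statistics subunit's output can be illustrated with a short helper that records each word's frequency and every position where it occurs. Representing a position as a `(segment_index, token_index)` pair is an assumption; the text only says that position information is collected and passed to TextRank. `word_stats` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_stats(segments: List[List[str]]
               ) -> Tuple[Counter, Dict[str, List[Tuple[int, int]]]]:
    """segments: token lists (e.g. punctuation-delimited pieces of the corpus).
    Returns (word frequency, positions of each word as (segment, token) pairs)."""
    freq: Counter = Counter()
    positions: Dict[str, List[Tuple[int, int]]] = {}
    for s, segment in enumerate(segments):
        for t, word in enumerate(segment):
            freq[word] += 1
            positions.setdefault(word, []).append((s, t))
    return freq, positions
```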
The information processing apparatus 30 of the embodiment of the present invention performs information processing before keyword extraction: synonym replacement is applied to at least a portion of the plurality of words obtained by word segmentation. Keyword extraction can then incorporate the semantic features of synonymous vocabulary, so the contributions of synonyms are counted when determining the probability that a word is a keyword. This avoids overlooking words that occur infrequently but are of key significance to the document, and thereby improves the accuracy of keyword extraction.
In a specific implementation, the information processing apparatus 30 may further include a preprocessing unit (not shown) adapted to preprocess the corpus to be processed, to obtain the corpus to be processed in a uniform format. Specifically, the preprocessing unit may output the preprocessed corpus to the word segmentation unit 301. More specifically, the preprocessing unit may include a format conversion subunit (not shown), a filtering subunit (not shown) and a division subunit (not shown).
The format conversion subunit is adapted to convert the corpus to be processed into text format to obtain text data. The filtering subunit is adapted to filter preset words out of the text data, wherein the preset words are one or more of: profane words, sensitive words and stop words. The division subunit is adapted to divide the filtered text data according to punctuation.
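A minimal preprocessing sketch is shown below. It rests on several assumptions. The input is taken to be either text or bytes in a known encoding; real format conversion from, say, PDF or HTML would need a dedicated parser. Preset words are removed by plain substring deletion. Splitting uses an assumed set of Chinese and ASCII punctuation marks. The name `preprocess` and the punctuation set are hypothetical.

```python
import re
from typing import Iterable, List, Union

# Assumed punctuation set: full-width Chinese and ASCII sentence/clause marks.
PUNCTUATION = r"[，。！？；：、,.!?;:\n]+"


def preprocess(raw: Union[str, bytes], preset_words: Iterable[str],
               encoding: str = "utf-8") -> List[str]:
    # Format conversion: decode bytes into text (plain encoded text assumed).
    text = raw.decode(encoding) if isinstance(raw, bytes) else str(raw)
    # Filter preset words (profane, sensitive, stop words) by substring removal;
    # longer words first so they are not broken up by shorter ones.
    for word in sorted(preset_words, key=len, reverse=True):
        text = text.replace(word, "")
    # Divide the filtered text at punctuation, dropping empty pieces.
    return [piece.strip() for piece in re.split(PUNCTUATION, text) if piece.strip()]
```

Removing stop words from raw text before segmentation can cut through longer words, so filtering stop words after segmentation is a reasonable alternative.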
For specific implementations of this embodiment, refer to the information processing methods shown in Fig. 1 or Fig. 2; details are not repeated here.
Fig. 4 is a schematic structural diagram of another information processing apparatus according to an embodiment of the present invention.
The information processing apparatus shown in Fig. 4 may include a preprocessing unit 401, a word segmentation unit 402, a second screening unit 403, a synonym replacement unit 404, a keyword extraction unit 405, a verification unit 406, an adjustment unit 407 and an extraction unit 408.
The preprocessing unit 401 may include a format conversion subunit 4011, a filtering subunit 4012 and a division subunit 4013; the synonym replacement unit 404 may include a second synonym group determination subunit 4041 and a second replacement subunit 4042; the keyword extraction unit 405 may include a statistics subunit 4051 and an extraction subunit 4052.
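How the Fig. 4 units feed into each other can be sketched by wiring them as plain callables. The class and field names are hypothetical, the stand-in stages in the demo are deliberately trivial, and the verification, adjustment and re-extraction units (406 to 408) are left out. The sketch shows only the order of the stages, with screening (403) running before synonym replacement (404).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeywordPipeline:
    """Hypothetical wiring of units 401-405 as plain callables."""
    preprocess: Callable[[str], List[str]]               # 401: corpus -> text segments
    segment: Callable[[str], List[str]]                  # 402: segment -> words
    screen: Callable[[List[str]], List[str]]             # 403: words -> candidate words
    replace_synonyms: Callable[[List[str]], List[str]]   # 404: candidates -> new corpus
    extract: Callable[[List[str]], List[str]]            # 405: new corpus -> keywords

    def run(self, corpus: str) -> List[str]:
        words: List[str] = []
        for piece in self.preprocess(corpus):
            words.extend(self.segment(piece))
        candidates = self.screen(words)              # screening precedes replacement
        new_corpus = self.replace_synonyms(candidates)
        return self.extract(new_corpus)


if __name__ == "__main__":
    pipeline = KeywordPipeline(
        preprocess=lambda text: [s for s in text.lower().split(".") if s.strip()],
        segment=str.split,
        screen=lambda ws: [w for w in ws if w not in {"the", "a", "is"}],
        replace_synonyms=lambda ws: ["car" if w == "automobile" else w for w in ws],
        extract=lambda ws: [w for w, _ in Counter(ws).most_common(1)],
    )
    print(pipeline.run("The car is red. An automobile is fast. The car wins."))
```

Because each unit is just a function, any of the earlier stages can be swapped for a real segmenter, thesaurus or TextRank implementation without changing `run`.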
In a specific implementation, the format conversion subunit 4011 is adapted to convert the corpus to be processed into text format to obtain text data. The filtering subunit 4012 is adapted to filter preset words out of the text data, wherein the preset words are one or more of: profane words, sensitive words and stop words. The division subunit 4013 is adapted to divide the filtered text data according to punctuation.
In a specific implementation, the word segmentation unit 402 performs word segmentation on the preprocessing result of the preprocessing unit 401 to obtain a plurality of words. Specifically, the word segmentation unit 402 may segment the corpus to be processed using one or more of the following: dictionary-based bidirectional matching, the Viterbi algorithm, an HMM algorithm and a CRF algorithm.
In a specific implementation, the second screening unit 403 screens the plurality of words in the segmentation result of the word segmentation unit 402 to obtain a plurality of candidate words. Nouns, adjectives and verbs, and words whose frequency exceeds the frequency threshold, have a relatively high probability of being keywords, while other words have very little chance of being keywords. Keyword extraction therefore only needs to consider the retained words, and the others can be filtered out during information processing, improving execution efficiency. Because screening is performed before the synonym replacement unit 404, the embodiment of the present invention reduces the computation of both the synonym replacement unit 404 and the keyword extraction unit 405, further improving the speed of keyword extraction.
In a specific implementation, the second synonym group determination subunit 4041 is adapted to determine, according to the preset thesaurus, at least one synonym group among the plurality of candidate words, wherein mutually synonymous candidate words are placed in the same synonym group. The second replacement subunit 4042 is adapted to select, for each synonym group, the candidate word with the highest word frequency in the corpus to be processed and to replace the other candidate words with it, to obtain the new corpus. Because screening has already been performed, the amount of data that the second synonym group determination subunit 4041 and the second replacement subunit 4042 must process during synonym replacement is greatly reduced, improving execution efficiency.
In a specific implementation, the statistics subunit 4051 is adapted to compute statistics on the new corpus to obtain the word frequency and position information, within the corpus to be processed, of the words in the new corpus. The extraction subunit 4052 is adapted to input the new corpus together with its word frequency and position information into the TextRank algorithm to perform keyword extraction on the corpus to be processed.
In this embodiment, the verification unit 406 is adapted to perform accuracy verification on the one or more extracted keywords to obtain a verification result. The adjustment unit 407 is adapted to adjust the parameters of the TextRank algorithm according to the verification result. The extraction unit 408 is adapted to extract the keywords again using the TextRank algorithm with the adjusted parameters, until the verification result of the keywords meets a preset requirement. By verifying the keyword extraction result and then adjusting the parameters of the TextRank algorithm, the embodiment of the present invention further improves the accuracy of keyword extraction, so that the adjusted TextRank algorithm can be applied in real application scenarios.
It should be noted that the preset requirement may be an accuracy rate. Its specific value can be configured and adapted to the actual application environment, and the embodiment of the present invention does not limit this.
For specific implementations of this embodiment, refer to the information processing methods shown in Fig. 1 or Fig. 2; details are not repeated here.
An embodiment of the present invention also discloses a terminal, which may include the information processing apparatus 30 shown in Fig. 3 or the information processing apparatus 40 shown in Fig. 4. The information processing apparatus 30 or 40 may be integrated inside the terminal or externally coupled to it. The terminal may be a robot, a smartphone, a tablet device, etc.
Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include ROM, RAM, a magnetic disk, an optical disc, etc.
Although the present disclosure is as described above, the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the claims.
Claims (23)
1. An information processing method, characterized by comprising:
performing word segmentation on a corpus to be processed to obtain a plurality of words;
performing synonym replacement on at least a portion of the plurality of words to obtain a new corpus;
performing keyword extraction on the new corpus to obtain one or more keywords.
2. The information processing method according to claim 1, characterized in that performing synonym replacement on at least a portion of the plurality of words comprises:
determining, according to a preset thesaurus, at least one synonym group among at least a portion of the plurality of words, wherein mutually synonymous words are placed in the same synonym group;
for each synonym group, selecting the word with the highest word frequency in the corpus to be processed, and replacing the other words with the word with the highest word frequency, to obtain the new corpus.
3. The information processing method according to claim 1, characterized by further comprising, after performing synonym replacement on at least a portion of the plurality of words:
screening the new corpus for the keyword extraction.
4. The information processing method according to claim 1, characterized by further comprising, before performing synonym replacement on at least a portion of the plurality of words:
screening the plurality of words to obtain a plurality of candidate words.
5. The information processing method according to claim 4, characterized in that performing synonym replacement on at least a portion of the plurality of words comprises:
determining, according to a preset thesaurus, at least one synonym group among the plurality of candidate words, wherein mutually synonymous candidate words are placed in the same synonym group;
for each synonym group, selecting the candidate word with the highest word frequency in the corpus to be processed, and replacing the other candidate words with the candidate word with the highest word frequency, to obtain the new corpus.
6. The information processing method according to claim 3 or 4, characterized in that the screening is performed in one or more of the following ways:
screening by part of speech, retaining nouns, adjectives and verbs;
screening by frequency, retaining words whose frequency is greater than a frequency threshold.
7. The information processing method according to claim 1, characterized in that performing keyword extraction on the new corpus comprises:
computing statistics on the new corpus to obtain the word frequency and position information of the new corpus in the corpus to be processed;
inputting the new corpus together with its word frequency and position information into a TextRank algorithm to perform keyword extraction on the corpus to be processed.
8. The information processing method according to claim 7, characterized by further comprising:
performing accuracy verification on the one or more extracted keywords to obtain a verification result;
adjusting the parameters of the TextRank algorithm according to the verification result;
extracting the keywords again using the TextRank algorithm with the adjusted parameters, until the verification result of the keywords meets a preset requirement.
9. The information processing method according to claim 1, characterized by further comprising, before performing word segmentation on the corpus to be processed:
preprocessing the corpus to be processed to obtain the corpus to be processed in a uniform format.
10. The information processing method according to claim 9, characterized in that preprocessing the corpus to be processed comprises:
converting the corpus to be processed into text format to obtain text data;
filtering preset words out of the text data, wherein the preset words are one or more of: profane words, sensitive words and stop words;
dividing the filtered text data according to punctuation.
11. The information processing method according to claim 1, characterized in that the corpus to be processed is segmented using one or more of the following:
dictionary-based bidirectional matching, the Viterbi algorithm, an HMM algorithm and a CRF algorithm.
12. An information processing apparatus, characterized by comprising:
a word segmentation unit, adapted to perform word segmentation on a corpus to be processed to obtain a plurality of words;
a synonym replacement unit, adapted to perform synonym replacement on at least a portion of the plurality of words to obtain a new corpus;
a keyword extraction unit, adapted to perform keyword extraction on the new corpus to obtain one or more keywords.
13. The information processing apparatus according to claim 12, characterized in that the synonym replacement unit comprises:
a first synonym group determination subunit, adapted to determine, according to a preset thesaurus, at least one synonym group among at least a portion of the plurality of words, wherein mutually synonymous words are placed in the same synonym group;
a first replacement subunit, adapted to select, for each synonym group, the word with the highest word frequency in the corpus to be processed, and to replace the other words with the word with the highest word frequency, to obtain the new corpus.
14. The information processing apparatus according to claim 12, characterized by further comprising:
a first screening unit, adapted to screen the new corpus for the keyword extraction.
15. The information processing apparatus according to claim 12, characterized by further comprising:
a second screening unit, adapted to screen the plurality of words to obtain a plurality of candidate words.
16. The information processing apparatus according to claim 15, characterized in that the synonym replacement unit comprises:
a second synonym group determination subunit, adapted to determine, according to a preset thesaurus, at least one synonym group among the plurality of candidate words, wherein mutually synonymous candidate words are placed in the same synonym group;
a second replacement subunit, adapted to select, for each synonym group, the candidate word with the highest word frequency in the corpus to be processed, and to replace the other candidate words with the candidate word with the highest word frequency, to obtain the new corpus.
17. The information processing apparatus according to claim 14 or 15, characterized in that the screening is performed in one or more of the following ways:
screening by part of speech, retaining nouns, adjectives and verbs;
screening by frequency, retaining words whose frequency is greater than a frequency threshold.
18. The information processing apparatus according to claim 12, characterized in that the keyword extraction unit comprises:
a statistics subunit, adapted to compute statistics on the new corpus to obtain the word frequency and position information of the new corpus in the corpus to be processed;
an extraction subunit, adapted to input the new corpus together with its word frequency and position information into a TextRank algorithm to perform keyword extraction on the corpus to be processed.
19. The information processing apparatus according to claim 18, characterized by further comprising:
a verification unit, adapted to perform accuracy verification on the one or more extracted keywords to obtain a verification result;
an adjustment unit, adapted to adjust the parameters of the TextRank algorithm according to the verification result;
an extraction unit, adapted to extract the keywords again using the TextRank algorithm with the adjusted parameters, until the verification result of the keywords meets a preset requirement.
20. The information processing apparatus according to claim 12, characterized by further comprising:
a preprocessing unit, adapted to preprocess the corpus to be processed to obtain the corpus to be processed in a uniform format.
21. The information processing apparatus according to claim 20, characterized in that the preprocessing unit comprises:
a format conversion subunit, adapted to convert the corpus to be processed into text format to obtain text data;
a filtering subunit, adapted to filter preset words out of the text data, wherein the preset words are one or more of: profane words, sensitive words and stop words;
a division subunit, adapted to divide the filtered text data according to punctuation.
22. The information processing apparatus according to claim 12, characterized in that the word segmentation unit segments the corpus to be processed using one or more of the following:
dictionary-based bidirectional matching, the Viterbi algorithm, an HMM algorithm and a CRF algorithm.
23. A terminal, characterized by comprising the information processing apparatus according to any one of claims 12 to 22.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610940520.XA CN106372063A (en) | 2016-11-01 | 2016-11-01 | Information processing method and device and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106372063A (en) | 2017-02-01 |
Family ID: 57892989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610940520.XA (Pending), CN106372063A (en) | Information processing method and device and terminal | 2016-11-01 | 2016-11-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106372063A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011022809A (en) * | 2009-07-16 | 2011-02-03 | Dainippon Printing Co Ltd | Important word extraction method, device, program, and recording medium |
CN103823857A (en) * | 2014-02-21 | 2014-05-28 | 浙江大学 | Space information searching method based on natural language processing |
CN104536881A (en) * | 2014-11-28 | 2015-04-22 | 南京慕测信息科技有限公司 | Public testing error report priority sorting method based on natural language analysis |
CN105426361A (en) * | 2015-12-02 | 2016-03-23 | 上海智臻智能网络科技股份有限公司 | Keyword extraction method and device |
Non-Patent Citations (1)
Title |
---|
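Zhang Lijing (张莉婧) et al.: "Keyword extraction algorithm based on improved TextRank" (基于改进TextRank的关键词抽取算法), Journal of Beijing Institute of Graphic Communication (北京印刷学院学报) * |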
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416644A (en) * | 2017-02-09 | 2018-08-17 | 富士通株式会社 | Information output method and information output apparatus |
CN107590124B (en) * | 2017-09-06 | 2020-12-04 | 耀灵人工智能(浙江)有限公司 | Method for replacing synonyms according to scenes and comparing standard phrases classified according to scenes |
CN107590124A (en) * | 2017-09-06 | 2018-01-16 | 陈飞 | Method for replacing synonyms by scene and comparing against standard phrases classified by scene |
CN107679241A (en) * | 2017-10-27 | 2018-02-09 | 周燕红 | Similar document search method and device |
CN108090169A (en) * | 2017-12-14 | 2018-05-29 | 上海智臻智能网络科技股份有限公司 | Question expansion method and device, storage medium and terminal |
CN108418962A (en) * | 2018-02-13 | 2018-08-17 | 广东欧珀移动通信有限公司 | Brain-wave-based information response method and related product |
CN110427613A (en) * | 2019-07-16 | 2019-11-08 | 深圳供电局有限公司 | Near-synonym discovery method and system, and computer-readable storage medium |
CN110427613B (en) * | 2019-07-16 | 2022-12-13 | 深圳供电局有限公司 | Method and system for finding similar meaning words and computer readable storage medium |
CN110532547A (en) * | 2019-07-31 | 2019-12-03 | 厦门快商通科技股份有限公司 | Corpus construction method and apparatus, electronic device and medium |
CN110781296A (en) * | 2019-09-16 | 2020-02-11 | 中国平安人寿保险股份有限公司 | Data classification method based on deep learning and related equipment thereof |
CN111274369A (en) * | 2020-01-09 | 2020-06-12 | 广东小天才科技有限公司 | English word recognition method and device |
CN111291560A (en) * | 2020-03-06 | 2020-06-16 | 深圳前海微众银行股份有限公司 | Sample expansion method, terminal, device and readable storage medium |
CN111709234A (en) * | 2020-05-28 | 2020-09-25 | 北京百度网讯科技有限公司 | Training method and device of text processing model and electronic equipment |
CN113822051A (en) * | 2020-06-19 | 2021-12-21 | 北京彩智科技有限公司 | Data processing method and device and electronic equipment |
CN113822051B (en) * | 2020-06-19 | 2024-01-30 | 北京彩智科技有限公司 | Data processing method and device and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170201 |