CN110033236A - A kind of project duplicate checking method and system based on concurrent tasks - Google Patents
A kind of project duplicate checking method and system based on concurrent tasks
- Publication number
- CN110033236A (application CN201910287630.4A)
- Authority
- CN
- China
- Prior art keywords
- word
- participle
- duplicate checking
- num
- project
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Databases & Information Systems (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a project duplicate checking method and system based on concurrent tasks, comprising four steps. Internet technology is used to dynamically analyze Internet hot words and common words and form a cloud dictionary. The text in the application material is matched against the cloud dictionary by a character matching method, the material is cut into semantically meaningful segmentation factors, the best segmentation scheme is obtained by weighted calculation, word frequencies are counted, and high-frequency "single-character words" are excluded. The segmentation subset of the current project under check and that of a historical project are passed to the cosine similarity algorithm CosineSimilar, which returns the similarity value of the two projects. For large-scale computation, large-capacity high-speed memory and sensible memory management reduce frequent hard-disk reads and writes, and concurrent multi-thread tasks are opened to make full use of system resources and CPU capacity, thereby improving duplicate checking efficiency.
Description
Technical field
The present invention relates to the technical field of computing methods for judging, during project approval, whether application material duplicates or resembles that of other projects, and specifically to a project duplicate checking method and system based on concurrent tasks.
Background art
Applications for projects, achievements and awards usually require a large amount of written application material. This material can contain duplicate submissions, plagiarism of other people's achievements and similar problems, which wastes human and material resources. In the past, checking text for duplication was done entirely by manual reading. As project information accumulates over the years, the demands on reviewers keep rising: the staff concerned must read a large amount of project information and need an exceptional memory to master the work. The comparison workload is large and the efficiency low, so manual checking becomes more and more difficult, and duplicate submissions and plagiarism are hard to rule out during review. Although related detection systems are available online, their duplicate checking results vary widely in quality; they are slow and expensive, and sometimes yield no useful result despite the cost.
Summary of the invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides a project duplicate checking method and system based on concurrent tasks. It has advantages such as reducing frequent hard-disk reads and writes and making full use of system resources, and it solves the problem that review work is increasingly difficult and that duplicate submissions and plagiarism of other people's achievements are hard to rule out during review.
(2) Technical solution
To achieve the above object, the present invention provides the following technical solution: a project duplicate checking method and system based on concurrent tasks, comprising the following steps:
Step 1: Processing is performed in a distributed manner, borrowing the "electron cloud" (Electron Cloud) technology from quantum physics. Using characteristics of the electron cloud such as diffusivity and simultaneity, common words on the Internet and their heat (popularity) are collected and transmitted to a cloud server for dynamic analysis, and the parsed words are sorted by heat and saved as the cloud dictionary.
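The dictionary-building part of Step 1 can be sketched as follows. This is a minimal illustration only: the source does not say how heat is measured, so the sketch assumes heat is a raw occurrence count, and `build_cloud_dictionary` and `collected_words` are hypothetical names.

```python
from collections import Counter

def build_cloud_dictionary(collected_words):
    """Return {word: heat}, ordered from hottest to coldest.

    Assumption: heat is the number of times a word was collected.
    """
    heat = Counter(collected_words)
    # Sort by descending heat, then alphabetically so the order is deterministic.
    ranked = sorted(heat.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked)

# Hypothetical sample of collected words.
cloud_dict = build_cloud_dictionary(
    ["power", "grid", "power", "project", "grid", "power"]
)
```

A plain dict keeps insertion order in Python 3.7+, so iterating over `cloud_dict` visits the words by heat.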
Step 2: Concurrent multi-thread tasks are opened. The number of concurrent threads that can be opened, Num_Threads, is calculated from the processor details, the CPU utilization and the memory usage, combined with a concurrency parameter (default = 2), while core threads are reserved to keep the system running normally. Whenever a later step involves high-volume data computation, the system automatically uses concurrent multi-thread tasks to make full use of system resources and CPU capacity, thereby improving duplicate checking efficiency.
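The source names the inputs to Num_Threads but does not give the rule that combines them. The sketch below shows one plausible rule under stated assumptions: usage values are fractions in [0, 1], the busier of the two resources limits the headroom, and one thread is held back for the rest of the system. The function name, the `reserved` parameter and the formula itself are hypothetical.

```python
import os

def num_threads(cpu_usage, mem_usage, concurrency=2, reserved=1, logical_cpus=None):
    """Estimate how many worker threads can be opened (illustrative formula)."""
    cpus = logical_cpus if logical_cpus is not None else (os.cpu_count() or 1)
    # Scale the theoretical maximum by whichever resource has less headroom.
    headroom = max(0.0, 1.0 - max(cpu_usage, mem_usage))
    candidate = int(cpus * concurrency * headroom)
    # Hold back `reserved` threads for the system; never return fewer than 1.
    return max(1, candidate - reserved)
```

With 4 logical CPUs, 25% CPU usage and 50% memory usage, this rule yields 4 x 2 x 0.5 - 1 = 3 worker threads.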
Step 3: The application material currently under check is split into a paragraph set, where Cur_Sen is the paragraph set of the application material and Sen_1, Sen_2, ..., Sen_n are the split paragraphs. Using the forward matching method together with the cloud dictionary, each paragraph is parsed into a set of semantically meaningful tokens, where Cur_Sen_i_F is the token set of the paragraph under forward matching; Word_1, Word_2, ..., Word_n are the tokens of the split paragraph; and i = 1, 2, ..., n is the paragraph index. A matching weight score is calculated from the heat values in the dictionary, where Cur_FScore is the weighted total score of the paragraph under forward matching; sum{hot(Word_i)^2} computes the weighted score of each token with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the token index. A word that does not exist in the dictionary is given a matching weight score of 0.
The same paragraph is also parsed by the reverse matching method together with the cloud dictionary into a set of semantically meaningful tokens, where Cur_Sen_i_R is the token set of the paragraph under reverse matching; Word_1, Word_2, ..., Word_n are the tokens of the split paragraph; and i = 1, 2, ..., n is the paragraph index. A matching weight score is again calculated from the heat values in the dictionary, where Cur_RScore is the weighted total score of the paragraph under reverse matching; sum{hot(Word_i)^2} computes the weighted score of each token with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the token index. A word that does not exist in the dictionary is given a matching weight score of 0.
Finally, the segmentation scheme with the larger score is taken, and the former (forward matching) is taken when the scores are equal: Max_Score = max{Cur_FScore, Cur_RScore}. The calculation loops until all paragraphs have been processed, and the token sets are saved to the database for later reuse. Similarly, if the segmentation result of the application material of a historical project is empty, the method of Step 3 is also used to parse that material, calculate the best segmentation scheme and store it in the database.
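Step 3 can be sketched with the classic forward and reverse maximum matching algorithms, scored by sum{hot(Word_i)^2}. This is an illustration under assumptions, not the patented implementation: the source does not state a maximum word length (`max_len` is hypothetical), unmatched text is emitted one character at a time, and Latin letters stand in for Chinese characters.

```python
def forward_match(text, dictionary, max_len=4):
    """Forward maximum matching: scan left to right, longest dictionary word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words

def reverse_match(text, dictionary, max_len=4):
    """Reverse maximum matching: scan right to left, longest dictionary word first."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # restore reading order

def score(words, dictionary):
    """sum{hot(Word_i)^2}; words missing from the dictionary score 0."""
    return sum(dictionary.get(w, 0) ** 2 for w in words)

def best_segmentation(text, dictionary, max_len=4):
    fwd = forward_match(text, dictionary, max_len)
    rev = reverse_match(text, dictionary, max_len)
    # On a tie the forward result wins, as the step specifies.
    return fwd if score(fwd, dictionary) >= score(rev, dictionary) else rev

# Hypothetical dictionary of word -> heat.
heat = {"abc": 1, "ab": 2, "cd": 3}
best = best_segmentation("abcd", heat)
```

Here forward matching gives `["abc", "d"]` with score 1, reverse matching gives `["ab", "cd"]` with score 4 + 9 = 13, so the reverse result is kept.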
Step 4: Using a statistical segmentation algorithm, the segmentation factors of the current project under check and those of the historical project are indexed to obtain sets, word frequencies are counted, and high-frequency "single-character words" (such as " ", " ", " ", etc.) are excluded. Cur_Word_Index is the token word-frequency set of the project under check, where W_ID_1, W_ID_2, ..., W_ID_n are the token factor indices and Num_1, Num_2, ..., Num_n are the token word frequencies. His_Word_Index is the token word-frequency set of the historical project, where W_ID_1, W_ID_2, ..., W_ID_n are the token factor indices and Num_1, Num_2, ..., Num_n are the token word frequencies. Through the Map interface of a hash table, the word-frequency vector of the current project, c0 = [Num_1, Num_2, ..., Num_n], and the word-frequency vector of the historical project, c1 = [Num_1, Num_2, ..., Num_n], are calculated, and the word-frequency vectors are built over the union of the two sets, where Index is the index number of each token factor. The cosine similarity algorithm CosineSimilar returns the similarity value of the current project and the historical project; the closer the value is to 1, the higher the similarity.
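A minimal sketch of Step 4, assuming CosineSimilar is the standard cosine similarity over word-frequency vectors aligned on the union of both token sets. The function names and the `stop_words` parameter are hypothetical; `Counter` plays the role of the hash-table Map.

```python
import math
from collections import Counter

def word_freq(tokens, stop_words=frozenset()):
    """Count token frequencies, skipping the excluded single-character words."""
    return Counter(t for t in tokens if t not in stop_words)

def cosine_similar(cur_freq, his_freq):
    """Cosine similarity of two frequency maps over the union of their tokens."""
    index = sorted(set(cur_freq) | set(his_freq))  # shared Index for both vectors
    c0 = [cur_freq.get(w, 0) for w in index]       # current project
    c1 = [his_freq.get(w, 0) for w in index]       # historical project
    dot = sum(a * b for a, b in zip(c0, c1))
    norm = math.sqrt(sum(a * a for a in c0)) * math.sqrt(sum(b * b for b in c1))
    return dot / norm if norm else 0.0             # empty input counts as dissimilar

cur = word_freq(["grid", "power", "power", "a"], stop_words={"a"})
his = word_freq(["power", "grid", "power"])
similarity = cosine_similar(cur, his)
```

Because word frequencies are never negative, the result always lies between 0 and 1; two projects with identical frequency profiles score 1 up to floating-point rounding.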
Preferably, in Step 1 the "electron cloud" (Electron Cloud) is connected to the cloud server via Ethernet.
Preferably, in Step 2 the number of CPU cores is greater than or equal to two.
Preferably, in Step 3 Max_Score is the maximum score, and max{Cur_FScore, Cur_RScore} returns the maximum value by means of max.
Preferably, in Step 4 c0 is the word-frequency vector of the current project under check and c1 is the word-frequency vector of the historical project.
(3) Beneficial effects
Compared with the prior art, the present invention provides a project duplicate checking method and system based on concurrent tasks with the following beneficial effects:
1. The project duplicate checking method and system based on concurrent tasks uses Internet technology to dynamically analyze Internet hot words and common words and form a cloud dictionary. The text in the application material is matched against the cloud dictionary by a character matching method, the material is cut into semantically meaningful segmentation factors, the best segmentation scheme is obtained by weighted calculation, word frequencies are counted, and high-frequency "single-character words" are excluded. The segmentation subset of the current project under check and that of a historical project are passed to the cosine similarity algorithm CosineSimilar, which returns the similarity value of the two projects. For large-scale computation, large-capacity high-speed memory and sensible memory management reduce frequent hard-disk reads and writes, and concurrent multi-thread tasks are opened to make full use of system resources and CPU capacity, thereby improving duplicate checking efficiency.
2. The project duplicate checking method and system based on concurrent tasks makes full use of system resources to achieve efficient duplicate checking. In addition, the hot words and common words collected through cloud technology provide strong support for the character matching method and improve the segmentation accuracy of the application material. The method is also highly scalable and supports checking multiple documents at the same time.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution. Step 1: Processing is performed in a distributed manner, borrowing the "electron cloud" (Electron Cloud) technology from quantum physics. Using characteristics of the electron cloud such as diffusivity and simultaneity, common words on the Internet and their heat (popularity) are collected and transmitted to a cloud server for dynamic analysis, and the parsed words are sorted by heat and saved as the cloud dictionary.
Step 2: Concurrent multi-thread tasks are opened. The number of concurrent threads that can be opened, Num_Threads, is calculated from the processor details, the CPU utilization and the memory usage, combined with a concurrency parameter (default = 2), while core threads are reserved to keep the system running normally. Whenever a later step involves high-volume data computation, the system automatically uses concurrent multi-thread tasks to make full use of system resources and CPU capacity, thereby improving duplicate checking efficiency.
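How a high-volume comparison might be handed to concurrent threads can be sketched with Python's standard thread pool. The function names, the project ids and the toy `overlap` scorer are all hypothetical; any two-argument similarity function could be passed in its place.

```python
from concurrent.futures import ThreadPoolExecutor

def overlap(a, b):
    """Toy similarity: shared tokens divided by all tokens (stand-in scorer)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def check_against_history(current, history, similarity, num_threads=2):
    """Compare `current` with every historical project concurrently.

    Returns {project_id: similarity}.
    """
    ids = list(history)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so ids and scores line up.
        scores = pool.map(lambda pid: similarity(current, history[pid]), ids)
        return dict(zip(ids, scores))

result = check_against_history(
    {"grid", "power"},
    {"p1": {"grid", "power"}, "p2": {"tourism"}},
    overlap,
)
```

In CPython, threads mainly help when the work waits on I/O such as database reads; for CPU-bound scoring a process pool may be the better fit.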
Step 3: The application material currently under check is split into a paragraph set, where Cur_Sen is the paragraph set of the application material and Sen_1, Sen_2, ..., Sen_n are the split paragraphs. Using the forward matching method together with the cloud dictionary, each paragraph is parsed into a set of semantically meaningful tokens, where Cur_Sen_i_F is the token set of the paragraph under forward matching; Word_1, Word_2, ..., Word_n are the tokens of the split paragraph; and i = 1, 2, ..., n is the paragraph index. A matching weight score is calculated from the heat values in the dictionary, where Cur_FScore is the weighted total score of the paragraph under forward matching; sum{hot(Word_i)^2} computes the weighted score of each token with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the token index. A word that does not exist in the dictionary is given a matching weight score of 0.
The same paragraph is also parsed by the reverse matching method together with the cloud dictionary into a set of semantically meaningful tokens, where Cur_Sen_i_R is the token set of the paragraph under reverse matching; Word_1, Word_2, ..., Word_n are the tokens of the split paragraph; and i = 1, 2, ..., n is the paragraph index. A matching weight score is again calculated from the heat values in the dictionary, where Cur_RScore is the weighted total score of the paragraph under reverse matching; sum{hot(Word_i)^2} computes the weighted score of each token with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the token index. A word that does not exist in the dictionary is given a matching weight score of 0.
Finally, the segmentation scheme with the larger score is taken, and the former (forward matching) is taken when the scores are equal: Max_Score = max{Cur_FScore, Cur_RScore}. The calculation loops until all paragraphs have been processed, and the token sets are saved to the database for later reuse. Similarly, if the segmentation result of the application material of a historical project is empty, the method of Step 3 is also used to parse that material, calculate the best segmentation scheme and store it in the database.
Step 4: Using a statistical segmentation algorithm, the segmentation factors of the current project under check and those of the historical project are indexed to obtain sets, word frequencies are counted, and high-frequency "single-character words" (such as " ", " ", " ", etc.) are excluded. Cur_Word_Index is the token word-frequency set of the project under check, where W_ID_1, W_ID_2, ..., W_ID_n are the token factor indices and Num_1, Num_2, ..., Num_n are the token word frequencies. His_Word_Index is the token word-frequency set of the historical project, where W_ID_1, W_ID_2, ..., W_ID_n are the token factor indices and Num_1, Num_2, ..., Num_n are the token word frequencies. Through the Map interface of a hash table, the word-frequency vector of the current project, c0 = [Num_1, Num_2, ..., Num_n], and the word-frequency vector of the historical project, c1 = [Num_1, Num_2, ..., Num_n], are calculated, and the word-frequency vectors are built over the union of the two sets, where Index is the index number of each token factor. The cosine similarity algorithm CosineSimilar returns the similarity value of the current project and the historical project; the closer the value is to 1, the higher the similarity.
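As a worked illustration of the similarity value in Step 4, the standard cosine similarity of two word-frequency vectors can be written out as follows. The source does not define CosineSimilar beyond its name, so the standard formula is assumed, and the numeric vectors are made up for the example.

```latex
\[
\mathrm{CosineSimilar}(c_0, c_1)
  = \frac{c_0 \cdot c_1}{\lVert c_0 \rVert \, \lVert c_1 \rVert}
  = \frac{\sum_{k=1}^{n} c_{0,k}\, c_{1,k}}
         {\sqrt{\sum_{k=1}^{n} c_{0,k}^{2}}\;\sqrt{\sum_{k=1}^{n} c_{1,k}^{2}}}
\]
% Example with c_0 = [2, 1, 0] and c_1 = [1, 1, 1]:
\[
\frac{2\cdot 1 + 1\cdot 1 + 0\cdot 1}{\sqrt{2^2 + 1^2 + 0^2}\,\sqrt{1^2 + 1^2 + 1^2}}
  = \frac{3}{\sqrt{5}\,\sqrt{3}}
  = \frac{3}{\sqrt{15}} \approx 0.775
\]
```

Since every component is a non-negative count, the value stays within [0, 1], which is why a result close to 1 can be read directly as high similarity.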
As a further improvement, in Step 1 the "electron cloud" (Electron Cloud) is connected to the cloud server via Ethernet.
As a further improvement, in Step 2 the number of CPU cores is greater than or equal to two.
As a further improvement, in Step 3 Max_Score is the maximum score, and max{Cur_FScore, Cur_RScore} returns the maximum value by means of max.
As a further improvement, in Step 4 c0 is the word-frequency vector of the current project under check and c1 is the word-frequency vector of the historical project.
The electrical components mentioned herein are electrically connected to an external main controller and to 220 V mains power; the main controller may be a computer or another conventional, known device that performs control.
In summary, the project duplicate checking method and system based on concurrent tasks uses Internet technology to dynamically analyze Internet hot words and common words and form a cloud dictionary. The text in the application material is matched against the cloud dictionary by a character matching method, the material is cut into semantically meaningful segmentation factors, the best segmentation scheme is obtained by weighted calculation, word frequencies are counted, and high-frequency "single-character words" are excluded. The segmentation subset of the current project under check and that of a historical project are passed to the cosine similarity algorithm CosineSimilar, which returns the similarity value of the two projects. For large-scale computation, large-capacity high-speed memory and sensible memory management reduce frequent hard-disk reads and writes, and concurrent multi-thread tasks are opened to make full use of system resources and CPU capacity, thereby improving duplicate checking efficiency. System resources are fully used to achieve efficient duplicate checking. In addition, the hot words and common words collected through cloud technology provide strong support for the character matching method and improve the segmentation accuracy of the application material. The method is also highly scalable and supports checking multiple documents at the same time.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article or device.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the appended claims.
Claims (5)
1. A project duplicate checking method and system based on concurrent tasks, characterized by comprising the following steps:
Step 1: Processing is performed in a distributed manner, borrowing the "electron cloud" (Electron Cloud) technology from quantum physics. Using characteristics of the electron cloud such as diffusivity and simultaneity, common words on the Internet and their heat (popularity) are collected and transmitted to a cloud server for dynamic analysis, and the parsed words are sorted by heat and saved as the cloud dictionary.
Step 2: Concurrent multi-thread tasks are opened. The number of concurrent threads that can be opened, Num_Threads, is calculated from the processor details, the CPU utilization and the memory usage, combined with a concurrency parameter (default = 2), while core threads are reserved to keep the system running normally. Whenever a later step involves high-volume data computation, the system automatically uses concurrent multi-thread tasks to make full use of system resources and CPU capacity, thereby improving duplicate checking efficiency.
Step 3: The application material currently under check is split into a paragraph set, where Cur_Sen is the paragraph set of the application material and Sen_1, Sen_2, ..., Sen_n are the split paragraphs. Using the forward matching method together with the cloud dictionary, each paragraph is parsed into a set of semantically meaningful tokens, where Cur_Sen_i_F is the token set of the paragraph under forward matching; Word_1, Word_2, ..., Word_n are the tokens of the split paragraph; and i = 1, 2, ..., n is the paragraph index. A matching weight score is calculated from the heat values in the dictionary, where Cur_FScore is the weighted total score of the paragraph under forward matching; sum{hot(Word_i)^2} computes the weighted score of each token with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the token index. A word that does not exist in the dictionary is given a matching weight score of 0.
The same paragraph is also parsed by the reverse matching method together with the cloud dictionary into a set of semantically meaningful tokens, where Cur_Sen_i_R is the token set of the paragraph under reverse matching; Word_1, Word_2, ..., Word_n are the tokens of the split paragraph; and i = 1, 2, ..., n is the paragraph index. A matching weight score is again calculated from the heat values in the dictionary, where Cur_RScore is the weighted total score of the paragraph under reverse matching; sum{hot(Word_i)^2} computes the weighted score of each token with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the token index. A word that does not exist in the dictionary is given a matching weight score of 0.
Finally, the segmentation scheme with the larger score is taken, and the former (forward matching) is taken when the scores are equal: Max_Score = max{Cur_FScore, Cur_RScore}. The calculation loops until all paragraphs have been processed, and the token sets are saved to the database for later reuse. Similarly, if the segmentation result of the application material of a historical project is empty, the method of Step 3 is also used to parse that material, calculate the best segmentation scheme and store it in the database.
Step 4: Using a statistical segmentation algorithm, the segmentation factors of the current project under check and those of the historical project are indexed to obtain sets, word frequencies are counted, and high-frequency "single-character words" (such as " ", " ", " ", etc.) are excluded. Cur_Word_Index is the token word-frequency set of the project under check, where W_ID_1, W_ID_2, ..., W_ID_n are the token factor indices and Num_1, Num_2, ..., Num_n are the token word frequencies. His_Word_Index is the token word-frequency set of the historical project, where W_ID_1, W_ID_2, ..., W_ID_n are the token factor indices and Num_1, Num_2, ..., Num_n are the token word frequencies. Through the Map interface of a hash table, the word-frequency vector of the current project, c0 = [Num_1, Num_2, ..., Num_n], and the word-frequency vector of the historical project, c1 = [Num_1, Num_2, ..., Num_n], are calculated, and the word-frequency vectors are built over the union of the two sets, where Index is the index number of each token factor. The cosine similarity algorithm CosineSimilar returns the similarity value of the current project and the historical project; the closer the value is to 1, the higher the similarity.
2. The project duplicate checking method and system based on concurrent tasks according to claim 1, characterized in that: in Step 1 the "electron cloud" (Electron Cloud) is connected to the cloud server via Ethernet.
3. The project duplicate checking method and system based on concurrent tasks according to claim 1, characterized in that: in Step 2 the number of CPU cores is greater than or equal to two.
4. The project duplicate checking method and system based on concurrent tasks according to claim 1, characterized in that: in Step 3 Max_Score is the maximum score, and max{Cur_FScore, Cur_RScore} returns the maximum value by means of max.
5. The project duplicate checking method and system based on concurrent tasks according to claim 1, characterized in that: in Step 4 c0 is the word-frequency vector of the current project under check and c1 is the word-frequency vector of the historical project.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910287630.4A CN110033236A (en) | 2019-04-11 | 2019-04-11 | A kind of project duplicate checking method and system based on concurrent tasks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910287630.4A CN110033236A (en) | 2019-04-11 | 2019-04-11 | A kind of project duplicate checking method and system based on concurrent tasks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110033236A true CN110033236A (en) | 2019-07-19 |
Family
ID=67238022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910287630.4A Pending CN110033236A (en) | 2019-04-11 | 2019-04-11 | A kind of project duplicate checking method and system based on concurrent tasks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110033236A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778110A (en) * | 2012-10-25 | 2014-05-07 | 三星电子(中国)研发中心 | Method and system for converting simplified Chinese characters into traditional Chinese characters |
CN104102626A (en) * | 2014-07-07 | 2014-10-15 | 厦门推特信息科技有限公司 | Method for computing semantic similarities among short texts |
US20160358094A1 (en) * | 2015-06-02 | 2016-12-08 | International Business Machines Corporation | Utilizing Word Embeddings for Term Matching in Question Answering Systems |
CN107153658A (en) * | 2016-03-03 | 2017-09-12 | 常州普适信息科技有限公司 | A kind of public sentiment hot word based on weighted keyword algorithm finds method |
CN107122340A (en) * | 2017-03-30 | 2017-09-01 | 浙江省科技信息研究院 | A kind of similarity detection method for the science and technology item return analyzed based on synonym |
CN108536530A (en) * | 2018-04-02 | 2018-09-14 | 北京中电普华信息技术有限公司 | A kind of multithreading method for scheduling task and device |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541057A (en) * | 2019-09-04 | 2021-03-23 | 上海晶赞融宣科技有限公司 | Distributed new word discovery method and device, computer equipment and storage medium |
CN112001161A (en) * | 2020-08-25 | 2020-11-27 | 上海新炬网络信息技术股份有限公司 | Text duplicate checking method |
CN112001161B (en) * | 2020-08-25 | 2024-01-19 | 上海新炬网络信息技术股份有限公司 | Text duplicate checking method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105335496B (en) | Customer service based on cosine similarity text mining algorithm repeats call processing method | |
CN105117426B (en) | A kind of intellectual coded searching method of customs | |
CN104077407B (en) | A kind of intelligent data search system and method | |
CN109684422A (en) | A kind of single index prediction of the big data platform based on artificial intelligence and method for early warning | |
WO2022156328A1 (en) | Restful-type web service clustering method fusing service cooperation relationships | |
CN107577724A (en) | A kind of big data processing method | |
CN106095951B (en) | Data space multi-dimensional indexing method based on load balancing and inquiry log | |
CN110188122A (en) | A kind of association relationship analysis method between different line loss behaviors | |
CN110033236A (en) | A kind of project duplicate checking method and system based on concurrent tasks | |
CN107341199A (en) | A kind of recommendation method based on documentation & info general model | |
CN111461521A (en) | Residential housing vacancy rate analysis method based on electric power big data | |
CN106649687A (en) | Method and device for on-line analysis and processing of large data | |
CN109492022A (en) | The searching method of semantic-based improved k-means algorithm | |
CN110244099A (en) | Stealing detection method based on user's voltage | |
CN105740434B (en) | Network information methods of marking and device | |
Bhattacharya et al. | High utility itemset mining | |
CN107766500A (en) | The auditing method of fixed assets card | |
Zhang et al. | Application of data mining techniques in the analysis of fire incidents | |
Li et al. | Does firm's value matter with firm's patent quality in technology-intensive industries? | |
Shah et al. | Performance study of time series databases | |
CN114240041A (en) | Lean line loss analysis method and system for distribution network distribution area | |
CN110287114A (en) | A kind of method and device of database script performance test | |
CN110134646A (en) | The storage of knowledge platform service data and integrated approach and system | |
Wang et al. | Speed up big data analytics by unveiling the storage distribution of sub-datasets | |
CN106844539A (en) | Real-time data analysis method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20191219 Address after: 250001, No. two, No. 150, Ji'nan, Shandong Applicant after: STATE GRID SHANDONG ELECTRIC POWER Co. Applicant after: SHANDONG LUNENG SOFTWARE TECHNOLOGY Co.,Ltd. Address before: 250001, No. two, No. 150, Ji'nan, Shandong Applicant before: STATE GRID SHANDONG ELECTRIC POWER Co. |
|
TA01 | Transfer of patent application right | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190719 |
|
RJ01 | Rejection of invention patent application after publication |