CN110033236A - Project duplicate-checking method and system based on concurrent tasks

Info

Publication number
CN110033236A
Authority
CN
China
Prior art keywords
word
participle
duplicate checking
num
project
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910287630.4A
Other languages
Chinese (zh)
Inventor
李�荣
白万建
李冬
李勇
李庆文
何召慧
于展鹏
邢宏伟
王刚
戚鲁凤
王宗光
夏光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shandong Electric Power Co Ltd
Shandong Luneng Software Technology Co Ltd
Original Assignee
State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Shandong Electric Power Co Ltd
Priority to CN201910287630.4A
Publication of CN110033236A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a project duplicate-checking method and system based on concurrent tasks, comprising four steps. Internet technology is used to analyze trending internet words and everyday expressions dynamically and to build a cloud dictionary. The text of the declaration material is matched against the cloud dictionary by a string matching method and cut into semantic word-segmentation factors; the best segmentation scheme is obtained by weighted calculation, word frequencies are counted, and high-frequency single-character ("monosyllabic") words are excluded. The word subset of the current project and the word subset of a history project are passed to the cosine similarity algorithm CosineSimilar, which returns the similarity value between the current project and the history project. For large data volumes, the method uses large-capacity high-speed memory with careful memory management to reduce frequent hard-disk reads and writes, and opens concurrent multi-thread tasks to make full use of system resources and CPU capacity, thereby improving duplicate-checking efficiency.

Description

Project duplicate-checking method and system based on concurrent tasks
Technical field
The present invention relates to the technical field of computing methods for judging, during project approval, whether declaration material repeats or resembles that of other projects, and specifically to a project duplicate-checking method and system based on concurrent tasks.
Background art
Declaring projects, achievements and awards usually requires filling in a large amount of textual declaration material. This material may contain repeated submissions or plagiarism of other people's achievements, which wastes human and material resources. In the past, checking text for duplication was done by manual reading. As project information accumulates over the years, the demands on reviewers keep rising: they must read a large amount of project information and need an exceptional memory to master the work, and the comparison workload is large and inefficient. Manual checking therefore becomes increasingly difficult, and repeated submissions and plagiarism are hard to rule out during review. Although detection systems are available online, their duplicate-checking results vary widely in quality; they are slow and expensive, and sometimes fail to deliver a usable result even after payment.
Summary of the invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides a project duplicate-checking method and system based on concurrent tasks. It reduces frequent hard-disk reads and writes and makes full use of system resources, and it addresses the problem that review work is becoming increasingly difficult and that repeated submissions and plagiarism are hard to rule out during review.
(2) Technical solution
To achieve the above object, the invention provides the following technical solution: a project duplicate-checking method and system based on concurrent tasks, comprising the following steps:
Step 1: Process in a distributed manner, borrowing the "electron cloud" (Electron Cloud) technique from quantum physics. Exploiting characteristics of the electron cloud such as diffusivity and simultaneity, collect everyday expressions and their popularity ("heat") from the internet, transmit them to a cloud server for dynamic analysis, and save the parsed words as a cloud dictionary ordered by heat.
Step 2: Open concurrent multi-thread tasks. From the processor details, the CPU utilization and the memory usage, combined with a concurrency parameter (default = 2), calculate the number of concurrent threads that can be opened, Num_Threads, while reserving kernel threads so that the system keeps running normally. Whenever a later step computes over a large volume of data, the system automatically uses concurrent multi-thread tasks, making full use of system resources and CPU capacity to improve duplicate-checking efficiency.
Step 3: Split the declaration material currently being checked into a paragraph set, where Cur_Sen is the paragraph set of the declaration material and Sen_1, Sen_2, ..., Sen_n are the split paragraphs. Using the forward ("positive") matching method together with the cloud dictionary, parse each paragraph into a set of semantic word segments, where Cur_Sen_i_F is the word-segment set of paragraph i under forward matching; Word_1, Word_2, ..., Word_n are the segments of the paragraph; and i = 1, 2, ..., n is the paragraph index. Calculate the matching weight score from the heat values in the dictionary, where Cur_FScore is the total weighted score of the paragraph under forward matching; sum{hot(Word_i)^2} computes the weighted score of each segment with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the segment index. A word that is not in the dictionary is given a matching weight score of 0. At the same time, using the reverse matching method together with the cloud dictionary, parse each paragraph into a set of semantic word segments, where Cur_Sen_i_R is the word-segment set of paragraph i under reverse matching, with Word_1, Word_2, ..., Word_n and i defined as above. Calculate the matching weight score from the heat values in the dictionary in the same way, where Cur_RScore is the total weighted score of the paragraph under reverse matching, computed as sum{hot(Word_i)^2}; a word that is not in the dictionary is again given a score of 0. Finally, take the segmentation scheme with the larger score, and take the former (forward matching) scheme when the two scores are equal: Max_Score = max{Cur_FScore, Cur_RScore}. Repeat the calculation until all paragraphs have been processed, and save the word-segment sets to the database for later reuse. Likewise, if the segmentation result for the declaration material of a history project is empty, parse that material with the method of step 3 as well, calculate the best segmentation scheme, and store it in the database.
Step 4: Using a statistical segmentation algorithm, look up the sets of segmentation-factor indexes for the current project and for the history project, count word frequencies, and exclude high-frequency single-character ("monosyllabic") words. Cur_Word_Index is the word-frequency set of the project being checked, where W_ID_1, W_ID_2, ..., W_ID_n are the segmentation-factor indexes and Num_1, Num_2, ..., Num_n are the corresponding word frequencies. His_Word_Index is the word-frequency set of the history project, with W_ID_1, W_ID_2, ..., W_ID_n and Num_1, Num_2, ..., Num_n defined in the same way. Through the Map interface of a hash table, calculate the word-frequency vector of the current project, c0 = [Num_1, Num_2, ..., Num_n], and the word-frequency vector of the history project, c1 = [Num_1, Num_2, ..., Num_n], building both vectors over the union of the two index sets, where Index is the index number of each segmentation factor. The cosine similarity algorithm CosineSimilar then returns the similarity value between the current project and the history project; the closer the value is to 1, the higher the similarity.
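For reference, the scoring and similarity quantities named in steps 3 and 4 can be written out as below. This is a reading of the text, assuming that CosineSimilar is the standard cosine of the angle between the two word-frequency vectors; the component symbols c_{0,k} and c_{1,k} are notation introduced here and do not appear in the original.

```latex
\begin{align}
\mathrm{Cur\_FScore} &= \sum_{i=1}^{n} \mathrm{hot}(\mathrm{Word}_i)^{2}
  && \text{(segments from forward matching)} \\
\mathrm{Cur\_RScore} &= \sum_{i=1}^{n} \mathrm{hot}(\mathrm{Word}_i)^{2}
  && \text{(segments from reverse matching)} \\
\mathrm{Max\_Score} &= \max\{\mathrm{Cur\_FScore},\ \mathrm{Cur\_RScore}\} \\
\mathrm{CosineSimilar}(c_0, c_1) &=
  \frac{\sum_{k=1}^{n} c_{0,k}\, c_{1,k}}
       {\sqrt{\sum_{k=1}^{n} c_{0,k}^{2}}\;\sqrt{\sum_{k=1}^{n} c_{1,k}^{2}}}
\end{align}
```

Here hot(Word_i) = 0 for any word that is not in the dictionary. Because word frequencies are non-negative, the cosine value lies between 0 and 1, which matches the statement that a value closer to 1 means higher similarity.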
Preferably, in step 1 the "electron cloud" (Electron Cloud) is connected to the cloud server via Ethernet.
Preferably, in step 2 the CPU has two or more cores.
Preferably, in step 3 Max_Score is the maximum score, and max{Cur_FScore, Cur_RScore} returns the maximum value via max.
Preferably, in step 4 c0 is the word-frequency vector of the current project being checked and c1 is the word-frequency vector of the history project.
(3) Beneficial effects
Compared with the prior art, the present invention provides a project duplicate-checking method and system based on concurrent tasks with the following beneficial effects:
1. The method and system use Internet technology to analyze trending internet words and everyday expressions dynamically and to build a cloud dictionary. The text of the declaration material is matched against the cloud dictionary by a string matching method and cut into semantic word-segmentation factors; the best segmentation scheme is obtained by weighted calculation, word frequencies are counted, and high-frequency single-character ("monosyllabic") words are excluded. The word subset of the current project and the word subset of a history project are passed to the cosine similarity algorithm CosineSimilar, which returns the similarity value between the two. For large data volumes, large-capacity high-speed memory and careful memory management reduce frequent hard-disk reads and writes, and concurrent multi-thread tasks make full use of system resources and CPU capacity, thereby improving duplicate-checking efficiency.
2. The method and system make full use of system resources to achieve efficient duplicate checking. In addition, the trending words and everyday expressions collected through cloud technology give the string matching method strong support and improve the segmentation accuracy of the declaration material. The approach is also highly scalable and supports checking several documents for duplication at the same time.
Description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: Step 1: Process in a distributed manner, borrowing the "electron cloud" (Electron Cloud) technique from quantum physics. Exploiting characteristics of the electron cloud such as diffusivity and simultaneity, collect everyday expressions and their popularity ("heat") from the internet, transmit them to a cloud server for dynamic analysis, and save the parsed words as a cloud dictionary ordered by heat.
Step 2: Open concurrent multi-thread tasks. From the processor details, the CPU utilization and the memory usage, combined with a concurrency parameter (default = 2), calculate the number of concurrent threads that can be opened, Num_Threads, while reserving kernel threads so that the system keeps running normally. Whenever a later step computes over a large volume of data, the system automatically uses concurrent multi-thread tasks, making full use of system resources and CPU capacity to improve duplicate-checking efficiency.
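The text does not give the formula for Num_Threads, so the following is only a minimal sketch of one way step 2 could be realized. The function name `num_threads`, the headroom rule (scale the budget by the unused share of the busier resource) and the `reserved` parameter are all assumptions made for illustration; only the inputs (core count, CPU utilization, memory usage, concurrency parameter with default 2) come from the description.

```python
import os


def num_threads(cpu_usage, mem_usage, concurrency=2, reserved=1, cores=None):
    """Estimate how many worker threads to open (Num_Threads).

    cpu_usage and mem_usage are fractions in [0, 1].
    The formula is an assumption, not taken from the patent text:
    budget = cores * concurrency * free share of the busier resource,
    minus threads held back so the system itself stays responsive.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    headroom = 1.0 - max(cpu_usage, mem_usage)
    budget = int(cores * concurrency * headroom)
    # Always allow at least one worker, even on a heavily loaded machine.
    return max(1, budget - reserved)


if __name__ == "__main__":
    # Four cores, memory half used: 4 * 2 * 0.5 = 4, minus 1 reserved.
    print(num_threads(cpu_usage=0.25, mem_usage=0.5, cores=4))
```

In a real deployment the utilization figures would have to be sampled from the operating system; that part is left out here because the description does not say how it is done.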
Step 3: Split the declaration material currently being checked into a paragraph set, where Cur_Sen is the paragraph set of the declaration material and Sen_1, Sen_2, ..., Sen_n are the split paragraphs. Using the forward ("positive") matching method together with the cloud dictionary, parse each paragraph into a set of semantic word segments, where Cur_Sen_i_F is the word-segment set of paragraph i under forward matching; Word_1, Word_2, ..., Word_n are the segments of the paragraph; and i = 1, 2, ..., n is the paragraph index. Calculate the matching weight score from the heat values in the dictionary, where Cur_FScore is the total weighted score of the paragraph under forward matching; sum{hot(Word_i)^2} computes the weighted score of each segment with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the segment index. A word that is not in the dictionary is given a matching weight score of 0. At the same time, using the reverse matching method together with the cloud dictionary, parse each paragraph into a set of semantic word segments, where Cur_Sen_i_R is the word-segment set of paragraph i under reverse matching, with Word_1, Word_2, ..., Word_n and i defined as above. Calculate the matching weight score from the heat values in the dictionary in the same way, where Cur_RScore is the total weighted score of the paragraph under reverse matching, computed as sum{hot(Word_i)^2}; a word that is not in the dictionary is again given a score of 0. Finally, take the segmentation scheme with the larger score, and take the former (forward matching) scheme when the two scores are equal: Max_Score = max{Cur_FScore, Cur_RScore}. Repeat the calculation until all paragraphs have been processed, and save the word-segment sets to the database for later reuse. Likewise, if the segmentation result for the declaration material of a history project is empty, parse that material with the method of step 3 as well, calculate the best segmentation scheme, and store it in the database.
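The forward ("positive") and reverse matching of step 3 can be sketched as below. The sketch assumes that both are greedy maximum-matching scans against the dictionary, which is the common form of these methods but is not stated in the text; the function names, the `max_len` window and the sample dictionary with its heat values are hypothetical. The score sum{hot(Word_i)^2}, the zero score for unknown words, and the preference for the forward result on a tie follow the description.

```python
def forward_match(text, hot, max_len=4):
    """Greedy left-to-right scan: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in hot:
                words.append(piece)
                i += size
                break
    return words


def reverse_match(text, hot, max_len=4):
    """Greedy right-to-left scan: take the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in hot:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def score(words, hot):
    """sum{hot(Word_i)^2}; words missing from the dictionary contribute 0."""
    return sum(hot.get(w, 0) ** 2 for w in words)


def best_segmentation(text, hot, max_len=4):
    """Keep the higher-scoring scheme; on a tie keep the forward one."""
    fwd = forward_match(text, hot, max_len)
    rev = reverse_match(text, hot, max_len)
    return fwd if score(fwd, hot) >= score(rev, hot) else rev


if __name__ == "__main__":
    # Hypothetical cloud dictionary: word -> heat value.
    hot = {"研究": 3, "研究生": 2, "生命": 3, "起源": 2}
    print(best_segmentation("研究生命起源", hot))
```

With this sample dictionary the forward scan yields 研究生 / 命 / 起源 and the reverse scan yields 研究 / 生命 / 起源; the reverse result scores higher and is kept.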
Step 4: Using a statistical segmentation algorithm, look up the sets of segmentation-factor indexes for the current project and for the history project, count word frequencies, and exclude high-frequency single-character ("monosyllabic") words. Cur_Word_Index is the word-frequency set of the project being checked, where W_ID_1, W_ID_2, ..., W_ID_n are the segmentation-factor indexes and Num_1, Num_2, ..., Num_n are the corresponding word frequencies. His_Word_Index is the word-frequency set of the history project, with W_ID_1, W_ID_2, ..., W_ID_n and Num_1, Num_2, ..., Num_n defined in the same way. Through the Map interface of a hash table, calculate the word-frequency vector of the current project, c0 = [Num_1, Num_2, ..., Num_n], and the word-frequency vector of the history project, c1 = [Num_1, Num_2, ..., Num_n], building both vectors over the union of the two index sets, where Index is the index number of each segmentation factor. The cosine similarity algorithm CosineSimilar then returns the similarity value between the current project and the history project; the closer the value is to 1, the higher the similarity.
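A minimal sketch of step 4 follows, assuming that CosineSimilar is the standard cosine similarity over word-frequency vectors aligned on the union of both vocabularies. The function names, the use of `collections.Counter` in place of the hash-table Map, and the sample stop word are choices made for this illustration only.

```python
import math
from collections import Counter


def word_freq(words, stop_words=frozenset()):
    """Count word frequencies, skipping excluded high-frequency single-character words."""
    return Counter(w for w in words if w not in stop_words)


def cosine_similar(cur, his):
    """Cosine similarity of two word-frequency maps.

    Both vectors are laid out over the union of the two vocabularies,
    so a word missing from one side contributes a frequency of 0.
    """
    index = sorted(set(cur) | set(his))
    c0 = [cur.get(w, 0) for w in index]
    c1 = [his.get(w, 0) for w in index]
    dot = sum(a * b for a, b in zip(c0, c1))
    norm = math.sqrt(sum(a * a for a in c0)) * math.sqrt(sum(b * b for b in c1))
    # An empty document has no direction; treat it as not similar.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    cur = word_freq(["grid", "load", "grid", "的"], stop_words={"的"})
    his = word_freq(["grid", "load", "load"])
    print(round(cosine_similar(cur, his), 3))
```

In the usage example the current project has frequencies grid = 2, load = 1 and the history project has grid = 1, load = 2, so the dot product is 4, each vector has length sqrt(5), and the similarity is about 0.8.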
As a further improvement, in step 1 the "electron cloud" (Electron Cloud) is connected to the cloud server via Ethernet.
As a further improvement, in step 2 the CPU has two or more cores.
As a further improvement, in step 3 Max_Score is the maximum score, and max{Cur_FScore, Cur_RScore} returns the maximum value via max.
As a further improvement, in step 4 c0 is the word-frequency vector of the current project being checked and c1 is the word-frequency vector of the history project.
The electrical components mentioned herein are electrically connected to an external main controller and to 220 V mains power; the main controller may be a conventional, known control device such as a computer.
In conclusion, the project duplicate-checking method and system based on concurrent tasks use Internet technology to analyze trending internet words and everyday expressions dynamically and to build a cloud dictionary. The text of the declaration material is matched against the cloud dictionary by a string matching method and cut into semantic word-segmentation factors; the best segmentation scheme is obtained by weighted calculation, word frequencies are counted, and high-frequency single-character ("monosyllabic") words are excluded. The word subset of the current project and the word subset of a history project are passed to the cosine similarity algorithm CosineSimilar, which returns the similarity value between the two. For large data volumes, large-capacity high-speed memory and careful memory management reduce frequent hard-disk reads and writes, and concurrent multi-thread tasks make full use of system resources and CPU capacity, thereby improving duplicate-checking efficiency. The method thus achieves efficient duplicate checking by making full use of system resources. In addition, the trending words and everyday expressions collected through cloud technology give the string matching method strong support and improve the segmentation accuracy of the declaration material, and the approach is highly scalable and supports checking several documents for duplication at the same time.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims.

Claims (5)

1. A project duplicate-checking method and system based on concurrent tasks, characterized by comprising the following steps:
Step 1: Process in a distributed manner, borrowing the "electron cloud" (Electron Cloud) technique from quantum physics. Exploiting characteristics of the electron cloud such as diffusivity and simultaneity, collect everyday expressions and their popularity ("heat") from the internet, transmit them to a cloud server for dynamic analysis, and save the parsed words as a cloud dictionary ordered by heat.
Step 2: Open concurrent multi-thread tasks. From the processor details, the CPU utilization and the memory usage, combined with a concurrency parameter (default = 2), calculate the number of concurrent threads that can be opened, Num_Threads, while reserving kernel threads so that the system keeps running normally. Whenever a later step computes over a large volume of data, the system automatically uses concurrent multi-thread tasks, making full use of system resources and CPU capacity to improve duplicate-checking efficiency.
Step 3: Split the declaration material currently being checked into a paragraph set, where Cur_Sen is the paragraph set of the declaration material and Sen_1, Sen_2, ..., Sen_n are the split paragraphs. Using the forward ("positive") matching method together with the cloud dictionary, parse each paragraph into a set of semantic word segments, where Cur_Sen_i_F is the word-segment set of paragraph i under forward matching; Word_1, Word_2, ..., Word_n are the segments of the paragraph; and i = 1, 2, ..., n is the paragraph index. Calculate the matching weight score from the heat values in the dictionary, where Cur_FScore is the total weighted score of the paragraph under forward matching; sum{hot(Word_i)^2} computes the weighted score of each segment with the hot function and then totals the scores with the sum function; and i = 1, 2, ..., n is the segment index. A word that is not in the dictionary is given a matching weight score of 0. Using the reverse matching method together with the cloud dictionary, parse each paragraph into a set of semantic word segments, where Cur_Sen_i_R is the word-segment set of paragraph i under reverse matching, with Word_1, Word_2, ..., Word_n and i defined as above. Calculate the matching weight score from the heat values in the dictionary in the same way, where Cur_RScore is the total weighted score of the paragraph under reverse matching, computed as sum{hot(Word_i)^2}; a word that is not in the dictionary is again given a score of 0. Finally, take the segmentation scheme with the larger score, and take the former (forward matching) scheme when the two scores are equal: Max_Score = max{Cur_FScore, Cur_RScore}. Repeat the calculation until all paragraphs have been processed, and save the word-segment sets to the database for later reuse. Likewise, if the segmentation result for the declaration material of a history project is empty, parse that material with the method of step 3 as well, calculate the best segmentation scheme, and store it in the database.
Step 4: Using a statistical segmentation algorithm, look up the sets of segmentation-factor indexes for the current project and for the history project, count word frequencies, and exclude high-frequency single-character ("monosyllabic") words. Cur_Word_Index is the word-frequency set of the project being checked, where W_ID_1, W_ID_2, ..., W_ID_n are the segmentation-factor indexes and Num_1, Num_2, ..., Num_n are the corresponding word frequencies. His_Word_Index is the word-frequency set of the history project, with W_ID_1, W_ID_2, ..., W_ID_n and Num_1, Num_2, ..., Num_n defined in the same way. Through the Map interface of a hash table, calculate the word-frequency vector of the current project, c0 = [Num_1, Num_2, ..., Num_n], and the word-frequency vector of the history project, c1 = [Num_1, Num_2, ..., Num_n], building both vectors over the union of the two index sets, where Index is the index number of each segmentation factor. The cosine similarity algorithm CosineSimilar then returns the similarity value between the current project and the history project; the closer the value is to 1, the higher the similarity.
2. The project duplicate-checking method and system based on concurrent tasks according to claim 1, characterized in that: in step 1 the "electron cloud" (Electron Cloud) is connected to the cloud server via Ethernet.
3. The project duplicate-checking method and system based on concurrent tasks according to claim 1, characterized in that: in step 2 the CPU has two or more cores.
4. The project duplicate-checking method and system based on concurrent tasks according to claim 1, characterized in that: in step 3 Max_Score is the maximum score, and max{Cur_FScore, Cur_RScore} returns the maximum value via max.
5. The project duplicate-checking method and system based on concurrent tasks according to claim 1, characterized in that: in step 4 c0 is the word-frequency vector of the current project being checked and c1 is the word-frequency vector of the history project.
CN201910287630.4A 2019-04-11 2019-04-11 A kind of project duplicate checking method and system based on concurrent tasks Pending CN110033236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910287630.4A CN110033236A (en) 2019-04-11 2019-04-11 A kind of project duplicate checking method and system based on concurrent tasks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910287630.4A CN110033236A (en) 2019-04-11 2019-04-11 A kind of project duplicate checking method and system based on concurrent tasks

Publications (1)

Publication Number Publication Date
CN110033236A true CN110033236A (en) 2019-07-19

Family

ID=67238022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910287630.4A Pending CN110033236A (en) 2019-04-11 2019-04-11 A kind of project duplicate checking method and system based on concurrent tasks

Country Status (1)

Country Link
CN (1) CN110033236A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778110A (en) * 2012-10-25 2014-05-07 三星电子(中国)研发中心 Method and system for converting simplified Chinese characters into traditional Chinese characters
CN104102626A (en) * 2014-07-07 2014-10-15 厦门推特信息科技有限公司 Method for computing semantic similarities among short texts
US20160358094A1 (en) * 2015-06-02 2016-12-08 International Business Machines Corporation Utilizing Word Embeddings for Term Matching in Question Answering Systems
CN107153658A (en) * 2016-03-03 2017-09-12 常州普适信息科技有限公司 A kind of public sentiment hot word based on weighted keyword algorithm finds method
CN107122340A (en) * 2017-03-30 2017-09-01 浙江省科技信息研究院 A kind of similarity detection method for the science and technology item return analyzed based on synonym
CN108536530A (en) * 2018-04-02 2018-09-14 北京中电普华信息技术有限公司 A kind of multithreading method for scheduling task and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541057A (en) * 2019-09-04 2021-03-23 上海晶赞融宣科技有限公司 Distributed new word discovery method and device, computer equipment and storage medium
CN112001161A (en) * 2020-08-25 2020-11-27 上海新炬网络信息技术股份有限公司 Text duplicate checking method
CN112001161B (en) * 2020-08-25 2024-01-19 上海新炬网络信息技术股份有限公司 Text duplicate checking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191219

Address after: 250001, No. two, No. 150, Ji'nan, Shandong

Applicant after: STATE GRID SHANDONG ELECTRIC POWER Co.

Applicant after: SHANDONG LUNENG SOFTWARE TECHNOLOGY Co.,Ltd.

Address before: 250001, No. two, No. 150, Ji'nan, Shandong

Applicant before: STATE GRID SHANDONG ELECTRIC POWER Co.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190719
