CN110580279A - Information classification method, system, equipment and storage medium - Google Patents

Information classification method, system, equipment and storage medium

Info

Publication number
CN110580279A
Authority
CN
China
Prior art keywords
data object
chinese
module
content
information classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910762982.0A
Other languages
Chinese (zh)
Inventor
刘跃华 (Liu Yuehua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Zheng Yu Software Technology Development Co Ltd
Original Assignee
Hunan Zheng Yu Software Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-19
Filing date
2019-08-19
Publication date
2019-12-17
Application filed by Hunan Zheng Yu Software Technology Development Co Ltd
Priority to CN201910762982.0A
Publication of CN110580279A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information classification method, system, equipment and storage medium, and relates to the field of information technology. The system comprises a topic database module, a keyword database module, a Chinese word segmentation module and a Chinese noun scoring module. The topic database module assigns the content of a data object to a topic; the keyword database module extracts the keywords related to the data object content; the Chinese word segmentation module extracts the Chinese nouns from the data object content; and the Chinese noun scoring module calculates the weight of each Chinese noun in the data object content. By analyzing the content of a data object, the system extracts the topics and keywords to which it relates and pushes the data object to different users according to those topics and keywords for subsequent reading or processing.

Description

Information classification method, system, equipment and storage medium
Technical Field
The present invention relates to the field of information technology, and in particular, to an information classification method, system, device, and storage medium.
Background
Although resources on the internet are abundant and there are many ways to acquire information, it is very difficult for people without background knowledge in a field to obtain information in that field, and even harder to obtain the precise information they need. Information acquisition is also highly redundant: in daily life, the same or similar news items, often worded identically, appear repeatedly across the major internet sites. Because so much similar information recurs, users spend too long finding the precise information they need, which degrades their experience. As the level of informatization rises rapidly, the value of information should be fully and effectively exploited so that precise information can be grasped quickly and accurately and information consumption is promoted. It is therefore desirable to help users remove redundant network information, speed up information retrieval, obtain precise information and save time. Organizations that sort and route information documents face the same problem of highly redundant information acquisition, and it urgently needs to be solved. In addition, even when information resources in the relevant field are obtained, the public finds it difficult to judge the validity and accuracy of that information, all of which limits, to varying degrees, how deeply people can make use of it.
For a server that pushes information to users, the acquired data is disorganized and redundant, occupies database storage space, and lacks content depth, so strong stickiness between the data objects and the users cannot be established.
Disclosure of Invention
1. Technical problem to be solved by the invention
To overcome these technical problems, the invention provides an information classification method, system, equipment and storage medium. By analyzing the content of a data object, the topics and keywords to which it relates are extracted, and the data object is pushed to different users according to those topics and keywords for subsequent reading or processing.
2. Technical scheme
In order to solve the problems, the technical scheme provided by the invention is as follows:
In a first aspect, the invention provides an information classification system comprising a topic database module, a keyword database module, a Chinese word segmentation module and a Chinese noun scoring module. The topic database module assigns the content of a data object to a topic; the keyword database module extracts the keywords related to the data object content; the Chinese word segmentation module extracts the Chinese nouns from the data object content; and the Chinese noun scoring module calculates the weight of each Chinese noun in the data object content.
In a further improvement, the topic database module is configured to assign the data object content to a topic.
In a further improvement, the topics include juvenile and child protection, entertainment and the like.
In a further improvement, the keyword database module stores the latest trending keywords in the database.
In a further improvement, the Chinese word segmentation module uses the IKAnalyzer Chinese word segmenter, together with a Chinese lexicon for the domain of the collected data objects, to extract the Chinese nouns from the data object content.
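IKAnalyzer is a Java library, and neither its algorithm nor its API is reproduced here. As a minimal sketch of the underlying idea (dictionary-driven segmentation followed by noun filtering), the Python below swaps in forward maximum matching, a simpler dictionary technique. The lexicon format (a word mapped to a part-of-speech tag, with "n" marking nouns), the function names and the `max_len` parameter are illustrative assumptions, not part of the original disclosure.

```python
from typing import Dict, List


def segment_fmm(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon
    word of at most max_len characters, else fall back to one character."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # single-character fallback
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


def extract_nouns(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[str]:
    """Keep the tokens that the (hypothetical) lexicon tags as nouns ("n")."""
    return [tok for tok in segment_fmm(text, lexicon, max_len) if lexicon.get(tok) == "n"]
```

For example, with the toy lexicon {"未成年人": "n", "学校": "n", "加强": "v", "保护": "v"}, the text "加强学校未成年人保护" segments into 加强 / 学校 / 未成年人 / 保护, of which only 学校 and 未成年人 are kept as nouns. Greedy longest match can mis-split ambiguous strings, so a production segmenter would need additional disambiguation.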
In a further improvement, the Chinese noun scoring module calculates the weight of each word in the data object content with the TF-IDF weighting algorithm and selects the keywords from the scored words.
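The disclosure names TF-IDF but does not fix its exact variant or how many keywords are kept. The sketch below assumes tf = term count / document length and the smoothed idf = ln((1 + N) / (1 + df)) + 1, where N is the number of documents in a reference corpus and df is the number of those documents containing the term. The reference corpus, this idf variant and the cut-off `k` are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tfidf_scores(doc: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Weight each term of doc by tf * smoothed idf against a reference corpus."""
    n_docs = len(corpus)
    doc_freq: Counter = Counter()
    for other in corpus:
        doc_freq.update(set(other))  # count each term once per corpus document
    counts = Counter(doc)
    total = len(doc)
    return {
        term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in counts.items()
    }


def top_keywords(doc: Sequence[str], corpus: Sequence[Sequence[str]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Rank terms by weight (ties broken alphabetically) and keep the top k."""
    scores = tfidf_scores(doc, corpus)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Against the toy corpus [["school", "game"], ["school", "policy"], ["minor", "school"]], the document ["minor", "minor", "school"] gives "school" a weight of exactly 1/3 (it occurs in every corpus document, so its idf is 1), while "minor" scores higher and is returned first by `top_keywords(doc, corpus, k=1)`.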
In a second aspect, the invention provides an information classification method. A data object content classifier selects the data object content to be classified and submits it to the above information classification system for analysis. The system automatically invokes the Chinese word segmentation module to segment the data object content, uses the Chinese noun scoring module to score and then rank all Chinese nouns, and selects the highest-weighted words as a keyword list. From the keyword list, the topic database and the keyword database, it determines the topic and keywords to which the data object content belongs, uses the topic and keywords to identify the related users, and recommends the data object content to the users who follow them. Finally, the data object content is partitioned by topic and keyword and stored in the database in Key-Value form.
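The method steps that follow keyword extraction (topic and keyword matching, selection of related users, and Key-Value storage) can be sketched with in-memory stand-ins for the databases. The overlap rule for choosing a topic, the user-interest mapping and the `topic/keyword` key layout are hypothetical choices made for illustration; the disclosure does not specify them.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Set, Tuple


def classify(keywords: List[str], topic_db: Dict[str, Set[str]],
             keyword_db: Set[str]) -> Tuple[Optional[str], List[str]]:
    """Return (topic, matched keywords) for one data object's keyword list.

    topic_db maps each topic to a hypothetical set of indicative terms.
    """
    kw_set = set(keywords)
    # Topic whose terms overlap the keyword list most; sorted() makes ties alphabetical.
    topic = max(sorted(topic_db), key=lambda t: len(topic_db[t] & kw_set), default=None)
    if topic is not None and not topic_db[topic] & kw_set:
        topic = None  # no topic term occurs in the keyword list
    matched = [kw for kw in keywords if kw in keyword_db]
    return topic, matched


def related_users(topic: Optional[str], matched: List[str],
                  interests: Dict[str, Set[str]]) -> List[str]:
    """Users whose followed labels include the topic or any matched keyword."""
    labels = set(matched) | ({topic} if topic else set())
    return sorted(user for user, follows in interests.items() if follows & labels)


def store(kv: DefaultDict[str, List[str]], object_id: str,
          topic: Optional[str], matched: List[str]) -> None:
    """Key-Value storage: key 'topic/keyword' -> list of data object ids."""
    for kw in matched or [""]:
        kv[f"{topic or 'unclassified'}/{kw}"].append(object_id)


if __name__ == "__main__":
    topic_db = {"juvenile and child protection": {"minor", "school"},
                "entertainment": {"game", "film"}}
    topic, matched = classify(["minor", "school", "policy"], topic_db, {"minor", "game"})
    users = related_users(topic, matched, {"alice": {"juvenile and child protection"},
                                           "bob": {"game"}, "carol": {"minor"}})
    kv: DefaultDict[str, List[str]] = defaultdict(list)
    store(kv, "doc-1", topic, matched)
    print(topic, matched, users, dict(kv))
```

In the demo, `classify` picks "juvenile and child protection" (two overlapping terms versus none) and keeps "minor" as the matched keyword; alice and carol are selected because they follow the topic or the keyword, and the object is filed under the key "juvenile and child protection/minor".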
In a further improvement, recommending the data object content to interested users further comprises ranking the data object contents by their matched weights before recommending them to those users.
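Ranking by "matched weight" is not defined further in the text. One plausible reading, assumed here, is that each data object carries keyword weights (for instance its TF-IDF scores) and that its matched weight for a user is the sum of the weights of the keywords that user follows.

```python
from typing import Dict, List, Set


def rank_for_user(objects: Dict[str, Dict[str, float]], follows: Set[str]) -> List[str]:
    """Rank data objects for one user by matched weight, highest first.

    objects maps an object id to its keyword weights (e.g. TF-IDF scores);
    the matched weight is the sum of weights of keywords the user follows.
    Objects with no matching keyword are not recommended.
    """
    matched = {
        obj_id: sum(w for kw, w in weights.items() if kw in follows)
        for obj_id, weights in objects.items()
    }
    # Highest matched weight first; ties broken by object id.
    ranked = sorted(matched, key=lambda obj_id: (-matched[obj_id], obj_id))
    return [obj_id for obj_id in ranked if matched[obj_id] > 0]
```

For objects a = {minor: 0.5, game: 0.2}, b = {minor: 0.9} and c = {film: 0.7}, a user following minor and game is recommended b first, then a; c is dropped because none of its keywords match.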
In a third aspect, the present invention provides an apparatus comprising: one or more processors; memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to perform a method as described above.
In a fourth aspect, the invention provides a storage medium storing a computer program which, when executed by a processor, performs the method described above.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects:
With this technical solution, the topics and keywords related to a data object are extracted by analyzing its content, and the data object is pushed to different users according to those topics and keywords for subsequent reading or processing.
Drawings
Fig. 1 is a schematic structural diagram of an information classification system provided in embodiment 1.
Fig. 2 is a flowchart of an information classification method provided in embodiment 1.
Fig. 3 is a flowchart of an information classification method provided in embodiment 2.
Fig. 4 is a schematic diagram of an apparatus according to the present invention.
Detailed Description
For a further understanding of the present invention, reference will now be made in detail to the embodiments illustrated in the drawings.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
The terms first, second, and the like in the present invention are provided for convenience of describing the technical solution of the present invention, and have no specific limiting effect, but are all generic terms, and do not limit the technical solution of the present invention.
It should be noted that the embodiments of the present application, and the features within them, may be combined with each other provided they do not conflict. The present application is described in detail below through embodiments with reference to the attached drawings.
Example 1
As shown in Fig. 1, in a first aspect, this embodiment provides an information classification system comprising a topic database module, a keyword database module, a Chinese word segmentation module and a Chinese noun scoring module. The topic database module assigns the data object content to a topic; the keyword database module extracts the keywords related to the data object content; the Chinese word segmentation module extracts the Chinese nouns from the data object content; and the Chinese noun scoring module calculates the weight of each Chinese noun in the data object content.
The topic database module assigns the data object content to a topic; the topics mainly include juvenile and child protection, entertainment and the like. The keyword database module stores the latest trending keywords in the database. The Chinese word segmentation module uses the IKAnalyzer Chinese word segmenter, together with a Chinese lexicon for the domain of the collected data objects, to extract the Chinese nouns from the data object content. The Chinese noun scoring module calculates the weight of each word in the data object content with the TF-IDF weighting algorithm and selects the keywords from the scored words.
In a second aspect, this embodiment further provides an information classification method. As shown in Fig. 2, a data object content classifier selects the data object content to be classified and submits it to the above information classification system for analysis. The system automatically invokes the Chinese word segmentation module to segment the data object content, uses the Chinese noun scoring module to score and then rank all Chinese nouns, and selects the highest-weighted words as a keyword list. From the keyword list it determines the topic and keywords to which the data object content belongs, uses the topic and keywords to identify the related users, and recommends the data object content to the users who follow them; the data object content is then partitioned by topic and keyword and stored in the database in Key-Value form. For a server that pushes information to users, classifying the acquired data in this way removes redundant data, reduces the database storage space occupied, increases the content depth of the data objects, and builds deeper stickiness with users.
In a further improvement, recommending the data object content to interested users further comprises ranking the data object contents by their matched weights before recommending them to those users.
Example 2
Conventionally, enterprises and public institutions handle proposals manually: staff read the content of each proposal to judge the issues (topics) it concerns, and then decide which unit, and at which level, should handle it. This is time-consuming, labor-intensive and inefficient, and the accuracy with which proposals are assigned to units is low.
This embodiment provides an information classification system comprising a topic database module, a region database module, a Chinese word segmentation module and a Chinese noun scoring module. The topic database module assigns a proposal to a topic; the region database module (corresponding to the keyword database module) extracts the place names mentioned in the proposal; the Chinese word segmentation module extracts the Chinese nouns from the proposal; and the Chinese noun scoring module calculates the weight of each Chinese noun in the proposal. The weight scoring module shown in Fig. 3 is the Chinese noun scoring module. The topic database module assigns the proposal to one of the accumulated topics, for example juvenile and child protection.
The region database module stores the latest nationally published region database, which covers all provinces, cities, counties, townships, towns and sub-districts (streets) in the country, together with collected names of gardens and buildings nationwide; this serves as the place-name database for extracting the place names mentioned in the proposal. The Chinese word segmentation module uses the IKAnalyzer Chinese word segmenter, together with the collected Chinese lexicon, to extract the Chinese nouns from the proposal; here the Chinese lexicon is an information lexicon. The Chinese noun scoring module calculates the weight of each word in the proposal with the TF-IDF weighting algorithm and selects the keywords from the scored words.
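Extracting place names against the region database amounts to dictionary matching over the proposal text. A minimal sketch, assuming the database is available as a plain set of name strings (the function name is hypothetical), scans left to right and takes the longest name starting at each position, so that, for example, 长沙市 wins over 长沙.

```python
from typing import List, Set


def extract_place_names(text: str, place_db: Set[str]) -> List[str]:
    """Left-to-right longest-match scan; returns place names in order of appearance."""
    if not place_db:
        return []
    longest = max(len(name) for name in place_db)
    found: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible name first, shrinking down to one character.
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in place_db:
                found.append(text[i:i + size])
                i += size
                break
        else:
            i += 1  # no place name starts at this position
    return found
```

With the toy database {"湖南省", "长沙", "长沙市", "岳麓区"}, the text "湖南省长沙市岳麓区某小区" yields 湖南省, 长沙市 and 岳麓区 in that order. A full national database would be large enough that an automaton such as Aho-Corasick becomes the more efficient design choice.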
As shown in Fig. 3, an information classification method is also provided. A proposal clerk selects a proposal to be processed and submits it to the system described above, which is intended to improve the accuracy of proposal assignment. The system automatically invokes the Chinese word segmentation module to segment the proposal content, uses the Chinese noun scoring module to score and then rank all Chinese nouns, and selects the highest-weighted words as a keyword list. It compares the keyword list with the topic database to obtain the topic of the proposal, determines the regions mentioned in the proposal from the place-name database, determines the relevant handling units from the topic and the place names, and recommends the list of relevant handling units to the proposal clerk.
The relevant handling units are government departments, such as education departments, education bureaus, human resources departments and civil affairs offices, which handle topics such as juvenile and child protection. The list of relevant handling units is ranked by matched weight before being recommended to the proposal clerk.
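The text ranks handling units by matched weight without defining the weight. The sketch below assumes one possible scheme: a unit must handle at least one of the proposal's topics, its score is the sum of those topics' weights plus a hypothetical `region_bonus` for each proposal region it covers, and a unit restricted to other regions is excluded. The `Unit` type, the unit names and the bonus value are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Unit:
    """A hypothetical handling unit (government department)."""
    name: str
    topics: Set[str]                                # topics the unit handles
    regions: Set[str] = field(default_factory=set)  # empty set: not region-restricted


def recommend_units(topic_weights: Dict[str, float], regions: Set[str],
                    units: List[Unit], region_bonus: float = 0.5) -> List[str]:
    """Return unit names ordered by matched weight, highest first."""
    ranked: List[Tuple[float, str]] = []
    for unit in units:
        # Sum the weights of the proposal topics this unit handles.
        topic_score = sum(w for t, w in topic_weights.items() if t in unit.topics)
        if topic_score <= 0:
            continue  # the unit handles none of the proposal's topics
        covered = regions & unit.regions
        if unit.regions and regions and not covered:
            continue  # a local unit responsible for other regions
        ranked.append((topic_score + region_bonus * len(covered), unit.name))
    # Highest score first; ties broken by name for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranked]
```

With a topic weight of 1.0 for juvenile and child protection and a proposal mentioning Changsha, a Changsha education bureau (score 1.5) ranks above a province-wide education department (score 1.0), while a Zhuzhou office and a unit handling only employment are left out.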
Example 3
An apparatus, the apparatus comprising: one or more processors; memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to perform a method as described above.
A storage medium storing a computer program which, when executed by a processor, implements the method as described in the above embodiments.
Fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
As shown in Fig. 4, as another aspect, the present application also provides an apparatus 500 including one or more Central Processing Units (CPUs) 501 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the apparatus 500. The CPU 501, ROM 502 and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments disclosed herein, the method described in any of the above embodiments may be implemented as a computer software program. For example, embodiments disclosed herein include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method described in any of the embodiments above. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
As yet another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus of the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described herein.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented in software or hardware. The described units or modules may also be provided in a processor; for example, each of the described units may be a software program running on a computer or a mobile intelligent device, or may be a separately configured hardware device. The name of a unit or module does not, in some cases, constitute a limitation on the unit or module itself.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the present application. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. An information classification system, characterized by comprising a topic database module, a keyword database module, a Chinese word segmentation module and a Chinese noun scoring module, wherein the topic database module is used for assigning the content of a data object to a topic; the keyword database module is used for extracting keywords related to the data object content; the Chinese word segmentation module is used for extracting the Chinese nouns in the data object content; and the Chinese noun scoring module is used for calculating the weight of each Chinese noun in the data object content.
2. The information classification system as claimed in claim 1, wherein the subject database module is configured to divide the content of the data object into a plurality of subjects.
3. The information classification system according to claim 2, wherein the topics are juvenile and child protection and entertainment.
4. The information classification system according to claim 1, wherein the keyword database module is configured to store the latest trending keywords in the database.
5. The information classification system according to claim 1, wherein the Chinese word segmentation module uses the IKAnalyzer Chinese word segmenter, together with a Chinese lexicon for the domain of the collected data objects, to extract the Chinese nouns from the content of the data object.
6. The information classification system according to claim 1, wherein the Chinese noun scoring module calculates the weight of each word in the data object content with a TF-IDF weight scoring algorithm and selects the keywords from the scored words.
7. An information classification method, characterized in that a data object content classifier selects the data object content to be classified and submits it to the information classification system of claim 1 for analysis; the system automatically invokes the Chinese word segmentation module to segment the data object content, uses the Chinese noun scoring module to score and then rank all Chinese nouns, and selects the highest-weighted words as a keyword list; the topic and keywords to which the data object content belongs are obtained from the keyword list, the topic database and the keyword database; the related users are identified from the topic and keywords, and the data object content is recommended to the interested users; and the data object content is partitioned by topic and keyword and then stored in the database in Key-Value form.
8. The method of claim 7, wherein recommending the data object content to interested users further comprises: ranking the data object contents according to their matched weights and recommending them to the interested users.
9. An apparatus, characterized in that the apparatus comprises:
One or more processors;
A memory for storing one or more programs,
The one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 7-8.
10. A storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 7-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910762982.0A CN110580279A (en) 2019-08-19 2019-08-19 Information classification method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110580279A true CN110580279A (en) 2019-12-17

Family

ID=68811122

Country Status (1)

Country Link
CN (1) CN110580279A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763213A (en) * 2018-05-25 2018-11-06 西南电子技术研究所(中国电子科技集团公司第十研究所) Theme feature text key word extracting method
CN109948040A (en) * 2017-12-04 2019-06-28 北京京东尚科信息技术有限公司 Storage, recommended method and the system of object information, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259259B (en) * 2020-03-11 2021-03-30 郑州工程技术学院 University student news recommendation method, device, equipment and storage medium
CN113535952A (en) * 2021-07-13 2021-10-22 六棱镜(杭州)科技有限公司 Intelligent matching data processing method based on artificial intelligence
CN113535952B (en) * 2021-07-13 2024-02-09 六棱镜(杭州)科技有限公司 Intelligent matching data processing method based on artificial intelligence

Similar Documents

Publication Publication Date Title
WO2017020451A1 (en) Information push method and device
Nguyen et al. Real-time event detection on social data stream
CN107544988B (en) Method and device for acquiring public opinion data
Shimada et al. Analyzing tourism information on twitter for a local city
EP1716511A1 (en) Intelligent search and retrieval system and method
CN109871433B (en) Method, device, equipment and medium for calculating relevance between document and topic
US20180046628A1 (en) Ranking social media content
CN105512300B (en) information filtering method and system
CN111310011A (en) Information pushing method and device, electronic equipment and storage medium
US20150206101A1 (en) System for determining infringement of copyright based on the text reference point and method thereof
CN110580279A (en) Information classification method, system, equipment and storage medium
JP6047365B2 (en) SEARCH DEVICE, SEARCH PROGRAM, AND SEARCH METHOD
CN110851562A (en) Information acquisition method, system, equipment and storage medium
US20170235835A1 (en) Information identification and extraction
KR102413961B1 (en) Method for providing news analysis service using robotic process automation monitoring
US20150193444A1 (en) System and method to determine social relevance of Internet content
CN104615685B (en) A kind of temperature evaluation method of network-oriented topic
KR20170045403A (en) A knowledge management system of searching documents on categories by using weights
Murtagh Semantic Mapping: Towards Contextual and Trend Analysis of Behaviours and Practices.
CN111126034A (en) Medical variable relation processing method and device, computer medium and electronic equipment
US20220292127A1 (en) Information management system
Wang et al. Unsupervised opinion phrase extraction and rating in Chinese blog posts
JP5798086B2 (en) Device, method and program for extracting pairs of place names and words from a document
Uchida et al. Evaluation of retweet clustering method classification method using retweets on Twitter without text data
CN110795943B (en) Topic representation generation method and system for event

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191217