CN114168817A - Semi-supervised learning target identification method - Google Patents

Publication number
CN114168817A
Authority
CN
China
Prior art keywords
data
target
information
terminal
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111306863.8A
Other languages
Chinese (zh)
Other versions
CN114168817B (en)
Inventor
张中
黄俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Zhanda Intelligent Technology Co ltd
Original Assignee
Hefei Zhanda Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Zhanda Intelligent Technology Co ltd
Priority to CN202111306863.8A
Publication of CN114168817A
Application granted
Publication of CN114168817B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semi-supervised learning target identification method in the technical field of semi-supervised learning, which addresses the problem of low target identification accuracy. Target data with different keywords in each field data area are divided into different keyword subsets. The field Li marked on the input data information is matched against the fields Li in the unmarked target database. Keywords are then extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset. Finally, a character number scanning unit scans the character number of the data information, and a character number interval unit screens the data information in the subset. With this layer-by-layer partition screening, the screened target information matches the target information sought by the user more closely, which gives a better matching result and effectively improves target identification accuracy.

Description

Semi-supervised learning target identification method
Technical Field
The invention belongs to the technical field of semi-supervised learning, and particularly relates to a semi-supervised learning target identification method.
Background
Semi-supervised learning is a key research problem in pattern recognition and machine learning. It combines supervised and unsupervised learning: a large amount of unlabelled data is used together with labelled data to perform pattern recognition.
When a semi-supervised learning model identifies and processes targets in use, marked targets can be identified and delivered quickly. For unmarked data targets, however, the model has no coarse-to-fine identification method, so the identified target data are inaccurate, the deviation is large, and a good identification result cannot be achieved.
Disclosure of Invention
To solve the above problems, the invention provides a semi-supervised learning target identification method.
The purpose of the invention can be achieved by the following technical solution: a semi-supervised learning target identification method comprising the following steps:
S1, partitioning model target data: the target data are divided by a semi-supervised learning model into two categories, marked target data and unmarked target data;
S2, preprocessing data: an external user enters the data information to be matched against their personal requirements through a user terminal, and a target identification system receives the input data and extracts it in advance;
S3, matching marked target data: the transmitted data information is preprocessed and the input data are analyzed to detect whether they are target data; if so, they are compared directly with the data in the marked target database, and the matching data are extracted and transmitted directly to an external terminal;
S4, matching unmarked target data: data that could not be identified are processed again; a data processing terminal divides the data in turn by field, keyword and number of characters, and the processed and analyzed data are combined into a mother set; a subset of the unmarked target data is then extracted and matched against the mother set, the character value is compared with the character number interval, and the target value is extracted and transmitted to an external terminal, completing the target identification.
Preferably, the data in S2 are transmitted directly to the information extraction terminal through the user terminal, which may be a mobile terminal of an external person.
Preferably, the preprocessing in S3 is as follows: the extracted data information is compared with the data information in the marked target database; if the comparison finds a match, the marked target data are extracted from the marked target database; if it does not, the data information is transmitted to the data analysis terminal.
Preferably, the analysis in S3 is as follows: the field to which the data information belongs is determined by extracting and matching against the data in an external big database, and the input data are marked with a word naming that field; different fields correspond to different field values, which are marked as Li.
Preferably, the data processing in S4 is performed by a data processing terminal, which is configured to partition the unmarked target database.
Preferably, the target identification system in S2 includes a user terminal, an information extraction terminal, an information preprocessing terminal, a marked target database, an unmarked target database, a data analysis terminal, a data processing terminal, and a target data output terminal;
the data processing terminal comprises a character number scanning unit, a character number interval unit, a keyword extraction unit, a matching unit and a partition marking unit.
Preferably, the specific operation steps of the data processing terminal are as follows:
step one, groups of data belonging to the same field are placed in one field data area, marked as Li, and within each field data area Li the target data are divided again by keyword, so that target data with different keywords fall into different keyword subsets;
step two, the input data information is marked with its field Li by the data analysis terminal, and this field Li is matched against the fields Li in the unmarked target database;
step three, after the field is matched, keywords are extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset;
step four, the character number scanning unit scans the data information to obtain its character number Z, and the character number interval unit screens the data information in the subset;
step five, the screened data information is output through the target data output terminal and transmitted directly to an external mobile terminal.
Preferably, the interval of the character number interval unit is set to (Z - Y, Z + Y).
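The five operation steps above can be sketched in Python as a chain of three filters. This is only an illustrative sketch under stated assumptions, not the patented implementation: the nested-dictionary layout, the substring test used for keyword matching, the function name `identify` and the sample data are all hypothetical, and the parentheses in (Z - Y, Z + Y) are read as an open interval.

```python
# Hypothetical partitioned, unmarked target database (step one already done):
# field label Li -> keyword -> list of stored target data items.
PARTITIONS = {
    "L1": {
        "battery": ["battery swap guide", "battery"],
        "charger": ["charger fault codes"],
    },
    "L2": {"battery": ["battery recycling law"]},
}


def identify(partitions, query_field, query_text, y):
    """Narrow candidates by field, then keyword, then character number."""
    # Step two: keep only the field data area whose label matches the query.
    field_area = partitions.get(query_field, {})
    # Step three: keep keyword subsets whose keyword occurs in the query
    # (substring matching is an assumption; the patent does not specify it).
    candidates = [
        item
        for keyword, subset in field_area.items()
        if keyword in query_text
        for item in subset
    ]
    # Step four: Z is the character number of the query; keep items whose
    # character number lies strictly inside (Z - Y, Z + Y).
    z = len(query_text)
    # Step five: the surviving items are what would be sent to the output terminal.
    return [item for item in candidates if z - y < len(item) < z + y]


if __name__ == "__main__":
    print(identify(PARTITIONS, "L1", "battery swap steps", 3))
```

Each stage only ever shrinks the candidate list, which is the layer-by-layer screening the method relies on.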
Compared with the prior art, the invention has the following beneficial effects. Groups of data belonging to the same field are placed in one field data area, and target data with different keywords in each field data area are divided into different keyword subsets. The field Li marked on the data information is matched against the fields Li in the unmarked target database. Keywords are extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset. The character number scanning unit then scans the data information to obtain its character number Z, and the character number interval unit screens the data information in the subset. With this layer-by-layer partition screening, the screened target information matches the target information sought by the user more closely, which gives a better matching result and effectively improves target identification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a functional block diagram of the present invention;
FIG. 3 is a schematic block diagram of the data processing terminal of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a semi-supervised learning target identification method includes the following steps:
S1, partitioning model target data: the target data are divided by a semi-supervised learning model into two categories, marked target data and unmarked target data;
S2, preprocessing data: an external user enters the data information to be matched against their personal requirements through a user terminal, and a target identification system receives the input data and extracts it in advance;
S3, matching marked target data: the transmitted data information is preprocessed and the input data are analyzed to detect whether they are target data; if so, they are compared directly with the data in the marked target database, and the matching data are extracted and transmitted directly to an external terminal;
S4, matching unmarked target data: data that could not be identified are processed again; a data processing terminal divides the data in turn by field, keyword and number of characters, and the processed and analyzed data are combined into a mother set; a subset of the unmarked target data is then extracted and matched against the mother set, the matching value is compared with a threshold interval to obtain the extracted target value, and the extracted target value is transmitted to an external terminal, completing the target identification.
The semi-supervised learning target identification method is executed by a target identification system.
As shown in FIG. 2, the target identification system includes a user terminal, an information extraction terminal, an information preprocessing terminal, a marked target database, an unmarked target database, a data analysis terminal, a data processing terminal, and a target data output terminal.
The output end of the user terminal is wirelessly connected to the input end of the information extraction terminal. The output end of the information extraction terminal is electrically connected to the input end of the information preprocessing terminal. The information preprocessing terminal is bidirectionally connected to the marked target database, and its output end is electrically connected to the input ends of both the data analysis terminal and the target data output terminal. The output end of the data analysis terminal is electrically connected to the input end of the data processing terminal. The data processing terminal is bidirectionally connected to the unmarked target database, and its output end is electrically connected to the input end of the target data output terminal.
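The connections described for FIG. 2 can be written down as a small directed graph, which makes the two routes to the output terminal easy to check. The sketch below is illustrative only: the identifier names and the `shortest_path` helper are hypothetical, bidirectional links are modeled as a pair of edges, and the data analysis terminal's link to the external big database is left out because it is not part of this connection list.

```python
from collections import deque

# Directed edges follow the stated output-end -> input-end connections.
CONNECTIONS = {
    "user_terminal": ["information_extraction_terminal"],
    "information_extraction_terminal": ["information_preprocessing_terminal"],
    "information_preprocessing_terminal": [
        "marked_target_database",
        "data_analysis_terminal",
        "target_data_output_terminal",
    ],
    "marked_target_database": ["information_preprocessing_terminal"],
    "data_analysis_terminal": ["data_processing_terminal"],
    "data_processing_terminal": [
        "unmarked_target_database",
        "target_data_output_terminal",
    ],
    "unmarked_target_database": ["data_processing_terminal"],
    "target_data_output_terminal": [],
}


def shortest_path(start, goal):
    """Breadth-first search; returns the list of nodes or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in CONNECTIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(shortest_path("user_terminal", "target_data_output_terminal"))
    print(shortest_path("user_terminal", "unmarked_target_database"))
```

The shortest route to the output terminal is the marked-data path through the preprocessing terminal; the unmarked database is only reachable through the analysis and processing terminals.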
External personnel enter information through a user terminal, which transmits the data information directly to the information extraction terminal; the user terminal may be a mobile terminal of the external person.
The information extraction terminal extracts the data information entered by the user and transmits the extracted content directly to the information preprocessing terminal. Extraction consists of removing modal particles (filler words) and symbols from the input.
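The extraction step can be sketched as a small text-cleaning function. This is a hedged illustration rather than the patent's procedure: the particle list `MODAL_PARTICLES` is hypothetical (the patent does not enumerate which words are removed), "symbols" is taken to mean every character that is neither a word character nor whitespace, and the function name `extract` is invented here.

```python
import re

# Hypothetical list of sentence-final particles to strip; an assumption only.
MODAL_PARTICLES = ("吗", "呢", "吧", "啊", "呀", "嘛")


def extract(raw, particles=MODAL_PARTICLES):
    """Remove the assumed particles and all symbols, then normalize spaces."""
    for particle in particles:
        raw = raw.replace(particle, "")
    # In Python 3, \w is Unicode-aware for str patterns, so CJK characters
    # are kept while ASCII and full-width punctuation are removed.
    cleaned = re.sub(r"[^\w\s]", "", raw)
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(extract("电池怎么换呢？"))
    print(extract("  battery,  swap!! "))
```

Plain character replacement can also delete a particle character that happens to occur inside an ordinary word, so a real system would probably rely on word segmentation first.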
The information preprocessing terminal preprocesses the extracted data information as follows: the extracted data information is compared with the data information in the marked target database. If the comparison finds a match, the marked target data are extracted from the marked target database and output directly through the target data output terminal. If it does not, the data information is transmitted to the data analysis terminal.
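The routing decision made by the preprocessing terminal can be sketched as a lookup with a fallback. The sketch assumes, purely for illustration, that the marked target database is a dictionary keyed by the extracted query text and that the comparison is an exact match; the patent does not state how the comparison is performed, and the names `preprocess` and `marked_db` are hypothetical.

```python
def preprocess(query, marked_db):
    """Route a query: output marked data on a match, otherwise analyze it."""
    if query in marked_db:
        # Match found: the marked target data go straight to the output terminal.
        return ("output", marked_db[query])
    # No match: the query is handed on to the data analysis terminal.
    return ("analyze", query)


if __name__ == "__main__":
    marked_db = {"battery swap": "Marked record: battery swap procedure"}
    print(preprocess("battery swap", marked_db))
    print(preprocess("charger fault", marked_db))
```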
The data analysis terminal is wirelessly connected to an external big database and analyzes the data information as follows: it determines the field to which the data information belongs by extracting and matching against the data in the external big database, and marks the input data with a word naming that field. The marking scheme is the same as the one used inside the data processing terminal: different fields correspond to different field values, marked as Li, where i denotes the field, i = 1, 2, ..., n, and n is a positive integer.
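One way to picture the field-marking step is a vocabulary vote. The following sketch is an assumption-laden illustration: the external big database is modeled as a dictionary from field label Li to a set of field terms, the field with the most term hits wins, and the name `mark_field` is hypothetical. The patent does not describe the actual matching rule.

```python
def mark_field(text, field_vocab):
    """Return the field label Li whose vocabulary matches the text best."""
    best_label, best_hits = None, 0
    for label, vocab in field_vocab.items():
        # Count how many of this field's terms occur in the text.
        hits = sum(1 for term in vocab if term in text)
        if hits > best_hits:
            best_label, best_hits = label, hits
    # None means no field term was found, so the text stays unmarked.
    return best_label


if __name__ == "__main__":
    field_vocab = {
        "L1": {"battery", "charger"},
        "L2": {"invoice", "refund"},
    }
    print(mark_field("battery charger fault", field_vocab))
    print(mark_field("refund status", field_vocab))
```

On a tie, the field that appears first in the dictionary is kept, which is an arbitrary choice made for this sketch.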
As shown in FIG. 3, the data processing terminal is configured to partition the unmarked target database: groups of data belonging to the same field are placed in one field data area, marked as Li, and within each field data area Li the target data are divided again by keyword, so that target data with different keywords fall into different keyword subsets.
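The two-level partition described here maps naturally onto a nested dictionary. The sketch below assumes each unmarked record already carries a field label and a keyword, which the patent does not spell out; the tuple layout and the name `partition` are hypothetical.

```python
from collections import defaultdict


def partition(records):
    """Group (field_label, keyword, text) records by field, then by keyword."""
    areas = defaultdict(lambda: defaultdict(list))
    for field_label, keyword, text in records:
        areas[field_label][keyword].append(text)
    # Convert to plain dicts so later lookups of unknown keys do not
    # silently create empty areas or subsets.
    return {label: dict(subsets) for label, subsets in areas.items()}


if __name__ == "__main__":
    records = [
        ("L1", "battery", "battery swap guide"),
        ("L1", "charger", "charger fault codes"),
        ("L2", "battery", "battery recycling law"),
        ("L1", "battery", "battery storage tips"),
    ]
    print(partition(records))
```

Note that the same keyword can appear under several fields, so a keyword subset is only meaningful inside its field data area.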
The data processing terminal also compares the analyzed data information against the partitioned database. The input data information has been marked with its field Li by the data analysis terminal, and this field Li is matched against the fields Li in the unmarked target database.
After the field is matched, keywords are extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset.
The character number scanning unit then scans the data information to obtain its character number Z, and the character number interval unit screens the data information in the subset; the character number interval is set to (Z - Y, Z + Y), where the value Y is set externally by an operator.
The character number scanning unit also scans the character number of each data item in the subset, and the character number interval unit screens the items by that number; the screened data information is output through the target data output terminal and transmitted directly to the external mobile terminal.
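The character-number screen is simple enough to state exactly, and doing so exposes how the choice of Y behaves at the boundaries. The sketch below reads (Z - Y, Z + Y) as an open interval, which is an assumption based on the parentheses; the function name `screen_by_length` is hypothetical.

```python
def screen_by_length(query, subset, y):
    """Keep items whose character number lies strictly inside (Z - Y, Z + Y)."""
    z = len(query)  # Z: character number of the input data information
    return [item for item in subset if z - y < len(item) < z + y]


if __name__ == "__main__":
    subset = ["abc", "abcd", "abcde", "abcdef", "abcdefg"]
    print(screen_by_length("12345", subset, 2))
    print(screen_by_length("12345", subset, 1))
```

Under the open-interval reading, Y = 1 keeps only items with exactly Z characters and Y = 0 keeps nothing, so an operator would normally choose Y of at least 2 to allow any tolerance.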
The data processing terminal comprises a character number scanning unit, a character number interval unit, a keyword extraction unit, a matching unit and a partition marking unit.
All of the above formulas are evaluated on dimensionless numerical values. Each formula was obtained by collecting a large amount of data and running software simulation to approximate the real situation as closely as possible, and the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to the actual situation or obtained by simulation on a large amount of data.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and there may be other divisions when the actual implementation is performed; the modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the method of the embodiment.
It will also be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by a single unit or means in software or hardware. Terms such as "first" and "second" are used only as names and do not denote any particular order.
Finally, it should be noted that the above examples are only intended to illustrate the technical process of the present invention and not to limit the same, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical process of the present invention without departing from the spirit and scope of the technical process of the present invention.

Claims (8)

1. A semi-supervised learning target identification method, characterized by comprising the following steps:
S1, partitioning model target data: the target data are divided by a semi-supervised learning model into two categories, marked target data and unmarked target data;
S2, preprocessing data: an external user enters the data information to be matched against their personal requirements through a user terminal, and a target identification system receives the input data and extracts it in advance;
S3, matching marked target data: the transmitted data information is preprocessed and the input data are analyzed to detect whether they are target data; if so, they are compared directly with the data in the marked target database, and the matching data are extracted and transmitted directly to an external terminal;
S4, matching unmarked target data: data that could not be identified are processed again; a data processing terminal divides the data in turn by field, keyword and number of characters, and the processed and analyzed data are combined into a mother set; a subset of the unmarked target data is then extracted and matched against the mother set, the character value is compared with the character number interval, and the target value is extracted and transmitted to an external terminal, completing the target identification.
2. The semi-supervised learning target identification method according to claim 1, wherein the data in S2 are transmitted directly to the information extraction terminal through the user terminal, and the user terminal is a mobile terminal of an external person.
3. The semi-supervised learning target identification method according to claim 1, wherein the preprocessing in S3 is as follows: the extracted data information is compared with the data information in the marked target database; if the comparison finds a match, the marked target data are extracted from the marked target database; if it does not, the data information is transmitted to the data analysis terminal.
4. The semi-supervised learning target identification method according to claim 1, wherein the analysis in S3 is as follows: the field to which the data information belongs is determined by extracting and matching against the data in an external big database, and the input data are marked with a word naming that field; different fields correspond to different field values, which are marked as Li.
5. The semi-supervised learning target identification method according to claim 4, wherein the data processing in S4 is performed by a data processing terminal, and the data processing terminal is configured to partition the unmarked target database.
6. The semi-supervised learning target identification method according to claim 1, wherein the target identification system in S2 comprises a user terminal, an information extraction terminal, an information preprocessing terminal, a marked target database, an unmarked target database, a data analysis terminal, a data processing terminal and a target data output terminal;
the data processing terminal comprises a character number scanning unit, a character number interval unit, a keyword extraction unit, a matching unit and a partition marking unit.
7. The semi-supervised learning target identification method according to claim 6, wherein the specific operation steps of the data processing terminal are as follows:
step one, groups of data belonging to the same field are placed in one field data area, marked as Li, and within each field data area Li the target data are divided again by keyword, so that target data with different keywords fall into different keyword subsets;
step two, the input data information is marked with its field Li by the data analysis terminal, and this field Li is matched against the fields Li in the unmarked target database;
step three, after the field is matched, keywords are extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset;
step four, the character number scanning unit scans the data information to obtain its character number Z, and the character number interval unit screens the data information in the subset;
step five, the screened data information is output through the target data output terminal and transmitted directly to an external mobile terminal.
8. The semi-supervised learning target identification method according to claim 7, wherein the interval of the character number interval unit is set to (Z - Y, Z + Y).
CN202111306863.8A 2021-11-05 2021-11-05 Semi-supervised learning target recognition method Active CN114168817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111306863.8A CN114168817B (en) 2021-11-05 2021-11-05 Semi-supervised learning target recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111306863.8A CN114168817B (en) 2021-11-05 2021-11-05 Semi-supervised learning target recognition method

Publications (2)

Publication Number Publication Date
CN114168817A (en) 2022-03-11
CN114168817B (en) 2024-07-09

Family

ID=80478141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111306863.8A Active CN114168817B (en) 2021-11-05 2021-11-05 Semi-supervised learning target recognition method

Country Status (1)

Country Link
CN (1) CN114168817B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992597A (en) * 2017-12-13 2018-05-04 国网山东省电力公司电力科学研究院 A kind of text structure method towards electric network fault case
CN108170761A (en) * 2017-12-23 2018-06-15 合肥弹刚信息科技有限公司 A kind of Visualized Analysis System and its method based on magnanimity documentation & info
CN109460735A (en) * 2018-11-09 2019-03-12 中国科学院自动化研究所 Document binary processing method, system, device based on figure semi-supervised learning
KR20200018154A (en) * 2018-08-10 2020-02-19 서울대학교산학협력단 Acoustic information recognition method and system using semi-supervised learning based on variational auto encoder model
CN112148750A (en) * 2020-10-20 2020-12-29 成都中科大旗软件股份有限公司 Data integration method and system
CN113326350A (en) * 2021-05-31 2021-08-31 江汉大学 Keyword extraction method, system, device and storage medium based on remote learning
CN113591899A (en) * 2021-06-10 2021-11-02 国网河北省电力有限公司营销服务中心 Power customer portrait recognition method and device and terminal equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
夏颖;马琳;张中兆;周才发: "基于半监督流形学习的 WLAN 室内定位算法", ***工程与电子技术, no. 007, 31 December 2014 (2014-12-31) *
祁磊;于沛泽;高阳: "弱监督场景下的行人重识别研究综述", 软件学报, no. 009, 31 December 2020 (2020-12-31) *

Also Published As

Publication number Publication date
CN114168817B (en) 2024-07-09

Similar Documents

Publication Publication Date Title
CN107958230B (en) Facial expression recognition method and device
CN109829155A (en) Determination method, automatic scoring method, apparatus, equipment and the medium of keyword
CN110298030B (en) Method and device for checking accuracy of semantic analysis model, storage medium and equipment
CN106776544A (en) Character relation recognition methods and device and segmenting method
CN107491536B (en) Test question checking method, test question checking device and electronic equipment
CN110134961A (en) Processing method, device and the storage medium of text
CN116363440B (en) Deep learning-based identification and detection method and system for colored microplastic in soil
CN112926045B (en) Group control equipment identification method based on logistic regression model
CN111177367A (en) Case classification method, classification model training method and related products
CN110414523A (en) A kind of identity card recognition method, device, equipment and storage medium
CN113486664A (en) Text data visualization analysis method, device, equipment and storage medium
CN112528022A (en) Method for extracting characteristic words corresponding to theme categories and identifying text theme categories
CN106529470A (en) Gesture recognition method based on multistage depth convolution neural network
CN114496099A (en) Cell function annotation method, device, equipment and medium
CN114639152A (en) Multi-modal voice interaction method, device, equipment and medium based on face recognition
CN113705468A (en) Digital image identification method based on artificial intelligence and related equipment
CN114168817B (en) Semi-supervised learning target recognition method
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN115601768A (en) Method, device and equipment for judging written characters and storage medium
CN113628077A (en) Method for generating non-repeated examination questions, terminal and readable storage medium
CN113470830A (en) Abnormal data processing method, device, equipment and storage medium
CN113515591A (en) Text bad information identification method and device, electronic equipment and storage medium
CN111382750A (en) Method and device for identifying graphic verification code
CN111291376A (en) Web vulnerability verification method based on crowdsourcing and machine learning
CN115424353B (en) Service user characteristic identification method and system based on AI model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant