CN114168817A

CN114168817A - Semi-supervised learning target identification method

Info

Publication number: CN114168817A
Application number: CN202111306863.8A
Authority: CN
Inventors: 张中; 黄俊杰
Original assignee: Hefei Zhanda Intelligent Technology Co ltd
Current assignee: Hefei Zhanda Intelligent Technology Co ltd
Priority date: 2021-11-05
Filing date: 2021-11-05
Publication date: 2022-03-11
Anticipated expiration: 2041-11-05
Also published as: CN114168817B

Abstract

The invention discloses a semi-supervised learning target identification method. It belongs to the technical field of semi-supervised learning and addresses the problem of low target identification accuracy. Target data with different keywords in different field data areas are divided into different keyword subsets. The field Li marked on the input data information is matched with the field Li in the unmarked target database. Keywords are then extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset. Next, a character number scanning unit scans the number of characters in the data information, and a character number interval unit screens the data information in the subset. This layer-by-layer partition, screening and identification makes the screened target data match the target information requested by the user more closely, achieving a better matching effect and effectively improving target identification accuracy.

Description

Semi-supervised learning target identification method

Technical Field

The invention belongs to the technical field of semi-supervised learning, and particularly relates to a semi-supervised learning target identification method.

Background

Semi-supervised learning is a key research problem in pattern recognition and machine learning. It combines supervised and unsupervised learning, performing pattern recognition with a large amount of unlabeled data together with labeled data.

When an existing semi-supervised learning model identifies and processes targets in operation, marked targets can be quickly identified and delivered. When unmarked data targets are identified, however, the model has no built-in coarse-to-fine (large-to-small) identification procedure. As a result, the identified target data are inaccurate, the deviation is large, and a good identification result cannot be achieved.

Disclosure of Invention

To solve the above problems, the invention provides a semi-supervised learning target identification method.

The purpose of the invention can be realized by the following technical scheme: the semi-supervised learning target identification method comprises the following steps:

S1, partitioning model target data: the target data are divided by a semi-supervised learning model into two categories, marked target data and unmarked target data;

S2, preprocessing data: an external user inputs data information describing the user's own matching requirements through a user terminal, and the target identification system receives and extracts the input data in advance;

S3, matching marked target data: the transmitted data information is first preprocessed and the input data are analyzed to detect whether they are target data; if so, the data are compared directly with the data in the marked target database, and the matching data are extracted and transmitted directly to an external terminal;

S4, matching unmarked target data: data that could not be identified are processed again. They are divided successively by field, keyword and number of characters, and the divided data are processed by a data processing terminal. The processed and analyzed data are combined into a parent set, the internal subsets of the unmarked target data are extracted and matched against the parent set, the character count is compared with the character number interval, and the resulting target values are extracted and transmitted to an external terminal to complete the target identification.
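Step S1 above leaves the mechanics of the split open. As a minimal sketch, assuming each record is a Python dictionary whose optional `label` key holds its mark (a hypothetical layout, not one defined in this document), separating marked from unmarked target data reduces to checking for a label. How the semi-supervised learning model itself is trained is not modeled here.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: {"text": ..., "label": ...}. A record whose
# "label" key is missing or None is treated as unmarked target data.
Record = Dict[str, Any]


def partition_target_data(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into (marked, unmarked) lists, keeping input order."""
    marked: List[Record] = []
    unmarked: List[Record] = []
    for record in records:
        if record.get("label") is not None:
            marked.append(record)
        else:
            unmarked.append(record)
    return marked, unmarked


if __name__ == "__main__":
    data = [
        {"text": "red apple", "label": "fruit"},
        {"text": "blue whale"},
        {"text": "green pepper", "label": None},
    ]
    marked, unmarked = partition_target_data(data)
    print(len(marked), len(unmarked))  # 1 2
```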

Preferably, the data in S2 is directly transmitted to the information extraction terminal through a user terminal, which may be a mobile terminal of an external person.

Preferably, the preprocessing method in S3 is: the extracted data information is compared with the data information in the marked target database; if the comparison result is correct, the marked target data are extracted from the marked target database; if the comparison result is incorrect, the data information is transmitted to the data analysis terminal.

Preferably, the analysis in S3 is: the field to which the data information belongs is determined by extracting and matching the data against the data in an external big database, and the input data information is marked with the word of that field, different fields corresponding to different field values, denoted Li.
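This document does not define what makes a comparison result "correct". Assuming, for illustration only, that it means an exact case-insensitive match of the input against a key in the marked target database (modeled here as a plain dictionary with lower-case keys), the routing decision of the preprocessing step can be sketched as:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PreprocessResult:
    route: str             # "output" (marked target found) or "analysis"
    target: Optional[str]  # the marked target data, when found


def preprocess(query: str, marked_db: Dict[str, str]) -> PreprocessResult:
    """Compare the extracted input with the marked target database.

    Assumption: marked_db keys are stored in lower case and a "correct"
    comparison means an exact match after trimming and lower-casing.
    """
    key = query.strip().lower()
    if key in marked_db:
        # Comparison correct: the marked target data go to the output terminal.
        return PreprocessResult(route="output", target=marked_db[key])
    # Comparison incorrect: the input is handed to the data analysis terminal.
    return PreprocessResult(route="analysis", target=None)
```

A fuzzier comparison (edit distance, embeddings) could replace the dictionary lookup without changing the two-way routing.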
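The external big database and its matching rule are not specified. As a sketch under stated assumptions, the database is replaced by a small hypothetical term-to-field lexicon, the input is tokenized on whitespace (Chinese text would need a word segmenter), and the field with the most term hits wins, producing a label of the form Li:

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical stand-in for the external big database: term -> field index i.
FIELD_LEXICON: Dict[str, int] = {
    "football": 1,
    "goal": 1,
    "stock": 2,
    "dividend": 2,
}


def tag_field(text: str, lexicon: Dict[str, int] = FIELD_LEXICON) -> Optional[str]:
    """Return the field label "Li" with the most term hits, or None if nothing matches."""
    votes = Counter(lexicon[word] for word in text.lower().split() if word in lexicon)
    if not votes:
        return None
    # most_common orders equal counts by first occurrence, so ties are deterministic.
    field_index, _ = votes.most_common(1)[0]
    return f"L{field_index}"
```

For example, "Stock dividend goal" gives two hits for field 2 and one for field 1, so it is marked L2.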

Preferably, the data processing in S4 is performed by a data processing terminal, and the data processing terminal is configured to partition the unmarked target database.

Preferably, the target identification system in S2 includes a user terminal, an information extraction terminal, an information preprocessing terminal, a marked target database, an unmarked target database, a data analysis terminal, a data processing terminal, and a target data output terminal;

the data processing terminal internally comprises a character number scanning unit, a character number interval unit, a keyword extraction unit, a matching unit and a partition marking unit.
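The detailed description of FIG. 2 specifies how these terminals are connected. Transcribed as a directed graph, with each bidirectional link stored as two directed edges, the connections can be explored with a breadth-first search that shows the two routes a request can take to the target data output terminal. The node names are illustrative labels chosen for this sketch; the wired/wireless distinction and the external big database are not modeled.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

Edge = Tuple[str, str]

# Connections from the FIG. 2 description; bidirectional links appear twice.
EDGES: List[Edge] = [
    ("user_terminal", "information_extraction"),
    ("information_extraction", "information_preprocessing"),
    ("information_preprocessing", "marked_target_db"),
    ("marked_target_db", "information_preprocessing"),
    ("information_preprocessing", "data_analysis"),
    ("information_preprocessing", "target_data_output"),
    ("data_analysis", "data_processing"),
    ("data_processing", "unmarked_target_db"),
    ("unmarked_target_db", "data_processing"),
    ("data_processing", "target_data_output"),
]


def build_graph(edges: List[Edge]) -> Dict[str, List[str]]:
    """Build an adjacency list, keeping edge order and listing sink nodes too."""
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def shortest_path(
    graph: Dict[str, List[str]],
    start: str,
    goal: str,
    avoid_edges: FrozenSet[Edge] = frozenset(),
) -> Optional[List[str]]:
    """Breadth-first search for the fewest-hop route, skipping avoided edges."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in seen and (node, nxt) not in avoid_edges:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With the direct edge from the information preprocessing terminal to the output terminal excluded, the search returns the unmarked-data route through the data analysis and data processing terminals.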

Preferably, the specific operation steps of the data processing terminal are as follows:

Step one: multiple groups of data from the same field are grouped into one field data area, marked Li; the target data with different keywords in each field data area Li are then subdivided, so that target data with different keywords fall into different keyword subsets;

Step two: the input data information is marked with its corresponding field Li by the data analysis terminal, and the field Li marked on the data information is matched with the field Li in the unmarked target database;

Step three: after the field matching is finished, keywords are extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset;

Step four: the character number scanning unit scans the data information to obtain its character number Z, and the character number interval unit screens the data information in the subset;

Step five: the screened data information is output directly through the target data output terminal and transmitted directly to an external mobile terminal.

Preferably, the interval inside the character number interval unit is set to (Z-Y, Z+Y), where the value of Y is set by an external operator.
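Steps one to three amount to building a two-level index and querying it. A minimal sketch, assuming each unmarked record already carries its field label Li and a keyword list, and that the input's field label and keywords are supplied by the earlier analysis (how those are obtained is not specified). Letting a record with several keywords appear in several keyword subsets is a design choice of this sketch, not something the document states.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence

# Field label "Li" -> keyword -> texts of unmarked target data.
Partition = Dict[str, Dict[str, List[str]]]


def partition_unmarked(records: Iterable[dict]) -> Partition:
    """Step one: group unmarked records by field data area, then by keyword subset."""
    areas: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for record in records:
        for keyword in record["keywords"]:
            areas[record["field"]][keyword].append(record["text"])
    # Plain dicts, so later lookups do not silently create empty entries.
    return {field: dict(subsets) for field, subsets in areas.items()}


def match_candidates(partition: Partition, field: str, keywords: Sequence[str]) -> List[str]:
    """Steps two and three: select the area Li, then the matching keyword subsets."""
    area = partition.get(field)
    if area is None:
        return []  # no field data area matches the input's field label
    seen = set()
    candidates: List[str] = []
    for keyword in keywords:
        for text in area.get(keyword, []):
            if text not in seen:  # a text stored under several keywords is returned once
                seen.add(text)
                candidates.append(text)
    return candidates


if __name__ == "__main__":
    db = partition_unmarked([
        {"field": "L1", "keywords": ["goal"], "text": "late goal"},
        {"field": "L1", "keywords": ["goal", "match"], "text": "cup match goal"},
        {"field": "L2", "keywords": ["stock"], "text": "stock rally"},
    ])
    print(match_candidates(db, "L1", ["match", "goal"]))  # ['cup match goal', 'late goal']
```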
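The character number screen can be sketched as below. The interval is written as (Z-Y, Z+Y) without saying whether the endpoints are included; this sketch reads the parentheses as an open interval and counts characters as Unicode code points via `len()`, both of which are assumptions. For example, with Z = 5 and Y = 2 the interval is (3, 7), so only candidates of 4, 5 or 6 characters pass.

```python
from typing import List


def screen_by_char_count(query: str, candidates: List[str], y: int) -> List[str]:
    """Keep candidates whose character count lies strictly inside (Z - Y, Z + Y)."""
    z = len(query)  # character number Z of the input data information
    low, high = z - y, z + y
    # Open interval: a candidate exactly Y characters away from Z is rejected.
    return [text for text in candidates if low < len(text) < high]
```

Switching to a closed interval only requires `<=` in both comparisons.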

Compared with the prior art, the invention has the following beneficial effects: multiple groups of data from the same field are grouped into a field data area, and target data with different keywords in different field data areas are divided into different keyword subsets; the field Li marked on the data information is matched with the field Li in the unmarked target database; keywords are extracted from the data information and matched against the divided keyword subsets to find the corresponding subset; the character number scanning unit then scans the data information to obtain the character number Z, and the character number interval unit screens the data information in the subset. With this layer-by-layer partition, screening and identification, the screened target data match the target information requested by the user more closely, achieving a better matching effect and effectively improving target identification accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a functional block diagram of the present invention;

FIG. 3 is a schematic block diagram of the data processing terminal of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, a semi-supervised learning target identification method includes the following steps:

S4, matching unmarked target data: data that could not be identified are processed again. They are divided successively by field, keyword and number of characters, the divided data are processed by a data processing terminal, and the processed and analyzed data are combined into a parent set. The internal subsets of the unmarked target data are then extracted and matched against the parent set, the matching value is compared with a threshold interval to obtain the extracted target value, and the extracted target value is transmitted to an external terminal to complete the target identification.

The semi-supervised learning target identification method is executed by a target identification system;

as shown in FIG. 2, the target identification system includes a user terminal, an information extraction terminal, an information preprocessing terminal, a marked target database, an unmarked target database, a data analysis terminal, a data processing terminal, and a target data output terminal;

the output end of the user terminal is wirelessly connected with the input end of the information extraction terminal, the output end of the information extraction terminal is electrically connected with the input end of the information preprocessing terminal, the information preprocessing terminal is bidirectionally connected with the marked target database, the output end of the information preprocessing terminal is respectively electrically connected with the input ends of the data analysis terminal and the target data output terminal, the output end of the data analysis terminal is electrically connected with the input end of the data processing terminal, the data processing terminal is bidirectionally connected with the unmarked target database, and the output end of the data processing terminal is electrically connected with the input end of the target data output terminal;

external personnel can input information through a user terminal; the data information is transmitted directly from the user terminal to the information extraction terminal, and the user terminal may be a mobile terminal of the external personnel;

the information extraction terminal extracts the data information input by the user and transmits the extracted content directly to the information preprocessing terminal; extraction consists of removing modal particles and symbols from the input information;

the information preprocessing terminal preprocesses the extracted data information as follows: the extracted data information is compared with the data information in the marked target database; if the comparison result is correct, the marked target data are extracted from the marked target database and output directly through the target data output terminal; if the comparison result is incorrect, the data information is transmitted to the data analysis terminal;

the data analysis terminal is wirelessly connected with an external big database and analyzes the data information as follows: the field to which the data information belongs is determined by extracting and matching the data information against the data in the external big database, and the input data information is marked with the word of that field, using the same marking scheme as inside the data processing terminal; different fields correspond to different field values, marked Li, where i denotes the field, i = 1, 2, ..., n, and n is a positive integer;

as shown in FIG. 3, the data processing terminal partitions the unmarked target database: multiple groups of data from the same field are grouped into one field data area, marked Li, and the target data with different keywords in each field data area Li are subdivided into different keyword subsets;

the data processing terminal also compares the analyzed data information: the input data information carries the corresponding field Li marked by the data analysis terminal, and this field Li is matched with the field Li in the unmarked target database;

after the field matching is completed, keywords are extracted from the data information and matched against the divided keyword subsets to find the corresponding keyword subset;

then the character number scanning unit scans the data information to obtain its character number Z, and the character number interval unit screens the data information in the subset, the character number interval being set to (Z-Y, Z+Y), where the value of Y is set by an external operator;

the character number scanning unit scans the character number of each data item in the subset, the character number interval unit screens the items by character number, and the screened data information is output directly through the target data output terminal and transmitted directly to the external mobile terminal.
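The extraction performed by the information extraction terminal (removal of modal particles and symbols) is not specified further. A minimal sketch, assuming Chinese input, a small hypothetical particle set, and Unicode punctuation (P*) and symbol (S*) categories as the meaning of "symbols". As a design choice, a particle is dropped only when it closes a clause, so the same character inside a word (吧 in 酒吧, "bar") survives.

```python
import unicodedata

# Hypothetical particle set; the document does not enumerate the particles removed.
MODAL_PARTICLES = frozenset("吗呢吧啊呀嘛")


def _is_symbol(ch: str) -> bool:
    """True for Unicode punctuation (P*) and symbol (S*) characters."""
    return unicodedata.category(ch)[0] in ("P", "S")


def extract_content(text: str, particles: frozenset = MODAL_PARTICLES) -> str:
    """Remove symbols, and modal particles that end a clause, from the input."""
    kept = []
    for i, ch in enumerate(text):
        if _is_symbol(ch):
            continue
        if ch in particles:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Treat the particle as modal only before a symbol, a space or end of text.
            if nxt == "" or nxt.isspace() or _is_symbol(nxt):
                continue
        kept.append(ch)
    return "".join(kept)
```

For example, "去酒吧吗？" becomes "去酒吧": the final 吗 and the full-width question mark are removed, while 吧 inside 酒吧 is kept.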

All of the above formulas are evaluated dimensionlessly, using only their numerical values. Each formula was obtained by collecting a large amount of data and running software simulations to approximate the real situation as closely as possible, and the preset parameters and thresholds in the formulas are set by those skilled in the art according to the actual situation or obtained by simulating a large amount of data.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and there may be other divisions when the actual implementation is performed; the modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the method of the embodiment.

It will also be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or means recited in a system claim may also be implemented by a single unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Finally, it should be noted that the above examples are only intended to illustrate the technical process of the present invention and not to limit the same, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical process of the present invention without departing from the spirit and scope of the technical process of the present invention.

Claims

1. A semi-supervised learning target identification method is characterized by comprising the following steps:

2. The semi-supervised learning target identification method according to claim 1, wherein the data in S2 are transmitted directly to the information extraction terminal through a user terminal, and the user terminal is a mobile terminal of an external person.

3. The semi-supervised learning target identification method according to claim 1, wherein the preprocessing mode in S3 is as follows: comparing the extracted data information with the data information in the marked target database, and extracting the marked target data from the marked target database if the comparison result is correct; and if the comparison result is incorrect, transmitting the data information to the data analysis terminal.

4. The semi-supervised learning target identification method according to claim 1, wherein the analysis mode in S3 is as follows: determining the field to which the data information belongs by extracting and matching the data against the data in the external big database, and marking the input data information with the word of that field, different fields corresponding to different field values, denoted Li.

5. The semi-supervised learning target identification method according to claim 4, wherein the data processing in S4 is performed by a data processing terminal, and the data processing terminal is used for partitioning the unmarked target database.

6. The semi-supervised learning target identification method according to claim 1, wherein the target identification system in S2 comprises a user terminal, an information extraction terminal, an information preprocessing terminal, a marked target database, an unmarked target database, a data analysis terminal, a data processing terminal and a target data output terminal;

7. The semi-supervised learning target identification method according to claim 6, wherein the specific operation steps of the data processing terminal are as follows:

8. The semi-supervised learning target identification method according to claim 7, wherein the interval inside the character number interval unit is set to (Z-Y, Z+Y).