CN117688139A - Text searching method and device based on industrial Internet identification in blockchain

Info

Publication number: CN117688139A
Application number: CN202410145576.0A
Authority: CN
Inventors: 刘阳; 韩天宇; 张钰雯; 李胡升; 朱斯语
Original assignee: China Academy of Information and Communications Technology CAICT
Current assignee: China Academy of Information and Communications Technology CAICT
Priority date: 2024-02-01
Filing date: 2024-02-01
Publication date: 2024-03-12

Abstract

Embodiments of the disclosure provide a text searching method and device based on industrial Internet identification in a blockchain. The method comprises: extracting features of a target text to obtain target text features; searching a hierarchical navigable small world map index for a target node corresponding to the target text features; acquiring text storage information corresponding to the target node from a blockchain; acquiring, based on the text storage information corresponding to the target node, a first matching text that matches the target text; determining whether the identification information corresponding to the first matching text has an associated text identification, the associated text identification being an industrial Internet identification; and, in response to determining that the identification information corresponding to the first matching text has the associated text identification, acquiring a second matching text of the target text based on the associated text identification.

Description

Text searching method and device based on industrial Internet identification in blockchain

Technical Field

The disclosure relates to the technical field of blockchain and text searching, in particular to a text searching method and device based on industrial Internet identification in a blockchain.

Background

With the rapid growth of text data, text search technology has developed accordingly. In the related art, keywords are generally determined first, and matching text is then determined using the keywords. In practical applications, however, texts are generally stored on different servers or clouds to ensure data security, so each keyword search has to be performed repeatedly across different servers or clouds, resulting in low text searching efficiency.

Disclosure of Invention

To solve the above problems, embodiments of the present disclosure provide a text searching method and apparatus based on industrial internet identification in a blockchain.

In one aspect of the disclosed embodiments, there is provided a text search method based on industrial internet identification in a blockchain, including: extracting features of the target text to obtain target text features; searching a target node corresponding to the target text feature in a hierarchical navigable small world map index based on the target text feature, wherein the hierarchical navigable small world map index comprises a plurality of layers of sub-navigation small world maps arranged from top to bottom, any one of the plurality of layers of sub-navigation small world maps comprises a plurality of nodes, and any one of the plurality of nodes corresponds to one text feature; acquiring text storage information corresponding to the target node from a blockchain; acquiring a first matched text matched with the target text based on text storage information corresponding to the target node; determining whether the identification information of the first matching text has an associated text identification or not, wherein the associated text identification is an industrial Internet identification; and in response to determining that the identification information corresponding to the first matching text has the associated text identification, carrying out identification analysis on the associated text identification to obtain a second matching text matched with the target text.

In another aspect of the disclosed embodiments, there is provided a text searching apparatus based on industrial internet identification in a blockchain, including: a first acquisition module, configured to extract features of the target text to obtain target text features; a search module, configured to search for a target node corresponding to the target text feature in a hierarchical navigable small world map index based on the target text feature, wherein the hierarchical navigable small world map index comprises a plurality of layers of sub-navigation small world maps arranged from top to bottom, any one of the plurality of layers of sub-navigation small world maps comprises a plurality of nodes, and any one of the plurality of nodes corresponds to one text feature; a second acquisition module, configured to acquire text storage information corresponding to the target node from the blockchain; a third acquisition module, configured to acquire a first matching text matched with the target text based on the text storage information corresponding to the target node; a fourth acquisition module, configured to determine whether the identification information corresponding to the first matching text has an associated text identification, wherein the associated text identification is an industrial internet identification; and an identification analysis module, configured to, in response to determining that the identification information corresponding to the first matching text has the associated text identification, perform identification analysis on the associated text identification to obtain a second matching text matched with the target text.

In yet another aspect of the disclosed embodiments, there is provided an electronic device including: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, wherein the text searching method based on industrial Internet identification in the blockchain is implemented when the computer program is executed.

In yet another aspect of the disclosed embodiments, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements a text search method in a blockchain based on industrial internet identification.

In the embodiments of the disclosure, feature extraction is first performed on the target text to obtain the target text features; the target node corresponding to the target text features is then searched for directly in the hierarchical navigable small world map index comprising the text features and the text storage information; and the first matching text matched with the target text is obtained through the text storage information corresponding to the target node. The first matching text is thus found efficiently and quickly, which solves the problem of low text searching efficiency caused by texts being stored on different servers or clouds and effectively improves text searching efficiency. In addition, in the disclosed embodiments, whether the first matching text has an associated text identification is determined from the identification information corresponding to the first matching text, and when it does, a second matching text of the target text is acquired based on the associated text identification. A plurality of matching texts related to the target text can therefore be provided to the user at one time, further improving text searching efficiency.

The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flow diagram of a text search method based on industrial Internet identification in a blockchain provided in an exemplary embodiment of the present disclosure;

FIG. 2 is a flow diagram of a text search method based on industrial Internet identification in a blockchain provided in another exemplary embodiment of the present disclosure;

FIG. 3 is a flow chart of step S110 provided by an exemplary embodiment of the present disclosure;

FIG. 4 is a flow diagram of a text search method based on industrial Internet identification in a blockchain provided by yet another exemplary embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a class object according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an ontology object according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of a text search device based on industrial Internet identification in a blockchain provided in an exemplary embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.

It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.

It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.

It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.

In addition, the term "and/or" in this disclosure merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.

Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.

In this embodiment:

In the narrow sense, blockchain technology is a distributed ledger that combines data blocks in chronological order into a chained data structure and is cryptographically secured against tampering and forgery. In the broad sense, blockchain technology is a new distributed infrastructure and computing paradigm that uses the blockchain data structure to verify and store data, a distributed node consensus algorithm to generate and update data, cryptography to secure data transmission and access, and smart contracts composed of automated script code to program and operate on data.

The industrial Internet identification analysis system is a basic system of the industrial Internet. It mainly comprises an identification distribution management system and an identification analysis system. An identification in the industrial Internet serves as the identity card of a machine or product; it is unique and is managed hierarchically, being allocated level by level. The identification analysis system uses the identification to locate and query machines and items. The industrial Internet identification analysis system may include: international root nodes, national top-level nodes, secondary identification resolution nodes, enterprise nodes, public recursive resolution nodes, and other/enterprise information systems.

International root node: the highest-level service node managed by an identification system. It is not limited to a specific country or region and provides public root zone data management and root analysis services globally.

National top-level node: the key node of the industrial Internet identification analysis system in China, serving as the international gateway for external interconnection and the core hub for internal coordination. It provides top-level identification code registration and identification analysis services nationwide, together with management capabilities such as identification record filing and identification authentication. The national top-level node is connected to the international root nodes of the various identification systems and to the secondary and lower-level identification analysis service nodes in China.

Secondary identification resolution node: a public identification analysis service node for an industry or region. It provides identification code registration and identification analysis services for that industry or region and handles the related identification service management, identification application docking, and the like. Each secondary node is assigned a unique secondary node identification prefix by the national top-level node.

Enterprise node: an identification analysis service node within an enterprise, which provides identification code registration and identification analysis services for that specific enterprise. It can be deployed independently or as a component of an enterprise information system. Each enterprise node is assigned a unique enterprise node identification prefix by the secondary node; the content of the identification suffix is defined and assigned by the enterprise; and the enterprise node identification prefix plus the identification suffix form a complete industrial Internet identification.

Public recursive resolution node: the key entry facility through which the identification analysis system provides identification analysis services externally. It receives identification query requests from external clients, locates the enterprise node in the identification analysis system by stepwise recursion, and acquires the detailed information of the identification.

Other/enterprise information systems: industrial Internet applications (APPs) and industrial Internet platforms that rely on the identification capability of the industrial Internet identification analysis system to process data and business logic and that are widely used in industrial scenarios.

Specifically, the basic flow of industrial Internet identification analysis is as follows:

Step (1): the identification analysis client sends an identification analysis request to the recursive node;

Step (2): the recursive node checks its local cache and, when no cached result exists, signs the identification analysis request and sends it to the national top-level node;

Step (3): the national top-level node verifies the signature of the signed identification analysis request, checking the authenticity of the recursive node and the integrity of the message; after verification passes, it signs the secondary node analysis record information, which includes the secondary node analysis address, and feeds it back to the recursive node;

Step (4): the recursive node verifies the signed secondary node analysis record information, checking the authenticity of the national top-level node and the integrity of the secondary node analysis record information; after verification passes, it signs the identification analysis request and sends the signed request to the secondary node according to the secondary node analysis address;

Step (5): the secondary node verifies the signature of the signed identification analysis request, checking the authenticity of the recursive node and the integrity of the identification analysis request message; after verification passes, it signs the enterprise node analysis record information, which includes the enterprise node analysis address, and feeds it back to the recursive node;

Step (6): the recursive node verifies the signature, checking the authenticity of the enterprise node and the integrity of the enterprise node analysis record information; after verification passes, it signs the identification analysis request and sends the signed request to the enterprise node according to the enterprise node analysis address;

Step (7): the enterprise node checks the authenticity of the recursive node and the integrity of the identification analysis request; after verification passes, it signs the analysis result, which includes the identification analysis service address, and feeds it back to the recursive node;

Step (8): the recursive node verifies the signature, checking the authenticity of the enterprise node and the integrity and validity of the analysis result; after verification passes, it caches the analysis result and feeds it back to the identification analysis client;

Step (9): the identification analysis client sends a query request to the enterprise information system, the query request including the identification analysis service address and the identification to be queried;

Step (10): the enterprise information system returns the object information identified by the identification to be queried to the identification analysis client.
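The lookup chain of steps (1) to (10) can be illustrated with a minimal Python sketch. It is not the actual resolution protocol: signing and signature verification are omitted, and the identifier format, prefixes, node names, and addresses (`88.100`, `secondary-A`, `enterprise-X`, `ids.example.com`) are hypothetical values chosen only for illustration.

```python
# Hypothetical in-memory stand-ins for the nodes; all identifiers, prefixes and
# addresses below are made up. Signing and verification are omitted.
TOP_LEVEL_NODE = {"88.100": "secondary-A"}                  # secondary prefix -> secondary node
SECONDARY_NODES = {"secondary-A": {"88.100.1": "enterprise-X"}}  # enterprise prefix -> enterprise node
ENTERPRISE_NODES = {"enterprise-X": "https://ids.example.com/resolve"}  # node -> service address
ENTERPRISE_SYSTEM = {"88.100.1/doc-42": "object information for doc-42"}


class RecursiveNode:
    """Recursive node with a local cache (steps 2 to 8, without signatures)."""

    def __init__(self):
        self.cache = {}

    def resolve(self, identifier):
        # Step (2): answer from the local cache when possible.
        if identifier in self.cache:
            return self.cache[identifier]
        enterprise_prefix = identifier.split("/", 1)[0]          # e.g. "88.100.1"
        secondary_prefix = enterprise_prefix.rsplit(".", 1)[0]   # e.g. "88.100"
        # Steps (2)-(3): the top-level node points to the secondary node.
        secondary = TOP_LEVEL_NODE[secondary_prefix]
        # Steps (4)-(5): the secondary node points to the enterprise node.
        enterprise = SECONDARY_NODES[secondary][enterprise_prefix]
        # Steps (6)-(7): the enterprise node returns the service address.
        address = ENTERPRISE_NODES[enterprise]
        # Step (8): cache the result before returning it to the client.
        self.cache[identifier] = address
        return address


def client_query(recursive_node, identifier):
    """Steps (1), (9) and (10) from the client's point of view."""
    address = recursive_node.resolve(identifier)
    return address, ENTERPRISE_SYSTEM[identifier]
```

A second call to `resolve` with the same identifier is served from the cache without visiting the other nodes.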

FIG. 1 is a flow diagram of a text search method based on industrial internet identification in a blockchain provided in an exemplary embodiment of the present disclosure. This embodiment can be applied to an electronic device and, as shown in FIG. 1, includes the following steps:

Step S110, extracting features of the target text to obtain target text features. In the present embodiment, the text feature of the target text is referred to as the target text feature.

The target text feature may include at least one keyword of the target text. In one particular implementation, the at least one keyword of the target text may also be converted into a vector, and the vector is used as the target text feature.

In a specific implementation, word segmentation may be performed on the target text to obtain a plurality of words corresponding to the target text; the word that occurs most frequently in the target text is taken as the keyword of the target text; and the keyword together with its occurrence frequency in the target text forms the target text feature.
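As a minimal sketch of this keyword-plus-frequency feature, the following Python snippet picks the most frequent word of a text. The regular-expression tokenizer is an assumption standing in for whatever word segmentation technique is actually used, and the function name `extract_target_feature` is hypothetical.

```python
import re
from collections import Counter


def extract_target_feature(text: str) -> tuple[str, int]:
    """Return (keyword, occurrence count) for the most frequent word.

    Assumption: lowercase regex tokenization replaces a real word segmenter.
    Raises IndexError if the text contains no words.
    """
    words = re.findall(r"\w+", text.lower())
    keyword, count = Counter(words).most_common(1)[0]
    return keyword, count
```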

Step S120, searching target nodes corresponding to the target text features in the hierarchical navigable small world map index based on the target text features.

The hierarchical navigable small world (Hierarchical Navigable Small World, HNSW) map index comprises multiple layers of sub-navigation small world maps arranged from top to bottom; each sub-navigation small world map comprises a plurality of nodes, and each node corresponds to one text feature. In this embodiment, the node in the lowest-layer sub-navigation small world map that is closest to the target text feature is referred to as the target node corresponding to the target text feature.

In a specific implementation, in HNSW, the number of nodes in each layer's sub-navigation small world map increases from top to bottom; that is, the lowest-layer sub-navigation small world map contains the most nodes and the highest-layer sub-navigation small world map contains the fewest. Each sub-navigation small world map comprises a plurality of nodes, and each node is connected by edges to a preset number of surrounding nodes. Each node may correspond to one text feature, and each text feature may include at least one keyword of a text together with information related to that keyword. In the multi-layer sub-navigation small world maps, each node in every sub-navigation small world map other than the lowest-layer one has a mapping relation with one node in the next lower sub-navigation small world map.

In one specific implementation, in HNSW, the target node is searched for layer by layer from top to bottom until it is obtained from the lowest-layer sub-navigation small world map. Specifically, the search starts from the top-layer sub-navigation small world map, where the node closest to the target text feature is found and taken as the neighboring node of that layer. Then, in each sub-navigation small world map other than the top-layer and lowest-layer ones, the node mapped from the neighboring node of the layer above is taken as the starting node of the search, and, starting from it, the node of that map nearest to the target text feature is found as the neighboring node of that layer. Finally, in the lowest-layer sub-navigation small world map, the node mapped from the neighboring node of the layer above is taken as the starting node, the node nearest to the target text feature is found among the nodes of the lowest-layer map, and that nearest node is taken as the target node corresponding to the target text feature.
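The layer-by-layer descent can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: text features are assumed to be numeric vectors compared by Euclidean distance, each layer is an adjacency dictionary, a node keeps the same id in every layer where it appears (standing in for the mapping relation between layers), and a plain greedy walk along edges is used within each layer.

```python
import math


def greedy_search(layer, vectors, start, query):
    """Walk along the edges of one layer until no neighbour is closer to the query."""
    current = start
    improved = True
    while improved:
        improved = False
        for neighbour in layer[current]:
            if math.dist(vectors[neighbour], query) < math.dist(vectors[current], query):
                current = neighbour
                improved = True
    return current


def hnsw_search(layers, vectors, entry, query):
    """layers[0] is the top layer and layers[-1] the lowest layer."""
    current = entry
    for layer in layers:
        # The node found in the layer above is the starting node of this layer.
        current = greedy_search(layer, vectors, current, query)
    return current


# Toy index (hypothetical data): two layers, four nodes.
vectors = {"a": (0, 0), "b": (5, 5), "c": (9, 9), "d": (10, 10)}
layers = [
    {"a": ["c"], "c": ["a"]},                                    # top layer
    {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]},  # lowest layer
]
target_node = hnsw_search(layers, vectors, "a", (10, 10.2))
```

In the toy index the top layer moves the search from `a` to `c`, and the lowest layer refines it from `c` to `d`.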

Step S130, obtaining text storage information corresponding to the target node from the blockchain.

Wherein the text storage information includes a text storage address.

In one specific implementation, each node of the HNSW corresponds to a text storage address. Correspondence between each node of HNSW and text storage information is stored in the blockchain.

For example, assuming that the text is stored in the cloud, the correspondence between the target node and the text storage information may be obtained from the blockchain; the correspondence may take the form shown in formula (1). The text storage information (uri2//uri3) corresponding to the target node is then determined based on this correspondence:

node.uri1 = uri2//uri3    (1)

where node is the target node, uri1 is the URI (Uniform Resource Identifier) address of the target node, uri2 is the URI address of the cloud, and uri3 is the URI address of the text in the cloud.
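A minimal Python sketch of reading such a correspondence is given below. The dictionary `LEDGER` is only a stand-in for the on-chain record, and the node URI, cloud URI, and text path are hypothetical; how the record is actually stored and queried on the blockchain is not specified here.

```python
# Hypothetical stand-in for the on-chain correspondence of formula (1):
# key = uri1 of the node, value = "uri2//uri3".
LEDGER = {"node://hnsw/17": "https://cloud.example.com//texts/report-17.txt"}


def get_text_storage_info(node_uri: str) -> tuple[str, str]:
    """Return (uri2, uri3) for the given node URI."""
    stored = LEDGER[node_uri]
    # Split at the last "//" so the "//" of the URI scheme is left untouched.
    cloud_uri, text_uri = stored.rsplit("//", 1)
    return cloud_uri, text_uri
```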

Step S140, based on the text storage information corresponding to the target node, a first matching text matched with the target text is obtained.

In this embodiment, the text obtained from the text storage information corresponding to the target node is referred to as a first matching text. In one particular implementation, the first matching text may include target text features therein.

Step S150, determining whether the identification information corresponding to the first matching text has an associated text identification.

The identification information corresponding to the first matching text comprises a text identification for uniquely identifying the first matching text. The associated text identifier is used for uniquely identifying one text, and the associated text identifier and the text identifier of the first matching text are both industrial Internet identifiers.

In a specific implementation, when the identification information corresponding to the first matching text includes an associated text identification, it is determined that the first matching text has an associated text identification; that is, the associated text identification included in the identification information has an association relationship with the first matching text. When the identification information corresponding to the first matching text does not include an associated text identification, it is determined that the first matching text does not have an associated text identification.

Step S160, in response to determining that the identification information corresponding to the first matching text has the associated text identification, carrying out identification analysis on the associated text identification to obtain a second matching text matched with the target text.

The associated text identification is subjected to identification analysis through the identification analysis system of the industrial Internet to obtain the text identified by the associated text identification, and that text is determined to be the second matching text of the target text.
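Steps S150 and S160 amount to a conditional lookup, sketched below in Python. The dictionary keys `text_id` and `associated_text_id` and the `resolve` callback are hypothetical names; the callback stands in for identification analysis through the industrial Internet identification analysis system.

```python
from typing import Callable, Optional


def second_match(identification_info: dict, resolve: Callable[[str], str]) -> Optional[str]:
    """Return the second matching text, or None when there is no associated identification."""
    # Step S150: does the identification information carry an associated text identification?
    associated_id = identification_info.get("associated_text_id")
    if associated_id is None:
        return None
    # Step S160: resolve the associated identification to obtain the text.
    return resolve(associated_id)
```

For example, with a dictionary `texts` mapping identifications to texts, `second_match(info, texts.__getitem__)` returns the associated text when `info` contains an `associated_text_id` entry.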

FIG. 2 is a flow diagram of a text search method based on industrial internet identification in a blockchain provided in another exemplary embodiment of the present disclosure. In some alternative embodiments, as shown in FIG. 2, the hierarchical navigable small world map index may be constructed by a method specifically comprising:

Step S210, acquiring a plurality of sample texts.

Sample text may include, for example, but is not limited to: papers, journal articles, job reports, etc.

Step S220, for each sample text in the plurality of sample texts, acquiring word frequency information of the sample text.

The word frequency information of the sample text comprises word frequency data of each word in the sample text.

In one particular implementation, the word frequency data of each word in the text may be determined based on the term frequency-inverse document frequency (TF-IDF) algorithm.

For example, the plurality of sample texts may be used as a corpus. First, word segmentation is performed on the sample text to determine the words corresponding to the sample text. For each word, its number of occurrences in the sample text is determined and taken as the TF of the word; to account for the differing lengths of sample texts, the TF may be normalized, that is, divided by the total number of words in the sample text. The IDF of the word is then determined based on formula (2), and the normalized TF is multiplied by the IDF to obtain the TF-IDF of the word, which is taken as the word frequency data of the word. The word frequency data of each word of each sample text is determined in this way;

（2）

where m is the number of sample texts in the corpus and p is the number of sample texts in the corpus that contain the word.
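The computation can be sketched in Python as follows. Formula (2) is not reproduced in the text above, so the sketch assumes the common form IDF = log(m / p); the formula actually used may differ, for example by smoothing the denominator. Sample texts are assumed to be already segmented into word lists, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Return one {word: TF-IDF} dictionary per sample text.

    Assumption: IDF = log(m / p), with m the number of sample texts and
    p the number of sample texts containing the word.
    """
    m = len(corpus)
    # p for every word: number of sample texts in which the word occurs.
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # normalized TF (count / total words) multiplied by the assumed IDF
            word: (count / total) * math.log(m / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

Under this assumed IDF, a word that appears in every sample text gets a score of zero, so it is never preferred as a keyword.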

Step S230, determining initial text characteristics of the sample text according to word frequency information of the sample text.

The initial text features of the sample text include keywords of the sample text.

In an alternative embodiment, the first k words with the largest word frequency data in each sample text can be used as keywords of the sample text, and the keywords of the sample text form the initial text feature of the sample text.
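Selecting the k words with the largest word frequency data is a top-k selection, which might look like this in Python. The function name `top_k_keywords` and the example scores are hypothetical.

```python
import heapq


def top_k_keywords(word_scores: dict[str, float], k: int) -> list[str]:
    """Return the k words with the largest word frequency data, highest first."""
    return heapq.nlargest(k, word_scores, key=word_scores.get)


# Hypothetical word frequency data for one sample text.
initial_feature = top_k_keywords({"chain": 0.46, "index": 0.0, "text": 0.35}, 2)
```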

Step S240, encrypting the initial text features of the sample text by using a preset encryption algorithm to obtain the text features of the sample text.

The preset encryption algorithm may be a searchable encryption (Searchable Encryption, SE) algorithm, or the preset encryption algorithm may be a hash algorithm. The initial text feature of each sample text may be encrypted using a searchable encryption algorithm to obtain the text feature of the sample text.

For example, assuming that the preset encryption algorithm is a hash algorithm, hash computation may be performed on each keyword in the initial text feature, and the text feature of the sample text is formed by the hash value of each keyword.
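A minimal Python sketch of the hash-based variant follows. The choice of SHA-256 is an assumption; the text above only says that a hash algorithm may be used.

```python
import hashlib


def hash_text_feature(keywords: list[str]) -> list[str]:
    """Replace each keyword by the hex digest of its hash (SHA-256 assumed)."""
    return [hashlib.sha256(k.encode("utf-8")).hexdigest() for k in keywords]
```

Because the hash is deterministic, hashing the query keywords in the same way yields values that can be compared with the stored text features without exposing the plaintext keywords.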

Alternatively, the preset encryption algorithm may be a secure vector inner product calculation method (Secure kNN). Specifically, two n×n-dimensional invertible matrices $M_1$ and $M_2$ may first be generated, together with an n-dimensional binary vector $S = (S_1, S_2, S_3, \dots, S_n)$. The initial text feature $F_i$ (the keywords) is converted into a vector and expanded to obtain $F_i' = (F_{i1}, F_{i2}, \dots, F_{in}, \|F_i\|^2)^T$; for example, $F_i$ may be split into n sub-features, from which $F_i'$ is formed. $F_i'$ is then split into two vectors using formulas (3) and (4);

where $S_j \in S$, $F_{ij} \in F_i$, and $v$ is a random number, $v \in \{0, 1\}$;

the text feature is then constructed based on formula (5);

(5).
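
Formulas (3) to (5) are not reproduced in this text, so the sketch below does not implement them. It only illustrates the vector-splitting step of the textbook Secure kNN scheme, which is an assumption about the general technique: an index vector is split at the positions where the key bit is 1, a query vector at the positions where it is 0, and the sum of the share-wise inner products equals the original inner product. The matrix step is omitted; in the textbook scheme the shares are further multiplied by the secret invertible matrices and their inverses, which leaves this sum unchanged. All names and values are hypothetical.

```python
import random

def split(vector, key_bits, split_when, rng):
    """Split `vector` into two shares according to the binary key.

    Where the key bit equals `split_when`, the component is divided into two
    random shares that sum to the original value; elsewhere it is copied.
    """
    first, second = [], []
    for value, bit in zip(vector, key_bits):
        if bit == split_when:
            share = rng.randint(-100, 100)
            first.append(share)
            second.append(value - share)
        else:
            first.append(value)
            second.append(value)
    return first, second

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(0)
key_bits = [1, 0, 1, 1]        # the binary vector S
index_vec = [3, 1, 4, 1]       # stands in for an expanded text feature
query_vec = [2, 7, 1, 8]       # stands in for an expanded target text feature

i1, i2 = split(index_vec, key_bits, split_when=1, rng=rng)
q1, q2 = split(query_vec, key_bits, split_when=0, rng=rng)

# The share-wise inner products add up to the plain inner product.
recovered = dot(i1, q1) + dot(i2, q2)
```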

Step S250, constructing a hierarchical navigable small world map index using a hierarchical navigable small world (HNSW) algorithm based on the text features of the plurality of sample texts.

In a specific implementation, the HNSW map index consists of multiple layers of sub-navigation small world maps. When a new node is inserted into the HNSW map index, the sub-navigation small world map at which insertion starts is determined first, and the new node is then inserted layer by layer, from top to bottom, starting from that map. Within each layer of sub-navigation small world map, nodes are inserted one at a time.

The method for inserting a newly added node into the HNSW map index may specifically include the following. Assume that the HNSW build parameters include the number of connections u and the total number of layers m, where u ≥ 1. From bottom to top, the sub-navigation small world maps in HNSW are the layer-0 sub-navigation small world map, the layer-1 sub-navigation small world map, …, and the layer-m sub-navigation small world map.

When a new node is inserted into the hierarchical navigable small world map index, the sub-navigation small world map at which insertion starts is determined by random selection. Assume that this map is the layer-f sub-navigation small world map, where 0 ≤ f ≤ m.

The newly added node z is inserted into the layer-f sub-navigation small world map and recorded as inserted node z_f. Any node in the layer-f map is then taken as the starting node of that layer. Starting from this node, a heuristic search algorithm is used to determine the u nodes in the layer-f map that are nearest to inserted node z_f; each of these u nodes is connected to z_f, and the node nearest to z_f is taken as the entry node.

The node in the layer-(f-1) sub-navigation small world map that has a mapping relation with the entry node is determined and taken as the starting node of the layer-(f-1) map, and the newly added node z is recorded as inserted node z_(f-1). Starting from the starting node of the layer-(f-1) map, the heuristic search algorithm is used to determine the u nodes in that map that are nearest to inserted node z_(f-1); each of these u nodes is connected to z_(f-1), and the node nearest to z_(f-1) is taken as the entry node. The node in the layer-(f-2) sub-navigation small world map that has a mapping relation with this entry node is then determined and taken as the starting node of the layer-(f-2) map;

The operation performed for the layer-(f-1) map is repeated for layer f-2 down to layer 0, so that the newly added node is inserted, layer by layer, into the layer-f to layer-0 sub-navigation small world maps. At the same time, mapping relations are established among the inserted nodes of the newly added node z from layer f to layer 0: the inserted node of z in the layer-f map has a mapping relation with its inserted node in the layer-(f-1) map, and so on, down to the inserted node of z in the layer-1 map, which has a mapping relation with its inserted node in the layer-0 map.
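
The layer-by-layer insertion can be sketched as follows. This is a simplified illustration under several assumptions: the starting layer is drawn uniformly at random, Euclidean distance is used, the heuristic search is a standard best-first search, the neighbour lists of existing nodes are not pruned back to u, and the mapping between the copies of a node on different layers is implicit because the same node id is used on every layer. The names `LayeredIndex` and `search_layer` are hypothetical.

```python
import heapq
import math
import random

def search_layer(graph, vectors, query, start, ef):
    """Best-first search in one layer; returns up to ef node ids, nearest first."""
    visited = {start}
    d_start = math.dist(vectors[start], query)
    candidates = [(d_start, start)]   # min-heap of nodes still to expand
    best = [(-d_start, start)]        # max-heap (negated) of the ef nearest so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break                     # no remaining candidate can improve the result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = math.dist(vectors[nb], query)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return [n for _, n in sorted((-neg, n) for neg, n in best)]

class LayeredIndex:
    def __init__(self, max_layer, u, seed=0):
        # layers[0] is the bottom layer; each layer maps node id -> neighbour set
        self.layers = [dict() for _ in range(max_layer + 1)]
        self.vectors = {}
        self.u = u
        self.rng = random.Random(seed)

    def insert(self, node, vector):
        self.vectors[node] = vector
        f = self.rng.randint(0, len(self.layers) - 1)   # random starting layer
        entry = None
        for level in range(f, -1, -1):                  # top to bottom
            graph = self.layers[level]
            nearest = []
            if graph:
                # entry node from the layer above, else any node of this layer
                start = entry if entry in graph else next(iter(graph))
                nearest = search_layer(graph, self.vectors, vector, start, self.u)
            graph[node] = set(nearest)
            for nb in nearest:
                graph[nb].add(node)                     # connect both directions
            if nearest:
                entry = nearest[0]                      # nearest node becomes entry
        return f

index = LayeredIndex(max_layer=2, u=2)
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (5.0, 5.0)}
for name, vec in points.items():
    index.insert(name, vec)
```

Because insertion always continues down to layer 0, every node present on an upper layer is also present on all lower layers, which is what allows the entry node of one layer to serve as the starting node of the layer below.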

Fig. 3 is a flowchart illustrating step S110 according to an exemplary embodiment of the present disclosure. In some alternative embodiments, as shown in fig. 3, step S110 includes the steps of:

step S111, word frequency information of each word in the target text is obtained.

The word frequency information of each word in the target text comprises word frequency data of each word in the target text.

In a specific implementation, word frequency data of each word in the target text may be determined based on a TF-IDF algorithm, i.e., TF-IDF of each word in the target text is determined as the word frequency data of the word.

Step S112, determining initial text characteristics of the target text according to word frequency information of each word in the target text.

Wherein the initial text feature of the target text includes keywords of the target text.

In an alternative embodiment, the first k words with the largest word frequency data in the target text may be used as keywords of the target text, and the keywords of the target text form the initial text feature of the target text.

Step S113, encrypting the initial text features of the target text by using a preset encryption algorithm to obtain target text features of the target text.

The encryption manner of the initial text feature of the target text may refer to the encryption manner of the initial text feature of the sample text in step S240, which is not described herein.

Fig. 4 is a flow diagram of a text search method based on industrial internet identification in a blockchain provided in accordance with yet another exemplary embodiment of the present disclosure. In some alternative embodiments, as shown in fig. 4, step S110 further includes the following steps:

step S310, determining the association degree between the sample texts based on the initial text features of each sample text.

Wherein, for any two sample texts in the plurality of sample texts, the association degree between the any two sample texts can be determined based on the keywords in the initial text features of the any two sample texts.

For example, based on the keywords included in the initial text features of the two sample texts, the number of keywords that the two sample texts have in common is determined, and this number is divided by the total number of keywords of the two sample texts to obtain the association degree of the two sample texts.
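
The association degree can be sketched as below. The sketch assumes that "the total number of keywords of the two sample texts" means the number of distinct keywords across both texts, which makes the result a Jaccard similarity; if the total is instead meant as the sum of the two keyword counts, only the denominator changes. The function name is hypothetical.

```python
def association_degree(keywords_a, keywords_b):
    """Shared keywords divided by the distinct keywords of both sample texts."""
    a, b = set(keywords_a), set(keywords_b)
    total = len(a | b)
    return len(a & b) / total if total else 0.0

degree = association_degree(
    ["valve", "sensor", "pressure"],
    ["valve", "report", "pressure"],
)
```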

Step S320, determining the sample text with the association relation in the plurality of sample texts according to the association degree between the sample texts and a preset association degree threshold value.

When the association degree between any two sample texts is greater than or equal to a preset association degree threshold value, determining that the association relationship exists between any two sample texts; and when the association degree between any two sample texts is smaller than a preset association degree threshold value, determining that no association relation exists between any two sample texts.

Step S330, according to the text identification of the sample text with the association relationship, the identification information of the sample text with the association relationship is determined.

The identification information of any sample text comprises text identification of the sample text with an association relation with any sample text. That is, text identifications of any two sample texts having an association relationship are stored as each other's associated text identifications into each other's identification information.

For example, assuming that the sample text a and the sample text B have an association relationship, the text identifier of the sample text a is stored as an association text identifier in the identification information of the sample text B, and the text identifier of the sample text B is stored as an association text identifier in the identification information of the sample text a.
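
Steps S320 and S330 can be sketched together as follows: every pair of sample texts whose association degree reaches the threshold stores each other's text identifier. The identifier strings, the threshold value, and the function name `build_identification_info` are hypothetical, and the association degree is computed over distinct keywords as an assumption.

```python
from itertools import combinations

def build_identification_info(keywords_by_id, threshold):
    """Map each text identifier to the identifiers of its associated texts."""
    info = {text_id: set() for text_id in keywords_by_id}
    for id_a, id_b in combinations(keywords_by_id, 2):
        a, b = set(keywords_by_id[id_a]), set(keywords_by_id[id_b])
        union = a | b
        degree = len(a & b) / len(union) if union else 0.0
        if degree >= threshold:
            # Each text records the other as an associated text identifier.
            info[id_a].add(id_b)
            info[id_b].add(id_a)
    return info

# Hypothetical industrial Internet identifiers and keywords
keywords_by_id = {
    "88.000.1/text-A": ["valve", "sensor", "pressure"],
    "88.000.1/text-B": ["valve", "report", "pressure"],
    "88.000.1/text-C": ["invoice", "tax"],
}
identification_info = build_identification_info(keywords_by_id, threshold=0.5)
```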

In one particular implementation, the identification information may be a digital asset. Specifically, data transmitted via a transfer protocol may be referred to as a digital asset. The transfer protocol may be the Digital Asset Transfer Protocol (DATP) or the like. Digital assets can be divided into category objects (Class AO) and ontology objects (Ontology AO). A category object characterizes directory information. An ontology object is a specific instance of a category object and characterizes the information corresponding to that category object.

Fig. 5 shows the structure of a category object. As shown in Fig. 5, Object ID is the object identifier, which uniquely identifies the category object. Registry Time is the creation time, i.e. the time at which the category object was created. Expiration Time is the time at which the category object expires. Modified Time is the time at which the category object was modified. Class Object is the object type, indicating that the object is a category object. Father represents the parent category objects from which the category object inherits, expressed as a set of their identifiers, such as {Object ID1, Object ID2, …}. Child represents the child category objects of the category object, expressed as a set of their identifiers, such as {Object ID7, Object ID8, …}. The meta attribute information describes the attributes included in the ontology objects corresponding to the category object, such as their directory, attribute features and labels, and the attributes of the category object.

For example, the meta attribute information may include: attribute category standard, attribute name, attribute description, attribute processing type, value dictionary, value type, example, update period, security level, object relationship, and the like.

Attribute category standard: the specification followed by the attribute, such as Eclass.

Attribute name: attribute naming should follow three principles: avoid privacy violations, use the same attribute name for the same attribute, and use similar sentence structures for similar attributes.

Attribute description: the attribute name is explained in one or two sentences, to avoid the ambiguity and vagueness that an overly short name can cause.

Attribute processing type: attributes are divided by processing type into original attributes, statistical attributes and algorithm attributes. An original attribute is a field that exists in the original data table and becomes an attribute after simple processing (such as de-duplication) so that business staff can use it, for example the author or publication date of a text. A statistical attribute is formed by processing the original data with simple operations such as summation, averaging or regular expressions, for example the total number of times a text was browsed in 7 days. An algorithm attribute is a deeply processed label computed from the original data by a model algorithm, for example the influence of a text.

Value dictionary: an enumeration of the possible values of the attribute; for example, the value dictionary of a gender attribute is [male, female], and the value dictionary of a text attribute is [foreign language, Chinese].

Value type: the data type of the attribute value, such as numeric, character or date.

Example: a specific example of an attribute value.

Update period: the period at which the attribute data is updated.

Security level: attribute data is exposed to data security risks throughout source data acquisition, data processing, attribute release and attribute use. Security levels are therefore defined for attributes, and usage specifications for each level are generated according to the security level of the attribute.

Object relationship: the native attribute labels of parent and child category objects can be further described by object relationships.

Fig. 6 shows the structure of an ontology object. As shown in Fig. 6, Object ID is the object identifier, which uniquely identifies the ontology object. Registry Time is the creation time, i.e. the time at which the ontology object was created. Expiration Time is the time at which the ontology object expires. Modified Time is the time at which the ontology object was modified. Data/Opera is the object type of the ontology object: Data indicates that the ontology object is a data object, and Opera indicates that it is an operation object. Reference ID is the reference identifier, i.e. the Object ID of the category object to which the ontology object corresponds. The data body holds the actual data, or a data operation interface address or the like, of the ontology object under its category object. That is, the data body may include the text identifier of a sample text and, when the sample text has associated text identifiers, those associated text identifiers as well. Its data format, semantics, period, security level and so on must comply with the definitions in the category object to which it belongs.
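
The two object structures can be sketched as plain data classes. The field names mirror the fields described for Fig. 5 and Fig. 6; the Python types, the default values, the example identifiers and the keys used inside the data body are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

@dataclass
class CategoryObject:
    object_id: str                              # Object ID
    registry_time: datetime                     # Registry Time
    modified_time: datetime                     # Modified Time
    expiration_time: Optional[datetime] = None  # Expiration Time
    father: List[str] = field(default_factory=list)   # parent category object IDs
    child: List[str] = field(default_factory=list)    # child category object IDs
    meta_attributes: Dict[str, dict] = field(default_factory=dict)
    object_type: str = "Class"                  # marks a category object

@dataclass
class OntologyObject:
    object_id: str                              # Object ID
    reference_id: str                           # Object ID of the category object
    registry_time: datetime
    modified_time: datetime
    expiration_time: Optional[datetime] = None
    object_type: str = "Data"                   # "Data" or "Opera"
    data_body: Dict[str, object] = field(default_factory=dict)

now = datetime(2024, 2, 1)
text_category = CategoryObject(
    object_id="category-text",
    registry_time=now,
    modified_time=now,
    meta_attributes={"text_id": {"value_type": "character", "security_level": 1}},
)
sample_a = OntologyObject(
    object_id="88.000.1/text-A",
    reference_id=text_category.object_id,
    registry_time=now,
    modified_time=now,
    data_body={
        "text_id": "88.000.1/text-A",
        "associated_text_ids": ["88.000.1/text-B"],
    },
)
```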

In some alternative implementations, step S140 in the embodiments of the present disclosure may include: acquiring an encrypted text corresponding to text storage information corresponding to a target node; and decrypting the encrypted text corresponding to the text storage information corresponding to the target node to obtain a first matching text.

In a specific implementation, each of the plurality of sample texts may be encrypted using an encryption algorithm (for example, the symmetric algorithm SM4 or the public-key algorithm SM2) to obtain a plurality of encrypted texts. The encrypted texts are stored in the cloud, the cloud returns the storage address of each encrypted text, and the storage address is used as the text storage information of the corresponding sample text. A correspondence is established between the text storage information of each sample text and the node corresponding to the text feature of that sample text, and the correspondence is stored in a block of the blockchain.

Based on the text storage information corresponding to the target node, the encrypted text corresponding to that text storage information can be obtained from the cloud; the encrypted text is then decrypted, and the decrypted sample text is taken as the first matching text.
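
The storage and retrieval flow can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: the cloud and the blockchain are modelled as dictionaries, the storage address format is invented, and `toy_cipher` is a toy XOR stream that only stands in for a real cipher such as SM4 and must not be used for actual protection.

```python
import hashlib

def toy_cipher(key, data):
    """Toy XOR stream cipher (stand-in only, not SM4 and not secure).

    Applying it twice with the same key restores the input.
    """
    blocks = (hashlib.sha256(key + i.to_bytes(8, "big")).digest()
              for i in range(len(data) // 32 + 1))
    stream = b"".join(blocks)[:len(data)]
    return bytes(x ^ y for x, y in zip(data, stream))

cloud = {}    # storage address -> encrypted text
ledger = {}   # node id -> text storage information (stands in for the blockchain)

def store_sample(node_id, text, key):
    """Encrypt a sample text, store it, and record its address for the node."""
    address = f"cloud://texts/{len(cloud)}"   # hypothetical address format
    cloud[address] = toy_cipher(key, text.encode("utf-8"))
    ledger[node_id] = address
    return address

def first_matching_text(target_node, key):
    """Look up the address of the target node, fetch, and decrypt."""
    address = ledger[target_node]
    return toy_cipher(key, cloud[address]).decode("utf-8")

key = b"demo-key"
store_sample("node-17", "Valve V-3 pressure sensor calibration record", key)
match = first_matching_text("node-17", key)
```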

In some alternative implementations, step S160 in the embodiments of the present disclosure may include: performing identification analysis on the associated text identification to obtain an encrypted text corresponding to the associated text identification; and decrypting the encrypted text corresponding to the associated text identifier to obtain a second matched text.

In an alternative embodiment, the encrypted texts of the plurality of sample texts can also be stored in the industrial internet. Each sample text can be assigned an industrial internet identifier as its text identifier, and a correspondence is established between the text identifier of the sample text and the encrypted text of the sample text. When sample texts have an association relationship, the text identifier of each sample text can be used as the associated text identifier of the sample texts associated with it, and is stored in the identification information of those sample texts.

The identification analysis system of the industrial internet can be used to perform identification analysis on the associated text identifier to obtain the encrypted text corresponding to it; that encrypted text is then decrypted, and the decrypted sample text is taken as the second matching text.
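
Obtaining the second matching text can be sketched as below. The resolver and the decryption routine are passed in as callables because the text above does not specify their interfaces; the dictionary-based registry, the single-byte XOR "decryption" and the identifier strings are hypothetical placeholders.

```python
def second_matching_texts(first_match_id, identification_info, resolve, decrypt):
    """Resolve every associated text identifier and decrypt the result."""
    texts = []
    for associated_id in identification_info.get(first_match_id, []):
        encrypted = resolve(associated_id)   # identification analysis
        texts.append(decrypt(encrypted))
    return texts

def toy_decrypt(data):
    # Placeholder for the real decryption step.
    return bytes(b ^ 0x5A for b in data).decode("utf-8")

# Stand-in for the identification analysis system: identifier -> encrypted text
registry = {
    "88.000.1/text-B": bytes(b ^ 0x5A for b in "pump maintenance log".encode("utf-8")),
}
identification_info = {
    "88.000.1/text-A": ["88.000.1/text-B"],
    "88.000.1/text-B": ["88.000.1/text-A"],
}

matches = second_matching_texts(
    "88.000.1/text-A", identification_info, registry.__getitem__, toy_decrypt
)
```

A first matching text with no associated text identifier simply yields an empty list, which corresponds to the case where no second matching text is obtained.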

Fig. 7 is a block diagram of a text search device based on industrial internet identification in a blockchain provided in an exemplary embodiment of the present disclosure. As shown in fig. 7, the text searching apparatus based on industrial internet identification in a blockchain includes:

a first obtaining module 400, configured to perform feature extraction on the target text to obtain a target text feature;

a search module 410, configured to search, based on the target text feature, a hierarchical navigable small world map index for a target node corresponding to the target text feature, where the hierarchical navigable small world map index includes a plurality of sub-navigation small world maps arranged from top to bottom, and any one of the sub-navigation small world maps includes a plurality of nodes, and any one of the plurality of nodes corresponds to one text feature;

a second obtaining module 420, configured to obtain text storage information corresponding to the target node from a blockchain;

a third obtaining module 430, configured to obtain a first matching text that is matched with the target text based on the text storage information corresponding to the target node;

a fourth obtaining module 440, configured to determine whether the identification information corresponding to the first matching text has an associated text identifier, where the associated text identifier is an industrial internet identifier;

and the identification analysis module 450 is configured to, in response to determining that the identification information corresponding to the first matching text has an associated text identification, perform identification analysis on the associated text identification, and obtain a second matching text matched with the target text.

In some optional examples, the text searching apparatus based on industrial internet identification in a blockchain in the above embodiments of the present disclosure further includes:

a fifth obtaining module, configured to obtain a plurality of sample texts;

the extraction module is used for obtaining word frequency information of each sample text in the plurality of sample texts respectively, wherein the word frequency information comprises word frequency data of each word in the sample texts;

the first determining module is used for determining initial text characteristics of the sample text according to word frequency information of the sample text;

the encryption module is used for encrypting the initial text characteristics of the sample text by using a preset encryption algorithm to obtain the text characteristics of the sample text;

and the construction module is used for constructing the hierarchical navigable small world map index by adopting a hierarchical navigable small world algorithm based on the text characteristics of the plurality of sample texts.

In some optional examples, the first obtaining module 400 in the above embodiments of the present disclosure may include:

the acquisition sub-module is used for acquiring word frequency information of each word in the target text;

the determining submodule is used for determining initial text characteristics of the target text according to word frequency information of words in the target text;

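and the encryption sub-module is used for encrypting the initial text characteristics of the target text by using the preset encryption algorithm to obtain the target text characteristics of the target text.

In some optional examples, the text searching apparatus based on industrial internet identification in a blockchain in the above embodiments of the present disclosure further includes: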

the second determining module is used for determining the association degree between the sample texts based on the initial text characteristics of each sample text;

the third determining module is used for determining the sample texts with the association relation in the plurality of sample texts according to the association degree between the sample texts and a preset association degree threshold value;

and the fourth determining module is used for determining the identification information of the sample text with the association relation according to the text identification of the sample text with the association relation, wherein the identification information of any sample text comprises the text identification of the sample text with the association relation with any sample text.

In some optional examples, the third obtaining module 430 in the above embodiments of the disclosure is specifically configured to:

acquiring an encrypted text corresponding to the text storage information corresponding to the target node;

decrypting the encrypted text corresponding to the text storage information corresponding to the target node to obtain the first matching text.

In some optional examples, the identification resolution module 450 in the above embodiments of the disclosure is specifically configured to:

performing identification analysis on the associated text identification to obtain an encrypted text corresponding to the associated text identification;

and decrypting the encrypted text corresponding to the associated text identifier to obtain the second matching text.

In the text searching device based on industrial internet identification in the blockchain of the present disclosure, the various optional embodiments, optional implementations and optional examples disclosed above may be flexibly selected and combined as needed to achieve the corresponding functions and effects, which are not enumerated here one by one.

The text searching device based on the industrial internet identifier in the blockchain and the embodiments of the text searching method based on the industrial internet identifier in the blockchain disclosed in the disclosure correspond to each other, and the related contents can be referred to each other and are not repeated here.

The beneficial technical effects corresponding to the exemplary embodiments of the text searching apparatus based on industrial internet identification in the blockchain of the present disclosure may refer to the corresponding beneficial technical effects of the above-mentioned exemplary method section, and will not be described herein.

In addition, the embodiment of the disclosure also provides an electronic device, which comprises:

a memory for storing a computer program;

and the processor is used for executing the computer program stored in the memory, and when the computer program is executed, the text searching method based on the industrial Internet identification in the blockchain according to any embodiment of the disclosure is realized.

Fig. 8 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure. Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 8. The electronic device may be either or both of the first device and the second device, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.

As shown in fig. 8, the electronic device includes one or more processors and memory.

The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions.

The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by a processor to implement the text search method based on industrial internet identification in the blockchain of the various embodiments of the present disclosure described above and/or other desired functions.

In one example, the electronic device may further include: input devices and output devices, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).

In addition, the input device may include, for example, a keyboard, a mouse, and the like.

The output device may output various information including the determined distance information, direction information, etc., to the outside. The output devices may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.

Of course, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 8; components such as buses and input/output interfaces are omitted for simplicity. In addition, the electronic device may include any other suitable components depending on the particular application.

In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a text search method based on industrial internet identification in a blockchain according to various embodiments of the present disclosure described in the above section of the present specification.

The program code of the computer program product for performing the operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Further, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a text search method based on industrial internet identification in a blockchain according to various embodiments of the present disclosure described in the above section of the present description.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.

The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.

In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. For system embodiments, the description is relatively brief because they essentially correspond to the method embodiments, and reference should be made to the description of the method embodiments for the relevant points.

The block diagrams of the devices, apparatuses, equipment and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment and systems may be connected, arranged and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to" and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims

1. A text search method based on industrial internet identification in a blockchain, comprising:

performing feature extraction on a target text to obtain a target text feature;

searching a target node corresponding to the target text feature in a hierarchical navigable small world map index based on the target text feature, wherein the hierarchical navigable small world map index comprises a plurality of layers of sub-navigation small world maps arranged from top to bottom, any one of the plurality of layers of sub-navigation small world maps comprises a plurality of nodes, and any one of the plurality of nodes corresponds to one text feature;

acquiring text storage information corresponding to the target node from a blockchain;

acquiring a first matched text matched with the target text based on text storage information corresponding to the target node;

determining whether identification information corresponding to the first matching text contains an associated text identification, wherein the associated text identification is an industrial Internet identification;

and in response to determining that the identification information corresponding to the first matching text has the associated text identification, carrying out identification analysis on the associated text identification to obtain a second matching text matched with the target text.
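For illustration only, the flow recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the disclosure: the names `TextRecord`, `search_text`, `find_node`, `chain`, `storage`, and `resolve` are hypothetical, the blockchain and the text store are replaced by in-memory dictionaries, and the index lookup and the identification analysis are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class TextRecord:
    """Hypothetical stored text plus its identification information."""
    text: str
    # Associated text identification, if any (stands in for an
    # industrial Internet identification).
    associated_id: Optional[str] = None


def search_text(
    target_feature: Tuple[float, ...],
    find_node: Callable[[Tuple[float, ...]], str],  # stand-in for the index search
    chain: Dict[str, str],            # node id -> text storage information
    storage: Dict[str, TextRecord],   # text storage information -> record
    resolve: Callable[[str], str],    # stand-in for identification analysis
) -> Tuple[str, Optional[str]]:
    # Search the index for the node corresponding to the target feature.
    node_id = find_node(target_feature)
    # Read the text storage information for that node (from the chain).
    storage_info = chain[node_id]
    # Fetch the first matching text using the storage information.
    first = storage[storage_info]
    # Resolve the associated identification only when one is present.
    second = resolve(first.associated_id) if first.associated_id else None
    return first.text, second
```

With a one-node index, `search_text` returns the first matching text together with the resolved second matching text, or `None` in the second position when the record carries no associated identification.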

2. The method as recited in claim 1, further comprising:

acquiring a plurality of sample texts;

respectively obtaining word frequency information of each sample text in the plurality of sample texts, wherein the word frequency information comprises word frequency data of each word in the sample text;

determining initial text characteristics of the sample text according to word frequency information of the sample text;

encrypting the initial text characteristics of the sample text by using a preset encryption algorithm to obtain the text characteristics of the sample text;

and constructing the hierarchical navigable small world map index by adopting a hierarchical navigable small world algorithm based on the text characteristics of the plurality of sample texts.
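As a hedged illustration of the feature steps in claim 2, the sketch below derives a normalized word frequency vector for a text and then applies a keyed permutation of the vector dimensions. The claims do not name the preset encryption algorithm; the permutation used here is only a hypothetical stand-in, chosen because it leaves distances between vectors unchanged so that a nearest-neighbor index could still be built on the transformed features. It is not a secure encryption scheme. Whitespace tokenization and all function names are likewise assumptions, and construction of the hierarchical navigable small world index itself is not shown.

```python
import random
from collections import Counter
from typing import List


def initial_feature(text: str, vocabulary: List[str]) -> List[float]:
    """Normalized word frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())  # assumes whitespace-separated words
    total = sum(counts.values()) or 1
    return [counts[word] / total for word in vocabulary]


def keyed_permutation(dim: int, key: int) -> List[int]:
    """Dimension order derived from a secret key (hypothetical stand-in)."""
    order = list(range(dim))
    random.Random(key).shuffle(order)
    return order


def protect(feature: List[float], order: List[int]) -> List[float]:
    """Reorder the feature dimensions according to the keyed permutation."""
    return [feature[i] for i in order]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Because the same permutation is applied to every vector, the squared distance between two protected features equals the squared distance between the corresponding initial features, up to floating-point rounding.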

3. The method according to claim 2, wherein the feature extraction of the target text to obtain the target text feature of the target text comprises:

acquiring word frequency information of each word in the target text;

determining initial text characteristics of the target text according to word frequency information of each word in the target text;

and encrypting the initial text characteristics of the target text by using the preset encryption algorithm to obtain the target text characteristics of the target text.

4. The method as recited in claim 2, further comprising:

determining the association degree between sample texts based on the initial text characteristics of each sample text;

determining sample texts with association relations in the plurality of sample texts according to the association degrees between the sample texts and a preset association degree threshold;

and determining, according to the text identifications of the sample texts having the association relationship, the identification information of those sample texts, wherein the identification information of any sample text comprises the text identification of each sample text having the association relationship with that sample text.
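The association step of claim 4 can be illustrated with the sketch below. It assumes cosine similarity as the association degree, which the claims do not specify, and uses the hypothetical names `cosine_similarity` and `build_identification_info`. Text identifications are plain placeholder strings rather than real industrial Internet identifications.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed association degree between two initial text features."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_identification_info(
    features: Dict[str, Sequence[float]],  # text identification -> initial feature
    threshold: float,                      # preset association degree threshold
) -> Dict[str, List[str]]:
    """Map each text identification to the identifications of its associated texts."""
    info: Dict[str, List[str]] = {text_id: [] for text_id in features}
    for id_a, id_b in combinations(features, 2):
        # Two texts are associated when their degree reaches the threshold.
        if cosine_similarity(features[id_a], features[id_b]) >= threshold:
            info[id_a].append(id_b)
            info[id_b].append(id_a)
    return info
```

Each entry of the returned mapping plays the role of the identification information of one sample text: an empty list means that no associated text identification exists for that text.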

5. The method according to any one of claims 1-4, wherein the obtaining the first matching text of the target text based on the text storage information corresponding to the target node includes:

6. The method of any of claims 1-4, wherein the obtaining the second matching text of the target text based on the associated text identification comprises:

7. A text search device based on industrial internet identification in a blockchain, comprising:

the first acquisition module is used for extracting characteristics of the target text to obtain characteristics of the target text;

the search module is used for searching a target node corresponding to the target text feature in a hierarchical navigable small world map index based on the target text feature, the hierarchical navigable small world map index comprises a plurality of layers of sub-navigation small world maps arranged from top to bottom, any one of the plurality of layers of sub-navigation small world maps comprises a plurality of nodes, and any one of the plurality of nodes corresponds to one text feature;

the second acquisition module is used for acquiring text storage information corresponding to the target node from the blockchain;

the third acquisition module is used for acquiring a first matched text matched with the target text based on the text storage information corresponding to the target node;

a fourth obtaining module, configured to determine whether the identification information corresponding to the first matching text has an associated text identifier, where the associated text identifier is an industrial internet identifier;

and the identification analysis module is used for carrying out identification analysis on the associated text identification, in response to determining that the identification information corresponding to the first matching text contains the associated text identification, so as to obtain a second matching text matched with the target text.

8. The apparatus as recited in claim 7, further comprising:

a fifth obtaining module, configured to obtain a plurality of sample texts;

the extraction module is used for acquiring word frequency information of the sample texts for the plurality of sample texts, wherein the word frequency information comprises word frequency data of each word in the sample texts;

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program stored in the memory, wherein the computer program, when executed, implements the text search method based on industrial internet identification in a blockchain as claimed in any one of claims 1-6.

10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the text search method based on industrial internet identification in a blockchain as claimed in any of the preceding claims 1-6.