CN114491177A

CN114491177A - Information determination method, model training method, model determination device and electronic equipment

Info

Publication number: CN114491177A
Application number: CN202210138775.XA
Authority: CN
Inventors: 张惠玲; 吕雅娟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-02-15
Filing date: 2022-02-15
Publication date: 2022-05-13

Abstract

The present disclosure provides an information determination method, a model training method, an apparatus and an electronic device, which relate to the technical field of artificial intelligence, in particular to the technical fields of big data and knowledge graphs, and can be applied to scenarios such as data processing and information extraction. The specific implementation scheme comprises the following steps: acquiring first recording information, wherein the first recording information comprises M first features, and M is a positive integer; determining second recording information from N pieces of standard recording information acquired in advance, wherein the second recording information includes M second features respectively corresponding to the M first features and processing information matched with the M second features, a first similarity between the second recording information and the first recording information satisfies a preset condition, the first similarity is equal to a weighted sum of M second similarities, the M second similarities are similarities between the M first features and the M second features, and N is a positive integer; and outputting the processing information.

Description

Information determination method, model training method, apparatus and electronic device

Technical Field

The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of big data and knowledge graphs, and can be applied to scenarios such as data processing and information extraction. More specifically, it relates to an information determination method, a model training method, an apparatus, and an electronic device.

Background

With the development of artificial intelligence technology, its range of application in people's lives has become increasingly broad. For example, artificial intelligence technology can be applied to the field of intelligent inquiry, so that a user can complete an intelligent inquiry directly. In actual use, however, the processing information for an inquiry is commonly determined from the past experience of the inquirer.

Disclosure of Invention

The present disclosure provides an information determination method, a model training method, an apparatus and an electronic device.

According to a first aspect of the present disclosure, there is provided an information determination method, including:

acquiring first recording information, wherein the first recording information comprises M first features, and M is a positive integer;

determining second record information from N pieces of standard record information acquired in advance, where the second record information includes M second features respectively corresponding to the M first features and processing information matched with the M second features, a first similarity between the second record information and the first record information satisfies a preset condition, the first similarity is equal to a weighted sum of M second similarities, the M second similarities are similarities between the M first features and the M second features, and N is a positive integer;

and outputting the processing information.

According to a second aspect of the present disclosure, there is provided a similarity calculation network model training method, including:

acquiring recording sample information, wherein the recording sample information comprises M first features, and M is a positive integer;

inputting the M first features into an original network model for L-th iterative training, wherein the original network model is used for calculating the similarity between the M first features and M second features in standard record information and outputting M similarities, and L is a positive integer;

and in a case where the weighted sum of the M similarities satisfies the preset condition, determining the original network model after the L-th iterative training as the similarity calculation network model.

According to a third aspect of the present disclosure, there is provided an information determining apparatus comprising:

a first acquisition module, configured to acquire first recording information, wherein the first recording information comprises M first features, and M is a positive integer;

a first determining module, configured to determine second record information from N pieces of standard record information acquired in advance, where the second record information includes M second features respectively corresponding to the M first features and processing information matched with the M second features, a first similarity between the second record information and the first record information satisfies a preset condition, the first similarity is equal to a weighted sum of M second similarities, the M second similarities are similarities between the M first features and the M second features, and N is a positive integer;

and the output module is used for outputting the processing information.

According to a fourth aspect of the present disclosure, there is provided a similarity calculation network model training apparatus, including:

a third acquisition module, configured to acquire recording sample information, wherein the recording sample information comprises M first features, and M is a positive integer;

the iterative training module is used for inputting the M first features into an original network model for L-th iterative training, the original network model is used for calculating the similarity between the M first features and M second features in standard record information and outputting M similarities, and L is a positive integer;

and the third determining module is used for determining the original network model after the L-th iteration training as the similarity calculation network model under the condition that the weighted sum of the M similarities meets the preset condition.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first or second aspects.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of the first or second aspects.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first or second aspects.

In the embodiment of the disclosure, the second similarity between the M first features included in the first recording information and the M second features of the second recording information may be calculated, and when the first similarity obtained by performing weighted summation on the M second similarities satisfies the preset condition, the processing information included in the second recording information may be output, so that the processing information of the first recording information may be estimated according to the processing information, thereby improving the accuracy and the efficiency of estimation of the processing information of the first recording information.

It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

FIG. 1 is a schematic flow chart of an information determination method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart diagram of a similarity calculation network model training method provided in the embodiment of the present disclosure;

FIG. 3 is a flow chart of an information determination method and a similarity calculation network model training method provided by an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an information determination apparatus provided in an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a similarity calculation network model training apparatus provided in an embodiment of the present disclosure;

FIG. 6 is a schematic block diagram of an example electronic device used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Referring to fig. 1, fig. 1 is a flowchart of an information determining method provided in an embodiment of the present disclosure, and as shown in fig. 1, the information determining method includes the following steps:

Step S101, acquiring first recording information, wherein the first recording information comprises M first features, and M is a positive integer.

The types of the first recording information, the second recording information and the standard recording information are not limited herein. For example, the first recording information, the second recording information and the standard recording information may all be case records, that is, the contents recorded in them may be a user's medical history, precautions, and the like.

As an optional implementation manner, the first record information and the second record information are both medical record information. In this way, when the first recorded information and the second recorded information are both medical record information, the processing information of the second recorded information similar to the first recorded information can be determined, and the processing information of the first recorded information can be determined according to the processing information of the second recorded information, so that the accuracy and the determination rate of the determination result of the processing information of the first recorded information are improved.

For example: when the processing information is a treatment scheme, the patient corresponding to the first recording information can be treated according to the treatment scheme of the second recording information, so that the diagnosis process is shortened, the accuracy and speed of determining the treatment scheme are improved, and the dependence on the personal experience of a doctor is reduced; when the processing information is information such as a treatment duration, the treatment duration and other information of the patient corresponding to the first recording information can be determined according to the treatment duration of the second recording information, so that the speed and accuracy of determining such information for the patient are improved.

The medical record information may also be referred to as case information or medical history information.

Another example is: the first record information, the second record information and the standard record information may also be specifications or operating instructions of a product, that is, they may record various parameter features of a certain product and the precautions for its operation.

The first record information is acquired in a manner not limited herein, for example: the first record information may be obtained by real-time entry through the electronic device, and of course, the first record information may also be data stored in the server or the electronic device, and may be obtained from the server or the electronic device.

The type of the first feature is not limited herein, and the first feature may refer to different parameter information in the first recording information, for example: when the first recorded information is a case, the first feature may refer to at least one of information such as name, notice, sex, and the like; when the first recorded information is a specification of a product, the first characteristic may refer to at least one of information of a production place, a production manufacturer, a notice, and the like of the product.

It should be noted that, any two first features of the M first features may be different in kind, for example: one first characteristic may refer to a name of the user and another first characteristic may refer to a gender of the user.

As an optional implementation, the method further includes:

acquiring original recording information;

performing data preprocessing on the original recording information to obtain the first recording information;

wherein the data pre-processing comprises at least one of: missing value processing, outlier processing, and filtering processing.

Since the original recorded information generally has some defect characteristics, in order to ensure the accuracy of the determined second recorded information, data preprocessing needs to be performed on the characteristics in the original recorded information, so that the original recorded information is converted into the first recorded information.

The data preprocessing mode is not limited herein. Missing value processing may refer to supplementing the missing value of a feature according to other information; alternatively, since the value of a feature may have been lost during a version update, missing value processing may query a historical version and supplement the missing value of the feature from that historical version. Outlier processing may directly delete or ignore the features corresponding to the abnormal values. Filtering processing may directly filter out features.

In the embodiment of the disclosure, data preprocessing can be performed on the original recorded information to obtain the first recorded information, so that each feature of the first recorded information is more complete and accurate, the accuracy of the determined second recorded information is higher, and the accuracy of the processed information of the output second recorded information is further improved.
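The three preprocessing modes described above can be illustrated with a minimal sketch. It is not the implementation of the disclosure: the function name `preprocess`, the `VALID_RANGES` table, the `history` argument (standing in for a historical version) and the `keep` set are all hypothetical names introduced only for illustration.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical plausibility ranges; the disclosure does not define any.
VALID_RANGES = {"height_m": (0.3, 2.5), "age": (0, 130)}


def preprocess(
    raw: Dict[str, Any],
    history: Optional[Dict[str, Any]] = None,
    keep: Optional[Set[str]] = None,
) -> Dict[str, Any]:
    """Turn original recording information into first recording information."""
    history = history or {}
    record: Dict[str, Any] = {}
    for key, value in raw.items():
        # Filtering processing: drop features that are not of interest.
        if keep is not None and key not in keep:
            continue
        # Missing value processing: fall back to a historical version.
        if value is None:
            value = history.get(key)
        # Outlier processing: ignore values outside the assumed range.
        bounds = VALID_RANGES.get(key)
        if bounds is not None and isinstance(value, (int, float)):
            low, high = bounds
            if not low <= value <= high:
                continue
        record[key] = value
    return record
```

In this sketch a feature that is still missing after the lookup is kept with the value `None`, which mirrors leaving the item vacant.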

As an optional implementation manner, in a case that the data preprocessing includes missing value processing, the performing data preprocessing on the original recording information to obtain the first recording information includes:

determining a data missing item of a first page of the original recording information;

supplementing first data included in the middle page of the original recording information to a first page of the original recording information to obtain the first recording information, wherein the first data corresponds to missing data corresponding to the data missing item.

In this case, the importance of the information included in the top page of the original recording information is generally high, and therefore, when there is a data missing item in the top page of the original recording information, the integrity of the original recording information is low.

In the embodiment of the present disclosure, the first data included in the middle page of the original recorded information may be supplemented to the top page of the original recorded information to obtain the first recorded information, so that the integrity of the information of the top page of the first recorded information may be improved, and the accuracy of the finally determined second recorded information matched with the first recorded information may be higher.

It should be noted that the first data corresponds to missing data corresponding to a missing data item, for example: the missing data corresponding to the missing data item is a name, and the name appears in the middle page, so that the name appearing in the middle page can be supplemented to the position of the missing data item in the first page, and the completeness of the first page of the first record information is improved. Of course, the missing data may be other information, and the above is only an example. The above-described process of supplementing missing data may also be referred to as a backfill process.

In addition, the first data may be separate information in the middle page, that is, the first data alone constitutes one item of data; alternatively, the first data may be partial information in a certain item of data in the intermediate page, that is, the first data may be partial information extracted from a certain item of data. The specific structure is not limited herein.
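As a minimal sketch of the backfill process described above, the first page and each middle page can be modelled as dictionaries that map an item name to its value. This representation and the function name `backfill_front_page` are assumptions made for illustration; the disclosure does not prescribe a data structure.

```python
from typing import Any, Dict, List, Optional


def backfill_front_page(
    front_page: Dict[str, Optional[Any]],
    middle_pages: List[Dict[str, Any]],
) -> Dict[str, Optional[Any]]:
    """Fill data missing items of the first page from the middle pages."""
    filled = dict(front_page)
    for item, value in front_page.items():
        if value is not None:
            continue  # not a data missing item
        for page in middle_pages:
            candidate = page.get(item)
            if candidate is not None:
                # First data found in a middle page: supplement the first page.
                filled[item] = candidate
                break
    return filled
```

For example, `backfill_front_page({"name": None, "sex": "F"}, [{"note": "x"}, {"name": "Zhang"}])` fills the missing name from the second middle page and leaves the other items unchanged.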

As an optional implementation manner, the supplementing the first data included in the middle page of the original recording information to the first page of the original recording information to obtain the first recording information includes:

determining missing data according to a plurality of second data included in the middle page under the condition that the type of the missing data corresponding to the data missing item is quantitative data and the first data corresponding to the missing data does not appear in the middle page, wherein the second data is quantitative data;

and supplementing the missing data to the first page of the original recording information to obtain the first recording information.

Wherein the first data, the missing data and the second data may all be quantitative data, i.e. the first data, the missing data and the second data may all be numbers, and the second data may be associated with the first data, for example: the first data and the second data are related data corresponding to the same user, or the first data and the second data can be used for representing the same parameter, and the parameter can be treatment days or use duration, for example.

The method for determining missing data according to the plurality of second data is not limited herein, and for example: the plurality of second data may be averaged or weighted to obtain missing data.

In the embodiment of the present disclosure, when the type of the missing data is quantitative data, the type of the first data is also quantitative data, since the missing data corresponds to the first data. In a case where the first data does not appear in the middle page, the missing data may be determined according to the plurality of second data, so that the missing data can still be supplemented even though the first data does not appear in the middle page.

In addition, when the missing data is of another type, for example a text type, the requirements on the data are strict and the data cannot be generated arbitrarily from other data; in this case, the missing data corresponding to the data missing item may be left vacant.
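The averaging and weighting options mentioned above for quantitative missing data can be sketched as follows. The function name `estimate_missing` and the convention of returning `None` to keep an item vacant are assumptions for illustration only.

```python
from typing import Optional, Sequence


def estimate_missing(
    second_data: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> Optional[float]:
    """Estimate missing quantitative data from related second data."""
    if not second_data:
        return None  # nothing to estimate from: keep the item vacant
    if weights is None:
        # Plain average of the second data.
        return sum(second_data) / len(second_data)
    if len(weights) != len(second_data):
        raise ValueError("weights and second_data must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Weighted average of the second data.
    return sum(v * w for v, w in zip(second_data, weights)) / total
```

With treatment days of 10, 12 and 14 recorded in the middle page, the plain average yields 12 days; with values 10 and 20 weighted 3 to 1, the weighted average yields 12.5.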

As another optional implementation manner, in a case that the data preprocessing includes a filtering process, the performing data preprocessing on the original recording information to obtain the first recording information includes:

scoring the original recording information according to a preset scoring rule;

and determining the original recording information as the first recording information when the score of the original recording information is higher than a preset score.

The preset scoring rule may score the original recording information according to parameters such as integrity, rationality, normativity, timeliness and consistency. For example, the more information is missing from the original recording information, the lower its integrity and the lower its score; the less information is missing, the higher its integrity and the higher its score.

Rationality refers to whether the information in the original recording information is consistent with common sense, for example: the height of a user is usually between 1 and 2 meters, so if the height information in the original recording information indicates 10 meters, the rationality is poor and the score is low; if the height information indicates 1.8 meters, the rationality is higher and the score is higher.

Normativity refers to whether the recording format of the information in the original recording information conforms to a preset format; if so, the score is higher, and if not, the score is lower.

The timeliness means that the shorter the interval between the recording time of the information in the original recorded information and the current time, the better the timeliness and the higher the score, whereas the longer the interval between the recording time of the information in the original recorded information and the current time, the worse the timeliness and the lower the score.

Consistency means that when the information in the original recorded information appears for multiple times, if the information is consistent when appearing for multiple times, the consistency is better, and the score is higher; on the contrary, if the information appears for multiple times, the numerical values of each time are different, the consistency is poor, and the score is low.

In the embodiment of the disclosure, the original recording information may be scored according to a preset scoring rule, and when the score of the original recording information is higher than a preset score, it indicates that the reliability of the original recording information is higher, so that the original recording information may be determined as the first recording information, thereby improving the accuracy of the information in the first recording information.

In addition, when the score of the original recording information is less than or equal to the preset score, the original recording information can be discarded, so that the calculation workload is reduced compared with comparing all of the original recording information with the standard recording information.

It should be noted that the parameters such as integrity, rationality, normativity, timeliness and consistency may correspond to different weights. The weights of more important parameters can be set higher and the weights of less important parameters can be set lower, so that the original recording information is screened mainly according to the important parameters, which enhances the flexibility of screening.
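The weighted scoring and screening described above can be sketched as follows. The weight values, the assumption that each parameter has already been scored in the range 0 to 1, and the names `DEFAULT_WEIGHTS`, `quality_score` and `filter_records` are hypothetical; the disclosure does not fix any of them.

```python
from typing import Dict, List

# Hypothetical weights for the five parameters; more important ones weigh more.
DEFAULT_WEIGHTS = {
    "integrity": 0.3,
    "rationality": 0.3,
    "normativity": 0.2,
    "timeliness": 0.1,
    "consistency": 0.1,
}


def quality_score(sub_scores: Dict[str, float],
                  weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of per-parameter scores (each assumed to lie in [0, 1])."""
    return sum(w * sub_scores.get(name, 0.0) for name, w in weights.items())


def filter_records(records: List[dict], preset_score: float,
                   weights: Dict[str, float] = DEFAULT_WEIGHTS) -> List[dict]:
    """Keep only records whose score is higher than the preset score."""
    return [r for r in records
            if quality_score(r["scores"], weights) > preset_score]
```

Records whose score is less than or equal to `preset_score` are simply dropped and never compared against the standard recording information.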

Step S102, determining second recording information from N pieces of standard recording information acquired in advance, where the second recording information includes M second features respectively corresponding to the M first features and processing information matched with the M second features, a first similarity between the second recording information and the first recording information satisfies a preset condition, the first similarity is equal to a weighted sum of M second similarities, the M second similarities are similarities between the M first features and the M second features, and N is a positive integer.

The standard record information may refer to record information stored in a database in advance, and the reliability of the standard record information is high, for example: the standard recording information may be recording information carrying a tag with a higher reliability, or the standard recording information may be confirmed after being manually checked.

For the second feature, reference may be made to the related expression of the first feature, and when the second similarity is calculated, the similarities of the first feature and the second feature of the same kind may be calculated, for example: the similarity of the first feature and the second feature each representing the name of the user may be calculated, or the similarity of the first feature and the second feature each representing the gender of the user may be calculated. That is to say: the first feature and the second feature may have a one-to-one correspondence, so that a second similarity between a certain first feature in the first recorded information and a second feature corresponding to the first feature type in the second recorded information may be calculated. Thus, since the M first features are provided, M second similarities can be obtained.

The first similarity is equal to the weighted sum of the M second similarities, and the value of the weight is not limited herein.

As an optional implementation manner, the weights of the first features corresponding to each second similarity are all equal, that is to say: the first similarity is equal to the sum of M second similarities; as another optional implementation, the weights of the first features corresponding to different second similarities are different, that is, the weights corresponding to different first features are not equal, and when the importance of a part of the first features is higher, the weight of the second similarity calculated by the first feature is larger; and when the importance of part of the first features is lower, the weight of the second similarity calculated by the first features is smaller.

For example: when the first feature represents an allergic item or a complaint of the user, the weight of the first feature may be higher; when the first feature represents information such as a home address, the weight of the first feature may be low.

Therefore, different weights are given to the first characteristics according to different importance of the first characteristics, so that the mode of the determined second recording information is more flexible, the accuracy is higher, the first characteristics with higher importance can be more prominently displayed during the determination of the second recording information, and the determined second recording information can better meet the requirements of users.
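The weighted summation of the M second similarities can be sketched as below. The per-feature similarity here uses `difflib.SequenceMatcher` from the Python standard library purely as a stand-in; the disclosure contemplates a trained similarity calculation network model instead. The feature names and weight values in the usage note are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Any, Dict


def feature_similarity(first_feature: Any, second_feature: Any) -> float:
    """Stand-in second similarity in [0, 1] based on string matching."""
    return SequenceMatcher(None, str(first_feature), str(second_feature)).ratio()


def first_similarity(first_record: Dict[str, Any],
                     second_record: Dict[str, Any],
                     weights: Dict[str, float]) -> float:
    """First similarity = weighted sum of the M second similarities.

    Features are paired by key, so each first feature is compared with the
    second feature of the same kind.
    """
    return sum(
        weight * feature_similarity(first_record[name], second_record[name])
        for name, weight in weights.items()
    )
```

For instance, with weights `{"chief_complaint": 0.8, "address": 0.2}`, a difference in the home address lowers the first similarity far less than a difference in the chief complaint would.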

The specific type of the processing information is not limited herein, and for example: when the first recorded information and the second recorded information are both cases, the processing information may include at least one of treatment plan, treatment duration, cautionary matters, suggestions and the like, so that after the processing information of the second recorded information is output, a user can accurately know the processing information of the first recorded information according to the processing information of the second recorded information, and the intelligentization degree and the determination efficiency of determining the processing information by the user are improved.

Another example is: when the first record information and the second record information are both product specifications, the processing information of the second record information can comprise at least one of information such as assembly steps and usage precautions, so that after the processing information of the second record information is output, a user can also accurately know the processing information of the first record information according to the processing information of the second record information, and the degree of intelligence and the efficiency with which the user determines the processing information are improved.

The preset condition is not limited herein. As an optional implementation manner, the preset condition may refer to that the first similarity is the maximum. As another optional implementation manner, the preset condition may refer to that the first similarity is greater than a preset threshold. As yet another optional implementation manner, the preset condition may refer to that the first similarity is within a preset range.
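The three optional preset conditions can be sketched as selection functions over the first similarities computed against the N pieces of standard recording information. The function names are hypothetical, and each function returns indices into the list of standard recording information.

```python
from typing import List, Sequence


def select_max(similarities: Sequence[float]) -> int:
    """Preset condition: the first similarity is the maximum."""
    return max(range(len(similarities)), key=lambda i: similarities[i])


def select_above(similarities: Sequence[float], threshold: float) -> List[int]:
    """Preset condition: the first similarity is greater than a preset threshold."""
    return [i for i, s in enumerate(similarities) if s > threshold]


def select_in_range(similarities: Sequence[float],
                    low: float, high: float) -> List[int]:
    """Preset condition: the first similarity is within a preset range."""
    return [i for i, s in enumerate(similarities) if low <= s <= high]
```

The maximum condition always yields exactly one candidate when N is at least 1, whereas the threshold and range conditions may yield none or several, so a caller would need to decide how to handle those cases.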

As an optional implementation manner, the second similarity is obtained by inputting the M first features into a similarity calculation network model for similarity calculation, where the similarity calculation network model is a pre-trained network model for calculating the second similarity of the first record information and the second record information.

In the embodiment of the disclosure, when the second similarity is calculated, the similarity calculation network model may be used for calculation, and since the similarity calculation network model may more accurately pay attention to the correlation between different first features, the accuracy of the calculation result of the second similarity may be improved, and meanwhile, the accuracy of the first similarity finally calculated according to the second similarity may be higher.

It should be noted that, for the training process of the similarity calculation network model, reference may be made to the process of the embodiment shown in fig. 2 hereinafter, and details are not described here again.

And step S103, outputting the processing information.

It should be noted that both the processing information and the first features may be adjusted during output, so that obviously erroneous results can be corrected in time and the output processing information or first features better satisfy the user's requirements.

The scheme formed by the steps disclosed in the embodiment of the present disclosure may also be referred to as clustering of the recorded information, that is, classifying the first recorded information into a cluster corresponding to a similar second recorded information.

The manner of outputting the processing information is not specifically limited herein, and for example: the processing information can be displayed; or, the processing information is transmitted to the electronic device.

As an optional implementation, the method further includes: and determining the processing information matched with the first recording information according to the processing information. In this way, the processing information matched with the first recording information is determined through the output processing information of the second recording information, and the similarity between the second characteristic of the second recording information and the first characteristic of the first recording information is higher, so that the determination rate and the accuracy of the processing information of the first recording information are improved.

The specific manner of determining the processing information matched with the first recording information according to the processing information of the second recording information is not limited herein. For example: the processing information of the second recording information may be directly determined as the processing information of the first recording information; alternatively, the processing information of the second recording information may be corrected, and the corrected processing information may be determined as the processing information of the first recording information. The correction method is not limited herein. For example: the correction may be performed based on the second similarity between a first feature of the first recording information and the corresponding second feature of the second recording information; when the second similarity is 80%, 80% of the processing information of the second recording information (which may be, for example, a treatment duration or an assembly duration of a product) may be determined as the processing information of the first recording information.
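The similarity-based correction in the example above amounts to scaling a quantitative piece of processing information by the similarity. A minimal sketch, assuming the processing information is a duration and the similarity is expressed as a fraction between 0 and 1 (the function name `corrected_duration` is hypothetical):

```python
def corrected_duration(second_duration: float, similarity: float) -> float:
    """Scale the second record's duration by the similarity (0.8 means 80%)."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    return second_duration * similarity
```

With a treatment duration of 10 days in the second recording information and a similarity of 80%, the corrected estimate for the first recording information is 8 days.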

In the embodiment of the present disclosure, through steps S101 to S103, the second similarities between the M first features included in the first recording information and the M second features of the second recording information may be calculated. When the first similarity obtained by performing a weighted summation on the M second similarities satisfies the preset condition, the processing information included in the second recording information may be output, so that the processing information of the first recording information may be estimated according to the output processing information, thereby improving the accuracy and efficiency of estimating the processing information of the first recording information.
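As a rough illustration of steps S101 to S103, the following Python sketch computes the first similarity as a weighted sum of M second similarities and selects a second record from the standard records. Several points are assumptions made only for this sketch: the preset condition is taken to be "reaches a threshold and is the highest among the candidates", each second similarity is a toy exact-match score rather than the output of a trained model, and all names, features and weights are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def exact_match(first_feature: Any, second_feature: Any) -> float:
    """Toy second similarity: 1.0 for equal features, otherwise 0.0 (an assumption)."""
    return 1.0 if first_feature == second_feature else 0.0


def first_similarity(second_similarities: Sequence[float], weights: Sequence[float]) -> float:
    """First similarity = weighted sum of the M second similarities."""
    if len(second_similarities) != len(weights):
        raise ValueError("expected one weight per second similarity")
    return sum(w * s for w, s in zip(weights, second_similarities))


def find_second_record(
    first_features: Sequence[Any],
    standard_records: List[Dict[str, Any]],
    weights: Sequence[float],
    threshold: float,
    feature_similarity: Callable[[Any, Any], float] = exact_match,
) -> Optional[Dict[str, Any]]:
    """Return the standard record whose first similarity is highest and >= threshold."""
    best_record = None
    best_score = float("-inf")
    for record in standard_records:
        sims = [feature_similarity(f1, f2)
                for f1, f2 in zip(first_features, record["features"])]
        score = first_similarity(sims, weights)
        # Assumed preset condition: reach the threshold and beat earlier candidates.
        if score >= threshold and score > best_score:
            best_record, best_score = record, score
    return best_record


# Hypothetical data with M = 3 features and N = 2 standard records.
first_features = ["fever", "adult", "outpatient"]
standard_records = [
    {"features": ["fever", "adult", "inpatient"], "processing_info": "plan A"},
    {"features": ["cough", "child", "outpatient"], "processing_info": "plan B"},
]
weights = [0.5, 0.3, 0.2]
match = find_second_record(first_features, standard_records, weights, threshold=0.7)
if match is not None:
    print(match["processing_info"])  # step S103: output the processing information
```

In this toy data the first record matches two of the three features, and the second record matches only one low-weight feature, so only the first record reaches the threshold.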

Referring to fig. 2, fig. 2 is a flowchart of a similarity calculation network model training method according to an embodiment of the present disclosure, and as shown in fig. 2, the method includes the following steps:

Step S201, obtaining recording sample information, wherein the recording sample information includes M first features, and M is a positive integer.

For the recording sample information, reference may be made to the description of the first recording information in the above embodiment, and the same beneficial technical effects are achieved; for the first features, reference may be made to the description of the first features in the above embodiment, and the same beneficial technical effects are achieved.

It should be noted that the recording sample information may be stored in a sample pool.

As an optional implementation, the method further includes:

acquiring original recording sample information;

performing data preprocessing on the original recording sample information to obtain the recording sample information.

For both the original recording sample information and the data preprocessing, reference may be made to the corresponding descriptions in the above embodiments, and the same beneficial technical effects are achieved.

In the embodiment of the disclosure, since data preprocessing can be performed on the original recording sample information to obtain the recording sample information, each feature of the recording sample information is more complete and accurate.

As an optional implementation manner, in a case that the data preprocessing includes missing value processing, the performing data preprocessing on the original recording sample information to obtain the recording sample information includes:

determining a data missing item of a first page of the original recording sample information;

and supplementing first data included in the middle page of the original recording sample information to the first page of the original recording sample information to obtain the recording sample information, wherein the first data corresponds to missing data corresponding to the data missing item.

For the data missing item, the first data, the middle page and the first page, reference may be made to the corresponding descriptions in the above embodiments, and the same beneficial technical effects are achieved.

In the embodiment of the present disclosure, the first data included in the middle page of the original recording sample information may be supplemented to the first page of the original recording sample information to obtain the recording sample information, so that the completeness of the information on the first page of the recording sample information may be improved.
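The missing value processing described above can be sketched in Python as follows. The sketch assumes that the first page and each middle page are available as dictionaries keyed by field name, and that a data missing item is marked by `None`; the function name `supplement_first_page` and the field names are hypothetical.

```python
from typing import Any, Dict, List


def supplement_first_page(first_page: Dict[str, Any],
                          middle_pages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fill data missing items of the first page with first data from the middle pages."""
    completed = dict(first_page)  # leave the original first page untouched
    for field, value in first_page.items():
        if value is not None:
            continue  # not a data missing item
        for page in middle_pages:
            candidate = page.get(field)
            if candidate is not None:
                completed[field] = candidate  # first data corresponding to the missing data
                break
    return completed


# Hypothetical record: the diagnosis is missing on the first page but appears later.
first_page = {"name": "sample-001", "diagnosis": None, "age": None}
middle_pages = [{"diagnosis": "condition X"}, {"diagnosis": "condition Y"}]
completed = supplement_first_page(first_page, middle_pages)
```

Fields that appear in no middle page, such as `age` in this example, stay missing in this sketch and would need a separate rule.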

Step S202, inputting the M first features into an original network model for L-th iterative training, wherein the original network model is used for calculating the similarity between the M first features and M second features in standard record information and outputting M similarities, and L is a positive integer.

For the calculation method of the similarity, reference may be made to the calculation method of the second similarity in the foregoing embodiments, and details are not described herein again.

Step S203, in a case that the weighted sum of the M similarities satisfies a preset condition, determining the original network model after the L-th iterative training as a similarity calculation network model.

For the preset condition, reference may be made to the corresponding description of the preset condition in the above embodiments, which is not described herein again in detail.

In the embodiment of the present disclosure, through steps S201 to S203, the similarity calculation network model can be obtained through training, and the similarity calculation network model can pay attention to the correlation between different first features more accurately, so that the accuracy of the similarity calculated by the similarity calculation network model is higher.
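The control flow of steps S201 to S203 can be sketched as a loop that trains for one more iteration and then checks whether the weighted sum of the M similarities satisfies the preset condition. The disclosure does not specify the network architecture or the update rule in this passage, so the sketch takes both as caller-supplied functions, assumes that the preset condition is a threshold, and uses a deliberately trivial one-parameter "model" purely to make the loop runnable. All names are hypothetical.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def train_until_condition(
    params: Any,
    first_features: Sequence[Any],
    second_features: Sequence[Any],
    weights: Sequence[float],
    similarity_fn: Callable[[Any, Sequence[Any], Sequence[Any]], List[float]],
    update_fn: Callable[[Any, Sequence[Any], Sequence[Any]], Any],
    threshold: float,
    max_iterations: int = 100,
) -> Tuple[Optional[Any], int]:
    """Return (trained params, L) once the weighted sum of similarities >= threshold."""
    for iteration in range(1, max_iterations + 1):
        # Step S202: the L-th iterative training, followed by the M similarities.
        params = update_fn(params, first_features, second_features)
        sims = similarity_fn(params, first_features, second_features)
        # Step S203 (assumed preset condition): weighted sum reaches the threshold.
        if sum(w * s for w, s in zip(weights, sims)) >= threshold:
            return params, iteration
    return None, max_iterations  # condition never met within the iteration budget


# Toy stand-ins (assumptions): one scalar parameter scales an exact-match similarity.
def toy_similarity(scale, firsts, seconds):
    return [scale * (1.0 if a == b else 0.0) for a, b in zip(firsts, seconds)]


def toy_update(scale, firsts, seconds):
    return min(1.0, scale + 0.25)


trained, iterations = train_until_condition(
    0.0, ["a", "b"], ["a", "b"], [0.5, 0.5],
    toy_similarity, toy_update, threshold=0.7,
)
```

A real implementation would replace the toy functions with a network forward pass and a gradient-based update; only the stopping logic is meant to mirror the text.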

It should be noted that, referring to fig. 3, the online calculation portion of the flowchart in fig. 3 corresponds to the embodiment shown in fig. 1, and the offline calculation portion corresponds to the embodiment shown in fig. 2; the similarity calculation network model obtained through training in the embodiment shown in fig. 2 may then be applied in the embodiment shown in fig. 1.

The electronization in fig. 3 may refer to converting paper original recording information into electronic original recording information, so that the original recording information can be conveniently circulated and acquired; the feature extraction may refer to the extraction of the first features in the embodiments shown in fig. 1 and fig. 2; and the recording information clustering may refer to the process of determining the second recording information.

It should be noted that, as shown in fig. 3, the user may also intervene in the extracted features (i.e., the feature intervention in fig. 3) and the output processing information (i.e., the record information intervention in fig. 3), so that the final output result may be intervened and adjusted, and the output result may better meet the user requirement.

Referring to fig. 4, fig. 4 is a schematic structural diagram of an information determining apparatus according to an embodiment of the present disclosure, and as shown in fig. 4, the information determining apparatus 400 includes:

a first obtaining module 401, configured to obtain first recording information, where the first recording information includes M first features, and M is a positive integer;

a first determining module 402, configured to determine second record information from N pieces of standard record information acquired in advance, where the second record information includes M second features respectively corresponding to the M first features and processing information matched with the M second features, a first similarity between the second record information and the first record information satisfies a preset condition, the first similarity is equal to a weighted sum of M second similarities, the M second similarities are similarities between the M first features and the M second features, and N is a positive integer;

an output module 403, configured to output the processing information.
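One way to picture how the three modules cooperate is the following Python sketch, in which each module is modelled as a callable injected into a small container class. This is only an illustrative software arrangement; the class name, the field names and the `processing_info` key are hypothetical, and the disclosure does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence


@dataclass
class InformationDeterminingApparatus:
    obtain: Callable[[], Sequence[Any]]                            # first obtaining module 401
    determine: Callable[[Sequence[Any]], Optional[Dict[str, Any]]]  # first determining module 402
    output: Callable[[Any], None]                                   # output module 403

    def run(self) -> None:
        first_features = self.obtain()
        second_record = self.determine(first_features)
        if second_record is not None:
            self.output(second_record["processing_info"])


# Hypothetical wiring with stub modules; a list stands in for a display.
shown: List[Any] = []
apparatus = InformationDeterminingApparatus(
    obtain=lambda: ["fever", "adult"],
    determine=lambda features: {"processing_info": "plan A"},
    output=shown.append,
)
apparatus.run()
```

Keeping the modules as injected callables makes it straightforward to swap the stub determining module for one backed by a trained similarity model.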

Optionally, the information determining apparatus 400 further includes:

a second determining module, configured to determine, according to the processing information, processing information matched with the first recording information.

Optionally, the information determining apparatus 400 further includes:

a second obtaining module, configured to obtain original recording information;

a preprocessing module, configured to perform data preprocessing on the original recording information to obtain the first recording information.

Optionally, in a case that the data preprocessing includes missing value processing, the preprocessing module includes:

a first determining submodule, configured to determine a data missing item of a first page of the original recording information;

a first supplementing submodule, configured to supplement first data included in a middle page of the original recording information to the first page of the original recording information to obtain the first recording information, wherein the first data corresponds to missing data corresponding to the data missing item.

Optionally, the first supplementing submodule includes:

a determining unit, configured to determine, in a case that the type of the missing data corresponding to the data missing item is quantitative data and the first data corresponding to the missing data does not appear in the middle page, the missing data according to a plurality of second data included in the middle page, where the second data is quantitative data;

a supplementing unit, configured to supplement the missing data to the first page of the original recording information to obtain the first recording information.
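The determining unit and the supplementing unit can be illustrated with a short Python sketch. This passage does not state how the missing quantitative data is derived from the plurality of second data, so the arithmetic mean used here is purely an assumption; the function name, the field name and the readings are likewise hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_missing_quantity(second_data: Sequence[float]) -> float:
    """Derive missing quantitative data from several quantitative values (assumed: mean)."""
    if not second_data:
        raise ValueError("at least one second data value is required")
    return mean(second_data)


# Hypothetical case: the first page lacks a temperature value, while the middle
# pages contain three quantitative readings.
first_page = {"temperature": None}
readings = [36.5, 37.0, 37.5]
first_page["temperature"] = estimate_missing_quantity(readings)  # supplementing unit
```

A median or a most recent value would fit the wording equally well; the choice depends on the kind of quantity being recorded.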

Optionally, in a case that the data preprocessing includes a filtering process, the preprocessing module includes:

a scoring submodule, configured to score the original recording information according to a preset scoring rule;

a second determining submodule, configured to determine the original recording information as the first recording information in a case that the score of the original recording information is higher than a preset score.
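The filtering process handled by the scoring submodule and the second determining submodule can be sketched as below. The preset scoring rule is not given in this passage, so the sketch assumes a simple completeness score (the fraction of required fields that are filled in); the function names, field names and preset score are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def score_record(record: Dict[str, Any], required_fields: Sequence[str]) -> float:
    """Assumed scoring rule: fraction of required fields that are present and non-empty."""
    if not required_fields:
        raise ValueError("required_fields must not be empty")
    filled = sum(1 for field in required_fields if record.get(field) not in (None, ""))
    return filled / len(required_fields)


def filter_records(records: List[Dict[str, Any]],
                   required_fields: Sequence[str],
                   preset_score: float) -> List[Dict[str, Any]]:
    """Keep only the records whose score is higher than the preset score."""
    return [r for r in records if score_record(r, required_fields) > preset_score]


# Hypothetical original recording information.
required = ["diagnosis", "treatment", "duration"]
records = [
    {"diagnosis": "condition X", "treatment": "plan A", "duration": 10},
    {"diagnosis": "condition Y", "treatment": "", "duration": None},
]
kept = filter_records(records, required, preset_score=0.6)
```

The comparison is strict, matching the wording "higher than the preset score".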

Optionally, the second similarity is obtained by inputting the M first features into a similarity calculation network model for similarity calculation, where the similarity calculation network model is a pre-trained network model for calculating the second similarity of the first record information and the second record information.

Optionally, the first record information and the second record information are medical record information.

The information determining apparatus 400 provided in the present disclosure can implement each process implemented by the information determining method embodiment and can achieve the same beneficial effects. To avoid repetition, details are not described herein again.

Referring to fig. 5, fig. 5 is a structural diagram of a similarity calculation network model training apparatus according to an embodiment of the present disclosure, and as shown in fig. 5, the similarity calculation network model training apparatus 500 includes:

a third obtaining module 501, configured to obtain record sample information, where the record sample information includes M first features, and M is a positive integer;

an iterative training module 502, configured to input the M first features into an original network model for an L-th iterative training, where the original network model is configured to calculate similarities between the M first features and M second features in standard record information, and output M similarities, where L is a positive integer;

a third determining module 503, configured to determine, in a case that the weighted sum of the M similarities satisfies a preset condition, the original network model after the L-th iterative training as the similarity calculation network model.

Optionally, the similarity calculation network model training apparatus 500 further includes:

a fourth obtaining module, configured to obtain original recording sample information;

a data preprocessing module, configured to perform data preprocessing on the original recording sample information to obtain the recording sample information.

Optionally, in a case that the data preprocessing includes missing value processing, the data preprocessing module includes:

a third determining submodule, configured to determine a data missing item of a first page of the original recording sample information;

a second supplementing submodule, configured to supplement first data included in a middle page of the original recording sample information to the first page of the original recording sample information to obtain the recording sample information, wherein the first data corresponds to missing data corresponding to the data missing item.

The similarity calculation network model training apparatus 500 provided by the present disclosure can implement each process implemented by the similarity calculation network model training method embodiment and can achieve the same beneficial effects. To avoid repetition, details are not described herein again.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the information determination method or the similarity calculation network model training method. For example, in some embodiments, the information determination method or the similarity calculation network model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the information determination method or the similarity calculation network model training method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the information determination method or the similarity calculation network model training method.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. An information determination method, comprising:

and outputting the processing information.

2. The method of claim 1, further comprising:

and determining the processing information matched with the first recording information according to the processing information.

3. The method of claim 1, further comprising:

acquiring original recording information;

wherein the data preprocessing comprises at least one of the following ways: missing value processing, outlier processing, and filtering processing.

4. The method of claim 3, wherein in the case that the data preprocessing comprises missing value processing, the data preprocessing the original recording information to obtain the first recording information comprises:

5. The method of claim 4, wherein supplementing the first data included in the middle page of the original recording information to the first page of the original recording information to obtain the first recording information comprises:

6. The method of claim 3, wherein the data pre-processing the original recording information to obtain the first recording information in a case that the data pre-processing includes a filtering process, comprises:

scoring the original recording information according to a preset scoring rule;

7. The method according to any one of claims 1 to 6, wherein the second similarity is obtained by inputting the M first features into a similarity calculation network model for similarity calculation, wherein the similarity calculation network model is a pre-trained network model for calculating the second similarity of the first record information and the second record information.

8. The method of any of claims 1-6, wherein the first record information and the second record information are both medical record information.

9. A similarity calculation network model training method, comprising:

10. The method of claim 9, further comprising:

acquiring original record sample information;

11. The method of claim 10, wherein, in a case that the data preprocessing comprises missing value processing, the performing data preprocessing on the original recording sample information to obtain the recording sample information comprises:

12. An information determining apparatus comprising:

an output module, configured to output the processing information.

13. The apparatus of claim 12, further comprising:

14. The apparatus of claim 12, further comprising:

15. The apparatus of claim 14, wherein in the case that the data preprocessing comprises missing value processing, the preprocessing module comprises:

16. The apparatus of claim 15, wherein the first supplemental sub-module comprises:

17. The apparatus of claim 14, wherein, in the case that the data pre-processing comprises filtering, the pre-processing module comprises:

a scoring submodule, configured to score the original recording information according to a preset scoring rule;

a second determining submodule, configured to determine the original recording information as the first recording information in a case that the score of the original recording information is higher than a preset score.

18. The apparatus according to any one of claims 12 to 17, wherein the second similarity is obtained by inputting the M first features into a similarity calculation network model for similarity calculation, wherein the similarity calculation network model is a pre-trained network model for calculating a second similarity of the first record information and the second record information.

19. The apparatus of any of claims 12-17, wherein the first recorded information and the second recorded information are both medical record information.

20. A similarity calculation network model training apparatus, comprising:

21. The apparatus of claim 20, further comprising:

22. The apparatus of claim 21, wherein in the case that the data pre-processing comprises missing value processing, the data pre-processing module comprises:

23. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8 or to enable the at least one processor to perform the method of any one of claims 9-11.

24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8 or the computer instructions for causing the computer to perform the method of any one of claims 9-11.

25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8, or which, when executed by a processor, implements the method according to any one of claims 9-11.