CN116304259B - Spectrogram data matching retrieval method, system, electronic equipment and storage medium - Google Patents


Publication number
CN116304259B
Authority
CN
China
Prior art keywords
data
spectrogram
peak
value
range
Legal status
Active
Application number
CN202310590733.4A
Other languages
Chinese (zh)
Other versions
CN116304259A (en)
Inventor
王薇
杨柳青
彭雨洁
王中健
Current Assignee
Yaorongyun Digital Technology Chengdu Co ltd
Original Assignee
Yaorongyun Digital Technology Chengdu Co ltd
Application filed by Yaorongyun Digital Technology Chengdu Co ltd
Priority to CN202310590733.4A
Publication of CN116304259A
Application granted
Publication of CN116304259B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30 Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Spectrometry And Color Measurement (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a matching retrieval method and system for spectrogram data, an electronic device and a storage medium. The method comprises: acquiring first spectrogram data to be retrieved, the first spectrogram data comprising a plurality of retrieval peak data, and at the same time acquiring a data standardization requirement; expanding each retrieval peak into a corresponding range value according to a preset threshold; searching the corresponding spectrogram database with the range values to obtain second spectrogram data whose plurality of matching peak data fall respectively into the corresponding range values; and displaying the second spectrogram data according to the data standardization requirement. Because the retrieval peak data of nuclear magnetic resonance spectrograms, infrared spectrograms and mass spectrograms are searched as ranges, the first spectrogram data can still be retrieved when it fluctuates within an acceptable error range; in addition, the data standardization requirement is selectable, which meets the different needs of users.

Description

Spectrogram data matching retrieval method, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of spectrogram data processing, and in particular, to a method, a system, an electronic device, and a storage medium for matching and retrieving spectrogram data.
Background
A spectrogram is a visual representation of how a light, sound or other signal varies with time or another variable. Nuclear magnetic resonance (NMR) spectra, infrared (IR) spectra and mass spectra are powerful tools for the qualitative analysis of the composition and structure of organic and inorganic substances; NMR spectra can be further divided into hydrogen spectra and carbon spectra.
Both NMR and IR spectra arise from transitions between energy levels after microscopic particles absorb electromagnetic waves. An IR spectrum is obtained when molecules selectively absorb infrared radiation of certain wavelengths (for example, 2.5 to 25 μm), causing transitions between vibrational and rotational energy levels within the molecules, and the absorbed infrared radiation is detected. In NMR spectroscopy, molecules are irradiated with electromagnetic waves of very long wavelength (about 10⁶ to 10⁹ μm, in the radio-frequency region), with frequencies on the order of megahertz and very low energy, which cause neither vibrational or rotational transitions nor electronic transitions. Instead, this electromagnetic energy interacts with magnetic nuclei placed in a strong magnetic field, causing them to undergo resonant transitions between magnetic energy levels in the external field and thereby producing an absorption signal. Spectroscopy based on this absorption of radio-frequency radiation by nuclei is called nuclear magnetic resonance spectroscopy. A mass spectrum is a plot of ion abundance against mass-to-charge ratio obtained with a mass spectrometer.
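For orientation, IR peak positions are commonly reported as wavenumbers rather than wavelengths; the example wavelength range above converts as follows:

$$\tilde{\nu}\ (\mathrm{cm^{-1}}) = \frac{10^{4}}{\lambda\ (\mu\mathrm{m})}, \qquad 2.5\ \mu\mathrm{m} \mapsto 4000\ \mathrm{cm^{-1}}, \qquad 25\ \mu\mathrm{m} \mapsto 400\ \mathrm{cm^{-1}}$$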
NMR spectroscopy can be used to determine the kinds and numbers of specific atoms (e.g. hydrogen or carbon), IR spectroscopy can be used to identify specific chemical bonds in a molecule and hence its functional groups, and mass spectrometry can be used to determine the number and masses of atoms and molecules. All three are important means of characterizing the structure of a compound.
In some cases, the elements (peak data) of acquired NMR, IR and mass spectra must be retrieved in order to infer the structure of the corresponding compound. However, the prior art can generally only query a single element in a database, whereas retrieving these three kinds of spectra requires searching with an array of several elements, which the prior art cannot do. Moreover, because the generation of NMR, IR and mass spectra is affected by factors such as the instrument brand and the environment, the acquired data fluctuate within an acceptable error range; if this fluctuation is ignored, the search results are inaccurate. In addition, the user sometimes cannot be sure that the elements to be searched are all the key elements of the compound structure. For example, when a spectrum is unclear the user may be unable to pick every peak; the user may be unsure whether the selected peaks are all the peaks of the compound; and the compound may be impure, contain impurities or show solvent peaks. In such cases a single search mode gives inaccurate results.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a matching retrieval method and system for spectrogram data, an electronic device and a storage medium.
The aim of the invention is achieved by the following technical solution:
In a first aspect of the present invention, a matching retrieval method for spectrogram data is provided, the spectrograms including nuclear magnetic resonance spectrograms, infrared spectrograms and mass spectrograms, the method comprising the following steps:
acquiring first spectrogram data to be retrieved, the first spectrogram data comprising a plurality of retrieval peak data, and at the same time acquiring a data standardization requirement, the data standardization requirement comprising an exact matching standard, which requires the number of peak data to be the same, and/or a boundary matching standard, which requires the number of matching peak data to be greater than the number of retrieval peak data;
expanding each retrieval peak into a corresponding range value according to a preset threshold;
searching the corresponding spectrogram database with the range values to obtain second spectrogram data in the spectrogram database, wherein the plurality of matching peak data of the second spectrogram data fall respectively into the corresponding range values;
and displaying the second spectrogram data according to the data standardization requirement.
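The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the spectrogram database is modelled as a hypothetical in-memory dict mapping a record name to its list of peak values, each range value is taken as peak ± threshold, and a database peak may count toward at most one range (the "unique mark" described in the detailed description). The names `Standard`, `expand`, `covers_all_ranges` and `retrieve` are illustrative only.

```python
from enum import Enum


class Standard(Enum):
    EXACT = "exact"        # matched record has as many peaks as the query
    BOUNDARY = "boundary"  # matched record has more peaks than the query


def expand(peaks: list[float], threshold: float) -> list[tuple[float, float]]:
    """Expand each retrieval peak into a closed range [p - t, p + t]."""
    return [(p - threshold, p + threshold) for p in peaks]


def covers_all_ranges(ranges: list[tuple[float, float]], record_peaks: list[float]) -> bool:
    """True if every range receives its own, distinct peak from the record."""
    remaining = sorted(record_peaks)
    # Visit ranges by upper bound and give each the smallest unused peak inside it
    # (a greedy interval-to-point assignment).
    for lo, hi in sorted(ranges, key=lambda r: r[1]):
        for i, p in enumerate(remaining):
            if lo <= p <= hi:
                del remaining[i]  # mark this peak as used
                break
        else:
            return False  # no unused peak falls in this range
    return True


def retrieve(query_peaks: list[float], threshold: float,
             database: dict[str, list[float]], standard: Standard) -> list[str]:
    """Return names of records that match the query under the chosen standard."""
    ranges = expand(query_peaks, threshold)
    results = []
    for name, peaks in database.items():
        if not covers_all_ranges(ranges, peaks):
            continue
        if standard is Standard.EXACT and len(peaks) != len(query_peaks):
            continue
        if standard is Standard.BOUNDARY and len(peaks) <= len(query_peaks):
            continue
        results.append(name)
    return results
```

With a toy database `{"record-A": [1.0, 2.0], "record-B": [1.02, 2.01, 3.5], "record-C": [1.0, 5.0]}`, a query of `[1.0, 2.0]` with threshold `0.05` returns `["record-A"]` under the exact standard and `["record-B"]` under the boundary standard; `record-C` fails because no peak falls near 2.0.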
Further, the first spectrogram data to be retrieved is either data entered manually by a user or data obtained from spectral analysis software;
when the first spectrogram data is entered manually by the user, the data standardization requirement is the user's own judgment of the accuracy of the retrieval peak data of the first spectrogram data; when the first spectrogram data is obtained from spectral analysis software, the data standardization requirement is either the software's judgment of the accuracy of the retrieval peak data of the first spectrogram data or the user's own judgment of that accuracy.
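One way to encode this rule is sketched below. The enum names, the boolean `software_judges_complete` flag standing in for the analysis software's accuracy judgment, the mapping "complete peak list means exact matching", and the precedence given to an explicit user choice are all assumptions made for illustration; the text only states that either source of judgment may be used.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    MANUAL = "manual"      # peaks typed in by the user
    SOFTWARE = "software"  # peaks exported by spectral analysis software


class Standard(Enum):
    EXACT = "exact"
    BOUNDARY = "boundary"


def resolve_standard(source: Source,
                     user_choice: Optional[Standard] = None,
                     software_judges_complete: Optional[bool] = None) -> Standard:
    """Pick the data standardization requirement for one query (hypothetical helper)."""
    if source is Source.MANUAL:
        # Manual input: only the user's own accuracy judgment is available.
        if user_choice is None:
            raise ValueError("manual input requires the user to choose a standard")
        return user_choice
    # Software input: an explicit user choice is assumed to take precedence ...
    if user_choice is not None:
        return user_choice
    # ... otherwise fall back to the software's judgment of the peak list.
    if software_judges_complete is None:
        raise ValueError("software input needs a software judgment or a user choice")
    return Standard.EXACT if software_judges_complete else Standard.BOUNDARY
```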
Further, the preset threshold is set in one of the following ways:
setting a general first threshold range and expanding the retrieval peak data of all types of first spectrogram data according to the first threshold range; or:
setting a corresponding second threshold range for each type of spectrogram and expanding the retrieval peak data of first spectrogram data of that type according to the second threshold range; or:
performing a first calculation using the maximum and minimum values of the retrieval peak data, taking the calculated value as a third threshold range, and expanding the retrieval peak data of the first spectrogram data according to the third threshold range, the first calculation being:
$$|X| = (\lg X_{\max} - \lg X_{\min}) \cdot X_{\min}$$
where $|X|$ is the threshold value, $X_{\max}$ is the maximum value of the retrieval peak data, $X_{\min}$ is the minimum value of the retrieval peak data, and $X$ is the third threshold range.
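As a worked illustration of the first calculation, the sketch below computes |X| from a list of retrieval peaks and builds the range values, reading the third threshold range X as the interval ±|X| around each peak (an interpretation; the text does not spell this out). `lg` is taken as the base-10 logarithm, and the function names are hypothetical. Because lg is undefined for zero or negative values, the sketch rejects such peak lists, a case the formula as stated does not cover.

```python
import math


def third_threshold(peaks: list[float]) -> float:
    """|X| = (lg X_max - lg X_min) * X_min, with lg = log10."""
    x_max, x_min = max(peaks), min(peaks)
    if x_min <= 0:
        raise ValueError("lg is undefined for non-positive peak values")
    return (math.log10(x_max) - math.log10(x_min)) * x_min


def expand_with_third_threshold(peaks: list[float]) -> list[tuple[float, float]]:
    """Expand every retrieval peak by the same computed threshold |X|."""
    t = third_threshold(peaks)
    return [(p - t, p + t) for p in peaks]
```

For peaks 10, 50 and 100, |X| = (2 - 1) × 10 = 10, so the ranges are [0, 20], [40, 60] and [90, 110]. When all peaks are equal, |X| is 0 and the ranges collapse to the exact values.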
Further, the method also comprises:
searching the corresponding spectrogram database with the range values to obtain third spectrogram data in the spectrogram database, wherein the plurality of matching peak data of the third spectrogram data fall only partially into the corresponding range values;
and displaying the third spectrogram data according to the boundary matching standard.
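The partial-match branch can be sketched as ranking database records by how many range values they cover. The scoring rule (the number of ranges that receive a distinct peak), the `min_hits` cut-off and the descending sort are illustrative assumptions; the text only says that the third spectrogram data fall partially into the range values and are displayed under the boundary matching standard.

```python
def partial_matches(ranges: list[tuple[float, float]],
                    database: dict[str, list[float]],
                    min_hits: int = 1) -> list[tuple[str, int]]:
    """Records that cover some, but not all, ranges, best coverage first (hypothetical)."""
    scored = []
    for name, peaks in database.items():
        remaining = sorted(peaks)
        hits = 0
        for lo, hi in sorted(ranges, key=lambda r: r[1]):
            for i, p in enumerate(remaining):
                if lo <= p <= hi:
                    del remaining[i]  # each database peak counts for one range only
                    hits += 1
                    break
        # Full matches belong to the second spectrogram data, so they are skipped here.
        if min_hits <= hits < len(ranges):
            scored.append((name, hits))
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep database order
    return scored
```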
In a second aspect of the present invention, a matching retrieval system for spectrogram data is provided, the spectrograms including nuclear magnetic resonance spectrograms, infrared spectrograms and mass spectrograms, the system comprising:
a retrieval data acquisition module, configured to acquire first spectrogram data to be retrieved, the first spectrogram data comprising a plurality of retrieval peak data, and at the same time to acquire a data standardization requirement, the data standardization requirement comprising an exact matching standard, which requires the number of peak data to be the same, and/or a boundary matching standard, which requires the number of matching peak data to be greater than the number of retrieval peak data;
a range expansion module, configured to expand each retrieval peak into a corresponding range value according to a preset threshold;
a first retrieval module, configured to search the corresponding spectrogram database with the range values to obtain second spectrogram data in the spectrogram database, wherein the plurality of matching peak data of the second spectrogram data fall respectively into the corresponding range values;
a first display module, configured to display the second spectrogram data according to the data standardization requirement.
Further, in the retrieval data acquisition module, the first spectrogram data to be retrieved is either data entered manually by a user or data obtained from spectral analysis software;
when the first spectrogram data is entered manually by the user, the data standardization requirement is the user's own judgment of the accuracy of the retrieval peak data of the first spectrogram data; when the first spectrogram data is obtained from spectral analysis software, the data standardization requirement is either the software's judgment of the accuracy of the retrieval peak data of the first spectrogram data or the user's own judgment of that accuracy.
Further, in the range expansion module, the preset threshold is set in one of the following ways:
setting a general first threshold range and expanding the retrieval peak data of all types of first spectrogram data according to the first threshold range; or:
setting a corresponding second threshold range for each type of spectrogram and expanding the retrieval peak data of first spectrogram data of that type according to the second threshold range; or:
performing a first calculation using the maximum and minimum values of the retrieval peak data, taking the calculated value as a third threshold range, and expanding the retrieval peak data of the first spectrogram data according to the third threshold range, the first calculation being:
$$|X| = (\lg X_{\max} - \lg X_{\min}) \cdot X_{\min}$$
where $|X|$ is the threshold value, $X_{\max}$ is the maximum value of the retrieval peak data, $X_{\min}$ is the minimum value of the retrieval peak data, and $X$ is the third threshold range.
Further, the system also comprises:
a second retrieval module, configured to search the corresponding spectrogram database with the range values to obtain third spectrogram data in the spectrogram database, wherein the plurality of matching peak data of the third spectrogram data fall only partially into the corresponding range values;
and a second display module, configured to display the third spectrogram data according to the boundary matching standard.
In a third aspect of the present invention, an electronic device is provided, comprising a storage unit and a processing unit, the storage unit storing computer instructions executable on the processing unit, wherein the processing unit, when executing the computer instructions, performs the steps of the matching retrieval method for spectrogram data.
In a fourth aspect of the present invention, a storage medium is provided, on which computer instructions are stored that, when executed, perform the steps of the matching retrieval method for spectrogram data.
The beneficial effects of the invention are as follows:
(1) In an exemplary embodiment of the present invention, retrieval of spectrograms including NMR, IR and mass spectrograms is implemented, and a plurality of retrieval peak data of these spectrograms can be searched at once, which makes matching convenient. Meanwhile, the preset threshold and range values are introduced so that the first spectrogram data can still be retrieved when, owing to differences in instrument brand, environment and the like, it fluctuates within an acceptable error range. In addition, because the user may be unable to determine whether the elements to be searched (i.e., the retrieval peak data of the first spectrogram data) are all the key elements of the corresponding compound structure, and a single search mode would then give inaccurate results, the present exemplary embodiment provides a selectable data standardization requirement to meet the different needs of users.
(2) In an exemplary embodiment of the present invention, two ways of acquiring the first spectrogram data are provided, which improves the usability of the system.
(3) In an exemplary embodiment of the present invention, three specific implementations of the preset threshold are disclosed.
(4) In an exemplary embodiment of the present invention, even when no qualifying second spectrogram data is retrieved, the user can be given similar results (third spectrogram data), which provides relevant guidance and ideas.
Drawings
Fig. 1 is a diagram illustrating a method for matching and retrieving spectrogram data according to an exemplary embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the drawings and are used merely for convenience and simplicity of description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified and limited, the terms "mounted", "linked" and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those of ordinary skill in the art will understand the specific meaning of these terms in the present invention according to the specific circumstances.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not conflict.
Referring to Fig. 1, Fig. 1 shows a matching retrieval method for spectrogram data according to an exemplary embodiment of the present invention, where the spectrograms include nuclear magnetic resonance spectrograms, infrared spectrograms and mass spectrograms; the method comprises the following steps:
acquiring first spectrogram data to be retrieved, the first spectrogram data comprising a plurality of retrieval peak data, and at the same time acquiring a data standardization requirement, the data standardization requirement comprising an exact matching standard, which requires the number of peak data to be the same, and/or a boundary matching standard, which requires the number of matching peak data to be greater than the number of retrieval peak data;
expanding each retrieval peak into a corresponding range value according to a preset threshold;
searching the corresponding spectrogram database with the range values to obtain second spectrogram data in the spectrogram database, wherein the plurality of matching peak data of the second spectrogram data fall respectively into the corresponding range values;
and displaying the second spectrogram data according to the data standardization requirement.
Specifically, in the present exemplary embodiment, the first spectrogram data to be retrieved is acquired first; it includes a plurality of retrieval peak data. It should be noted that the retrieval peak data may correspond to the structure of a compound to be retrieved but do not necessarily include all the peak data of that structure, so they must be processed according to the acquired data standardization requirement. Correspondingly, there are two data standardization requirements, the exact matching standard and the boundary matching standard. The exact matching standard requires the same number of peak data (i.e., the number of retrieval peak data equals the number of subsequently matched peak data); it means that the retrieval peak data included in the first spectrogram data are all the peak data of the compound structure to be retrieved, or that the data retrieved from the spectrogram database must match exactly. The boundary matching standard requires the number of matching peak data to be greater than the number of retrieval peak data; it means that the retrieval peak data included in the first spectrogram data are only part of the peak data of the compound structure to be retrieved.
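Once every range value has been matched, the two standards reduce to a comparison of peak counts. A minimal sketch with hypothetical names, where `num_record_peaks` is the total number of peaks in the candidate database record:

```python
from enum import Enum


class Standard(Enum):
    EXACT = "exact"
    BOUNDARY = "boundary"


def meets_standard(num_retrieval_peaks: int, num_record_peaks: int,
                   standard: Standard) -> bool:
    """Count check applied after every retrieval range has a matching peak."""
    if standard is Standard.EXACT:
        # All peaks of the compound were entered: the counts must agree.
        return num_record_peaks == num_retrieval_peaks
    # Only some peaks were entered: the record must have strictly more peaks.
    return num_record_peaks > num_retrieval_peaks
```

Since the requirement is stated as exact "and/or" boundary, one reading of selecting both is to accept a record when either check passes.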
It should be noted that when the user selects the exact matching standard, the user is certain that all entered values are valid data and that the peaks belong to the compound. The number of peaks in the retrieved records can then be required to equal the number of input peaks, so that records containing the input values and having the same number of peaks are found. This matters because compounds with the same molecular formula can have similar chemical shift ranges and identical chemical environments yet quite different actual structures. In addition, when the compound has diastereomers, each group of signals in the spectrum appears as two sets of peaks; when the user infers from the spectrum several compounds with the same molecular formula but different structures, exact matching can be selected.
For the boundary matching standard, the user is certain that some of the peaks truly belong to the compound while other peaks are unclear or uncertain (possibly solvent or impurity peaks); the number of peaks in the retrieved records can then be required to be greater than the number of input peaks, so that all records containing the input values are found. When identifying a reaction product, the user knows the reactants and reaction conditions and can make a preliminary judgment about the product spectrum, i.e., which peaks must belong to the compound; when the other peak data are uncertain, only the certain values are entered and a boundary search is needed.
Next, because the generation of NMR, IR and mass spectra can be affected by the instrument brand, the environment and the like, each retrieval peak is expanded into a corresponding range value according to the preset threshold, so that the subsequent search tolerates fluctuation of the first spectrogram data within an acceptable error range.
The corresponding spectrogram database is then searched with the range values: NMR spectrograms are searched in the NMR spectrogram database, IR spectrograms in the IR spectrogram database, and mass spectrograms in the mass spectrogram database. This yields second spectrogram data in the spectrogram database whose plurality of matching peak data fall respectively into the corresponding range values. The second spectrogram data retrievable from the database can therefore be of two kinds. In the first, the number of matching peak data equals the number of retrieval peak data, and the matching peak data fall one-to-one into the range values. In the second, the number of matching peak data is greater than the number of retrieval peak data, every range value contains corresponding matching peak data, and each matching peak appears in only one range value. Either way, the condition that "the plurality of matching peak data of the second spectrogram data fall respectively into the corresponding range values" is satisfied.
Finally, the second spectrogram data are displayed according to the data standardization requirement: under the exact matching standard, compound structures in the spectrogram database that meet the matching requirement and have the same number of peak data are displayed; under the boundary matching standard, compound structures in the spectrogram database that meet the matching requirement and whose number of matching peak data is greater than the number of retrieval peak data are displayed.
Therefore, in the present exemplary embodiment, retrieval of spectrograms including NMR, IR and mass spectrograms is realized, and a plurality of retrieval peak data of these spectrograms can be searched at once, which makes matching convenient. The preset threshold and range values allow the first spectrogram data to be retrieved even when instrument brand, environment and similar factors make it fluctuate within an acceptable error range. In addition, because the user may be unable to determine whether the elements to be searched (i.e., the retrieval peak data of the first spectrogram data) are all the key elements of the corresponding compound structure, and a single search mode would then give inaccurate results, the present exemplary embodiment acquires a selectable data standardization requirement together with the first spectrogram data. The requirement is of two types: the exact matching standard requires the same number of peak data, meaning that the retrieval peak data included in the first spectrogram data are all the peak data of the compound structure to be retrieved, or that the data retrieved from the spectrogram database must match exactly; the boundary matching standard requires the number of matching peak data to be greater than the number of retrieval peak data, meaning that the retrieval peak data included in the first spectrogram data are only part of the peak data of the compound structure to be retrieved. Together they meet the different needs of users.
In addition, it should be noted that under the boundary matching standard, satisfying the condition that "the plurality of matching peak data of the second spectrogram data fall respectively into the corresponding range values" requires that every range value contain corresponding matching peak data and that each matching peak appear in only one range value; that is, a "unique mark" approach is adopted. Unique marking is needed because the range values are wide, so several data may fall within the same range, whereas each peak of the spectrogram data should be counted only once. The matching peak that first appears within a range value is therefore marked, and matching peaks that appear within that range again are assigned to other range values.
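The unique-mark rule can be illustrated with a small assignment routine. It is a sketch under an assumption: ranges are visited in order of their upper bounds and each takes the smallest unmarked database peak inside it, a standard greedy for assigning points to intervals that the text does not itself prescribe. The function name `unique_mark` is hypothetical.

```python
from typing import Optional


def unique_mark(ranges: list[tuple[float, float]],
                record_peaks: list[float]) -> list[tuple[tuple[float, float], Optional[float]]]:
    """Pair each range with a distinct database peak, or None if none is left.

    Pairs are returned in order of the ranges' upper bounds.
    """
    peaks = sorted(record_peaks)
    marked = [False] * len(peaks)
    pairs: list[tuple[tuple[float, float], Optional[float]]] = []
    # Visiting ranges by upper bound gives a peak shared by overlapping ranges
    # to the range that closes first.
    for lo, hi in sorted(ranges, key=lambda r: r[1]):
        chosen: Optional[float] = None
        for i, p in enumerate(peaks):
            if not marked[i] and lo <= p <= hi:
                marked[i] = True  # mark it so no other range can reuse it
                chosen = p
                break
        pairs.append(((lo, hi), chosen))
    return pairs
```

For the overlapping ranges [7.1, 7.3] and [7.15, 7.35] and database peaks 7.2 and 7.3, the first range takes 7.2 and the second takes 7.3. Without marking, both ranges would claim 7.2, and a record with a single peak there would wrongly appear to match two retrieval peaks.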
More preferably, in an exemplary embodiment, the first spectrogram data to be retrieved is either data entered manually by a user or data obtained from spectral analysis software;
when the first spectrogram data is entered manually by the user, the data standardization requirement is the user's own judgment of the accuracy of the retrieval peak data of the first spectrogram data; when the first spectrogram data is obtained from spectral analysis software, the data standardization requirement is either the software's judgment of the accuracy of the retrieval peak data of the first spectrogram data or the user's own judgment of that accuracy.
Specifically, in the present exemplary embodiment, the first spectrogram data to be retrieved can be obtained in two ways:
In the first, after obtaining the spectrogram, the user selects the peak positions, derives the corresponding retrieval peak data from them and enters those data. In this mode the user judges whether the retrieval peak data are accurate according to the actual situation, for example whether the spectrum is clear or whether several peak positions are hard to distinguish; in other words, the user may not be able to determine whether the elements to be searched are all the key elements of the corresponding compound structure. If the user considers the entered retrieval peak data of the first spectrogram data reliable, the exact matching standard of the data standardization requirement is selected; otherwise, the boundary matching standard is selected.
In the second, the system is connected directly to spectral analysis software that supplies the first spectrogram data. The software may be built into the system or belong to an external system, and may be, for example, OPUS, GRAMS, OMNIC or OCEANVIEW, without limitation. The spectral analysis software acquires peak data directly from the obtained spectrogram, thereby forming the retrieval peak data. In this mode, the data standardization requirement can be either the software's judgment of the accuracy of the retrieval peak data of the first spectrogram data or the user's own judgment of that accuracy.
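The text does not describe how the spectral analysis software picks peaks, and each of the tools named above has its own method. Purely as a hypothetical stand-in for that step, the sketch below reports the x positions of simple local maxima whose intensity reaches a minimum height; practical peak picking usually adds smoothing, baseline correction and noise estimation.

```python
def pick_peaks(xs: list[float], ys: list[float], min_height: float) -> list[float]:
    """Hypothetical stand-in: x positions of local maxima with y >= min_height."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    peaks = []
    for i in range(1, len(ys) - 1):
        # Strictly above the left neighbour and not below the right one,
        # so a flat-topped peak is reported once, at its left edge.
        if ys[i] >= min_height and ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]:
            peaks.append(xs[i])
    return peaks
```

The resulting list could then serve as the retrieval peak data passed to the range expansion step.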
Offering these two ways of acquiring the first spectrogram data therefore improves the usability of the system in the present exemplary embodiment.
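As a rough illustration of how the two acquisition modes map onto the data standardization requirement, the following Python sketch picks a matching standard from an accuracy judgment. The names `Source`, `Standard` and `choose_standard` are hypothetical, and the rule that the user's judgment takes precedence over the software's when both are available is an assumption; the text only says that either may be used.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    MANUAL = "manual"      # peaks picked and typed in by the user
    SOFTWARE = "software"  # peaks picked by spectrum analysis software


class Standard(Enum):
    EXACT = "exact"        # same number of peaks required
    BOUNDARY = "boundary"  # candidate may have more peaks than the query


def choose_standard(source: Source,
                    user_judgment: Optional[bool] = None,
                    software_judgment: Optional[bool] = None) -> Standard:
    """Map an accuracy judgment (True = peaks considered accurate) to a standard.

    Manual input uses only the user's judgment. For software input, the
    user's judgment is used when given (an assumption), else the software's.
    """
    if source is Source.MANUAL:
        judgment = user_judgment
    else:
        judgment = user_judgment if user_judgment is not None else software_judgment
    if judgment is None:
        raise ValueError("no accuracy judgment available")
    return Standard.EXACT if judgment else Standard.BOUNDARY


print(choose_standard(Source.MANUAL, user_judgment=False))  # Standard.BOUNDARY
```

Raising an error when no judgment exists is a design choice of this sketch; a real system might fall back to the boundary standard instead.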
More preferably, in an exemplary embodiment, the preset threshold is set in one of the following ways:
setting a general first threshold range, and expanding the search peak data of all types of first spectrogram data according to the first threshold range; or:
setting a corresponding second threshold range for each type of spectrogram, and expanding the search peak data of first spectrogram data of that type according to the second threshold range; or:
performing a first calculation on the maximum and minimum values of the search peak data, taking the result as a third threshold range, and expanding the search peak data of the first spectrogram data according to the third threshold range, the first calculation being:
$$|X| = (\lg X_{\max} - \lg X_{\min}) \times X_{\min}$$
where $|X|$ is the threshold value, $X_{\max}$ is the maximum of the search peak data, $X_{\min}$ is the minimum of the search peak data, and $X$ is the third threshold range $(-|X|, +|X|)$.
Specifically, the present exemplary embodiment discloses three ways of implementing the preset threshold:
the first is to set a general first threshold range and expand the search peak data of all types of first spectrogram data by it, i.e. expand each search peak by ±A. This approach is highly general but of limited practical value, because different spectra (nuclear magnetic resonance, infrared and mass spectra) have different characteristics, such as the magnitude and the range of their search peak data.
The second is to set a corresponding second threshold range for each type of spectrogram and expand the search peak data of first spectrogram data of that type by it, for example ±B for nuclear magnetic resonance spectrograms, ±C for infrared spectrograms and ±D for mass spectrograms. This is more widely applicable than the first approach, but a value has to be set for each type in turn, and even for sub-types (such as the hydrogen, carbon and other spectra within nuclear magnetic resonance spectroscopy), and it cannot adapt when a new type of spectrum appears.
The third is to perform the first calculation on the maximum and minimum values of the search peak data and take the result as the third threshold range. This approach applies to every spectrum type, and because it takes the maximum and minimum of the search peak data into account, it produces a more usable threshold for spectra whose chemical shifts (peak positions) span different orders of magnitude.
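The three ways of setting the threshold can be sketched in Python as below. The mode names, the helper names and the numeric values standing in for A, B, C and D are hypothetical. The computed mode implements |X| = (lg X_max - lg X_min) * X_min as written and assumes every search peak is strictly positive, since the logarithm is undefined otherwise; NMR chemical shifts at or below zero would need separate handling.

```python
from math import log10
from typing import Dict, List, Optional, Sequence, Tuple


def computed_half_width(peaks: Sequence[float]) -> float:
    """Third threshold range: |X| = (lg X_max - lg X_min) * X_min."""
    x_max, x_min = max(peaks), min(peaks)
    if x_min <= 0:
        raise ValueError("the formula needs strictly positive peak values")
    return (log10(x_max) - log10(x_min)) * x_min


def half_width(peaks: Sequence[float], mode: str,
               general: Optional[float] = None,
               per_type: Optional[Dict[str, float]] = None,
               spectrum_type: Optional[str] = None) -> float:
    if mode == "general":    # one value (A) for every spectrum type
        return general
    if mode == "per_type":   # one value per type (B, C, D, ...)
        return per_type[spectrum_type]
    if mode == "computed":   # derived from the query's own peaks
        return computed_half_width(peaks)
    raise ValueError(f"unknown mode: {mode}")


def expand(peaks: Sequence[float], width: float) -> List[Tuple[float, float]]:
    """Turn each search peak p into the range (p - width, p + width)."""
    return [(p - width, p + width) for p in peaks]


query = [3052, 3054, 1067, 1071, 1189, 455]
print(round(half_width(query, "computed"), 1))  # 376.2
```

Note that the computed half-width is the same for every peak in the query; it scales with the query as a whole rather than with each individual peak.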
More preferably, in an exemplary embodiment, the method further comprises:
searching the corresponding spectrogram database using the range values to obtain third spectrogram data in the spectrogram database, only some of whose matching peak data fall into the corresponding range values;
and displaying third spectrogram data according to the boundary matching standard.
Specifically, the present exemplary embodiment also provides a fallback search result: if no second spectrogram data can be found in the spectrogram database for the given first spectrogram data, third spectrogram data, only some of whose matching peak data fall into the corresponding range values, can be displayed under the boundary matching standard. In this way the user still receives similar results when no qualifying second spectrogram data are retrieved, which offers relevant guidance and ideas.
Of course, this step may also be performed when second spectrogram data are found, in which case the third spectrogram data can be given a lower display weight than the second spectrogram data. The display can also be limited to third spectrogram data that reach a certain repetition proportion.
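One way to realise the lower display weight and the repetition-proportion filter is sketched below in Python. The proportion used here (the share of query ranges whose same-position candidate peak falls inside the range), the names `hit_ratio` and `rank_for_display`, and the defaults `min_ratio=0.8` and `partial_weight=0.5` are all assumptions; the text does not define how either the proportion or the weight is computed.

```python
from typing import List, Sequence, Tuple

Range = Tuple[float, float]


def hit_ratio(ranges: Sequence[Range], peaks: Sequence[float]) -> float:
    """Share of query ranges whose same-index candidate peak lies inside them.

    zip() stops at the shorter sequence, so missing peaks count as misses.
    """
    hits = sum(1 for (lo, hi), p in zip(ranges, peaks) if lo <= p <= hi)
    return hits / len(ranges)


def rank_for_display(full_matches: List[List[float]],
                     candidates: List[List[float]],
                     ranges: Sequence[Range],
                     min_ratio: float = 0.8,
                     partial_weight: float = 0.5) -> List[Tuple[float, List[float]]]:
    """Second spectrogram data get weight 1.0; third spectrogram data get a
    lower weight and are kept only if their hit ratio reaches min_ratio."""
    ranked = [(1.0, peaks) for peaks in full_matches]
    for peaks in candidates:
        ratio = hit_ratio(ranges, peaks)
        if min_ratio <= ratio < 1.0:
            ranked.append((partial_weight * ratio, peaks))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Using ranges built from the search peaks of the infrared example that follows, entry No. 4 scores 5/6 and entry No. 2 scores 4/6, so only No. 4 passes a 0.8 cut-off.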
Preferably, in an exemplary embodiment, the search peak data of the first spectrogram data are sorted by value (ascending or descending) before searching, and the data in the spectrogram database are sorted in the same way, which makes the comparison during the search easier.
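Sorting pays off because a range check against a sorted peak list becomes a binary search. The sketch below uses Python's standard `bisect` module; the function name `any_peak_in_range` is hypothetical, and it only answers whether some peak lies in a range, not which peak is paired with which range.

```python
from bisect import bisect_left
from typing import Sequence


def any_peak_in_range(sorted_peaks: Sequence[float], lo: float, hi: float) -> bool:
    """True if some peak p satisfies lo <= p <= hi.

    sorted_peaks must be in ascending order; the check costs O(log n).
    """
    i = bisect_left(sorted_peaks, lo)  # index of the first peak >= lo
    return i < len(sorted_peaks) and sorted_peaks[i] <= hi


# Database entries are sorted once, when they are stored.
entry = sorted([3053, 3866, 1070, 1089, 1092, 454])
print(any_peak_in_range(entry, 2678, 3430))  # True (3053)
print(any_peak_in_range(entry, 1500, 2600))  # False
```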
The following example uses infrared spectra and the third threshold range of a preferred exemplary embodiment:
the data in the infrared spectrogram database includes:
1.[3053, 3054, 1068, 1071, 1190, 455]
2.[3053, 3058, 1071, 1190]
3.[3053, 3054, 1068, 1075, 1177, 1190, 452]
4.[3053, 3866, 1070, 1089, 1092, 454]
5.[3053, 3053, 1066, 1087, 458]
6.[3052, 3055, 1067, 1070, 1188, 475]
7.[3051, 3053, 1055, 1067, 1068, 1070, 1189, 480, 66]
8.[3053, 3053, 1055, 1066, 1070, 1187, 355]
The user then inputs first spectrogram data containing the search peak data [3052, 3054, 1067, 1071, 1189, 455].
First, the search peaks are expanded into the same number of range values using the third threshold range. The largest search peak is 3054 and the smallest is 455, so |X| = (lg 3054 - lg 455) × 455 ≈ (3.4849 - 2.6580) × 455 ≈ 376, i.e. the third threshold range is X = (-376, +376), and the range values of the first spectrogram data become: [(2676, 3428), (2678, 3430), (691, 1443), (695, 1447), (813, 1565), (79, 831)].
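The range expansion in this step can be reproduced with a few lines of Python. This sketch evaluates the formula without intermediate rounding and takes X_max as the largest search peak (3054), which gives |X| ≈ 376.2; rounding that to a whole wavenumber is an assumption made here so that the ranges come out as integers.

```python
from math import log10

# Search peaks entered by the user in the infrared example (cm^-1)
query = [3052, 3054, 1067, 1071, 1189, 455]

x_max, x_min = max(query), min(query)
exact = (log10(x_max) - log10(x_min)) * x_min  # about 376.22
half_width = round(exact)                       # 376

ranges = [(p - half_width, p + half_width) for p in query]
print(half_width)
print(ranges)
# 376
# [(2676, 3428), (2678, 3430), (691, 1443), (695, 1447), (813, 1565), (79, 831)]
```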
The six range values are then searched in the infrared spectrogram database and the qualifying entries are selected. The entries that qualify as second spectrogram data are No. 1, No. 6, No. 7 and No. 8.
Finally, the corresponding second spectrogram data are displayed according to the data standardization requirement input by the user:
(1) When the user selects the exact matching standard, i.e. the number of elements (matching peak data) of the second spectrogram data must equal the number of elements (search peak data) of the first spectrogram data, the search result is:
1.[3053, 3054, 1068, 1071, 1190, 455]
6.[3052, 3055, 1067, 1070, 1188, 475]
(2) When the user selects the boundary matching standard, i.e. the number of elements (matching peak data) of the second spectrogram data must be greater than the number of elements (search peak data) of the first spectrogram data, the search result is:
7.[3051, 3053, 1055, 1067, 1068, 1070, 1189, 480, 66]
8.[3053, 3053, 1055, 1066, 1070, 1187, 355]
In addition, when the user chooses to view third spectrogram data (with no further restriction), the search result is:
3.[3053, 3054, 1068, 1075, 1177, 1190, 452]
4.[3053, 3866, 1070, 1089, 1092, 454]
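The text does not state how matching peaks are paired with range values when the counts differ. The Python sketch below assumes an order-preserving rule: each range, taken in order, must be claimed by a distinct candidate peak that comes later than the one claimed by the previous range, and extra candidate peaks may be skipped. With a half-width of 376 (what the formula gives for this query), it classifies No. 1 and No. 6 as exact matches and No. 7 and No. 8 as boundary matches, as listed above, but it also classifies No. 3 as a boundary match, so it does not reproduce the listed third-spectrogram results exactly. The names `matches_in_order` and `classify` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]


def matches_in_order(ranges: Sequence[Range], peaks: Sequence[float]) -> bool:
    """Can each range, in order, claim its own peak, later than the previous one?

    Greedily taking the earliest peak that fits is optimal for this question,
    so a False result means no order-preserving assignment exists.
    """
    j = 0
    for lo, hi in ranges:
        while j < len(peaks) and not (lo <= peaks[j] <= hi):
            j += 1
        if j == len(peaks):
            return False
        j += 1  # peak j is used up; the next range must use a later peak
    return True


def classify(ranges: Sequence[Range], peaks: Sequence[float]) -> str:
    if not matches_in_order(ranges, peaks):
        return "none"
    return "exact" if len(peaks) == len(ranges) else "boundary"


query = [3052, 3054, 1067, 1071, 1189, 455]
ranges = [(p - 376, p + 376) for p in query]
database: Dict[int, List[int]] = {
    1: [3053, 3054, 1068, 1071, 1190, 455],
    2: [3053, 3058, 1071, 1190],
    3: [3053, 3054, 1068, 1075, 1177, 1190, 452],
    4: [3053, 3866, 1070, 1089, 1092, 454],
    5: [3053, 3053, 1066, 1087, 458],
    6: [3052, 3055, 1067, 1070, 1188, 475],
    7: [3051, 3053, 1055, 1067, 1068, 1070, 1189, 480, 66],
    8: [3053, 3053, 1055, 1066, 1070, 1187, 355],
}
for no, peaks in database.items():
    print(no, classify(ranges, peaks))
```

When the candidate has exactly as many peaks as the query, the order-preserving rule reduces to a position-by-position check, which matches the exact matching standard.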
A further exemplary embodiment of the present invention provides a matching retrieval system for spectrogram data, the spectrograms including nuclear magnetic resonance spectrograms, infrared spectrograms and mass spectrograms, the system comprising:
a retrieval data acquisition module, configured to acquire first spectrogram data to be retrieved, the first spectrogram data comprising a plurality of search peak data, and to acquire a data standardization requirement, the data standardization requirement comprising an exact matching standard, which requires the same number of peak data, and/or a boundary matching standard, which requires the number of matching peak data to be greater than the number of search peak data;
a range expansion module, configured to expand each search peak datum into a corresponding range value according to a preset threshold;
a first retrieval module, configured to search the corresponding spectrogram database using the range values to obtain second spectrogram data in the spectrogram database, a plurality of matching peak data of which fall respectively into the corresponding range values;
a first display module, configured to display the second spectrogram data according to the data standardization requirement.
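The four modules can be wired together as a small pipeline. In the following Python sketch every class, function and parameter name is hypothetical; each stage is injected as a callable so that any threshold mode or matching rule can be plugged in, and `positional_classify` is a deliberately simplified stand-in for the first retrieval module's matching rule, not the rule used in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Range = Tuple[float, float]


def fixed_expand(peaks: Sequence[float], width: float = 10.0) -> List[Range]:
    """Range expansion module with a fixed, illustrative half-width."""
    return [(p - width, p + width) for p in peaks]


def positional_classify(ranges: Sequence[Range], peaks: Sequence[float]) -> str:
    """Simplified rule: the first len(ranges) peaks must sit in their ranges."""
    if len(peaks) < len(ranges):
        return "none"
    if all(lo <= p <= hi for (lo, hi), p in zip(ranges, peaks)):
        return "exact" if len(peaks) == len(ranges) else "boundary"
    return "none"


@dataclass
class MatchingRetrievalSystem:
    expand: Callable[[Sequence[float]], List[Range]]              # range expansion module
    classify: Callable[[Sequence[Range], Sequence[float]], str]   # first retrieval module
    database: Dict[int, List[float]]

    def acquire(self, peaks: Sequence[float], standard: str) -> List[float]:
        """Retrieval data acquisition module: validate the query and standard."""
        if standard not in ("exact", "boundary"):
            raise ValueError("standard must be 'exact' or 'boundary'")
        return list(peaks)

    def run(self, peaks: Sequence[float], standard: str) -> List[int]:
        query = self.acquire(peaks, standard)
        ranges = self.expand(query)
        labels = {no: self.classify(ranges, entry) for no, entry in self.database.items()}
        # First display module: show only entries that meet the requested standard.
        return sorted(no for no, label in labels.items() if label == standard)


system = MatchingRetrievalSystem(
    expand=fixed_expand,
    classify=positional_classify,
    database={1: [100, 200], 2: [101, 199, 300], 3: [150, 200]},
)
print(system.run([100, 200], "exact"))     # [1]
print(system.run([100, 200], "boundary"))  # [2]
```

Injecting the stages keeps the module boundaries of the system explicit: swapping in a different threshold mode only changes the `expand` argument.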
Correspondingly, in this exemplary embodiment, in the retrieval data acquisition module, the first spectrogram data to be retrieved are entered manually by a user or obtained from spectrum analysis software;
when the first spectrogram data to be retrieved are entered manually by a user, the data standardization requirement is the user's own judgment of whether the search peak data of the first spectrogram data are accurate; when the first spectrogram data to be retrieved are obtained from spectrum analysis software, the data standardization requirement is either the software's judgment of whether the search peak data of the first spectrogram data are accurate or the user's own judgment of whether those peak data are accurate.
Correspondingly, in this exemplary embodiment, in the range expansion module, the preset threshold is set in one of the following ways:
setting a general first threshold range, and expanding the search peak data of all types of first spectrogram data according to the first threshold range; or:
setting a corresponding second threshold range for each type of spectrogram, and expanding the search peak data of first spectrogram data of that type according to the second threshold range; or:
performing a first calculation on the maximum and minimum values of the search peak data, taking the result as a third threshold range, and expanding the search peak data of the first spectrogram data according to the third threshold range, the first calculation being:
$$|X| = (\lg X_{\max} - \lg X_{\min}) \times X_{\min}$$
where $|X|$ is the threshold value, $X_{\max}$ is the maximum of the search peak data, $X_{\min}$ is the minimum of the search peak data, and $X$ is the third threshold range $(-|X|, +|X|)$.
Correspondingly, in this exemplary embodiment, the system further comprises:
a second retrieval module, configured to search the corresponding spectrogram database using the range values to obtain third spectrogram data in the spectrogram database, only some of whose matching peak data fall into the corresponding range values;
and a second display module, configured to display the third spectrogram data according to the boundary matching standard.
In a third aspect of the present invention, there is provided an electronic device including a storage unit and a processing unit, where the storage unit stores computer instructions executable on the processing unit, and the processing unit performs the steps of the matching retrieval method for spectrogram data when executing the computer instructions.
The electronic device takes the form of a general-purpose computing device. Its components may include, but are not limited to, at least one processing unit, at least one storage unit, and a bus connecting the different system components (including the storage unit and the processing unit).
The storage unit stores program code executable by the processing unit, so that the processing unit performs the steps according to the various exemplary embodiments of the present invention described in the exemplary method section of this specification. For example, the processing unit may perform the method shown in Fig. 1.
The memory unit may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 3201 and/or cache memory units, and may further include Read Only Memory (ROM).
The storage unit may also include a program/utility having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any device (e.g., router, modem, etc.) that enables the electronic device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface. And, the electronic device may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter. The network adapter communicates with other modules of the electronic device via a bus. It should be appreciated that other hardware and/or software modules may be used in connection with an electronic device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
As will be readily appreciated by those skilled in the art from the foregoing description, the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to the present exemplary embodiment may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device or a network device) to perform the method according to the present exemplary embodiment.
In a fourth aspect of the present invention, there is provided a storage medium having computer instructions stored thereon which, when executed, perform the steps of the matching retrieval method for spectrogram data.
Based on this understanding, the technical solution of the present embodiment in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product (program product) stored in a storage medium and comprising several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method described in the embodiments of the present invention.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
It is apparent that the above examples are given by way of illustration only and not by way of limitation, and that those of ordinary skill in the art may make other variations or modifications in various forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious variations or modifications derived from them remain within the scope of protection of the present invention.

Claims (8)

1. A matching retrieval method for spectrogram data, the spectrograms including nuclear magnetic resonance spectrograms, infrared spectrograms and mass spectrograms, characterized in that the method comprises the following steps:
acquiring first spectrogram data to be retrieved, the first spectrogram data comprising a plurality of search peak data, and acquiring a data standardization requirement, the data standardization requirement comprising an exact matching standard, which requires the same number of peak data, and/or a boundary matching standard, which requires the number of matching peak data to be greater than the number of search peak data;
expanding each search peak datum into a corresponding range value according to a preset threshold;
searching the corresponding spectrogram database using the range values to obtain second spectrogram data in the spectrogram database, a plurality of matching peak data of which fall respectively into the corresponding range values;
displaying second spectrogram data according to the data standardization requirement;
the preset threshold is set in one of the following ways:
setting a general first threshold range, and expanding the search peak data of all types of first spectrogram data according to the first threshold range; or:
setting a corresponding second threshold range for each type of spectrogram, and expanding the search peak data of first spectrogram data of that type according to the second threshold range; or:
performing a first calculation on the maximum and minimum values of the search peak data, taking the result as a third threshold range, and expanding the search peak data of the first spectrogram data according to the third threshold range, the first calculation being:
$$|X| = (\lg X_{\max} - \lg X_{\min}) \times X_{\min}$$
where $|X|$ is the threshold value, $X_{\max}$ is the maximum of the search peak data, $X_{\min}$ is the minimum of the search peak data, and $X$ is the third threshold range $(-|X|, +|X|)$.
2. The matching retrieval method for spectrogram data according to claim 1, characterized in that the first spectrogram data to be retrieved are entered manually by a user or obtained from spectrum analysis software;
when the first spectrogram data to be retrieved are entered manually by a user, the data standardization requirement is the user's own judgment of whether the search peak data of the first spectrogram data are accurate; when the first spectrogram data to be retrieved are obtained from spectrum analysis software, the data standardization requirement is either the software's judgment of whether the search peak data of the first spectrogram data are accurate or the user's own judgment of whether those peak data are accurate.
3. The matching retrieval method for spectrogram data according to claim 1, characterized in that the method further comprises:
searching the corresponding spectrogram database using the range values to obtain third spectrogram data in the spectrogram database, only some of whose matching peak data fall into the corresponding range values;
and displaying third spectrogram data according to the boundary matching standard.
4. A matching retrieval system for spectrogram data, the spectrograms including nuclear magnetic resonance spectrograms, infrared spectrograms and mass spectrograms, characterized in that the system comprises:
a retrieval data acquisition module, configured to acquire first spectrogram data to be retrieved, the first spectrogram data comprising a plurality of search peak data, and to acquire a data standardization requirement, the data standardization requirement comprising an exact matching standard, which requires the same number of peak data, and/or a boundary matching standard, which requires the number of matching peak data to be greater than the number of search peak data;
a range expansion module, configured to expand each search peak datum into a corresponding range value according to a preset threshold;
a first retrieval module, configured to search the corresponding spectrogram database using the range values to obtain second spectrogram data in the spectrogram database, a plurality of matching peak data of which fall respectively into the corresponding range values;
a first display module, configured to display the second spectrogram data according to the data standardization requirement;
in the range expansion module, the preset threshold is set in one of the following ways:
setting a general first threshold range, and expanding the search peak data of all types of first spectrogram data according to the first threshold range; or:
setting a corresponding second threshold range according to different types of spectrograms, and expanding the search peak data of the first spectrogram data of the corresponding type according to the second threshold range; or:
performing a first calculation on the maximum and minimum values of the search peak data, taking the result as a third threshold range, and expanding the search peak data of the first spectrogram data according to the third threshold range, the first calculation being:
$$|X| = (\lg X_{\max} - \lg X_{\min}) \times X_{\min}$$
where $|X|$ is the threshold value, $X_{\max}$ is the maximum of the search peak data, $X_{\min}$ is the minimum of the search peak data, and $X$ is the third threshold range $(-|X|, +|X|)$.
5. The matching retrieval system for spectrogram data as recited in claim 4, wherein, in the retrieval data acquisition module, the first spectrogram data to be retrieved are entered manually by a user or obtained from spectrum analysis software;
when the first spectrogram data to be retrieved are entered manually by a user, the data standardization requirement is the user's own judgment of whether the search peak data of the first spectrogram data are accurate; when the first spectrogram data to be retrieved are obtained from spectrum analysis software, the data standardization requirement is either the software's judgment of whether the search peak data of the first spectrogram data are accurate or the user's own judgment of whether those peak data are accurate.
6. The matching retrieval system for spectrogram data as recited in claim 4, wherein: the system further comprises:
a second retrieval module, configured to search the corresponding spectrogram database using the range values to obtain third spectrogram data in the spectrogram database, only some of whose matching peak data fall into the corresponding range values;
and a second display module, configured to display the third spectrogram data according to the boundary matching standard.
7. An electronic device comprising a storage unit and a processing unit, the storage unit storing computer instructions executable on the processing unit, characterized in that the processing unit, when executing the computer instructions, performs the steps of the matching retrieval method for spectrogram data according to any one of claims 1 to 3.
8. A storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed, perform the steps of the matching retrieval method for spectrogram data according to any one of claims 1 to 3.
CN202310590733.4A 2023-05-24 2023-05-24 Spectrogram data matching retrieval method, system, electronic equipment and storage medium Active CN116304259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310590733.4A CN116304259B (en) 2023-05-24 2023-05-24 Spectrogram data matching retrieval method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116304259A CN116304259A (en) 2023-06-23
CN116304259B true CN116304259B (en) 2023-08-04

Family

ID=86785537

Country Status (1)

Country Link
CN (1) CN116304259B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112014515A (en) * 2019-05-30 2020-12-01 萨默费尼根有限公司 Operating a mass spectrometer with a mass spectral database search
CN113640445A (en) * 2021-08-11 2021-11-12 贵州中烟工业有限责任公司 Characteristic peak identification method based on image processing, computing equipment and storage medium
CN114186596A (en) * 2022-02-17 2022-03-15 天津国科医工科技发展有限公司 Multi-window identification method and device for spectrogram peaks and electronic equipment

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004072103A2 (en) * 2003-02-11 2004-08-26 Activx Biosciences, Inc. Macromolecule identification made by mass spectrometry and database searching
US7498568B2 (en) * 2005-04-29 2009-03-03 Agilent Technologies, Inc. Real-time analysis of mass spectrometry data for identifying peptidic data of interest
WO2015107690A1 (en) * 2014-01-20 2015-07-23 株式会社島津製作所 Tandem mass spectrometry data processing device
CN103955518B (en) * 2014-05-06 2017-10-31 北京华泰诺安探测技术有限公司 A kind of matching process of detectable substance spectrogram and database spectrogram
CN104458785B (en) * 2014-12-12 2016-09-07 中国科学院武汉物理与数学研究所 A kind of NMR spectrum spectral peak alignment and spectral peak extracting method
WO2019104487A1 (en) * 2017-11-28 2019-06-06 深圳达闼科技控股有限公司 Mixture detection method and device
US10768164B2 (en) * 2018-10-30 2020-09-08 Ganzu Province Transportation Planning, Survey & Design Institute Co., Ltd. Method for fast detecting pavement asphalt and early warning based on infrared spectrum big data
CN112362609A (en) * 2019-07-24 2021-02-12 红塔烟草(集团)有限责任公司 Method for identifying oil stain smoke pollution source based on infrared spectrum technology
CN113933373B (en) * 2021-12-16 2022-02-22 成都健数科技有限公司 Method and system for determining organic matter structure by using mass spectrum data
CN115359847A (en) * 2022-08-09 2022-11-18 国科温州研究院(温州生物材料与工程研究所) Peak searching algorithm for proteomics series mass spectrogram
CN116312845A (en) * 2022-12-14 2023-06-23 药融云数字科技(成都)有限公司 Chemical structure prediction method and system based on characteristic groups, storage medium and terminal
CN115858838A (en) * 2022-12-19 2023-03-28 珠海高凌信息科技股份有限公司 Deep learning-based mass spectrogram search matching method and device and storage medium

Also Published As

Publication number Publication date
CN116304259A (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant