WO2024059750A2 - Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation - Google Patents

Publication number
WO2024059750A2
Authority
WO
WIPO (PCT)
Prior art keywords
peptide
ovarian cancer
structures
peptide structure
data
Prior art date
Application number
PCT/US2023/074251
Other languages
French (fr)
Inventor
Chirag DHAR
Prasanna Ramachandran
Tomislav CAVAL
Original Assignee
Venn Biosciences Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Venn Biosciences Corporation
Publication of WO2024059750A2

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 - ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 - Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 - Supervised data analysis
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 - ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 - Protein or domain folding

Definitions

  • Embodiments of the present disclosure generally relate to methods and systems for analyzing peptide structures for diagnosing and/or treating ovarian cancer. More particularly, embodiments of the present disclosure relate to analyzing quantification data for a set of peptide structures detected in a biological sample obtained from a subject for use in diagnosing and/or treating the subject, the set of peptide structures being associated with ovarian cancer.
  • Protein glycosylation and other post-translational modifications play vital roles in virtually all aspects of human physiology. Unsurprisingly, faulty or altered protein glycosylation often accompanies various disease states. The identification of aberrant glycosylation provides opportunities for early detection, intervention, and treatment of affected subjects.
  • Current biomarker identification methods, such as those developed in the fields of proteomics and genomics, can be used to detect indicators of certain diseases, such as cancer, and to differentiate certain types of cancer from other, non-cancerous diseases.
  • Glycoproteomic analysis has not previously been used to successfully identify disease processes.
  • Glycoprotein analysis is fraught with challenges on several levels.
  • a single glycan composition in a peptide can contain a large number of isomeric structures due to different glycosidic linkages, branching patterns, and/or multiple monosaccharides having the same mass.
  • the presence of multiple glycans that share the same peptide backbone can lead to assay signals from various glycoforms, lowering their individual abundances compared to aglycosylated peptides. Accordingly, the development of algorithms that can identify glycan structures on peptide fragments remains elusive.
  • The majority of epithelial ovarian cancer (EOC) cases are diagnosed at a late stage (stage III or IV), with 5-year survival rates between about 15% and 40%. Diagnosing early-stage EOC is impeded by initial clinical signs and symptoms that are generally nonspecific and commonly missed, such as pelvic pain, urinary urgency/frequency, abdominal bloating, early satiety, loss of appetite, and weight loss.
  • An approach that is non-invasive, accurate, and reliable and that enables early diagnosis is needed.
  • An approach enabling early diagnosis may help reduce negative health outcomes in patients with ovarian cancer, reduce the under-treatment of ovarian cancer, and/or reduce the over-treatment of benign disease.
  • more strategic treatments can be provided with a diagnostic test that can assess whether a subject has early stage or late stage ovarian cancer.
  • a method for diagnosing a subject with respect to an ovarian cancer disease state includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data can be analyzed using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on at least one peptide structure selected from one of a group of peptide structures identified in Tables 3B, 3C, or 3D.
  • a diagnosis output can be generated based on the disease indicator.
  • the disease indicator can include a score.
  • the method of generating the diagnosis output can include determining that the score falls above a selected threshold and generating the diagnosis output based on the score falling above the selected threshold, wherein the diagnosis output includes a classification of late stage ovarian cancer disease state.
  • the method of generating the diagnosis output can include determining that the score falls below a selected threshold and generating the diagnosis output based on the score falling below the selected threshold, wherein the diagnosis output includes a classification of early stage ovarian cancer disease state.
  • the score may include a probability score and the selected threshold is 0.5. Alternatively, the selected threshold may fall within a range between 0.30 and 0.65.
  • Analyzing the peptide structure data can include using a binary classification model.
  • the peptide structure of the at least one peptide structure can include a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D as defined in Table 5.
  • the method can include training the supervised machine learning model using training data, wherein the training data comprises a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects, wherein the plurality of subject diagnoses includes a diagnosis for any subject of the plurality of subjects determined to have early stage or late stage ovarian cancer.
  • the method can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the classification of early stage ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the classification of late stage ovarian cancer disease state; identifying a training group of peptide structures based on the differential expression analysis for use as prognostic markers for the ovarian cancer disease state; and forming the training data based on the training group of peptide structures identified.
  • the training of the supervised machine learning model can include reducing the training group of peptide structures to a final group of peptide structures identified in Tables 3B, 3C, or 3D.
  • each peptide structure profile of the plurality of peptide structure profiles can include a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure.
  • the plurality of peptide structure profiles can include a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure.
  • the supervised machine learning model can include a logistic regression model.
  • the first group of peptide structures in Tables 3B, 3C, or 3D is used to distinguish between the ovarian cancer disease state being late stage or early stage.
  • the quantification data for a peptide structure of the set of peptide structures can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the peptide structure data can be generated using multiple reaction monitoring mass spectrometry (MRM-MS), wherein the using of the MRM-MS includes ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
  • the method can include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the method of classifying early and late stage ovarian cancer can be implemented after the subject has already been diagnosed as having ovarian cancer.
  • The subject can be initially diagnosed as having ovarian cancer using one or more biomarkers in Tables 1, 2, or 3.
  • Generating the diagnosis output can include generating a report identifying that the biological sample evidences the early stage or late stage ovarian cancer disease state.
  • A treatment output can be generated based on at least one of the diagnosis output or the disease indicator.
  • the treatment output can include at least one of an identification of a treatment to treat the subject or a treatment plan.
  • the treatment can include at least one of surgery, radiation therapy, a targeted drug therapy, chemotherapy, immunotherapy, hormone therapy, or neoadjuvant therapy.
  • the group of peptide structures in Tables 3B, 3C, or 3D is listed in order of relative significance to the disease indicator.
  • the method can further include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the method can further include generating the peptide structure data from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
  • a method of training a model to diagnose a subject with respect to an ovarian cancer disease state having a malignant pelvic tumor is described.
  • the method can include receiving quantification data for a panel of peptide structures for a plurality of samples for a plurality of subjects.
  • the plurality of subjects includes a first portion diagnosed with a classification of early stage ovarian cancer disease state and a second portion diagnosed with a classification of late stage ovarian cancer disease state.
  • The quantification data can include a plurality of peptide structure profiles for the plurality of subjects. The method can further include training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state, wherein the group of peptide structures is identified in Tables 3B, 3C, or 3D.
  • the machine learning model can include a logistic regression model.
  • The method of training the model can further include identifying an initial plurality of peptide structure profiles and filtering the initial plurality of peptide structure profiles by a coefficient of variation to generate a plurality of peptide structure profiles for use in training the machine learning model.
  • the filtering can be performed to exclude peptide structure profiles having the coefficient of variation at or above 20%.
  • the training of the machine learning model can include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Tables 3B, 3C, or 3D.
  • the quantification data for the panel of peptide structures for the plurality of subjects diagnosed with the plurality of ovarian cancer disease states can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the trained model can use a relative abundance for a first portion of the first group of peptide structures and a concentration for a second portion of the second group of peptide structures.
  • Each peptide structure profile of the plurality of peptide structure profiles includes a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure.
  • the plurality of peptide structure profiles can include a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure.
  • A composition can include at least one of the peptide structures identified in Tables 3B, 3C, or 3D.
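The coefficient-of-variation filter described above for the training method (excluding peptide structure profiles with a coefficient of variation at or above 20%) can be illustrated with a minimal Python sketch. The source does not state over which measurements the coefficient of variation is computed; the sketch assumes replicate measurements of each peptide structure. The function names, peptide structure names, and abundance values are hypothetical placeholders and are not taken from the disclosure.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample CV (standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m


def filter_by_cv(replicates, max_cv_percent=20.0):
    """Keep entries whose CV is strictly below max_cv_percent.

    Entries with a CV at or above the cutoff are excluded, matching the
    'at or above 20%' exclusion rule stated in the text.
    """
    return {
        name: values
        for name, values in replicates.items()
        if coefficient_of_variation(values) < max_cv_percent
    }


# Hypothetical replicate abundances (made-up numbers, placeholder names).
# Assumption: each list holds repeated measurements of one peptide structure.
replicates = {
    "peptide_structure_A": [100.0, 102.0, 98.0, 101.0],  # low variability
    "peptide_structure_B": [50.0, 80.0, 20.0, 65.0],     # high variability
}

kept = filter_by_cv(replicates)
print(sorted(kept))
```

With these placeholder values only `peptide_structure_A` passes the filter; the retained profiles would then feed the feature-reduction step (LASSO regression in the text), which is not sketched here.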
  • a method for diagnosing a subject with respect to an ovarian cancer disease state is described. The method can include analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether a biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on a group of glycopeptide structures.
  • the group of glycopeptide structures can include tri-antennary or tetra-antennary sialic acid moieties, wherein a portion of the glycopeptide structures of the group are fucosylated. A diagnosis is then outputted based on the disease indicator.
  • The group of glycopeptide structures can include at least one, at least three, at least five, or at least 10 glycopeptide structures identified in Tables 3B, 3C, or 3D.
  • the peptide structure data was generated with a mass spectrometer using the biological sample obtained from the subject.
  • the method can further include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the peptide structure data can be generated from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
  • the use of the MRM-MS can include ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
  • a system comprising one or more data processors is described according to various embodiments.
  • the system comprises a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of any of the methods described herein.
  • a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of any one of the methods described according to various embodiments.
  • a system is described according to various embodiments.
  • the system comprises one or more data processors and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of any one or more of the methods described herein.
  • a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of any one or more of the methods described herein.
  • The peptide structure data is listed in Table 3D and the detected product ion comprises a first product having an m/z value listed in Table 4C.
  • the at least one peptide structure comprises a peptide sequence and a glycan structure, wherein the glycan structure is attached to a linking site position in the peptide sequence in accordance with one of Tables 1, 2, 3, 3B, 3C, and 3D.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a symbol structure in accordance with the glycan structure GL number according to Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a composition in accordance with the glycan structure GL number, Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • a rightmost N-acetylgalactosamine (open square) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 3 and 5.
  • a bottommost N-acetylglucosamine (dark square) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, 3D, and 5.
  • composition comprising one or more peptide structures from Tables 1, 2, 3, 3B, 3C, and 3D.
  • the at least one peptide structure comprises a peptide sequence and a glycan structure, wherein the glycan structure is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, and 3D.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a symbol structure in accordance with the glycan structure GL number according to Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a composition in accordance with the glycan structure GL number, Tables 1, 2, 3, 3B, 3C, 3D, and 7.
  • a rightmost N-acetylgalactosamine (GalNAc) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 3 and 5.
  • a bottommost N-acetylglucosamine (GlcNAc) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, 3D, and 5.
  • the peptide sequence can be one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171.
  • the peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171 in Table 3D as defined in Table 5.
  • the peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 159-162, 166, and 171 in Table 3D as defined in Table 5.
  • the glycan structure corresponding to the peptide sequence of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171 includes a fucose and the fucose is in an outer arm orientation.
  • a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171 in Table 3D as defined in Table 5, wherein a fucose of the glycan structure comprises an outer arm orientation.
  • The at least one peptide structure is selected from one of a group of peptide structures identified in Table 3D.
  • a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131-134, 137, 139, 140, 143, 151, 165-167 in Table 3D as defined in Table 5.
  • the glycan structure corresponding to the peptide sequence of SEQ ID NOS: 131, 137, and 143, includes a fucose and the fucose is in an outer arm orientation.
  • The outer arm orientation of the fucose comprises the fucose being linked to an N-acetylglucosamine by an α-(1-3/4) linkage.
  • a method of treating ovarian cancer in an individual comprising administering to the individual an ovarian cancer therapy, wherein the individual has been determined to be responsive to the ovarian cancer therapy via a trained machine learning classifier that distinguishes between responsive and non-responsive individuals who have received the ovarian cancer therapy, based at least in part on a group of peptide structures identified in Tables 3B, 3C, or 3D.
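The score-based classification summarized above (a logistic regression model producing a probability score, with a score above the selected threshold classified as late stage and a score below it as early stage) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the feature names, weights, and intercept are hypothetical and do not correspond to any entry in Tables 3B, 3C, or 3D, and the handling of a score exactly equal to the threshold is an assumption because the text does not specify it.

```python
import math


def probability_score(features, weights, intercept):
    """Logistic-regression probability for one sample.

    features and weights are dicts keyed by (hypothetical) peptide
    structure names; values are quantification metrics and coefficients.
    """
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify_stage(score, threshold=0.5):
    """Map a probability score to a stage classification.

    Above the threshold -> late stage; below -> early stage, as described
    in the text. A score equal to the threshold is reported as
    'unspecified' (an assumption; the text does not cover this case).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score > threshold:
        return "late stage"
    if score < threshold:
        return "early stage"
    return "unspecified"


# Hypothetical model parameters and sample values (not from the disclosure).
weights = {"glycopeptide_1": 1.2, "glycopeptide_2": -0.8}
intercept = -0.1
sample = {"glycopeptide_1": 1.5, "glycopeptide_2": 0.5}

score = probability_score(sample, weights, intercept)
print(classify_stage(score))
```

The `threshold` argument defaults to 0.5 and can be set to any value in the 0.30 to 0.65 range mentioned in the text; lowering it classifies more samples as late stage.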
  • Figure 1 is a schematic diagram of an exemplary workflow 100 for the detection of peptide structures associated with a disease state for use in diagnosis and/or treatment in accordance with one or more embodiments.
  • Figure 2A is a schematic diagram of a preparation workflow in accordance with one or more embodiments.
  • Figure 2B is a schematic diagram of data acquisition in accordance with one or more embodiments.
  • Figure 3 is a block diagram of an analysis system in accordance with one or more embodiments.
  • Figure 4 is a block diagram of a computer system in accordance with various embodiments.
  • Figure 5 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments based on Tables 1 or 2.
  • Figure 6 is a flowchart of a process for diagnosing a subject with respect to ovarian cancer disease state in accordance with one or more embodiments based on Table 3.
  • Figure 6B is a flowchart of a process for diagnosing a subject with respect to ovarian cancer disease state in accordance with one or more embodiments based on Table 3B.
  • Figure 7 is a flowchart of a process for training a model to diagnose a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Figure 8 is a table describing the distribution of the samples acquired in this exemplary retrospective analysis in accordance with one or more embodiments.
  • Figure 9 is a plot diagram illustrating the results of a principal component analysis performed to assess the segregation between healthy, benign pelvic tumor, and EOC samples across first and second principal components in accordance with one or more embodiments.
  • Figure 10 is a plot diagram illustrating the results of a principal component analysis performed to assess segregation between healthy, benign pelvic tumor, early EOC, late EOC, and missing (undocumented) samples.
  • Figure 11 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • Figure 12 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • Figure 13 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • Figure 14 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • Figures 15A to 15E are a plurality of charts illustrating the upregulation of fucosylated biomarkers having tri- or tetra-antennary sialic acids from stages 1/2 to 3/4 of ovarian cancer and the downregulation of non-fucosylated biomarkers having tri- or tetra-antennary sialic acids from stages 1/2 to 3/4 of ovarian cancer.
  • Figure 16 is an illustration of a diagram showing the probability distributions for early stage v. late stage ovarian cancer using the training data set and the testing data set in accordance with one or more embodiments using the biomarkers of Table 3C.
  • Figure 17 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict early stage v. late stage ovarian cancer in accordance with one or more embodiments.
  • Figure 18 is a graph illustrating the fold changes for a plurality of glycopeptides carrying tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
  • Figure 19A is a graph illustrating the fold changes for pairs of glycopeptides carrying tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
  • Figure 19B is a graph illustrating the fold changes for triplets of glycopeptides carrying tri- and tetra-antennary glycans that were either non-fucosylated, mono-fucosylated, or di-fucosylated. Both mono-fucosylated and di-fucosylated markers have median fold changes (FCs) above 1, suggesting a correlation of these markers with malignant EOC.
  • Figure 20 is an illustration of a diagram showing the probability distributions for early stage v. late stage ovarian cancer using the training data set and the testing data set in accordance with one or more embodiments using the biomarkers of Table 3D.
  • Figures 21A to 21E are graphs of the relative abundance of five distinct types of fucosylated glycopeptides in benign tumors, early stage EOC, and late stage EOC.
  • Figure 22 is a representative mass spectrum showing breakdown fragments of 3 glycans and 4 glycan aggregates that indicate the presence of glycans with an outer arm fucosylated orientation.
  • glycoproteomics is an emerging field that can be used in the overall diagnosis and/or treatment of subjects with various types of diseases.
  • Glycoproteomics aims to determine the positions, identities, and quantities of glycans and glycosylated proteins in a given sample (e.g., blood sample, cell, tissue, etc.).
  • Protein glycosylation is one of the most common and most complex forms of post-translational protein modification, and can affect protein structure, conformation, and function.
  • glycoproteins may play crucial roles in important biological processes such as cell signaling, host-pathogen interactions, and immune response and disease. Glycoproteins may therefore be important to diagnosing different types of diseases.
  • Protein glycosylation provides useful information about cancer and other diseases.
  • analysis of protein glycosylation may be difficult as the glycan typically cannot be traced back to the protein site of origin with currently available methodologies.
  • Glycoprotein analysis can be challenging in general for several reasons. For example, a single glycan composition in a peptide may contain a large number of isomeric structures because of different glycosidic linkages, branching, and many monosaccharides having the same mass.
  • MS mass spectrometry
  • This information can be used to distinguish the disease state from other states, diagnose a subject as having or not having the disease state, determine a likelihood that a subject has the disease state, determine whether a subject has one of early stage (stages 1 and 2) or late stage (stages 3 and 4) EOC, or a combination thereof.
  • such analysis may be useful in diagnosing an ovarian cancer disease state for a subject (e.g., a negative diagnosis for the ovarian cancer disease state or a positive diagnosis for the ovarian cancer disease state).
  • Samples can be collected and analyzed at different time points to compare ovarian cancer disease states over time for a subject.
  • the negative diagnosis may include a healthy state or a benign tumor state (i.e., “benign” as seen throughout).
  • An example of the positive diagnosis includes the subject suffering from a form of ovarian cancer (e.g., epithelial ovarian cancer (EOC)).
  • a diagnosis can also assess a malignancy status of a previously identified pelvic (or adnexal) tumor (or mass).
  • a machine learning model is trained to analyze peptide structure data and generate a disease indicator that provides information relating to one or more diseases.
  • the peptide structure data comprises quantification metrics (e.g., abundance or concentration data) for peptide structures.
  • a peptide structure may be defined by an aglycosylated peptide sequence (e.g., a peptide or peptide fragment of a larger parent protein) or a glycosylated peptide sequence.
  • a glycosylated peptide sequence may be a peptide sequence having a glycan structure that is attached to a linking site (e.g., an amino acid residue) of the peptide sequence, which may occur via, for example, a particular atom of the amino acid residue.
  • Non-limiting examples of glycosylated peptides include N-linked glycopeptides and O-linked glycopeptides.
  • An ovarian cancer disease state may include any condition that can be diagnosed as cancer that occurs in the ovaries. Many malignant pelvic tumors are ovarian cancer. Certain peptide structures that are associated with an ovarian cancer disease state may be more relevant to that disease state than other peptide structures that are also associated with that disease state.
  • Analyzing the abundance of peptide sequences and glycosylated peptide sequences in a biological sample may provide a more accurate way in which to distinguish a positive ovarian cancer disease state (e.g., a state including the presence of ovarian cancer) from a negative ovarian cancer disease state (e.g., healthy state, a benign tumor state, an absence of ovarian cancer, etc.).
  • This type of peptide structure analysis may be more conducive to generating accurate diagnoses as compared to glycoprotein analysis that focuses on analyzing glycoproteins that are too large to be resolved via mass spectrometry. Further, with glycoproteins, there may be too many potential proteoforms to consider.
  • analysis of peptide structure data in the manner described by the various embodiments herein may be more conducive to generating accurate diagnoses as compared to glycomic analysis that provides little to no information about what proteins and to which amino acid residue sites various glycan structures attach.
  • ovarian cancer treated with surgical resection will reoccur due to metastasis.
  • tests that can diagnose metastatic ovarian cancer and monitor the progression of the disease (e.g., assessing the state of early vs late stage ovarian cancer).
  • Such a test may be based on either ELISA or mass spectrometry.
  • In stage 1, the cancer is confined to the ovaries and hasn’t spread to the abdomen, pelvis or lymph nodes, nor to distant sites.
  • In stage 2, the cancer has spread from one or both ovaries to other areas of the pelvis. However, the cancer hasn’t spread to nearby lymph nodes or distant sites.
  • Stages 1 and 2 are considered early stage.
  • In stage 3, the cancer has spread to nearby lymph nodes and/or other parts of the abdomen, but it hasn’t spread to distant sites.
  • In stage 4, the cancer has spread beyond the abdomen. Stages 3 and 4 are considered late stage.
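  • The early/late grouping of stages described above can be written as a small lookup. This is a minimal sketch only; the function name `stage_group` and the use of integer stage numbers are assumptions made for illustration and do not come from the source.

```python
# Hypothetical helper: maps a numeric stage to the early/late grouping
# described in the text (stages 1 and 2 early, stages 3 and 4 late).
EARLY_STAGES = {1, 2}
LATE_STAGES = {3, 4}


def stage_group(stage: int) -> str:
    """Return 'early' for stages 1-2 and 'late' for stages 3-4."""
    if stage in EARLY_STAGES:
        return "early"
    if stage in LATE_STAGES:
        return "late"
    raise ValueError(f"unknown stage: {stage!r}")
```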
  • glycopeptides having fucosylation were found through mass spectrometry measurements to be associated with metastatic ovarian cancer.
  • this type of glycopeptide had tri- and tetra-antennary N-glycans on certain proteins.
  • various proteins such as AGP1, AGP2, APOC3, FETUA, HPT, CLUS, A2MG, TRFE, VTNC, IGJ, and CFAH can be captured on an ELISA plate from patient samples followed by a lectin based detection (four lectins: LCA, AAL, PHA-E, PHA-L).
  • Mass spectrometry can be used to analyze serum for various glycoproteins and/or glycopeptides to differentiate between benign and malignant adnexal masses.
  • a distinct signature was found with the circulating N-glycoproteins that allows a differentiation between late stage (metastatic disease of stage III/IV) and early stage (stage I/II) epithelial ovarian cancer (EOC).
  • An analysis of this data with Qiagen's Ingenuity Pathway Analysis package predicted that the signature markers are downstream of cytokine signaling.
  • the markers also suggest the presence of the sialyl Lewis X (sLex) epitope on N-glycans of certain liver-derived circulatory glycoproteins.
  • the methods, systems, and compositions provided by the embodiments described herein may enable an earlier and more accurate diagnosis of ovarian cancer in a subject as compared to currently available diagnostic modalities (e.g., imaging, biochemical tests) used for determining whether surgical intervention is indicated.
  • various currently available non-invasive tests to distinguish between benign and malignant pelvic tumors rely on detection of the biomarker cancer antigen 125 (CA125).
  • serum CA125 is not elevated in over 20% of ovarian carcinomas and is elevated in a variety of other malignant and non-malignant conditions.
  • the term “plurality” may be 2, 3, 4, 5, 6, 7, 8, 9, 10, or more.
  • the term “set of” means one or more. For example, a set of items includes one or more items.
  • the phrase “at least one of,” when used with a list of items, means different combinations of one or more of the listed items may be used and only one of the items in the list may be needed.
  • the item may be a particular object, thing, step, operation, process, or category.
  • “at least one of” means any combination of items or number of items may be used from the list, but not all of the items in the list may be required.
  • “at least one of item A, item B, or item C” means item A; item A and item B; item B; item A, item B, and item C; item B and item C; or item A and item C.
  • “at least one of item A, item B, or item C” means, but is not limited to, two of item A, one of item B, and ten of item C; four of item B and seven of item C; or some other suitable combination.
  • “substantially” means sufficient to work for the intended purpose. The term “substantially” thus allows for minor, insignificant variations from an absolute or perfect state, dimension, measurement, result, or the like such as would be expected by a person of ordinary skill in the field but that do not appreciably affect overall performance. When used with respect to numerical values or parameters or characteristics that can be expressed as numerical values, “substantially” means within ten percent.
  • amino acid generally refers to any organic compound that includes an amino group (e.g., -NH2), a carboxyl group (-COOH), and a side chain group (R) which varies based on a specific amino acid. Amino acids can be linked using peptide bonds.
  • alkylation generally refers to the transfer of an alkyl group from one molecule to another.
  • alkylation is used to react with reduced cysteines to prevent the re-formation of disulfide bonds after reduction has been performed.
  • linking site or “glycosylation site” as used herein generally refers to the location where a sugar molecule of a glycan or glycan structure is directly bound (e.g., covalently bound) to an amino acid of a peptide, a polypeptide, or a protein.
  • the linking site may be an amino acid residue and a glycan structure may be linked via an atom of the amino acid residue.
  • types of glycosylation can include N-linked glycosylation, O-linked glycosylation, C-linked glycosylation, S-linked glycosylation, and glycation.
  • biological sample generally refers to a specimen taken by sampling so as to be representative of the source of the specimen, typically, from a subject.
  • a biological sample can be representative of an organism as a whole, specific tissue, cell type, or category or sub-category of interest.
  • Biological samples may include, but are not limited to synovial fluid, whole blood, blood serum, blood plasma, urine, sputum, tissue, saliva, tears, spinal fluid, tissue section(s) obtained by biopsy; cell(s) that are placed in or adapted to tissue culture; sweat, mucous, fecal material, gastric fluid, abdominal fluid, amniotic fluid, cyst fluid, peritoneal fluid, pancreatic juice, breast milk, lung lavage, marrow, gastric acid, bile, semen, pus, aqueous humor, transudate, and the like including derivatives, portions and combinations of the foregoing.
  • biological samples include, but are not limited to, blood and/or plasma.
  • biological samples include, but are not limited to, urine or stool.
  • Biological samples include, but are not limited to, saliva. Biological samples include, but are not limited to, tissue dissections and tissue biopsies. Biological samples include, but are not limited to, any derivative or fraction of the aforementioned biological samples.
  • the biological sample can include a macromolecule.
  • the biological sample can include a small molecule.
  • the biological sample can include a virus.
  • the biological sample can include a cell or derivative of a cell.
  • the biological sample can include an organelle.
  • the biological sample can include a cell nucleus.
  • the biological sample can include a rare cell from a population of cells.
  • the biological sample can include any type of cell, including without limitation prokaryotic cells, eukaryotic cells, bacterial, fungal, plant, mammalian, or other animal cell type, mycoplasmas, normal tissue cells, tumor cells, or any other cell type, whether derived from single cell or multicellular organisms.
  • the biological sample can include a constituent of a cell.
  • the biological sample can include nucleotides (e.g., ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof.
  • the biological sample can include a matrix (e.g., a gel or polymer matrix) comprising a cell or one or more constituents from a cell (e.g., cell bead), such as DNA, RNA, organelles, proteins, or any combination thereof, from the cell.
  • the biological sample may be obtained from a tissue of a subject.
  • the biological sample can include a hardened cell. Such hardened cells may or may not include a cell wall or cell membrane.
  • the biological sample can include one or more constituents of a cell but may not include other constituents of the cell. An example of such constituents may include a nucleus or an organelle.
  • the biological sample may include a live cell.
  • the live cell can be capable of being cultured.
  • biomarker generally refers to any measurable substance taken as a sample from a subject whose presence is indicative of some phenomenon. Non-limiting examples of such phenomena can include a disease state, a condition, or exposure to a compound or environmental condition. In various embodiments described herein, biomarkers may be used for diagnostic purposes (e.g., to diagnose a health state, a disease state). The term “biomarker” can be used interchangeably with the term “marker.”
  • digesting a peptide generally refers to a biological process that employs enzymes to break specific amino acid peptide bonds.
  • digesting a peptide includes contacting the peptide with a digesting enzyme, e.g., trypsin, to produce fragments of the glycopeptide.
  • a protease enzyme is used to digest a glycopeptide.
  • protease enzyme refers to an enzyme that performs proteolysis or breakdown of large peptides into smaller polypeptides or individual amino acids.
  • protease examples include, but are not limited to, one or more of a serine protease, threonine protease, cysteine protease, aspartate protease, glutamic acid protease, metalloprotease, asparagine peptide lyase, and any combinations of the foregoing.
  • Enzymatic digestion may be used in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access is limited to cleavage sites.
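  • The trypsin digestion step mentioned above can be illustrated with an in-silico digest. This is a minimal sketch, assuming the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P); the source does not state this rule, and the function name `tryptic_digest` and the example sequence are hypothetical. Missed cleavages and glycan attachments are ignored.

```python
import re
from typing import List


def tryptic_digest(sequence: str) -> List[str]:
    """Split a one-letter amino acid sequence into tryptic peptides.

    Assumed rule (not from the source): cleave after K or R,
    except when the following residue is P.
    """
    # Zero-width split point: preceded by K or R, not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # Drop the empty string produced when the sequence ends in K or R.
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    print(tryptic_digest("AGKPLRNDKEF"))
```

  • In this made-up sequence the K followed by P is not cleaved, so the first peptide keeps the internal lysine.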
  • disease state generally refers to a condition that affects the structure or function of an organism.
  • causes of disease states may include pathogens, immune system dysfunctions, cell damage caused by aging, cell damage caused by other factors (e.g., trauma and cancer).
  • Disease states can include any state of a disease whether symptomatic or asymptomatic.
  • Disease states can include disease stages of a disease progression. Disease states can cause minor, moderate, or severe disruptions in structure or function of an organism (e.g., a subject).
  • fragment generally refers to an ion fragmentation process which occurs in an MRM-MS instrument. Fragmenting may produce various fragments having the same mass but varying with respect to their charge, e.g., some biomarkers described herein produce more than one product m/z.
  • glycan or “polysaccharide” as used herein, both generally refer to a carbohydrate residue of a glycoconjugate, such as the carbohydrate portion of a glycopeptide, glycoprotein, glycolipid, or proteoglycan. Glycans can include monosaccharides.
  • glycopeptide or “glycopolypeptide” as used herein, generally refers to a peptide or polypeptide comprising at least one glycan residue.
  • glycopeptides comprise carbohydrate moieties (e.g., one or more glycans) covalently attached to a side chain (i.e. R group) of an amino acid residue.
  • glycopeptide fragment or “glycosylated peptide fragment” or “glycopeptide” as used herein, generally refers to a glycosylated peptide (or glycopeptide) having an amino acid sequence that is the same as part (but not all) of the amino acid sequence of the glycosylated protein from which the glycosylated peptide is obtained, e.g., via ion fragmentation within an MRM-MS instrument.
  • MRM refers to multiple-reaction-monitoring.
  • glycopeptide fragments or “fragments of a glycopeptide” refer to the fragments produced directly by using a mass spectrometer optionally after the glycoprotein has been digested enzymatically to produce the glycopeptides.
  • glycoprotein generally refers to a protein having at least one glycan residue bonded thereto.
  • a glycoprotein is a protein with at least one oligosaccharide chain covalently bonded thereto. Examples of glycoproteins include but are not limited to the peptide structures including glycan molecules shown in the various Tables presented herein.
  • a glycopeptide, as used herein, refers to a fragment of a glycoprotein, unless specified otherwise to the contrary.
  • liquid chromatography generally refers to a technique used to separate a sample into parts. Liquid chromatography can be used to separate, identify, and quantify components.
  • mass spectrometry generally refers to an analytical technique used to identify molecules. In various embodiments described herein, mass spectrometry can be involved in characterization and sequencing of proteins.
  • m/z or “mass-to-charge ratio,” as used herein, generally refers to an output value from a mass spectrometry instrument.
  • m/z can represent a relationship between the mass of a given ion and the number of elementary charges that it carries.
  • the “m” in m/z stands for mass and the “z” stands for charge.
  • m/z can be displayed on an x-axis of a mass spectrum.
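  • The relationship between mass and charge described above can be made concrete with a short calculation. This is a minimal sketch, assuming positive-mode ions that carry their charge as added protons, which the source does not specify; the function name `mz` and the example mass are hypothetical.

```python
PROTON_MASS_DA = 1.007276  # approximate mass of a proton, in daltons


def mz(neutral_mass_da: float, charge: int) -> float:
    """m/z of an ion formed by adding `charge` protons to a neutral molecule.

    Assumption (not from the source): positive-mode protonation, so the
    ion mass is the neutral mass plus one proton mass per charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


if __name__ == "__main__":
    # The same made-up 1000 Da molecule appears at different m/z values
    # depending on how many charges it carries.
    for z in (1, 2, 3):
        print(z, round(mz(1000.0, z), 4))
```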
  • the term “patient,” as used herein, generally refers to a mammalian subject.
  • the mammal can be a human, or an animal including, but not limited to an equine, porcine, canine, feline, ungulate, and primate animal.
  • the individual is a human.
  • the methods and uses described herein are useful for both medical and veterinary uses.
  • a “patient” is a human subject unless specified to the contrary.
  • peptide generally refers to amino acids linked by peptide bonds.
  • Peptides can include amino acid chains between 10 and 50 residues.
  • Peptides can include amino acid chains shorter than 10 residues, including oligopeptides, dipeptides, tripeptides, and tetrapeptides.
  • Peptides can include chains longer than 50 residues and may be referred to as “polypeptides” or “proteins.”
  • the phrase “peptide,” is meant to include glycopeptides unless stated otherwise.
  • Protein or “polypeptide” or “peptide” may be used interchangeably herein and generally refer to a molecule including at least three amino acid residues. Proteins can include polymer chains made of amino acid sequences linked together by peptide bonds. Proteins may be digested in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access is limited to cleavage sites.
  • peptide structure generally refers to peptides or a portion thereof or glycopeptides or a portion thereof.
  • a peptide structure can include any molecule comprising at least two amino acids in sequence.
  • reduction generally refers to the gain of an electron by a substance.
  • a sugar can directly bind to a protein, thereby reducing the amino acid to which it binds. Such reducing reactions can occur in glycosylation. In various embodiments, reduction may be used to break disulfide bonds between two cysteines.
  • sample generally refers to a sample from a subject of interest and may include a biological sample of a subject.
  • the sample may include a cell sample.
  • the sample may include a cell line or cell culture sample.
  • the sample can include one or more cells.
  • the sample can include one or more microbes.
  • the sample may include a nucleic acid sample or protein sample.
  • the sample may also include a carbohydrate sample or a lipid sample.
  • the sample may be derived from another sample.
  • the sample may include a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate.
  • the sample may include a fluid sample, such as a blood sample, urine sample, or saliva sample.
  • the sample may include a skin sample.
  • the sample may include a cheek swab.
  • the sample may include a plasma or serum sample.
  • the sample may include a cell-free or cell free sample.
  • a cell-free sample may include extracellular polynucleotides.
  • the sample may originate from blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, or tears.
  • the sample may originate from red blood cells or white blood cells.
  • the sample may originate from feces, spinal fluid, CNS fluid, gastric fluid, amniotic fluid, cyst fluid, peritoneal fluid, marrow, bile, other body fluids, tissue obtained from a biopsy, skin, or hair.
  • sequence generally refers to a biological sequence including one-dimensional monomers that can be assembled to generate a polymer.
  • sequences include nucleotide sequences (e.g., ssDNA, dsDNA, and RNA), amino acid sequences (e.g., proteins, peptides, and polypeptides), and carbohydrates (e.g., compounds including Cm(H2O)n).
  • the term “subject,” as used herein, generally refers to an animal, such as a mammal (e.g., human) or avian (e.g., bird), or other organism, such as a plant.
  • the subject can include a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian or a human.
  • Animals may include, but are not limited to, farm animals, sport animals, and pets.
  • a subject can include a healthy or asymptomatic individual, an individual that has or is suspected of having a disease (e.g., cancer) or a pre-disposition to the disease, and/or an individual that is in need of therapy or suspected of needing therapy.
  • a subject can be a patient.
  • a subject can include a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses). However, in the context of diagnosing ovarian cancer, the subject is female unless explicitly specified otherwise.
  • a subject may be one who has been previously identified as having a disease or a condition, and optionally has already undergone, or is undergoing, a therapeutic intervention for the disease or condition.
  • a subject can also be one who has not been previously diagnosed as having a disease or a condition.
  • a subject can be one who exhibits one or more risk factors for a disease or a condition, or a subject who does not exhibit disease risk factors, or a subject who is asymptomatic for a disease or a condition.
  • a subject can also be one who is suffering from or at risk of developing a disease or a condition.
  • training data generally refers to data that can be input into models, statistical models, algorithms and any system or process able to use existing data to make predictions.
  • a “model” may include one or more algorithms, one or more mathematical techniques, one or more machine learning algorithms, or a combination thereof.
  • machine learning may be the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world. Machine learning uses algorithms that can learn from data without relying on rules-based programming.
  • a machine learning algorithm may include a parametric model, a nonparametric model, a deep learning model, a neural network, a linear discriminant analysis model, a quadratic discriminant analysis model, a support vector machine, a random forest algorithm, a nearest neighbor algorithm, a combined discriminant analysis model, a k-means clustering algorithm, a supervised model, an unsupervised model, a logistic regression model, a multivariable regression model, a penalized multivariable regression model, or another type of model.
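  • As one illustration of the model types listed above, a logistic regression model can turn peptide structure abundances into a score between 0 and 1 that could serve as a disease indicator. This is a minimal sketch under stated assumptions: the function name `disease_indicator`, the peptide structure labels, the weights, and the bias are all hypothetical and are not taken from the source; a real model would learn its weights from training data.

```python
import math
from typing import Dict


def disease_indicator(abundances: Dict[str, float],
                      weights: Dict[str, float],
                      bias: float) -> float:
    """Logistic-regression score in (0, 1) from peptide structure abundances.

    Each abundance is multiplied by its (hypothetical) learned weight,
    the results are summed with the bias, and the sum is passed through
    the logistic function.
    """
    z = bias + sum(weights[name] * value for name, value in abundances.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up labels and numbers, for illustration only.
    sample = {"PS-1": 1.2, "PS-2": 0.4}
    learned = {"PS-1": 0.8, "PS-2": -1.5}
    print(disease_indicator(sample, learned, bias=-0.1))
```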
  • an “artificial neural network” or “neural network” may refer to mathematical algorithms or computational models that mimic an interconnected group of artificial nodes or neurons that processes information based on a connectionistic approach to computation.
  • Neural networks which may also be referred to as neural nets, can employ one or more layers of nonlinear units to predict an output for a received input.
  • Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
  • a reference to a “neural network” may be a reference to one or more neural networks.
  • a neural network may process information in two ways: when it is being trained it is in training mode and when it puts what it has learned into practice it is in inference (or prediction) mode.
  • Neural networks learn through a feedback process (e.g., backpropagation) which allows the network to adjust the weight factors (modifying its behavior) of the individual nodes in the intermediate hidden layers so that the output matches the outputs of the training data.
  • a neural network learns by being fed training data (learning examples) and eventually learns how to reach the correct output, even when it is presented with a new range or set of inputs.
  • a neural network may include, for example, without limitation, at least one of a Feedforward Neural Network (FNN), a Recurrent Neural Network (RNN), a Modular Neural Network (MNN), a Convolutional Neural Network (CNN), a Residual Neural Network (ResNet), a Neural Ordinary Differential Equations network (neural-ODE), or another type of neural network.
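  • The layer-by-layer computation described above, in which each hidden layer's output feeds the next layer, can be sketched as a forward pass through one hidden layer and one output layer. This is an illustrative sketch only: the function names, the choice of ReLU and sigmoid activations, and all weight values are assumptions, and no training (backpropagation) is shown.

```python
import math
from typing import Callable, List


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dense(inputs: List[float], weights: List[List[float]],
          biases: List[float], activation: Callable[[float], float]) -> List[float]:
    """One layer: apply the activation to (weights @ inputs + biases).

    `weights` holds one row of input weights per output node.
    """
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x: List[float], hidden_w: List[List[float]], hidden_b: List[float],
            out_w: List[List[float]], out_b: List[float]) -> List[float]:
    """Inference-mode pass: the hidden layer output is the output layer input."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)


if __name__ == "__main__":
    # Made-up parameters for a 2-input, 2-hidden-node, 1-output network.
    print(forward([1.0, 2.0],
                  [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0],
                  [[1.0, 1.0]], [-1.0]))
```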
  • a “target glycopeptide analyte,” may refer to a peptide structure (e.g., glycosylated or aglycosylated/non-glycosylated), a fraction of a peptide structure, a sub-structure (e.g., a glycan or a glycosylation site) of a peptide structure, a product of one or more of the above listed structures and sub-structures, associated detection molecules (e.g., signal molecule, label, or tag), or an amino acid sequence that can be measured by mass spectrometry.
  • a “peptide data set,” may be used interchangeably with “peptide structure data” and can refer to any data of or relating to a peptide from a resulting mass spectrometry run.
  • a peptide data set can comprise data obtained from a sample or biological sample using mass spectrometry.
  • a peptide data set can comprise data relating to an external standard, data relating to an internal standard, and data relating to a target glycopeptide analyte of a sample.
  • a peptide data set can result from analysis originating from a single run.
  • the peptide data set can include raw abundance and mass to charge ratios for one or more peptides.
  • a “transition” may refer to or identify a peptide structure.
  • a transition can refer to the specific pair of m/z values associated with a precursor ion and a product or fragment ion.
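  • A transition, as a pair of precursor and product m/z values, maps naturally onto a small record type. This is a minimal sketch; the class name `Transition`, the `matches` method, the tolerance, and the numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """A precursor/product m/z pair used to identify a peptide structure."""
    precursor_mz: float
    product_mz: float

    def matches(self, precursor_mz: float, product_mz: float,
                tol: float = 0.5) -> bool:
        """True if both observed m/z values fall within `tol` of this pair."""
        return (abs(self.precursor_mz - precursor_mz) <= tol
                and abs(self.product_mz - product_mz) <= tol)


if __name__ == "__main__":
    # Made-up values, not taken from the source.
    t = Transition(precursor_mz=1000.5, product_mz=366.1)
    print(t.matches(1000.6, 366.0))
```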
  • a “non-glycosylated endogenous peptide” (NGEP) may refer to a peptide structure that does not comprise a glycan molecule.
  • an NGEP and a target glycopeptide analyte can originate from the same subject.
  • an NGEP and a target glycopeptide analyte may be derived from the same protein sequence.
  • the NGEP and the target glycopeptide analyte may be derived from or include the same peptide sequence.
  • an NGEP can be labeled with an isotope in preparation for mass spectrometry analysis.
  • “abundance,” may refer to a quantitative value generated using mass spectrometry.
  • the quantitative value may relate to the amount of a particular peptide structure.
  • the quantitative value may comprise an amount of an ion produced using mass spectrometry.
  • the quantitative value may be expressed as an m/z value. In other embodiments, the quantitative value may be expressed in atomic mass units.
  • “relative abundance,” may refer to a comparison of two or more abundances.
  • the comparison may comprise comparing one peptide structure to a total number of peptide structures.
  • the comparison may comprise comparing one peptide glycoform (e.g., two identical peptides differing by one or more glycans) to a set of peptide glycoforms.
  • the comparison may comprise comparing a number of ions having a particular m/z ratio to a total number of ions detected.
  • a relative abundance can be expressed as a ratio. In other embodiments, a relative abundance can be expressed as a percentage.
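  • Relative abundance as a ratio or a percentage can be computed directly from a set of abundances, for example for the glycoforms of one peptide. This is a minimal sketch; the function names and the glycoform labels are hypothetical, and the abundances are made-up numbers.

```python
from typing import Dict


def relative_abundances(abundances: Dict[str, float]) -> Dict[str, float]:
    """Express each abundance as a ratio of the total across the set."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {name: value / total for name, value in abundances.items()}


def as_percent(ratio: float) -> float:
    """Convert a ratio to a percentage."""
    return 100.0 * ratio


if __name__ == "__main__":
    ratios = relative_abundances({"glycoform_A": 30.0, "glycoform_B": 10.0})
    for name, ratio in ratios.items():
        print(name, ratio, as_percent(ratio))
```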
  • an “internal standard” may refer to something that can be contained (e.g., spiked-in) in the same sample as a target glycopeptide analyte undergoing mass spectrometry analysis. Internal standards can be used for calibration purposes. Additionally, internal standards can be used in the systems and methods described herein. In some aspects, an internal standard can be selected based on similarity of m/z and/or retention time and can be a “surrogate” if a specific standard is too costly or unavailable. Internal standards can be heavy labeled or non-heavy labeled.
  • FIG. 1 is a schematic diagram of an exemplary workflow 100 for the detection of peptide structures associated with a disease state for use in diagnosis and/or treatment in accordance with one or more embodiments.
  • Workflow 100 may include various operations including, for example, sample collection 102, sample intake 104, sample preparation and processing 106, data analysis 108, and output generation 110.
  • Sample collection 102 may include, for example, obtaining a biological sample 112 of one or more subjects, such as subject 114.
  • Biological sample 112 may take the form of a specimen obtained via one or more sampling methods.
  • Biological sample 112 may be representative of subject 114 as a whole or of a specific tissue, cell type, or other category or sub-category of interest.
  • Biological sample 112 may be obtained in any of a number of different ways.
  • biological sample 112 includes whole blood sample 116 obtained via a blood draw.
  • biological sample 112 includes set of aliquoted samples 118 that includes, for example, a serum sample, a plasma sample, a blood cell (e.g., white blood cell (WBC), red blood cell (RBC)) sample, another type of sample, or a combination thereof.
  • Biological sample 112 may include nucleotides (e.g., ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof.
  • a single run can analyze a sample (e.g., the sample including a peptide analyte), an external standard (e.g., an NGEP of a serum sample), and an internal standard.
  • abundance or raw abundance for the external standard, the internal standard, and target glycopeptide analyte can be determined by mass spectrometry in the same run.
  • external standards may be analyzed prior to analyzing samples.
  • the external standards can be run independently between the samples.
  • external standards can be analyzed after every 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more experiments.
  • external standard data can be used in some or all of the normalization systems and methods described herein.
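  • The scheduling of external standards described above (before the samples, and again after every fixed number of sample runs) can be sketched as a run-queue builder. This is an illustration only; the function name `build_run_queue`, the `every` parameter, and the `EXT_STD` label are hypothetical, and blank runs are not modeled.

```python
from typing import List


def build_run_queue(samples: List[str], every: int = 5,
                    standard: str = "EXT_STD") -> List[str]:
    """Interleave an external standard run with sample runs.

    The standard is run once before any sample and then again after
    every `every` samples.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    queue = [standard]
    for i, sample in enumerate(samples, start=1):
        queue.append(sample)
        if i % every == 0:
            queue.append(standard)
    return queue


if __name__ == "__main__":
    print(build_run_queue(["S1", "S2", "S3", "S4"], every=2))
```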
  • blank samples may be processed to prevent column fouling.
  • Sample intake 104 may include one or more various operations such as, for example, aliquoting, registering, processing, storing, thawing, and/or other types of operations.
  • sample intake 104 includes aliquoting whole blood sample 116 to form a set of aliquoted samples that can then be sub-aliquoted to form set of samples 120.
  • Sample preparation and processing 106 may include, for example, one or more operations to form set of peptide structures 122.
  • set of peptide structures 122 may include various fragments of unfolded proteins that have undergone digestion and may be ready for analysis.
  • sample preparation and processing 106 may include, for example, data acquisition 124 based on set of peptide structures 122.
  • data acquisition 124 may include use of, for example and without limitation, a liquid chromatography/mass spectrometry (LC/MS) system.
  • Data analysis 108 may include, for example, peptide structure analysis 126.
  • data analysis 108 also includes output generation 110.
  • output generation 110 may be considered a separate operation from data analysis 108.
  • Output generation 110 may include, for example, generating final output 128 based on the results of peptide structure analysis 126. Final output 128 may be used for determining research, diagnosis, and/or treatment.
  • final output 128 is comprised of one or more outputs.
  • Final output 128 may take various forms.
  • final output 128 may be a report that includes, for example, a diagnosis output, a treatment output (e.g., a treatment design output, a treatment plan output, or combination thereof), analyzed data (e.g., relativized and normalized) or combination thereof.
  • the report can comprise a target glycopeptide analyte concentration as a function of the NGEP concentration value and the normalized abundance.
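  • The source does not give the function that relates analyte concentration to the NGEP concentration value and the normalized abundance. One simple possibility, shown here purely as an assumption, is to normalize the analyte's raw abundance by the NGEP's raw abundance and scale by the known NGEP concentration. The function names and numbers below are hypothetical.

```python
def normalized_abundance(analyte_raw: float, ngep_raw: float) -> float:
    """Analyte raw abundance divided by NGEP raw abundance (assumed form)."""
    if ngep_raw <= 0:
        raise ValueError("NGEP raw abundance must be positive")
    return analyte_raw / ngep_raw


def analyte_concentration(normalized: float, ngep_concentration: float) -> float:
    """Assumed relationship: normalized abundance times NGEP concentration.

    This is one possible function, not the one defined by the source.
    """
    return normalized * ngep_concentration


if __name__ == "__main__":
    # Made-up numbers, arbitrary units.
    n = normalized_abundance(analyte_raw=200.0, ngep_raw=400.0)
    print(analyte_concentration(n, ngep_concentration=10.0))
```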
  • final output 128 may be an alert (e.g., a visual alert, an audible alert, etc.), a notification (e.g., a visual notification, an audible notification, an email notification, etc.), an email output, or a combination thereof.
  • final output 128 may be sent to remote system 130 for processing.
  • Remote system 130 may include, for example, a computer system, a server, a processor, a cloud computing platform, cloud storage, a laptop, a tablet, a smartphone, some other type of mobile computing device, or a combination thereof.
  • workflow 100 may optionally exclude one or more of the operations described herein and/or may optionally include one or more other steps or operations other than those described herein (e.g., in addition to and/or instead of those described herein). Accordingly, workflow 100 may be implemented in any of a number of different ways for use in the research, diagnosis, and/or treatment of a disease state.
  • Figures 2A and 2B are schematic diagrams of a workflow for sample preparation and processing 106 in accordance with one or more embodiments. Figures 2A and 2B are described with continuing reference to Figure 1. Sample preparation and processing 106 may include, for example, preparation workflow 200 shown in Figure 2A and data acquisition 124 shown in Figure 2B.
  • FIG. 2A is a schematic diagram of preparation workflow 200 in accordance with one or more embodiments.
  • Preparation workflow 200 may be used to prepare a sample, such as a sample of set of samples 120 in Figure 1, for analysis via data acquisition 124. For example, this analysis may be performed via mass spectrometry (e.g., LC-MS).
  • preparation workflow 200 may include denaturation and reduction 202, alkylation 204, and digestion 206. Each stage of the preparation workflow can introduce inconsistency between different samples and different experiments, necessitating the improved normalization systems and methods described herein and throughout.
  • polymers such as proteins, in their native form, can fold to include secondary, tertiary, and/or other higher order structures.
  • Such higher order structures may functionalize proteins to complete tasks (e.g., enable enzymatic activity) in a subject.
  • higher order structures of polymers may be maintained via various interactions between side chains of amino acids within the polymers. Such interactions can include ionic bonding, hydrophobic interactions, hydrogen bonding, and disulfide linkages between cysteine residues.
  • unfolding such polymers e.g., peptide/protein molecules
  • unfolding a polymer may include denaturing the polymer, which may include, for example, linearizing the polymer.
  • denaturation and reduction 202 can be used to disrupt higher order structures (e.g., secondary, tertiary, quaternary, etc.) of one or more proteins (e.g., polypeptides and peptides) in a sample (e.g., one of set of samples 120 in Figure 1).
  • Denaturation and reduction 202 includes, for example, a denaturation procedure and a reduction procedure.
  • the denaturation procedure may be performed using, for example, thermal denaturation, where heat is used as a denaturing agent. The thermal denaturation can disrupt ionic bonding, hydrophobic interactions, and/or hydrogen bonding.
  • the denaturation procedure may include using one or more denaturing agents.
  • the denaturation procedure may include using temperature.
  • the denaturation procedure may include using one or more denaturing agents in combination with heat.
  • These one or more denaturing agents may include, for example, but are not limited to, any number of chaotropic salts (e.g., urea, guanidine), surfactants (e.g., sodium dodecyl sulfate (SDS), beta octyl glucoside, Triton X-100), or a combination thereof.
  • such denaturing agents may be used in combination with heat when sample preparation workflow further includes a cleanup procedure.
  • the resulting one or more denatured (e.g., unfolded, linearized) proteins may then undergo further processing in preparation of analysis.
  • a reduction procedure may be performed in which one or more reducing agents are applied.
  • a reducing agent can produce an alkaline pH.
  • a reducing agent may take the form of, for example, without limitation, dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or some other reducing agent.
  • the reducing agent may reduce (e.g., cleave) the disulfide linkages between cysteine residues of the one or more denatured proteins to form one or more reduced proteins.
  • the one or more reduced proteins resulting from denaturation and reduction 202 may undergo a process to prevent the reformation of disulfide linkages between, for example, the cysteine residues of the one or more reduced proteins.
  • This process may be implemented using alkylation 204 to form one or more alkylated proteins.
  • alkylation 204 may be used to add an acetamide group to a sulfur on each cysteine residue to prevent disulfide linkages from reforming.
  • an acetamide group can be added by reacting one or more alkylating agents with a reduced protein.
  • the one or more alkylating agents may include, for example, one or more acetamide salts.
  • alkylating agent may take the form of, for example, iodoacetamide (IAA), 2-chloroacetamide, some other type of acetamide salt, or some other type of alkylating agent.
  • alkylation 204 may include a quenching procedure. The quenching procedure may be performed using one or more reducing agents (e.g., one or more of the reducing agents described above).
  • the one or more alkylated proteins formed via alkylation 204 can then undergo digestion 206 in preparation for analysis (e.g., mass spectrometry analysis).
  • Digestion 206 of a protein may include cleaving the protein at or around one or more cleavage sites (e.g., site 205 which may be one or more amino acid residues).
  • an alkylated protein may be cleaved at the carboxyl side of the lysine or arginine residues. This type of cleavage may break the protein into various segments, which include one or more peptide structures (e.g., glycosylated or aglycosylated).
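The cleavage rule described above (cutting on the carboxyl side of each lysine or arginine residue) can be illustrated with a minimal in-silico digestion sketch. The function name and the example sequence are hypothetical and not taken from the disclosure, and the sketch is deliberately simplified: it assumes complete cleavage and ignores missed cleavages and any sequence-context exceptions.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence after every lysine (K) or arginine (R).

    Simplified illustration: assumes complete cleavage at every K/R.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            # Cleave on the carboxyl side, so the K/R stays with this peptide.
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence used only for illustration.
fragments = tryptic_digest("MKWVTFISLLRSAYSK")
print(fragments)
```

Concatenating the returned fragments reproduces the input sequence, which is a convenient sanity check for any cleavage rule of this kind.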
  • digestion 206 is performed using one or more proteolysis catalysts.
  • an enzyme can be used in digestion 206.
  • the enzyme takes the form of trypsin.
  • one or more other types of enzymes e.g., proteases
  • these one or more other enzymes include, but are not limited to, LysC, LysN, AspN, GluC, and ArgC.
  • digestion 206 may be performed using tosyl phenylalanyl chloromethyl ketone (TPCK)-treated trypsin, one or more engineered forms of trypsin, one or more other formulations of trypsin, or a combination thereof.
  • digestion 206 may be performed in multiple steps, with each involving the use of one or more digestion agents. For example, a secondary digestion, tertiary digestion, etc. may be performed.
  • trypsin is used to digest serum samples.
  • trypsin/LysC cocktails are used to digest plasma samples.
  • digestion 206 further includes a quenching procedure.
  • the quenching procedure may be performed by acidifying the sample (e.g., to a pH ~3).
  • formic acid may be used to perform this acidification.
  • preparation workflow 200 further includes post-digestion procedure 207.
  • Post-digestion procedure 207 may include, for example, a cleanup procedure.
  • the cleanup procedure may include, for example, the removal of unwanted components in the sample that results from digestion 206.
  • unwanted components may include, but are not limited to, inorganic ions, surfactants, etc.
  • post-digestion procedure 207 further includes a procedure for the addition of heavy-labeled peptide internal standards.
  • Although preparation workflow 200 has been described with respect to a sample created or taken from biological sample 112 that is blood-based (e.g., a whole blood sample, a plasma sample, a serum sample, etc.), sample preparation workflow 200 may be similarly implemented for other types of samples (e.g., tears, urine, tissue, interstitial fluids, sputum, etc.) to produce set of peptide structures 122.
  • Figure 2B is a schematic diagram of data acquisition 124 in accordance with one or more embodiments.
  • data acquisition 124 can commence following sample preparation 200 described in Figure 2A.
  • data acquisition 124 can comprise quantification 208, quality control 210, and peak integration and normalization 212.
  • targeted quantification 208 of peptides and glycopeptides can incorporate use of liquid chromatography-mass spectrometry (LC/MS) instrumentation.
  • tandem MS may be used.
  • LC/MS can combine the physical separation capabilities of liquid chromatography (LC) with the mass analysis capabilities of mass spectrometry (MS).
  • this technique allows for the separation of digested peptides to be fed from the LC column into the MS ion source through an interface.
  • any LC/MS device can be incorporated into the workflow described herein.
  • an instrument or instrument system suited for identification and targeted quantification 208 may include, for example, a Triple Quadrupole LC/MS™.
  • targeted quantification 208 is performed using multiple reaction monitoring mass spectrometry (MRM-MS).
  • identification of a particular protein or peptide and an associated quantity can be assessed. In various embodiments described herein, identification of a particular glycan and an associated quantity can be assessed. In various embodiments described herein, particular glycans can be matched to a glycosylation site on a protein or peptide and the abundances measured.
  • targeted quantification 208 includes using a specific collision energy associated with the appropriate fragmentation to consistently see an abundant product ion.
  • Glycopeptide structures may have a lower collision energy than aglycosylated peptide structures.
  • the source voltage and gas temperature may be lowered as compared to generic proteomic analysis.
  • quality control 210 procedures can be put in place to optimize data quality.
  • measures can be put in place that tolerate only errors falling within acceptable ranges around an expected value.
  • employing statistical models e.g., using Westgard rules
  • quality control 210 may include, for example, assessing the retention time and abundance of representative peptide structures (e.g., glycosylated and/or aglycosylated) and spiked-in internal standards, in either every sample, or in each quality control sample (e.g., pooled serum digest).
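A quality control check of this kind can be sketched as follows. The sketch applies two commonly used Westgard rules (1_3s: one value more than three standard deviations from the mean; 2_2s: two consecutive values more than two standard deviations from the mean on the same side) to a series of retention times for one representative peptide structure. The function name, the historical mean and standard deviation, and the example values are all assumptions made for illustration; the disclosure does not specify which rules or limits are used.

```python
def westgard_flags(values, mean, sd):
    """Return (index, rule) pairs for values violating the 1_3s or 2_2s rule.

    `mean` and `sd` are assumed to come from historical QC runs.
    """
    z_scores = [(v - mean) / sd for v in values]
    flags = []
    for i, z in enumerate(z_scores):
        # 1_3s: a single observation beyond three standard deviations.
        if abs(z) > 3:
            flags.append((i, "1_3s"))
        # 2_2s: two consecutive observations beyond two standard
        # deviations on the same side of the mean.
        if i > 0:
            prev = z_scores[i - 1]
            if (z > 2 and prev > 2) or (z < -2 and prev < -2):
                flags.append((i, "2_2s"))
    return flags


# Hypothetical retention times (minutes) for a pooled QC sample.
retention_times = [10.05, 10.25, 10.22, 9.60]
print(westgard_flags(retention_times, mean=10.0, sd=0.1))
```

The same function could be applied to abundance values of spiked-in internal standards, with a separate mean and standard deviation for each monitored quantity.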
  • Peak integration and normalization 212 may be performed to process the data that has been generated and transform the data into a format for analysis.
  • peak integration and normalization 212 may include converting abundance data for various product ions that were detected for a selected peptide structure into a single quantification metric (e.g., a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, a normalized concentration, etc.) for that peptide structure.
  • peak integration and normalization 212 may be performed using one or more of the techniques described in U.S. Patent Publication No. 2020/0372973A1 and/or US Patent Publication No. 2020/0240996A1, the disclosures of which are incorporated by reference herein in their entireties.
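As a rough illustration of collapsing product-ion data into a single quantification metric, the sketch below integrates each product-ion chromatogram with the trapezoidal rule, sums the areas, and divides by the area of an internal standard. This is one assumed normalization scheme chosen for clarity, not the method of the publications cited above; the function names and numbers are hypothetical.

```python
def integrate_peak(times, intensities):
    """Trapezoidal area under one chromatographic peak."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2.0
    return area


def normalized_abundance(product_ion_areas, internal_standard_area):
    """Sum product-ion areas and scale by an internal standard area.

    Assumption: normalization to a single spiked-in standard.
    """
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    return sum(product_ion_areas) / internal_standard_area


# Hypothetical chromatograms for two product ions of one peptide structure.
area_1 = integrate_peak([0.0, 1.0, 2.0], [0.0, 10.0, 0.0])
area_2 = integrate_peak([0.0, 1.0, 2.0], [0.0, 30.0, 0.0])
print(normalized_abundance([area_1, area_2], internal_standard_area=80.0))
```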
  • Figure 3 is a block diagram of an analysis system 300 in accordance with one or more embodiments.
  • Analysis system 300 can be used to both detect and analyze various peptide structures that have been associated with various disease states.
  • Analysis system 300 is one example of an implementation for a system that may be used to perform data analysis 108 in Figure 1. Thus, analysis system 300 is described with continuing reference to workflow 100 as described in Figures 1, 2A, and/or 2B.
  • Analysis system 300 may include computing platform 302 and data store 304. In some embodiments, analysis system 300 also includes display system 306. Computing platform 302 may take various forms. In one or more embodiments, computing platform 302 includes a single computer (or computer system) or multiple computers in communication with each other. In other examples, computing platform 302 takes the form of a cloud computing platform.
  • Data store 304 and display system 306 may each be in communication with computing platform 302.
  • data store 304, display system 306, or both may be considered part of or otherwise integrated with computing platform 302.
  • computing platform 302, data store 304, and display system 306 may be separate components in communication with each other, but in other examples, some combination of these components may be integrated together. Communication between these different components may be implemented using any number of wired communications links, wireless communications links, optical communications links, or a combination thereof.
  • Analysis system 300 includes, for example, peptide structure analyzer 308, which may be implemented using hardware, software, firmware, or a combination thereof.
  • peptide structure analyzer 308 is implemented using computing platform 302.
  • Peptide structure analyzer 308 receives peptide structure data 310 for processing.
  • Peptide structure data 310 may be, for example, the peptide structure data that is output from sample preparation and processing 106 in Figures 1, 2A, and 2B. Accordingly, peptide structure data 310 may correspond to set of peptide structures 122 identified for biological sample 112 and may thereby correspond to biological sample 112.
  • Peptide structure data 310 can be sent as input into peptide structure analyzer 308, retrieved from data store 304 or some other type of storage (e.g., cloud storage), accessed from cloud storage, or obtained in some other manner. In some cases, peptide structure data 310 may be retrieved from data store 304 in response to (e.g., directly or indirectly based on) receiving user input entered by a user via an input device.
  • Peptide structure analyzer 308 includes model 312 that is configured to receive peptide structure data 310 for processing.
  • Model 312 may be implemented in any of a number of different ways. Model 312 may be implemented using any number of models, functions, equations, algorithms, and/or other mathematical techniques.
  • model 312 includes machine learning system 314, which may itself be comprised of any number of machine learning models and/or algorithms.
  • machine learning system 314 may include, but is not limited to, at least one of a deep learning model, a neural network, a linear discriminant analysis model, a quadratic discriminant analysis model, a support vector machine, a random forest algorithm, a nearest neighbor algorithm (e.g., a k-Nearest Neighbors algorithm), a combined discriminant analysis model, a k-means clustering algorithm, an unsupervised model, a multivariable regression model, a penalized multivariable regression model, or another type of model.
  • model 312 includes a machine learning system 314 that comprises any number of or combination of the models or algorithms described above.
  • model 312 analyzes peptide structure data 310 to generate disease indicator 316 that indicates whether the biological sample is positive for an ovarian cancer disease state based on set of peptide structures 318 identified as being associated with the ovarian cancer disease state.
  • Peptide structure data 310 may include quantification data for the plurality of peptide structures. Quantification data for a peptide structure can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • peptide structure data 310 may include a set of quantification metrics for each peptide structure of a plurality of peptide structures.
  • a quantification metric for a peptide structure may be selected as one of a relative quantity, an adjusted quantity, a normalized quantity, a relative abundance, an adjusted abundance, and a normalized abundance.
  • a quantification metric for a peptide structure is selected from one of a relative concentration, an adjusted concentration, and a normalized concentration.
  • the quantification metrics used are normalized abundances. In this manner, peptide structure data 310 may provide abundance information about the plurality of peptide structures with respect to biological sample 112.
  • Disease indicator 316 may take various forms.
  • disease indicator 316 includes a classification that indicates whether or not the subject is positive for the ovarian cancer disease state.
  • disease indicator 316 can include a score 320.
  • Score 320 indicates whether the ovarian cancer disease state is present or not.
  • score 320 may be a probability score that indicates how likely it is that the biological sample 112 evidences the presence of the ovarian cancer disease state.
  • a peptide structure of set of peptide structures 318 comprises a glycosylated peptide structure, or glycopeptide structure, that is defined by a peptide sequence and a glycan structure attached to a linking site of the peptide sequence.
  • the peptide structure may be a glycopeptide or a portion of a glycopeptide.
  • a peptide structure of set of peptide structures 318 comprises an aglycosylated peptide structure that is defined by a peptide sequence.
  • the peptide structure may be a peptide or a portion of a peptide and may be referred to as a quantification peptide.
  • Set of peptide structures 318 may be identified as being those most predictive or relevant to the ovarian cancer disease state based on training of model 312.
  • set of peptide structures 318 includes at least one, at least two, or at least three peptide structures from a first group of peptide structures (peptide structures PS-1 through PS-10) identified in Table 1 in Section VI.A. or at least one, at least two, or at least three peptide structures from a second group of peptide structures (peptide structures PS-5 and PS-11 through PS-34) identified in Table 2 in Section VI.A.
  • set of peptide structures 318 includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures identified in Table 1 below in Section VI.
  • set of peptide structures 318 includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures identified in Table 2 below in Section VI. A.
  • set of peptide structures 318 includes at least peptide structure PS-5, which is identified in both Table 1 and Table 2.
  • the number of peptide structures selected from Table 1 for inclusion in set of peptide structures 318 may be based on, for example, a desired level of accuracy.
  • set of peptide structures 318 includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures identified in Table 3 below in Section VI. A.
  • set of peptide structures 318 includes at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, or all 61 of the peptide structures listed in Tables 1, 2, and 3.
  • machine learning system 314 takes the form of binary classification model 322.
  • Binary classification model 322 may include, for example, but is not limited to, a regression model.
  • Binary classification model 322 may include, for example, a penalized multivariable regression model that is trained to identify set of peptide structures 318 from a plurality of (or panel of) peptide structures identified in various subjects.
  • Binary classification model 322 may be trained to identify weight coefficients for peptide structures and those peptide structures having non-zero weights or weight coefficients above a selected threshold (e.g., absolute weight coefficient above 0.0, 0.01, 0.05, 0.1, 0.015, 0.2, etc.) may be selected for inclusion in set of peptide structures 318.
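The selection of peptide structures by weight coefficient, and the use of the retained weights to produce a probability score, can be sketched as below. The sketch assumes a logistic form for the regression model, which the text does not state explicitly, and the weights, intercept, and abundance values are invented for illustration only; the PS labels are used as dictionary keys and carry no real coefficients.

```python
import math


def select_peptide_structures(weights, threshold=0.0):
    """Keep peptide structures whose absolute weight exceeds the threshold."""
    return {name: w for name, w in weights.items() if abs(w) > threshold}


def probability_score(abundances, weights, intercept=0.0):
    """Logistic score from normalized abundances (assumed model form)."""
    linear = intercept + sum(w * abundances[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical trained weights; a penalized fit drives some to zero.
trained_weights = {"PS-1": 0.8, "PS-2": 0.0, "PS-3": -0.03, "PS-5": -0.4}
panel = select_peptide_structures(trained_weights, threshold=0.05)

# Hypothetical normalized abundances for one biological sample.
sample = {"PS-1": 1.0, "PS-5": 2.0}
print(panel, probability_score(sample, panel))
```

Raising the threshold shrinks the panel, which trades some predictive information for a smaller set of peptide structures to measure.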
  • Peptide structure analyzer 308 may generate final output 128 based on disease indicator 316 output by model 312. In other embodiments, final output 128 may be an output generated by model 312.
  • final output 128 includes disease indicator 316.
  • final output 128 includes diagnosis output 324, treatment output 326, or both.
  • Diagnosis output 324 may include, for example, a diagnosis for the ovarian cancer disease state.
  • the diagnosis can include a positive diagnosis or a negative diagnosis for the ovarian cancer disease state.
  • generating diagnosis output 324 may include comparing score 320 to selected threshold 328 to determine the diagnosis.
  • Selected threshold 328 may be, for example, without limitation, a score between 0.30 and 0.65 (e.g., 0.4, 0.5, 0.6, etc.).
  • a score 320 above 0.5 may indicate the presence of the ovarian cancer disease state and be output in diagnosis output 324 as a positive diagnosis.
  • a score 320 below 0.5 may indicate that the ovarian cancer disease state is not present and be output in diagnosis output 324 as a negative diagnosis.
  • a negative diagnosis indicates that the subject is healthy.
  • a negative diagnosis indicates that a detected pelvic tumor (or mass) is benign.
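The comparison of score 320 to selected threshold 328 amounts to a simple cutoff, sketched below. The function name is hypothetical, the default threshold of 0.5 is one of the example values given above, and the handling of a score exactly equal to the threshold (treated here as negative) is an assumption, since the text only addresses scores above or below the threshold.

```python
def diagnosis_from_score(score: float, threshold: float = 0.5) -> str:
    """Map a probability score to a positive or negative diagnosis."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    # Assumption: a score equal to the threshold is reported as negative.
    return "positive" if score > threshold else "negative"


print(diagnosis_from_score(0.72))                 # above the default threshold
print(diagnosis_from_score(0.45, threshold=0.4))  # above a lower threshold
```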
  • a biopsy may be recommended. For example, a biopsy of the subject may be performed in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state.
  • peptide structure analyzer 308 (or another system implemented on computing platform 302) may generate a report recommending that a biopsy is to be performed for the subject in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state.
  • peptide structure analyzer 308 may send final output 128 to remote system 130 over one or more wireless, wired, and/or optical communications links and remote system 130 may generate a report recommending that a biopsy is to be performed for the subject in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state.
  • the biopsy may be used to confirm the diagnosis to determine whether or not to administer treatment and/or how quickly to administer treatment.
  • When disease indicator 316 and/or diagnosis output 324 indicate a negative diagnosis for the ovarian cancer disease state (e.g., benign pelvic tumor), the report that is generated by peptide structure analyzer 308, remote system 130, or some other system implemented on computing platform 142 may recommend a period of monitoring for the subject.
  • a negative diagnosis indication by disease indicator 316 and/or diagnosis output 324 may thus help prevent unnecessary treatment or overtreatment of the subject.
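The report logic described above (a biopsy recommendation for a positive diagnosis, a period of monitoring for a negative one) can be sketched as a small mapping. The function name and the dictionary layout of the report are assumptions for illustration, not a format defined by the disclosure.

```python
def build_report(diagnosis: str) -> dict:
    """Attach a follow-up recommendation to a diagnosis output."""
    if diagnosis == "positive":
        recommendation = "biopsy"
    elif diagnosis == "negative":
        recommendation = "monitoring"
    else:
        raise ValueError("diagnosis must be 'positive' or 'negative'")
    return {"diagnosis": diagnosis, "recommendation": recommendation}


print(build_report("positive"))
print(build_report("negative"))
```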
  • Treatment output 326 may include, for example, an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • Final output 128 may be sent to remote system 130 for processing in some examples. In other embodiments, final output 128 may be displayed on graphical user interface 330 in display system 306 for viewing by a human operator.
  • Figure 4 is a block diagram of a computer system in accordance with various embodiments.
  • Computer system 400 may be an example of one implementation for computing platform 302 described above in Figure 3.
  • computer system 400 can include a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information.
  • computer system 400 can also include a memory, which can be a random-access memory (RAM) 406 or other dynamic storage device, coupled to bus 402 for determining instructions to be executed by processor 404. Memory also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404.
  • computer system 400 can further include a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404.
  • a storage device 410 such as a magnetic disk or optical disk, can be provided and coupled to bus 402 for storing information and instructions.
  • computer system 400 can be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT), liquid crystal display (LCD), or light emitting diode (LED) for displaying information to a computer user.
  • An input device 414 can be coupled to bus 402 for communicating information and command selections to processor 404.
  • Another type of user input device is a cursor control 416, such as a mouse, a joystick, a trackball, a gesture input device, a gaze-based input device, or cursor direction keys, for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412.
  • This input device 414 typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane.
  • input devices 414 allowing for three-dimensional (e.g., x, y, and z) cursor movement are also contemplated herein.
  • results can be provided by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in RAM 406.
  • Such instructions can be read into RAM 406 from another computer-readable medium or computer-readable storage medium, such as storage device 410.
  • Execution of the sequences of instructions contained in RAM 406 can cause processor 404 to perform the processes described herein.
  • hard-wired circuitry can be used in place of or in combination with software instructions to implement the present teachings.
  • implementations of the present teachings are not limited to any specific combination of hardware circuitry and software.
  • The term "computer-readable medium" (e.g., data store, data storage, storage device, data storage device, etc.) or "computer-readable storage medium" refers to any media that participates in providing instructions to processor 404 for execution.
  • Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
  • non-volatile media can include, but are not limited to, optical, solid state, magnetic disks, such as storage device 410.
  • volatile media can include, but are not limited to, dynamic memory, such as RAM 406.
  • transmission media can include, but are not limited to, coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 402.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.
  • instructions or data can be provided as signals on transmission media included in a communications apparatus or system to provide sequences of one or more instructions to processor 404 of computer system 400 for execution.
  • a communication apparatus may include a transceiver having signals indicative of instructions and data.
  • the instructions and data are configured to cause one or more processors to implement the functions outlined in the disclosure herein.
  • Representative examples of data communications transmission connections can include, but are not limited to, telephone modem connections, wide area networks (WAN), local area networks (LAN), infrared data connections, NFC connections, optical communications connections, etc.
  • the methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware, firmware, software, or any combination thereof.
  • the processing unit may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
  • the methods of the present teachings may be implemented as firmware and/or a software program and applications written in conventional programming languages such as C, C++, Python, etc. If implemented as firmware and/or software, the embodiments described herein can be implemented on a non-transitory computer-readable medium in which a program is stored for causing a computer to perform the methods described above. It should be understood that the various engines described herein can be provided on a computer system, such as computer system 400, whereby processor 404 would execute the analyses and determinations provided by these engines, subject to instructions provided by any one of, or a combination of, the memory components RAM 406, ROM 408, or storage device 410 and user input provided via input device 414.
VI. Exemplary Methodologies Relating to Diagnosis based on Peptide Structure Data Analysis
  • Figure 5 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 500 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • Process 500 may be used to generate a final output that includes at least a diagnosis output for the subject.
  • Step 502 includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data may be, for example, one example of an implementation of peptide structure data 310 in Figure 3.
  • the peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures.
  • the quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures.
  • a quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 1 or Table 2, with the peptide sequence being one of SEQ ID NOS: 11-19 in Table 1 or one of SEQ ID NOS: 14, 15, and 31-46 in Table 2, the SEQ ID NOS being defined in Table 5 below.
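  • As an illustration of how the peptide structure data received in step 502 might be organized, the following minimal sketch stores one or more quantification metrics per peptide structure. The class names, field names, sample identifier, and numeric values are hypothetical and are not taken from the present teachings.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PeptideStructureQuant:
    """Quantification data for one peptide structure (hypothetical layout)."""
    ps_id: str  # e.g., "PS-5"
    # Metric name -> value, e.g., "normalized_concentration" -> 0.42
    metrics: Dict[str, float] = field(default_factory=dict)


@dataclass
class PeptideStructureData:
    """Peptide structure data for a single biological sample."""
    sample_id: str
    structures: Dict[str, PeptideStructureQuant] = field(default_factory=dict)

    def add(self, ps_id: str, metric: str, value: float) -> None:
        # Create the per-structure entry on first use, then record the metric.
        entry = self.structures.setdefault(ps_id, PeptideStructureQuant(ps_id))
        entry.metrics[metric] = value

    def metric(self, ps_id: str, metric: str) -> float:
        return self.structures[ps_id].metrics[metric]


# Illustrative values only.
data = PeptideStructureData(sample_id="sample-001")
data.add("PS-5", "relative_abundance", 0.12)
data.add("PS-5", "normalized_concentration", 0.42)
```

  • Keeping several named metrics per peptide structure mirrors the statement that the quantification data may include one or more quantification metrics for each peptide structure.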
  • Step 504 includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences an ovarian cancer disease state based on at least three peptide structures selected from a first group of peptide structures identified in Table 1 (below) or a second group of peptide structures identified in Table 2 (below).
  • the first and second groups of peptide structures are associated with the ovarian cancer disease state.
  • the first group of peptide structures is listed in Table 1 with respect to relative significance to the disease indicator.
  • the second group of peptide structures is listed in Table 2 with respect to relative significance to the disease indicator.
  • the first group of peptide structures in Table 1 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and a healthy state.
  • the first group of peptide structures may be used to predict the probability of EOC for use in clinically screening patients.
  • the first group of peptide structures in Table 1 may also be peptide structures that have been determined relevant to distinguishing between ovarian cancer (e.g., EOC) and a benign tumor state (e.g., a benign pelvic tumor).
  • the first group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC.
  • the second group of peptide structures in Table 2 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and the benign tumor state (e.g., a benign pelvic tumor).
  • the second group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC. In this manner, the second group of peptide structures may predict malignancy of an identified pelvic tumor.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 in Table 1.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-5 and PS-11 through PS-34 in Table 2.
  • the at least 3 peptide structures include at least PS-5, which is present in both Table 1 and Table 2.
  • step 504 may be implemented using a binary classification model (e.g., a regression model).
  • the regression model may be, for example, a penalized multivariable regression model.
  • the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 3 peptide structures. The weight coefficient of a corresponding peptide structure of the at least 3 peptide structures may indicate the relative significance of the corresponding peptide structure to the disease indicator.
  • step 504 may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 3 peptide structures.
  • the weighted value for a peptide structure of the at least 3 peptide structures may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure.
  • the disease indicator may be computed using the peptide structure profile.
  • the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
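  • The computation described above (a weighted value per peptide structure, summed and offset by an intercept) can be sketched as follows. The function name, the weight coefficients, the quantities, and the intercept are hypothetical placeholders; actual coefficients would come from model training.

```python
from typing import Mapping


def disease_indicator_logit(
    quantities: Mapping[str, float],
    weights: Mapping[str, float],
    intercept: float,
) -> float:
    """Sum of (quantification metric x weight coefficient) plus the intercept."""
    if len(weights) < 3:
        raise ValueError("expected at least 3 peptide structures")
    # Peptide structure profile: one weighted value per peptide structure.
    profile = {ps: quantities[ps] * w for ps, w in weights.items()}
    return sum(profile.values()) + intercept


# Hypothetical coefficients and quantities, for illustration only.
weights = {"PS-1": 0.5, "PS-2": -1.0, "PS-3": 2.0}
quantities = {"PS-1": 2.0, "PS-2": 1.0, "PS-3": 0.5}
logit = disease_indicator_logit(quantities, weights, intercept=-0.5)
```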
  • the peptide structure profile for a given peptide structure may include a corresponding feature — relative abundance, concentration, site occupancy — for that peptide structure.
  • the relative abundance may be a normalized relative abundance; the concentration may be normalized concentration.
  • two peptide structure profiles may be computed for the same peptide structure, each profile corresponding to a different feature.
  • a first peptide structure profile may include a relative abundance for a corresponding peptide structure and a second peptide structure profile may include a concentration for the same corresponding peptide structure.
  • the disease indicator comprises a probability that the biological sample is positive for the ovarian cancer disease state and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) the ovarian cancer disease state when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) the ovarian cancer disease state when the disease indicator is not greater than the selected threshold.
  • the selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
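  • A minimal sketch of the thresholding described above follows. The logistic transform from a logit to a probability is an assumption (it is the usual choice for a regression-based binary classifier but is not specified here), and the function names are hypothetical.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Numerically stable logistic transform (assumed link to a probability)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def classify(probability: float, threshold: float = 0.5) -> str:
    """Return "positive" only when the indicator is greater than the threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return "positive" if probability > threshold else "negative"
```

  • With the default threshold of 0.5, a probability of exactly 0.5 is classified as negative, consistent with the "not greater than the selected threshold" wording.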
  • Step 506 includes generating a final output based on the disease indicator.
  • the final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3.
  • the diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator.
  • the diagnosis may be, for example, “positive” for the ovarian cancer disease state if the biological sample evidences the ovarian cancer disease state based on the disease indicator.
  • the diagnosis may be, for example, “negative” if the biological sample does not evidence the ovarian cancer disease state based on the disease indicator.
  • a negative diagnosis may mean that the biological sample has a non-ovarian cancer state.
  • the negative diagnosis for the ovarian cancer disease state can include at least one of a healthy state, a benign tumor state, or some other non-malignant state.
  • Generating the diagnosis output in step 506 may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the ovarian cancer disease state.
  • step 506 can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the ovarian cancer disease state.
  • the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
  • the final output in step 506 may include a treatment output if the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state.
  • the treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
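  • One way step 506 could assemble a final output is sketched below, with a treatment output attached only for a positive diagnosis. The `FinalOutput` structure, its field names, and the example treatment string are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FinalOutput:
    disease_indicator: float
    diagnosis: str                        # "positive" or "negative"
    treatment: Optional[str] = None       # set only for a positive diagnosis
    treatment_plan: Optional[str] = None  # e.g., schedule or dosing information


def generate_final_output(
    disease_indicator: float,
    threshold: float = 0.5,
    treatment: Optional[str] = None,
    treatment_plan: Optional[str] = None,
) -> FinalOutput:
    diagnosis = "positive" if disease_indicator > threshold else "negative"
    if diagnosis == "negative":
        # No treatment output accompanies a negative diagnosis.
        treatment = treatment_plan = None
    return FinalOutput(disease_indicator, diagnosis, treatment, treatment_plan)
```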
  • Table 1 below lists a first group of peptide structures associated with malignant pelvic tumors (e.g., ovarian cancer such as EOC).
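  • One or more features (e.g., relative abundance, concentration, site occupancy) of these peptide structures may be used in the supervised machine learning model described above to generate a disease indicator.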
  • the first group of peptide structures is listed in Table 1 in order with respect to relative significance to the disease indicator.
  • the quantification metrics for peptide structure PS-9, peptide structure PS-10, or a combination of the two may form one input.
  • Table 1 also identifies check markers CK-1 and CK-2, which may also be used by the model.
  • Table 2 below lists a second group of peptide structures associated with malignant pelvic tumors (e.g., ovarian cancer such as EOC).
  • One or more features (e.g., relative abundance, concentration, site occupancy) of these peptide structures may be used in the supervised machine learning model described above to generate a disease indicator that predicts the probability of malignancy (e.g., in the context of triaging to distinguish between malignant and benign pelvic tumors).
  • the second group of peptide structures is listed in Table 2 in order with respect to relative significance to the disease indicator.
  • Table 2 also identifies check markers CK-3 and CK-4, which may also be used by the model.
  • Figure 6 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 600 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • Process 600 may be used to generate a final output that includes at least a diagnosis output for the subject.
  • Step 602 includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data may be, for example, one example of an implementation of peptide structure data 310 in Figure 3.
  • the peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures.
  • the quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures.
  • a quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3, with the peptide sequence being one of SEQ ID NOS: 11, 14, 15, 31, 32, 33, 34, 37, 38, 40, 42, 44, 45, 46, and 53-65 in Table 3, the SEQ ID NOS being defined in Table 5 below.
  • Step 604 includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that predicts whether the biological sample evidences a malignant pelvic tumor or benign pelvic tumor based on at least three peptide structures selected from a group of peptide structures identified in Table 3.
  • the group of peptide structures is listed in Table 3 with respect to relative significance to the disease indicator, which may be a probability score.
  • the group of peptide structures is associated with the malignancy (e.g., EOC).
  • the group of peptide structures in Table 3 includes peptide structures that have been determined relevant to distinguishing between a malignant and benign nature of a pelvic tumor.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • step 604 may be implemented using a binary classification model (e.g., a regression model).
  • the regression model may be, for example, a penalized multivariable regression model.
  • the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 3 peptide structures. The weight coefficient of a corresponding peptide structure of the at least 3 peptide structures may indicate the relative significance of the corresponding peptide structure to the disease indicator.
  • step 604 may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 3 peptide structures.
  • the weighted value for a peptide structure of the at least 3 peptide structures may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure.
  • the disease indicator may be computed using the peptide structure profile. For example, the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
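  • Written out, with $x_i$ denoting the quantification metric for the $i$-th selected peptide structure, $w_i$ its weight coefficient, and $b$ the intercept (symbols introduced here for illustration only), the logit described above takes the form below. The logistic transform to a probability $p$ is a standard choice for regression-based binary classifiers and is shown as an assumption, since the manner of deriving a probability from the logit is not specified.

```latex
z = b + \sum_{i=1}^{n} w_i x_i, \qquad n \ge 3, \qquad p = \frac{1}{1 + e^{-z}}
```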
  • the disease indicator comprises a probability that the biological sample evidences malignancy (e.g., EOC) and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) malignancy when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) malignancy when the disease indicator is not greater than the selected threshold.
  • the selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
  • Step 606 includes generating a final output based on the disease indicator.
  • the final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3.
  • the diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator.
  • the diagnosis may be, for example, “positive” for an ovarian cancer disease state (e.g., EOC) if the biological sample evidences malignancy based on the disease indicator.
  • the diagnosis may be, for example, “negative” if the biological sample does not evidence malignancy based on the disease indicator.
  • a negative diagnosis may mean that the biological sample evidences a benign status (or a non-ovarian cancer state).
  • Generating the diagnosis output in step 606 may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the ovarian cancer disease state.
  • step 606 can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the ovarian cancer disease state.
  • the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
  • the final output in step 606 may include a treatment output if the disease indicator predicts malignancy and/or the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state.
  • the treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • VI.B.2 Exemplary Methodology — Based on Table 3B
  • Figure 6B is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 600B may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • Process 600B may be used to generate a final output that includes at least a diagnosis output for the subject, such as, for example, early stage EOC or late stage EOC.
  • Step 602B includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data may be, for example, one example of an implementation of peptide structure data 310 in Figure 3.
  • the peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures.
  • the quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures.
  • a quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3B, with the peptide sequence being one of SEQ ID NOS: 14, 18, 32, 33, 37, 39, 42, 45, 54, 56, 60, 68-77 in Table 3B, the SEQ ID NOS being defined in Table 5 below.
  • the glycopeptides of Table 3B were part of a glycoprotein that are further described in Table 6 and that the glycan portion of the glycopeptides is described in
  • Step 604B includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that predicts whether the biological sample evidences an early stage or late stage EOC based on at least one peptide structure selected from a group of peptide structures identified in Table 3B.
  • the group of peptide structures is associated with the early stage or late stage EOC.
  • the group of peptide structures in Table 3B includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) or late stage (stages 3 and 4) EOC.
  • the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, or all 36 of the peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, and PS-62 to PS-90 identified in Table 3B.
  • step 604B may be implemented using a binary classification model (e.g., a regression model).
  • the regression model may be, for example, a penalized multivariable regression model.
  • the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 1 peptide structure. The weight coefficient of a corresponding peptide structure of the at least 1 peptide structure may indicate the relative significance of the corresponding peptide structure to the disease indicator.
  • step 604B may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 1 peptide structure.
  • the weighted value for a peptide structure of the at least 1 peptide structure may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure.
  • the disease indicator may be computed using the peptide structure profile. For example, the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
  • the disease indicator comprises a probability that the biological sample evidences malignancy (e.g., EOC) and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) malignancy when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) malignancy when the disease indicator is not greater than the selected threshold.
  • the selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
  • Step 606B includes generating a final output based on the disease indicator.
  • the final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3.
  • the diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator.
  • the diagnosis may be, for example, early stage or late stage based on the disease indicator.
  • An early stage diagnosis may mean that the biological sample evidences a stage 1 or 2 EOC.
  • a late stage diagnosis may mean that the biological sample evidences a stage 3 or 4 EOC.
  • Generating the diagnosis output in step 606B may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the late stage ovarian cancer disease state.
  • step 606B can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the late stage ovarian cancer disease state.
  • the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
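  • The staging decision in step 606B can be sketched as a threshold test on the probability score, where a score above the threshold is treated as positive for late stage EOC. The function name and the `inclusive` flag (covering the "at or above" variant) are hypothetical.

```python
def classify_stage(score: float, threshold: float = 0.5, inclusive: bool = False) -> str:
    """Map a probability score to an early stage or late stage diagnosis."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    # inclusive=True treats a score exactly at the threshold as late stage.
    is_late = score >= threshold if inclusive else score > threshold
    return "late stage" if is_late else "early stage"
```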
  • the final output in step 606B may include a treatment output if the disease indicator predicts malignancy and/or the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state.
  • the treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both.
  • Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • FCs above 1 corresponded to markers that correlate with metastatic ovarian cancer, and FCs below 1 corresponded to markers that correlate with non-metastatic ovarian cancer.
  • the Wilcoxon matched-pairs signed rank test was used to compare the two groups, and the p value was found to be less than 0.0001, showing a statistically significant difference between the non-fucosylated and fucosylated groups.
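  • For reference, a matched-pairs signed rank comparison of this kind can be sketched in plain Python as below. This is not the analysis code used to obtain the reported p value; the function name is hypothetical, zero differences are dropped, and the two-sided p value uses a normal approximation without tie or continuity correction, which is an assumption (statistical packages may use an exact distribution for small samples).

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (W_plus, W_minus, two-sided p) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences; tied values receive their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_two_sided
```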
  • Figures 19A and 19B illustrate that the same set of markers in doublet/triplet analysis for fucosylation revealed a strong association with either metastatic ovarian cancer or non-metastatic ovarian cancer.
  • Doublet analysis refers to monitoring the fold change of a non-fucosylated and a fucosylated glycopeptide that were tri- or tetra-antennary for sialic acid and had the same peptide sequence and glycan linking site.
  • Triplet analysis refers to monitoring the fold change of a non-fucosylated, a fucosylated, and a di-fucosylated glycopeptide that were tri- or tetra-antennary for sialic acid and had the same peptide sequence and glycan linking site.
  • Figures 15A-15E show that the fucosylated biomarkers (those having the number 1 as the second-to-last digit of the Peptide Structure (PS) Name) show a relatively upward trend from stage 1/2 to stage 3/4.
  • Figures 15A-15E show that the non-fucosylated biomarkers (those having the number 0 as the second-to-last digit of the Peptide Structure (PS) Name) show a relatively downward trend from stage 1/2 to stage 3/4.
  • the glycan numbers 6513, 7613, and 7614 are examples of fucosylated glycans having tri- or tetra-antennary sialic acids.
  • the glycan numbers 6503, 7603, and 7604 are examples of non-fucosylated glycans having tri- or tetra-antennary sialic acids.
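  • The doublet and triplet groupings described above can be illustrated by collecting PS names that share a protein, a glycan linking site, and all glycan digits except the fucose count. The sketch below assumes names of the form PROTEIN_SITE_GLYCAN with a four-digit glycan number whose third digit is the fucose count and whose fourth digit is the sialic acid count; the function name and any PS names used with it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_fucosylation(ps_names: List[str]) -> Dict[Tuple[str, str, str], List[str]]:
    """Return doublets and triplets keyed by (protein, site, masked glycan)."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for name in ps_names:
        parts = name.split("_")
        if len(parts) != 3 or len(parts[2]) != 4 or not parts[2].isdigit():
            raise ValueError(f"unexpected PS name format: {name!r}")
        protein, site, glycan = parts
        if glycan[3] not in "34":
            continue  # keep only glycans with 3 or 4 sialic acids
        # Mask the fucose digit so that 6503, 6513, and 6523 share one key.
        groups[(protein, site, glycan[:2] + "x" + glycan[3])].append(name)
    # Doublets have 2 members and triplets 3; order members by fucose count.
    return {
        key: sorted(members, key=lambda n: n.split("_")[2][2])
        for key, members in groups.items()
        if len(members) in (2, 3)
    }
```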
  • VI.B.3 Exemplary Methodology — Based on Table 3C
  • process 600B may be implemented using Table 3C instead of Table 3B.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3C, with the peptide sequence being one of SEQ ID NOS: 101-125 in Table 3C.
  • the group of peptide structures in Table 3C includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) or late stage (stages 3 and 4) EOC.
  • the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide SEQ ID NOS: 101-125 identified in Table 3C.
  • VI.B.3 Exemplary Methodology — Based on Table 3D
  • process 600B may be implemented using Table 3D instead of Table 3B.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D.
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131-134, 137, 139, 140, 143, 151, and 165-167 in Table 3D.
  • the group of peptide structures in Table 3D includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) or late stage (stages 3 and 4) EOC.
  • the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, or all 50 of the peptide SEQ ID NOS: 126-175 identified in Table 3D.
  • PS Peptide Structure
  • KNG1_294_6503 is a reference code consisting of the protein name (e.g., KNG1), followed by the glycan linking site position in the protein (e.g., the number 294, which is preceded by an underscore and represents a sequential amino acid position in protein KNG1), and followed by the glycan structure GL number (e.g., the number 6503, which is preceded by an underscore and represents the glycan composition Hex(6)HexNAc(5)Fuc(0)NeuAc(3)).
  • the Peptide Structure (PS) Name contains a prefix that is an abbreviation of the protein (and that may include a combination of letters and numbers) and that corresponds to the Protein Abbreviation of Table 6.
  • the term Linking Site Pos. in Protein Sequence is a number that refers to the sequential position of an amino acid of the corresponding protein in which a glycan is attached.
  • the amino acid position of the peptide sequence is defined by the sequentially numbered order of amino acids based on the Uniprot ID of the corresponding protein for the peptide sequence.
  • Peptide Sequence is a number that refers to the sequential position of an amino acid of the corresponding peptide in which a glycan is attached.
  • the amino acid position of the peptide sequence is defined by the sequentially numbered order of amino acids for the peptide sequence.
  • Glycan Structure GL No. is a number that corresponds to a symbol structure and a composition of the glycan as indicated in Tables 7.
  • the term AGP12 for SEQ ID NOs: 68-69 represents that the glycopeptide is a fragment of either AGP1 or AGP2.
  • the Glycan Linking Site Pos. in Protein Sequence column should be used for identification of the peptide.
  • If the second number, subsequent to the second underscore in the Peptide Structure (PS) NAME, is inconsistent with the Glycan Structure GL NO column, then the Glycan Structure GL NO column should be used for identification of the glycan portion of the glycopeptide. If the Peptide Structure (PS) NAME does not contain any numbers, then the peptide is non-glycosylated. In some instances of the Peptide Structure (PS) NAME, the prefix is followed by a number with the notation MC, which indicates a missed cleavage at the position in the peptide sequence given by that number.
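  • The naming convention described above can be illustrated with a small parser. This is a sketch under stated assumptions: it handles only names of the form PROTEIN_SITE_GLYCAN with a four-digit glycan number read as Hex, HexNAc, Fuc, and NeuAc counts, it treats a name with no underscore-separated numeric fields as non-glycosylated, and it does not attempt to parse the MC (missed cleavage) notation, whose exact format is not given here. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlycanComposition:
    n_hex: int
    n_hexnac: int
    n_fuc: int
    n_neuac: int


@dataclass(frozen=True)
class ParsedPSName:
    protein: str
    linking_site: Optional[int]          # position in the protein sequence
    glycan: Optional[GlycanComposition]  # None for a non-glycosylated peptide


def parse_ps_name(name: str) -> ParsedPSName:
    parts = name.split("_")
    if len(parts) == 1:
        # No site or glycan fields: treated here as non-glycosylated.
        return ParsedPSName(parts[0], None, None)
    if len(parts) != 3 or not parts[1].isdigit():
        raise ValueError(f"unsupported PS name format: {name!r}")
    glycan = parts[2]
    if len(glycan) != 4 or not glycan.isdigit():
        raise ValueError(f"unsupported glycan number: {glycan!r}")
    n_hex, n_hexnac, n_fuc, n_neuac = (int(c) for c in glycan)
    return ParsedPSName(
        parts[0], int(parts[1]), GlycanComposition(n_hex, n_hexnac, n_fuc, n_neuac)
    )


parsed = parse_ps_name("KNG1_294_6503")
```

  • For the example name KNG1_294_6503, the parsed glycan corresponds to Hex(6)HexNAc(5)Fuc(0)NeuAc(3), matching the composition given above.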
  • Figure 7 is a flowchart of a process for training a model to diagnose a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 700 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • process 700 may be one example of an implementation for training the model used in the process 500 in Figures 5, 6, or 6B.
  • Step 702 includes receiving quantification data for a panel of peptide structures for a plurality of subjects.
  • the plurality of subjects may include a first portion diagnosed with a negative diagnosis of an ovarian cancer disease state and a second portion diagnosed with a positive diagnosis of the ovarian cancer disease state.
  • the plurality of subjects may include a first portion having early stage EOC and a second portion having late stage EOC.
  • the quantification data comprises an initial plurality of peptide structure profiles for the plurality of subjects.
  • a peptide structure profile in the initial plurality of peptide structure profiles may include a feature associated with a corresponding peptide structure.
  • the feature may be relative abundance, concentration, site occupancy, or some other quantification-based feature.
  • the initial plurality of peptide structure profiles may include one, two, three, or more profiles for a given peptide structure.
  • Step 704 includes training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state (e.g., the first group of peptide structures is identified in Table 1, the second group of peptide structures is identified in Table 2, the third group of peptide structures is identified in Table 3).
  • the first, second, and third groups of peptide structures are listed in Tables 1, 2, and 3, respectively, with respect to relative significance to diagnosing the biological sample as evidencing malignancy (e.g., EOC).
  • Step 704 can include training the machine learning model using a portion of the quantification data corresponding to a training group of peptide structures included in the plurality of peptide structures.
  • Step 704 can include training a machine learning model using the quantification data to assess a biological sample with respect to the staging of the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state such as a group of peptide structures identified in Table 3B, 3C, or 3D.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 1 above.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 2 above.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Tables 3B, 3C, or 3D above.
  • Training data can be used for training the supervised machine learning model.
  • the training data can include a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects.
  • the plurality of subject diagnoses can include a positive diagnosis for any subject of the plurality of subjects determined to have the ovarian cancer disease state and a negative diagnosis for any subject of the plurality of subjects determined not to have the ovarian cancer disease state.
  • the machine learning model can include a binary classification model.
  • Some binary classification models can include logistic regression models.
  • Some logistic regression models can include LASSO regression models.
  • An alternative or additional step in process 700 can include filtering the initial plurality of peptide structure profiles by a coefficient of variation to generate a plurality of peptide structure profiles for use in training the machine learning model. As one example, only those peptide structure profiles having a low coefficient of variation (20% or lower) were included in the plurality of peptide structure profiles used for training.
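  • As a non-limiting illustration, a coefficient-of-variation filter of the kind described above can be sketched as follows. The sketch assumes that each feature has a list of replicate measurements and that the cutoff is inclusive; the disclosure does not state which measurements the coefficient of variation was computed over, and the function names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m


def filter_by_cv(profiles, max_cv=0.20):
    """Keep features whose CV across replicate values is at most max_cv.

    profiles: dict mapping a feature name to a list of replicate values
    (assumed input layout, at least two values per feature).
    """
    return {name: vals for name, vals in profiles.items()
            if coefficient_of_variation(vals) <= max_cv}
```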
  • An alternative or additional step in process 700 can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the positive diagnosis for the ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the negative diagnosis for the ovarian cancer disease state.
  • An alternative or additional step in process 700 can include identifying a first portion of the plurality of samples for subjects with benign pelvic tumors and malignant pelvic tumors and a second portion of the plurality of samples for subjects with a healthy status.
  • An alternative or additional step in process 700 can include generating a training set of peptide structure profiles for 80% of the first portion and a test set of peptide structure profiles for a remaining 20% of the first portion and the second portion.
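  • As a non-limiting illustration, the 80%/20% partition described above can be sketched as follows. The sketch reads the test set as the held-out 20% of the first portion (benign and malignant pelvic tumors) together with the whole second portion (healthy subjects); the random shuffle, the seed, and the function names are assumptions.

```python
import random


def split_train_test(sample_ids, train_fraction=0.8, seed=0):
    """Shuffle sample IDs reproducibly and split them into two lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]


def build_sets(tumor_ids, healthy_ids, seed=0):
    """Train on 80% of tumor samples; test on the rest plus all healthy samples."""
    train, held_out = split_train_test(tumor_ids, 0.8, seed)
    return train, held_out + list(healthy_ids)
```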
  • the machine learning model is a supervised machine learning model that is trained to determine weight coefficients for a panel of peptide structures such that a first portion of the weight coefficients for a first portion of the panel of peptide structures are non-zero and a second portion of the weight coefficients for a second portion of the panel of peptide structures are zero (or, alternatively, substantially close to zero so as to not be statistically significant).
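  • As a non-limiting illustration, one way such sparse weight coefficients can arise is through an L1 (LASSO) penalty on a logistic regression. The following is a minimal sketch using proximal gradient descent, an assumed solver chosen for brevity. The disclosure does not specify the optimizer, the penalty strength lam, or the learning rate lr, and all names are hypothetical.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    X: list of feature rows; y: list of 0/1 labels.
    Returns (weights, intercept); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        # Gradient step followed by shrinkage; weights inside the
        # threshold are set exactly to zero.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b
```

  • On a toy data set in which a second feature carries no information about the label, the sketch leaves that feature's weight at zero while the informative feature receives a non-zero weight.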
  • the final output generated in step 506 in Figure 5 or in step 606 in Figure 6 may include a treatment output.
  • the treatment output may identify one or more treatment types for a subject based on the disease indicator and/or diagnosis output generated via process 500 in Figure 5 or process 600 in Figure 6, respectively.
  • the treatment output may include, for example, a treatment plan.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • Being able to accurately predict malignancy via the process 500 in Figure 5 and/or the process 600 in Figure 6 may allow treatment for malignant pelvic tumors (e.g., EOC) to be started earlier without requiring, in many or most cases, further invasive testing such as a biopsy.
  • a patient biological sample is obtained from a subject.
  • the biological sample may be processed (e.g., via digestion and fragmentation) such that one or more peptide structures of interest are detected. For example, detection and quantification may be performed for one or more peptide structures from Table 1, Table 2, Table 3, Table 3B, Table 3C, and/or Table 3D.
  • the quantification data that is generated for these peptide structures may be input into a trained binary classification model to generate a disease indicator, which may be, for example, a probability score.
  • a determination may be made as to whether the disease indicator (e.g., score) is above or below a selected threshold (e.g., 0.5). If the disease indicator is above the selected threshold, the biological sample may be classified as evidencing a malignant pelvic tumor.
  • this classification may further include a classification that the subject is in need of treatment. If the subject is in need of treatment based on the classification, treatment is administered. For example, a therapeutically effective amount of a therapeutic agent is administered to the patient, where the therapeutic agent is selected from a chemotherapeutic agent, an immunotherapeutic agent, a hormone therapy, a targeted therapeutic agent, a neoadjuvant therapy, or a combination thereof.
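  • As a non-limiting illustration, the scoring and thresholding steps described above can be sketched as follows, assuming a logistic model in which the disease indicator is the logistic transform of a weighted sum of quantification features. The weights, the intercept, the feature names, and the handling of a score exactly equal to the threshold are assumptions, not values from the disclosure.

```python
import math


def disease_indicator(features, weights, intercept=0.0):
    """Probability score from a weighted sum of quantified features.

    features: dict mapping peptide structure name to its quantified value.
    weights: dict mapping peptide structure name to a model coefficient.
    Every weighted peptide structure must be present in features.
    """
    z = intercept + sum(w * features[name] for name, w in weights.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(score, threshold=0.5):
    """Label a sample as malignant only when the score exceeds the threshold."""
    return "malignant" if score > threshold else "not malignant"
```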
  • compositions comprising one or more of the peptide structures listed in Table 1, in Table 2, in Table 3, in Table 3B, in Table 3C, or in Table 3D.
  • a composition comprises a plurality of the peptide structures listed in Table 1, a plurality of the peptide structures listed in Table 2, or a plurality of the peptide structures listed in Table 3.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 of the peptide structures listed in Tables 1, 2, 3, 3B, 3C, and 3D.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or all 36 of the peptide structures listed in Table 3B. In one or more embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or all 25 of the glycopeptide structures listed in Table 3C.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or all 50 of the glycopeptide structures listed in Table 3D.
  • a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 11-19, 31-46, 53-65, 68-77, 101-125, and 126-175 listed in Tables 1, 2, 3, 3B, 3C, and 3D.
  • a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 131-134, 137, 139, 140, 143, 151, and 165-167 listed in Table 3D.
  • compositions comprising one or more precursor ions having a defined charge and/or defined mass-to-charge (m/z) ratio, as listed in Tables 4, 4B, and 4C.
  • compositions comprising one or more product ions having a defined mass-to-charge (m/z) ratio, which product ions are produced by converting a peptide structure described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D) into a gas phase ion in a mass spectrometry system.
  • Conversion of the peptide structure into a gas phase ion can take place using any of a variety of techniques, including, but not limited to, matrix assisted laser desorption ionization (MALDI); electron ionization (EI); electrospray ionization (ESI); atmospheric pressure chemical ionization (APCI); and/or atmospheric pressure photo ionization (APPI).
  • compositions comprising one or more product ions produced from one or more of the peptide structures described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D).
  • a composition comprises a set of the product ions listed in Table 4, 4B, or 4C having an m/z ratio selected from the list provided for each peptide structure in Table 4, 4B, or 4C.
  • a composition comprises at least one of peptide structures PS-1 to PS-10 identified in Table 1. In some embodiments, a composition comprises at least one of peptide structures PS-11 to PS-34 and PS-5 identified in Table 2. In some embodiments, a composition comprises at least one of peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3. In some embodiments, a composition comprises at least one of peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, and PS-62 to PS-90 identified in Table 3B.
  • a composition comprises at least one of peptide structures of SEQ ID NOS 101-125 identified in Table 3C. In some embodiments, a composition comprises at least one of peptide structures of PS-ID 91 to 140 identified in Table 3D. In some embodiments, a composition comprises at least one of peptide structures of PS-ID NO: 96-99, 102, 104, 105, 108, 116, and 130-132 identified in Table 3D.
  • In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 identified in Table 1.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-11 to PS-34 and PS-5 identified in Table 2.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • the at least 3 peptide structures additionally include at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all 7 of the remaining peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, or all 36 of the peptide structures PS-4, PS-8, PS-
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures of SEQ ID NOS 101-125 identified in Table 3C.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, or all 12 of the peptide structures of SEQ ID NOS 131-134, 137, 139, 140, 143, 151, and 165-167 identified in Table 3D.
  • a composition comprises a peptide structure or a product ion.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 11-19, as identified in Table 5, corresponding to peptide structures PS-1 to PS-10 in Table 1.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 14, 15, 31-46, as identified in Table 5, corresponding to various ones of peptide structures PS-5 and PS-11 to PS-34 in Table 2.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 11, 14, 15, 31, 32, 33, 34, 37, 38, 40, 42, 44, 45, 46, 53-65, as identified in Table 5, corresponding to various ones of peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 in Table 3.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 14, 18, 32, 33, 37, 39, 42, 45, 54, 56, 60, 68-77, as identified in Table 5, corresponding to various ones of peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 in Table 3B.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 101-125, corresponding to various ones of peptide structures in Table 3C or product ions in Table 4B.
  • the peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 126-175, as identified in Table 5, corresponding to various ones of peptide structures PS-91 to PS-140 in Table 3D.
  • the product ion is selected as one from a group consisting of product ions identified in Tables 4, 4B, and 4C including product ions falling within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C and characterized as having a precursor ion having an m/z ratio within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C.
  • a first range for the product ion m/z ratio may be ±0.5.
  • a second range for the product ion m/z ratio may be ±0.8.
  • a third range for the product ion m/z ratio may be ±1.0.
  • a first range for the precursor ion m/z ratio may be ±1.0; a second range for the precursor ion m/z ratio may be ±1.5.
  • a composition may include a product ion having an m/z ratio that falls within at least one of the first range (±0.5), the second range (±0.8), or the third range (±1.0) of the product ion m/z ratio identified in Tables 4, 4B, and 4C, and characterized as having a precursor ion having an m/z ratio that falls within at least one of a first range (±0.5), a second range (±1.0), or a third range (±1.0) of the precursor ion m/z ratio identified in Tables 4, 4B, and 4C.
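  • As a non-limiting illustration, matching an observed transition against a tabulated transition can be sketched as follows, treating each range as a symmetric tolerance around the tabulated m/z value (an assumption). The default tolerances reuse the first product ion range (0.5) and the first precursor ion range (1.0) named above; the class name, function name, and any m/z values passed in are hypothetical and are not entries of Tables 4, 4B, or 4C.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    precursor_mz: float
    product_mz: float


def matches(observed: Transition, reference: Transition,
            precursor_tol: float = 1.0, product_tol: float = 0.5) -> bool:
    """True when both m/z values lie within their tolerance of the reference."""
    return (abs(observed.precursor_mz - reference.precursor_mz) <= precursor_tol
            and abs(observed.product_mz - reference.product_mz) <= product_tol)
```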
  • Table 4B Mass Spectrometry-Related Characteristics for the Peptide Structures associated with Ovarian Cancer (e.g., EOC) - in accordance with Table 3C
  • Table 4C Mass Spectrometry-Related Characteristics for the Peptide Structures associated with Ovarian Cancer (e.g., EOC) - in accordance with Table 3D
  • Tables 4, 4B, and 4C show various parameters associated with the identification of the peptides and glycopeptides using LC and MRM-MS.
  • the retention time (RT) represents the amount of time in minutes for the peptide to elute from the chromatography column.
  • the collision energy represents the energy applied to the peptide for creating fragments (i.e., product ions) such as, for example, in the second quadrupole of the triple quadrupole MS.
  • the first precursor m/z represents a ratio value associated with an ionized form having a first precursor charge for the peptide or glycopeptide.
  • the second precursor m/z represents a ratio value associated with an ionized form having a second precursor charge for the peptide or glycopeptide.
  • the first precursor ion is associated with a first product ion having an m/z ratio that was formed from a collision, and the second precursor ion is associated with a second product ion having an m/z ratio that was formed from a collision.
  • the first precursor and the second precursor may be the same, but the associated first and second product m/z ratios are different.
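  • As a non-limiting illustration, the dependence of precursor m/z on precursor charge can be sketched as follows, assuming positive-mode ionization by protonation, in which the m/z of an ion of charge z formed from a neutral species of monoisotopic mass M is (M + z × 1.007276) / z. The ionization mode is an assumption, and the function name is hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of a protonated ion with the given neutral mass and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge
```

  • Under this assumption, the same peptide or glycopeptide observed at two different precursor charges yields two different precursor m/z values.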
  • Table 5 defines the peptide sequences for SEQ ID NOS: 11-19, 31-46, 53-65, 68-77, and 126-175 from at least one of Tables 1, 2, 3, 3B, 3C, and 3D. Table 5 further identifies a corresponding protein SEQ ID NO. for each peptide sequence.
  • Table 6 identifies the proteins of SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67 from at least one of Tables 1, 2, 3, 3B, 3C, and 3D.
  • Table 6 identifies a corresponding protein abbreviation and protein name for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67.
  • Table 6 identifies a corresponding Uniprot ID and protein sequence for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67.
  • Table 7 identifies and defines the glycan symbol structures included in Tables 1, 2, 3, 3B, 3C, and 3D.
  • Table 7 identifies a coded representation of the composition for each glycan structure included in Tables 1, 2, 3, 3B, 3C, and 3D.
  • the 4-digit GL NO. is a designation that represents, in order, the number of hexoses, the number of HexNAcs, the number of fucoses, and the number of neuraminic acids. It should be noted that glycan structure GL NO. 1102 is an O-glycan and the remaining glycans of Table 7 are N-glycans.
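  • As a non-limiting illustration, decoding a 4-digit GL NO. into a glycan composition can be sketched as follows. The sketch assumes one digit per carbohydrate class, in the order hexose, HexNAc, fucose, and neuraminic acid, consistent with the example in which GL number 6503 corresponds to Hex(6)HexNAc(5)Fuc(0)NeuAc(3). The function names are hypothetical.

```python
import re

CLASSES = ("Hex", "HexNAc", "Fuc", "NeuAc")


def decode_gl_number(gl_no: str) -> dict:
    """Map each digit of a 4-digit GL number to its carbohydrate class count."""
    if re.fullmatch(r"[0-9]{4}", gl_no) is None:
        raise ValueError(f"expected a 4-digit GL number, got {gl_no!r}")
    return dict(zip(CLASSES, (int(d) for d in gl_no)))


def format_composition(composition: dict) -> str:
    """Render a composition in the Hex(n)HexNAc(n)Fuc(n)NeuAc(n) style."""
    return "".join(f"{name}({count})" for name, count in composition.items())
```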
  • Table 7 illustrates the symbol structure and composition of detected glycan moieties that correspond to glycopeptides of Tables 1, 2, 3, 3B, 3C, and 3D based on the Glycan GL NO.
  • the term Symbol Structure illustrates a geometric linking structure of the carbohydrates where the bottommost carbohydrate such as N-acetylglucosamine is bound to the designated amino acid for an N-linked glycan and the rightmost carbohydrate such as N-acetylgalactosamine is bound to the designated amino acid for an O-linked glycan.
  • Glycan Structure GL NO 1102 is an O-linked glycan, and the rest of the glycans in Table 7 are N-linked glycans.
  • N-linked glycans have a glycan attached to the amino acid asparagine and O-linked glycans have a glycan attached to either a serine or a threonine.
  • the identity of the various monosaccharides is illustrated by the Legend section located at the end of Table 7.
  • the abbreviations of the Legend are Glc that represents glucose and is indicated by a dark circle, Gal that represents galactose and is indicated by an open circle, Man that represents mannose and is indicated by a circle with intermediate grey shading, Fuc that represents fucose and is indicated by a dark triangle, Neu5Ac that represents N-acetylneuraminic acid and is indicated by a dark diamond, GlcNAc that represents N-acetylglucosamine and is indicated by a dark square, GalNAc that represents N-acetylgalactosamine and is indicated by an open square, and ManNAc that represents N-acetylmannosamine and is indicated by a square with intermediate grey shading.
  • Composition refers to the number of various classes of carbohydrates that make up the glycan.
  • the quantity for each class of carbohydrate is depicted as a number in parenthesis to the right of an abbreviation that corresponds to the class of the carbohydrate.
  • the abbreviations for these classes are Hex, HexNAc, Fuc, and NeuAc that respectively correspond to hexose, N-acetylhexosamine, fucose, and N-acetylneuraminic acid.
  • hexose sugars include glucose, galactose, and mannose; and N-acetylhexosamine sugars include N-acetylglucosamine, N-acetylgalactosamine, and N-acetylmannosamine.
  • the terms Neu5Ac, NeuAc, and N-acetylneuraminic acid may be referred to as sialic acid.
  • in some instances, two symbol structures are provided for a single Glycan Structure GL NO such as, for example, Glycan Structure GL NO 3510.
  • the identity of a peptide that references a Glycan Structure GL NO that has two symbol structures could be either one of the two possibilities based on the MRM of the LC-MS analysis.
  • a bracket symbol is used as part of the Symbol Structure to indicate that the precise bonding linkage is not exactly known, but that the linking line segment is attached to one of the plurality of carbohydrates immediately adjacent to the bracket.
  • the fucose of Glycan Structure GL NO 3510 could have either a core fucose or an outer-arm fucose linkage.
  • the fucose orientation of either core or outer-arm linkage can be specified.
  • glycan symbol structure can illustrate an antennary format in the form of branches.
  • Glycan Structure GL NOs 6513 and 7604 show a tri-antennary and tetra-antennary sialic acid format, respectively.
  • kits comprising one or more compositions, each comprising one or more peptide structures of the disclosure that can be used as assay standards, and instructions for use.
  • Kits in accordance with one or more embodiments described herein may include a label indicating the intended use of the contents of the kit.
  • label as used herein with respect to a kit includes any writing, or recorded material supplied on or with a kit, or that otherwise accompanies a kit.
  • the peptide structures and the transitions produced therefrom, as described herein, may be useful for diagnosing and treating an ovarian cancer disease state.
  • a transition includes a precursor ion and at least one product ion grouping.
  • the peptide structures in Tables 1, 2, 3, 3B, 3C, and 3D as well as their corresponding precursor ion and product ion groupings in Tables 4, 4B, and 4C can be used in mass spectrometry-based analyses to diagnose and facilitate treatment of diseases, such as, for example, PC.
  • aspects of the disclosure include methods for analyzing one or more peptide structures, as described herein.
  • the methods involve processing a sample from a patient to generate a prepared sample that can be inputted into a mass spectrometry system (e.g., a reaction monitoring mass spectrometry system).
  • processing the sample can comprise performing one or more of a denaturation procedure, a reduction procedure, an alkylation procedure, and a digestion procedure.
  • the denaturation and reduction procedures may be implemented in a manner similar to, for example, denaturation and reduction 202 in Figure 2.
  • the alkylation procedure may be implemented in a manner similar to, for example, alkylation procedure 204 in Figure 2.
  • the digestion procedure may be implemented in a manner similar to, for example, digestion procedure 206 in Figure 2.
  • the methods for analyzing one or more peptide structures involve detecting a set of product ions generated by a reaction monitoring mass spectrometry system in which one or more product ions may correspond to each of the one or more peptide structures that have been inputted into the mass spectrometry system.
  • each peptide structure can be converted into a set of product ions having a defined m/z ratio, as provided in Tables 4, 4B, and 4C, or an m/z ratio within an identified m/z range of the m/z ratio provided in Tables 4, 4B, and 4C.
  • the methods involve generating quantification (e.g., abundance) data for the one or more product ions detected using the reaction monitoring mass spectrometry system.
  • the methods further comprise generating a diagnosis output using the quantification data and a model that has been trained using supervised or unsupervised machine learning.
  • the reaction monitoring mass spectrometry system may include multiple/selected reaction monitoring mass spectrometry (MRM/SRM-MS) to detect the one or more product ions and generate the quantification data.
  • Figure 8 is a table describing the distribution of the samples acquired in this exemplary retrospective analysis in accordance with one or more embodiments.
  • serum samples were acquired from a commercial biobank for 151 women with benign pelvic masses, 145 women with malignant epithelial ovarian cancer (EOC), and 55 healthy controls. Information on stage of EOC was available in 98 of the 145 patients with EOC. All samples were obtained prior to therapeutic intervention. Information on the benign or malignant nature of tumors was based on histopathological analysis of tissue specimens.
  • Sample processing involved pooled human serum/plasma (e.g., glycoprotein standards purified from human serum/plasma) for assay normalization, dithiothreitol (DTT), iodoacetamide (IAA), sequencing-grade trypsin, LC-MS-grade water and acetonitrile, and LC-MS-grade formic acid. Serum samples were treated with DTT and IAA to reduce disulfide bonds and to inhibit cysteine proteases, respectively, followed by digestion with trypsin at 37°C for 18 hours. The digestion was quenched by adding formic acid to each sample to a final concentration of 1% (v/v).
  • LC-MS analysis included separating digested serum samples over an Agilent ZORBAX Eclipse Plus C18 column (2.1 mm i.d. x 150 mm, 1.8 µm particle size) using an Agilent 1290 Infinity UHPLC system.
  • the mobile phase A consisted of 3% acetonitrile, 0.1% formic acid in water (v/v), and the mobile phase B of 90% acetonitrile, 0.1% formic acid in water (v/v), with the flow rate set at 0.5 mL/minute.
  • the binary solvent composition was set at 100% mobile phase A at the beginning of the run, linearly shifting to 20% B at 20 minutes, 30% B at 40 minutes, and 44% B at 47 minutes.
  • the column was flushed with 100% B and equilibrated with 100% A for a total run time of 70 minutes.
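  • As a non-limiting illustration, the binary gradient described above can be expressed as a piecewise-linear function of run time, with 0% B at 0 minutes, 20% B at 20 minutes, 30% B at 40 minutes, and 44% B at 47 minutes. The sketch covers only the 0 to 47 minute window, because the timing of the subsequent flush and equilibration steps is not given here; the names are hypothetical.

```python
# (time in minutes, percent mobile phase B) breakpoints of the gradient
GRADIENT = [(0.0, 0.0), (20.0, 20.0), (40.0, 30.0), (47.0, 44.0)]


def percent_b(t_min: float) -> float:
    """Linearly interpolate the percent of mobile phase B at time t_min."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0 to 47 minute gradient window")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole window")
```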
  • samples were injected into an Agilent 6495B triple quadrupole MS operated in dynamic multiple reaction monitoring (dMRM) mode.
  • the MRM transitions comprised 513 glycopeptide structures which were normalized by comparing them with the abundance of 71 non-glycosylated peptide structures, representing each of 71 proteins from which the glycopeptides monitored were derived.
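  • As a non-limiting illustration, one possible form of such a normalization is a ratio of each glycopeptide abundance to the abundance of the non-glycosylated peptide from the same protein. Treating the comparison as a simple ratio is an assumption, as is deriving the protein from the prefix of the peptide structure name; the function name and the input layout are hypothetical.

```python
def relative_abundance(glyco_abundance, reference_abundance):
    """Divide each glycopeptide abundance by its protein's reference abundance.

    glyco_abundance: dict mapping names such as "KNG1_294_6503" to abundance.
    reference_abundance: dict mapping a protein prefix such as "KNG1" to the
    abundance of that protein's non-glycosylated peptide.
    """
    normalized = {}
    for name, value in glyco_abundance.items():
        protein = name.split("_", 1)[0]  # prefix before the first underscore
        ref = reference_abundance[protein]
        if ref <= 0:
            raise ValueError(f"non-positive reference abundance for {protein}")
        normalized[name] = value / ref
    return normalized
```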
  • Samples were injected in an order randomized with respect to underlying phenotype, and reference pooled serum digests were interspersed with the study samples.
  • VIII.A.3. Data Analysis
  • This subset included 976 features, with each feature being a concentration, relative abundance, or site occupancy for a corresponding peptide structure and where some peptide structures correspond with multiple features.
  • a given peptide structure may be associated with one, two, or three features within the subset of the 976 features.
  • Figure 9 is a plot diagram illustrating the results of a principal component analysis performed to assess the segregation between healthy, benign pelvic tumor, and EOC samples across first and second principal components in accordance with one or more embodiments.
  • EOC samples segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
  • Figure 10 is a plot diagram illustrating the results of a principal component analysis performed to assess segregation between healthy, benign pelvic tumor, early EOC, late EOC, and missing (undocumented) samples.
  • EOC samples, and in particular late stage EOC samples, segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
  • Figure 11 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • the multivariable model that was built may be used to accurately and reliably identify malignant EOC and distinguish such malignancy from a healthy status.
  • diagnostic power may be used to reduce the need for unnecessary invasive testing.
  • diagnostic information can be used to identify patients with EOC earlier, which may lead to earlier treatment, improved treatment recommendations, and improved treatment plans.
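  • As a non-limiting illustration, an ROC diagram is commonly summarized by the area under the curve (AUC), which equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The sketch below computes that quantity directly from pairwise comparisons; whether and how an AUC was computed for the model is not stated here, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise score comparisons.

    scores: model outputs; labels: 1 for positive, 0 for negative samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```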
  • FIG. 12 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • the probability distributions for benign pelvic tumor, healthy, missing (undocumented), stage 1 EOC, stage 2 EOC, stage 3 EOC, and stage 4 EOC samples increased with cancer stage, with probability distributions being similar across training and test sets.
  • applying the built multivariable model to healthy patients, who were not utilized in the training, resulted in few misclassifications and a spread nearly equivalent to that of the benign pelvic tumor cases.
  • Such results indicate that the glycoproteomic signature solidly predicts malignancy and severity of disease.
  • Table 8 below provides the fold changes, FDRs, and p-values for the 10 peptide structures PS-1 to PS-10 (same as those in Table 1 above) based on differential expression analysis (DEA).
  • Table 8 Peptide Structure Markers for Regression Model to distinguish between Epithelial Ovarian Cancer and Healthy State
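The quantities reported in such a table can be illustrated with a short sketch. The text does not state which statistical test or FDR procedure was used, so the following assumes a log2 ratio of group means for the fold change and the Benjamini-Hochberg procedure for the FDR; both choices, the function names, and the example p-values are assumptions for illustration only.

```python
from math import log2
from statistics import mean


def log2_fold_change(case_values, control_values):
    """Log2 ratio of the mean abundance in cases to the mean in controls."""
    return log2(mean(case_values) / mean(control_values))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # no larger than the ones ranked above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four peptide structures.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
```

A peptide structure would then be reported with its fold change, its raw p-value, and its adjusted value from `fdr`.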
  • FIG. 13 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • the multivariable model that was built may be used accurately and reliably to triage pelvic tumors and distinguish those that are malignant from those that are benign.
  • diagnostic power may be used to reduce the need for invasive testing (e.g., biopsy) before treatment can be administered.
  • diagnostic information can be used to improve treatment recommendations and treatment plans (e.g., earlier treatment in the case of malignant EOC) and reduce indications for unnecessary treatment (e.g., no indication for surgery when the pelvic tumor is benign).
  • FIG. 14 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • the probability distributions for benign pelvic tumor, healthy, missing (undocumented), stage 1 EOC, stage 2 EOC, stage 3 EOC, and stage 4 EOC samples increased with cancer stage, with probability distributions being similar across training and test sets.
  • applying the built multivariable model to healthy patients, who were not utilized in the training, resulted in few misclassifications and a spread nearly equivalent to that of the benign pelvic tumor cases.
  • Such results indicate that the glycoproteomic signature of the 25 peptide structures for the LASSO regression model solidly predicts malignancy and severity of disease.
  • Table 9 below provides the fold changes, FDRs, and p-values for the 25 peptide structures PS-5 and PS-11 to PS-34 (same as those in Table 2 above) based on differential expression analysis (DEA).
  • Table 10 below provides the fold changes, FDRs, and p-values for the 36 peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 (same as those in Table 3B above) based on differential expression analysis (DEA).
  • the peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 are ordered in Table 10 with respect to relative significance to the p value score generated by the model.
  • Table 10 Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3B.
  • Table 10B below provides the fold changes, FDRs, and p-values for the 25 peptide structures denoted by SEQ ID NO 101-125 (in accordance with Table 3C above) using differential expression analysis (DEA).
  • Table 10B Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3C.
  • Table 10C below provides the fold changes, FDRs, and p-values for the 50 peptide structures denoted by SEQ ID NO 126-175 (in accordance with Table 3D above) using differential expression analysis (DEA).
  • Table 10D below provides the fold changes, FDRs, and p-values for the 12 peptide structures denoted by SEQ ID NO 131-134, 137, 139, 140, 143, 151, 165-167 (in accordance with Table 3D above) using differential expression analysis (DEA).
  • Table 10D Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using 12 of the biomarkers of Table 3D.
  • the markers from Table 3B were used to train a regularized regression model (e.g., LASSO regression model).
  • Coefficients for the regularized regression model are provided in Table 11.
  • a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score.
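  • the logit of the probability, to which the inverse logit function can be applied, is given by equation 1 (eq. 1):
    $$\operatorname{logit}(p) = \beta_0 + \sum_{i=1}^{n} \beta_i \, x_i \qquad \text{(eq. 1)}$$
  • where p is the probability score, \beta_0 is the intercept, \beta_i is the coefficient for biomarker i, and x_i is the concentration of biomarker i in the sample
  • n is the number of biomarkers having a unique PS-ID No
  • i is an index number for each of the biomarkers of Table 11.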
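The calculation described above, a coefficient-weighted sum of biomarker concentrations plus an intercept, followed by the inverse logit, can be sketched as follows. The biomarker identifiers, coefficient values, and intercept below are hypothetical placeholders and are not the values of Table 11.

```python
import math


def logit_score(concentrations, coefficients, intercept):
    """Intercept plus the sum of coefficient * concentration over all biomarkers.

    Both arguments are dicts keyed by biomarker identifier. A biomarker that
    has a coefficient but no measured concentration raises KeyError.
    """
    return intercept + sum(
        coefficients[marker] * concentrations[marker] for marker in coefficients
    )


def inverse_logit(z):
    """Map a logit to a probability in (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical coefficients and sample concentrations (illustration only).
coefficients = {"PS-A": 0.5, "PS-B": -1.0}
sample = {"PS-A": 2.0, "PS-B": 1.0}
z = logit_score(sample, coefficients, intercept=0.25)
probability = inverse_logit(z)
```

Biomarkers whose coefficient is zero contribute nothing to the sum, which is why a regularized model such as LASSO effectively selects a subset of the panel.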
  • the markers from Table 3C were used to train a regularized regression model (e.g., LASSO regression model).
  • Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11B.
  • a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score (see equation 1).
  • FIG. 17 illustrates a receiver-operating-characteristic (ROC) curve and the area under curve (AUC) for the regularized regression model (e.g., LASSO regression model) for early stage and late stage ovarian cancer samples using testing case data and training case data.
  • Table 12 shows the accuracy, sensitivity, specificity and precision for the training data set and the testing data set.
  • Table 13 shows the training accuracy and testing accuracy for the early stage and late stage cohort for ovarian cancer.
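The performance measures named above (accuracy, sensitivity, specificity, and precision) follow from the counts of a two-class confusion matrix. The sketch below assumes that late stage is treated as the positive class (label 1) and early stage as the negative class (label 0); that convention, the function name, and the example labels are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and precision for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }


# Hypothetical labels for eight samples.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(truth, predicted)
```

Computing the same measures separately on the training set and the testing set, as the tables do, gives a rough check on whether the model generalizes beyond the samples it was fitted on.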
  • the markers from Table 3D were used to train a regularized regression model (e.g., LASSO regression model).
  • Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11C.
  • a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score (see equation 1).
  • predicted probability was generated for early stage and late stage ovarian cancer samples showing a stratification in predicted probabilities between the two cohorts as is illustrated in Figure 20.
  • predicted probability can be generated for classifying early stage and late stage ovarian cancer samples using the markers with non-zero coefficients such as SEQ ID NO’s 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171.
  • a logistic regression model was used with the glycopeptides of Table 3D where the glycopeptides had 1 or more sialic acids and zero or more fucosylations for the early and late stage EOC cohorts.
  • glycopeptides that included fucose were found to be associated with EOC.
  • glycopeptides that included fucose and also carrying tri- and tetra-antennary glycan structure were found to be more strongly associated with EOC.
  • Figures 21A to 21E show that the relative abundance of tri- and tetra-antennary glycan structures in benign tumors, early-stage EOC and late-stage EOC showed an increase with the progression of the EOC disease.
  • the numbers 6512, 6512, 7612, 7613, 7614 correspond to the five distinct glycans attached to the glycopeptides.
  • the three leftmost bar graphs represent glycopeptides with tetra-antennary glycans with varying degrees of sialylation.
  • the two rightmost bar graphs are Figures 21D and 21E and they represent glycopeptides with tri-antennary glycans with two or three sialic acids.
  • Figures 21A to 21E show the statistical comparisons between the benign and late-stage cohorts (highest horizontal bar), the early-stage and late-stage cohorts (middle horizontal bar), and the benign and early-stage cohorts (lowest horizontal bar).
  • Table 14 shows the accuracy, sensitivity, and specificity for the training data set and the testing data set.
  • a specific subset of tri- and tetra-antennary fucosylated N-glycopeptides was identified that can be used to differentiate between early- and late-stage ovarian cancer.
  • the fucose portion of the specific subset of tri- and tetra-antennary fucosylated N-glycopeptides was found to have an outer arm position. It should be noted that fucose can be bound to a glycan in a core fucosylation or outer-arm orientation.
  • Core fucosylation is a modification of an N-glycan core structure, forming the α1,6 fucosylation of the GlcNAc residue linked to the asparagine, that is catalyzed by FUT8.
  • a fucose in the outer-arm orientation is attached to the antennae of the complex type N-glycans by α-(1-3/4) linkage to the GlcNAc residues or by α-(1-2) linkage to galactose.
  • Figure 22 is a representative figure of a mass spectrum with m/z represented on the X-axis and intensity (and therefore abundance) represented on the Y-axis. Arrows indicate the breakdown products indicating the fucose is on the outer-arm (purple diamond - sialic acid, yellow circle - galactose, blue square - N-acetylglucosamine, red triangle - fucose, green circle - mannose). It is worth noting that there is a 4 glycan breakdown fragment composed of sialic acid, galactose, N-acetylglucosamine, and fucose (m/z value of 803.294).
  • the sialic acid is connected to galactose
  • galactose is connected to N-acetylglucosamine
  • N-acetylglucosamine is connected to fucose.
  • the 4 glycan breakdown fragment represents a single antennary branch having a fucose in an outer arm fucose position where the aggregate glycan was cleaved at a linkage between a mannose and an N-acetylglucosamine.
  • the 3 glycan breakdown fragment includes galactose, N-acetylglucosamine, and fucose (m/z value of 512.198).
  • the galactose is connected to N-acetylglucosamine
  • N-acetylglucosamine is connected to fucose.
  • the presence of the 4 glycan breakdown fragment and the 3 glycan breakdown fragment as shown in Figure 22 indicates the presence of outer arm fucosylation.
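The two reported m/z values can be checked with simple arithmetic. The sketch below assumes that both fragments are singly protonated oxonium ions, that the sialic acid is N-acetylneuraminic acid (Neu5Ac), and that galactose contributes a hexose residue; it uses standard monoisotopic residue masses. These assumptions and the names used are illustrative and are not stated in the text.

```python
# Monoisotopic masses (Da) of monosaccharide residues, i.e. the
# monosaccharide minus one water, as incorporated in a glycan chain.
RESIDUE_MASS = {
    "Neu5Ac": 291.09542,   # sialic acid (assumed to be N-acetylneuraminic acid)
    "Hex": 162.05282,      # hexose, e.g. galactose or mannose
    "HexNAc": 203.07937,   # N-acetylhexosamine, e.g. N-acetylglucosamine
    "Fuc": 146.05791,      # fucose (deoxyhexose)
}
PROTON_MASS = 1.007276


def oxonium_mz(residues, charge=1):
    """m/z of a protonated glycan fragment built from the given residues."""
    neutral_mass = sum(RESIDUE_MASS[name] for name in residues)
    return (neutral_mass + charge * PROTON_MASS) / charge


# 4 glycan fragment: sialic acid, galactose, N-acetylglucosamine, fucose.
four_glycan_mz = oxonium_mz(["Neu5Ac", "Hex", "HexNAc", "Fuc"])

# 3 glycan fragment: galactose, N-acetylglucosamine, fucose.
three_glycan_mz = oxonium_mz(["Hex", "HexNAc", "Fuc"])
```

Under these assumptions the computed values are approximately 803.293 and 512.197, within a few thousandths of the reported 803.294 and 512.198. A fucose located only on the core GlcNAc would not be expected to appear together with galactose in an antenna fragment of this kind, which is consistent with the outer-arm assignment.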
  • SEQ ID NOS. 131, 137, 143, 155, 158, 159, 162, 166, and 171 correspond to glycopeptides that each have a non-zero coefficient along with one fucose.
  • SEQ ID NO 131, 137, 143, 155, 159, 162, 166, and 171 each correspond to a glycopeptide having an outer arm fucosylation format.
  • glycopeptide biomarkers with outer arm fucosylation can provide better prediction of ovarian cancer disease states.
  • a predicted probability can be generated for early stage and late stage ovarian cancer samples showing a stratification in predicted probabilities between the two cohorts.
  • predicted probability can be generated for classifying early stage and late stage ovarian cancer samples using the markers
  • glycopeptides of Table 3D were found to be associated with EOC.
  • glycopeptides that included fucose and also carrying tri- and tetra-antennary glycan structure were found to be more strongly associated with EOC.
  • Table 15 shows the accuracy, sensitivity, and specificity for the training data set and the testing data set.
  • the relative performance shown in Table 15 is better than that shown in Table 14, indicating that the subset of biomarkers using predominantly tri- and tetra-antennary glycans generated a better model for determining early vs late stage EOC.
  • a validation study was conducted using both retrospective patient samples and samples collected prospectively in the ongoing Clinical Validation of the InterVenn Ovarian CAncer Liquid Biopsy (VOCAL) study. Samples included those from patients with malignant EOC and patients with benign pelvic tumors. Samples were processed in a manner similar to the manner described for the Exemplary Retrospective Analysis in Section VII. A above.
  • a logistic regression model was built identifying a panel of 38 peptide structures (same as those in Table 3 above). This panel of 38 peptide structures had an overall predictive accuracy of over 86% for the prediction of malignancy versus benign status of pelvic tumors.
  • Table 10 below provides the fold changes and p-values for the 38 peptide structures also identified in Table 3 above based on differential expression analysis (DEA). These peptide structures are ordered both in Table 3 and in Table 10 with respect to relative significance to the probability score generated by the model based on p-values. In this context, more significant peptide structures have lower p-values, while less significant peptide structures have higher p-values. In other words, relative significance to the probability score decreased with increasing p-values.
IX. Additional Considerations
  • Some embodiments of the present disclosure include a system including one or more data processors.
  • the system includes a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.
  • Some embodiments of the present disclosure include a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.

Abstract

A method and system for diagnosing a subject with respect to an ovarian cancer disease state. Peptide structure data corresponding to a biological sample obtained from the subject is received. The peptide structure data is analyzed using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences the ovarian cancer disease state based on at least one peptide structure selected from a group of peptide structures identified in Table 3B, 3C, or 3D. The group of peptide structures in Table 3B, 3C, or 3D comprises a group of peptide structures associated with the ovarian cancer disease state. A diagnosis output is generated based on the disease indicator.

Description

DIAGNOSIS OF OVARIAN CANCER USING TARGETED QUANTIFICATION OF SITE-SPECIFIC PROTEIN GLYCOSYLATION
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 63/376,053, filed September 16, 2022, and claims priority to U.S. Provisional Patent Application Serial No. 63/489,712, filed March 10, 2023, and claims priority to U.S. Provisional Patent Application Serial No. 63/517,859, filed August 4, 2023, all of which applications are incorporated by reference herein in their entirety.
FIELD
[0002] Embodiments of the present disclosure generally relate to methods and systems for analyzing peptide structures for diagnosing and/or treating ovarian cancer. More particularly, embodiments of the present disclosure relate to analyzing quantification data for a set of peptide structures detected in a biological sample obtained from a subject for use in diagnosing and/or treating the subject, the set of peptide structures being associated with ovarian cancer.
BACKGROUND
[0003] Protein glycosylation and other post-translational modifications play vital roles in virtually all aspects of human physiology. Unsurprisingly, faulty or altered protein glycosylation often accompanies various disease states. The identification of aberrant glycosylation provides opportunities for early detection, intervention, and treatment of affected subjects. Current biomarker identification methods, such as those developed in the fields of proteomics and genomics, can be used to detect indicators of certain diseases, such as cancer, and to differentiate certain types of cancer from other, non-cancerous diseases. However, glycoproteomic analyses have not previously been used to successfully identify disease processes.
[0004] Glycoprotein analysis is fraught with challenges on several levels. For example, a single glycan composition in a peptide can contain a large number of isomeric structures due to different glycosidic linkages, branching patterns, and/or multiple monosaccharides having the same mass. In addition, the presence of multiple glycans that share the same peptide backbone can lead to assay signals from various glycoforms, lowering their individual abundances compared to aglycosylated peptides. Accordingly, the development of algorithms that can identify glycan structures on peptide fragments remains elusive.
[0005] In light of the above, there is a need for improved analytical methods that involve site-specific analysis of glycoproteins to obtain information about protein glycosylation patterns, which can in turn provide quantitative information that can be used to identify disease states. For example, there is a need to use such analysis to diagnose and/or treat ovarian cancer.
[0006] Epithelial ovarian cancer (EOC) is currently the second-most common gynecologic malignancy, the leading cause of death from gynecological cancer, and the fourth-leading cause of cancer-related death in women in the United States. Although EOC can be treated effectively with surgery and adjuvant therapies, only about 15-20% of women are diagnosed at early-stage when 5-year survival is greater than 90%. Instead, the majority of EOC cases are diagnosed at late-stage (stage III or IV), with 5-year survival rates between about 15% and 40%. Diagnosing early-stage EOC is impeded by initial clinical signs and symptoms that are generally nonspecific and commonly missed such as, for example, pelvic pain, urinary urgency/frequency, abdominal bloating, early satiety, loss of appetite, and weight loss.
[0007] In addition to late diagnosis and consequent under-treatment of serious disease, benign disease is oftentimes unnecessarily over-treated due to the lack of diagnostic tools to determine the nature of pelvic masses. For example, while over 90% of women presenting with a pelvic mass may ultimately undergo surgery, only about 20% are found to have malignant disease.
[0008] Thus, an approach that is non-invasive, accurate, and reliable and that enables early diagnosis is needed. An approach enabling early diagnosis may help reduce negative health outcomes in patients with ovarian cancer, reduce the under-treatment of ovarian cancer, and/or reduce the over-treatment of benign disease. In addition, more strategic treatments can be provided with a diagnostic test that can assess whether a subject has early stage or late stage ovarian cancer. Thus, it may be desirable to have methods and systems capable of addressing one or more of the above-identified issues.
SUMMARY
[0009] In an embodiment, a method for diagnosing a subject with respect to an ovarian cancer disease state is described. The method includes receiving peptide structure data corresponding to a biological sample obtained from the subject. The peptide structure data can be analyzed using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on at least one peptide structure selected from one of a group of peptide structures identified in Tables 3B, 3C, or 3D. A diagnosis output can be generated based on the disease indicator. The disease indicator can include a score.
[0010] The method of generating the diagnosis output can include determining that the score falls above a selected threshold and generating the diagnosis output based on the score falling above the selected threshold, wherein the diagnosis output includes a classification of late stage ovarian cancer disease state. The method of generating the diagnosis output can include determining that the score falls below a selected threshold and generating the diagnosis output based on the score falling below the selected threshold, wherein the diagnosis output includes a classification of early stage ovarian cancer disease state. The score may include a probability score and the selected threshold is 0.5. Alternatively, the selected threshold may fall within a range between 0.30 and 0.65. In an embodiment, the analyzing the peptide structure data can include analyzing the peptide structure data using a binary classification model. The peptide structure of the at least one peptide structure can include a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D as defined in Table 5.
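The threshold rule of paragraph [0010] can be sketched as follows. The default threshold of 0.5 and the late stage and early stage labels come from the paragraph; the function name, the range check on the score, and the handling of a score exactly equal to the threshold are assumptions, since the paragraph addresses only scores above or below the threshold.

```python
def classify_stage(score, threshold=0.5):
    """Map a probability score to a stage classification.

    Scores above the threshold are classified as late stage and scores below
    it as early stage. A score exactly at the threshold is reported as
    indeterminate, which is an assumption made for this sketch.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("a probability score must lie between 0 and 1")
    if score > threshold:
        return "late stage"
    if score < threshold:
        return "early stage"
    return "indeterminate"


# The same score can change class when a different threshold within the
# described 0.30 to 0.65 range is selected.
default_call = classify_stage(0.40)
lower_threshold_call = classify_stage(0.40, threshold=0.30)
```

Lowering the threshold classifies more samples as late stage and raising it classifies more as early stage, so the choice of threshold trades sensitivity against specificity.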
[0011] In another embodiment, the method can include training the supervised machine learning model using training data, wherein the training data comprises a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects, wherein the plurality of subject diagnoses includes a diagnosis for any subject of the plurality of subjects determined to have early stage or late stage ovarian cancer.
[0012] In another embodiment, the method can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the classification of early stage ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the classification of late stage ovarian cancer disease state; identifying a training group of peptide structures based on the differential expression analysis for use as prognostic markers for the ovarian cancer disease state; and forming the training data based on the training group of peptide structures identified. The training of the supervised machine learning model can include reducing the training group of peptide structures to a final group of peptide structures identified in Tables 3B, 3C, or 3D.
[0013] In an embodiment, each peptide structure profile of the plurality of peptide structure profiles can include a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure. The plurality of peptide structure profiles can include a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure. The supervised machine learning model can include a logistic regression model.
[0014] In an embodiment, the first group of peptide structures in Tables 3B, 3C, or 3D is used to distinguish between the ovarian cancer disease state being late stage or early stage. The quantification data for a peptide structure of the set of peptide structures can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
[0015] In an embodiment, the peptide structure data can be generated using multiple reaction monitoring mass spectrometry (MRM-MS), wherein the using of the MRM-MS includes ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
[0016] In an embodiment, the method can include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
[0017] In an embodiment, the method of classifying early and late stage ovarian cancer can be implemented after the subject has already been diagnosed as having ovarian cancer. The subject can be initially diagnosed for having ovarian cancer using one or more biomarkers in Tables 1, 2, or 3.
[0018] In an embodiment, the generating the diagnosis output can include generating a report identifying that the biological sample evidences the early stage or late stage ovarian cancer disease state.
[0019] In an embodiment, a treatment output can be generated based on at least one of the diagnosis output or the disease indicator. The treatment output can include at least one of an identification of a treatment to treat the subject or a treatment plan. The treatment can include at least one of surgery, radiation therapy, a targeted drug therapy, chemotherapy, immunotherapy, hormone therapy, or neoadjuvant therapy. In some embodiments, the group of peptide structures in Tables 3B, 3C, or 3D is listed in order of relative significance to the disease indicator.
[0020] In an embodiment, the method can further include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures. The method can further include generating the peptide structure data from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
[0021] In an embodiment, a method of training a model to diagnose a subject with respect to an ovarian cancer disease state having a malignant pelvic tumor is described. The method can include receiving quantification data for a panel of peptide structures for a plurality of samples for a plurality of subjects. The plurality of subjects includes a first portion diagnosed with a classification of early stage ovarian cancer disease state and a second portion diagnosed with a classification of late stage ovarian cancer disease state. The quantification data can include a plurality of peptide structure profiles for the plurality of subjects and training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state, wherein the group of peptide structures is identified in Tables 3B, 3C, or 3D. The machine learning model can include a logistic regression model.
[0022] The method of training the model can further include identifying an initial plurality of peptide structure profiles, filtering the initial plurality of peptide structure profiles by a coefficient of variation to generate a plurality of peptide structure profiles for use in training the machine learning model. The filtering can be performed to exclude peptide structure profiles having the coefficient of variation at or above 20%. The training of the machine learning model can include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Tables 3B, 3C, or 3D. The quantification data for the panel of peptide structures for the plurality of subjects diagnosed with the plurality of ovarian cancer disease states can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration. The trained model can use a relative abundance for a first portion of the first group of peptide structures and a concentration for a second portion of the second group of peptide structures. Each peptide structure profile of the plurality of peptide structure profiles includes a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure. The plurality of peptide structure profiles can include a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure.
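The coefficient-of-variation filter of paragraph [0022] can be sketched as follows. The 20% cutoff and the exclusion of profiles at or above it come from the paragraph; the assumption that the coefficient of variation is computed from replicate measurements of each peptide structure, the use of the sample standard deviation, and all names and values are illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return stdev(values) / mean(values) * 100.0


def filter_by_cv(replicates_by_structure, max_cv_percent=20.0):
    """Keep peptide structures whose CV is strictly below the cutoff.

    Structures with a CV at or above the cutoff are excluded.
    """
    return {
        name: values
        for name, values in replicates_by_structure.items()
        if coefficient_of_variation(values) < max_cv_percent
    }


# Hypothetical replicate measurements for two peptide structures.
replicates = {
    "PS-A": [100.0, 102.0, 98.0, 101.0],   # tight replicates, low CV
    "PS-B": [100.0, 150.0, 60.0, 120.0],   # scattered replicates, high CV
}
retained = filter_by_cv(replicates)
```

Only the retained structures would then be passed to the feature reduction step, such as the LASSO regression mentioned in the paragraph.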
[0023] In an embodiment, a composition can include at least one of the peptide structures identified in Tables 3B, 3C, or 3D.
[0024] In an embodiment, a method for diagnosing a subject with respect to an ovarian cancer disease state is described. The method can include analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether a biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on a group of glycopeptide structures. The group of glycopeptide structures can include tri-antennary or tetra-antennary sialic acid moieties, wherein a portion of the glycopeptide structures of the group are fucosylated. A diagnosis is then outputted based on the disease indicator. The group of glycopeptide structures can include at least one, at least three, at least five, or at least 10 glycopeptide structures identified in Tables 3B, 3C, or 3D.
[0025] In an embodiment, the peptide structure data was generated with a mass spectrometer using the biological sample obtained from the subject.
[0026] In an embodiment, the method can further include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures. The peptide structure data can be generated from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS). The use of the MRM-MS can include ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
[0027] In one or more embodiments, a system comprising one or more data processors is described according to various embodiments. In various embodiments, the system comprises a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of any of the methods described herein.
[0028] In one or more embodiments, a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of any one of the methods described according to various embodiments.
[0029] In one or more embodiments, a system is described according to various embodiments. In various embodiments, the system comprises one or more data processors and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of any one or more of the methods described herein.
[0030] In one or more embodiments, a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of any one or more of the methods described herein.
[0031] In various embodiments, the peptide structure data is listed in Table 3D and the detected product ion comprises a first product having a m/z value listed in Table 4C.
[0032] In some embodiments, the at least one peptide structure comprises a peptide sequence and a glycan structure, wherein the glycan structure is attached to a linking site position in the peptide sequence in accordance with one of Tables 1, 2, 3, 3B, 3C, and 3D. In some embodiments, the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a symbol structure in accordance with the glycan structure GL number according to Tables 1, 2, 3, 3B, 3C, 3D, and 7. In some embodiments, the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a composition in accordance with the glycan structure GL number, Tables 1, 2, 3, 3B, 3C, 3D, and 7. In some embodiments, a rightmost N-acetylgalactosamine (open square) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 3 and 5. In some embodiments, a bottommost N-acetylglucosamine (dark square) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, 3D, and 5.
[0033] In some embodiments, provided herein is a composition comprising one or more peptide structures from Tables 1, 2, 3, 3B, 3C, and 3D. In some embodiments, the at least one peptide structure comprises a peptide sequence and a glycan structure, wherein the glycan structure is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, and 3D. In some embodiments, the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a symbol structure in accordance with the glycan structure GL number according to Tables 1, 2, 3, 3B, 3C, 3D, and 7. In some embodiments, the glycan structure of the peptide sequence corresponds to a glycan structure GL number in accordance with Tables 1, 2, 3, 3B, 3C, and 3D, wherein the glycan structure comprises a composition in accordance with the glycan structure GL number, Tables 1, 2, 3, 3B, 3C, 3D, and 7. In some embodiments, a rightmost N-acetylgalactosamine (GalNAc) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 3 and 5. In some embodiments, a bottommost N-acetylglucosamine (GlcNAc) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, 3D, and 5.
[0034] In regard to the various embodiments, the peptide sequence can be one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171.
[0035] In regard to the various embodiments, the peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171 in Table 3D as defined in Table 5.
[0036] In regard to the various embodiments, the peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 159-162, 166, and 171 in Table 3D as defined in Table 5.
[0037] In regard to the various embodiments, the glycan structure, corresponding to the peptide sequence of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171 includes a fucose and the fucose is in an outer arm orientation.
[0038] In regard to the various embodiments, a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171 in Table 3D as defined in Table 5, wherein a fucose of the glycan structure comprises an outer arm orientation.
[0039] In regard to the various embodiments, the at least one peptide structure is selected from one of a group of peptide structures identified in Table 3D.
[0040] In regard to the various embodiments, a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131-134, 137, 139, 140, 143, 151, 165-167 in Table 3D as defined in Table 5.
[0041] In regard to the various embodiments, the glycan structure, corresponding to the peptide sequence of SEQ ID NOS: 131, 137, and 143, includes a fucose and the fucose is in an outer arm orientation.
[0042] In regard to the various embodiments, the outer arm orientation of the fucose comprises the fucose being linked to an N-acetylglucosamine by an α-(1-3/4) linkage.
[0043] In an embodiment, there is a method of treating ovarian cancer in an individual, the method comprising administering to the individual an ovarian cancer therapy, wherein the individual has been determined to be responsive to the ovarian cancer therapy via a trained machine learning classifier that distinguishes between responsive and non-responsive individuals who have received the ovarian cancer therapy, based at least in part on a group of peptide structures identified in Tables 3B, 3C, or 3D.
BRIEF DESCRIPTION OF THE DRAWINGS
[0044] The present disclosure is described in conjunction with the appended figures:
[0045] Figure 1 is a schematic diagram of an exemplary workflow 100 for the detection of peptide structures associated with a disease state for use in diagnosis and/or treatment in accordance with one or more embodiments.
[0046] Figure 2A is a schematic diagram of a preparation workflow in accordance with one or more embodiments.
[0047] Figure 2B is a schematic diagram of data acquisition in accordance with one or more embodiments.
[0048] Figure 3 is a block diagram of an analysis system in accordance with one or more embodiments.
[0049] Figure 4 is a block diagram of a computer system in accordance with various embodiments.
[0050] Figure 5 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments based on Tables 1 or 2.
[0051] Figure 6 is a flowchart of a process for diagnosing a subject with respect to ovarian cancer disease state in accordance with one or more embodiments based on Table 3.
[0052] Figure 6B is a flowchart of a process for diagnosing a subject with respect to ovarian cancer disease state in accordance with one or more embodiments based on Table 3B.
[0053] Figure 7 is a flowchart of a process for training a model to diagnose a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
[0054] Figure 8 is a table describing the distribution of the samples acquired in this exemplary retrospective analysis in accordance with one or more embodiments.
[0055] Figure 9 is a plot diagram illustrating the results of a principal component analysis performed to assess the segregation between healthy, benign pelvic tumor, and EOC samples across first and second principal components in accordance with one or more embodiments.
[0056] Figure 10 is a plot diagram illustrating the results of a principal component analysis performed to assess segregation between healthy, benign pelvic tumor, early EOC, late EOC, and missing (undocumented) samples.
[0057] Figure 11 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
[0058] Figure 12 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
[0059] Figure 13 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
[0060] Figure 14 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
[0061] Figures 15A to 15E are a plurality of charts illustrating the upregulation of fucosylated biomarkers having tri- or tetra-antennary sialic acids from stages 1/2 to 3/4 of ovarian cancer and the downregulation of non-fucosylated biomarkers having tri- or tetra-antennary sialic acids from stages 1/2 to 3/4 of ovarian cancer.
[0062] Figure 16 is an illustration of a diagram showing the probability distributions for early stage v. late stage ovarian cancer using training data set and the testing data set in accordance with one or more embodiments using the biomarkers of Table 3C.
[0063] Figure 17 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict early stage v. late stage ovarian cancer in accordance with one or more embodiments.
[0064] Figure 18 is a graph illustrating the fold changes for a plurality of glycopeptides bearing tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
[0065] Figure 19A is a graph illustrating the fold changes for pairs of glycopeptides bearing tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
[0066] Figure 19B is a graph illustrating the fold changes for triplets of glycopeptides bearing tri- and tetra-antennary glycans that were either non-fucosylated, mono-fucosylated, or di-fucosylated. Both mono-fucosylated and di-fucosylated markers have median fold changes (FCs) above 1, suggesting correlation of these markers with malignant EOC.
[0067] Figure 20 is an illustration of a diagram showing the probability distributions for early stage v. late stage ovarian cancer using training data set and the testing data set in accordance with one or more embodiments using the biomarkers of Table 3D.
[0068] Figures 21A to 21E are graphs of the relative abundance of five distinct types of fucosylated glycopeptides in benign tumors, early stage EOC, and late stage EOC.
[0069] Figure 22 is a representative mass spectrum showing breakdown fragments of 3 glycans and 4 glycan aggregates that indicate the presence of glycans with an outer arm fucosylated orientation.
DETAILED DESCRIPTION
I. Overview
[0070] The embodiments described herein recognize that glycoproteomics is an emerging field that can be used in the overall diagnosis and/or treatment of subjects with various types of diseases. Glycoproteomics aims to determine the positions, identities, and quantities of glycans and glycosylated proteins in a given sample (e.g., blood sample, cell, tissue, etc.). Protein glycosylation is one of the most common and most complex forms of post-translational protein modification, and can affect protein structure, conformation, and function. For example, glycoproteins may play crucial roles in important biological processes such as cell signaling, host-pathogen interactions, and immune response and disease. Glycoproteins may therefore be important to diagnosing different types of diseases.
[0071] Although protein glycosylation provides useful information about cancer and other diseases, analysis of protein glycosylation may be difficult as the glycan typically cannot be traced back to the protein site of origin with currently available methodologies. Glycoprotein analysis can be challenging in general for several reasons. For example, a single glycan composition in a peptide may contain a large number of isomeric structures because of different glycosidic linkages, branching, and many monosaccharides having the same mass. Further, the presence of multiple glycans that share the same peptide sequence may cause the mass spectrometry (MS) signal to split into various glycoforms, lowering their individual abundances compared to the peptides that are not glycosylated (aglycosylated peptides).
[0072] But to understand various disease conditions and to diagnose certain diseases, such as ovarian cancer, more accurately, it may be important to perform analysis of glycoproteins and to identify not only the glycan but also the linking site (e.g., the amino acid residue of attachment) within the protein. Thus, there is a need to provide a method for site-specific glycoprotein analysis to obtain detailed information about protein glycosylation patterns which may be able to provide information about a disease state (e.g., an ovarian cancer disease state). This information can be used to distinguish the disease state from other states, diagnose a subject as having or not having the disease state, determine a likelihood that a subject has the disease state, determine whether a subject has one of early stage (stages 1 and 2) or late stage (stages 3 and 4) epithelial ovarian cancer (EOC), or a combination thereof. For example, such analysis may be useful in diagnosing an ovarian cancer disease state for a subject (e.g., a negative diagnosis for the ovarian cancer disease state or a positive diagnosis for the ovarian cancer disease state). Samples can be collected and analyzed at different time points for comparing ovarian cancer disease states over time for a subject. For example, the negative diagnosis may include a healthy state or a benign tumor state (i.e., “benign” as seen throughout). An example of the positive diagnosis includes the subject suffering from a form of ovarian cancer (e.g., EOC). A diagnosis can also assess a malignancy status of a previously identified pelvic (or adnexal) tumor (or mass).
[0073] Accordingly, the embodiments described herein provide various methods and systems for analyzing proteins in subjects and, in particular, glycoproteins. In one or more embodiments, a machine learning model is trained to analyze peptide structure data and generate a disease indicator that provides information relating to one or more diseases. For example, in various embodiments, the peptide structure data comprises quantification metrics (e.g., abundance or concentration data) for peptide structures. A peptide structure may be defined by an aglycosylated peptide sequence (e.g., a peptide or peptide fragment of a larger parent protein) or a glycosylated peptide sequence. A glycosylated peptide sequence (also referred to as a glycopeptide structure) may be a peptide sequence having a glycan structure that is attached to a linking site (e.g., an amino acid residue) of the peptide sequence, which may occur via, for example, a particular atom of the amino acid residue. Non-limiting examples of glycosylated peptides include N-linked glycopeptides and O-linked glycopeptides.
[0074] The embodiments described herein recognize that the abundance of selected peptide structures in a biological sample obtained from a subject may be used to determine the likelihood of that subject evidencing an ovarian cancer disease state. An ovarian cancer disease state may include any condition that can be diagnosed as cancer that occurs in the ovaries. Many malignant pelvic tumors are ovarian cancer. Certain peptide structures that are associated with an ovarian cancer disease state may be more relevant to that disease state than other peptide structures that are also associated with that disease state.
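As an illustration of how abundance data for selected peptide structures could be turned into a likelihood, the following Python sketch scores one sample with a logistic model. It is a sketch under stated assumptions only: the choice of a logistic model, the feature names, the coefficients, the intercept, and the 0.5 threshold are all hypothetical and are not values from this disclosure.

```python
import math

# Hypothetical coefficients for two placeholder peptide structures.
# These are assumed values for illustration, NOT trained weights.
WEIGHTS = {"peptide_structure_A": 1.2, "peptide_structure_B": -0.8}
INTERCEPT = -0.1


def disease_indicator(abundances, weights=WEIGHTS, intercept=INTERCEPT):
    """Map normalized abundances to a score in (0, 1) via a logistic function."""
    z = intercept + sum(w * abundances[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(score, threshold=0.5):
    """Turn the score into a label using an assumed decision threshold."""
    return "positive" if score >= threshold else "negative"


# Hypothetical normalized abundances for one sample.
sample = {"peptide_structure_A": 1.5, "peptide_structure_B": 0.5}
score = disease_indicator(sample)
label = classify(score)
```

A positive weight means higher abundance raises the score and a negative weight lowers it, which is one simple way that some peptide structures can carry more influence than others.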
[0075] Analyzing the abundance of peptide sequences and glycosylated peptide sequences in a biological sample may provide a more accurate way in which to distinguish a positive ovarian cancer disease state (e.g., a state including the presence of ovarian cancer) from a negative ovarian cancer disease state (e.g., healthy state, a benign tumor state, an absence of ovarian cancer, etc.). This type of peptide structure analysis may be more conducive to generating accurate diagnoses as compared to glycoprotein analysis that focuses on analyzing glycoproteins that are too large to be resolved via mass spectrometry. Further, with glycoproteins, there may be too many potential proteoforms to consider. Still further, analysis of peptide structure data in the manner described by the various embodiments herein may be more conducive to generating accurate diagnoses as compared to glycomic analysis that provides little to no information about what proteins and to which amino acid residue sites various glycan structures attach.
[0076] In many instances, ovarian cancer treated with surgical resection will recur due to metastasis. Thus, there is a need for tests that can diagnose metastatic ovarian cancer and monitor the progression of the disease (e.g., assessing the state of early vs. late stage ovarian cancer). Such a test may be based on either ELISA or mass spectrometry.
[0077] For reference, in stage 1, the cancer is confined to the ovaries and hasn’t spread to the abdomen, pelvis or lymph nodes, nor to distant sites. In stage 2, the cancer has spread from one or both ovaries to other areas of the pelvis. However, the cancer hasn’t spread to nearby lymph nodes or distant sites. Stages 1 and 2 are considered early stage. In stage 3, the cancer has spread to nearby lymph nodes and/or other parts of the abdomen, but it hasn’t spread to distant sites. In stage 4, the cancer has spread beyond the abdomen. Stages 3 and 4 are considered late stage.
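The early/late grouping described above reduces to a small lookup. The sketch below is illustrative only; the function name `stage_group` and its handling of Roman numerals are assumptions added for convenience.

```python
_ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def stage_group(stage):
    """Return 'early' for stages 1-2 and 'late' for stages 3-4.

    Accepts an int (1-4), a digit string ("3"), or a Roman numeral ("III").
    """
    if isinstance(stage, str):
        key = stage.strip().upper()
        if key.isdigit():
            number = int(key)
        elif key in _ROMAN:
            number = _ROMAN[key]
        else:
            raise ValueError(f"unrecognized stage: {stage!r}")
    else:
        number = int(stage)

    if number in (1, 2):
        return "early"
    if number in (3, 4):
        return "late"
    raise ValueError(f"stage must be 1-4, got {number}")
```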
[0078] A particular type of glycopeptide having fucosylation was found through mass spectrometry measurements to be associated with metastatic ovarian cancer. In addition, this type of glycopeptide had tri- and tetra-antennary N-glycans on certain proteins. In an embodiment, various proteins such as AGP1, AGP2, APOC3, FETUA, HPT, CLUS, A2MG, TRFE, VTNC, IGJ, and CFAH can be captured on an ELISA plate from patient samples followed by a lectin-based detection (four lectins: LCA, AAL, PHA-E, PHA-L).
[0079] Mass spectrometry can be used to analyze serum for various glycoproteins and/or glycopeptides to differentiate between benign and malignant adnexal masses. Through analyzing the clinical mass spectrometry data, a distinct signature was found with the circulating N-glycoproteins that allows a differentiation between late stage (metastatic disease of stage III/IV) and early stage (stage I/II) epithelial ovarian cancer (EOC). Using Qiagen’s Ingenuity Pathway Analysis package on this data, it was predicted that the signature markers are downstream of cytokine signaling. The markers also suggest the presence of the sialyl Lewis X (sLex) epitope on N-glycans of certain liver-derived circulatory glycoproteins. Given these findings suggesting the presence of sLex epitopes in circulation, it was investigated whether the outer-arm fucosylation was upregulated on the tumor itself. Bulk RNA-Seq data showed that the outer-arm fucosyltransferases FUT3, FUT4, and FUT9 were upregulated in late stage EOC. The core fucosyltransferase FUT8, on the other hand, was unchanged between early and late stage EOC. A blood-based test would be useful for staging/treatment recommendations and to preempt recurrence and metastatic transformation of epithelial ovarian cancer.
[0080] Further, the methods, systems, and compositions provided by the embodiments described herein may enable an earlier and more accurate diagnosis of ovarian cancer in a subject as compared to currently available diagnostic modalities (e.g., imaging, biochemical tests) used for determining whether surgical intervention is indicated. For example, various currently available non-invasive tests to distinguish between benign and malignant pelvic tumors rely on detection of the biomarker cancer antigen 125 (CA125). But this biomarker is limited by poor sensitivity and specificity. In fact, serum CA125 is not elevated in over 20% of ovarian carcinomas and is elevated in a variety of other malignant and non-malignant conditions. While various other tests incorporate other protein biomarkers in addition to CA125, these other tests may perform less adequately than desired and may be more complex than desired. The embodiments described herein enable more reliable prediction of the malignant or benign nature of pelvic (or adnexal) tumors (or masses).
[0081] The description below provides exemplary implementations of the methods and systems described herein for the research, diagnosis, and/or treatment of an ovarian cancer disease state. Various examples implement the methods and systems described herein as a screening tool. Descriptions and examples of various terms, as used herein, are provided in Section II below.
II. Exemplary Descriptions of Terms
[0082] The term “ones” means more than one.
[0083] As used herein, the term “plurality” may be 2, 3, 4, 5, 6, 7, 8, 9, 10, or more.
[0084] As used herein, the term “set of” means one or more. For example, a set of items includes one or more items.
[0085] As used herein, the phrase “at least one of,” when used with a list of items, means different combinations of one or more of the listed items may be used and only one of the items in the list may be needed. The item may be a particular object, thing, step, operation, process, or category. In other words, “at least one of” means any combination of items or number of items may be used from the list, but not all of the items in the list may be required. For example, without limitation, “at least one of item A, item B, or item C” means item A; item A and item B; item B; item A, item B, and item C; item B and item C; or item A and item C. In some cases, “at least one of item A, item B, or item C” means, but is not limited to, two of item A, one of item B, and ten of item C; four of item B and seven of item C; or some other suitable combination.
[0086] As used herein, “substantially” means sufficient to work for the intended purpose. The term “substantially” thus allows for minor, insignificant variations from an absolute or perfect state, dimension, measurement, result, or the like such as would be expected by a person of ordinary skill in the field but that do not appreciably affect overall performance. When used with respect to numerical values or parameters or characteristics that can be expressed as numerical values, “substantially” means within ten percent.
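For numerical values, the "within ten percent" reading of "substantially" can be expressed as a relative-tolerance check. The sketch below assumes the ten percent is measured relative to the reference value; the text does not say which value serves as the base, so that choice and the function name `is_substantially` are assumptions.

```python
def is_substantially(value, reference, rel_tol=0.10):
    """Return True if value is within rel_tol (default 10%) of reference.

    Assumption: the tolerance is taken relative to the reference value.
    """
    if reference == 0:
        # A relative tolerance is undefined at zero; require an exact match.
        return value == 0
    return abs(value - reference) <= rel_tol * abs(reference)
```

For example, a measured value of 95 would count as substantially 100 under this reading, while a value of 80 would not.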
[0087] The term “amino acid,” as used herein, generally refers to any organic compound that includes an amino group (e.g., -NH2), a carboxyl group (-COOH), and a side chain group (R) which varies based on a specific amino acid. Amino acids can be linked using peptide bonds.
[0088] The term “alkylation,” as used herein, generally refers to the transfer of an alkyl group from one molecule to another. In various embodiments, alkylation is used to react with reduced cysteines to prevent the re-formation of disulfide bonds after reduction has been performed.
[0089] The term “linking site” or “glycosylation site” as used herein generally refers to the location where a sugar molecule of a glycan or glycan structure is directly bound (e.g., covalently bound) to an amino acid of a peptide, a polypeptide, or a protein. For example, the linking site may be an amino acid residue and a glycan structure may be linked via an atom of the amino acid residue. Non-limiting examples of types of glycosylation can include N-linked glycosylation, O-linked glycosylation, C-linked glycosylation, S-linked glycosylation, and glycation.
[0090] The terms “biological sample,” “biological specimen,” or “biospecimen” as used herein, generally refer to a specimen taken by sampling so as to be representative of the source of the specimen, typically, from a subject. A biological sample can be representative of an organism as a whole, specific tissue, cell type, or category or sub-category of interest. Biological samples may include, but are not limited to, synovial fluid, whole blood, blood serum, blood plasma, urine, sputum, tissue, saliva, tears, spinal fluid, tissue section(s) obtained by biopsy; cell(s) that are placed in or adapted to tissue culture; sweat, mucous, fecal material, gastric fluid, abdominal fluid, amniotic fluid, cyst fluid, peritoneal fluid, pancreatic juice, breast milk, lung lavage, marrow, gastric acid, bile, semen, pus, aqueous humor, transudate, and the like including derivatives, portions and combinations of the foregoing. In some examples, biological samples include, but are not limited to, blood and/or plasma. In some examples, biological samples include, but are not limited to, urine or stool. Biological samples include, but are not limited to, saliva. Biological samples include, but are not limited to, tissue dissections and tissue biopsies. Biological samples include, but are not limited to, any derivative or fraction of the aforementioned biological samples. The biological sample can include a macromolecule. The biological sample can include a small molecule. The biological sample can include a virus. The biological sample can include a cell or derivative of a cell. The biological sample can include an organelle. The biological sample can include a cell nucleus. The biological sample can include a rare cell from a population of cells. The biological sample can include any type of cell, including without limitation prokaryotic cells, eukaryotic cells, bacterial, fungal, plant, mammalian, or other animal cell type, mycoplasmas, normal tissue cells, tumor cells, or any other cell type, whether derived from single cell or multicellular organisms. The biological sample can include a constituent of a cell. The biological sample can include nucleotides (e.g., ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof. The biological sample can include a matrix (e.g., a gel or polymer matrix) comprising a cell or one or more constituents from a cell (e.g., cell bead), such as DNA, RNA, organelles, proteins, or any combination thereof, from the cell. The biological sample may be obtained from a tissue of a subject. The biological sample can include a hardened cell. Such hardened cells may or may not include a cell wall or cell membrane. The biological sample can include one or more constituents of a cell but may not include other constituents of the cell. An example of such constituents may include a nucleus or an organelle. The biological sample may include a live cell. The live cell can be capable of being cultured.
[0091] The term “biomarker,” as used herein, generally refers to any measurable substance taken as a sample from a subject whose presence is indicative of some phenomenon. Non-limiting examples of such phenomenon can include a disease state, a condition, or exposure to a compound or environmental condition. In various embodiments described herein, biomarkers may be used for diagnostic purposes (e.g., to diagnose a health state, a disease state). The term “biomarker” can be used interchangeably with the term “marker.”
[0092] The term “denaturation,” as used herein, generally refers to a process in which a molecule loses the quaternary structure, tertiary structure, and secondary structure that is present in its native state. Non-limiting examples include proteins or nucleic acids being exposed to an external compound or environmental condition such as acid, base, temperature, pressure, radiation, etc.
[0093] The term “denatured protein,” as used herein, generally refers to a protein that has lost the quaternary structure, tertiary structure, and secondary structure that is present in its native state.
[0094] The terms “digestion” or “enzymatic digestion,” as used herein, generally refer to a biological process that employs enzymes to break specific amino acid peptide bonds. For example, digesting a peptide includes contacting the peptide with a digesting enzyme, e.g., trypsin, to produce fragments of the glycopeptide. In some examples, a protease enzyme is used to digest a glycopeptide. The term “protease” refers to an enzyme that performs proteolysis or breakdown of large peptides into smaller polypeptides or individual amino acids. Examples of a protease include, but are not limited to, one or more of a serine protease, threonine protease, cysteine protease, aspartate protease, glutamic acid protease, metalloprotease, asparagine peptide lyase, and any combinations of the foregoing. Enzymatic digestion may be used in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access is limited to cleavage sites.
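To make the idea of breaking specific peptide bonds concrete, the sketch below performs a simplified in silico trypsin digestion. It assumes the commonly cited rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P). That rule is general background rather than a statement from this disclosure, missed cleavages are ignored, and the function name and toy sequence are hypothetical.

```python
from typing import List


def tryptic_digest(sequence: str) -> List[str]:
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    if not sequence:
        return []

    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever remains after the final cleavage site is the last peptide.
    peptides.append(sequence[start:])
    return peptides


# Toy sequence (not a sequence from the disclosure).
fragments = tryptic_digest("AAKNGSRPLKTTR")
```

In this toy case the arginine followed by proline is not cleaved, so the middle peptide keeps both residues.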
[0095] The term “disease state” as used herein, generally refers to a condition that affects the structure or function of an organism. Non-limiting examples of causes of disease states may include pathogens, immune system dysfunctions, cell damage caused by aging, cell damage caused by other factors (e.g., trauma and cancer). Disease states can include any state of a disease whether symptomatic or asymptomatic. Disease states can include disease stages of a disease progression. Disease states can cause minor, moderate, or severe disruptions in structure or function of an organism (e.g., a subject).
[0096] The term “fragment,” as used herein, generally refers to an ion fragmentation process which occurs in a MRM-MS instrument. Fragmenting may produce various fragments having the same mass but varying with respect to their charge, e.g., some biomarkers described herein produce more than one product m/z.
[0097] The terms “glycan” or “polysaccharide” as used herein, both generally refer to a carbohydrate residue of a glycoconjugate, such as the carbohydrate portion of a glycopeptide, glycoprotein, glycolipid, or proteoglycan. Glycans can include monosaccharides.
[0098] The term “glycopeptide” or “glycopolypeptide” as used herein, generally refers to a peptide or polypeptide comprising at least one glycan residue. In various embodiments, glycopeptides comprise carbohydrate moieties (e.g., one or more glycans) covalently attached to a side chain (i.e. R group) of an amino acid residue.
[0099] The term “glycopeptide fragment” or “glycosylated peptide fragment” or “glycopeptide” as used herein, generally refers to a glycosylated peptide (or glycopeptide) having an amino acid sequence that is the same as part (but not all) of the amino acid sequence of the glycosylated protein from which the glycosylated peptide is obtained, e.g., ion fragmentation within a MRM-MS instrument. MRM refers to multiple-reaction-monitoring. Unless specified otherwise, within the specification, “glycopeptide fragments” or “fragments of a glycopeptide” refer to the fragments produced directly by using a mass spectrometer optionally after the glycoprotein has been digested enzymatically to produce the glycopeptides.
[0100] The term “glycoprotein,” as used herein, generally refers to a protein having at least one glycan residue bonded thereto. In some examples, a glycoprotein is a protein with at least one oligosaccharide chain covalently bonded thereto. Examples of glycoproteins include but are not limited to the peptide structures including glycan molecules shown in the various Tables presented herein. A glycopeptide, as used herein, refers to a fragment of a glycoprotein, unless specified otherwise to the contrary.
[0101] The term “liquid chromatography,” as used herein, generally refers to a technique used to separate a sample into parts. Liquid chromatography can be used to separate, identify, and quantify components.
[0102] The term “mass spectrometry,” as used herein, generally refers to an analytical technique used to identify molecules. In various embodiments described herein, mass spectrometry can be involved in characterization and sequencing of proteins.
[0103] The term “m/z” or “mass-to-charge ratio,” as used herein, generally refers to an output value from a mass spectrometry instrument. In various embodiments, m/z can represent a relationship between the mass of a given ion and the number of elementary charges that it carries. The “m” in m/z stands for mass and the “z” stands for charge. In some embodiments, m/z can be displayed on an x-axis of a mass spectrum.
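The relationship between ion mass and charge described above can be illustrated with a minimal sketch. It assumes positive-ion mode, where each elementary charge comes from an added proton; the proton mass constant, the function name `mz`, and the 2000.0 Da example mass are illustrative assumptions and do not come from the specification.

```python
PROTON_MASS = 1.007276  # Da; assumed charge carrier in positive-ion mode


def mz(neutral_mass: float, charge: int) -> float:
    """Return m/z for an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical peptide structure with a neutral monoisotopic mass of 2000.0 Da.
for z in (1, 2, 3):
    print(f"z={z}: m/z={mz(2000.0, z):.4f}")
```

The same neutral molecule therefore appears at a different m/z for each charge state, which is one reason a single peptide structure can be associated with more than one m/z value.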
[0104] The term “patient,” as used herein, generally refers to a mammalian subject. The mammal can be a human, or an animal including, but not limited to an equine, porcine, canine, feline, ungulate, and primate animal. In one embodiment, the individual is a human. The methods and uses described herein are useful for both medical and veterinary uses. A “patient” is a human subject unless specified to the contrary.
[0105] The term “peptide,” as used herein, generally refers to amino acids linked by peptide bonds. Peptides can include amino acid chains between 10 and 50 residues. Peptides can include amino acid chains shorter than 10 residues, including oligopeptides, dipeptides, tripeptides, and tetrapeptides. Peptides can include chains longer than 50 residues and may be referred to as “polypeptides” or “proteins.” As used herein, the phrase “peptide,” is meant to include glycopeptides unless stated otherwise.
[0106] The terms “protein” or “polypeptide” or “peptide” may be used interchangeably herein and generally refer to a molecule including at least three amino acid residues. Proteins can include polymer chains made of amino acid sequences linked together by peptide bonds. Proteins may be digested in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access is limited to cleavage sites.
[0107] The term “peptide structure,” as used herein, generally refers to peptides or a portion thereof or glycopeptides or a portion thereof. In various embodiments described herein, a peptide structure can include any molecule comprising at least two amino acids in sequence.
[0108] The term “reduction,” as used herein, generally refers to the gain of an electron by a substance. In various embodiments described herein, a sugar can directly bind to a protein, thereby, reducing the amino acid to which it binds. Such reducing reactions can occur in glycosylation. In various embodiments, reduction may be used to break disulfide bonds between two cysteines.
[0109] The term “sample,” as used herein, generally refers to a sample from a subject of interest and may include a biological sample of a subject. The sample may include a cell sample. The sample may include a cell line or cell culture sample. The sample can include one or more cells. The sample can include one or more microbes. The sample may include a nucleic acid sample or protein sample. The sample may also include a carbohydrate sample or a lipid sample. The sample may be derived from another sample. The sample may include a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate. The sample may include a fluid sample, such as a blood sample, urine sample, or saliva sample. The sample may include a skin sample. The sample may include a cheek swab. The sample may include a plasma or serum sample. The sample may include a cell-free or cell free sample. A cell-free sample may include extracellular polynucleotides. The sample may originate from blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, or tears. The sample may originate from red blood cells or white blood cells. The sample may originate from feces, spinal fluid, CNS fluid, gastric fluid, amniotic fluid, cyst fluid, peritoneal fluid, marrow, bile, other body fluids, tissue obtained from a biopsy, skin, or hair.
[0110] The term “sequence,” as used herein, generally refers to a biological sequence including one-dimensional monomers that can be assembled to generate a polymer. Non-limiting examples of sequences include nucleotide sequences (e.g., ssDNA, dsDNA, and RNA), amino acid sequences (e.g., proteins, peptides, and polypeptides), and carbohydrates (e.g., compounds including Cm(H2O)n).
[0111] The term “subject,” as used herein, generally refers to an animal, such as a mammal (e.g., human) or avian (e.g., bird), or other organism, such as a plant. For example, the subject can include a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian or a human. Animals may include, but are not limited to, farm animals, sport animals, and pets. A subject can include a healthy or asymptomatic individual, an individual that has or is suspected of having a disease (e.g., cancer) or a pre-disposition to the disease, and/or an individual that is in need of therapy or suspected of needing therapy. A subject can be a patient. A subject can include a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses). However, in the context of diagnosing ovarian cancer, the subject is female unless explicitly specified otherwise. A subject may be one who has been previously identified as having a disease or a condition, and optionally has already undergone, or is undergoing, a therapeutic intervention for the disease or condition. Alternatively, a subject can also be one who has not been previously diagnosed as having a disease or a condition. For example, a subject can be one who exhibits one or more risk factors for a disease or a condition, or a subject who does not exhibit disease risk factors, or a subject who is asymptomatic for a disease or a condition. A subject can also be one who is suffering from or at risk of developing a disease or a condition.
[0112] The term “training data,” as used herein generally refers to data that can be input into models, statistical models, algorithms and any system or process able to use existing data to make predictions.
[0113] As used herein, a “model” may include one or more algorithms, one or more mathematical techniques, one or more machine learning algorithms, or a combination thereof.
[0114] As used herein, “machine learning” may be the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world. Machine learning uses algorithms that can learn from data without relying on rules-based programming. A machine learning algorithm may include a parametric model, a nonparametric model, a deep learning model, a neural network, a linear discriminant analysis model, a quadratic discriminant analysis model, a support vector machine, a random forest algorithm, a nearest neighbor algorithm, a combined discriminant analysis model, a k-means clustering algorithm, a supervised model, an unsupervised model, a logistic regression model, a multivariable regression model, a penalized multivariable regression model, or another type of model.
[0115] As used herein, an “artificial neural network” or “neural network” (NN) may refer to mathematical algorithms or computational models that mimic an interconnected group of artificial nodes or neurons that processes information based on a connectionistic approach to computation. Neural networks, which may also be referred to as neural nets, can employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters. In the various embodiments, a reference to a “neural network” may be a reference to one or more neural networks.
[0116] A neural network may process information in two ways: when it is being trained it is in training mode and when it puts what it has learned into practice it is in inference (or prediction) mode. Neural networks learn through a feedback process (e.g., backpropagation) which allows the network to adjust the weight factors (modifying its behavior) of the individual nodes in the intermediate hidden layers so that the output matches the outputs of the training data. In other words, a neural network learns by being fed training data (learning examples) and eventually learns how to reach the correct output, even when it is presented with a new range or set of inputs. A neural network may include, for example, without limitation, at least one of a Feedforward Neural Network (FNN), a Recurrent Neural Network (RNN), a Modular Neural Network (MNN), a Convolutional Neural Network (CNN), a Residual Neural Network (ResNet), an Ordinary Differential Equations Neural Networks (neural-ODE), or another type of neural network.
[0117] As used herein, a “target glycopeptide analyte,” may refer to a peptide structure (e.g., glycosylated or aglycosylated/non-glycosylated), a fraction of a peptide structure, a sub-structure (e.g., a glycan or a glycosylation site) of a peptide structure, a product of one or more of the above listed structures and sub-structures, associated detection molecules (e.g., signal molecule, label, or tag), or an amino acid sequence that can be measured by mass spectrometry.
[0118] As used herein, a “peptide data set,” may be used interchangeably with “peptide structure data” and can refer to any data of or relating to a peptide from a resulting mass spectrometry run. A peptide data set can comprise data obtained from a sample or biological sample using mass spectrometry. A peptide data set can comprise data relating to an external standard, data relating to an internal standard, and data relating to a target glycopeptide analyte of a sample. A peptide data set can result from analysis originating from a single run. In some embodiments, the peptide data set can include raw abundance and mass to charge ratios for one or more peptides.
[0119] As used herein, a “transition,” may refer to or identify a peptide structure. In some embodiments, a transition can refer to the specific pair of m/z values associated with a precursor ion and a product or fragment ion.
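A transition, understood as a precursor/product m/z pair tied to a peptide structure, can be represented with a small record type. The sketch below is illustrative only: the class name `Transition`, the identifiers `PS-1` and `PS-2`, and all m/z values are hypothetical placeholders rather than values taken from the Tables of this specification.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    peptide_structure: str  # hypothetical identifier for the peptide structure
    precursor_mz: float     # m/z of the precursor ion
    product_mz: float       # m/z of the product (fragment) ion


# Placeholder transition list; one peptide structure may have several product ions.
transitions = [
    Transition("PS-1", 1001.01, 366.14),
    Transition("PS-1", 1001.01, 204.09),
    Transition("PS-2", 845.40, 366.14),
]


def transitions_for(structure: str, table: list[Transition]) -> list[Transition]:
    """Return every transition monitored for the given peptide structure."""
    return [t for t in table if t.peptide_structure == structure]


print(transitions_for("PS-1", transitions))
```

Keeping the peptide structure identifier alongside the m/z pair lets abundances measured for several product ions be grouped back to one peptide structure later.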
[0120] As used herein, a “non-glycosylated endogenous peptide” (“NGEP”) may refer to a peptide structure that does not comprise a glycan molecule. In various embodiments, an NGEP and a target glycopeptide analyte can originate from the same subject. In various embodiments, an NGEP and a target glycopeptide analyte may be derived from the same protein sequence. In some embodiments, the NGEP and the target glycopeptide analyte may be derived from or include the same peptide sequence. In various embodiments, an NGEP can be labeled with an isotope in preparation for mass spectrometry analysis.
[0121] As used herein, “abundance,” may refer to a quantitative value generated using mass spectrometry. In various embodiments, the quantitative value may relate to the amount of a particular peptide structure. In some embodiments, the quantitative value may comprise an amount of an ion produced using mass spectrometry. In some embodiments, the quantitative value may be expressed as an m/z value. In other embodiments, the quantitative value may be expressed in atomic mass units.
[0122] As used herein, “relative abundance,” may refer to a comparison of two or more abundances. In various embodiments, the comparison may comprise comparing one peptide structure to a total number of peptide structures. In some embodiments, the comparison may comprise comparing one peptide glycoform (e.g., two identical peptides differing by one or more glycans) to a set of peptide glycoforms. In some embodiments, the comparison may comprise comparing a number of ions having a particular m/z ratio to a total number of ions detected. In various embodiments, a relative abundance can be expressed as a ratio. In other embodiments, a relative abundance can be expressed as a percentage. Relative abundance can be presented on a y-axis of a mass spectrum plot.
[0123] As used herein, an “internal standard,” may refer to something that can be contained (e.g., spiked-in) in the same sample as a target glycopeptide analyte undergoing mass spectrometry analysis. Internal standards can be used for calibration purposes. Additionally, internal standards can be used in the systems and methods described herein. In some aspects, an internal standard can be selected based on similarity of m/z and/or retention times and can be a “surrogate” if a specific standard is too costly or unavailable. Internal standards can be heavy labeled or non-heavy labeled.
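The relative abundance comparison defined in paragraph [0122], in which one peptide glycoform is compared to a set of glycoforms, can be sketched as follows. The function name and the glycoform labels and abundances are hypothetical and serve only to show the ratio and percentage forms.

```python
def relative_abundances(abundances: dict[str, float]) -> dict[str, float]:
    """Express each abundance as a fraction of the summed abundance of the set."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {name: value / total for name, value in abundances.items()}


# Hypothetical raw abundances for three glycoforms of the same peptide.
glycoforms = {"glycoform_A": 600.0, "glycoform_B": 300.0, "glycoform_C": 100.0}

rel = relative_abundances(glycoforms)                      # ratio form
percent = {name: 100.0 * r for name, r in rel.items()}     # percentage form
print(rel)
print(percent)
```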
III. Overview of Exemplary Workflow
[0124] Figure 1 is a schematic diagram of an exemplary workflow 100 for the detection of peptide structures associated with a disease state for use in diagnosis and/or treatment in accordance with one or more embodiments. Workflow 100 may include various operations including, for example, sample collection 102, sample intake 104, sample preparation and processing 106, data analysis 108, and output generation 110.
[0125] Sample collection 102 may include, for example, obtaining a biological sample 112 of one or more subjects, such as subject 114. Biological sample 112 may take the form of a specimen obtained via one or more sampling methods. Biological sample 112 may be representative of subject 114 as a whole or of a specific tissue, cell type, or other category or sub-category of interest. Biological sample 112 may be obtained in any of a number of different ways. In various embodiments, biological sample 112 includes whole blood sample 116 obtained via a blood draw. In other embodiments, biological sample 112 includes set of aliquoted samples 118 that includes, for example, a serum sample, a plasma sample, a blood cell (e.g., white blood cell (WBC), red blood cell (RBC)) sample, another type of sample, or a combination thereof. Biological sample 112 may include nucleotides (e.g., ssDNA, dsDNA, RNA), organelles, amino acids, peptides, proteins, carbohydrates, glycoproteins, or any combination thereof.
[0126] In various embodiments, a single run can analyze a sample (e.g., the sample including a peptide analyte), an external standard (e.g., an NGEP of a serum sample), and an internal standard. As such, abundance or raw abundance for the external standard, the internal standard, and target glycopeptide analyte can be determined by mass spectrometry in the same run.
[0127] In various embodiments, external standards may be analyzed prior to analyzing samples. In various embodiments, the external standards can be run independently between the samples. In some embodiments, external standards can be analyzed after every 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more experiments. In various embodiments, external standard data can be used in some or all of the normalization systems and methods described herein. In additional embodiments, blank samples may be processed to prevent column fouling.
[0128] Sample intake 104 may include one or more various operations such as, for example, aliquoting, registering, processing, storing, thawing, and/or other types of operations. In one or more embodiments, when biological sample 112 includes whole blood sample 116, sample intake 104 includes aliquoting whole blood sample 116 to form a set of aliquoted samples that can then be sub-aliquoted to form set of samples 120.
[0129] Sample preparation and processing 106 may include, for example, one or more operations to form set of peptide structures 122. In various embodiments, set of peptide structures 122 may include various fragments of unfolded proteins that have undergone digestion and may be ready for analysis.
[0130] Further, sample preparation and processing 106 may include, for example, data acquisition 124 based on set of peptide structures 122. For example, data acquisition 124 may include use of, without limitation, a liquid chromatography/mass spectrometry (LC/MS) system.
[0131] Data analysis 108 may include, for example, peptide structure analysis 126. In some embodiments, data analysis 108 also includes output generation 110. In other embodiments, output generation 110 may be considered a separate operation from data analysis 108. Output generation 110 may include, for example, generating final output 128 based on the results of peptide structure analysis 126. Final output 128 may be used for determining research, diagnosis, and/or treatment.
[0132] In various embodiments, final output 128 is comprised of one or more outputs. Final output 128 may take various forms. For example, final output 128 may be a report that includes, for example, a diagnosis output, a treatment output (e.g., a treatment design output, a treatment plan output, or combination thereof), analyzed data (e.g., relativized and normalized), or a combination thereof. In some embodiments, the report can comprise a target glycopeptide analyte concentration as a function of the NGEP concentration value and the normalized abundance. In some embodiments, final output 128 may be an alert (e.g., a visual alert, an audible alert, etc.), a notification (e.g., a visual notification, an audible notification, an email notification, etc.), an email output, or a combination thereof. In some embodiments, final output 128 may be sent to remote system 130 for processing. Remote system 130 may include, for example, a computer system, a server, a processor, a cloud computing platform, cloud storage, a laptop, a tablet, a smartphone, some other type of mobile computing device, or a combination thereof.
[0133] In other embodiments, workflow 100 may optionally exclude one or more of the operations described herein and/or may optionally include one or more other steps or operations other than those described herein (e.g., in addition to and/or instead of those described herein). Accordingly, workflow 100 may be implemented in any of a number of different ways for use in the research, diagnosis, and/or treatment of a disease state.
IV. Detection and Quantification of Peptide Structures
[0134] Figures 2A and 2B are schematic diagrams of a workflow for sample preparation and processing 106 in accordance with one or more embodiments. Figures 2A and 2B are described with continuing reference to Figure 1. Sample preparation and processing 106 may include, for example, preparation workflow 200 shown in Figure 2A and data acquisition 124 shown in Figure 2B.
IV.A. Sample Preparation and Processing
[0135] Figure 2A is a schematic diagram of preparation workflow 200 in accordance with one or more embodiments. Preparation workflow 200 may be used to prepare a sample, such as a sample of set of samples 120 in Figure 1, for analysis via data acquisition 124. For example, this analysis may be performed via mass spectrometry (e.g., LC-MS). In various embodiments, preparation workflow 200 may include denaturation and reduction 202, alkylation 204, and digestion 206. All areas of the preparation workflow can cause inconsistency between different samples and different experiments, necessitating the improved normalization systems and methods described herein and throughout.
[0136] In general, polymers, such as proteins, in their native form, can fold to include secondary, tertiary, and/or other higher order structures. Such higher order structures may functionalize proteins to complete tasks (e.g., enable enzymatic activity) in a subject. Further, such higher order structures of polymers may be maintained via various interactions between side chains of amino acids within the polymers. Such interactions can include ionic bonding, hydrophobic interactions, hydrogen bonding, and disulfide linkages between cysteine residues. However, when using analytic systems and methods, including mass spectrometry, unfolding such polymers (e.g., peptide/protein molecules) may be desired to obtain sequence information. In some embodiments, unfolding a polymer may include denaturing the polymer, which may include, for example, linearizing the polymer.
[0137] In one or more embodiments, denaturation and reduction 202 can be used to disrupt higher order structures (e.g., secondary, tertiary, quaternary, etc.) of one or more proteins (e.g., polypeptides and peptides) in a sample (e.g., one of set of samples 120 in Figure 1). Denaturation and reduction 202 includes, for example, a denaturation procedure and a reduction procedure. In some embodiments, the denaturation procedure may be performed using, for example, thermal denaturation, where heat is used as a denaturing agent. The thermal denaturation can disrupt ionic bonding, hydrophobic interactions, and/or hydrogen bonding.
[0138] In various embodiments, the denaturation procedure may include using one or more denaturing agents. In one or more embodiments, the denaturation procedure may include using temperature. In one or more embodiments, the denaturation procedure may include using one or more denaturing agents in combination with heat. These one or more denaturing agents may include, for example, but are not limited to, any number of chaotropic salts (e.g., urea, guanidine), surfactants (e.g., sodium dodecyl sulfate (SDS), beta octyl glucoside, Triton X-100), or combination thereof. In some cases, such denaturing agents may be used in combination with heat when sample preparation workflow further includes a cleanup procedure.
[0139] The resulting one or more denatured (e.g., unfolded, linearized) proteins may then undergo further processing in preparation of analysis. For example, a reduction procedure may be performed in which one or more reducing agents are applied. In various embodiments, a reducing agent can produce an alkaline pH. A reducing agent may take the form of, for example, without limitation, dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or some other reducing agent. The reducing agent may reduce (e.g., cleave) the disulfide linkages between cysteine residues of the one or more denatured proteins to form one or more reduced proteins.
[0140] In various embodiments, the one or more reduced proteins resulting from denaturation and reduction 202 may undergo a process to prevent the reformation of disulfide linkages between, for example, the cysteine residues of the one or more reduced proteins. This process may be implemented using alkylation 204 to form one or more alkylated proteins. For example, alkylation 204 may be used to add an acetamide group to a sulfur on each cysteine residue to prevent disulfide linkages from reforming. In various embodiments, an acetamide group can be added by reacting one or more alkylating agents with a reduced protein. The one or more alkylating agents may include, for example, one or more acetamide salts. An alkylating agent may take the form of, for example, iodoacetamide (IAA), 2-chloroacetamide, some other type of acetamide salt, or some other type of alkylating agent.
[0141] In some embodiments, alkylation 204 may include a quenching procedure. The quenching procedure may be performed using one or more reducing agents (e.g., one or more of the reducing agents described above).
[0142] In various embodiments, the one or more alkylated proteins formed via alkylation 204 can then undergo digestion 206 in preparation for analysis (e.g., mass spectrometry analysis). Digestion 206 of a protein may include cleaving the protein at or around one or more cleavage sites (e.g., site 205 which may be one or more amino acid residues). For example, without limitation, an alkylated protein may be cleaved at the carboxyl side of the lysine or arginine residues. This type of cleavage may break the protein into various segments, which include one or more peptide structures (e.g., glycosylated or aglycosylated).
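Cleavage at the carboxyl side of lysine (K) and arginine (R) residues, as described above, can be mimicked in silico. The sketch below is a deliberate simplification: it cleaves after every K or R, does not model missed cleavages or any enzyme-specific exceptions, and uses a made-up ten-residue sequence. The function name is hypothetical.

```python
def cleave_after_k_or_r(sequence: str) -> list[str]:
    """Split a protein sequence on the C-terminal side of each K or R residue."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])  # include the K/R itself
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])           # trailing segment, if any
    return peptides


# Made-up sequence for illustration only.
print(cleave_after_k_or_r("AGKNVSRLLE"))
```

Each resulting segment ends in K or R except possibly the last one, which is the pattern expected for the segments produced by this type of cleavage.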
[0143] In various embodiments, digestion 206 is performed using one or more proteolysis catalysts. For example, an enzyme can be used in digestion 206. In some embodiments, the enzyme takes the form of trypsin. In other embodiments, one or more other types of enzymes (e.g., proteases) may be used in addition to or in place of trypsin. These one or more other enzymes include, but are not limited to, LysC, LysN, AspN, GluC, and ArgC. In some embodiments, digestion 206 may be performed using tosyl phenylalanyl chloromethyl ketone (TPCK)-treated trypsin, one or more engineered forms of trypsin, one or more other formulations of trypsin, or a combination thereof. In some embodiments, digestion 206 may be performed in multiple steps, with each involving the use of one or more digestion agents. For example, a secondary digestion, tertiary digestion, etc. may be performed. In one or more embodiments, trypsin is used to digest serum samples. In one or more embodiments, trypsin/LysC cocktails are used to digest plasma samples.
[0144] In some embodiments, digestion 206 further includes a quenching procedure. The quenching procedure may be performed by acidifying the sample (e.g., to a pH <3). In some embodiments, formic acid may be used to perform this acidification.
[0145] In various embodiments, preparation workflow 200 further includes post-digestion procedure 207. Post-digestion procedure 207 may include, for example, a cleanup procedure. The cleanup procedure may include, for example, the removal of unwanted components in the sample that results from digestion 206. For example, unwanted components may include, but are not limited to, inorganic ions, surfactants, etc. In some embodiments, post-digestion procedure 207 further includes a procedure for the addition of heavy-labeled peptide internal standards.
[0146] Although preparation workflow 200 has been described with respect to a sample created or taken from biological sample 112 that is blood-based (e.g., a whole blood sample, a plasma sample, a serum sample, etc.), preparation workflow 200 may be similarly implemented for other types of samples (e.g., tears, urine, tissue, interstitial fluids, sputum, etc.) to produce set of peptide structures 122.
IV.B. Peptide Structure Identification and Quantitation
[0147] Figure 2B is a schematic diagram of data acquisition 124 in accordance with one or more embodiments. In various embodiments, data acquisition 124 can commence following sample preparation 200 described in Figure 2A. In various embodiments, data acquisition 124 can comprise quantification 208, quality control 210, and peak integration and normalization 212.
[0148] In various embodiments, targeted quantification 208 of peptides and glycopeptides can incorporate use of liquid chromatography-mass spectrometry (LC/MS) instrumentation. For example, LC-MS/MS, or tandem MS, may be used. In general, LC/MS (e.g., LC-MS/MS) can combine the physical separation capabilities of liquid chromatography (LC) with the mass analysis capabilities of mass spectrometry (MS). According to some embodiments described herein, this technique allows the separated digested peptides to be fed from the LC column into the MS ion source through an interface.
[0149] In various embodiments, any LC/MS device can be incorporated into the workflow described herein. In various embodiments, an instrument or instrument system suited for identification and targeted quantification 208 may include, for example, a Triple Quadrupole LC/MS™. In various embodiments, targeted quantification 208 is performed using multiple reaction monitoring mass spectrometry (MRM-MS).
[0150] In various embodiments described herein, identification of a particular protein or peptide and an associated quantity can be assessed. In various embodiments described herein, identification of a particular glycan and an associated quantity can be assessed. In various embodiments described herein, particular glycans can be matched to a glycosylation site on a protein or peptide and the abundances measured.
[0151] In some cases, targeted quantification 208 includes using a specific collision energy associated with the appropriate fragmentation to consistently see an abundant product ion. Glycopeptide structures may have a lower collision energy than aglycosylated peptide structures. When analyzing a sample that includes glycopeptide structures, the source voltage and gas temperature may be lowered as compared to generic proteomic analysis.
[0152] In various embodiments, quality control 210 procedures can be put in place to optimize data quality. In various embodiments, measures can be put in place allowing only errors within acceptable ranges outside of an expected value. In various embodiments, employing statistical models (e.g., using Westgard rules) can assist in quality control 210. For example, quality control 210 may include assessing the retention time and abundance of representative peptide structures (e.g., glycosylated and/or aglycosylated) and spiked-in internal standards, in either every sample, or in each quality control sample (e.g., pooled serum digest).
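As one illustration of rule-based quality control, two commonly used Westgard rules can be checked against a series of quality control measurements: the 1_3s rule (one value more than 3 standard deviations from the mean) and the 2_2s rule (two consecutive values beyond 2 standard deviations on the same side of the mean). The sketch below assumes an established mean and standard deviation for the monitored quantity; the function name, the choice of these two rules, and the example values are assumptions and do not describe the specific quality control procedure of any embodiment.

```python
def westgard_flags(values: list[float], mean: float, sd: float) -> list[tuple[int, str]]:
    """Return (index, rule) pairs for 1_3s and 2_2s violations in a QC series."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = [(v - mean) / sd for v in values]
    flags = []
    for i, zi in enumerate(z):
        # 1_3s: a single measurement beyond 3 SD from the mean
        if abs(zi) > 3:
            flags.append((i, "1_3s"))
        # 2_2s: this and the previous measurement beyond 2 SD on the same side
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            flags.append((i, "2_2s"))
    return flags


# Hypothetical abundances of an internal standard across five QC injections.
qc_values = [101.0, 105.0, 104.5, 99.0, 107.0]
print(westgard_flags(qc_values, mean=100.0, sd=2.0))
```

The same check could be applied to retention times instead of abundances by supplying the corresponding mean and standard deviation.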
[0153] Peak integration and normalization 212 may be performed to process the data that has been generated and transform the data into a format for analysis. For example, peak integration and normalization 212 may include converting abundance data for various product ions that were detected for a selected peptide structure into a single quantification metric (e.g., a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, a normalized concentration, etc.) for that peptide structure. In some embodiments, peak integration and normalization 212 may be performed using one or more of the techniques described in U.S. Patent Publication No. 2020/0372973A1 and/or US Patent Publication No. 2020/0240996A1, the disclosures of which are incorporated by reference herein in their entireties.
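The idea of converting abundance data for several product ions into a single quantification metric can be sketched generically. The example below integrates each chromatographic trace with the trapezoidal rule, sums the areas of the product ions of one peptide structure, and divides by the area of a spiked-in internal standard. This is an assumed, generic approach chosen for illustration; it is not the technique of the patent publications incorporated by reference above, and all names and numbers are hypothetical.

```python
Trace = tuple[list[float], list[float]]  # (retention times, intensities)


def trapezoid_area(times: list[float], intensities: list[float]) -> float:
    """Integrate an intensity trace over time using the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    return sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


def normalized_quantity(product_ion_traces: list[Trace], standard_trace: Trace) -> float:
    """Sum product-ion peak areas and normalize to the internal standard area."""
    analyte_area = sum(trapezoid_area(t, y) for t, y in product_ion_traces)
    standard_area = trapezoid_area(*standard_trace)
    if standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    return analyte_area / standard_area


# Hypothetical three-point peaks for two product ions and one internal standard.
product_ions = [([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]), ([0.0, 1.0, 2.0], [0.0, 4.0, 0.0])]
standard = ([0.0, 1.0, 2.0], [0.0, 20.0, 0.0])
print(normalized_quantity(product_ions, standard))
```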
V. Peptide Structure Data Analysis
V.A. Exemplary System for Peptide Structure Data Analysis
V.A.1. Analysis System for Peptide Structure Data Analysis
[0154] Figure 3 is a block diagram of an analysis system 300 in accordance with one or more embodiments. Analysis system 300 can be used to both detect and analyze various peptide structures that have been associated with various disease states. Analysis system 300 is one example of an implementation for a system that may be used to perform data analysis 108 in Figure 1. Thus, analysis system 300 is described with continuing reference to workflow 100 as described in Figures 1, 2A, and/or 2B.
[0155] Analysis system 300 may include computing platform 302 and data store 304. In some embodiments, analysis system 300 also includes display system 306. Computing platform 302 may take various forms. In one or more embodiments, computing platform 302 includes a single computer (or computer system) or multiple computers in communication with each other. In other examples, computing platform 302 takes the form of a cloud computing platform.
[0156] Data store 304 and display system 306 may each be in communication with computing platform 302. In some examples, data store 304, display system 306, or both may be considered part of or otherwise integrated with computing platform 302. Thus, in some examples, computing platform 302, data store 304, and display system 306 may be separate components in communication with each other, but in other examples, some combination of these components may be integrated together. Communication between these different components may be implemented using any number of wired communications links, wireless communications links, optical communications links, or a combination thereof.
[0157] Analysis system 300 includes, for example, peptide structure analyzer 308, which may be implemented using hardware, software, firmware, or a combination thereof. In one or more embodiments, peptide structure analyzer 308 is implemented using computing platform 302.
[0158] Peptide structure analyzer 308 receives peptide structure data 310 for processing. Peptide structure data 310 may be, for example, the peptide structure data that is output from sample preparation and processing 106 in Figures 1, 2A, and 2B. Accordingly, peptide structure data 310 may correspond to set of peptide structures 122 identified for biological sample 112 and may thereby correspond to biological sample 112.
[0159] Peptide structure data 310 can be sent as input into peptide structure analyzer 308, retrieved from data store 304 or some other type of storage (e.g., cloud storage), accessed from cloud storage, or obtained in some other manner. In some cases, peptide structure data 310 may be retrieved from data store 304 in response to (e.g., directly or indirectly based on) receiving user input entered by a user via an input device.
[0160] Peptide structure analyzer 308 includes model 312 that is configured to receive peptide structure data 310 for processing. Model 312 may be implemented in any of a number of different ways. Model 312 may be implemented using any number of models, functions, equations, algorithms, and/or other mathematical techniques.
[0161] In one or more embodiments, model 312 includes machine learning system 314, which may itself be comprised of any number of machine learning models and/or algorithms. For example, machine learning system 314 may include, but is not limited to, at least one of a deep learning model, a neural network, a linear discriminant analysis model, a quadratic discriminant analysis model, a support vector machine, a random forest algorithm, a nearest neighbor algorithm (e.g., a k-Nearest Neighbors algorithm), a combined discriminant analysis model, a k-means clustering algorithm, an unsupervised model, a multivariable regression model, a penalized multivariable regression model, or another type of model. In various embodiments, model 312 includes a machine learning system 314 that comprises any number of or combination of the models or algorithms described above.
[0162] In various embodiments, model 312 analyzes peptide structure data 310 to generate disease indicator 316 that indicates whether the biological sample is positive for an ovarian cancer disease state based on set of peptide structures 318 identified as being associated with the ovarian cancer disease state. Peptide structure data 310 may include quantification data for the plurality of peptide structures. Quantification data for a peptide structure can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration. For example, peptide structure data 310 may include a set of quantification metrics for each peptide structure of a plurality of peptide structures. A quantification metric for a peptide structure may be selected as one of a relative quantity, an adjusted quantity, a normalized quantity, a relative abundance, an adjusted abundance, and a normalized abundance. In some cases, a quantification metric for a peptide structure is selected from one of a relative concentration, an adjusted concentration, and a normalized concentration. In one or more embodiments, the quantification metrics used are normalized abundances. In this manner, peptide structure data 310 may provide abundance information about the plurality of peptide structures with respect to biological sample 112.
[0163] Disease indicator 316 may take various forms. In some examples, disease indicator 316 includes a classification that indicates whether or not the subject is positive for the ovarian cancer disease state. In various embodiments, disease indicator 316 can include a score 320. Score 320 indicates whether the ovarian cancer disease state is present or not. For example, score 320 may be a probability score that indicates how likely it is that the biological sample 112 evidences the presence of the ovarian cancer disease state.
[0164] In one or more embodiments, a peptide structure of set of peptide structures 318 comprises a glycosylated peptide structure, or glycopeptide structure, that is defined by a peptide sequence and a glycan structure attached to a linking site of the peptide sequence. For example, the peptide structure may be a glycopeptide or a portion of a glycopeptide. In some embodiments, a peptide structure of set of peptide structures 318 comprises an aglycosylated peptide structure that is defined by a peptide sequence. For example, the peptide structure may be a peptide or a portion of a peptide and may be referred to as a quantification peptide.
[0165] Set of peptide structures 318 may be identified as being those most predictive or relevant to the ovarian cancer disease state based on training of model 312. In one or more embodiments, set of peptide structures 318 includes at least one, at least two, or at least three peptide structures from a first group of peptide structures (peptide structures PS-1 through PS-10) identified in Table 1 in Section VI. A., or at least one, at least two, or at least three peptide structures from a second group of peptide structures (peptide structures PS-5 and PS-11 through PS-34) identified in Table 2 in Section VI. A. For example, in one or more embodiments, set of peptide structures 318 includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures identified in Table 1 below in Section VI. A. In one or more other embodiments, set of peptide structures 318 includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures identified in Table 2 below in Section VI. A. In one or more embodiments, set of peptide structures 318 includes at least peptide structure PS-5, which is identified in both Table 1 and Table 2. In some cases, the number of peptide structures selected from Table 1 for inclusion in set of peptide structures 318 may be based on, for example, a desired level of accuracy.
[0166] In one or more embodiments, set of peptide structures 318 includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures identified in Table 3 below in Section VI. A. In one or more embodiments, set of peptide structures 318 includes at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, or all 61 of the peptide structures listed in Tables 1, 2, and 3.
[0167] In various embodiments, machine learning system 314 takes the form of binary classification model 322. Binary classification model 322 may include, for example, but is not limited to, a regression model. Binary classification model 322 may include, for example, a penalized multivariable regression model that is trained to identify set of peptide structures 318 from a plurality of (or panel of) peptide structures identified in various subjects. Binary classification model 322 may be trained to identify weight coefficients for peptide structures, and those peptide structures having non-zero weights or weight coefficients above a selected threshold (e.g., absolute weight coefficient above 0.0, 0.01, 0.05, 0.1, 0.015, 0.2, etc.) may be selected for inclusion in set of peptide structures 318.
[0168] Peptide structure analyzer 308 may generate final output 128 based on disease indicator 316 output by model 312. In other embodiments, final output 128 may be an output generated by model 312.
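The weight-based selection described for binary classification model 322 can be sketched as a simple filter over trained coefficients. This is only an illustrative sketch: the function name, the peptide structure labels, and the coefficient values are hypothetical and are not taken from Tables 1 through 3, and the training of the penalized regression itself is not shown.

```python
from typing import Dict, List


def select_peptide_structures(weights: Dict[str, float],
                              threshold: float = 0.0) -> List[str]:
    """Keep peptide structures whose absolute weight exceeds the threshold.

    With threshold=0.0 this keeps every structure with a non-zero weight,
    which is the usual outcome of a sparsity-inducing penalty.
    """
    return [name for name, w in weights.items() if abs(w) > threshold]


# Hypothetical coefficients from a trained penalized regression model.
weights = {"PS-A": 0.42, "PS-B": 0.0, "PS-C": -0.07, "PS-D": 0.003}
selected = select_peptide_structures(weights, threshold=0.01)
print(selected)
```

Using the absolute value keeps structures with strongly negative coefficients, since those may be as informative as structures with positive coefficients.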
[0169] In some embodiments, final output 128 includes disease indicator 316. In one or more embodiments, final output 128 includes diagnosis output 324, treatment output 326, or both. Diagnosis output 324 may include, for example, a diagnosis for the ovarian cancer disease state. The diagnosis can include a positive diagnosis or a negative diagnosis for the ovarian cancer disease state. In one or more embodiments, generating diagnosis output 324 may include comparing score 320 to selected threshold 328 to determine the diagnosis. Selected threshold 328 may be, for example, without limitation, a score between 0.30 and 0.65 (e.g., 0.4, 0.5, 0.6, etc.). For example, when selected threshold 328 is set to 0.5, a score 320 above 0.5 (or at or above 0.5) may indicate the presence of the ovarian cancer disease state and be output in diagnosis output 324 as a positive diagnosis. A score 320 below 0.5 (or at or below 0.5) may indicate that the ovarian cancer disease state is not present and be output in diagnosis output 324 as a negative diagnosis. In one or more embodiments, a negative diagnosis indicates that the subject is healthy. In one or more embodiments, a negative diagnosis indicates that a detected pelvic tumor (or mass) is benign.
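The comparison of score 320 with selected threshold 328 can be sketched as follows. The function name and the `inclusive` flag are assumptions introduced here to cover both the "above" and the "at or above" variants mentioned in the text; the range check on the threshold mirrors the 0.30 to 0.65 interval given as an example and is not a requirement stated by the source.

```python
def diagnose(score: float, threshold: float = 0.5, inclusive: bool = True) -> str:
    """Map a probability score to a 'positive' or 'negative' diagnosis.

    inclusive=True treats a score equal to the threshold as positive
    ("at or above"); inclusive=False requires the score to be strictly above.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if not 0.30 <= threshold <= 0.65:
        raise ValueError("threshold outside the example range 0.30 to 0.65")
    positive = score >= threshold if inclusive else score > threshold
    return "positive" if positive else "negative"


print(diagnose(0.62))                   # score above the default 0.5 threshold
print(diagnose(0.5, inclusive=False))   # boundary case, strict comparison
```

Making the boundary behavior explicit avoids ambiguity for scores that land exactly on the threshold.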
[0170] In one or more embodiments, when disease indicator 316 and/or diagnosis output 324 indicate a positive diagnosis for the ovarian cancer disease state, a biopsy may be recommended. For example, a biopsy of the subject may be performed in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state. In some embodiments, peptide structure analyzer 308 (or another system implemented on computing platform 302) may generate a report recommending that a biopsy is to be performed for the subject in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state. In other embodiments, peptide structure analyzer 308 may send final output 128 to remote system 130 over one or more wireless, wired, and/or optical communications links and remote system 130 may generate a report recommending that a biopsy is to be performed for the subject in response to disease indicator 316 and/or diagnosis output 324 indicating a positive diagnosis for the ovarian cancer disease state. The biopsy may be used to confirm the diagnosis to determine whether or not to administer treatment and/or how quickly to administer treatment. When disease indicator 316 and/or diagnosis output 324 indicate a negative diagnosis for the ovarian cancer disease state (e.g., benign pelvic tumor), the report that is generated by peptide structure analyzer 308, remote system 130, or some other system implemented on computing platform 142 may recommend a period of monitoring for the subject. A negative diagnosis indication by disease indicator 316 and/or diagnosis output 324 may thus help prevent unnecessary treatment or overtreatment of the subject.
[0171] Treatment output 326 may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both. Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment. The treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
[0172] Final output 128 may be sent to remote system 130 for processing in some examples. In other embodiments, final output 128 may be displayed on graphical user interface 330 in display system 306 for viewing by a human operator.
V.A.2. Computer Implemented System
[0173] Figure 4 is a block diagram of a computer system in accordance with various embodiments. Computer system 400 may be an example of one implementation for computing platform 302 described above in Figure 3.
[0174] In one or more examples, computer system 400 can include a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information. In various embodiments, computer system 400 can also include a memory, which can be a random-access memory (RAM) 406 or other dynamic storage device, coupled to bus 402 for determining instructions to be executed by processor 404. Memory also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404. In various embodiments, computer system 400 can further include a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk or optical disk, can be provided and coupled to bus 402 for storing information and instructions.
[0175] In various embodiments, computer system 400 can be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT), liquid crystal display (LCD), or light emitting diode (LED) for displaying information to a computer user. An input device 414, including alphanumeric and other keys, can be coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is a cursor control 416, such as a mouse, a joystick, a trackball, a gesture input device, a gaze-based input device, or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. This input device 414 typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. However, it should be understood that input devices 414 allowing for three-dimensional (e.g., x, y, and z) cursor movement are also contemplated herein.
[0176] Consistent with certain implementations of the present teachings, results can be provided by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in RAM 406. Such instructions can be read into RAM 406 from another computer-readable medium or computer-readable storage medium, such as storage device 410. Execution of the sequences of instructions contained in RAM 406 can cause processor 404 to perform the processes described herein. Alternatively, hard-wired circuitry can be used in place of or in combination with software instructions to implement the present teachings. Thus, implementations of the present teachings are not limited to any specific combination of hardware circuitry and software.
[0177] The term “computer-readable medium” (e.g., data store, data storage, storage device, data storage device, etc.) or “computer-readable storage medium” as used herein refers to any media that participates in providing instructions to processor 404 for execution. Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Examples of non-volatile media can include, but are not limited to, optical, solid state, magnetic disks, such as storage device 410. Examples of volatile media can include, but are not limited to, dynamic memory, such as RAM 406. Examples of transmission media can include, but are not limited to, coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 402.
[0178] Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.
[0179] In addition to computer readable medium, instructions or data can be provided as signals on transmission media included in a communications apparatus or system to provide sequences of one or more instructions to processor 404 of computer system 400 for execution. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the disclosure herein. Representative examples of data communications transmission connections can include, but are not limited to, telephone modem connections, wide area networks (WAN), local area networks (LAN), infrared data connections, NFC connections, optical communications connections, etc.
[0180] It should be appreciated that the methodologies described herein, flow charts, diagrams, and accompanying disclosure can be implemented using computer system 400 as a standalone device or on a distributed network of shared computer processing resources such as a cloud computing network.
[0181] The methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware, firmware, software, or any combination thereof. For a hardware implementation, the processing unit may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
[0182] In various embodiments, the methods of the present teachings may be implemented as firmware and/or a software program and applications written in conventional programming languages such as C, C++, Python, etc. If implemented as firmware and/or software, the embodiments described herein can be implemented on a non-transitory computer-readable medium in which a program is stored for causing a computer to perform the methods described above. It should be understood that the various engines described herein can be provided on a computer system, such as computer system 400, whereby processor 404 would execute the analyses and determinations provided by these engines, subject to instructions provided by any one of, or a combination of, the memory components RAM 406, ROM 408, or storage device 410 and user input provided via input device 414.
VI. Exemplary Methodologies Relating to Diagnosis based on Peptide Structure Data Analysis
VI. A. Exemplary Methodology — Based on Tables 1 and 2
[0183] Figure 5 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments. Process 500 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3. Process 500 may be used to generate a final output that includes at least a diagnosis output for the subject.
[0184] Step 502 includes receiving peptide structure data corresponding to a biological sample obtained from the subject. The peptide structure data may be, for example, one example of an implementation of peptide structure data 310 in Figure 3. The peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures. The quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures. A quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration. In this manner, the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample. In some cases, at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 1 or Table 2, with the peptide sequence being one of SEQ ID NOS: 11-19 in Table 1 or one of SEQ ID NOS: 14, 15, and 31-46 in Table 2, the SEQ ID NOS being defined in Table 5 below.
[0185] Step 504 includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences an ovarian cancer disease state based on at least three peptide structures selected from a first group of peptide structures identified in Table 1 (below) or a second group of peptide structures identified in Table 2 (below). In step 504, the first and second groups of peptide structures are associated with the ovarian cancer disease state. The first group of peptide structures is listed in Table 1 with respect to relative significance to the disease indicator. The second group of peptide structures is listed in Table 2 with respect to relative significance to the disease indicator.
[0186] The first group of peptide structures in Table 1 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and a healthy state. For example, the first group of peptide structures may be used to predict the probability of EOC for use in clinically screening patients. In one or more embodiments, the first group of peptide structures in Table 1 may also be peptide structures that have been determined relevant to distinguishing between ovarian cancer (e.g., EOC) and a benign tumor state (e.g., a benign pelvic tumor). For example, the first group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC.
[0187] The second group of peptide structures in Table 2 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and the benign tumor state (e.g., a benign pelvic tumor). For example, the second group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC. In this manner, the second group of peptide structures may predict malignancy of an identified pelvic tumor.
[0188] In one or more embodiments, the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 in Table 1. In some embodiments, the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-5 and PS-11 through PS-34 in Table 2. In some embodiments, the at least 3 peptide structures include at least PS-5, which is present in both Table 1 and Table 2.
[0189] In one or more embodiments, step 504 may be implemented using a binary classification model (e.g., a regression model). In some examples, the regression model may be, for example, a penalized multivariable regression model. In various embodiments, the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 3 peptide structures. The weight coefficient of a corresponding peptide structure of the at least 3 peptide structures may indicate the relative significance of the corresponding peptide structure to the disease indicator.
[0190] In some embodiments, step 504 may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 3 peptide structures. The weighted value for a peptide structure of the at least 3 peptide structures may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure. The disease indicator may be computed using the peptide structure profile. For example, the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
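The weighted-sum computation described above can be sketched as follows. The peptide structure profile and the logit follow the description directly; converting the logit to a probability with the logistic function is an assumption (it is the standard choice for a logistic regression model, but the text does not state it). The function name, labels, weights, metrics, and intercept are hypothetical values chosen for illustration.

```python
import math
from typing import Dict, Tuple


def disease_indicator(metrics: Dict[str, float],
                      weights: Dict[str, float],
                      intercept: float) -> Tuple[Dict[str, float], float, float]:
    """Return (peptide structure profile, logit, probability).

    profile[name] = weight coefficient * quantification metric
    logit         = intercept + sum of the weighted values
    probability   = logistic(logit)  # assumption, see lead-in
    """
    profile = {name: weights[name] * metrics[name] for name in weights}
    logit = intercept + sum(profile.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return profile, logit, probability


# Hypothetical trained coefficients and normalized abundances for three structures.
weights = {"PS-A": 2.0, "PS-B": -1.0, "PS-C": 0.5}
metrics = {"PS-A": 0.5, "PS-B": 1.0, "PS-C": 2.0}
profile, logit, probability = disease_indicator(metrics, weights, intercept=-1.0)
print(profile, logit, probability)
```

In this made-up example the weighted values are 1.0, -1.0, and 1.0, so the logit is 0.0 and the resulting probability is 0.5, which sits exactly on a 0.5 threshold.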
[0191] The peptide structure profile for a given peptide structure may include a corresponding feature — relative abundance, concentration, site occupancy — for that peptide structure. The relative abundance may be a normalized relative abundance; the concentration may be normalized concentration. In some cases, two peptide structure profiles may be computed for the same peptide structure, each profile corresponding to a different feature. For example, a first peptide structure profile may include a relative abundance for a corresponding peptide structure and a second peptide structure profile may include a concentration for the same corresponding peptide structure.
[0192] In various embodiments, the disease indicator comprises a probability that the biological sample is positive for the ovarian cancer disease state and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) the ovarian cancer disease state when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) the ovarian cancer disease state when the disease indicator is not greater than the selected threshold. The selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
[0193] Step 506 includes generating a final output based on the disease indicator. The final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3. The diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator. The diagnosis may be, for example, “positive” for the ovarian cancer disease state if the biological sample evidences the ovarian cancer disease state based on the disease indicator. The diagnosis may be, for example, “negative” if the biological sample does not evidence the ovarian cancer disease state based on the disease indicator. A negative diagnosis may mean that the biological sample has a non-ovarian cancer state. The negative diagnosis for the ovarian cancer disease state can include at least one of a healthy state, a benign tumor state, or some other non-malignant state.
[0194] Generating the diagnosis output in step 506 may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the ovarian cancer disease state. Alternatively, step 506 can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the ovarian cancer disease state. In some scoring systems, the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
[0195] In one or more embodiments, the final output in step 506 may include a treatment output if the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state. The treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both. Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment. The treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
[0196] Table 1 below lists a first group of peptide structures associated with malignant pelvic tumors (e.g., ovarian cancer such as EOC). One or more features (e.g., relative abundance, concentration, site occupancy) of these peptide structures may be used in the supervised machine learning model described above to generate a disease indicator that predicts the probability of malignancy (e.g., in the context of screening for malignant pelvic tumors). The first group of peptide structures is listed in Table 1 in order with respect to relative significance to the disease indicator. In training, testing, and predictive use of this model, the quantification metrics for peptide structure PS-9, peptide structure PS-10, or a combination of the two may form one input. Table 1 also identifies check markers CK-1 and CK-2, which may also be used by the model.
Table 1: 1st Group of Peptide Structures Associated with Ovarian Cancer
(may be used to distinguish between malignant pelvic tumor (e.g., EOC) and healthy)
[Table 1 is provided as an image in the original publication; its contents are not reproduced here.]
[0197] Table 2 below lists a second group of peptide structures associated with malignant pelvic tumors (e.g., ovarian cancer such as EOC). One or more features (e.g., relative abundance, concentration, site occupancy) of these peptide structures may be used in the supervised machine learning model described above to generate a disease indicator that predicts the probability of malignancy (e.g., in the context of triaging to distinguish between malignant and benign pelvic tumors). The second group of peptide structures is listed in Table 2 in order with respect to relative significance to the disease indicator. Table 2 also identifies check markers CK-3 and CK-4, which may also be used by the model.
Table 2: 2nd Group of Peptide Structures Associated with Ovarian Cancer
(may be used to distinguish between malignant v. benign pelvic tumors)
[Table 2 is provided as an image in the original publication; its contents are not reproduced here.]
VI.B.1 Exemplary Methodology — Based on Table 3
[0198] Figure 6 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments. Process 600 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3. Process 600 may be used to generate a final output that includes at least a diagnosis output for the subject.
[0199] Step 602 includes receiving peptide structure data corresponding to a biological sample obtained from the subject. The peptide structure data may be, for example, one example of an implementation of peptide structure data 310 in Figure 3. The peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures. The quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures. A quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration. In this manner, the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample. In some cases, at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3, with the peptide sequence being one of SEQ ID NOS: 11, 14, 15, 31, 32, 33, 34, 37, 38, 40, 42, 44, 45, 46, 53-65 in Table 3, the SEQ ID NOS being defined in Table 5 below.
[0200] Step 604 includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that predicts whether the biological sample evidences a malignant pelvic tumor or benign pelvic tumor based on at least three peptide structures selected from a group of peptide structures identified in Table 3. The group of peptide structures is listed in Table 3 with respect to relative significance to the disease indicator, which may be a probability score. In step 604, the group of peptide structures is associated with the malignancy (e.g., EOC). For example, the group of peptide structures in Table 3 includes peptide structures that have been determined relevant to distinguishing between a malignant and benign nature of a pelvic tumor.
[0201] In one or more embodiments, the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
[0202] In one or more embodiments, step 604 may be implemented using a binary classification model (e.g., a regression model). In some examples, the regression model may be a penalized multivariable regression model. In various embodiments, the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 3 peptide structures. The weight coefficient of a corresponding peptide structure of the at least 3 peptide structures may indicate the relative significance of the corresponding peptide structure to the disease indicator.
[0203] In some embodiments, step 604 may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 3 peptide structures. The weighted value for a peptide structure of the at least 3 peptide structures may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure. The disease indicator may be computed using the peptide structure profile. For example, the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
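The computation described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `disease_indicator`, the peptide structure labels `PS-A` to `PS-C`, and all numeric values are hypothetical. Converting the logit to a probability with the logistic function is also an assumption; it is standard for logistic regression but is not stated in this paragraph.

```python
import math


def disease_indicator(metrics, weights, intercept):
    """Return (profile, logit, probability) for one biological sample.

    metrics   -- quantification metric per peptide structure
    weights   -- trained weight coefficient per peptide structure
    intercept -- intercept value determined during training
    """
    # Peptide structure profile: weighted value = metric x weight coefficient.
    profile = {name: metrics[name] * weights[name] for name in weights}
    # Logit = sum of the weighted values plus the intercept.
    logit = sum(profile.values()) + intercept
    # Assumed link function: logistic (sigmoid) maps the logit to a probability.
    probability = 1.0 / (1.0 + math.exp(-logit))
    return profile, logit, probability


# Hypothetical weights and metrics, for illustration only.
weights = {"PS-A": 0.8, "PS-B": -0.5, "PS-C": 0.25}
metrics = {"PS-A": 1.5, "PS-B": 2.0, "PS-C": 4.0}
profile, logit, probability = disease_indicator(metrics, weights, intercept=-1.0)
```

A positive weight coefficient raises the indicator as the corresponding peptide structure becomes more abundant, and a negative weight coefficient lowers it.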
[0204] In various embodiments, the disease indicator comprises a probability that the biological sample evidences malignancy (e.g., EOC) and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) malignancy when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) malignancy when the disease indicator is not greater than the selected threshold. The selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
[0205] Step 606 includes generating a final output based on the disease indicator. The final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3. The diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator. The diagnosis may be, for example, “positive” for an ovarian cancer disease state (e.g., EOC) if the biological sample evidences malignancy based on the disease indicator. The diagnosis may be, for example, “negative” if the biological sample does not evidence malignancy based on the disease indicator. A negative diagnosis may mean that the biological sample evidences a benign status (or a non-ovarian cancer state).
[0206] Generating the diagnosis output in step 606 may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the ovarian cancer disease state. Alternatively, step 606 can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the ovarian cancer disease state. In some scoring systems, the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
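The threshold comparison described in paragraphs [0204] and [0206] can be sketched as follows. The function name `diagnose` and the `inclusive` flag are hypothetical; the flag only reflects that the text allows either an "above" or an "at or above" comparison.

```python
def diagnose(score, threshold=0.5, inclusive=False):
    """Map a probability score to a positive or negative diagnosis.

    inclusive=False: positive only when the score is above the threshold.
    inclusive=True:  positive when the score is at or above the threshold.
    """
    positive = score >= threshold if inclusive else score > threshold
    return "positive" if positive else "negative"


# Illustrative scores only; thresholds are taken from the 0.30 to 0.65 range.
print(diagnose(0.62))                  # above the default threshold of 0.5
print(diagnose(0.50))                  # not greater than 0.5
print(diagnose(0.50, inclusive=True))  # at the threshold, inclusive comparison
print(diagnose(0.31, threshold=0.30))  # lower threshold within the stated range
```

Lowering the threshold within the stated range trades fewer missed malignancies for more false positives, which is one reason a range is given instead of a single value.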
[0207] In one or more embodiments, the final output in step 606 may include a treatment output if the disease indicator predicts malignancy and/or the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state. The treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both. Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment. The treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
Table 3: 3rd Group of Peptide Structures Associated with Ovarian Cancer
(may be used to distinguish between malignant and benign pelvic tumors)
Figure imgf000047_0001
Figure imgf000048_0001
VI.B.2 Exemplary Methodology — Based on Table 3B
[0208] Figure 6B is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments. Process 600B may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3. Process 600B may be used to generate a final output that includes at least a diagnosis output for the subject such as, for example, early stage EOC or late stage EOC.
[0209] Step 602B includes receiving peptide structure data corresponding to a biological sample obtained from the subject. The peptide structure data may be, for example, one example of an implementation of peptide structure data 310 in Figure 3. The peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures. The quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures. A quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration. In this manner, the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample. In some cases, at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3B, with the peptide sequence being one of SEQ ID NOS: 14, 18, 32, 33, 37, 39, 42, 45, 54, 56, 60, 68-77 in Table 3B, the SEQ ID NOS being defined in Table 5 below. It should be noted that the glycopeptides of Table 3B were parts of glycoproteins that are further described in Table 6 and that the glycan portion of the glycopeptides is described in Table 7.
[0210] Step 604B includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that predicts whether the biological sample evidences an early stage or late stage EOC based on at least one peptide structure selected from a group of peptide structures identified in Table 3B. In step 604B, the group of peptide structures is associated with the early stage or late stage EOC. For example, the group of peptide structures in Table 3B includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) and late stage (stages 3 and 4) EOC.
[0211] In one or more embodiments, the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, or all 36 of the peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, and PS-62 to PS-90 identified in Table 3B.
[0212] In one or more embodiments, step 604B may be implemented using a binary classification model (e.g., a regression model). In some examples, the regression model may be a penalized multivariable regression model. In various embodiments, the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 1 peptide structure. The weight coefficient of a corresponding peptide structure of the at least 1 peptide structure may indicate the relative significance of the corresponding peptide structure to the disease indicator.
[0213] In some embodiments, step 604B may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 1 peptide structure. The weighted value for a peptide structure of the at least 1 peptide structure may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure. The disease indicator may be computed using the peptide structure profile. For example, the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
[0214] In various embodiments, the disease indicator comprises a probability that the biological sample evidences malignancy (e.g., EOC) and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) malignancy when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) malignancy when the disease indicator is not greater than the selected threshold. The selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
[0215] Step 606B includes generating a final output based on the disease indicator. The final output may include a diagnosis output, such as, for example, diagnosis output 324 in Figure 3. The diagnosis output may include the disease indicator, or a diagnosis made based on the disease indicator. The diagnosis may be, for example, early stage or late stage based on the disease indicator. An early stage diagnosis may mean that the biological sample evidences a stage 1 or 2 EOC. A late stage diagnosis may mean that the biological sample evidences a stage 3 or 4 EOC.
[0216] Generating the diagnosis output in step 606B may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the late stage ovarian cancer disease state. Alternatively, step 606B can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the late stage ovarian cancer disease state. In some scoring systems, the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
[0217] In one or more embodiments, the final output in step 606B may include a treatment output if the disease indicator predicts malignancy and/or the diagnosis output indicates a positive diagnosis for the ovarian cancer disease state. The treatment output may include, for example, at least one of an identification of a treatment for the subject, a treatment plan for administering the treatment, or both. Treatment for ovarian cancer may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment. The treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
Table 3B. Group of Peptide Structures Associated with Ovarian Cancer (may be used to distinguish between early stage v. late stage ovarian cancer)
Figure imgf000051_0001
Figure imgf000052_0001
[0218] It is worthwhile to note that, with a few exceptions (PS-62 and PS-67), the majority of the glycopeptides carried tri- or tetra-antennary glycans, with or without a fucose, and were found to be associated with either early stage or late stage EOC. Fold changes (FCs) for several glycopeptides in stage IV (referred to as metastatic ovarian cancer) vs. benign/stage I/II/III (referred to as non-metastatic ovarian cancer) were calculated by normalizing to normal blood samples, as illustrated in Figure 18. The FCs were observed to stratify between fucosylated and non-fucosylated glycopeptides (plots include median and 95% confidence interval). FCs above 1 corresponded to markers that correlate with metastatic ovarian cancer, and FCs below 1 corresponded to markers that correlate with non-metastatic ovarian cancer. The Wilcoxon matched-pairs signed rank test was used to compare the two groups, and the p value was found to be <0.0001, showing a statistically significant difference between the non-fucosylated and fucosylated glycopeptides. Figures 19A and 19B illustrate that the same set of markers, in a doublet/triplet analysis for fucosylation, revealed a strong association with either metastatic ovarian cancer or non-metastatic ovarian cancer. Doublet analysis refers to monitoring the fold change of a non-fucosylated and a fucosylated glycopeptide that were tri- or tetra-antennary for sialic acid and had the same peptide sequence and glycan linking site. Triplet analysis refers to monitoring the fold change of a non-fucosylated, a fucosylated, and a di-fucosylated glycopeptide that were tri- or tetra-antennary for sialic acid and had the same peptide sequence and glycan linking site. Figures 15A-15E show that the fucosylated biomarkers (having the number 1 as the second-to-last digit in the Peptide Structure (PS) Name) show a relatively upward trend from stage 1/2 to stage 3/4. In contrast, Figures 15A-15E show that the non-fucosylated biomarkers (having the number 0 as the second-to-last digit in the Peptide Structure (PS) Name) show a relatively downward trend from stage 1/2 to stage 3/4. For instance, the glycan numbers 6513, 7613, and 7614 are examples of fucosylated glycans having tri- or tetra-antennary sialic acids. The glycan numbers 6503, 7603, and 7604 are examples of non-fucosylated glycans having tri- or tetra-antennary sialic acids.
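The grouping behind the doublet and triplet analysis can be sketched as follows. This is an illustration under stated assumptions: the glycan number is read as four single digits (Hex, HexNAc, Fuc, NeuAc), the protein prefix `PROT` and site `100` are hypothetical, and the di-fucosylated number 7624 is a made-up example. The numbers 6503, 6513, 7604, and 7614 are taken from the paragraph above.

```python
from collections import defaultdict


def fucosylation_groups(ps_names):
    """Group PS names that differ only in their fucose count.

    Returns doublets (Fuc 0 and 1) and triplets (Fuc 0, 1 and 2) keyed by
    (protein, site, Hex, HexNAc, NeuAc), members ordered by fucose count.
    """
    groups = defaultdict(dict)
    for name in ps_names:
        parts = name.rsplit("_", 2)
        if len(parts) != 3:
            continue  # no site or glycan field in the name
        protein, site, glycan = parts
        if len(glycan) != 4 or not glycan.isdigit():
            continue  # assumption: four single-digit counts
        hexose, hexnac, fuc, neuac = (int(d) for d in glycan)
        if neuac not in (3, 4):
            continue  # keep only tri- or tetra-sialylated glycans
        groups[(protein, site, hexose, hexnac, neuac)][fuc] = name
    return {
        key: [members[f] for f in sorted(members)]
        for key, members in groups.items()
        if 0 in members and 1 in members  # need the non-fucosylated and fucosylated pair
    }


names = [
    "PROT_100_6503", "PROT_100_6513",                   # doublet
    "PROT_100_7604", "PROT_100_7614", "PROT_100_7624",  # triplet
    "PROT_100_5402",                                    # excluded: NeuAc count of 2
]
groups = fucosylation_groups(names)
```

Once grouped, the fold changes of the members of each doublet or triplet can be compared side by side, which is the comparison the paragraph describes for Figures 19A and 19B.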
VI.B.3 Exemplary Methodology — Based on Table 3C
[0219] In another embodiment, process 600B may be implemented using Table 3C instead of Table 3B. In some cases, at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3C, with the peptide sequence being one of SEQ ID NOS: 101-125 in Table 3C.
[0220] The group of peptide structures in Table 3C includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) or late stage (stages 3 and 4) EOC.
[0221] In one or more embodiments, the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide SEQ ID NOS: 101-125 identified in Table 3C.
Table 3C. Group of Peptide Structures Associated with Ovarian Cancer (may be used to distinguish between early stage v. late stage ovarian cancer)
Figure imgf000053_0001
Figure imgf000054_0001
[0222] In Table 3C, the first three or four characters before the first underscore of the peptide structure (PS) name correspond to the abbreviation of the protein name. More details on the protein sequence can be found in Table 6B below.
VI.B.4 Exemplary Methodology — Based on Table 3D
[0223] In another embodiment, process 600B may be implemented using Table 3D instead of Table 3B. In some cases, at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D. In other cases, at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131-134, 137, 139, 140, 143, 151, 165-167 in Table 3D.
[0224] The group of peptide structures in Table 3D includes peptide structures that have been determined relevant to distinguishing between early stage (stages 1 and 2) or late stage (stages 3 and 4) EOC.
[0225] In one or more embodiments, the at least 1 peptide structure includes at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, or all 50 of the peptide SEQ ID NOS: 126-175 identified in Table 3D.
Table 3D: Group of Peptide Structures Associated with Ovarian Cancer
(may be used to distinguish between early stage v. late stage ovarian cancer)
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
[0226] Tables 1, 2, 3, 3A, 3B, 3C, and 3D include the Peptide Structure (PS) Name (e.g., KNG1_294_6503), which is a reference code made up of the protein name (e.g., KNG1), followed by the glycan linking site position in the protein (e.g., the number 294, which is preceded by an underscore and represents a sequential amino acid position in protein KNG1), followed by the glycan structure GL number (e.g., the number 6503, which is preceded by an underscore and represents the glycan composition Hex(6)HexNAc(5)Fuc(0)NeuAc(3)). The Peptide Structure (PS) Name contains a prefix (which may include a combination of letters and numbers) that corresponds to the Protein Abbreviation of Table 6. The term Linking Site Pos. in Protein Sequence is a number that refers to the sequential position of the amino acid of the corresponding protein to which a glycan is attached. For the Linking Site Pos. in Protein Sequence, the amino acid position is defined by the sequentially numbered order of amino acids based on the UniProt ID of the corresponding protein for the peptide sequence. The term Linking Site Pos. in Peptide Sequence is a number that refers to the sequential position of the amino acid of the corresponding peptide to which a glycan is attached. For the Linking Site Pos. in Peptide Sequence, the amino acid position is defined by the sequentially numbered order of amino acids for the peptide sequence. The term Glycan Structure GL No. is a number that corresponds to a symbol structure and a composition of the glycan as indicated in Table 7. In some embodiments, the term AGP12 for SEQ ID NOs: 68-69 represents that the glycopeptide is a fragment of either AGP1 or AGP2.
[0227] In some instances, if the first number subsequent to the first underscore in the Peptide Structure (PS) NAME is inconsistent with the Glycan Linking Site Pos. in Protein Sequence column, then the Glycan Linking Site Pos. in Protein Sequence column should be used for identification of the peptide. In some instances, if the second number subsequent to the second underscore in the Peptide Structure (PS) NAME is inconsistent with the Glycan Structure GL NO column, then the Glycan Structure GL NO column should be used for identification of the glycan portion of the glycopeptide. If the Peptide Structure (PS) NAME does not contain any numbers, then the peptide is non-glycosylated. In some instances of the Peptide Structure (PS) NAME, subsequent to the prefix, there is a number noted with the notation MC, which indicates that there was a missed cleavage at the position in the peptide sequence noted by the number.
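The naming convention and the precedence rules in the two paragraphs above can be sketched as follows. This is an illustration under stated assumptions: the class `PeptideStructure` and the function `resolve` are hypothetical, the glycan number is read as four single digits, "does not contain any numbers" is interpreted as having no numeric fields after the prefix, and the missed-cleavage (MC) notation is not handled because its exact format is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeptideStructure:
    protein: str           # prefix, e.g. KNG1
    site: Optional[int]    # linking site position in the protein sequence
    glycan: Optional[str]  # glycan structure GL number, e.g. "6503"

    @property
    def composition(self):
        """Expand a four-digit GL number into a glycan composition string."""
        if self.glycan is None or len(self.glycan) != 4:
            return None
        h, n, f, s = self.glycan
        return f"Hex({h})HexNAc({n})Fuc({f})NeuAc({s})"


def resolve(ps_name, site_column=None, glycan_column=None):
    """Parse a PS name; table column values take precedence when supplied."""
    fields = ps_name.split("_")
    name_site = int(fields[1]) if len(fields) > 1 and fields[1].isdigit() else None
    name_glycan = fields[2] if len(fields) > 2 and fields[2].isdigit() else None
    site = site_column if site_column is not None else name_site
    glycan = glycan_column if glycan_column is not None else name_glycan
    return PeptideStructure(fields[0], site, glycan)


ps = resolve("KNG1_294_6503")
# Hypothetical inconsistency: the column value (295) overrides the name (294).
overridden = resolve("KNG1_294_6503", site_column=295)
# A name with no numeric fields after the prefix is treated as non-glycosylated.
bare = resolve("KNG1")
```

Keeping the column values as explicit arguments makes the precedence rule visible at the call site instead of burying it in the parser.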
VI. C. Training a Model to Predict Ovarian Cancer (e.g., Epithelial Ovarian Cancer)
[0228] Figure 7 is a flowchart of a process for training a model to diagnose a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments. Process 700 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3. In some embodiments, process 700 may be one example of an implementation for training the model used in process 500 in Figure 5, process 600 in Figure 6, or process 600B in Figure 6B.
[0229] Step 702 includes receiving quantification data for a panel of peptide structures for a plurality of subjects. The plurality of subjects may include a first portion diagnosed with a negative diagnosis of an ovarian cancer disease state and a second portion diagnosed with a positive diagnosis of the ovarian cancer disease state. The plurality of subjects may include a first portion having early stage EOC and a second portion having late stage EOC. The quantification data comprises an initial plurality of peptide structure profiles for the plurality of subjects. For example, a peptide structure profile in the initial plurality of peptide structure profiles may include a feature associated with a corresponding peptide structure. The feature may be relative abundance, concentration, site occupancy, or some other quantification-based feature. The initial plurality of peptide structure profiles may include one, two, three, or more profiles for a given peptide structure.
[0230] Step 704 includes training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state (e.g., the first group of peptide structures is identified in Table 1, the second group of peptide structures is identified in Table 2, the third group of peptide structures is identified in Table 3). The first, second, and third groups of peptide structures are listed in Tables 1, 2, and 3, respectively, with respect to relative significance to diagnosing the biological sample as evidencing malignancy (e.g., EOC). Step 704 can include training the machine learning model using a portion of the quantification data corresponding to a training group of peptide structures included in the plurality of peptide structures. Step 704 can include training a machine learning model using the quantification data to assess a biological sample with respect to the staging of the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state such as a group of peptide structures identified in Table 3B, 3C, or 3D.
[0231] Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 1 above. Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 2 above. Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Tables 3B, 3C, or 3D above.
[0232] Training data can be used for training the supervised machine learning model. The training data can include a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects. The plurality of subject diagnoses can include a positive diagnosis for any subject of the plurality of subjects determined to have the ovarian cancer disease state and a negative diagnosis for any subject of the plurality of subjects determined not to have the ovarian cancer disease state.
[0233] The machine learning model can include a binary classification model. Some binary classification models can include logistic regression models. Some logistic regression models can include LASSO regression models.
[0234] An alternative or additional step in process 700 can include filtering the initial plurality of peptide structure profiles by a coefficient of variation to generate a plurality of peptide structure profiles for use in training the machine learning model. As one example, only those peptide structure profiles having a low coefficient of variation (< 20%) were included in the plurality of peptide structure profiles used for training.
[0235] An alternative or additional step in process 700 can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the positive diagnosis for the ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the negative diagnosis for the ovarian cancer disease state.
[0236] An alternative or additional step in process 700 can include identifying a first portion of the plurality of samples for subjects with benign pelvic tumors and malignant pelvic tumors and a second portion of the plurality of samples for subjects with a healthy status. An alternative or additional step in process 700 can include generating a training set of peptide structure profiles for 80% of the first portion and a test set of peptide structure profiles for a remaining 20% of the first portion and the second portion.
[0237] In various embodiments, the machine learning model is a supervised machine learning model that is trained to determine weight coefficients for a panel of peptide structures such that a first portion of the weight coefficients for a first portion of the panel of peptide structures are non-zero and a second portion of the weight coefficients for a second portion of the panel of peptide structures are zero (or, alternatively, substantially close to zero so as to not be statistically significant).
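The coefficient-of-variation filter can be sketched as follows. The source does not say which measurements the coefficient of variation is computed over; this sketch assumes repeated measurements of each peptide structure (for example, replicate injections of a pooled sample). The function names, labels, and values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return stdev(values) / mean(values)


def filter_by_cv(replicates, max_cv=0.20):
    """Keep peptide structures whose CV across replicates is below max_cv."""
    return {
        name: values
        for name, values in replicates.items()
        if coefficient_of_variation(values) < max_cv
    }


# Hypothetical replicate measurements, for illustration only.
replicates = {
    "PS-A": [1.00, 1.05, 0.95],  # tight replicates, CV of 5%
    "PS-B": [1.00, 2.00, 0.50],  # noisy replicates, CV well above 20%
}
kept = filter_by_cv(replicates)
```

Filtering before training removes peptide structures whose measurement noise could otherwise be mistaken for biological signal.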
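How an L1 (LASSO) penalty drives some weight coefficients to exactly zero can be sketched as follows. This is an illustrative proximal-gradient fit of an L1-penalized logistic regression written with the standard library only; the optimizer, function names, learning rate, penalty, and toy data are all assumptions and are not taken from the source.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero; anything within +/- amount becomes exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(rows, labels, penalty, lr=0.1, epochs=500):
    """Fit weights and an unpenalized intercept by proximal gradient descent."""
    n, p = len(rows), len(rows[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(epochs):
        # Prediction error (probability minus label) for every sample.
        errors = []
        for row, y in zip(rows, labels):
            logit = intercept + sum(w * x for w, x in zip(weights, row))
            errors.append(1.0 / (1.0 + math.exp(-logit)) - y)
        # Plain gradient step for the intercept.
        intercept -= lr * sum(errors) / n
        # Gradient step followed by soft-thresholding for each weight.
        for j in range(p):
            grad = sum(e * row[j] for e, row in zip(errors, rows)) / n
            weights[j] = soft_threshold(weights[j] - lr * grad, lr * penalty)
    return weights, intercept


# Toy data: feature 0 separates the classes, feature 1 is small noise.
rows = [[0.2, 0.1], [0.4, -0.1], [0.3, 0.0], [1.6, 0.0], [1.8, 0.1], [1.7, -0.1]]
labels = [0, 0, 0, 1, 1, 1]
weights, intercept = fit_l1_logistic(rows, labels, penalty=0.1)
```

In this toy run the informative feature keeps a positive weight while the noise feature stays at exactly zero, which mirrors the split between non-zero and zero weight coefficients described above.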
VI. D. Methods of Treating Ovarian Cancer
[0238] In one or more embodiments, the final output generated in step 506 in Figure 5 or in step 606 in Figure 6 may include a treatment output. The treatment output may identify one or more treatment types for a subject based on the disease indicator and/or diagnosis output generated via process 500 in Figure 5 or process 600 in Figure 6, respectively. Treatment for ovarian cancer (e.g., EOC) may include, for example, but is not limited to, at least one of surgery, radiation therapy, a targeted drug therapy (e.g., one or more targeted therapeutic agents), chemotherapy (e.g., one or more chemotherapeutic agents), immunotherapy (e.g., one or more immunotherapeutic agents), hormone therapy, neoadjuvant therapy, or some other form of treatment. The treatment output may include, for example, a treatment plan. The treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof. Being able to accurately predict malignancy via the process 500 in Figure 5 and/or the process 600 in Figure 6 may allow treatment for malignant pelvic tumors (e.g., EOC) to be started earlier without requiring, in many or most cases, further invasive testing such as a biopsy.
[0239] In one or more embodiments, a patient biological sample is obtained from a subject. The biological sample may be processed (e.g., via digestion and fragmentation) such that one or more peptide structures of interest are detected. For example, detection and quantification may be performed for one or more peptide structures from Table 1, Table 2, Table 3, Table 3B, Table 3C, and/or Table 3D. The quantification data that is generated for these peptide structures may be input into a trained binary classification model to generate a disease indicator, which may be, for example, a probability score. A determination may be made as to whether the disease indicator (e.g., score) is above or below a selected threshold (e.g., 0.5). If the disease indicator is above the selected threshold, the biological sample may be classified as evidencing malignant pelvic tumor.
[0240] Further, this classification may further include a classification that the subject is in need of treatment. If the subject is in need of treatment based on the classification, treatment is administered. For example, a therapeutically effective amount of a therapeutic agent is administered to the patient, where the therapeutic agent is selected from a chemotherapeutic agent, an immunotherapeutic agent, a hormone therapy, a targeted therapeutic agent, a neoadjuvant therapy, or a combination.
VII. Peptide Structure and Product Ion Compositions, Kits and Reagents
[0241] Aspects of the disclosure include compositions comprising one or more of the peptide structures listed in Table 1, in Table 2, in Table 3, in Table 3B, in Table 3C, or in Table 3D. In some embodiments, a composition comprises a plurality of the peptide structures listed in Table 1, a plurality of the peptide structures listed in Table 2, or a plurality of the peptide structures listed in Table 3. In some embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 of the peptide structures listed in Tables 1, 2, 3, 3B, 3C, and 3D. In one or more embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or all 36 of the peptide structures listed in Table 3B. In one or more embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or all 25 of the glycopeptide structures listed in Table 3C. In one or more embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or all 50 of the glycopeptide structures listed in Table 3D.
[0242] In some embodiments, a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 11-19, 31-46, 53-65, 68-77, 101-125, and 126-175 listed in Tables 1, 2, 3, 3B, 3C, and 3D.
[0243] In some embodiments, a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 131-134, 137, 139, 140, 143, 151, 165-167 listed in Table 3D.
[0244] Aspects of the disclosure include compositions comprising one or more precursor ions having a defined charge and/or defined mass-to-charge (m/z) ratio, as listed in Tables 4, 4B, and 4C. Aspects of the disclosure include compositions comprising one or more product ions having a defined mass-to-charge (m/z) ratio, which product ions are produced by converting a peptide structure described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D) into a gas phase ion in a mass spectrometry system. Conversion of the peptide structure into a gas phase ion can take place using any of a variety of techniques, including, but not limited to, matrix assisted laser desorption ionization (MALDI); electron ionization (EI); electrospray ionization (ESI); atmospheric pressure chemical ionization (APCI); and/or atmospheric pressure photo ionization (APPI).
[0245] Aspects of the disclosure include compositions comprising one or more product ions produced from one or more of the peptide structures described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D). In some embodiments, a composition comprises a set of the product ions listed in Table 4, 4B, or 4C having an m/z ratio selected from the list provided for each peptide structure in Table 4, 4B, or 4C.
[0246] In some embodiments, a composition comprises at least one of peptide structures PS-1 to PS-10 identified in Table 1. In some embodiments, a composition comprises at least one of peptide structures PS-11 to PS-34 and PS-5 identified in Table 2. In some embodiments, a composition comprises at least one of peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3. In some embodiments, a composition comprises at least one of peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 identified in Table 3B. In some embodiments, a composition comprises at least one of peptide structures of SEQ ID NOS 101-125 identified in Table 3C. In some embodiments, a composition comprises at least one of peptide structures of PS-ID 91 to 140 identified in Table 3D. In some embodiments, a composition comprises at least one of peptide structures of PS-ID NO: 96-99, 102, 104, 105, 108, 116, and 130-132 identified in Table 3D.
[0247] In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 identified in Table 1. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-11 to PS-34 and PS-5 identified in Table 2. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3. In some embodiments, the at least 3 peptide structures additionally include at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all 7 of the remaining peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, or all 36 of the peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 identified in Table 3B. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures of SEQ ID NOS 101-125 identified in Table 3C. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, or all 50 of the peptide structures of SEQ ID NOS 126-175 identified in Table 3D. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, or all 12 of the peptide structures of SEQ ID NOS 131-134, 137, 139, 140, 143, 151, 165-167 identified in Table 3D.
[0248] In some embodiments, a composition comprises a peptide structure or a product ion. The peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 11-19, as identified in Table 5, corresponding to peptide structures PS-1 to PS-10 in Table 1. The peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 14, 15, 31-46, as identified in Table 5, corresponding to various ones of peptide structures PS-5 and PS-11 to PS-34 in Table 2. The peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 11, 14, 15, 31, 32, 33, 34, 37, 38, 40, 42, 44, 45, 46, 53-65, as identified in Table 5, corresponding to various ones of peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 in Table 3. The peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 14, 18, 32, 33, 37, 39, 42, 45, 54, 56, 60, 68-77, as identified in Table 5, corresponding to various ones of peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 in Table 3B. The peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 101-125, corresponding to various ones of peptide structures in Table 3C or product ions in Table 4B. The peptide structure or product ion can include an amino acid sequence having at least 90% sequence identity to any one of SEQ ID NOS: 126-175, as identified in Table 5, corresponding to various ones of peptide structures PS-91 to PS-140 in Table 3D.
[0249] In some embodiments, the product ion is selected as one from a group consisting of product ions identified in Tables 4, 4B, and 4C including product ions falling within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C and characterized as having a precursor ion having an m/z ratio within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C. A first range for the product ion m/z ratio may be ±0.5. A second range for the product ion m/z ratio may be ±0.8. A third range for the product ion m/z ratio may be ±1.0. A first range for the precursor ion m/z ratio may be ±1.0; a second range for the precursor ion m/z ratio may be ±1.5. Thus, a composition may include a product ion having an m/z ratio that falls within at least one of the first range (±0.5), the second range (±0.8), or the third range (±1.0) of the product ion m/z ratio identified in Tables 4, 4B, and 4C, and characterized as having a precursor ion having an m/z ratio that falls within at least one of a first range (±0.5), a second range (±1.0), or a third range (±1.0) of the precursor ion m/z ratio identified in Tables 4, 4B, and 4C.
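The range test described above amounts to checking whether an observed m/z value lies within a symmetric tolerance window around a tabulated value. The sketch below illustrates this under stated assumptions: the `Transition` class, the function names, and the reference m/z values are hypothetical and are not taken from Tables 4, 4B, or 4C; the tolerances are parameters so that any of the ranges recited above can be supplied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """A tabulated precursor/product ion pair (values here are hypothetical)."""
    precursor_mz: float
    product_mz: float


def within_tolerance(observed: float, reference: float, tol: float) -> bool:
    """True if the observed m/z lies within +/- tol of the reference m/z."""
    return abs(observed - reference) <= tol


def matches_transition(obs_precursor: float, obs_product: float, ref: Transition,
                       product_tol: float = 0.5, precursor_tol: float = 1.0) -> bool:
    """Check both ions of an observed transition against a tabulated one."""
    return (within_tolerance(obs_product, ref.product_mz, product_tol)
            and within_tolerance(obs_precursor, ref.precursor_mz, precursor_tol))


if __name__ == "__main__":
    ref = Transition(precursor_mz=1000.0, product_mz=366.1)  # illustrative only
    print(matches_transition(1000.6, 366.5, ref))                   # tight product window
    print(matches_transition(1000.6, 366.8, ref))                   # product outside +/-0.5
    print(matches_transition(1000.6, 366.8, ref, product_tol=0.8))  # wider product window
```

Widening the tolerance admits more candidate ions at the cost of selectivity, which is why the text recites several alternative ranges.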
Table 4: Mass Spectrometry-Related Characteristics for the Peptide Structures associated with Ovarian Cancer (e.g., EOC)
Figure imgf000065_0001
Figure imgf000066_0001
Table 4B: Mass Spectrometry-Related Characteristics for the Peptide Structures associated with Ovarian Cancer (e.g., EOC) - in accordance with Table 3C
Figure imgf000066_0002
Figure imgf000067_0001
Table 4C: Mass Spectrometry-Related Characteristics for the Peptide Structures associated with Ovarian Cancer (e.g., EOC) - in accordance with Table 3D
Figure imgf000067_0002
Figure imgf000068_0001
[0250] Tables 4, 4B, and 4C show various parameters associated with the identification of the peptides and glycopeptides using LC and MRM-MS. The retention time (RT) represents the amount of time in minutes for the peptide to elute from the chromatography column. The collision energy represents the energy applied to the peptide for creating fragments (i.e., product ions) such as, for example, in the 2nd quadrupole of the triple quadrupole MS. The first precursor m/z represents a ratio value associated with an ionized form having a first precursor charge for the peptide or glycopeptide. Similarly, the second precursor m/z represents a ratio value associated with an ionized form having a second precursor charge for the peptide or glycopeptide. The first precursor ion is associated with a first product ion having an m/z ratio that was formed from a collision and the second precursor ion is associated with a second product ion having an m/z ratio that was formed from a collision. Under certain circumstances, the first precursor and the second precursor may be the same, but the associated first and second product m/z ratios are different.
[0251] Table 5 defines the peptide sequences for SEQ ID NOS: 11-19, 31-46, 53-65, 68-77, and 126-175 from at least one of Tables 1, 2, 3, 3B, 3C, and 3D. Table 5 further identifies a corresponding protein SEQ ID NO. for each peptide sequence.
Table 5: Peptide SEQ ID NOS
Figure imgf000069_0001
Figure imgf000070_0001
Figure imgf000071_0001
Figure imgf000072_0001
[0252] Table 6 identifies the proteins of SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67 from at least one of Tables 1, 2, 3, 3B, 3C, and 3D. Table 6 identifies a corresponding protein abbreviation and protein name for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67. Further, Table 6 identifies a corresponding Uniprot ID and protein sequence for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67.
Table 6: Protein SEQ ID NOS
Figure imgf000072_0002
Figure imgf000073_0001
Figure imgf000074_0001
Figure imgf000075_0001
Figure imgf000076_0001
Figure imgf000077_0001
Figure imgf000078_0001
Figure imgf000079_0001
Figure imgf000080_0001
Figure imgf000081_0001
Figure imgf000082_0001
Figure imgf000083_0001
Figure imgf000084_0001
Figure imgf000085_0001
[0253] Table 7 identifies and defines the glycan symbol structures included in Tables 1, 2, 3, 3B, 3C, and 3D. Table 7 identifies a coded representation of the composition for each glycan structure included in Tables 1, 2, 3, 3B, 3C, and 3D. As used herein, the 4-digit GL NO. is a designation that represents, in order, the number of hexoses, the number of HexNAcs, the number of Fucoses, and the number of Neuraminic Acids. It should be noted that glycan structure GL NO 1102 is an O-glycan and the remaining glycans of Table 7 are N-glycans.
Table 7: Glycan Structure GL NOS: Composition
Figure imgf000085_0002
Figure imgf000086_0001
Figure imgf000087_0001
Figure imgf000088_0001
Figure imgf000089_0001
Figure imgf000090_0001
Figure imgf000091_0002
Legend for Table 7
Figure imgf000091_0001
[0254] Table 7 illustrates the symbol structure and composition of detected glycan moieties that correspond to glycopeptides of Tables 1, 2, 3, 3B, 3C, and 3D based on the Glycan GL NO. The term Symbol Structure illustrates a geometric linking structure of the carbohydrates where the bottommost carbohydrate such as N-acetylglucosamine is bound to the designated amino acid for an N-linked glycan and the rightmost carbohydrate such as N-acetylgalactosamine is bound to the designated amino acid for an O-linked glycan. It should be noted that the Glycan Structure GL NO 1102 is an O-linked glycan and that the rest of the glycans in Table 7 are N-linked glycans. For reference, N-linked glycans have a glycan attached to the amino acid asparagine and O-linked glycans have a glycan attached to either a serine or a threonine.
[0255] The identity of the various monosaccharides is illustrated by the Legend section located at the end of Table 7. The abbreviations of the Legend are Glc that represents glucose and is indicated by a dark circle, Gal that represents galactose and is indicated by an open circle, Man that represents mannose and is indicated by a circle with intermediate grey shading, Fuc that represents fucose and is indicated by a dark triangle, Neu5Ac that represents N-acetylneuraminic acid and is indicated by a dark diamond, GlcNAc that represents N-acetylglucosamine and is indicated by a dark square, GalNAc that represents N-acetylgalactosamine and is indicated by an open square, and ManNAc that represents N-acetylmannosamine and is indicated by a square with intermediate grey shading.
[0256] The term Composition refers to the number of various classes of carbohydrates that make up the glycan. The quantity for each class of carbohydrate is depicted as a number in parentheses to the right of an abbreviation that corresponds to the class of the carbohydrate. The abbreviations for these classes are Hex, HexNAc, Fuc, and NeuAc that respectively correspond to hexose, N-acetylhexosamine, fucose, and N-acetylneuraminic acid. It should be noted that hexose sugars include glucose, galactose, and mannose; and N-acetylhexosamine sugars include N-acetylglucosamine, N-acetylgalactosamine, and N-acetylmannosamine. In various embodiments, the terms Neu5Ac, NeuAc, and N-acetylneuraminic acid may be referred to as sialic acid.
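As an illustration of the 4-digit GL NO. designation and the Composition notation described above, the following sketch decodes a GL NO. into its four monosaccharide counts. The function and type names are hypothetical, the exact punctuation of the Composition string in Table 7 is assumed, and the sketch assumes that each count is a single digit, as the 4-digit format implies.

```python
from typing import NamedTuple


class GlycanComposition(NamedTuple):
    hex: int      # number of hexoses
    hexnac: int   # number of N-acetylhexosamines
    fuc: int      # number of fucoses
    neuac: int    # number of N-acetylneuraminic (sialic) acids


def parse_gl_no(gl_no: str) -> GlycanComposition:
    """Decode a 4-digit GL NO. into (Hex, HexNAc, Fuc, NeuAc) counts."""
    if len(gl_no) != 4 or not (gl_no.isascii() and gl_no.isdigit()):
        raise ValueError(f"expected a 4-digit GL NO., got {gl_no!r}")
    return GlycanComposition(*(int(digit) for digit in gl_no))


def format_composition(comp: GlycanComposition) -> str:
    """Render counts as abbreviation(count) groups; separator style is assumed."""
    return f"Hex({comp.hex})HexNAc({comp.hexnac})Fuc({comp.fuc})NeuAc({comp.neuac})"


if __name__ == "__main__":
    for gl_no in ("1102", "3510", "6513", "7604"):
        print(gl_no, "->", format_composition(parse_gl_no(gl_no)))
```

Note that the code alone does not determine linkage: as the following paragraphs explain, one composition may correspond to more than one symbol structure.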
[0257] Referring back to Table 7, for some entries, there are two symbol structures provided for one Glycan Structure GL NO such as, for example, Glycan Structure GL NO 3510. Thus, the identity of a peptide that references a Glycan Structure GL NO that has two symbol structures could be either one of the two possibilities based on the MRM of the LC-MS analysis. In some instances, a bracket symbol is used as part of the Symbol Structure to indicate that the precise bonding linkage is not exactly known, but that the linking line segment is attached to one of the plurality of adjacent carbohydrates immediately adjacent to the bracket. For example, the fucose of Glycan Structure GL NO 3510 could have either a core fucose or an outer-arm fucose linkage. In some instances, for fucosylated glycans illustrated in Table 7, the fucose orientation of either core or outer-arm linkage can be specified.
[0258] It should be noted that a glycan symbol structure can illustrate an antennary format in the form of branches. For example, Glycan Structure GL NOs 6513 and 7604 show a tri-antennary and tetra-antennary sialic acid format, respectively.
[0259] Aspects of the disclosure include kits comprising one or more compositions, each comprising one or more peptide structures of the disclosure that can be used as assay standards, and instructions for use. Kits in accordance with one or more embodiments described herein may include a label indicating the intended use of the contents of the kit. The term “label” as used herein with respect to a kit includes any writing, or recorded material supplied on or with a kit, or that otherwise accompanies a kit.
[0260] The peptide structures and the transitions produced therefrom, as described herein, may be useful for diagnosing and treating an ovarian cancer disease state. A transition includes a precursor ion and at least one product ion grouping. As reviewed herein, the peptide structures in Tables 1, 2, 3, 3B, 3C, and 3D as well as their corresponding precursor ion and product ion groupings in Tables 4, 4B, and 4C (these ions having defined m/z ratios or m/z ratios that fall within the m/z ranges identified herein), can be used in mass spectrometry-based analyses to diagnose and facilitate treatment of diseases, such as, for example, ovarian cancer (e.g., EOC).
[0261] Aspects of the disclosure include methods for analyzing one or more peptide structures, as described herein. In some embodiments, the methods involve processing a sample from a patient to generate a prepared sample that can be inputted into a mass spectrometry system (e.g., a reaction monitoring mass spectrometry system). In certain embodiments, processing the sample can comprise performing one or more of a denaturation procedure, a reduction procedure, an alkylation procedure, and a digestion procedure. The denaturation and reduction procedures may be implemented in a manner similar to, for example, denaturation and reduction 202 in Figure 2. The alkylation procedure may be implemented in a manner similar to, for example, alkylation procedure 204 in Figure 2. The digestion procedure may be implemented in a manner similar to, for example, digestion procedure 206 in Figure 2.
[0262] In some embodiments, the methods for analyzing one or more peptide structures involve detecting a set of product ions generated by a reaction monitoring mass spectrometry system in which one or more product ions may correspond to each of the one or more peptide structures that have been inputted into the mass spectrometry system. As described herein, each peptide structure can be converted into a set of product ions having a defined m/z ratio, as provided in Tables 4, 4B, and 4C, or an m/z ratio within an identified m/z range of the m/z ratio provided in Tables 4, 4B, and 4C. In some embodiments, the methods involve generating quantification (e.g., abundance) data for the one or more product ions detected using the reaction monitoring mass spectrometry system.
[0263] In some embodiments, the methods further comprise generating a diagnosis output using the quantification data and a model that has been trained using supervised or unsupervised machine learning. In certain embodiments, the reaction monitoring mass spectrometry system may include multiple/selected reaction monitoring mass spectrometry (MRM/SRM-MS) to detect the one or more product ions and generate the quantification data.
VIII. Representative Experimental Results
VIII. A. Exemplary Retrospective Analysis
VIII. A.1. Sample Acquisition
[0264] Figure 8 is a table describing the distribution of the samples acquired in this exemplary retrospective analysis in accordance with one or more embodiments. As shown in Figure 8, serum samples were acquired from a commercial biobank for 151 women with benign pelvic masses, 145 women with malignant epithelial ovarian cancer (EOC), and 55 healthy controls. Information on stage of EOC was available in 98 of the 145 patients with EOC. All samples were obtained prior to therapeutic intervention. Information on the benign or malignant nature of tumors was based on histopathological analysis of tissue specimens.
VIII. A.2. Sample Processing
[0265] Sample processing involved pooled human serum/plasma (e.g., glycoprotein standards purified from human serum/plasma) for assay normalization, dithiothreitol (DTT), iodoacetamide (IAA), sequencing-grade trypsin, LC-MS-grade water and acetonitrile, and formic acid (LC-MS grade). Serum samples were treated with DTT and IAA to reduce disulfide bonds and to inhibit cysteine proteases, respectively, followed by digestion with trypsin at 37°C for 18 hours. The digestion was quenched by adding formic acid to each sample to a final concentration of 1% (v/v).
[0266] LC-MS analysis included separating digested serum samples over an Agilent ZORBAX Eclipse Plus C18 column (2.1 mm x 150 mm i.d., 1.8 μm particle size) using an Agilent 1290 Infinity UHPLC system. The mobile phase A consisted of 3% acetonitrile, 0.1% formic acid in water (v/v), and the mobile phase B of 90% acetonitrile, 0.1% formic acid in water (v/v), with the flow rate set at 0.5 mL/minute. The binary solvent composition was set at 100% mobile phase A at the beginning of the run, linearly shifting to 20% B at 20 minutes, 30% B at 40 minutes, and 44% B at 47 minutes. The column was flushed with 100% B and equilibrated with 100% A for a total run time of 70 minutes. After electrospray ionization, operated in positive ion mode, samples were injected into an Agilent 6495B triple quadrupole MS operated in dynamic multiple reaction monitoring (dMRM) mode. The MRM transitions comprised 513 glycopeptide structures which were normalized by comparing them with the abundance of 71 non-glycosylated peptide structures, representing each of 71 proteins from which the glycopeptides monitored were derived. Samples were injected randomized as to underlying phenotype, and reference pooled serum digests were injected interspersed with study samples.
VIII. A.3. Data Analysis
[0267] Analysis resulted in 683 peptide structures (both peptide and glycopeptide isoforms) being reflected by 1106 MRM transitions, representing 71 high-abundance (concentrations [Figure imgf000095_0001] of 10 μg/ml) serum glycoproteins. Our transition list consisted of glycopeptides and non-glycosylated peptides from each glycoprotein. A spectrogram feature recognition and integration software based on recurrent neural networks was used to integrate chromatogram peaks and to obtain molecular abundance quantification for each peptide structure.
[0268] Normalized abundances of peptide structures, corrected for within-run drift, were assessed in samples from healthy controls, patients with benign pelvic tumors and those with EOC. Raw abundances were normalized by using spiked-in heavy-isotope-labeled internal standards with known peptide concentrations. The calculation relies either on relative abundance or on site occupancy, i.e., on the fractional abundance across all glycans observed at that site. Log-transformed concentration-normalized data for 501 glycopeptide structures (452 of which are based on site occupancy and 49 on relative abundance) and for 70 aglycosylated peptide structures were ultimately used for the analysis, totaling 571 unique peptide structures. Fold changes for individual peptide structures were calculated on normalized abundances of healthy (control) vs. EOC samples and benign tumor vs. EOC samples. False discovery rates (FDR) were calculated using the Benjamini-Hochberg method. Principal component analysis (PCA) was performed on log-concentration-normalized abundances of glycopeptide structures to investigate differences among the three phenotypes (e.g., healthy control, EOC, and benign pelvic tumor) studied. Prior to performing PCA, normalized abundances were scaled such that the distributions of all biomarkers were Gaussian with zero mean and unit variance.
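For reference, the Benjamini-Hochberg adjustment named above can be sketched as follows. This is a generic textbook implementation using NumPy, not the code used in the study; the function name and the example p-values are illustrative assumptions.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR estimates).

    Each sorted p-value p_(i) is scaled by n / i, and the result is made
    monotone by taking a running minimum from the largest rank downward.
    """
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n, dtype=float)
    adjusted[order] = np.clip(monotone, 0.0, 1.0)  # restore the input order
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-feature p-values, one per peptide structure.
    p_vals = [0.01, 0.04, 0.03, 0.005]
    fdr = benjamini_hochberg(p_vals)
    print(fdr)
    print("significant at FDR < 0.05:", fdr < 0.05)
```

A feature is then called significant when its adjusted value falls below the chosen cutoff, such as the 0.05 level used in this analysis.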
[0269] To compare any two phenotypes, age-adjusted linear regression was used on a feature-by-feature basis with phenotype serving as the sole binary independent variable. Correcting for multiple comparisons, differences of any biomarker among phenotype groups compared were considered statistically significant where the FDR was less than 0.05. Examples of features include relative abundance (or normalized relative abundance), concentration (or normalized concentration), and site occupancy (fractional abundance across all glycans observed at the corresponding linking site of the corresponding peptide sequence).
[0270] For supervised multivariate modeling, a total of 1084 features (571 concentration, 49 relative abundance, and 464 site occupancy features) were log-transformed and split into a training set formed by 80% of all samples from women with benign pelvic tumors and EOC, and a testing set formed by the remaining 20% of these women and all healthy controls. To perform binary classification and predict the probability of EOC, repeated five-fold cross-validated LASSO-regularized logistic regression was used with hyperparameters tuned to prevent overfitting and promote balanced sensitivity and specificity metrics. Training of the binary classification model was performed using the subset of the 1084 total features having low coefficients of variation (<20%) in pooled serum replicates. This subset included 976 features, with each feature being a concentration, relative abundance, or site occupancy for a corresponding peptide structure and where some peptide structures correspond with multiple features. For example, a given peptide structure may be associated with one, two, or three features within the subset of the 976 features.
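The feature pre-filter based on coefficient of variation (CV) in pooled serum replicates can be illustrated as below. This is a minimal sketch under assumptions: the function name is hypothetical, the CV is computed on untransformed abundances using the sample standard deviation, and the replicate values are invented for illustration.

```python
import numpy as np


def low_cv_feature_mask(replicates, max_cv=0.20):
    """Return a boolean mask of features whose CV across replicates is below max_cv.

    `replicates` is a (n_replicates, n_features) array of abundances measured
    in repeated injections of the same pooled reference sample.
    """
    x = np.asarray(replicates, dtype=float)
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)  # sample standard deviation
    # CV = sd / |mean|; features with zero mean get an infinite CV and are dropped.
    cv = np.divide(sd, np.abs(mean), out=np.full_like(mean, np.inf), where=mean != 0)
    return cv < max_cv


if __name__ == "__main__":
    # Three hypothetical pooled-serum injections, two features.
    pooled = [[100.0, 10.0],
              [102.0, 20.0],
              [98.0, 30.0]]
    keep = low_cv_feature_mask(pooled)
    print(keep)  # the first feature is stable across replicates, the second is not
```

Filtering on replicate CV before model training removes features whose measurement noise would otherwise compete with biological signal.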
VIII.A.4. Results
[0271] Normalized abundances of 428 peptide structures were found to display statistically significantly different abundances (FDR < 0.05) in samples of patients with benign pelvic tumors and samples of patients with EOC. 139 peptide structures had statistically significant abundance differences between benign vs. early stage (e.g., stage 1 or 2) EOC. 412 peptide structures had statistically significant abundance differences between benign vs. late stage (e.g., stage 3 or 4) EOC, 137 of which overlapped with those for benign vs. early stage. When comparing samples of healthy controls with samples from all EOCs, benign tumors, early stage (e.g., stage 1 or 2) EOC, and late stage (e.g., stage 3 or 4) EOC, statistically significant abundance differences were found for 386, 149, 215, and 365 markers, respectively. 120 peptide structures were found to be statistically significantly differentially abundant in healthy controls vs. patients with benign pelvic tumors, and in healthy control vs. EOC. 200 peptide structures were found to be statistically significantly differentially abundant in healthy control vs. early stage EOC and healthy control vs. late stage EOC. Lastly, of the 428 and 386 markers that were found statistically significantly differentially expressed between EOC vs. benign pelvic tumors and EOC vs. healthy controls, respectively, 328 were shared.
[0272] Figure 9 is a plot diagram illustrating the results of a principal component analysis performed to assess the segregation between healthy, benign pelvic tumor, and EOC samples across first and second principal components in accordance with one or more embodiments. Generally, EOC samples segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
[0273] Figure 10 is a plot diagram illustrating the results of a principal component analysis performed to assess segregation between healthy, benign pelvic tumor, early EOC, late EOC, and missing (undocumented) samples. Generally, EOC samples (and in particular late stage EOC samples) segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
VIII. A.5. Results in Context of Screening for Malignant EOC
[0274] To assess the suitability of serum glycoproteomics in the context of screening for malignant EOC, a multivariable model was built to predict EOC vs. healthy status. This multivariable model is a supervised machine learning model that includes a logistic regression model, the logistic regression model including a LASSO regression model. Repeated cross-validation in the training set established the optimal LASSO hyperparameter (lambda = 0.0608, cross-validated F1 = 0.971). Applying this amount of shrinkage to the panel of 976 features resulted in a logistic model with 10 peptide structures with non-zero coefficients.
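To illustrate how LASSO (L1) shrinkage reduces a large feature panel to a few features with non-zero coefficients, the sketch below fits an L1-penalized logistic regression by proximal gradient descent on synthetic data. It is a didactic stand-in and not the software used in the study: the solver, step size, penalty value, and data are all assumptions, and the lambda here is not necessarily on the same scale as the lambda reported above.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink toward zero, clipping at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=500):
    """Minimize mean logistic loss + lam * ||w||_1 (intercept unpenalized)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)   # clip for numerical stability
        p = 1.0 / (1.0 + np.exp(-z))          # predicted probabilities
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w = soft_threshold(w - lr * grad_w, lr * lam)  # gradient step, then shrink
        b -= lr * grad_b
    return w, b


# Synthetic demonstration: 10 standardized features, only feature 0 is informative.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((400, 10))
y_demo = (X_demo[:, 0] > 0).astype(float)
w_demo, b_demo = fit_l1_logistic(X_demo, y_demo, lam=0.15)
selected = np.flatnonzero(w_demo)
print("features with non-zero coefficients:", selected)
```

Larger penalty values zero out more coefficients, so the cross-validated choice of lambda directly controls how many peptide structures remain in the final model.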
[0275] Figure 11 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments. The multivariable model achieved high accuracy in both the training set (accuracy = 0.975, sensitivity = 0.983, specificity = 0.955) and the test set (accuracy = 0.976, sensitivity = 0.967, specificity = 1.0). Further, ROC analysis demonstrated strong performance across a range of cutoffs, and little overfitting, with the training AUC (area under the curve) = 0.999 and test AUC = 0.997.
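The performance metrics quoted above follow directly from the counts of a confusion matrix. The sketch below shows the standard definitions; the function name is hypothetical and the counts in the usage example are invented for illustration and are not the counts from this study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, specificity, and F1 from confusion counts.

    tp/fn count malignant samples called malignant/benign; tn/fp count
    non-malignant samples called non-malignant/malignant.
    """
    def ratio(num: float, den: float) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for a held-out test set.
    for name, value in binary_metrics(tp=45, fp=10, tn=40, fn=5).items():
        print(f"{name}: {value:.3f}")
```

The ROC curve referenced in Figure 11 is obtained by recomputing sensitivity and specificity at every candidate probability cutoff, and the AUC summarizes the curve in a single number.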
[0276] Thus, the multivariable model that was built may be used accurately and reliably to identify malignant EOC and distinguish such malignancy from a healthy status. Such diagnostic power may be used to reduce the need for invasive testing. Further, such diagnostic information can be used to identify patients with EOC earlier, which may lead to earlier treatment, improved treatment recommendations, and improved treatment plans.
[0277] Figure 12 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments. As shown in Figure 12, the probability distributions for benign pelvic tumor, healthy, missing (undocumented), stage 1 EOC, stage 2 EOC, stage 3 EOC, and stage 4 EOC samples increased with cancer stage, with probability distributions being similar across training and test sets. Notably, applying the built multivariable model to healthy patients, who were not utilized in the training, resulted in few misclassifications and a spread nearly equivalent to that of the benign pelvic tumor cases. Such results indicate that the glycoproteomic signature of the 10 peptide structures for the LASSO regression model solidly predicts malignancy and severity of disease.
[0278] Table 8 below provides the fold changes, FDRs, and p-values for the 10 peptide structures PS-1 to PS-10 (same as those in Table 1 above) based on differential expression analysis (DEA). The peptide structures PS-1 to PS-10 are ordered both in Table 1 and in Table 8 with respect to relative significance to the probability score generated by the model. More significant peptide structures had higher coefficients in the LASSO regression model, while less significant peptide structures had lower coefficients in the LASSO regression model. In other words, relative significance to the probability score decreased with decreasing coefficients. Further, each peptide structure is associated with a feature that was used for the model (relab = relative abundance; conc = concentration).
Table 8: Peptide Structure Markers for Regression Model to distinguish between Epithelial Ovarian Cancer and Healthy State
Figure imgf000098_0001
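The quantities reported in the differential expression tables (fold changes, p-values, and FDRs) can be related to one another with a short sketch. The source does not state which FDR procedure was applied; the Benjamini-Hochberg adjustment shown here is an assumption, as are the function names and the example values.

```python
import math

def log2_fold_change(case_values, control_values):
    """Log2 ratio of the mean abundance in cases to the mean abundance in controls."""
    case_mean = sum(case_values) / len(case_values)
    control_mean = sum(control_values) / len(control_values)
    return math.log2(case_mean / control_mean)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDRs) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical values for illustration only.
fold = log2_fold_change([4.0, 4.0], [1.0, 1.0])
fdrs = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```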
VIII. A.6. Results in Context of Triaging Pelvic Tumors
[0279] To assess the suitability of serum glycoproteomics in the context of clinically triaging pelvic tumors, a multivariable model was built to predict malignancy vs. benign status of such pelvic tumors. This multivariable model is a supervised machine learning model that includes a logistic regression model, the logistic regression model including a LASSO regression model. Repeated cross-validation in the training set established the optimal LASSO hyperparameter (lambda = 0.045, cross-validated F1 = 0.849). Applying this amount of shrinkage to the panel of 976 features resulted in a logistic model with 25 peptide structures with non-zero coefficients.
[0280] Figure 13 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments. The multivariable model achieved high accuracy in both the training set (accuracy = 0.869, sensitivity = 0.835, specificity = 0.901) and the test set (accuracy = 0.867, sensitivity = 0.867, specificity = 0.867). Further, ROC analysis demonstrated strong performance across a range of cutoffs, and little overfitting, with the training AUC (area under the curve) = 0.953 and test AUC = 0.873.
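The AUC summarizes performance across all cutoffs: it equals the probability that a randomly chosen positive (e.g., malignant) sample receives a higher score than a randomly chosen negative (e.g., benign) sample. A minimal sketch of that pairwise definition follows; the function name, scores, and labels are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0      # positive ranked above negative
            elif pos == neg:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities and true labels (1 = malignant, 0 = benign).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```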
[0281] Thus, the multivariable model that was built may be used accurately and reliably to triage pelvic tumors and distinguish those that are malignant from those that are benign. Such diagnostic power may be used to reduce the need for invasive testing (e.g., biopsy) before treatment can be administered. Further, such diagnostic information can be used to improve treatment recommendations and treatment plans (e.g., earlier treatment in the case of malignant EOC) and reduce indications for unnecessary treatment (e.g., no indication for surgery when the pelvic tumor is benign).
[0282] Figure 14 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments. As shown in Figure 14, the probability distributions for benign pelvic tumor, healthy, missing (undocumented), stage 1 EOC, stage 2 EOC, stage 3 EOC, and stage 4 EOC samples increased with cancer stage, with probability distributions being similar across training and test sets. Notably, applying the built multivariable model to healthy patients, who were not utilized in the training, resulted in few misclassifications and a spread nearly equivalent to that of the benign pelvic tumor cases. Such results indicate that the glycoproteomic signature of the 25 peptide structures for the LASSO regression model solidly predicts malignancy and severity of disease.
[0283] Table 9 below provides the fold changes, FDRs, and p-values for the 25 peptide structures PS-5 and PS-11 to PS-34 (same as those in Table 2 above) based on differential expression analysis (DEA). The peptide structures PS-5 and PS-11 to PS-34 are ordered both in Table 2 and in Table 9 with respect to relative significance to the probability score generated by the model. More significant peptide structures had higher coefficients in the LASSO regression model, while less significant peptide structures had lower coefficients in the LASSO regression model. In other words, relative significance to the probability score decreased with decreasing coefficients. Further, each peptide structure is associated with a feature that was used for the model (relab = relative abundance; conc = concentration).
Table 9: Peptide Structure Markers for Regression Model to distinguish between Epithelial Ovarian Cancer and Benign Pelvic Tumor
Figure imgf000100_0001
Figure imgf000101_0001
[0284] Table 10 below provides the fold changes, FDRs, and p-values for the 36 peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 (same as those in Table 3B above) based on differential expression analysis (DEA). The peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 are ordered in Table 10 with respect to relative significance to the p value score generated by the model.
Table 10: Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3B.
Figure imgf000101_0002
Figure imgf000102_0001
Table 10B below provides the fold changes, FDRs, and p-values for the 25 peptide structures denoted by SEQ ID NO 101-125 (in accordance with Table 3C above) using differential expression analysis (DEA).
Table 10B: Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3C.
Figure imgf000102_0002
Figure imgf000103_0001
[0285] Table 10C below provides the fold changes, FDRs, and p-values for the 50 peptide structures denoted by SEQ ID NO 126-175 (in accordance with Table 3D above) using differential expression analysis (DEA). This differential expression analysis used the subjects from EOC stages 1-4, as shown in the table of Figure 8. EOC stages 1 and 2 (12 + 6) were combined to form the early stage cohort and EOC stages 3 and 4 (68 + 12) were combined to form the late stage cohort.
Table 10C: Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3D.
Figure imgf000103_0002
Figure imgf000104_0001
[0286] Table 10D below provides the fold changes, FDRs, and p-values for the 12 peptide structures denoted by SEQ ID NO 131-134, 137, 139, 140, 143, 151, 165-167 (in accordance with Table 3D above) using differential expression analysis (DEA). This differential expression analysis used the subjects from EOC stages 1-4, as shown in the table of Figure 8. EOC stages 1 and 2 (12 + 6) were combined to form the early stage cohort and EOC stages 3 and 4 (68 + 12) were combined to form the late stage cohort.
Table 10D: Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using 12 of the biomarkers of Table 3D.
Figure imgf000105_0002
[0287] The markers from Table 3B were used to train a regularized regression model (e.g., LASSO regression model). Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11. Using the values of Table 11, a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score. For example, the logit of the probability, to which the inverse logit function can be applied, is equal to the following equation 1 (eq. 1).
\[ \mathrm{logit}(p) = \text{intercept} + \sum_{i=1}^{n} \left( \text{coefficient}_i \times \text{concentration}_i \right) \qquad \text{(eq. 1)} \]
where n = a number of biomarkers having a unique PS-ID No, i = an index number for each of the biomarkers.
Table 11. Coefficients for Peptide Structure Markers for a Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) based on the biomarkers of Table 3B
Figure imgf000106_0001
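The calculation described for equation 1 (sum of coefficient-concentration products plus the intercept, followed by the inverse logit) can be sketched as follows. The marker names, coefficients, intercept, and concentrations below are hypothetical placeholders and are not values from Table 11; the function name is also an assumption.

```python
import math

def probability_from_logit(intercept, coefficients, concentrations):
    """Apply equation 1, then the inverse logit, to obtain a probability score."""
    # Sum of coefficient x concentration over every biomarker in the model.
    logit = intercept + sum(coefficients[marker] * concentrations[marker]
                            for marker in coefficients)
    return 1.0 / (1.0 + math.exp(-logit))  # inverse logit (logistic function)

# Hypothetical model and sample values; NOT taken from Table 11.
coefficients = {"marker_A": 0.8, "marker_B": -0.5}
concentrations = {"marker_A": 2.5, "marker_B": 1.0}
probability = probability_from_logit(-1.0, coefficients, concentrations)
```

With these placeholder values the logit is 0.5, so the probability score is above 0.5; the resulting score could then be compared against a selected threshold to assign a class.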
[0288] The markers from Table 3C were used to train a regularized regression model (e.g., LASSO regression model). Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11B. Using the values of Table 11B, a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score (see equation 1).
Table 11B. Coefficients for Peptide Structure Markers for a Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) based on the biomarkers of Table 3C
Figure imgf000107_0001
[0289] Using the model coefficients of Table 11B, a predicted probability was generated for early stage and late stage ovarian cancer samples showing a stratification in predicted probabilities between the two cohorts as is illustrated in Figure 16. Figure 17 illustrates a receiver-operating-characteristic (ROC) curve and the area under curve (AUC) for the regularized regression model (e.g., LASSO regression model) for early stage and late stage ovarian cancer samples using testing case data and training case data. Table 12 shows the accuracy, sensitivity, specificity and precision for the training data set and the testing data set. Table 13 shows the training accuracy and testing accuracy for the early stage and late stage cohort for ovarian cancer.
Table 12.
Figure imgf000108_0001
Table 13.
Figure imgf000108_0002
[0290] The markers from Table 3D were used to train a regularized regression model (e.g., LASSO regression model). Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11C. Using the values of Table 11C, a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score (see equation 1).
Table 11C. Coefficients for Peptide Structure Markers for a Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) based on the biomarkers of Table 3D
Figure imgf000108_0003
Figure imgf000109_0001
Figure imgf000110_0001
[0291] Using the model coefficients of Table 11C, a predicted probability was generated for early stage and late stage ovarian cancer samples showing a stratification in predicted probabilities between the two cohorts as is illustrated in Figure 20. In various embodiments, predicted probability can be generated for classifying early stage and late stage ovarian cancer samples using the markers with non-zero coefficients such as SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171. A logistic regression model was used with the glycopeptides of Table 3D where the glycopeptides had 1 or more sialic acids and zero or more fucosylations for the early and late stage EOC cohorts. The glycopeptides that included fucose were found to be associated with EOC. In addition, glycopeptides that included fucose and also carried tri- and tetra-antennary glycan structures were found to be more strongly associated with EOC. For example, Figures 21A to 21E show that the relative abundance of tri- and tetra-antennary glycan structures in benign tumors, early-stage EOC and late-stage EOC showed an increase with the progression of the EOC disease. The numbers 6512, 6512, 7612, 7613, 7614 correspond to the five distinct glycans attached to the glycopeptides. Referring back to Figures 21A to 21C, the three leftmost bar graphs represent glycopeptides with tetra-antennary glycans with varying degrees of sialylation. The two rightmost bar graphs are Figures 21D and 21E and they represent glycopeptides with tri-antennary glycans with two or three sialic acids. The asterisks in Figures 21A and 21E represent the following: * p-value<=0.05, ** p-value<=0.01, *** p-value<=0.001, and **** p-value<=0.0001.
It should be noted that the horizontal bars on Figures 21A to 21E are used to show the statistical comparisons between the benign and late-stage cohorts (highest horizontal bar), early-stage and late-stage cohorts (middle horizontal bar), and the benign and early-stage cohorts (lowest horizontal bar).
[0292] Table 14 shows the accuracy, sensitivity, and specificity for the training data set and the testing data set.
Table 14.
Figure imgf000111_0001
[0293] In various embodiments, a specific subset of tri- and tetra-antennary fucosylated N-glycopeptides was identified that can be used to differentiate between early- and late-stage ovarian cancer. In particular, the fucose portion of the specific subset of tri- and tetra-antennary fucosylated N-glycopeptides was found to have an outer arm position. It should be noted that fucose can be bound to a glycan in a core fucosylation or outer-arm orientation. Core fucosylation is a modification of an N-glycan core structure, forming the α1,6 fucosylation of the GlcNAc residue linked to the asparagine, that is catalyzed by FUT8. A fucose in the outer-arm orientation is attached to the antennae of the complex type N-glycans by α-(1-3/4) linkage to the GlcNAc residues or by α-(1-2) linkage to galactose.
[0294] To elucidate the site of fucosylation in these markers, high-resolution LC-MS/MS analysis was used. A subset of 25 samples from the original set (5 each from individuals with benign, stage I, II, III, or IV tumors) was chosen for high resolution MS analysis using serum. Samples were digested with trypsin as described above and enriched using an Oasis MAX 96-well plate (30 µm particle size, 30 mg sorbent weight). Each well was equilibrated with 100% ACN, followed by incubation with 100 mM triethylammonium acetate and 95% ACN, 1% TFA in water, respectively. Samples were transferred to designated wells and were allowed to flow through. Each well was washed with a solution of 95% ACN, 1% TFA in water, and samples were eluted with 50% ACN, 0.1% TFA in water. The concentration of each sample was determined by NanoDrop (Thermo Scientific). Samples were concentrated by SpeedVac (Thermo Scientific) and reconstituted with 0.1% FA in water. Samples were transferred to vials for LC-MS/MS analysis using an Orbitrap Exploris 480 instrument (Thermo Fisher Scientific).
[0295] Tryptic glycopeptides of commonly abundant plasma glycoproteins were identified using high resolution mass spectrometry and analyzing collision induced fragmentation. The oxonium ion mass spectra from the fragmentation of glycopeptides were analyzed to determine the fucose orientation. Based on the mass of the various breakdown fragments of the glycopeptide-derived glycans, a determination was made if the fucose was a “core” fucose, “antennary” fucose (also referred to as an outer-arm fucose), “mixed” (i.e. containing a mixture of both “core” and “antennary” fucose), or “not-identified”.
[0296] Figure 22 is a representative figure of a mass spectrum with m/z represented on the X-axis and intensity (and therefore abundance) represented on the Y-axis. Arrows indicate the breakdown products indicating the fucose is on the outer-arm (purple diamond - sialic acid, yellow circle - galactose, blue square - N-acetylglucosamine, red triangle - fucose, green circle - mannose). It is worth noting that there is a 4 glycan breakdown fragment composed of sialic acid, galactose, N-acetylglucosamine, and fucose (m/z value of 803.294). In the 4 glycan breakdown fragment, the sialic acid is connected to galactose, galactose is connected to N-acetylglucosamine, and N-acetylglucosamine is connected to fucose. The 4 glycan breakdown fragment represents a single antennary branch having a fucose in an outer arm fucose position where the aggregate glycan was cleaved at a linkage between a mannose and an N-acetylglucosamine.
[0297] Referring back to Figure 22, there is another breakdown fragment that has 3 glycans that is the child of the 4 glycan breakdown fragment with a sialic acid removed. The 3 glycan breakdown fragment includes galactose, N-acetylglucosamine, and fucose (m/z value of 512.198). In the 3 glycan breakdown fragment, the galactose is connected to N-acetylglucosamine, and N-acetylglucosamine is connected to fucose. The presence of the 4 glycan breakdown fragment and the 3 glycan breakdown fragment as shown in Figure 22 indicates the presence of outer arm fucosylation. In addition, it should also be noted that no fragments consistent with a core fucose orientation were measured.
[0298] Referring back to Table 11C, the SEQ ID NOS. 131, 137, 143, 155, 158, 159, 162, 166, and 171 correspond to glycopeptides that each have a non-zero coefficient along with one fucose. Using high resolution mass spectrometry, it was determined that SEQ ID NO 131, 137, 143, 155, 159, 162, 166, and 171 each correspond to a glycopeptide having an outer arm fucosylation format. In various embodiments, glycopeptide biomarkers with outer arm fucosylation can provide better prediction of ovarian cancer disease states.
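The m/z values quoted for the 4 glycan and 3 glycan breakdown fragments of Figure 22 can be approximately reproduced from monosaccharide residue masses. The sketch below assumes singly protonated oxonium ions and uses standard monoisotopic residue masses; the function and variable names are hypothetical, and small differences in the third decimal place relative to the quoted values are expected.

```python
# Standard monoisotopic residue masses (Da); assumed reference values.
RESIDUE_MASS = {
    "Hex": 162.05282,     # hexose residue (galactose or mannose)
    "HexNAc": 203.07937,  # N-acetylhexosamine residue (N-acetylglucosamine)
    "Fuc": 146.05791,     # fucose (deoxyhexose) residue
    "NeuAc": 291.09542,   # N-acetylneuraminic (sialic) acid residue
}
PROTON_MASS = 1.007276

def oxonium_mz(composition):
    """m/z of a singly protonated oxonium ion with the given residue counts."""
    residues = sum(RESIDUE_MASS[name] * count for name, count in composition.items())
    return residues + PROTON_MASS

# Sialic acid + galactose + N-acetylglucosamine + fucose (4 glycan fragment).
four_glycan = oxonium_mz({"NeuAc": 1, "Hex": 1, "HexNAc": 1, "Fuc": 1})
# The same fragment after loss of the sialic acid (3 glycan fragment).
three_glycan = oxonium_mz({"Hex": 1, "HexNAc": 1, "Fuc": 1})
```

The difference between the two computed values equals the sialic acid residue mass, consistent with the 3 glycan fragment being the 4 glycan fragment with a sialic acid removed.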
[0299] The subset of markers that had predominantly tri- and tetra-antennary glycopeptides from Table 3D were used to train a regularized regression model (e.g., LASSO regression model). Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11D. Using the values of Table 11D, a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score (see equation 1).
Table 11D. Coefficients for Peptide Structure Markers for a Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) based on 12 of the biomarkers of Table 3D
Figure imgf000113_0001
[0300] Using the model coefficients of Table 11D, a predicted probability can be generated for early stage and late stage ovarian cancer samples showing a stratification in predicted probabilities between the two cohorts. In various embodiments, predicted probability can be generated for classifying early stage and late stage ovarian cancer samples using the markers with non-zero coefficients. A logistic regression model was used with a subset of the glycopeptides of Table 3D where the glycopeptides had 1 or more sialic acids and zero or more fucosylations for the early and late stage EOC cohorts. The glycopeptides that included fucose were found to be associated with EOC. In addition, glycopeptides that included fucose and also carried tri- and tetra-antennary glycan structures were found to be more strongly associated with EOC.
[0301] Table 15 shows the accuracy, sensitivity, and specificity for the training data set and the testing data set. The performance reported in Table 15 is better than that reported in Table 14, indicating that the subset of biomarkers using predominantly tri- and tetra-antennary glycans generated a better model for determining early vs late stage EOC.
Table 15.
Figure imgf000114_0001
VIII.B. Exemplary Retrospective & Prospective Analysis
[0302] A validation study was conducted using both retrospective patient samples and samples collected prospectively in the ongoing Clinical Validation of the InterVenn Ovarian CAncer Liquid Biopsy (VOCAL) study. Samples included those from patients with malignant EOC and patients with benign pelvic tumors. Samples were processed in a manner similar to the manner described for the Exemplary Retrospective Analysis in Section VII. A above.
[0303] A logistic regression model was built identifying a panel of 38 peptide structures (same as those in Table 3 above). This panel of 38 peptide structures had an overall predictive accuracy of over 86% for the prediction of malignancy versus benign status of pelvic tumors.
[0304] Table 10 below provides the fold changes and p-values for the 38 peptide structures also identified in Table 3 above based on differential expression analysis (DEA). These peptide structures are ordered both in Table 3 and in Table 10 with respect to relative significance to the probability score generated by the model based on p-values. In this context, more significant peptide structures have lower p-values, while less significant peptide structures have higher p-values. In other words, relative significance to the probability score decreased with increasing p-values.
IX. Additional Considerations
[0305] Any headers and/or sub-headers between sections and subsections of this document are included solely for the purpose of improving readability and do not imply that features cannot be combined across sections and subsections. Accordingly, sections and subsections do not describe separate embodiments.
[0306] While the present teachings are described in conjunction with various embodiments, it is not intended that the present teachings be limited to such embodiments. On the contrary, the present teachings encompass various alternatives, modifications, and equivalents, as will be appreciated by those of skill in the art. The present description provides preferred exemplary embodiments, and is not intended to limit the scope, applicability or configuration of the disclosure. Rather, the present description of the preferred exemplary embodiments will provide those skilled in the art with an enabling description for implementing various embodiments.
[0307] It is understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope as set forth in the appended claims. Thus, such modifications and variations are considered to be within the scope set forth in the appended claims. Further, the terms and expressions which have been employed are used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed.
[0308] In describing the various embodiments, the specification may have presented a method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the various embodiments.
[0309] Some embodiments of the present disclosure include a system including one or more data processors. In some embodiments, the system includes a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein. Some embodiments of the present disclosure include a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.
[0310] Specific details are given in the present description to provide an understanding of the embodiments. However, it is understood that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

Claims

CLAIMS What is claimed is:
1. A method for diagnosing a subject with respect to an ovarian cancer disease state, the method comprising: receiving peptide structure data corresponding to a biological sample obtained from the subject; analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on at least one peptide structure selected from one of a group of peptide structures identified in Tables 3B, 3C, or 3D; and generating a diagnosis output based on the disease indicator.
2. The method of claim 1, wherein the disease indicator comprises a score.
3. The method of claim 2, wherein generating the diagnosis output comprises: determining that the score falls above a selected threshold; and generating the diagnosis output based on the score falling above the selected threshold, wherein the diagnosis output includes a classification of late stage ovarian cancer disease state.
4. The method of claim 2, wherein generating the diagnosis output comprises: determining that the score falls below a selected threshold; and generating the diagnosis output based on the score falling below the selected threshold, wherein the diagnosis output includes a classification of early stage ovarian cancer disease state.
5. The method of claim 3 or claim 4, wherein the score comprises a probability score and the selected threshold is 0.5.
6. The method of claim 3 or claim 4, wherein the selected threshold falls within a range between 0.30 and 0.65.
7. The method of any one of claims 1-6, wherein analyzing the peptide structure data comprises: analyzing the peptide structure data using a binary classification model.
8. The method of any one of claims 1-7, wherein a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D as defined in Table 5.
9. The method of any one of claims 1-8, further comprising: training the supervised machine learning model using training data, wherein the training data comprises a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects, wherein the plurality of subject diagnoses includes a diagnosis for any subject of the plurality of subjects determined to have early stage or late stage ovarian cancer.
10. The method of any one of claims 1-9, further comprising: performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the classification of early stage ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the classification of late stage ovarian cancer disease state; and identifying a training group of peptide structures based on the differential expression analysis for use as prognostic markers for the ovarian cancer disease state; and forming the training data based on the training group of peptide structures identified.
11. The method of claim 9, wherein training the supervised machine learning model comprises reducing the training group of peptide structures to a final group of peptide structures identified in Tables 3B, 3C, or 3D.
12. The method of any one of claims 9-11, wherein each peptide structure profile of the plurality of peptide structure profiles includes a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure.
13. The method of any one of claims 9-12, wherein the plurality of peptide structure profiles includes a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure.
14. The method of any one of claims 1-13, wherein the supervised machine learning model comprises a logistic regression model.
15. The method of any one of claims 1-14, wherein the first group of peptide structures in Table 3B, 3C, or 3D is used to distinguish between the ovarian cancer disease state being late stage or early stage.
16. The method of any one of claims 1-15, wherein the quantification data for a peptide structure of the set of peptide structures comprises at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
17. The method of any one of claims 1-16, wherein the peptide structure data is generated using multiple reaction monitoring mass spectrometry (MRM-MS), wherein the using of the MRM-MS comprises: ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
18. The method of any one of claims 1-17, further comprising: preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
19. The method of any one of claims 1-18, wherein the subject has already been diagnosed as having ovarian cancer.
20. The method of any one of claims 1-19, wherein generating the diagnosis output comprises: generating a report identifying that the biological sample evidences the early stage or late stage ovarian cancer disease state.
21. The method of any one of claims 1-20, further comprising: generating a treatment output based on at least one of the diagnosis output or the disease indicator.
22. The method of claim 21, wherein the treatment output comprises at least one of an identification of a treatment to treat the subject or a treatment plan.
23. The method of claim 22, wherein the treatment comprises at least one of surgery, radiation therapy, a targeted drug therapy, chemotherapy, immunotherapy, hormone therapy, or neoadjuvant therapy.
24. The method of any one of claims 1-23, wherein the group of peptide structures in Tables 3B, 3C, or 3D is listed in order of relative significance to the disease indicator.
25. The method of any one of claims 1-24, further comprising: preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
26. The method of claim 25, further comprising: generating the peptide structure data from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
27. A method of training a model to diagnose a subject with respect to an ovarian cancer disease state being either early stage or late stage, the method comprising: receiving quantification data for a panel of peptide structures for a plurality of samples for a plurality of subjects, wherein the plurality of subjects includes a first portion diagnosed with a classification of the early stage ovarian cancer disease state and a second portion diagnosed with a classification of the late stage ovarian cancer disease state; wherein the quantification data comprises a plurality of peptide structure profiles for the plurality of subjects; and training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state, wherein the group of peptide structures is identified in Table 3B, 3C, or 3D.
28. The method of claim 27, wherein the machine learning model comprises a logistic regression model.
29. The method of claim 28, wherein the logistic regression model comprises a LASSO regression model.
30. The method of any one of claims 27-28, further comprising: identifying an initial plurality of peptide structure profiles; filtering the initial plurality of peptide structure profiles by a coefficient of variation to generate a plurality of peptide structure profiles for use in training the machine learning model.
31. The method of claim 30, wherein the filtering is performed to exclude peptide structure profiles having the coefficient of variation at or above 20%.
32. The method of claim 30, wherein training the machine learning model comprises reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 3B, 3C, or 3D.
33. The method of any one of claims 27-32, wherein the quantification data for the panel of peptide structures for the plurality of subjects diagnosed with the plurality of ovarian cancer disease states comprises at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
34. The method of any one of claims 27-32, wherein the trained model uses a relative abundance for a first portion of the first group of peptide structures and a concentration for a second portion of the second group of peptide structures.
35. The method of any one of claims 27-34, wherein each peptide structure profile of the plurality of peptide structure profiles includes a feature selected from one of a relative abundance and a concentration for a corresponding peptide structure.
36. The method of any one of claims 27-35, wherein the plurality of peptide structure profiles includes a first peptide structure profile with a relative abundance for a corresponding peptide structure and a second peptide structure profile with a concentration for the corresponding peptide structure.
37. A composition comprising at least one of peptide structures identified in Table 3B, 3C, or 3D.
38. A method for diagnosing a subject with respect to an ovarian cancer disease state, the method comprising: analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether a biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on a group of glycopeptide structures, the group of glycopeptide structures comprising tri-antennary or tetra-antennary sialic acid moieties, wherein a portion of the glycopeptide structures of the group are fucosylated; and generating a diagnosis output based on the disease indicator.
39. The method of claim 38, wherein the group of glycopeptide structures comprises at least one glycopeptide structure identified in Tables 3B, 3C, or 3D.
40. The method of claim 38, wherein the group of glycopeptide structures comprises at least three glycopeptide structures identified in Tables 3B, 3C, or 3D.
41. The method of claim 38, wherein the group of glycopeptide structures comprises at least five glycopeptide structures identified in Tables 3B, 3C, or 3D.
42. The method of claim 38, wherein the group of glycopeptide structures comprises at least ten glycopeptide structures identified in Tables 3B, 3C, or 3D.
43. The method of claim 38, wherein the peptide structure data was generated with a mass spectrometer using the biological sample obtained from the subject.
44. The method of any one of claims 38-43 further comprising: preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
45. The method of claim 44, further comprising: generating the peptide structure data from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
46. The method of any one of claims 38-45, wherein the peptide structure data is generated using multiple reaction monitoring mass spectrometry (MRM-MS), wherein the using of the MRM-MS comprises: ionizing one or more glycopeptides to form ionized glycopeptides; filtering the ionized glycopeptides with a mass filter to form filtered glycopeptides; fragmenting the filtered glycopeptides in a collision chamber into product ions; and detecting the product ions.
47. The method of claim 17 or 24, wherein the peptide structure data is listed in Table 3D and the detected product ion comprises a first product having an m/z value listed in Table 4C.
48. The method of any one of claims 1-7, wherein a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171 in Table 3D as defined in Table 5.
49. The method of any one of claims 1-7, wherein a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 130-135, 137, 139, 140, 143, 148, 149, 155, 159-162, 166, and 171 in Table 3D as defined in Table 5.
50. The method of claim 49, wherein the glycan structure, corresponding to the peptide sequence of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171, includes a fucose and the fucose is in an outer arm orientation.
51. The method of any one of claims 1-7, wherein a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131, 137, 143, 155, 159, 162, 166, and 171 in Table 3D as defined in Table 5, wherein a fucose of the glycan structure comprises an outer arm orientation.
52. The method of any one of claims 50 or 51, wherein the outer arm orientation of the fucose comprises the fucose being linked to an N-acetylglucosamine by an α-(1-3/4) linkage.
53. The method of any one of claims 1-7, wherein the at least one peptide structure selected from one of a group of peptide structures is identified in Table 3D.
54. The method of any one of claims 1-7, wherein a peptide structure of the at least one peptide structure comprises a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 131-134, 137, 139, 140, 143, 151, 165-167 in Table 3D as defined in Table 5.
55. The method of claim 49, wherein the glycan structure, corresponding to the peptide sequence of SEQ ID NOS: 131, 137, and 143, includes a fucose and the fucose is in an outer arm orientation.
56. The method of claim 55, wherein the outer arm orientation of the fucose comprises the fucose being linked to an N-acetylglucosamine by an α-(1-3/4) linkage.
57. A method of treating ovarian cancer in an individual, the method comprising: administering to the individual an ovarian cancer therapy, wherein the individual has been determined to be responsive to the ovarian cancer therapy via a trained machine learning classifier that distinguishes between responsive and non-responsive individuals who have received the ovarian cancer therapy, based at least in part on a group of peptide structures identified in Tables 3B, 3C, or 3D.
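The training steps recited in claims 27-32 (filtering peptide structure profiles by coefficient of variation, then reducing them with a LASSO-penalized logistic regression) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the peptide structure names (PS-A, PS-B, PS-C), the toy abundance values, the choice to compute the coefficient of variation over replicate measurements, the label encoding (0 = early stage, 1 = late stage), the penalty strength, and the use of proximal gradient descent as the solver.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return statistics.stdev(values) / statistics.fmean(values)


def filter_by_cv(replicates, max_cv=0.20):
    """Keep features whose CV is strictly below max_cv (drop CV >= 20%)."""
    return [name for name, vals in replicates.items()
            if coefficient_of_variation(vals) < max_cv]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            err = sigmoid(z) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        bias -= lr * grad_b  # the intercept is not penalized
        weights = [soft_threshold(w - lr * g, lr * lam)
                   for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical replicate abundances per peptide structure (not from the source).
replicates = {
    "PS-A": [1.00, 1.05, 0.95],
    "PS-B": [1.00, 1.10, 0.90],
    "PS-C": [1.00, 2.00, 0.50],
}
kept = filter_by_cv(replicates)

# Toy normalized abundances; columns follow `kept`. Labels: 0 = early, 1 = late.
X = [[-1.0, 0.5], [-0.8, -0.5], [-0.6, 0.5], [-0.4, -0.5],
     [0.4, 0.5], [0.6, -0.5], [0.8, 0.5], [1.0, -0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = fit_lasso_logistic(X, y)

# Features with a nonzero coefficient form the reduced group.
selected = [name for name, w in zip(kept, weights) if w != 0.0]
```

In this toy data the second retained feature carries no class signal, so the L1 penalty is expected to shrink its coefficient to exactly zero, which is the sense in which LASSO "reduces" a plurality of profiles to a final group. A production workflow would more likely rely on an established statistics library and cross-validated selection of the penalty strength.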
PCT/US2023/074251 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation WO2024059750A2 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202263376053P 2022-09-16 2022-09-16
US63/376,053 2022-09-16
US202363489712P 2023-03-10 2023-03-10
US63/489,712 2023-03-10
US202363517859P 2023-08-04 2023-08-04
US63/517,859 2023-08-04

Publications (1)

Publication Number Publication Date
WO2024059750A2 (en) 2024-03-21

Family

ID=90275934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/074251 WO2024059750A2 (en) 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation

Country Status (1)

Country Link
WO (1) WO2024059750A2 (en)

Similar Documents

Publication Publication Date Title
CN104969071B (en) Method for assessing the presence or risk of colon tumor
JP6105491B2 (en) Collating cell-based assays and uses thereof
CN111148844A (en) Identification and use of glycopeptides as biomarkers for diagnosis and therapy monitoring
CN113439213A (en) Biomarkers for diagnosing ovarian cancer
JP2017520775A (en) Protein biomarker profile for detecting colorectal tumors
US20220310230A1 (en) Biomarkers for determining an immuno-onocology response
US20230112866A1 (en) Biomarkers for clear cell renal cell carcinoma
WO2024059750A2 (en) Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation
JP2023514809A (en) Biomarkers for diagnosing ovarian cancer
WO2023102443A2 (en) Diagnosis of pancreatic cancer using targeted quantification of site-specific protein glycosylation
US20230055572A1 (en) Biomarkers for diagnosing ovarian cancer
US11774459B2 (en) Biomarkers for diagnosing non-alcoholic steatohepatitis (NASH) or hepatocellular carcinoma (HCC)
WO2023075591A1 (en) Ai-driven glycoproteomics liquid biopsy in nasopharyngeal carcinoma
WO2023089597A2 (en) Predicting sarcoma treatment response using targeted quantification of site-specific protein glycosylation
US20230104536A1 (en) Systems and methods for glycopeptide concentration determination, normalized abundance determination, and lc/ms run sample preparation
WO2023154943A1 (en) De novo glycopeptide sequencing
CN116456895A (en) Biomarkers for diagnosing non-alcoholic steatohepatitis (NASH) or hepatocellular carcinoma (HCC)
CN117561449A (en) Biomarkers for determining immune oncologic response
WO2023154967A2 (en) Diagnosis of colorectal cancer using targeted quantification of site-specific protein glycosylation
WO2023193016A2 (en) Biomarkers for determining a cancer disease state, response to immuno-oncology, stages of fibrosis in non-alcoholic steatohepatitis, or application of age or sex related biomarker panel for quality control
WO2023019093A2 (en) Detection of peptide structures for diagnosing and treating sepsis and covid
KR20240062143A (en) Systems and methods for determining glycopeptide concentrations, determining normalized abundance, and preparing samples for LC/MS runs
WO2023147601A2 (en) Biomarkers for diagnosing preeclampsia
WO2023087004A2 (en) Methods of preparing and analyzing samples for biomarkers associated with placenta accreta

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 23866505; Country of ref document: EP; Kind code of ref document: A2