WO2018213112A1 - Systems and methods for automated design of an analytical study for the structural characterization of a biologic composition

Publication number
WO2018213112A1
Authority
WO
WIPO (PCT)
Prior art keywords
biologic
analytical
stage
gbas
target
Application number
PCT/US2018/032217
Inventor
Wael I. Yared
Kirtland G. Poss
Shiaw-Lin Wu
Original Assignee
Bioanalytix, Inc.
Application filed by Bioanalytix, Inc.

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20: Supervised data analysis
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P: FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00: Preparation of compounds containing saccharide radicals
    • C12P19/14: Preparation of compounds containing saccharide radicals produced by the action of a carbohydrase (EC 3.2.x), e.g. by alpha-amylase, e.g. by cellulase, hemicellulase
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P: FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P21/00: Preparation of peptides or proteins
    • C12P21/06: Preparation of peptides or proteins produced by the hydrolysis of a peptide bond, e.g. hydrolysate products
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30: Unsupervised data analysis
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2440/00: Post-translational modifications [PTMs] in chemical analysis of biological material

Definitions

  • This invention relates generally to methods, systems, and architectures for facilitating analysis of biologics. More specifically, in certain embodiments, the techniques described herein facilitate the determination of procedures for obtaining structural characterizations of biologics.
  • Biologics are highly complex molecules whose detailed structural properties are critical to their ability to perform their desired function, as well as their stability over time in storage. Biologics can be expressed and refolded incorrectly based on a range of variations in the biosynthesis or manufacturing process, and can be degraded or chemically changed by proteases, heat, acidic or other environmental conditions to produce fragments and truncated molecules. Some biologics tend to form aggregates, which are inactive and sometimes immunogenic.
  • Biologics can be glycosylated at differing N-linked or O-linked sites, by different amounts, and/or with different sugars (e.g., they may vary by galactose content, afucosylation, sialic acid content, mannose content, etc.), and may include molecular species non-glycosylated at critical or uncritical locations.
  • Proper disulfide bonding within a molecule and/or between molecules typically is critical for efficacy, and wrongly paired or unpaired disulfide bonds can lead to inoperative misfolded contaminants or to aggregates.
  • the product also may be contaminated by one or more host cell proteins such as proteases, DNAs, methotrexate, or other residues from upstream expression, or with leached components from downstream purification.
  • Biologics also may be deaminated, oxidized, methylated or otherwise modified.
  • the molecule may be altered after its release, in storage, or in vivo when exposed to blood-borne enzymes, physiological temperatures, and the like.
  • a master cell bank comprising replicable, recombinant clones that reliably express copious quantities of active biologic is only a beginning.
  • Upstream variables in culture of such cells such as culture duration, pH, amount of dissolved oxygen, concentrations and identities of media components, temperature, initial cell density, pCO2, mixing and gassing strategy, and feeding strategy each may affect not only the quantitative protein yield, but also the structure of the product.
  • contaminants such as host cell proteins, metabolites and the like are inevitably introduced into extracellular broths, as are possible infective agents such as viruses.
  • the downstream purification process may introduce variants or contaminants that may alter protein structure.
  • the fine structure of the product can be affected by such aspects as the selection of separation technologies such as affinity columns, anionic or cationic exchange columns, or ultrafiltration apparatus. Also, contaminants may be introduced or product degraded or derivatized by the addition of preservatives, diluents, vehicle, as well as the decision as to when a chromatography resin or filter is replaced, and the temperature or pH of the product during purification, compounding, and storage.
  • the ability to accurately characterize the detailed fine structure of a biologic is essential to one's ability to effectively manufacture and create new biologics, as well as the ability to maintain structural and functional consistency of biologics from batch to batch.
  • critical features (e.g., critical quality attributes, or CQAs)
  • An analytical method for a given biologic may include, for example, the following stages, which correspond to experimental steps performed on a sample: extraction, purification, titration, denaturation, reduction and alkylation, fragmentation, separation, detection, mass analysis, and data analysis (e.g., for the purpose of determining molecular weight, amino acid sequence, amino acid modifications, post-translational modifications, higher order structures, and epitope mapping).
  • analytical methods are designed for a given biologic by highly experienced experts
  • a study design and method capture technology that facilitates determining effective and improved analysis procedures for characterizing biologics.
  • the approaches described herein allow a user to input known or expected information describing the general structures of a target biologic to be analyzed, such as its nominal primary structure, along with study class attributes that identify desired structural characterizations to be obtained for the target biologic.
  • the described study design and method capture platform determines, based on the target biologic's generalizable attributes and the study class attributes, one or more study design results.
  • the determined study design results provide detailed procedures that, when implemented, allow or enhance the process of obtaining the desired structural characterization of the target biologic.
  • a given study design that, when implemented, allows a desired structural characterization of a target biologic to be obtained, comprises one or more analytical methods to be applied to a sample comprising the target biologic.
  • Each analytical method comprises a sequence of specific analytical stages corresponding to specific experimental steps.
  • the specific analytical method, or combination of multiple analytical methods used in a study design, as well as the specific sequence of analytical stages that each analytical method comprises depends in a complex fashion on underlying properties of the target biologic and the specific desired structural
  • the approaches described herein determine study design results for a given target biologic by mapping a set of known or expected attributes of the target biologic to records of analytical methods and/or analytical stages that have been previously applied in various structural characterization studies of known biologics (e.g., which may be different from the target biologic) and are stored in a database.
  • This database and the mapping between generalizable biologic attributes and the records it stores are collectively referred to herein as a method store.
  • the method store codifies domain knowledge and previous experience in applying various analytical methods to characterize biologics.
  • the manner in which data are stored in the method store goes beyond merely storing records of studies and analytical methods that were previously implemented to characterize various biologics.
  • the method store includes sets of generalizable biologic attributes (GBAs) that are determined for the known biologics that have been previously determined using analytical methods that are stored as records in the method store.
  • GBAs are determined via a preprocessing step and linked to the records of the various analytical methods that were used in its characterization.
  • GBAs are values representing various structural attributes and physicochemical properties of a biologic and allow patterns of similarities between various different biologics to be identified.
  • the approaches described herein go beyond merely searching a database to identify and return study design results.
  • the systems and methods described herein may determine, for a given target biologic, a set of target GBAs that are used as features in machine learning algorithms that utilize the method store to identify relevant analytical methods (and analytical stages that they comprise) that can be applied to obtain a desired structural characterization of the target biologic.
  • the systems and methods allow for study design results to be determined (e.g., in an improved, faster, and more accurate manner) for target biologics that have not been characterized before.
  • the specific sequence of analytical stages as well as combinations of analytical methods included in a given determined study design result also do not need to have been previously performed and stored in the method store. Accordingly, the systems and methods described herein allow for unique study design results to be obtained.
  • the invention is directed to a method of automatically identifying analytical study design parameters for analysis of a target biologic, the method comprising: (a) receiving, by a processor of a computing device, an input query comprising one or more generalizable biologic attributes (GBAs) of [e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type of; e.g., based on known information, such as a known identification of one or more disulfide linkage sites of] the target biologic, wherein the one or more GBAs comprise one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) accessing, by the processor, a method store comprising a plurality of analytical stage records, each analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known
  • the one or more GBAs of the target biologic comprise(s) one or more members selected from the group consisting of: (A) a sequence fragment; (B) a molecular weight of the target biologic; (C) a molecule type of the target biologic; (D) a quantification of one or more specific amino acids within the target biologic [e.g., a total number of one or more specific amino acids (e.g., cysteines) within the target biologic; e.g., a fraction of one or more specific amino acids within the target biologic]; (E) a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties; (F) an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications [e.g., positions and/or number of potential sites of oxidation; e.g., positions and/or number of potential sites of deamidation; e.g., positions and/or number of potential sites of post- translation
  • isomerization and e.g., positions and/or number of potential instances of racemization] ; and (G) one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the target biologic.
  • the one or more GBAs of the target biologic comprises a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of: (i) hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; classified as hydrophobic]; (ii) hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; classified as hydrophilic]; (iii) charge (e.g., having a charge greater than or equal to
  • the one or more GBAs of the target biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
  • the one or more GBAs of the target biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics are selected from the group consisting of: (i) a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific (e.g., trypsin; e.g., Lys-C; e.g., Glu-C; e.g., Asp-N; e.g., Arg-C), whether the enzymes are applied singly, serially or simultaneously; (ii) a fragmentation pattern; and (iii) a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
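As an illustration of fragmentation-derived metrics such as the distribution of fragments by length, the following minimal Python sketch performs an in silico tryptic digest. It is not taken from the patent: the function names, the example sequence, and the simplified cleavage rule (cleave after K or R unless the next residue is P, with no missed cleavages) are all assumptions made for illustration.

```python
from collections import Counter


def tryptic_digest(sequence):
    """Split a one-letter amino acid sequence at simplified trypsin sites.

    Assumed rule (illustrative only): cleave C-terminal to K or R
    unless the following residue is P; missed cleavages are ignored.
    """
    if not sequence:
        return []
    fragments, start = [], 0
    # Stop one residue early: there is nothing to cleave after the last residue.
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def fragmentation_metrics(sequence):
    """Summarize the digest as simple numeric features (hypothetical GBAs)."""
    fragments = tryptic_digest(sequence)
    lengths = [len(f) for f in fragments]
    return {
        "n_fragments": len(fragments),
        "length_histogram": dict(Counter(lengths)),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


# Hypothetical toy sequence, not a real biologic.
metrics = fragmentation_metrics("AKRPGKAAR")
print(metrics)
```

A fragment molecular weight distribution could be produced the same way by summing tabulated residue masses per fragment; that table is omitted here to keep the sketch short.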
  • the one or more GBAs of the target biologic comprise(s) one or more bioprocess attributes representing parameters of a bioprocess used to produce the target biologic.
  • the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of: (A) an identification of a cell culture type used to produce the target biologic (e.g., such as a textual label that identifies a cell culture type); and (B) an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
  • the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of: (A) a sequence fragment; (B) a molecular weight of the associated known biologic; (C) a molecule type of the associated known biologic; (D) a quantification of one or more specific amino acids within the associated known biologic [e.g., a total number of one or more specific amino acids (e.g., cysteines) within the associated known biologic; e.g., a fraction of one or more specific amino acids within the associated known biologic] ; (E) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties; (F) an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications [e.g., positions and/or number of potential sites of oxidation; e.g., positions and/or number of potential sites of deamidation; e.g., positions and/or
  • the one or more GBAs of the associated known biologic comprises a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of: (i) hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; classified as hydrophobic]; (ii) hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; classified as hydrophilic]; (iii) charge (e.g., having a charge greater than or equal to a specific charge; e.g., having charge less than or equal to a specific charge; e.g., having a positive charge; e.g., having a negative charge; e.g., neutral); (iv) acidity (e.g., classified as acidic; e.g., classified as basic; e.g., classified as neutral); and (v) aromaticity (e.g., classified as aromatic).
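Composition-based GBAs of this kind (a count of specific residues, and the proportion of residues falling in a property class) can be sketched in a few lines of Python. The residue groupings below are one common textbook convention and are an assumption here, as are the function and key names; the source does not fix a particular hydrophobicity scale or class membership.

```python
# Illustrative residue classes (an assumption; other scales and groupings exist).
RESIDUE_CLASSES = {
    "hydrophobic": set("AVILMFW"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "aromatic": set("FWY"),
}


def composition_gbas(sequence):
    """Return simple composition features for a one-letter amino acid sequence."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    gbas = {
        "length": n,
        # Total number of a specific amino acid, e.g. cysteines.
        "cysteine_count": sequence.count("C"),
    }
    # Proportion of residues belonging to each property class.
    for name, members in RESIDUE_CLASSES.items():
        gbas["frac_" + name] = sum(r in members for r in sequence) / n
    return gbas


# Hypothetical toy sequence, not a real biologic.
example = composition_gbas("ACDEFKWC")
print(example)
```

The resulting dictionary can be flattened into a numeric feature vector, which is the form a similarity search or machine learning module would typically consume.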
  • the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
  • the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics are selected from the group consisting of: (i) a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific (e.g., trypsin; e.g., Lys-C; e.g., Glu-C; e.g., Asp-N; e.g., Arg-C), whether the enzymes are applied singly, serially or simultaneously; (ii) a fragmentation pattern; (iii) a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
  • the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
  • the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of: (A) an identification of a cell culture type used to produce the associated known biologic (e.g., such as a textual label that identifies a cell culture type); and (B) an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
  • the method comprises: receiving, by the processor, a user input comprising one or more known or expected structural features (e.g., including substructures) of the target biologic [e.g., an amino acid sequence; e.g., locations of disulfide bonds; e.g., locations and/or types of glycan structures attached to the target biologic (e.g., via glycosylation)]; determining, by the processor, using the one or more known or expected structural features of the target biologic, the one or more GBAs of the target biologic; and providing, by the processor, the determined one or more target biologic GBAs via the input query for automated identification of analytical study design parameters for analysis of the target biologic.
  • the one or more known or expected structural features of the target biologic comprises an amino acid sequence of the target biologic (e.g., a nominal amino acid sequence of the target biologic).
  • the received user input comprises an identification of a molecule type of the target biologic (e.g., a recombinant protein, a fusion protein, a monoclonal antibody, an antibody-drug conjugate) and the identification of the molecule type is used as a GBA of the one or more GBAs of the target biologic.
  • the input query comprises one or more bioprocess parameters that represent properties of the bioprocess used to produce the target biologic (e.g., an identification of a cell culture type; e.g., an identification of a purification technique) and the one or more study design results determined in step (c) based further on the one or more bioprocess parameters.
  • At least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type (e.g., a specific type of structural characterization study) selected from the group consisting of: (i) determination of a molecular weight of the target biologic; (ii) determination of a primary structure of the target biologic; (iii) determination of post-translational modifications (e.g., intra- and inter-chain disulfide bonds; e.g., disulfide knots; e.g., glycosylation nature and sites; e.g., chemical post-translational modifications); (iv) determination of one or more higher order structures of the target biologic (e.g., secondary structure; e.g., tertiary structure; e.g., quaternary structure); (vi) comparison of the target biologic with a reference biologic (e.g., for characterization of the target biologic as a biosimilar); (vii) a lot comparison study; (viii) determination of a specific analytical study type
  • each of one or more of the analytical stage records of the method store comprises or is linked to one or more prior study class attributes that identify the structural characterization study in which the analytical stage that the analytical stage record represents was implemented.
  • each of one or more of the analytical stage records of the method store corresponds to a specific analytical stage selected from the group consisting of: (i) a separation stage; (ii) a detection stage; (iii) a mass spectrometry stage; (iv) a digestion strategy [e.g., an enzymatic digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest); e.g., a chemical digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest)]; and (v) a sample preparation stage [e.g., a sample preprocessing stage (e.g., dilution, enrichment, buffer exchange, desalting, stress, or tit
  • At least a portion of the plurality of analytical stage records were created from published documents (e.g., published journal articles of structural characterization studies) via automated processing using text mining and/or natural language processing.
  • At least a portion of the plurality of analytical stage records were created from published documents (e.g., published journal articles of structural characterization studies) via automated processing in combination with a user interaction.
  • At least a portion of the analytical stage records were created from in-house studies in an automated fashion via dedicated software as part of a laboratory information management system.
  • each determined study design result comprises a set of analytical stage results, each representing a specific analytical stage to be applied to the target biologic, and comprising a list of parameters to be used when applying the analytical stage that the analytical stage result represents to the target biologic.
  • the analytical stage results of a given study design result are determined via a machine learning module that receives as input the GBAs of the target biologic and determines the set of analytical stage results and, for each analytical stage result, parameter values associated with that stage, based on patterns identified using the GBAs associated with analytical stage records of the method store [e.g., for a given target biologic, the machine learning module computes relevant analytical stage records by matching the GBAs of the target biologic with GBAs of a subset of the analytical stage records according to an identified pattern in GBAs of the analytical stage records (e.g., determined via clustering analysis)].
  • the machine learning module implements a supervised machine learning technique [e.g., wherein the machine learning module determines study design results using a set of training data that comprises a plurality of examples of correct output results (e.g., previously determined study design results; e.g., previously determined analytical stage results) and for each correct output result, a set of input features used to determine the correct output (e.g., for each of a plurality of previously determined study design results, GBAs that were used as input; e.g., for each of a plurality of previously determined analytical stage results of a plurality of previously determined study design results, GBAs that were used as input); e.g., an artificial neural network technique; e.g., a decision tree; e.g., one or more regression models; e.g., a k-nearest neighbor technique].
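Of the supervised techniques listed, k-nearest neighbor is the simplest to sketch. The Python example below is only an illustration under stated assumptions: GBAs are represented as short numeric tuples, each stored record carries a single stage label, distance is Euclidean, and the prediction is a majority vote. The record contents and stage names are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def knn_recommend(target_gbas, records, k=3):
    """Return the most common stage label among the k nearest stored records.

    target_gbas: numeric GBA vector of the target biologic.
    records: list of dicts with a "gbas" vector and a "stage" label.
    """
    if not records:
        raise ValueError("method store is empty")
    # Rank stored records by Euclidean distance between GBA vectors.
    ranked = sorted(records, key=lambda r: math.dist(target_gbas, r["gbas"]))
    votes = Counter(r["stage"] for r in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stage records: (fraction hydrophobic, fraction cysteine).
stage_records = [
    {"gbas": (0.30, 0.020), "stage": "trypsin digest"},
    {"gbas": (0.32, 0.030), "stage": "trypsin digest"},
    {"gbas": (0.60, 0.100), "stage": "Glu-C digest"},
    {"gbas": (0.62, 0.120), "stage": "Glu-C digest"},
    {"gbas": (0.31, 0.025), "stage": "trypsin digest"},
]

recommended = knn_recommend((0.33, 0.03), stage_records, k=3)
print(recommended)
```

In practice the features would need scaling so that no single GBA dominates the distance, and the output would be a full set of stage results with parameter values rather than one label.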
  • the machine learning module implements a reinforcement machine learning technique [e.g., wherein the machine learning module determines study design results using a set of training data that comprises a plurality of example outputs (e.g., analytical stage records), and for each example output a performance measure of the example output (e.g., a performance index that an analytical stage record comprises or is linked to)].
  • the machine learning module implements an unsupervised machine learning technique (e.g., a clustering method; e.g., k-means clustering; e.g., self-organizing maps).
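To make the clustering option concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) applied to GBA vectors. It is an illustration, not the patent's implementation: the deterministic initialization from the first k points, the iteration cap, and the toy data are all assumptions chosen to keep the example short and reproducible.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster numeric vectors with Lloyd's algorithm.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of points[i]. Initial centroids are simply the first k points.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids


# Hypothetical two-dimensional GBA vectors forming two obvious groups.
gba_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels, centers = kmeans(gba_vectors, k=2)
print(labels, centers)
```

The cluster labels could then serve as an additional input feature, or as a way to restrict a supervised model to records from the target's own cluster.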
  • the machine learning module implements the unsupervised machine learning technique as a precursor to a supervised machine learning technique.
  • the method comprises: receiving, by the processor, a user input corresponding to a modification of a particular study design result of the one or more determined study design results; updating, by the processor, the particular study design result according to the user input; and storing, by the processor, one or more analytical stage results of the updated study design result as analytical stage records in the method store.
  • the method comprises using the stored analytical stage results of the updated study design result as training data in a supervised machine learning technique (e.g., an artificial neural network technique; e.g., a decision tree; e.g., one or more regression models; e.g., a k-nearest neighbor technique) implemented by the machine learning module of step (c).
  • step (d) comprises, for at least one study design result of the determined study design results: (A) causing, by the processor, display of at least one graphical control element corresponding to an analytical stage result of the study design result; (B) receiving, by the processor, via a user interaction with the graphical control element, a user input corresponding to a modification of the analytical stage result; (C) responsive to the received user input, updating, by the processor, the particular study design result according to the user input; and (D) storing, by the processor, one or more analytical stage results of the updated study design result as analytical stage records in the method store.
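  • As a minimal sketch of the k-nearest neighbor option named above, the following assumes that each previously determined (or user-corrected) analytical stage result is stored with a numeric GBA feature vector. The feature choice (molecular weight in kDa, cysteine fraction), the stage labels, and the function name `knn_predict` are hypothetical illustrations and are not taken from the specification; a practical implementation would also rescale features so that no single GBA dominates the distance.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Return the majority label among the k training examples nearest to query.

    training: list of (feature_vector, label) pairs, where feature_vector is a
    tuple of numeric GBA values and label names a previously determined result.
    """
    # Sort examples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (molecular weight in kDa, cysteine fraction)
# paired with an invented digestion-stage label. Values are illustrative only.
training = [
    ((150.0, 0.024), "trypsin_single_digest"),
    ((145.0, 0.022), "trypsin_single_digest"),
    ((152.0, 0.025), "trypsin_single_digest"),
    ((5.8, 0.118), "gluc_single_digest"),
    ((3.5, 0.063), "gluc_single_digest"),
]

prediction = knn_predict(training, (148.0, 0.023), k=3)
print(prediction)
```

  • Appending a user-modified stage result to `training` before the next query is one simple way to reuse stored, updated results as training data.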
  • the one or more study design results comprises one or more documents corresponding to software file(s) that specify parameters for analytical instruments (e.g., parameters for analytical instruments such as chromatography
  • the invention is directed to a method of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, the method comprising: (a) creating, by a processor of a computing device, an analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein the analytical stage record comprises: (i) an identifier (e.g., a text label) of the corresponding specific analytical stage, and (ii) a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic; (b) storing, by the processor, the analytical stage record in the method store; (c) storing, by the processor, one or more GBAs of the associated known biologic in the method store; and (d) linking, by the processor, the one or more known biologic GBAs with the analytical stage record.
  • the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of: (A) a sequence fragment; (B) a molecular weight of the associated known biologic; (C) a molecule type of the associated known biologic; (D) a quantification of one or more specific amino acids within the associated known biologic [e.g., a total number of one or more specific amino acids (e.g., cysteines) within the associated known biologic; e.g., a fraction of one or more specific amino acids within the associated known biologic]; (E) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties; (F) an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications [e.g., positions and/or number of potential sites of oxidation; e.g., positions and/or number of potential sites of deamidation; e.g., positions and/or number
  • isomerization and e.g., positions and/or number of potential instances of racemization] ; and (G) one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
  • the one or more GBAs of the associated known biologic comprises a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of: (i) hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; classified as hydrophobic]; (ii) hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; classified as hydrophilic]; (iii) charge (e.g., having a charge greater than or
  • the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
  • the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics are selected from the group consisting of: (i) a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific (e.g., trypsin; e.g., Lys-C; e.g., Glu-C; e.g., Asp-N; e.g., Arg-C), whether the enzymes are applied singly, serially or simultaneously; (ii) a fragmentation pattern; (iii) a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
  • the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
  • the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of: (A) an identification of a cell culture type used to produce the associated known biologic (e.g., such as a textual label that identifies a cell culture type); and (B) an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
  • the one or more GBAs of the associated known biologic comprise one or more study class attributes, each of which identifies a type of analytical study performed on the associated known biologic using an analytical method comprising the analytical stage that the analytical stage record represents.
  • at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type (e.g., a specific type of structural characterization study) selected from the group consisting of: (i) determination of a molecular weight of the target biologic; (ii) determination of a primary structure of the target biologic; (iii) determination of post-translational modifications (e.g., intra- and inter-chain disulfide bonds; e.g., disulfide knots; e.g., glycosylation nature and sites; e.g., chemical post-translational modifications); (iv) determination of one or more higher order structures of the target biologic (e.g., secondary structure; e.g., tertiary structure; e.g., quaternary structure); (vi) comparison of the target biologic with a reference biologic (e.g., for characterization of the target biologic as a biosimilar); (vii) a lot comparison study; (viii) determination of a specific analytical study type
  • the analytical stage record of the method store corresponds to a specific analytical stage selected from the group consisting of: (i) a separation stage; (ii) a detection stage; (iii) a mass spectrometry stage; (iv) a digestion strategy [e.g., an enzymatic digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest); e.g., a chemical digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest)]; and (v) a sample preparation stage [e.g., a sample preprocessing stage (e.g., dilution, enrichment, buffer exchange, desalting, stress, or titration); e.g., fractionation; e.g., denaturation; e.g., reduction; e.g., alkylation].
  • creating the analytical stage record comprises extracting at least one of (i) the identifier and (ii) one or more parameter values of the series of parameter values from published documents (e.g., published journal articles of structural
  • creating the analytical stage record comprises obtaining the identifier and series of parameter values from an in-house study in an automated fashion via dedicated software as part of a laboratory information management system.
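  • The four populating steps described above (create a stage record, store it, store the GBAs of the known biologic, and link the two) could be sketched against a small relational store as follows. This is an illustrative assumption rather than the disclosed implementation: the table and column names, the function names, and the example values are hypothetical, and an in-memory SQLite database stands in for the method store.

```python
import json
import sqlite3


def create_method_store():
    """Create an in-memory database with hypothetical method-store tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stage_record (
            id INTEGER PRIMARY KEY,
            identifier TEXT NOT NULL,      -- e.g., a text label of the stage
            parameters TEXT NOT NULL       -- JSON-encoded parameter values
        );
        CREATE TABLE gba (
            id INTEGER PRIMARY KEY,
            biologic TEXT NOT NULL,
            name TEXT NOT NULL,
            value TEXT NOT NULL            -- JSON-encoded GBA value
        );
        CREATE TABLE gba_link (
            gba_id INTEGER REFERENCES gba(id),
            stage_record_id INTEGER REFERENCES stage_record(id)
        );
        """
    )
    return conn


def populate(conn, identifier, parameters, biologic, gbas):
    """Create and store a stage record, store the GBAs, and link them."""
    # Steps (a) and (b): create the analytical stage record and store it.
    record_id = conn.execute(
        "INSERT INTO stage_record (identifier, parameters) VALUES (?, ?)",
        (identifier, json.dumps(parameters)),
    ).lastrowid
    # Steps (c) and (d): store each GBA and link it to the stage record.
    for name, value in gbas.items():
        gba_id = conn.execute(
            "INSERT INTO gba (biologic, name, value) VALUES (?, ?, ?)",
            (biologic, name, json.dumps(value)),
        ).lastrowid
        conn.execute(
            "INSERT INTO gba_link (gba_id, stage_record_id) VALUES (?, ?)",
            (gba_id, record_id),
        )
    conn.commit()
    return record_id


store = create_method_store()
record_id = populate(
    store,
    identifier="enzymatic digestion",
    parameters={"enzyme": "trypsin", "mode": "single digest"},
    biologic="known_biologic_1",  # hypothetical identifier
    gbas={"molecule_type": "antibody", "molecular_weight_kda": 150},
)
linked = store.execute(
    "SELECT g.name FROM gba g JOIN gba_link l ON l.gba_id = g.id "
    "WHERE l.stage_record_id = ? ORDER BY g.name",
    (record_id,),
).fetchall()
print(linked)
```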
  • the invention is directed to a method of automatically identifying analytical study design parameters for analysis of a target biologic, the method comprising: (a) receiving, by a processor of a computing device, an input query comprising one or more generalizable biologic attributes (GBAs) of [e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type of; e.g., based on known information, such as a known identification of one or more disulfide linkage sites of] the target biologic, wherein the one or more GBAs comprise one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) accessing, by the processor, a method store comprising a plurality of analytical method records, each analytical method record corresponding to a specific analytical method used in an analytical study for structural characterization of an associated known biologic, wherein: (i) each analytical method record comprises
  • the invention is directed to a method of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, the method comprising: (a) creating, by a processor of a computing device, an analytical method record corresponding to a specific analytical method having been used in an analytical study for structural characterization of an associated known biologic, wherein the analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; (b) storing, by the processor, the analytical method record in the method store; (c) storing, by the processor, one or more GBAs of the associated known biologic in the method store; and (d) linking, by the processor, the one or more known biologic GBAs with the analytical method record.
  • the invention is directed to a system for automatically identifying analytical study design parameters for analysis of a target biologic.
  • the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to: (a) receive an input query comprising one or more generalizable biologic attributes (GBAs) of [e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type of; e.g., based on known information, such as a known identification of one or more disulfide linkage sites of] the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) access a method store comprising a plurality of analytical stage records, each analytical stage record corresponding to a specific analytical stage having been implemented as
  • the invention is directed to a system for populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics.
  • the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to: (a) create an analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein the analytical stage record comprises: (i) an identifier (e.g., a text label) of the corresponding specific analytical stage, and (ii) a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic; (b) store the analytical stage record in the method store; (c) store one or more generalizable biologic attributes (GBAs) of the associated known biologic in the method store; and (d) link the one or more known biologic GBAs with the analytical stage record.
  • the invention is directed to a system for automatically identifying analytical study design parameters for analysis of a target biologic.
  • the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to: (a) receive an input query comprising one or more generalizable biologic attributes (GBAs) of [e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type; e.g., based on known information, such as a known identification of one or more disulfide linkage sites] the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) access a method store comprising a plurality of analytical method records, each analytical method record corresponding to a specific analytical method used in an analytical study for structural
  • the invention is directed to a system for populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics.
  • the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor, cause the processor to: (a) create an analytical method record corresponding to a specific analytical method having been used in an analytical study for structural characterization of an associated known biologic, wherein the analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; (b) store the analytical method record in the method store; (c) store one or more generalizable biologic attributes (GBAs) of the associated known biologic in the method store; and (d) link the one or more known biologic GBAs with the analytical method record.
  • FIG. 1A is a block diagram showing a process for populating a method store, according to an illustrative embodiment.
  • FIG. 1B is a block diagram showing a process for populating a method store, including examples of determined generalizable biologic attributes, according to an illustrative embodiment.
  • FIG. 1C is a block diagram showing a process for populating a method store, including an example organization of analytic stages, according to an illustrative embodiment.
  • FIG. 1D is a block flow diagram illustrating a process for populating an analytical method store, according to an illustrative embodiment.
  • FIG. 2A is a block diagram showing a process for determining study design results, according to an illustrative embodiment.
  • FIG. 2B is a block diagram showing a process for determining an analytical method for a target biologic, according to an illustrative embodiment.
  • FIG. 3 is a portion of pseudocode for instantiating an analytical stage record, according to an illustrative embodiment.
  • FIG. 4 is a diagram showing an example hierarchical organization of various analytical stages and relevant parameters, according to an illustrative embodiment.
  • FIG. 5 is a block diagram showing an organization of analytical method records and their links to generalizable biologic attributes, according to an illustrative embodiment.
  • FIG. 6 is a block diagram showing an example process for creating analytical stage records and analytical method records from previously carried out structural characterization studies, according to an illustrative embodiment.
  • FIG. 7 is a block flow diagram illustrating a process for determining study design results, according to an illustrative embodiment.
  • FIG. 8A is a graph illustrating use of cluster analysis to determine groups of related analytical method records, wherein each of two example target biologics is identified as belonging to a determined group of related analytical methods, according to an illustrative embodiment.
  • FIG. 8B is a graph illustrating use of cluster analysis to determine groups of related analytical method records, wherein an example target biologic is identified as not belonging to any of the determined groups, according to an illustrative embodiment.
  • FIG. 9 is a set of graphs illustrating use of cluster analysis to determine groups of related analytical method records, wherein a non-linear transform is used, according to an illustrative embodiment.
  • FIG. 10 is a block diagram of an exemplary cloud computing environment, used in certain embodiments.
  • FIG. 11 is a block diagram of an example computing device and an example mobile computing device, used in certain embodiments.
  • FIG. 12 is a flow chart showing three different analytical methods, namely, bottom-up (Fragmentation 1: produced by enzymatic digestion using multiple enzymes), middle-down (Fragmentation 2: produced by enzymatic digestion using a single enzyme or chemical), or top-down (Fragmentation 3: produced via gas phase cleavage, for example, using collision induced dissociation (CID) and electron transfer dissociation (ETD)), that may be used for characterizing the primary sequence of a biologic, for example, a protein, according to an illustrative embodiment.
  • the term “approximately” or “about” refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).
  • biologic refers to a composition that is produced by recombinant DNA technologies, peptide synthesis, or purified from natural sources and that has a desired biological activity.
  • the biologic can be, for example, a protein, peptide, glycoprotein, polysaccharide, a mixture of proteins or peptides, a mixture of glycoproteins, a mixture of polysaccharides, a mixture of one or more of a protein, peptide, glycoprotein or polysaccharide, or a derivatized form of any of the foregoing entities.
  • the molecular weight of biologics can vary widely, from about 1000 Da for small peptides such as peptide hormones to 1000 kDa or more for complex polysaccharides, mucins, and other heavily glycosylated proteins.
  • the biologic subject of the process of this invention can have a molecular weight of 1 kDa to 1000 kDa, more typically 20 kDa to 200 kDa, and often 30 kDa to 150 kDa.
  • desmopressin, oxytocin, angiotensin and bradykinin each have a molecular weight of about 1 kDa, calcitonin is 3.5 kDa, insulin is 5.8 kDa, kineret is 17.3 kDa, erythropoietin is about 30 kDa, ontak is 58 kDa, orencia is 92 kDa, and antibodies are approximately 150 kDa (Rituxan 145 kDa, Erbitux 152 kDa).
  • Hyaluronic acids and salts have an average molecular weight often greater than 1000 kDa.
  • a biologic is a drug used for treatment of diseases and/or medical conditions.
  • biologic drugs include, for example, native or engineered antibodies or antigen binding fragments thereof, and antibody-drug conjugates, which comprise an antibody or antigen binding fragments thereof conjugated directly or indirectly (e.g., via a linker) to a drug of interest, such as a cytotoxic drug or toxin.
  • a biologic is a diagnostic, used to diagnose diseases and/or medical conditions.
  • allergen patch tests utilize biologics (e.g., biologics manufactured from natural substances) that are known to cause contact dermatitis.
  • Diagnostic biologics may also include medical imaging agents, such as proteins that are labelled with agents that provide a detectable signal that facilitates imaging such as fluorescent markers, dyes, radionuclides, and the like.
  • reference biologic refers to a biologic that is representative of the biologic drug under development or that has been approved for marketing, and provides a reference standard for the biologic drug with, for example, the appropriate, pre-determined composition, purity and/or biological activity.
  • Structural characterization study refers to an experimental study that is performed on a biologic and obtains information (e.g., data; e.g., data derived from mass spectrometry analysis) about specific types of structural features of the biologic.
  • structural characterization studies include a molecular weight study, a primary structure study, amino acid modifications studies, post-translational modification studies, higher order structure determination studies, critical quality attribute (CQA) mapping studies, in vivo comparability profile determination studies, and biosimilar/reference lot comparison studies.
  • a molecular weight study determines a molecular weight of a given biologic
  • a primary structure study determines a primary structure - a measured amino acid sequence - of a given biologic
  • amino acid modification studies are used to obtain information about one or more specific types of amino acid modifications that may occur for a given biologic
  • post-translational modification studies are used to obtain information about one or more specific types of post-translational modifications that may occur for a given biologic
  • higher order structure studies are used to determine one or more higher order structures, such as secondary, tertiary, and quaternary structures, of a given biologic
  • CQA mapping studies are used to determine a CQA map of a given biologic
  • in vivo comparability profile studies are used to determine an in vivo comparability profile for a given biologic; the remaining study types are a biosimilar/reference comparison study and a lot comparison study.
  • Analytical stage refers to a specific isolated experimental processing step used, typically in combination with other analytical stages, to obtain a structural characterization of a biologic.
  • types of analytical stages include (i) sample extraction, (ii) sample purification, (iii) sample preparation, (iv) digestion, (v) separation, (vi) detection, and (vii) mass spectrometry, each of which is described in detail below.
  • Analytical method refers to a sequence of one or more analytical stages that is applied to a given biologic in order to obtain a dataset used in obtaining a particular desired structural characterization of a biologic.
  • the dataset obtained by applying an analytical method to characterize a given biologic comprises mass spectrometry data.
  • determining or identifying a given analytical method for use in characterizing a given biologic comprises determining a sequence of one or more analytical stages of the given analytical method.
  • Study design refers to an identification of a specific combination of one or more analytical methods that are used to carry out a particular desired structural characterization study of a specific biologic.
  • a study design depends on both the type of structural characterization study and the specific biologic it is used to characterize.
  • determining or identifying a particular study design for a given biologic may comprise determining or identifying one or more analytical methods and, accordingly, for each of the one or more analytical methods determining or identifying a sequence of one or more analytical stages.
  • a study design comprises a single analytical method. In certain embodiments, a study design comprises two or more analytical methods and datasets obtained by applying each of the analytical methods to characterize the biologic are combined to obtain the desired structural characterization of the biologic.
  • Analytical stage record refers to a data structure that represents a specific analytical stage having been implemented in the characterization of a specific known biologic.
  • An analytical stage record comprises an identifier of the specific analytical stage it represents and a series of parameter values used to implement the represented analytical stage for characterizing the specific known biologic.
  • An analytical stage record may be implemented in software in a variety of ways, such as a database entry, an instance of a class (e.g., as in object-oriented programming), or a combination of data elements (e.g., arrays, structures, integer variables, string variables, and the like).
  • Analytical method record refers to a data structure that represents a specific analytical method.
  • An analytical method record comprises one or more analytical stage records, each of which represents a specific analytical stage that the represented analytical method comprises.
  • An analytical method record may be implemented in software in a variety of ways, such as a database entry, an instance of a class (e.g., as in object-oriented programming), or a combination of data elements (e.g., arrays, structs, integer variables, string variables, and the like).
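  • One of the implementation options mentioned above, an instance of a class, could look like the following minimal sketch. The class and field names are hypothetical and the parameter values are invented for illustration; the only structure taken from the definitions is that a stage record holds an identifier plus parameter values and that a method record holds an ordered sequence of stage records.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyticalStageRecord:
    """A specific analytical stage as applied to a known biologic."""
    identifier: str                                  # e.g., a text label
    parameters: dict = field(default_factory=dict)   # parameter name -> value


@dataclass
class AnalyticalMethodRecord:
    """An analytical method, held as an ordered sequence of stage records."""
    stages: list = field(default_factory=list)

    def stage_identifiers(self):
        # Preserve stage order, since a method is a sequence of stages.
        return [stage.identifier for stage in self.stages]


# Hypothetical example values, for illustration only.
method = AnalyticalMethodRecord(
    stages=[
        AnalyticalStageRecord("sample preparation", {"step": "reduction"}),
        AnalyticalStageRecord("digestion", {"enzyme": "trypsin"}),
        AnalyticalStageRecord("separation", {"technique": "chromatography"}),
        AnalyticalStageRecord("mass spectrometry", {"fragmentation": "CID"}),
    ]
)
print(method.stage_identifiers())
```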
  • Analytical stage result refers to a computer representation of a specific analytical stage, as output by the study design and method capture tool described herein.
  • Analytical method result refers to a computer representation of a specific analytical method, as output by the study design and method capture tool described herein.
  • an analytical method result comprises a sequence of analytical stage results, just as an analytical method is a sequence of analytical stages.
  • steps such as determining an analytical method result, providing an analytical method result, rendering an analytical method result, and the like comprise performing the same steps for analytical stage results that the analytical method result comprises.
  • Study design result refers to a computer representation of a study design, as output by the study design and method capture tool described herein.
  • a study design result comprises a single analytical method result.
  • a study design result comprises two or more analytical method results.
  • steps such as determining a study design result, providing a study design result, rendering a study design result, and the like comprise performing the same steps for one or more analytical method results that the study design result comprises.
  • generalizable biologic attributes refers to sets of features from biologic molecules associated via heuristic and/or domain knowledge with recommended analytical methods.
  • a generalizable biologic attribute (GBA) of a given biologic is a value that is derived from and represents any one of (i) structural features of the given biologic, (ii) properties of a biomanufacturing process used to produce the given biologic, and (iii) attributes of a structural characterization study that either has previously been performed on the given biologic, or is to be performed for the given biologic.
  • GBAs representing structural features of a biologic include sequence fragments, molecular weight, molecule type, i
  • a GBA corresponding to a molecular weight of a given biologic may be a value of the molecular weight of the given biologic.
  • GBAs representing structural features of a biologic may also include a quantification of (e.g., a number of; e.g., a fraction of) one or more specific amino acids [e.g., Arginine (also referred to as Arg, or R); e.g., Lysine (also referred to as Lys, or K); e.g., cysteine (also referred to as Cys, or C)] within the biologic.
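  • A quantification of specific amino acids, as a number and as a fraction, can be computed directly from a one-letter amino acid sequence. The sketch below is illustrative: the function name `amino_acid_gbas`, the output layout, and the example sequence are assumptions, not part of the specification.

```python
from collections import Counter


def amino_acid_gbas(sequence, residues=("R", "K", "C")):
    """Return the count and fraction of each requested residue in sequence.

    sequence is a one-letter amino acid string; residues defaults to
    arginine (R), lysine (K), and cysteine (C).
    """
    seq = sequence.upper()
    counts = Counter(seq)
    total = len(seq)
    return {
        aa: {
            "count": counts[aa],
            "fraction": counts[aa] / total if total else 0.0,
        }
        for aa in residues
    }


# Made-up eight-residue sequence, used only to exercise the function.
example = amino_acid_gbas("ACDKRCCK")
print(example)
```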
  • GBAs may also include an identification and/or quantification of patterns of amino acid motifs associated with propensity towards certain types of modification [e.g., a position and/or number of potential (e.g., predicted) or known sites of oxidation; e.g., a position and/or number of potential (e.g., predicted) or known sites of deamidation; a position and/or number of potential (e.g., predicted) or known sites of various post-translational modifications (e.g., N-linked glycosylation; e.g., disulfide bridges; e.g., disulfide knots; e.g., modification of cysteine to formylglycine)].
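  • Positions and numbers of potential modification sites can be estimated by scanning the sequence for short motifs. The sketch below uses three widely cited heuristics that are assumptions of this example rather than rules stated in the specification: the N-linked glycosylation sequon Asn-X-Ser/Thr (X not proline), Asn-Gly as a deamidation-prone motif, and methionine as an oxidation-prone residue. The names `MOTIFS` and `motif_sites` and the example sequence are hypothetical.

```python
import re

# Hypothetical motif table; lookaheads let overlapping sites be reported.
MOTIFS = {
    "n_glycosylation": r"N(?=[^P][ST])",  # Asn-X-Ser/Thr, X is not proline
    "deamidation": r"N(?=G)",             # Asn followed by Gly
    "oxidation": r"M",                    # methionine
}


def motif_sites(sequence):
    """Return 1-based positions and counts of potential modification sites."""
    seq = sequence.upper()
    result = {}
    for name, pattern in MOTIFS.items():
        positions = [match.start() + 1 for match in re.finditer(pattern, seq)]
        result[name] = {"positions": positions, "count": len(positions)}
    return result


# Made-up sequence: the Asn at position 7 is followed by proline, so it is
# not reported as a glycosylation sequon.
sites = motif_sites("MANGTKNPSM")
print(sites)
```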
  • GBAs representing structural features of a given biologic also include proportions of amino acids of various properties (e.g., hydrophobicity, hydrophilicity, charge, acidity, aromaticity, and the like).
  • GBAs representing proportions of amino acids of various properties within a given biologic include a proportion of amino acids having a particular classification based on hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; classified as hydrophobic], a proportion of amino acids having a particular classification based on hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; classified as hydro
  • GBAs of a given biologic include one or more metrics describing predicted or previously known results of applying a fragmentation stage [e.g., enzymatic digestion (e.g., applied in solution and/or in-gel); e.g., chemical digestion (e.g., applied in solution and/or in-gel); e.g., gas-phase fragmentation; e.g., any combination of one or more enzymatic digestion, chemical digestion, and gas-phase fragmentation methods, applied serially or in a cocktail] to the given biologic.
  • a GBA corresponding to a likelihood of enzymatic cleavage resulting from digestion with one or more select proteolytic enzymes for which cleavage sites are highly specific (e.g., enzymes including trypsin, Lys-C, Glu-C, Asp-N, Arg-C, and other enzymes)
  • GBAs corresponding to fragmentation patterns resulting from enzymatic digestion, chemical digestion, gas-phase fragmentation, and combinations thereof may be determined.
  • enzymes used in enzymatic digestion for which fragmentation patterns may be determined include trypsin, Lys-C, Glu-C, Asp-N, Arg-C, and other enzymes.
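  • A predicted enzymatic fragmentation pattern can be sketched as an in silico digest. The cleavage rules below are simplified conventions assumed for illustration (for example, trypsin cutting after K or R unless followed by P); they ignore missed cleavages, and the names `CLEAVAGE_RULES` and `digest` are hypothetical.

```python
import re

# Simplified, assumed cleavage rules written as zero-width patterns that
# match at the position where the backbone is cut.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, not before P
    "Lys-C": r"(?<=K)",            # after K
    "Arg-C": r"(?<=R)",            # after R
    "Glu-C": r"(?<=E)",            # after E
    "Asp-N": r"(?=D)",             # before D
}


def digest(sequence, enzymes):
    """Return the fragments produced by complete cleavage with all enzymes."""
    cuts = {0, len(sequence)}
    for enzyme in enzymes:
        for match in re.finditer(CLEAVAGE_RULES[enzyme], sequence):
            cuts.add(match.start())
    ordered = sorted(cuts)
    return [sequence[a:b] for a, b in zip(ordered, ordered[1:])]


tryptic = digest("AKRPDEKG", ["trypsin"])
combined = digest("AKRPDEKG", ["trypsin", "Asp-N"])
```

  • Passing several enzymes at once models the end state of a serial or cocktail digest, under the assumption that every enzyme cleaves to completion.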
  • cyanogen bromide (CNBr)
  • 2-nitro-5-thiocyanobenzoate (NTCB)
  • formic acid (FA)
  • GBAs corresponding to combinations of enzymatic and chemical cleavage patterns may also be determined.
  • Fragmentation patterns for enzymatic digestion, chemical digestion, and combinations thereof may be determined for in-solution and/or in-gel digestion.
  • GBAs corresponding to gas-phase fragmentation patterns include fragmentation patterns from gas-phase fragmentation techniques carried out in an electronic instrument such as a mass spectrometer.
  • gas-phase fragmentation techniques include collision induced dissociation (CID), higher-energy collisional dissociation (HCD), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multiphoton dissociation (IRMPD), and other fragmentation techniques.
  • GBAs corresponding to metrics describing predicted or previously known results of applying a fragmentation stage to a given biologic also include statistical distributions of fragments by fragment length and/or by molecular weight (e.g., a histogram of fragment lengths; e.g., a histogram of fragment molecular weights; e.g., an average fragment length; e.g., an average fragment molecular weight).
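  • The fragment distribution metrics can be illustrated with a small Python sketch that summarizes a list of peptide fragments by length and by monoisotopic mass. The function names are hypothetical and the residue masses are standard monoisotopic values rounded to five decimals; unmodified residues are assumed.

```python
from collections import Counter

# Standard monoisotopic residue masses in Da (unmodified residues assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Monoisotopic mass of a linear peptide (residues plus one water)."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def fragment_statistics(fragments):
    """Summarize fragments by length histogram, mean length, and mean mass."""
    lengths = [len(fragment) for fragment in fragments]
    masses = [peptide_mass(fragment) for fragment in fragments]
    return {
        "length_histogram": Counter(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "mean_mass": sum(masses) / len(masses),
    }


stats = fragment_statistics(["AK", "RPDEK", "G"])
```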
  • GBAs corresponding to features related to physico-chemical properties of biologic molecules may also be determined.
  • GBAs of a given biologic include one or more study class attributes that identify one or more particular structural characterization studies that have been performed on the given biologic (e.g., wherein the given biologic is a known, previously characterized biologic), or identify one or more particular desired structural characterization studies to be performed on the given biologic (e.g., wherein the given biologic is a target biologic to be characterized).
  • GBAs of a given biologic include one or more bioprocess attributes that represent parameters of a biomanufacturing process used to produce the given biologic.
  • bioprocess attributes may include an identification of a cell culture type used to produce the associated known biologic (e.g., such as a textual label that identifies a cell culture type), and an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
  • Link As used herein, the terms “link”, and “linked”, as in a first data structure or data element is linked to a second data structure or data element, refer to a computer representation of an association between two data structures or data elements that is stored electronically (e.g. in computer memory).
  • systems, architectures, devices, methods, and processes of the claimed invention encompass variations and adaptations developed using information from the embodiments described herein. Adaptation and/or modification of the systems, architectures, devices, methods, and processes described herein may be performed, as contemplated by this description.
  • Headers are provided for the convenience of the reader - the presence and/or placement of a header is not intended to limit the scope of the subject matter described herein.
  • the systems and methods described herein facilitate the design of analytical studies for characterizing various structural properties of biologics.
  • the approaches described herein allow a user to submit (e.g., to a computer program) input describing a target biologic that they are interested in characterizing and a particular type of desired structural characterization that they would like to obtain for the target biologic.
  • the approaches described herein then provide a specific detailed procedure for obtaining, from a sample comprising the target biologic, the desired structural characterization in the form of a study design result.
  • the study design result determined by the tool comprises a single analytical method result, which in turn comprises a sequence of analytical stage results that represent specific analytical stages to be implemented in order to obtain the desired structural characterization of the target biologic.
  • a study design result comprises two or more analytical method results that represent orthogonal analytical methods which provide complementary data with respect to each other.
  • FIG. 12 illustrates three different analytical methods that can be combined in a study for determining a primary structure of a biologic.
  • the three different analytical methods (a top-down approach, a bottom-up approach, and a middle-down approach) use different sample digestion strategies (including no digestion in the case of the top-down approach) to obtain different datasets for a given target biologic to be characterized.
  • each of the three analytical methods comprises a different sequence of analytical stages.
  • the top-down approach includes a protein biologics production stage 1202, a separation stage 1204, a mass spectrometry stage 1206, a fragmentation stage 1208, a data process and analysis stage 1210, and a comprehensive coverage of protein structure stage 1212.
  • the bottom-up approach includes an additional fragmentation stage 1214 and a small peptides and glycan fragments stage 1216.
  • the middle-down approach includes an additional fragmentation stage 1218 and a large peptides and glycan fragments stage 1220.
  • An example study design result for determining a primary structure of a target biologic that combines the three approaches shown in FIG. 12 would include three analytical method results, each representing one of the illustrated top-down, bottom-up, and middle-down approaches.
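  • The nesting of study design results, analytical method results, and analytical stage results can be sketched with plain data classes. This is only an illustration: the class and field names are assumptions, the stage labels are shortened, and placing the digestion stage first in the bottom-up and middle-down methods is an assumed ordering that the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalyticalStageResult:
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class AnalyticalMethodResult:
    name: str
    stages: List[AnalyticalStageResult]


@dataclass
class StudyDesignResult:
    characterization: str
    methods: List[AnalyticalMethodResult]


def make_method(name, stage_names):
    """Build a method result from an ordered list of stage names."""
    stages = [AnalyticalStageResult(stage) for stage in stage_names]
    return AnalyticalMethodResult(name, stages)


# Stages shared by all three approaches (labels shortened for illustration).
COMMON = ["separation", "mass spectrometry", "gas-phase fragmentation",
          "data processing and analysis"]

study = StudyDesignResult(
    characterization="primary structure",
    methods=[
        make_method("top-down", COMMON),
        make_method("bottom-up", ["digestion to small peptides"] + COMMON),
        make_method("middle-down", ["digestion to large peptides"] + COMMON),
    ],
)
```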
  • Study design results may be provided to the user in a variety of fashions, for example as one or more documents (e.g., an automatically generated document) and/or via a graphical user interface (GUI) that allows the user to edit various steps and parameters of the provided study design result in an interactive fashion.
  • the documents provided via a study design result may include generated software files that specify parameters for analytical instruments such as liquid chromatography workstations, mass spectrometry workstations, and the like. Such generated software files may be loaded by analytical instruments in order to set up determined parameters to be used in carrying out the desired structural characterization study.
  • a given study design that, when implemented, allows a desired structural characterization of a target biologic to be obtained comprises one or more analytical methods, each comprising a sequence of specific analytical stages.
  • the specific sequence of analytical stages of an appropriate analytical method depends in a complex fashion on underlying properties of the target biologic and the specific desired structural characterization.
  • the systems and methods described herein utilize a database, referred to as a method store, that codifies previous experience in applying various analytical methods (and the analytical stages they comprise) to characterize biologics.
  • An example 100 of organization and procedure for building a method store is shown in FIGS. 1A-1C.
  • the manner in which data are stored in the method store goes beyond merely storing records of studies and analytical methods that were previously implemented to characterize various biologics.
  • the method store 120 includes sets of generalizable biologic attributes (GBAs) 122 that are determined for the known biologics using analytical methods and analytical stages thereof that are stored as records in the method store.
  • GBAs are determined via a preprocessing step 110 and linked to the records of the various analytical methods and/or analytical stages 124 that were used in characterizing the corresponding known biologic.
  • GBAs are values representing various features of a biologic and allow similarities between various different biologics to be identified.
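  • One way to picture the linkage between GBA sets and method records is the in-memory sketch below. It is an assumption-laden illustration, not the storage scheme of the described system: `MethodStore`, `GBASet`, `MethodRecord`, and the identifiers used are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MethodRecord:
    method_id: str
    stages: List[str]  # ordered analytical stage names


@dataclass
class GBASet:
    biologic_id: str
    attributes: Dict[str, float]
    linked_method_ids: List[str] = field(default_factory=list)


class MethodStore:
    """Toy method store: GBA sets linked to method records by identifier."""

    def __init__(self):
        self.methods: Dict[str, MethodRecord] = {}
        self.gba_sets: Dict[str, GBASet] = {}

    def add_method(self, record):
        self.methods[record.method_id] = record

    def add_gbas(self, gba_set):
        self.gba_sets[gba_set.biologic_id] = gba_set

    def link(self, biologic_id, method_id):
        # A link is only stored when both ends already exist.
        if method_id not in self.methods:
            raise KeyError(method_id)
        self.gba_sets[biologic_id].linked_method_ids.append(method_id)

    def methods_for(self, biologic_id):
        ids = self.gba_sets[biologic_id].linked_method_ids
        return [self.methods[method_id] for method_id in ids]


store = MethodStore()
store.add_method(MethodRecord("bottom-up-1", ["digestion", "LC", "MS/MS"]))
store.add_gbas(GBASet("mAb-A", {"hydrophobic_fraction": 0.41}))
store.link("mAb-A", "bottom-up-1")
```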
  • the approaches described herein go beyond merely searching a database to identify and return study design results.
  • the systems and methods described herein may determine, for a given target biologic, a set of target GBAs that are used as features in machine learning algorithms that analyze the method store to identify relevant analytical methods that can be applied to obtain a desired structural characterization of the target biologic. This approach is illustrated in the block diagram of FIG. 2A. In this manner, the systems and methods allow for study design results to be determined for target biologics that may not have been characterized before.
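  • The description does not name a specific machine learning algorithm at this point. As a stand-in, the sketch below uses a simple nearest-neighbour lookup over GBA feature vectors to suggest methods linked to the most similar known biologics. The record layout, feature names, and `suggest_methods` function are assumptions made for illustration.

```python
import math


def gba_distance(a, b):
    """Euclidean distance between two GBA dictionaries (missing keys are 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def suggest_methods(target_gbas, records, k=1):
    """Return methods linked to the k known biologics nearest the target."""
    ranked = sorted(records, key=lambda r: gba_distance(target_gbas, r["gbas"]))
    suggestions = []
    for record in ranked[:k]:
        for method in record["methods"]:
            if method not in suggestions:
                suggestions.append(method)
    return suggestions


# Toy method-store records with made-up attribute values.
records = [
    {"gbas": {"hydrophobic_fraction": 0.40, "n_glyc_sites": 2.0},
     "methods": ["bottom-up"]},
    {"gbas": {"hydrophobic_fraction": 0.20, "n_glyc_sites": 0.0},
     "methods": ["top-down"]},
]
target = {"hydrophobic_fraction": 0.38, "n_glyc_sites": 2.0}
nearest = suggest_methods(target, records)
both = suggest_methods(target, records, k=2)
```

  • In practice the features would need scaling so that attributes with large numeric ranges do not dominate the distance; that step is omitted here.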
  • a determined study design result includes analytical method results that represent analytical methods and comprise sequences of analytical stage results, representing specific analytical stages. Notably, the specific sequence of analytical stages represented by the analytical stage results of a given determined analytical method result does not need to have been previously performed and stored in the method store. Similarly, a given determined study design result may include analytical method results that represent new and unique combinations of analytical methods. Accordingly, the systems and methods described herein allow for novel and unique study design results to be obtained. Embodiments of specific approaches for obtaining user input, determining GBAs, representing data, and determining and providing study design results are described in detail in the following.
A. Structural Characterization of Biologics
  • the structure of biological molecules can be highly complex, and vary significantly between different biologics.
  • a wide variety of structural characterization studies, including determining molecular weight, measuring primary structure, characterizing and quantifying amino acid and post-translational modifications, and characterizing higher order structures (HOS) (e.g., secondary, tertiary, and quaternary structures), are used to determine and identify structural features of biologics that are relevant to their efficacy and/or stability and may be influenced, e.g., by processes for manufacturing the biologic.
  • different combinations of analytical methods each comprising different sequences and combinations of analytical stages are utilized in different structural characterization studies.
  • the particular analytical methods and analytical stages that they comprise that will be successful in allowing a desired structural characterization of a given biologic to be obtained depend in a complex fashion on the structure of the given biologic itself. Embodiments of various structural characterization studies and relevant analytical method approaches are described in detail in
  • a given biologic of interest e.g., a target biologic; e.g., a known biologic
  • a biological sample such as a cell, tissue, or bodily fluid (e.g., blood, urine, saliva, and the like).
  • a particular extraction procedure used depends on the particular sample in which the biologic is present. For example, if the biologic of interest is present in blood of a subject, extraction may be effectuated by removing a sample of blood from the subject.
  • a sample comprising a given biologic of interest e.g., a target biologic; e.g., a known biologic
  • a desired level of purity of the biologic of interest e.g., the biologic of interest comprises a specific percentage by weight of all components in the sample.
  • a sample comprising a biologic is purified to at least 50% purity (e.g., such that the biologic of interest comprises at least 50% by weight of all components in the sample).
  • purification steps remove impurities such as process-related contaminants, unrelated biological macromolecules, misfolded proteins, and the like.
  • a cell sample comprises a biologic of interest and purification steps are used to purify a biologic from other macromolecules present in the cell sample, such as unrelated nucleic acids, lipids, glycolipids, polysaccharides, lipopolysaccharides, proteins, or even misfolded and/or misprocessed forms of the biological molecule of interest.
  • immunocapture (also known as immunoaffinity purification) techniques are used.
  • Non-antibody based purification techniques such as protein precipitation, gel filtration, ion exchange chromatography and gel electrophoresis may alternatively (or also) be used.
  • Immunocapture and non-antibody based purification techniques are described in additional detail in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
  • a denaturation step may be performed. Denaturation methods may involve exposure of the biologic of interest to elevated temperatures, chemical denaturants, and mechanical stress (e.g., freeze-thaw processes).
  • analytical methods used to characterize a biologic include a reduction and alkylation analytical stage.
  • a reduction and alkylation stage may be used to cleave disulfide bridges in order to characterize biologics that include disulfide bridges, such as monoclonal antibodies.
  • a reduction and alkylation stage comprises exposure of a biologic to a reducing agent (e.g., 2-mercaptoethanol) followed by exposure to an alkylating agent (e.g., iodoacetic acid; e.g., iodoacetamide (IAA); e.g., N-ethylmaleimide (NEM)).
  • Parameters relevant for carrying out a reduction and alkylation stage include concentrations of the reducing and alkylating agents, as well as temperatures of each of the solutions comprising the reducing and alkylating agents and times for which the biologic of interest should be exposed to (e.g., immersed in) the solutions.
  • a structural characterization study of a given biologic includes one or more fragmentation stages.
  • a variety of fragmentation techniques may be used and can be executed in solution (e.g., solution digestion), in a gel (e.g., in-gel digestion), or in a gas phase (e.g., gas-phase fragmentation).
  • enzymatic digestion in solution is used for fragmentation.
  • enzymatic digestion comprises exposure of a biologic to one or more digestion enzymes that cleave the biologic at various positions along its primary structure sequence (e.g., at specific sites within or adjacent to specific residues and/or combinations of residues; e.g., in between specific combinations of residues; e.g., at random positions along its primary structure sequence).
  • digestion enzymes include, without limitation, various proteolytic enzymes, such as trypsin, Lys-C, Glu-C, Asp-N, and Arg-C, deglycosylating enzymes such as PNGaseF, O-glycosidase, sialidase, glucosaminidase, and beta-galactosidase, as well as other enzymes, such as pepsin, papain, chymotrypsin, aminopeptidases, and carboxypeptidases.
  • an enzymatic digestion step is a single digest, wherein a single enzyme is used for fragmentation.
  • an enzymatic digestion step may be a serial digest, wherein multiple enzymes are used one after the other, in a serial fashion.
  • Various enzymes used in a serial digest may be added to a solution comprising the biologic one after another. Prior to adding a given enzyme to the solution in a serial digest, a composition may be added to the solution to terminate the reaction between the biologic and the previously added enzyme.
  • a given enzyme is added to the solution following buffer exchange.
  • an enzymatic digestion step is a cocktail digest, wherein a biologic of interest is exposed to multiple enzymes in a single reaction mixture.
  • cocktails of enzymes that are viable include, without limitation, (i) a mixture of trypsin and Lys-C, (ii) a mixture of trypsin, Lys-C, and Asp-N, and (iii) a mixture of trypsin, Lys-C, Asp-N, and PNGase.
  • relevant parameters used for in solution enzymatic digestion stages include pHs of solutions, buffer compositions, temperatures, and incubation times. Chemical digestion may be used for fragmentation, wherein a biologic of interest is exposed to a solution comprising one or more particular digestion chemicals.
  • digestion chemicals include cyanogen bromide (CNBr), 2-nitro-5-thiocyanobenzoate (NTCB), hydroxylamine, and formic acid (FA).
  • a chemical digestion step can be implemented using a single chemical, multiple chemicals in a serial fashion, or a cocktail of multiple chemicals.
  • relevant parameters used for in solution chemical digestion steps include pHs of solutions, buffer compositions, temperatures, and incubation times.
  • fragmentation is achieved via a combination of (i) one or more chemicals with (ii) one or more enzymes. Combinations of chemical and enzymatic digestion may be applied in a serial fashion.
  • an in-gel digestion stage is used for fragmentation.
  • in-gel digestion comprises subjecting a biologic of interest that is captured in a gel band to digestion in situ, that is, within the gel band.
  • digestion enzymes and/or digestion chemicals are added to a gel band comprising the biologic of interest in a manner similar to that described above with respect to in solution digestion.
  • digestion via a single enzyme or chemical, or multiple enzymes and/or chemicals (e.g., using serial digestion wherein multiple enzymes and/or chemicals are added one after another in a serial fashion; e.g., using a cocktail comprising the multiple enzymes and/or chemicals)
  • Gas phase fragmentation may be carried out as part of mass spectrometry analysis.
  • gas-phase fragmentation techniques used in mass spectrometers include collision induced dissociation (CID), higher energy collisional dissociation (HCD), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multi-photon dissociation (IRMPD), ultraviolet photodissociation, and CID of the isolated charge-reduced ions followed by ETD (CRCID).
  • CID collision induced dissociation
  • HCD higher energy collisional dissociation
  • ECD electron capture dissociation
  • ETD electron transfer dissociation
  • IRMPD infrared multi-photon dissociation
  • a particular gas-phase fragmentation technique used may depend on the type of mass spectrometry instrument used.
  • a particular gas-phase fragmentation technique used may also depend on the nature of the biologic molecule and the desired peptide fragments.
  • low energy CID fragmentation typically occurs at amide bonds of the peptide backbone of proteins and typically generates b and y sequencing ions.
  • ECD and ETD fragmentation can cleave proteins at the N-Cα bond within their peptide backbone and generate c and z ions.
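  • The b and y sequencing ions produced by amide-bond cleavage can be predicted from the peptide sequence. The sketch below computes singly protonated, monoisotopic b and y ion m/z values for an unmodified peptide; the function name `b_y_ions` is hypothetical, and c and z ions are not covered.

```python
# Standard monoisotopic residue masses in Da (unmodified residues assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_y_ions(peptide):
    """Return lists of singly charged b and y ion m/z values (b1.., y1..)."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        n_term = sum(RESIDUE_MASS[r] for r in peptide[:i])
        c_term = sum(RESIDUE_MASS[r] for r in peptide[-i:])
        b_ions.append(n_term + PROTON)          # N-terminal fragment
        y_ions.append(c_term + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


b_ions, y_ions = b_y_ions("GAK")
```

  • A useful consistency check is that each complementary pair (b_i, y_(n-i)) sums to the neutral peptide mass plus two protons.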
  • a particular type of gas-phase fragmentation to be used is identified as a parameter in a mass analysis analytical method step.
  • a separation stage is used.
  • separation stages include, without limitation, gel electrophoresis (GE), sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), 2-dimensional GE, isoelectric focusing (IEF), high-performance liquid chromatography (HPLC), reverse-phase HPLC (RP-HPLC), hydrophobic interaction chromatography (HIC), hydrophilic interaction chromatography (HILIC), size-exclusion chromatography (SEC), ion-exchange chromatography (IEC), anion exchange chromatography (AEX), cation exchange chromatography (CEX), capillary electrophoresis (CE), and 2D-HPLC.
  • Relevant separation method parameters include column selection, mobile phase composition, flow rates, and gradients.
  • a detection stage is included to measure retention times.
  • detection methods include, without limitation, fluorescence detection (FL), ultraviolet to visible absorbance detection (UV/Vis), multiple-wavelength diode array detection (DAD), and light-scattering spectroscopy.
  • mass spectrometry is used for characterization of complex biologics.
  • Mass spectrometers directly read the mass fingerprints (mass/charge ratios, m/z) of intact or fragmented proteins or molecules.
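  • The relationship between a neutral mass and the mass/charge ratio read by the instrument can be written out in a few lines. The sketch assumes positive-ion mode with charge carried by protons; `mz_from_mass` and `mass_from_mz` are hypothetical helper names.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_from_mass(neutral_mass, charge):
    """m/z of an ion carrying `charge` protons (positive-ion mode assumed)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON) / charge


def mass_from_mz(mz, charge):
    """Neutral mass recovered from an observed m/z and charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON)


doubly_charged = mz_from_mass(1000.0, 2)
round_trip = mass_from_mz(mz_from_mass(1000.0, 3), 3)
```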
  • Types of mass spectrometers used for structural characterization of biologics include quadrupole, ion trap, time-of-flight (TOF), orbitrap and Fourier transform ion cyclotron resonance (FTICR).
  • Modern hybrid mass analyzers such as a hybrid linear ion trap-orbitrap (e.g., Thermo LTQ Orbitrap Elite™) and a quadrupole-time-of-flight (e.g., Agilent Q-TOF mass spectrometers) have been developed for structural characterization of biologics to support biopharmaceutical discovery and development pipelines.
  • mass spectrometry includes use of a gas-phase fragmentation technique that is carried out within the mass spectrometer.
  • gas-phase fragmentation techniques include collision induced dissociation (CID), higher-energy collisional dissociation (HCD), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multi-photon dissociation (IRMPD), and CID of the isolated charge-reduced ions followed by ETD (CRCID).
  • CID collision induced dissociation
  • HCD higher-energy collisional dissociation
  • ECD electron capture dissociation
  • ETD electron transfer dissociation
  • IRMPD infrared multi-photon dissociation
  • CRCID CID of the isolated charge-reduced ions followed by ETD
  • fragmentation used may depend on the type of mass spectrometers used (Scigelova,
  • CID, HCD, and ETD are built on a hybrid linear ion trap-orbitrap mass spectrometer; CID and ECD are constructed on FTICR MS; low-energy CID is configured on Q-TOF MS; high-energy CID is included on TOF/TOF MS
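  • The instrument-to-technique pairings listed above lend themselves to a simple lookup table. The sketch below encodes only the pairings stated in the preceding line; the dictionary and function names are hypothetical, and the table is not an exhaustive capability list for any instrument.

```python
# Pairings as listed in the text; not an exhaustive instrument capability list.
FRAGMENTATION_BY_INSTRUMENT = {
    "linear ion trap-orbitrap": {"CID", "HCD", "ETD"},
    "FTICR": {"CID", "ECD"},
    "Q-TOF": {"low-energy CID"},
    "TOF/TOF": {"high-energy CID"},
}


def instruments_supporting(technique):
    """Return instrument types listed as offering the given technique."""
    return sorted(
        instrument
        for instrument, techniques in FRAGMENTATION_BY_INSTRUMENT.items()
        if technique in techniques
    )


etd_instruments = instruments_supporting("ETD")
cid_instruments = instruments_supporting("CID")
```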
  • an ionization source is used.
  • ionization source technologies employed on mass spectrometers include electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI).
  • a particular ionization technology to be used in a mass spectrometry stage varies with the type of structural characterization desired, the nature of the biologic to be characterized, as well as other analytical stages that are used, such as a type of separation stage implemented prior to introduction of a sample into a mass spectrometer.
  • an ESI interface enables on-line introduction of samples (analytes) using HPLC, CE, or an infusion pump to deliver analytes from solution phase into gas phase on a mass analyzer.
  • Mass spectrometric methods using ESI ionization with CID, HCD, ETD, and CRCID fragmentation techniques can be used to facilitate structural characterization of biologics.
  • a MALDI interface is especially beneficial for a sample where the amount is limited. For example, a protein or peptide sample (1 μL) typically is spotted on a MALDI plate having a matrix such as α-cyano-4-hydroxycinnamic acid or sinapinic acid to form a crystal prior to MS analysis.
  • parameters relevant to a mass spectrometry analytical stage include identifiers of acquisition mode, fragmentation technique, and ionization technology, as well as various instrument settings (e.g., ESI source temperature and voltage, polarity, capillary temperature, gas flow rate, scan range, scan resolution, collision energy, AGC target ion values, injection times, isolation windows, ion mode).
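  • The parameters of a mass spectrometry stage could be held in a small record and exported as a file for later loading. The sketch below is purely illustrative: the field names, the validation rules, the example values, and the use of JSON are all assumptions, and no real instrument file format is implied.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class MassSpecStageParameters:
    acquisition_mode: str
    fragmentation: str
    ionization: str
    polarity: str
    scan_range_mz: Tuple[float, float]
    collision_energy: float

    def validate(self):
        """Basic sanity checks on the parameter values."""
        low, high = self.scan_range_mz
        if not 0 < low < high:
            raise ValueError("scan range must satisfy 0 < low < high")
        if self.polarity not in ("positive", "negative"):
            raise ValueError("polarity must be 'positive' or 'negative'")


# Placeholder values, not recommended instrument settings.
params = MassSpecStageParameters(
    acquisition_mode="data-dependent",
    fragmentation="HCD",
    ionization="ESI",
    polarity="positive",
    scan_range_mz=(300.0, 2000.0),
    collision_energy=30.0,
)
params.validate()
exported = json.dumps(asdict(params), indent=2)
```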
  • a user decides which of a variety of structural characterization studies they want to perform in the characterization of a particular biologic of interest.
  • the automated approaches described herein are then used to systematically identify a set of one or more real world analytical methods, each in turn comprising a sequence of analytical stages that should be performed to carry out the chosen characterization study.
  • a given structural characterization study is used to obtain a particular set of data and information that conveys particular structural information about the biologic of interest.
  • the particular type of structural characterization study will, along with the particular biologic, influence the specific set of analytical methods that are automatically determined by the standardized techniques described herein, which, once determined, can be carried out in the real world in order to obtain the desired structural characterization. Examples of various types of structural characterization studies are summarized below.
  • PCT/US2014/059150 and PCT/US2016/053434 the contents of which are incorporated by reference herein in their entirety, include detailed discussion of structural characterization studies such as the below, and discuss analytical methods and stages that are of relevance to particular studies.
  • Molecular weight is an important aspect of biologics of interest.
  • a variety of techniques can be used to measure a molecular weight of a biologic of interest.
  • mass spectrometry is used to determine a molecular weight of an intact protein, following purification.
  • the primary structure of a protein refers to the amino acid sequence of the polypeptide. It is very important to confirm the amino acid sequence of the protein backbone since amino acid modifications may occur during manufacture or storage of biologics, which can result in the loss of stability and/or biological function. A profile of amino acid modifications is one of the major quality characteristics for protein biologics.
  • Mass spectrometry-based platforms using multiple fragmentation techniques can be used to obtain a comprehensive coverage of peptide sequence on protein biologics.
  • characterization of a primary structure of a protein may utilize a single analytical method or, as described above with respect to FIG. 12, two or more analytical methods, each employing different fragmentation stages, in combination.
  • a detailed discussion of various mass spectrometry-based approaches, including analytical methods corresponding to bottom- up, middle-down, and top-down approaches, is included in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
  • the protein is digested into large peptide fragments, which are subsequently fragmented using CID and/or ETD, on a mass spectrometer.
  • a top-down approach can be used as an alternative to bottom-up and middle-down methods.
  • An intact protein without digestion can be directly measured by mass analyzers and subsequently fragmented by CID, HCD and ETD.
  • Top-down sequencing allows location of post-translational modifications and differentiating isomers which could be lost in bottom-up and/or middle-down approaches.
  • the use of two or more of these approaches of mapping peptide fragments enables high probabilities of a full coverage of peptide sequences on proteins.
  • Amino acid modifications can occur in biological molecules during manufacture, formulation, or storage as a consequence of protein degradation or post-translational modifications. Protein degradation can occur as a result of chemical and physical modification. Chemical modification can change peptide backbone amino acids through oxidation, deamidation, isomerization, and racemization. Physical modification can trigger unfolding, misfolding, or aggregation of proteins. The altered or modified amino acids can be detected via peptide sequencing using enzymes, chemicals, and MS fragmentation techniques, as described herein. The key for identifying amino acid modifications is to locate modification sites, which can be found according to mass differences between modified (observed MS spectra) and unmodified (theoretically predicted MS spectra) amino acids.
  • MS/MS analysis uses MS/MS methods to determine the modification site of interest. Examples of particular amino acid modifications and their characterization are described in detail in PCT/US2014/059150 and
  • N-terminal modifications include, for example, acetylation, methylation, formylation, cyclization of glutamine, myristoylation, phosphorylation, and glycosylation (Meinnel et al., PROTEOMICS (2008) 8: 626-649).
  • acetylation takes place mostly at a lysine (Lys) residue; formylation is often observed on an N-terminal methionine (Met) residue; cyclization converts glutamine (Gln) to pyroglutamic acid (pGlu), which is observed in mAbs; and myristoylation usually occurs at a glycine (Gly) residue.
  • the digested protein containing peptide fragments including N-terminal peptide is subjected to LC-MS/MS analysis.
  • the mass of the N-terminal peptide with a modification on amino acid can be monitored by LC-MS/MS.
  • the type of N-terminal modification can be distinguished by mass differences (for example, +42 Da for acetylation; +14 Da for methylation; +28 Da for formylation; -17 Da for cyclization of glutamine to pyroglutamic acid; +210 Da for myristoylation; +80 Da for phosphorylation) through the MS full scan following LC separation.
  • the N-terminal peptide containing the amino acid modification is selected as a precursor ion and then subjected to CID-MS² and/or ETD-MS² to produce more fragments.
  • the modification on the N-terminal amino acid residue can be detected by additional masses on b fragmentation ions observed from CID-MS² (for example, +42 Da for acetylation) and no mass shift on y fragmentation ions.
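  • Assigning an N-terminal modification from a mass difference amounts to a table lookup with a tolerance. The sketch below uses nominal shifts in Da; entering pyroglutamate formation from glutamine as a loss of 17 Da (loss of ammonia) is an assumption of this sketch, as are the tolerance and the name `classify_n_terminal_shift`.

```python
# Nominal mass shifts in Da. The pyroglutamate entry is written as a loss
# of ammonia (-17 Da), which is an assumption of this sketch.
N_TERMINAL_SHIFTS = {
    "acetylation": 42,
    "methylation": 14,
    "formylation": 28,
    "pyroglutamate from Gln": -17,
    "myristoylation": 210,
    "phosphorylation": 80,
}


def classify_n_terminal_shift(observed_mass, theoretical_mass, tolerance=0.5):
    """Return modification names whose shift matches the mass difference."""
    delta = observed_mass - theoretical_mass
    return [
        name
        for name, shift in N_TERMINAL_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    ]


acetylated = classify_n_terminal_shift(1042.1, 1000.0)
unmodified = classify_n_terminal_shift(1000.0, 1000.0)
```

  • A match on the precursor mass only suggests the modification type; confirming that it sits at the N-terminus relies on the b ions carrying the shift while the y ions do not, as described above.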
  • C-terminal heterogeneities often occur in recombinant monoclonal antibodies (Liu et al, JOURNAL OF PHARMACEUTICAL SCIENCES (2008) 97(7): 2426-2447).
  • One of the most common C-terminal heterogeneities is the incomplete C-terminal lysine processing of the heavy chain during production of monoclonal antibodies to produce three antibody species containing zero, one, and two C-lysine residues.
  • the antibody is subjected to digestion using enzymes and then separation of heavy chains from light chains using molecular weight cut-off centrifugation filters (for example, 10,000 Da cut-off).
  • the heavy-chain fragments containing C-terminal processing lysine peptide species are subjected to LC separation (e.g., use of reverse-phase LC column), followed by MS/MS analysis.
  • peptides containing heterogeneous C-terminal lysine residues can be separated by LC and then identified by MS.
  • a reduction of 128 Da in mass indicates a removal of one C-terminal lysine residue.
  • the positive charge state of the peptide is also decreased by 1 unit upon removal of one C-terminal lysine. Therefore, various C-terminal lysine species can be identified by LC-MS based on the
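The mass and charge bookkeeping for C-terminal lysine clipping can be illustrated with a short sketch. The function names are hypothetical, and the constants (monoisotopic lysine residue mass and proton mass) are standard values supplied here rather than taken from the text, which quotes the nominal 128 Da.

```python
LYS_RESIDUE_MASS = 128.09496  # monoisotopic mass of a lysine residue, Da
PROTON_MASS = 1.007276        # Da


def clipped_species(peptide_mass, charge):
    """Mass and charge of a peptide after loss of its C-terminal lysine.

    The neutral mass drops by one lysine residue and the positive charge
    state drops by one unit.
    """
    if charge < 2:
        raise ValueError("charge must be at least 2 to lose one positive charge")
    return peptide_mass - LYS_RESIDUE_MASS, charge - 1


def mz(neutral_mass, charge):
    """m/z of a positively charged ion with the given neutral mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge
```

For example, `clipped_species(1000.0, 2)` describes a species that is about 128 Da lighter and singly charged, so the two forms appear at clearly different m/z values.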
  • Biologics can be oxidized if oxygen radicals or metals are present in the environment.
  • the most common oxidation occurs to amino acids containing a sulfur atom such as methionine (Met) and cysteine (Cys) or an aromatic ring such as histidine (His), tyrosine (Tyr), tryptophan (Trp), and phenylalanine (Phe) (Patal et al, BIOPROCESS
  • Trp can be oxidized by light (also known as photo-oxidation) to form oxidation products such as N-formylkynurenine and kynurenine (Li et al, BIOTECHNOLOGY AND BIOENGINEERING (1995) 48: 490-500).
  • Photo-oxidation of Tyr can form 3,4-dihydroxyphenylalanine (DOPA) and dityrosine, resulting in covalent aggregation through the formation of Tyr-Tyr cross-links.
  • Protein oxidation can be measured through LC-MS analysis of a protein digest.
  • Use of theoretically predicted masses of peptides containing potential oxidation products (for example, +16 Da or +32 Da) and the fragmentation pattern observed on peptide fragments can identify oxidation products on a protein.
  • Deamidation occurs in many recombinant proteins by removing an amide group from an amino acid such as asparagine (Asn) or glutamine (Gln) (Patal (2011) supra).
  • Deamidation is a non-enzymatic process that can take place spontaneously in proteins or peptides in vivo or in vitro systems. Consequently, proteins undergo isomerization and racemization after deamidation.
  • Asn is initially converted to aspartic acid (Asp) by the non-enzymatic deamidation process, which can be identified through a mass shift of +1 Da on a mass spectrometer.
  • Isoaspartic acid (isoAsp) is then formed via isomerization of Asp.
  • the isoAsp and Asp peptide products are normally separated by LC and subsequently identified by MS/MS.
  • D-Asp succinimide intermediate generated during Asn deamidation process
  • the rate of deamidation on an intact protein is very slow, whereas the deamidation rate can be increased significantly for peptides under alkaline conditions (Hao et al, (2011) MOLECULAR & CELLULAR PROTEOMICS 10.10).
  • D,L-Asp and D,L-isoAsp peptides predominated with isoAsp peptides
  • racemization during trypsin digestion using buffer at pH 8.
  • deamidation of Gln is much slower than the deamidation of Asn. It may be important to avoid inducing in vitro deamidation during sample preparation while identifying in vivo deamidation sites on proteins. Modified sample preparation procedures may be needed to identify deamidation modifications on proteins. For example, a protein sample can be subjected to trypsin digestion at pH 6.5 and at pH 8, respectively. In vivo deamidation products can be distinguished from in vitro products by profiling the digested proteins (pH 6.5 vs. pH 8) by LC-MS. Peptides obtained from protein digestion at pH 6.5 serve as a control to filter out the deamidated peptide products induced in vitro at pH 8.
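The pH 6.5 versus pH 8 comparison amounts to a set difference over the deamidation sites detected in each digest. The sketch below assumes, for illustration only, that sites detected in the pH 6.5 digest are treated as in vivo and that sites seen only at pH 8 are treated as induced during sample preparation; the function name and site labels are hypothetical.

```python
def classify_deamidation_sites(sites_ph65, sites_ph8):
    """Split deamidation sites into in vivo and in vitro induced groups.

    sites_ph65: sites detected in the pH 6.5 digest (the control).
    sites_ph8:  sites detected in the pH 8 digest.
    """
    control = set(sites_ph65)
    induced = set(sites_ph8) - control  # appear only under alkaline digestion
    return {"in_vivo": control, "in_vitro_induced": induced}
```

A call such as `classify_deamidation_sites({"N55", "N384"}, {"N55", "N384", "N315"})` would flag only `N315` as induced in vitro.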
  • Post-translational modifications play an essential role in protein functions which regulate cellular process. Post-translational modifications occur after the translation of mRNA.
  • a post-translational modification is a biochemical process where amino acid residues are covalently modified by removing or adding molecules in a protein. These modifications can change a protein's folding, biological function, immunogenicity, and/or stability (Farley et al, METHODS IN ENZYMOLOGY (2009) 463: 725-762; Walsh et al, NATURE BIOTECHNOLOGY (2006) 24(10): 1241-1252).
  • Post-translational modifications include, but are not limited to, acetylation, acylation, γ-carboxylation, β-hydroxylation, disulfide bond formation, glycosylation, methylation, phosphorylation, proteolytic processing, and sulfation.
  • acetylation, methylation, amidation, phosphorylation, and glycosylation are commonly found in approved therapeutic protein drugs and candidates in discovery or clinical trial stages. These modifications may take place in N-terminal, C-terminal, or side chain amino acids.
  • Heterogeneous species can be formed after post-translational modifications, such as glycosylation or amidation, which may or may not alter protein folding and function.
  • Post-translational modifications usually occur during production of biologics in a cell or cell system. Accordingly, characterization of post-translational modifications provides structural insight that allows structure to be associated with biologic function and helps maintain control over biologic manufacturing procedures.
  • Mass spectrometric methods are used for characterization of post-translational modifications on proteins, and MS fragmentation techniques play critical roles in producing the specific types of fragments that allow identification of the amino acid residues at which post-translational modifications occur. Examples of characterization of protein post-translational modifications are described in detail in
  • methylation involves adding one or more methyl groups onto amino acids.
  • N-methylation can be found at the N-terminal alanine, isoleucine, leucine, methionine, phenylalanine, proline, or tyrosine and/or the side chains of lysine, arginine, glutamine, or asparagine or the imidazole ring of histidine residues (Paik et al, YONSEI MEDICAL JOURNAL (1986) 27(3): 159-177).
  • O-methylation can be observed either at a C-Terminal cysteine, leucine, lysine or at the side chain of glutamic acid and aspartic acid residues (Paik (1986) supra). S-methylation can be noted at the side chains of methionine and/or cysteine residues.
  • Acetylation transfers an acetyl group to the side chain of lysine (also known as lysine acetylation) or the N-terminal amino acid residue (also known as N-terminal acetylation).
  • methylation and acetylation modifications on proteins remain unchanged during sample preparation (for example, digestion by enzymes).
  • Methylated peptide species can be identified by additional masses (for example, +14 Da for mono-methylation, +28 Da for di-methylation) following LC separation. Though tri-methylation and acetylation modifications provide the same additional mass of 42 Da, they can be distinguished by CID-MS2 fragmentation. For example, a unique immonium ion of m/z 126 can be observed for acetylated lysine but is not present for tri-methylated lysine residues (Farley (2009) supra). In addition, a neutral loss of 59 Da, corresponding to the loss of trimethylamine, is unique to a tri-methylated lysine.
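The decision logic for separating methylation states and for resolving the +42 Da ambiguity can be written as a small rule set. This is a sketch under stated assumptions: the function name, the boolean inputs, and the 0.5 Da tolerance are illustrative, and the evidence flags are presumed to come from prior inspection of the CID-MS2 spectrum.

```python
def classify_lysine_modification(mass_shift, has_immonium_126=False,
                                 has_neutral_loss_59=False, tolerance=0.5):
    """Assign a lysine modification from mass shift and CID-MS2 evidence."""
    if abs(mass_shift - 14) <= tolerance:
        return "mono-methylation"
    if abs(mass_shift - 28) <= tolerance:
        return "di-methylation"
    if abs(mass_shift - 42) <= tolerance:
        # Same nominal mass: rely on the diagnostic fragment evidence.
        if has_immonium_126 and not has_neutral_loss_59:
            return "acetylation"
        if has_neutral_loss_59 and not has_immonium_126:
            return "tri-methylation"
        return "ambiguous (+42 Da)"
    return "unassigned"
```

Returning an explicit ambiguous label when neither or both diagnostic signals are present keeps the caller from over-interpreting a bare +42 Da shift.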
  • Phosphorylation typically occurs at serine, threonine or tyrosine residues of proteins or peptides.
  • phosphorylation is a reversible post-translational modification occurring in cellular process to control protein activities.
  • Phosphorylation is one of the labile post-translational modifications.
  • the phosphate groups on serine and threonine residues can compete with the peptide bonds as preferred cleavage sites.
  • peptides containing phosphorylated amino acid residues tend to lose the phosphate groups before they fragment along the peptide backbone. As a result, mixed fragments are obtained, in which unmodified and phosphorylated peptides cannot be differentiated.
  • CID-MS2, ETD-MS2, and CRCID-MS3 fragmentation techniques
  • a phosphorylated protein sample is subjected to digestion using Lys-C to generate large proteolytic peptides.
  • a monolithic LC column with a large pore size, such as polystyrene-divinylbenzene (PS-DVB, 50 μm i.d. × 10 cm), can be used to separate large peptides including unmodified and phosphorylated peptides.
  • CID and ETD are performed under both dependent and independent modes on a mass spectrometer (for example, Thermo LTQ Orbitrap Elite ETD).
  • with CID and ETD, lower-intensity precursor ions (usually phosphorylated peptides), which are normally missed during data-dependent experiments, can be manually selected for subsequent fragmentation (for example, CID-MS3). Furthermore, fewer fragment ions (c and z ions) are obtained for a large peptide with a lower charge (for example, +2) in the ETD-MS2 scan, resulting in insufficient fragmentation for peptide assignment.
  • a combination of ETD (ETD-MS2) followed by CRCID (CID-MS3), via isolating a product ion produced in the ETD scan step, can produce substantial c and z ion series along with phosphorylation sites on peptides.
  • the peptide assignment for phosphorylated peptides and unmodified peptides is achieved using software (for example, PepFinder and/or Proteome Discoverer). Besides mapping fragment ions, HPLC retention times and masses (for example, a loss of 98 Da as the signature of a phosphorylated peptide) are key to assigning the peptide identity.
  • Higher order structures (HOS) of a biologic protein include the secondary, tertiary, and quaternary structures.
  • HOS provide a three-dimensional (3D) conformation, which plays an important role in the protein's biological function.
  • HOS are considered to be critical quality attributes because changes in HOS may affect efficacy or safety of biologic drugs.
  • HOS of a biologic protein is required by regulatory agencies (for example, USFDA quality by design, QbD and ICH Q5E (ICH HARMONISED TRIPARTITE
  • HOS are often required during manufacturing of biologics (for example, comparability evaluations), formulation, stability assessment, and process development.
  • Circular dichroism (CD) spectroscopy (Li et al, JOURNAL OF PHARMACEUTICAL SCIENCES (2011) 100(11): 4642-4654), X-ray crystallography (Harris et al, J. MOL. BIO. (1998) 275: 861-872), and nuclear magnetic resonance (NMR) spectroscopy ([...] PHARMACEUTICAL SCIENCES (2013) 102(6): 1724-1733) are the conventional tools used to analyze HOS of a protein.
  • Hydrogen/deuterium exchange coupled with mass spectrometry (HDX MS) can be used to probe HOS of a biologic.
  • HDX MS can provide a 3D conformation of an intact molecule (Engen, ANAL. CHEM. (2009) 81(19): 7870-7875) and a local conformation of fragments of a biologic, such as peptides (for example, peptide epitopes) (Coales et al, RAPID COMM. MASS SPECTROM. (2009) 23: 639-647).
  • the advantages of HDX MS over X-ray crystallography and NMR spectroscopy are: 1) that it provides dynamic conformational information of native biologics in solution; 2) that it is unlimited by the size of proteins or biologics being interrogated; and 3) sensitivity (i.e., less material required for HDX MS analysis) (Berkowitz et al., NATURE REVIEWS DRUG DISCOVERY (2012) 11: 527-540).
  • HOS of biologics also include disulfide bonds, disulfide knots, and glycosylation, which are described in detail in PCT/US2014/059150 and
  • Disulfide bonds primarily control the folding of three-dimensional protein structure, and generally fall into three groups: 1) intra-chain disulfide bonds; 2) inter- chain disulfide bonds; and 3) disulfide knots.
  • intra-chain disulfide bonds stabilize the tertiary structure and the inter-chain disulfide bonds are involved in stabilizing quaternary structure.
  • Disulfide knots can improve protein structural stability. Any modifications to the process of producing a biologic (e.g., changes in cell lines, cell culture medium, agitation force etc.) have the potential to cause protein conformational changes due to disulfide bond rearrangements (for example, unpaired or mispaired disulfide bonds).
  • disulfide bonds are critical structural attributes, which should be monitored for quality control purposes during manufacture or storage of biologic or biologic reference.
  • Intra-chain disulfide bonds occur within a single polypeptide whereas inter-chain disulfide bonds are formed between two polypeptide chains through oxidation of thiol (-SH) groups on cysteine residues.
  • a conventional approach for characterizing disulfide bonds includes comparing reduced and non-reduced peptide maps to help locate disulfide bonds on peptide backbones.
  • a protein sample is digested by enzyme with and without reduction and alkylation to generate two protein digests (for example, protein digest 1 (PD1) with reduction and alkylation; protein digest 2 (PD2) without reduction and alkylation).
  • PD1 and PD2 are subjected to LC-MS analysis.
  • the disulfide-linking peptide (DSLP) can be found in the PD2 sample using a theoretical mass to locate its retention time on the LC-MS mass chromatogram. Then, the sequence of the DSLP can be determined using ETD to cleave the disulfide bridge, followed by CID to break the peptide amide bonds. As expected, the DSLP should not be detected in the PD1 sample. However, LC-MS analysis cannot differentiate an intra-chain disulfide bond from an inter-chain disulfide bond.
  • the sample can be subjected to SDS-PAGE gel electrophoresis under reduction and non-reduction conditions.
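The search for a disulfide-linking peptide by theoretical mass can be sketched as below. Several details are assumptions of this illustration and are not stated in the text: the theoretical mass is taken as the sum of the two free-thiol peptide masses minus two hydrogen atoms, peaks are represented as (retention time, neutral mass) pairs, and the match window is 10 ppm. All function names are hypothetical.

```python
H_ATOM_MASS = 1.007825  # monoisotopic mass of hydrogen, Da


def dslp_mass(peptide1_mass, peptide2_mass):
    """Theoretical mass of two peptides joined by one disulfide bond."""
    # Oxidizing two thiols to a disulfide removes two hydrogen atoms.
    return peptide1_mass + peptide2_mass - 2 * H_ATOM_MASS


def matching_retention_times(peaks, target_mass, ppm=10.0):
    """Retention times of peaks whose mass lies within ppm of the target."""
    return [rt for rt, mass in peaks
            if abs(mass - target_mass) / target_mass * 1e6 <= ppm]


def dslp_supported(pd1_peaks, pd2_peaks, target_mass, ppm=10.0):
    """True when the target is found in PD2 (non-reduced) but not in PD1."""
    in_pd2 = bool(matching_retention_times(pd2_peaks, target_mass, ppm))
    in_pd1 = bool(matching_retention_times(pd1_peaks, target_mass, ppm))
    return in_pd2 and not in_pd1
```

As the text notes, a positive result here does not say whether the bond is intra-chain or inter-chain; that needs complementary evidence.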
  • Disulfide knots are structural motifs often found in proteins and typically comprise at least three disulfide bonds (six cysteine residues), where one disulfide bond passes through the ring of the other two disulfide bonds.
  • Some therapeutic protein biologics (for example, recombinant human arylsulfatase A) contain disulfide knots, which can be scrambled or shuffled during expression, purification, or storage. It can be difficult to verify that a protein bearing disulfide knots has the correct arrangement since there are many ways to arrange a disulfide knot (Ni et al, J. AM. SOC. MASS SPECTROM. (2012) 24: 125-133).
  • Enzymes or CID typically do not cut the peptide backbone disposed within a disulfide knot. The generation of desirable sizes of peptide fragments is important in the successful
  • Glycosylation is also important for the production of biologics because, for example, more than 90% of the protein drugs such as monoclonal antibodies are glycoproteins.
  • glycosylation is the most complex post-translational modification, where sugar moieties play roles in protein binding, conformation, stability, and activity (Walsh (2006) supra). Glycosylation can significantly impact the potency, pharmacokinetics, or immunogenicity of a biological drug if any modifications (for example, changing cell lines) occur during the manufacturing process. Additionally, it can be difficult and impractical to produce a homogeneously glycosylated protein. Although the production of biologics is monitored under good manufacturing practice (GMP), heterogeneous species of glycoproteins (for example, different forms of glycan linked with a protein) can only be minimized. Thus, glycosylation is a critical attribute for a therapeutic protein.
  • glycosylation can be grouped into five types: N-linked (glycan attached to the amino group of asparagine), O-linked (glycan bound to the hydroxyl group of serine or threonine), C-linked (glycan added onto the indole ring of tryptophan), phospho-linked (glycan linked to serine through a phosphodiester bond), and glypiation (glycosylphosphatidylinositol anchor linking a phospholipid and a protein through a glycan linkage) (Ni (2012) supra).
  • Deglycosylation is essential for identification of the peptide and the site of glycosylation. After producing the peptide backbone via deglycosylation, glycan attached on the peptide (known as glycopeptide) can be predicted by subtracting the molecular weight of the peptide from that of the glycopeptide.
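The subtraction described above, and a comparison of the resulting glycan mass against candidate compositions, can be sketched as follows. The monosaccharide residue masses are standard monoisotopic values supplied for this illustration, and the function names, candidate table, and 0.05 Da tolerance are hypothetical. Any mass change introduced at the attachment site by the release enzyme is ignored in this sketch.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.0528,     # e.g., mannose, galactose
    "HexNAc": 203.0794,  # e.g., GlcNAc, GalNAc
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


def glycan_mass(glycopeptide_mass, peptide_mass):
    """Mass contributed by the glycan: glycopeptide minus peptide backbone."""
    return glycopeptide_mass - peptide_mass


def composition_mass(composition):
    """Sum of residue masses for a composition such as {'Hex': 3, 'HexNAc': 4}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def match_compositions(observed_mass, candidates, tolerance=0.05):
    """Names of candidate compositions whose mass matches the observed glycan."""
    return [name for name, comp in candidates.items()
            if abs(composition_mass(comp) - observed_mass) <= tolerance]
```

For instance, a composition of four HexNAc, three Hex, and one Fuc sums to about 1444.53 Da, so a glycopeptide that is heavier than its peptide backbone by that amount would match it.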
  • PNGase F can remove most N-linked glycans except when a fucose-α(1-3) is bound to the Asn-GlcNAc linkage.
  • N-glycosidase A can be used to release oligosaccharides containing an α(1-3) fucose core. There is no enzyme like PNGase F that can remove "intact" O-linked glycans.
  • removal of O-linked glycans can be achieved using a series of exoglycosidases to hydrolyze various types of monosaccharides until only the Gal-β(1,3)-GalNAc core remains.
  • O-glycosidase (endo-α-N-acetylgalactosaminidase) can then release the Gal-β(1,3)-GalNAc core structure from the serine or threonine residues (Iwase et al, METHODS IN MOLECULAR BIOLOGY (1993) 14: 151-159).
  • identification of the glycosylation site can be accomplished in parallel with peptide sequencing because N-linked asparagine or O-linked serine/threonine residues are the known glycosylation sites.
  • glycan can be collected after enzymatic or chemical digestion.
  • CQA Critical Quality Attribute
  • CQAs Critical Quality Attributes
  • a CQA mapping study generates a map of such features in a given biologic.
  • the generated CQA map identifies, among other things, what structural features of the given biologic are critical for efficacy, what substructures are susceptible to degradation or modification, and what areas may induce aggregation.
  • a CQA map for a target biologic directly informs development of reproducible manufacture and control protocols, and permits the prediction of the effect of selection of particular starting materials or manufacturing process steps.
  • Such a map also provides developers, manufacturers, and potentially regulators, with both prospective and retrospective assurances that product quality specifications will be and have been met, and permits the development and production of consistent quality product.
  • the map also can inform development of in-process testing, release testing, process monitoring,
  • a CQA map is generated by characterizing the target biologic directly.
  • a CQA map for a target biologic is determined by characterizing a reference biologic that is representative of the target biologic. Accordingly, generating a CQA map for a target biologic may comprise performing the steps described below, and in additional detail in PCT/US2014/059150, the contents of which is incorporated herein by reference in its entirety, directly on the target biologic itself, or on a reference biologic for the target biologic.
  • Generating the CQA map for a target biologic includes, in a first phase, determining structural features of the target biologic and/or a reference biologic.
  • Such structural features include, without limitation, a molecular weight of the target and/or reference biologic, a primary structure of the target and/or reference biologic, identification and/or quantification of amino acid modifications, identification and/or quantification of various post-translational modifications, and determination of HOS of the target and/or reference biologic.
  • Such structural features may be determined via appropriate structural characterization studies, as described herein as well as in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
  • in a second phase, the target and/or reference biologic is subjected to various conditions that (a) stress the molecule potentially to result in its modification, degradation, denaturation, contamination, instability or aggregation, and/or (b) assess the efficacy/safety of the target and/or reference biologic including in cell-based, in vivo or clinical assays in order to determine overall stability and/or efficacy/safety profiles.
  • the target and/or reference biologic is subjected to high temperature, physiological temperature, light, pH changes, enzymes that are commonly present in production broths, lyophilization and reconstitution, changes in ionic environment, mechanical stresses such as filtration and other separation techniques, accelerated aging conditions, and/or conditions that the molecule might encounter in vivo, such as physiological temperature, and various biomolecules (e.g., proteases), ions, or dissolved gases.
  • the as-stressed target and/or reference biologic, and/or its derivatives, fragments, or degradation products are themselves analyzed, for example, to determine the effect, if any, of the stress on its structure.
  • the structures then are compared to that of the intact, target and/or reference biologic, and optionally, as may be necessary, activity assays are conducted on the intact and/or as-stressed molecular species which exhibit an alteration in structure.
  • This process results in (for example, by mass spectrometric analysis) data indicative of the structures of the as-stressed reference biologic, and optionally derivatives, fragments, or degradation products thereof.
  • the data then can be analyzed computationally to determine which operational parameters used in the expression, purification, formulation or storage of the reference biologic result in or pose a risk of degrading, modifying, or contaminating the biological.
  • the generated CQA map can reveal which particular attributes or modifications affect the molecule's stability or activity, which attributes or modifications are innocuous, and what specific conditions induce the modifications.
  • the map thus allows determination of (a) which attributes actually relate to and/or impart function, or not, and (b) which processing parameters degrade or risk degrading the structural features of the biologic known to be material to its function and safety.
  • the map enables determination of which structural features of the protein matter and which don't, which features are susceptible to alteration or degradation caused by selection of particular raw materials used and/or processing steps employed in its expression, purification, or formulation, and from this directly informs downstream decisions in development, regulatory manufacturing, and Quality Assurance (QA)/Quality Control (QC) processes. Further details regarding CQA mapping are described in PCT/US2014/059150, the contents of which is incorporated herein by reference in its entirety.
  • When a biologic is administered to a subject, various structural features of the biologic may or may not be modified during its in vivo residence within the subject.
  • An in vivo comparability profile provides a measure of in vivo comparability - that is, a degree of structural similarity between two biologic molecules and/or their metabolites after a period of in vivo residence in a subject.
  • an in vivo comparability profile is a compilation of comparative data (including statistically processed data) indicating a structural feature or set of structural features of a biologic molecule and/or one or more of its metabolites after a period of in vivo residence in a subject when compared to the same structural feature or set of structural features in a reference biologic molecule and/or one or more of its metabolites.
  • Determining an in vivo comparability profile of a target biologic comprises obtaining data indicative of the structure of the target biologic or a metabolite thereof following its extraction from a sample (e.g., a cell; e.g., a tissue; e.g., a bodily fluid) removed from a subject at a specific time interval following administration of the target biologic to the subject.
  • the data indicative of the structure of the target biologic is compared to data indicative of the structure of the reference biologic drug or a metabolite thereof generated through analysis of a sample taken at the same time interval from a subject to whom the reference biologic drug had been administered, thereby to produce an in vivo comparability profile of the candidate biologic drug to the reference biologic drug.
  • Structural features of the target biologic and the reference biologic can be analyzed following in vivo residence using any of the approaches described above.
  • the biological information can further define those aspects of the structure of the candidate and reference biologics that are critical to safety, purity, and/or potency.
  • the resulting data can be analyzed to determine the effect of in vivo residence on the biologic.
  • the data preferably is analyzed computationally using a conventional computer or computer system to identify structural changes, and to identify critical structure or substructures of the biologic.
  • the in vivo comparability profile may include data indicative of whether a feature is more or less common in the candidate biologic drug or a metabolite thereof, relative to the reference biologic drug or a corresponding metabolite thereof.
  • the profile may indicate that the candidate biologic drug is phosphorylated at a different amino acid residue than in the reference biologic drug, or it may indicate that the candidate biologic drug is phosphorylated more or less often at a particular residue than in the reference biologic drug.
  • structural similarities between the candidate biologic or a metabolite thereof and a reference biological or a corresponding metabolite thereof can be used to support the comparability and similarity of the candidate biologic and the reference biologic. Further details regarding determination of in vivo comparability profiles are described in PCT/US2016/053434, the contents of which is incorporated herein by reference in its entirety.
  • a biosimilar/reference comparison study aims to demonstrate biosimilarity between a target biologic and a reference biologic, also referred to as an "originator biologic" within the context of demonstrating biosimilarity.
  • Biosimilar/reference comparison studies leverage analytical methods to structurally characterize both the target biologic and the reference biologic and compare structural features of the target biologic with structural features of the reference biologic. Structural characterization of the target and reference biologics may be carried out via any combination of one or more of the structural characterization studies described in Sections A.iii.a - A.iii.h above.
  • Biosimilar/reference comparison studies address the expiration of originator biologic drug patent protection and the advent of the biosimilar 351(k) approval pathway, which have triggered a number of changes to the characterization and the regulation of these products.
  • Biologic License Application (BLA) reviews have primarily focused on the manufacturing process to ensure consistent production of the "same" product through control and validation of process conditions. Because biologic drugs are produced in living cells, small changes in the production conditions can result in larger changes to the drug substance that may impact safety or efficacy. Thus, biosimilar manufacturers may not be able to exactly replicate the originator's manufacturing process for a variety of reasons. Accordingly, biosimilarity to the originator biologic needs to be demonstrated through analytical similarity assessments - that is, the structural characterization of target and reference biologics determined through the use of analytical methods as described above.
  • Statistical tools for assessing biosimilarity from analytical data obtained via use of analytical methods to characterize a biologic are typically the same ones used to demonstrate comparability between pre- and post-change commercial processes, and include tests of statistical equivalence, statistical tests for differences, statistical intervals, and visualization and summary statistics.
  • structural characterization of biosimilars and comparisons with the structural characterization of their reference products using analytical methods that comprise various analytical stages such as those described in A.i and A.ii is an essential component of the development and approval process.
  • a lot comparison study aims to compare structural features of two or more biologics manufactured in different lots.
  • manufacturing changes to the drug substance and/or the drug product are common.
  • the manufacturer is responsible for demonstrating adequate and appropriate comparability between pre-change and post-change product lots.
  • Lot comparison studies provide structural characterization data that allows a manufacturer of a biologic to demonstrate that the changes (for example, changes in upstream processing such as cell lines or cell culture conditions, or changes in downstream processing such as purification resins, buffers, or formulation, or changes of manufacturing facility or equipment) do not have an adverse effect on the quality, safety and efficacy of the manufactured biologic products.
  • examples of product quality attributes include: glycosylation profile, charge distribution, product impurities, process impurities, aggregates, particulate matter, stability and potency.
  • any combination of one or more of the structural characterization studies described in Sections A.iii.a - A.iii.h above may be used to obtain structural characterization data used to demonstrate that relevant structural features of biologics are maintained from lot to lot. Accordingly, determining appropriate study designs via selection of appropriate combinations of analytical stages, such as those described in Sections A.i and A.ii above, for characterization of biologics produced in different lots is an essential component of the development and approval process.
  • the study design and method capture technology described herein receives as input a user query comprising information regarding (i) the target biologic and (ii) the particular desired structural characterization.
  • the information regarding the target biologic comprises at least a portion (up to all) of a nominal primary structure of the target biologic.
  • the nominal primary structure is the nominal amino acid sequence that is coded for via the nucleic acid sequence (e.g., a DNA sequence; e.g., an RNA sequence) used to produce the target biologic.
  • the nominal amino acid sequence may cover the full target biologic, or a portion thereof, such as a specific fusion portion of a fusion protein to be analyzed.
  • a user provides the nominal primary structure via a text file that comprises an ordered list of the nominal amino acid sequence of the target biologic.
  • other forms of input are used.
  • a user may input a reference number to a public or proprietary database.
  • information regarding the target biologic comprises results from previously carried out structural characterization of the target biologic or from generally known information about the biologic (e.g., a molecule type). For example, if a primary structure of the target biologic has been previously characterized via an experimental study, then a user may input the measured primary structure of the target biologic (e.g., in addition to the nominal primary structure; e.g., instead of the nominal primary structure).
  • results from other types of previously carried out studies are included as information about the target biologic.
  • previous measurements of disulfide linkages or glycosylation of the target biologic may be included in the user input query.
  • such information e.g., glycans, post-translational modifications, disulfide bonds
  • the tool receives as input additional information about the target biologic, such as an identifier of a molecule type.
  • Molecule types that may be identified include, but are not limited to, a recombinant protein, a fusion protein, a monoclonal antibody, and an antibody-drug conjugate.
  • a user may input an identifier of a molecule type through a variety of user interactions.
  • molecule type may be identified via a textual label that refers to one or more entries in a molecule type dictionary stored in memory.
  • the label "Fc-fusion" may be used to identify an Fc-fusion protein molecule type.
  • a user may thus provide text input identifying the molecule type at a command line prompt.
  • User input of a molecule type may also be provided via a GUI. For example, a user may select one or more molecule types from a drop down list, or other types of graphical control elements (e.g., radio boxes, check boxes, and the like).
  • Bioprocess information may include an identifier of a cell culture type that is used to produce the target biologic.
  • a user may input an identifier of a cell culture type through a variety of user interactions.
  • a cell culture type may be identified via a textual label that refers to one or more entries in a cell culture type dictionary stored in memory.
  • the label "CHO" may be used to identify that a Chinese Hamster Ovary cell culture was used to produce the target biologic.
  • the label "E-coli" may be used to identify that the target biologic was produced using an E. coli cell culture.
  • various levels of specificity are used to identify a cell culture type.
  • a cell culture type may be identified as bacterial or not, or mammalian or not.
  • a user provides text input identifying the cell culture type at a command line prompt.
  • User input of a cell culture type may also be provided via a GUI.
  • a user may select one or more cell culture types from a drop down list, or other types of graphical control elements (e.g., radio boxes, check boxes, and the like).
  • Bioprocess information may also include information about purification stages subsequent to harvesting the cell culture supernatant. For example, bioprocess information may identify purification steps such as Protein A initial purification followed by hydrophobic interaction chromatography. Such steps may be identified by various parameters, such as textual labels of particular purification steps.
  • the tool also receives as input one or more study class attributes that serve as identifiers of one or more particular desired structural characterizations of the target biologic.
  • a user may be interested in determining one or more of (i) molecular weight, (ii) primary structure, (iii) identification, site localization, and/or quantification of various specific post-translational modifications, (iv) characterization of higher order structures (HOS) (e.g., secondary structures; e.g., tertiary structures; e.g., quaternary structures), (vi) a map of critical quality attributes (CQAs), and (vii) an in vivo comparability profile.
  • a user may also be interested in (viii) a biosimilar to reference product comparison, and/or (ix) a lot comparison study (e.g., a lot release study; e.g., lot-to-lot comparison study).
  • study class attributes are used to identify one or more types of structural characterization studies.
  • the tool uses a dictionary of predefined study class attributes to allow a user to select the particular one or more types of structural characterizations that they would like to obtain for a target biologic.
  • the user may input the study class attributes by inputting specific keywords at a command prompt for entering an input query or selecting options via a GUI (e.g., from a dropdown menu; e.g., via check-boxes).
  • a user may enter the text "molecular weight" to indicate that the desired structural characterization that they are interested in is a measurement of a molecular weight of the target biologic.
  • Similar textual labels may be used as study class attributes in order to identify any of the various types of structural characterizations [e.g., molecular weight; e.g., primary structure; e.g., amino acid modifications (e.g., N-terminal modifications; e.g., C-terminal modifications; e.g., oxidation; e.g., deamidation, isomerization, and racemization); e.g., post-translational modifications (e.g., methylation and acetylation; e.g., phosphorylation); e.g., higher order structures (e.g., secondary structure motifs; e.g., tertiary structure; e.g., quaternary structure; e.g., disulfide bonds; e.g., disulfide knots; e.g., glycosylation); e.g., epitope mapping; e.g., biosimilar comparison; e.g., lot comparison; e.g., CQA mapping; e.g., in vivo comparability profile].
  • study class attributes may be entered.
  • study class attributes are combined to add further specificity to a type of desired structural characterization. For example, additional attributes such as "characterization" and/or "quantification" can be used to specify whether a characterization (e.g., a map identifying locations and/or nature of specific types of structural features) is desired, or a quantification study is desired.
  • Additional examples of study class attributes include combinations of structural characterization studies together with study objectives. For example, additional attributes such as "lot release testing" can be combined with "glycosylation analysis" to indicate the purpose of the study.
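The user input described above (nominal sequence, molecule type, cell culture type, and study class attributes checked against predefined dictionaries) might be represented as follows. This is a minimal sketch, not the tool's actual interface: the class `InputQuery`, its field names, and the contents of the three dictionaries are assumptions made for illustration; only the labels "Fc-fusion", "CHO", "E-coli", "molecular weight", "glycosylation analysis", "lot release testing", "characterization", and "quantification" come from the text.

```python
from dataclasses import dataclass, field

# Hypothetical dictionaries of predefined labels (illustrative contents).
MOLECULE_TYPES = {"recombinant protein", "Fc-fusion",
                  "monoclonal antibody", "antibody-drug conjugate"}
CELL_CULTURE_TYPES = {"CHO", "E-coli"}
STUDY_CLASS_ATTRIBUTES = {"molecular weight", "primary structure",
                          "glycosylation analysis", "lot release testing",
                          "characterization", "quantification"}
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


@dataclass
class InputQuery:
    """Hypothetical container for one user input query."""
    nominal_sequence: str
    molecule_type: str
    cell_culture_type: str
    study_class: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        bad = sorted(set(self.nominal_sequence) - VALID_RESIDUES)
        if bad:
            problems.append(f"unknown residues: {''.join(bad)}")
        if self.molecule_type not in MOLECULE_TYPES:
            problems.append(f"unknown molecule type: {self.molecule_type}")
        if self.cell_culture_type not in CELL_CULTURE_TYPES:
            problems.append(f"unknown cell culture type: {self.cell_culture_type}")
        for attribute in self.study_class:
            if attribute not in STUDY_CLASS_ATTRIBUTES:
                problems.append(f"unknown study class attribute: {attribute}")
        return problems
```

Combining attributes, as in the lot release example, then amounts to passing several labels in `study_class`, e.g. `["glycosylation analysis", "lot release testing"]`.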
  • the study design and method capture technology described herein determines a set of GBAs for the target biologic.
  • the set of target biologic GBAs is determined from received input corresponding to information about the target biologic, such as the nominal primary structure of the target biologic.
  • a variety of GBAs may be determined from a target biologic's nominal primary structure.
  • numbers or fractions of specific amino acids (e.g., arginine, also referred to as Arg or R; e.g., lysine, also referred to as Lys or K) and numbers or fractions of particular types of amino acids (e.g., aromatic amino acids; e.g., hydrophobic amino acids; e.g., hydrophilic amino acids) can be determined and included in the set of target biologic GBAs.
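As an illustration of composition-type GBAs, the sketch below counts specific residues and computes fractions of residue classes from a one-letter sequence. The function name `composition_gbas`, the output keys, and the residue groupings are assumptions; in particular, which residues count as "hydrophobic" differs between hydrophobicity scales, and the set used here is only one common choice.

```python
from collections import Counter

AROMATIC = set("FWY")
HYDROPHOBIC = set("AVILMFWC")  # assumed grouping; scales differ


def composition_gbas(sequence: str) -> dict:
    """Compute simple composition attributes from a one-letter sequence."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return {
        "length": n,
        "count_R": counts["R"],
        "count_K": counts["K"],
        "fraction_R": counts["R"] / n,
        "fraction_K": counts["K"] / n,
        "fraction_aromatic": sum(counts[a] for a in AROMATIC) / n,
        "fraction_hydrophobic": sum(counts[a] for a in HYDROPHOBIC) / n,
    }
```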
  • characteristics of expected disulfide bridges and knots are predicted using a target biologic's molecule type or may be known a priori from other sources.
  • Characteristics of expected disulfide bridges and knots include a predicted nature and/or location of disulfide bridges and/or knots, as well as an extent of disulfide bridges and/or knots in the target biologic (e.g., number of disulfide bridges and/or knots; e.g., pairs of nested cysteines, unpaired cysteines, and the post-translational modification of cysteine to formylglycine).
  • these determined characteristics of disulfide bridges and/or knots are included in the set of target biologic GBAs.
  • metrics characterizing predicted results of various enzymatic digestion steps are determined and used as target biologic GBAs.
  • a number of or average size of (e.g., monoisotopic molecular weight of digest fragments; e.g., average molecular weight of digest fragments; e.g., sequence length expressed in terms of number of amino acids) peptide fragments resulting from one or more specific enzymatic digestion steps can be predicted using the target biologic's nominal primary structure, and included in the set of target biologic GBAs.
  • statistical distributions (e.g., frequency of fragment lengths or weights) of peptide fragments resulting from one or more specific enzymatic digestion steps can be predicted using the target biologic's nominal structure, and included in the set of target biologic GBAs.
  • these predictions of enzymatic digests may be built on deterministic models of enzyme specificity.
  • these predictions of enzymatic digests may be built on stochastic models of enzyme activity.
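A deterministic digest prediction of the kind described above can be sketched as follows. The cleavage rule used here (trypsin cleaves C-terminal to K or R unless the next residue is P, with no missed cleavages) is a widely used simplification and an assumption of this sketch, not a rule stated in the text. The function names and output keys are hypothetical. The residue masses are standard monoisotopic values rounded to five decimals.

```python
from collections import Counter

# Monoisotopic residue masses in daltons (standard values, rounded).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def trypsin_digest(sequence: str) -> list:
    """Split after K or R unless followed by P (assumed rule, no missed cleavages)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing fragment without a cleavage site
        fragments.append(sequence[start:])
    return fragments


def monoisotopic_mass(peptide: str) -> float:
    """Sum of residue masses plus one water for the free termini."""
    return sum(MONO[r] for r in peptide) + WATER


def digest_gbas(sequence: str) -> dict:
    """Fragment count, mean size, and length distribution of the predicted digest."""
    fragments = trypsin_digest(sequence)
    lengths = [len(f) for f in fragments]
    masses = [monoisotopic_mass(f) for f in fragments]
    return {
        "fragment_count": len(fragments),
        "mean_length": sum(lengths) / len(lengths),
        "mean_monoisotopic_mass": sum(masses) / len(masses),
        "length_histogram": dict(Counter(lengths)),
    }
```

A stochastic model would replace the fixed rule in `trypsin_digest` with per-site cleavage probabilities; that variant is not shown here.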
  • the molecule type is used as a target biologic GBA (e.g., included in the set of target biologic GBAs).
  • the set of target biologic GBAs includes GBAs that characterize structural elements that either do or are likely to occur in the target biologic. These structural elements influence the applicability of particular analytical methods to the target biologic.
  • the tool utilizes a determined set of target biologic GBAs to identify analytical methods that can be used to obtain a desired structural characterization of the target biologic, as identified by the input study class.
  • a set of target biologic GBAs is compared with GBAs associated with analytical method records stored in the method store in order to determine the sequences of analytical methods that form the study design results.
  • GBAs of a target biologic are determined via a preprocessing step, which may be implemented by a preprocessing module, as shown in FIG. 2A.
  • the preprocessing module may be the same preprocessing module that is used to determine GBAs of known biologics, for inclusion in the method store as described below, and illustrated in FIG. 1A and FIG. 1B.
  • FIG. 1B shows an example of output from a preprocessing module that illustrates several GBAs, including identification of cleavage sites by particular enzymes in a protein primary sequence, identification of N-linked glycosylation motifs, digest fragment statistics and mass tabulations, amino acid statistics, and various physicochemical properties.
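One of the preprocessing outputs mentioned above, identification of N-linked glycosylation motifs, can be illustrated with the canonical sequon N-X-S/T, where X is any residue except proline. The use of this particular motif definition is an assumption of the sketch (the text does not state which motif definition the preprocessing module applies), and the function name is hypothetical.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps the
# match zero-width after N so that overlapping motifs are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(sequence: str) -> list:
    """Return 1-based positions of asparagines in N-X-S/T motifs (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]
```

Matching the motif only predicts potential sites; as noted for known biologics, experimentally determined sites may be used instead of or alongside such predictions.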
  • GBAs of the target biologic include one or more study class attributes, input by the user as described above.
  • GBAs of a given target include one or more bioprocess attributes that represent parameters of a biomanufacturing process used to produce the given biologic. Such bioprocess attributes may be input directly by a user, as described above.
D. Method Store
  • FIG. 1D is a block flow diagram illustrating an exemplary process 140 for populating a method store.
  • an analytical stage record is created.
  • the analytical stage record corresponds to a specific analytical stage that has been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic.
  • the analytical stage record is stored in the method store.
  • one or more GBAs of the associated known biologic are stored in the method store.
  • the one or more known biologic GBAs are linked with the analytical stage record.
  • the method store is a database that stores a plurality of analytical method records and/or a plurality of analytical stage records.
  • each analytical method record is a data structure that represents a particular analytical method that was (e.g., previously) applied to characterize a specific associated biologic.
  • each analytical stage record is a data structure that represents a particular analytical stage that was (e.g., previously) applied (e.g., as part of an analytical method) to characterize a specific associated known biologic.
  • An analytical stage record includes data representing (i) an identifier of the particular analytical stage it represents and (ii) a series of parameter values that were used in the application of the analytical stage that the analytical stage record represents.
  • an analytical stage record representing an enzymatic digestion stage may be represented by a data structure such as the one shown in FIG. 3.
  • the analytical stage record, "AS1", includes a series of field/value pairs that identify the particular type of analytical stage along with a set of parameters used in application of the stage.
  • the "stage" field stores a text label (e.g., a string value) that identifies the particular stage.
  • the label "single digest" identifies a single digest enzymatic digestion stage.
  • an "enzyme map" field that stores a text label identifying the particular type of enzyme used (e.g., trypsin in the case of FIG. 3)
  • an "incubation time" field that stores a value specifying the amount of time (e.g., in minutes; e.g., via a numeric value) of incubation
  • an "incubation temp" field that stores a value specifying a temperature at which incubation was performed (e.g., in degrees Celsius; e.g., via a numeric value)
  • an "incubation pH" field that stores a value specifying a pH at which the incubation was performed (e.g., via a numeric value).
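The field/value structure described for the enzymatic digestion stage record could look like the following. This is a sketch only: the field names and the labels "single digest" and "trypsin" are taken from the description of FIG. 3, while the numeric values, the `REQUIRED_FIELDS` schema, and the `check_stage_record` helper are hypothetical and are not taken from the figure.

```python
# Field/value pairs for one analytical stage record.
AS1 = {
    "stage": "single digest",
    "enzyme map": "trypsin",
    "incubation time": 240,   # minutes (hypothetical value)
    "incubation temp": 37.0,  # degrees Celsius (hypothetical value)
    "incubation pH": 8.0,     # hypothetical value
}

# Hypothetical schema: expected fields and types per stage label.
REQUIRED_FIELDS = {
    "single digest": {
        "enzyme map": str,
        "incubation time": (int, float),
        "incubation temp": (int, float),
        "incubation pH": (int, float),
    },
}


def check_stage_record(record: dict) -> list:
    """Return names of fields that are missing or have an unexpected type."""
    schema = REQUIRED_FIELDS.get(record.get("stage"))
    if schema is None:
        return ["stage"]
    return [name for name, expected in schema.items()
            if not isinstance(record.get(name), expected)]
```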
  • FIG. 1C and FIG. 4 show example hierarchical organizations of various analytical stages, along with relevant parameters for various stages that may be represented by analytical stage records. In certain embodiments, various other analytical stages may be utilized and represented by analytical stage records.
  • An analytical method record may include a sequence of analytical stage records representing the sequence of analytical stages used in the analytical method that the analytical method record represents.
  • the method store may store information about analytical methods used to characterize known biologics by virtue of a plurality of analytical method records, each of which represents a particular analytical method comprising a sequence of analytical stages, or via a plurality of individual analytical stage records, each of which represents an analytical stage used in an analytical method applied in characterizing a known biologic.
  • Approaches for linking GBAs with analytical method records, described below, may be applied equivalently to analytical stage records.
  • machine learning techniques described for identifying groups of related analytical methods represented by analytical method records may be applied similarly, to identify groups of related analytical stages represented by analytical stage records.
  • the organization 400 includes a sample preparation stage 410, a digestion strategy stage 420, a separation stage 430, a detection stage 440, and a mass spectrometry stage 450. Each stage can be associated with an analytical stage record which includes detailed process step information and parameters.
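  • each analytical method record is associated (e.g., linked, as in a stored association in computer code/memory) with one or more (e.g., a plurality of) GBAs of the biologic with which it is associated.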
  • analytical stage records are stored in the method store, and each analytical stage record is associated (e.g., linked, as in a stored association in computer code/memory) with one or more (e.g., a plurality of) GBAs of the biologic with which it is associated.
  • An example process for building a method store, including the determination of GBAs, is shown in FIG. 1A and FIG. 1B.
  • a preprocessing step (“Biologic molecule pre-processing") is used to determine, for a given known biologic, a set of GBAs.
  • the set of GBAs may be determined from an amino acid sequence of the known biologic. Additionally, because the known biologic has been characterized, additional information such as experimentally characterized glycans, post-translational modifications, disulfide bonds, and the like, may also be used to determine GBAs of the known biologic.
  • the one or more GBAs associated with the analytical method record are determined and stored when an analytical method record is created. Once the GBAs associated with a particular analytical method record are determined, they may be stored within the analytical method record, or elsewhere, and linked with the analytical method record.
  • FIG. 5 shows an example organization of analytical method records and their links to GBAs in the method store. GBA sets 502, 504, 506, and 508 are linked with analytical methods 512, 514, 516, and 518, respectively. As will be described below, each analytical method record is not necessarily linked to a unique set of GBAs.
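The linkage between GBA sets and analytical method records can be sketched as a small in-memory store in which GBAs are kept apart from the records and joined by an explicit link table. The class `MethodStore`, its attribute names, and the identifier scheme are assumptions for illustration; the source describes the method store only as a database. Reusing an existing GBA set when an identical one is already stored is one way to model the point that a record is not necessarily linked to a unique set of GBAs.

```python
from dataclasses import dataclass, field


@dataclass
class MethodStore:
    """Hypothetical in-memory method store."""
    gba_sets: dict = field(default_factory=dict)        # gba_set_id -> GBAs
    method_records: dict = field(default_factory=dict)  # method_id -> stage records
    links: dict = field(default_factory=dict)           # method_id -> gba_set_id

    def add_method(self, method_id: str, stages: list, gbas: dict) -> str:
        """Store a method record and link it to a (possibly shared) GBA set."""
        for gba_set_id, existing in self.gba_sets.items():
            if existing == gbas:
                break  # identical GBA set already stored: reuse it
        else:
            gba_set_id = f"gba-{len(self.gba_sets) + 1}"
            self.gba_sets[gba_set_id] = dict(gbas)
        self.method_records[method_id] = list(stages)
        self.links[method_id] = gba_set_id
        return gba_set_id

    def methods_for(self, gba_set_id: str) -> list:
        """Identifiers of all method records linked to the given GBA set."""
        return sorted(m for m, g in self.links.items() if g == gba_set_id)
```

The same link table would serve for analytical stage records by keying on stage record identifiers instead of method record identifiers.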
  • a given analytical method record in the method store is linked to one or more GBAs of an associated biologic, but other identifying information of the associated biologic (e.g., a nominal primary structure of the associated biologic; e.g., any measured data from a study performed on the associated biologic) is not stored.
  • an analytical method record stores, or is linked to information about the associated biologic (e.g., a nominal or measured primary structure of the associated biologic; e.g., a molecule type of the associated biologic).
  • one or more GBAs for the analytical method record can be determined as needed. Accordingly, in certain embodiments, if new or different sets of GBAs are relevant for a given application (e.g., addressing a given user input query), they can be determined using the stored or linked information about the associated biologic.
  • the same approaches described above for linking GBAs with analytical method records can be applied similarly to link GBAs with analytical stage records.
  • the one or more GBAs associated with the analytical stage record are determined and stored when an analytical method record is created. Once the GBAs associated with a particular analytical stage record are determined, they may be stored within the analytical stage record, or elsewhere, and linked with the analytical stage record. As will be described below, each analytical stage record is not necessarily linked to a unique set of GBAs.
  • a given analytical stage record in the method store is linked to one or more GBAs of an associated biologic, but other identifying information of the associated biologic (e.g., a nominal primary structure of the associated biologic; e.g., any measured data from a study performed on the associated biologic) is not stored.
  • an analytical stage record stores, or is linked to information about the associated biologic (e.g., a nominal or measured primary structure of the associated biologic; e.g., a molecule type of the associated biologic).
  • one or more GBAs for the analytical stage record can be determined as needed. Accordingly, in certain embodiments, if new or different sets of GBAs are relevant for a given application (e.g., addressing a given user input query), they can be determined using the stored or linked information about the associated biologic.
  • an analytical method record stores, or is linked with GBAs corresponding to study class attributes that identify the particular type of structural characterization study and/or study objective in which the analytical method that the analytical method record represents was applied.
  • an analytical stage record stores, or is linked with GBAs corresponding to study class attributes that identify the particular type of structural characterization study and/or study objective in which the analytical stage that the analytical stage record represents was applied.
  • the study class attributes are textual labels.
  • the study class attributes are selected from a set of predefined study class attributes (e.g., stored in a dictionary).
  • an analytical stage record such as the analytical stage record shown in FIG. 3 may also include or be linked with the name/value pair:
  • the name/value pair accordingly indicates that the particular analytical stage represented by the analytical stage record represents an analytical stage (e.g., a trypsin digestion step) that was used in a study characterizing di-sulfide linkages in an associated biologic.
  • An analytical method record representing an analytical method used in a study characterizing di-sulfide linkages in an associated known biologic might store, or be linked with a similar name/value pair.
  • analytical method records and/or analytical stage records are linked with, but need not store study class attributes.
  • links between (i) analytical method records and/or analytical stage records and (ii) various study class attributes may be established via a series of lists (e.g., represented as arrays), each corresponding to a particular study class attribute.
  • a list corresponding to a particular study class attribute may include identifiers (e.g., textual labels; e.g., pointers) of (i) analytical method records that represent analytical methods used in the particular type of structural characterization study that the corresponding particular study class attribute identifies and/or (ii) analytical stage records that represent analytical stages used in the particular type of structural characterization study that the corresponding particular study class attribute identifies.
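The list-based linking described above, in which each study class attribute has a list of record identifiers rather than each record storing the attribute, amounts to an inverted index. The sketch below assumes hypothetical names (`AttributeIndex`, `link`, `records_for`) and hypothetical record identifiers and labels.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical index: one list of record identifiers per attribute."""

    def __init__(self):
        self._lists = defaultdict(list)

    def link(self, attribute: str, record_id: str) -> None:
        """Add a record identifier to the list for one attribute (no duplicates)."""
        if record_id not in self._lists[attribute]:
            self._lists[attribute].append(record_id)

    def records_for(self, *attributes: str) -> list:
        """Record identifiers linked to every one of the given attributes."""
        if not attributes:
            return []
        first, *rest = attributes
        others = [set(self._lists.get(a, [])) for a in rest]
        return [r for r in self._lists.get(first, [])
                if all(r in other for other in others)]
```

The same structure could hold the per-cell-culture-type lists, with a cell culture label such as "CHO" in place of the study class attribute.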
  • an analytical method record stores or is linked to GBAs representing additional information about the bioprocess used to produce the associated biologic that the analytical method represented by the analytical method record was used to characterize.
  • an analytical stage record stores or is linked to GBAs representing additional information about the bioprocess used to produce the associated biologic that the analytical stage represented by the analytical stage record was used to characterize.
  • bioprocess information about an associated biologic may include an identifier of a cell culture type that was used to produce the associated biologic.
  • a cell culture type may be identified via a textual label, and stored as a name/value pair in the analytical method record.
  • analytical method records and/or analytical stage records are linked with, but need not store particular cell culture types. For example, links between (i) analytical method records and/or analytical stage records and (ii) various cell culture types may be established via a series of lists (e.g., represented as arrays), each corresponding to a particular cell culture type.
  • a list corresponding to a particular cell culture type may include identifiers (e.g., textual labels; e.g., pointers) of (i) analytical method records that represent analytical methods applied to biologics produced using the particular cell culture type and/or (ii) analytical stage records that represent analytical stages applied to biologics produced using the particular cell culture type.
  • additional bioprocess information may be stored in or linked with analytical method records and/or analytical stage records in a similar fashion.
  • additional bioprocess information may include information about primary recovery, initial purification, polishing, and/or formulation stages of the biomanufacturing process used in the manufacture of the biologic molecule.
  • an analytical method record may also include (e.g., be linked with or comprise) one or more performance indices that quantify performance of the analytical method that the analytical method record represents.
  • performance indices associated with particular analytical method records may include values representing a percent coverage of the biologic molecule's primary amino acid sequence, a sensitivity of detection of specific modifications (e.g., threshold quantification of modified peptides as a percentage of the corresponding unmodified peptides), a sensitivity of detection of sequence variants (e.g., as a percentage of the unmodified sequence), and a sensitivity of detection of impurities such as host cell proteins (e.g., in parts per million or ppm).
  • analytical stage records may be linked with or comprise performance indices.
  • Performance indices linked with or comprised in a given analytical stage record may be performance indices of an analytical method comprising the analytical stage that the given analytical stage record represents, or the performance indices may be representative of the analytical stage record itself.
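Of the performance indices listed above, percent coverage of the primary amino acid sequence is the most direct to compute. The sketch below assumes that coverage is defined as the percentage of residues that fall inside at least one identified peptide, located by exact string match; the source does not give a formula, and the function name is hypothetical.

```python
def percent_coverage(sequence: str, peptides: list) -> float:
    """Percentage of residues covered by at least one identified peptide."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    covered = [False] * len(sequence)
    for peptide in peptides:
        if not peptide:
            continue
        start = sequence.find(peptide)
        while start != -1:  # mark every occurrence, including overlaps
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = sequence.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(sequence)
```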
  • the analytical method records and/or analytical stage records in the method store may be obtained from a variety of sources.
  • analytical method records and/or analytical stage records may be created from publicly available records of studies (e.g., published literature using article databases such as PubMed
  • a given study of a particular biologic may include one or more analytical methods, each comprising multiple analytical stages. Accordingly, a given study can be used to generate one or more analytical method records, each corresponding to a respective analytical method of the given study, as well as multiple analytical stage records, each representing a respective analytical stage of an analytical method used in the given study.
  • a first study 610 employs a single analytical method 612 ("Anal. Meth. A1"), which comprises N analytical stages, including a sample preparation stage ("Anal. Stage A1.1"), an enzymatic digestion stage ("Anal. Stage A1.2"), and a mass spectrometry stage ("Anal. Stage A1.N").
  • the various analytical stages included in the first study are used to create corresponding analytical stage records that are stored in the method store.
  • Each of the corresponding analytical stage records stores, as described above, the particular type of analytical stage that it represents along with the parameters used in application of the stage in the first study ("Study A").
  • the analytical stage records created from the first study also store or are linked with additional information, such as one or more study class parameters that identify the particular structural characterization carried out in the first study, a bioprocess used to produce a biologic characterized in the study, and/or performance indices determined directly within the study, or obtained from data produced by the study.
  • an analytical method record representing the analytical method used in the first study is created, and comprises a sequence of analytical stage records representing the stages of the analytical method.
  • the analytical method record created from the first study may store or be linked with one or more study class attributes, bioprocess information, and performance indices determined within the study.
  • Data stored in analytical stage records and/or analytical method records can be obtained from a source that describes a study in a variety of ways.
  • the source is a published document
  • an analytical stage record and/or analytical method record is created from the source manually, by an expert or a technician who reads and interprets the study, and inputs data stored in the analytical stage record and/or analytical method records manually.
  • analytical stage records and/or analytical method records may be created from published documents via automated processing using text mining and/or natural language processing.
  • a hybrid combination of interaction with a user and automated processing is used to create analytical stage records and/or analytical method records from published documents.
  • analytical stage records and/or analytical method records generated from in-house studies are created in an automated fashion via dedicated software as part of a laboratory information management system.
  • data e.g., type of analytical stage; e.g., parameters; e.g., bioprocess information; e.g., performance indices
  • data stored in analytical stage records and/or analytical method records is computed from data in a study source.
  • GBAs are determined for one or more known biologics characterized in a study, and linked with the analytical stage records and/or analytical method records created from the study.
  • GBAs may be determined based on information about the known biologic and the particular study. For example, as shown in FIG. 6, GBA sets 632, 634, and 636 are linked to analytical methods 612, 622, and 624, respectively.
  • a nominal primary structure of the known biologic is used to determine a variety of GBAs in a manner similar to that described above with regard to determination of GBAs from nominal primary structure of a target biologic.
  • known information about the known biologic is used to determine GBAs.
  • known information may include results of the study itself, other information included in the study source, information from other studies performed on the same known biologic, and generally available information, such as information stored in public databases.
  • results of the study itself may be used to determine GBAs of the known biologic.
  • a particular structural characterization study that identifies glycosylation sites of a particular known biologic can be used to directly provide GBAs corresponding to glycosylation sites.
  • Such GBAs may be used instead of, or in addition to GBAs corresponding to predicted glycosylation sites.
  • a particular study employs multiple analytical methods to characterize a given known biologic, each analytical method comprising a sequence of analytical stages.
  • a second study 620 (“Study B") includes two analytical methods 622 and 624.
  • An analytical stage record representing each analytical stage used in each analytical method of the study (“Study B") may be created and stored in the method store.
  • an analytical method record can be created for each analytical method of the study.
  • Each analytical stage record and analytical method record used in the characterization of the known biologic of the study can be linked with GBAs determined for the known biologic that they were used to characterize, as described above.
  • a particular study source comprises multiple studies, performed on multiple known biologics. Accordingly, analytical stage records and/or analytical method records can be created for each analytical stage and/or method of each study, and linked with GBAs of the known biologic that they were used to characterize.
  • the systems and methods described herein utilize (e.g., via a machine learning approach) the method store to determine one or more study design results in response to a user query.
  • the approaches described herein use the GBAs associated with analytical method records and/or analytical stage records of the method store and determined GBAs of a target biologic to identify relevant analytical method records and/or analytical stage records.
  • Relevant analytical method records and/or analytical stage records can be used to determine analytical stage results that represent specific analytical stages to be applied to the target biologic in its characterization. Sequences of determined analytical stage results may be combined to form analytical method results, representing specific analytical methods to be applied.
  • Determined study design results may include a single analytical method result, or a combination of two or more analytical method results.
  • FIG. 7 is a block flow diagram illustrating an example process 700 for determining study design results.
  • a user input is received 710.
  • the user input comprises information about the target biologic along with one or more study class parameters that identify particular desired structural characterizations of the target biologic.
  • target biologic GBAs are determined 720 using the information about the target biologic input by the user.
  • the information about the target biologic may comprise a nominal primary structure from which target biologic GBAs are determined.
  • the method store is accessed, and the target biologic GBAs are compared with known biologic GBAs linked to analytical method records and/or analytical stage records of the method store in order to determine study design results 730.
  • FIG. 2A illustrates an example interaction 200 between various components (e.g., modules, databases, and data elements) in determining study design results.
  • a received user input 202 includes GBAs corresponding to study class attributes ("Study Class"), information about the target biologic, and information about bioprocess parameters.
  • Information about the target biologic input by the user includes a nominal amino acid sequence of the target biologic ("Nominal AA Sequence").
  • the biologic preprocessing module 110 determines, using the user input nominal amino acid sequence, a plurality of GBAs corresponding to structural features of the target biologic. Additional user inputs of study class attributes, an identification of the biologic molecule type, and bioprocess parameters are directly used as GBAs of the target biologic.
  • a machine learning module 204 takes the GBAs of the target biologic (e.g., the GBAs determined from the nominal amino acid sequence by the biologic molecule preprocessing module, the study class attributes, the user input molecule type identifier, and the bioprocess parameters) as input.
  • the machine learning module 204 compares the GBAs of the target biologic with GBAs of known biologics for which analytical method records and/or analytical stage records are stored in the method store 120, and, based on this comparison, identifies relevant analytical method records and/or analytical stage records and uses them to create a study design result.
  • the machine learning module 204 identifies patterns within the GBAs linked to the analytical method records and/or analytical stage records of the method store, and uses the identified patterns to identify relevant analytical method records and/or analytical stage records based on GBAs of the target biologic.
  • the identified relevant analytical method records and/or analytical stage records can be used to determine analytical stage results of one or more analytical method results, thereby determining a study design result 206.
  • the known biologics 224 act as training examples used to construct the method store 226.
  • the patterns of GBA and analytical method record relationships within the method store 226 act as a "hypothesis set" provided to a learning algorithm 228 when determining the analytical methods 230 for a target biologic (e.g., the "final hypothesis", which is an approximation to the unknown "target function" 222).
  • comparing target biologic GBAs with known biologic GBAs to determine study design results comprises performing classification and/or cluster analysis. For example, as shown in FIG. 8A and FIG. 8B, cluster analysis is used to categorize analytical method records in the method store as belonging to one or more particular analytical method groups based on the set of known biologic GBAs that they are linked to. In particular, in certain embodiments, one or more GBA vectors are determined for each set of known biologic GBAs.
  • a GBA vector is determined as a weighted sum of a subset of a given set of GBAs (e.g., a set of target biologic GBAs; e.g., a set of known biologic GBAs). Different GBA vectors are determined using different weightings and/or different subsets of GBAs. For a given set of GBAs, determined values of each of the GBA vectors identify a point in a multi-dimensional space (e.g., the number of dimensions corresponding to the number of GBA vectors). Accordingly, each set of known biologic GBAs maps to a point in a multi-dimensional space.
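  • The weighted-sum construction of GBA vectors can be sketched as follows. This is a hedged illustration: it assumes GBAs have already been encoded as numbers, and the GBA names, weights, and function names are hypothetical rather than taken from this disclosure.

```python
from typing import Dict, List, Tuple

GBAs = Dict[str, float]
Weights = Dict[str, float]


def gba_vector_value(gbas: GBAs, weights: Weights) -> float:
    """Weighted sum over the subset of GBAs named in `weights`."""
    return sum(weight * gbas[name] for name, weight in weights.items())


def map_to_point(gbas: GBAs, vector_defs: List[Weights]) -> Tuple[float, ...]:
    """Evaluate every GBA vector; N vectors give a point in N dimensions."""
    return tuple(gba_vector_value(gbas, weights) for weights in vector_defs)


# Hypothetical numeric GBAs and weightings (not taken from the source).
known_biologic = {
    "n_glyco_sites": 4.0,
    "hydrophobic_fraction": 0.5,
    "mol_weight_kda": 50.0,
}
vector_defs = [
    {"n_glyco_sites": 1.0, "hydrophobic_fraction": 2.0},  # first GBA vector
    {"mol_weight_kda": 0.5},                               # second GBA vector
]
point = map_to_point(known_biologic, vector_defs)
print(point)
```

  • Each entry of `vector_defs` uses a different subset and weighting, so two definitions place the set of GBAs at a point in a two-dimensional space.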
  • FIG. 8A shows a graph illustrating use of cluster analysis to group related analytical method records based on the GBAs with which they are linked.
  • an analytical method record represents a specific analytical method and is linked with GBAs of an associated known biologic that was characterized using the specific analytical method.
  • values of two or more GBA vectors, GBA_i and GBA_j, are determined.
  • the determined values of the two or more GBA vectors map each set of known biologic GBAs to a point in the illustrated two-dimensional space, as identified by the green "x's".
  • FIG. 8A shows three identified analytical method clusters: AM_U, AM_V, and AM_W. As shown in the figure, each cluster corresponds to a region of the two-dimensional space represented via the two GBA vectors.
  • values of the two or more GBA vectors are determined for the set of GBAs of the target biologic.
  • the values of the two or more GBA vectors for the target biologic thus map the GBAs of the target biologic to a point in two or higher dimensional space.
  • the target biologic can be identified as belonging to a particular analytical method cluster based on whether its GBAs map to a point within the region in space to which the analytical method cluster corresponds. Two such examples are shown in FIG. 8A, wherein the GBAs of two different target biologics map to two different points in two-dimensional space, as indicated by the red "+'s".
  • a first target biologic is associated with analytical method cluster AM_V.
  • a second target biologic is associated with analytical method cluster AM_U.
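  • The region test described above can be sketched as follows. The disclosure does not state how cluster regions are represented; this sketch assumes each region is a ball given by a center and a radius, and all coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, ...]
# Assumption: each cluster region is a ball given by (center, radius).
Clusters = Dict[str, Tuple[Point, float]]


def assign_cluster(point: Point, clusters: Clusters) -> Optional[str]:
    """Return the label of the closest cluster whose region contains `point`."""
    best_label, best_distance = None, math.inf
    for label, (center, radius) in clusters.items():
        distance = math.dist(point, center)
        if distance <= radius and distance < best_distance:
            best_label, best_distance = label, distance
    return best_label  # None when the point lies outside every region


# Hypothetical regions and target points.
clusters = {
    "AM_U": ((1.0, 1.0), 1.0),
    "AM_V": ((5.0, 5.0), 1.5),
    "AM_W": ((9.0, 2.0), 1.0),
}
first_target = assign_cluster((5.5, 4.5), clusters)
second_target = assign_cluster((1.2, 0.8), clusters)
unassigned = assign_cluster((3.0, 8.0), clusters)
print(first_target, second_target, unassigned)
```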
  • multiple different sets of GBA vectors can be used, either in combination (e.g., such that N GBA vectors define an N-dimensional space) or in multiple rounds (e.g., using a first set of N GBA vectors in a first round and a second set of M GBA vectors in a second round).
  • Different sets of GBA vectors can be used to define different multi-dimensional spaces and to map various known and target biologics to different points in these spaces based on the specific combinations and weightings of various GBAs from which the different GBA vectors are computed.
  • use of multiple different sets (e.g., in multiple rounds) of GBA vectors is valuable if an analytical method cluster cannot be identified for a given target biologic using a first set of GBA vectors.
  • a particular set of GBA vectors maps GBAs of a target biologic to a point that does not fall within any of the regions corresponding to identified analytical method clusters (e.g., as illustrated via the red "x" in FIG. 8B). Accordingly, another set of GBA vectors may be used to associate the target biologic with a particular analytical method cluster.
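  • The multi-round fallback can be sketched as a loop that tries one set of GBA vectors after another until the target lands inside a cluster region. As before, ball-shaped regions, the GBA names, and all numeric values are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Weights = Dict[str, float]
Region = Tuple[Tuple[float, ...], float]  # assumed (center, radius) region
Round = Tuple[List[Weights], Dict[str, Region]]


def project(gbas: Dict[str, float], vector_defs: List[Weights]) -> Tuple[float, ...]:
    """Map a set of GBAs to a point, one weighted sum per GBA vector."""
    return tuple(sum(w * gbas[k] for k, w in weights.items()) for weights in vector_defs)


def classify_in_rounds(
    gbas: Dict[str, float], rounds: List[Round]
) -> Tuple[Optional[str], Optional[int]]:
    """Try each (vector set, clusters) pair in order; stop at the first hit."""
    for round_index, (vector_defs, clusters) in enumerate(rounds):
        point = project(gbas, vector_defs)
        for label, (center, radius) in clusters.items():
            if math.dist(point, center) <= radius:
                return label, round_index
    return None, None


# Hypothetical target GBAs and two rounds with different GBA vector sets.
target = {"a": 2.0, "b": 3.0}
rounds = [
    ([{"a": 1.0}, {"b": 1.0}], {"AM_U": ((10.0, 10.0), 1.0)}),          # no match
    ([{"a": 1.0, "b": 1.0}, {"b": 2.0}], {"AM_V": ((5.0, 6.0), 0.5)}),  # match
]
result = classify_in_rounds(target, rounds)
print(result)
```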
  • a non-linear transform is used in combination with computation of GBA vectors to separate groups of analytical methods of known biologics into different clusters.
  • FIG. 9 shows an example of an approach wherein a non-linear transform is applied to values of GBA vectors determined for a series of known biologics to determine transformed biologic attribute (TBA) vectors.
  • applying a non-linear transform in this manner allows attributes that were previously non-separable (e.g., as shown in the left-hand graph) to be separated via a linear function (e.g., a line in two-dimensional space, as shown in the right-hand figure).
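  • A toy illustration of this idea follows. The disclosure does not specify the transform; squaring each coordinate is an assumed choice, and the points are made up. One group sits inside the other, so no straight line separates them in the original space, while the line u + v = 2.25 separates them after the transform.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_tba(point: Point) -> Point:
    """Assumed non-linear transform: square each GBA vector value."""
    x, y = point
    return (x * x, y * y)


def above_line(tba: Point, threshold: float = 2.25) -> bool:
    """Linear decision rule u + v > threshold in the transformed space."""
    u, v = tba
    return u + v > threshold


# Hypothetical GBA vector values: an inner group surrounded by an outer group.
inner_group: List[Point] = [(1.0, 0.0), (0.0, -1.0), (-0.6, 0.6), (0.5, 0.5)]
outer_group: List[Point] = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5), (1.6, -1.4)]

inner_sides = [above_line(to_tba(p)) for p in inner_group]
outer_sides = [above_line(to_tba(p)) for p in outer_group]
print(inner_sides, outer_sides)
```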
  • the same cluster analysis approaches described herein can be applied to identify analytical stages relevant to a given target biologic.
  • GBA vector values determined for known biologics can be used to identify groups of related analytical stages (e.g., as analytical stage clusters) and to associate target biologics with particular analytical stage clusters based on a mapping of target biologic GBAs to points in multi-dimensional spaces defined by various sets of GBA vectors.
  • Multiple sets of GBA vectors, and non-linear transformation approaches may also be utilized for identification of relevant analytical stages.
  • various other machine learning approaches may be utilized, in combination with or in place of the cluster analysis approach described with respect to FIG. 8A, FIG. 8B, and FIG. 9.
  • unsupervised machine learning techniques such as k-means clustering and self-organizing maps may be used.
  • Unsupervised machine learning techniques are useful where training data, such as the analytical method and/or analytical stage records of known biologics, does not include output information that characterizes the performance or suitability of a particular analytical method or stage (represented by a record in the method store) for a given known biologic. In this manner, unsupervised learning is viewed as the task of finding patterns and structure in input data: a way of creating a higher-level representation of the data.
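  • As one concrete instance of an unsupervised technique, a bare-bones k-means (Lloyd's algorithm) over GBA vector points might look like the sketch below. The points and starting centroids are hypothetical, and a production system would more likely rely on an established library implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: List[Point], centroids: List[Point], iterations: int = 20):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical points, one per set of known biologic GBAs.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```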
  • performance indices included within and/or linked to analytical method records and/or analytical stage records are used as a measure of performance of the particular analytical method or stage that a given analytical method record or analytical stage record, respectively, represents. This allows reinforcement learning machine learning techniques to be used.
  • analytical method records and analytical stage records can be generated from corresponding analytical method results and analytical stage results, respectively.
  • Analytical method records and analytical stage records generated in this fashion can be linked with the GBAs of the target biologic that were received as input in order to determine the corresponding analytical method and analytical stage results.
  • such analytical method and analytical stage records represent examples of desired, or correct, outputs for known inputs - the target biologic GBAs that were used to determine them.
  • analytical method records and analytical stage records can be used as training data for supervised machine learning techniques, such as artificial neural networks, decision trees, regression models, and k-nearest neighbor techniques.
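  • Of the supervised techniques listed, k-nearest neighbors is the simplest to sketch. Here each training example pairs a point (GBA vector values) with the label of the analytical method that was used; the data, labels, and choice of k = 3 are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, str]  # (GBA vector values, analytical method label)


def knn_predict(target: Point, training: List[Example], k: int = 3) -> str:
    """Majority vote among the k training examples closest to `target`."""
    neighbors = sorted(training, key=lambda ex: math.dist(target, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled records derived from earlier study design results.
training = [
    ((1.0, 1.0), "AM_U"), ((1.5, 1.0), "AM_U"), ((1.0, 2.0), "AM_U"),
    ((6.0, 6.0), "AM_V"), ((7.0, 6.0), "AM_V"), ((6.0, 7.0), "AM_V"),
]
prediction_a = knn_predict((6.2, 6.1), training)
prediction_b = knn_predict((1.2, 1.1), training)
print(prediction_a, prediction_b)
```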
  • multiple machine learning techniques are used in
  • an unsupervised machine learning technique e.g., cluster analysis
  • a supervised machine learning technique e.g., cluster analysis
  • study design results may be provided to a user in a variety of forms.
  • a study design for complex biologic molecules is typically arrived at iteratively rather than in one simple recipe.
  • the first output e.g., a first study design
  • the iterative adjustment of study design results by a human expert can provide a basis for method research and improvement. For example, once a given study design result is determined, and then adjusted by the human expert, analytical method results and/or analytical stage results of the adjusted study design result may be extracted, and stored as analytical method records and/or analytical stage records in the method store.
  • the technology "learns" a new method to be applied to a new molecule, and settings for the prior methods may then be re-adjusted accordingly.
  • analytical method records and/or analytical stage records generated from previously determined study design results can be used as training data for supervised machine learning techniques.
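  • The capture loop described above, in which an expert-adjusted study design is written back to the method store, can be sketched as follows. The record layout (a pair of GBAs and stage names) and every value shown are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed record layout: (GBAs of the biologic, ordered analytical stage names).
Record = Tuple[Dict[str, float], List[str]]


def capture_adjusted_design(
    store: List[Record], target_gbas: Dict[str, float], adjusted_stages: List[str]
) -> None:
    """Store an expert-adjusted design as a new record linked to the target GBAs."""
    store.append((dict(target_gbas), list(adjusted_stages)))


# Hypothetical method store holding one record for a known biologic.
method_store: List[Record] = [
    ({"n_glyco_sites": 2.0}, ["denaturation", "PNGase F digestion", "HILIC"]),
]

proposed = ["denaturation", "PNGase F digestion", "HILIC"]
adjusted = proposed + ["mass spectrometry"]  # hypothetical expert adjustment
capture_adjusted_design(method_store, {"n_glyco_sites": 4.0}, adjusted)
print(len(method_store))
```

  • Because the new record pairs known inputs (the target biologic GBAs) with an expert-approved output, it can later serve as a labeled training example.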
  • Example 1 is an example of use of the study design and method capture technology described herein for determining a study design result for performing an N-linked glycan characterization and quantitation study on a Fc-fusion protein.
  • a user inputs (i) a molecule type, (ii) bioprocess information, and (iii) an amino acid sequence (nominal primary structure) of the target biologic.
  • the molecule type is specified as a Fc-fusion protein.
  • One GBA implication of this molecule type is that the glycan profile is likely to be more complex and heterogeneous than for a monoclonal antibody (which tends to have one N-glycosylation site in the CH2 domain of each heavy chain, at Asn297). Consequently, the analysis method will be experimentally iterative as one works to optimize the method.
  • Another implication of this molecule type is for the present technology to raise a warning flag for aggregation potential.
  • the user input specifies that the target biologic is produced via a Chinese Hamster Ovary (CHO) cell culture and purified via protein A purification followed by hydrophobic interaction chromatography.
  • An implication of this method of production is that data analysis and interpretation would need to use a CHO N-glycan database.
  • an additional human N-glycan database would be needed to narrow down the search space.
  • the protein A purification and hydrophobic interaction chromatography bioprocess steps have no direct implication on the analysis method per se, but may be used themselves as GBAs that indicate aggregation potential, or may be used to determine a specific GBA corresponding to aggregation potential (e.g., as described below).
  • the desired structural characterization is a characterization and quantitation of N-linked glycans for potential QC-lot release.
  • study class attributes input by the user include attributes specifying an N-linked glycan study and that both characterization and quantification are desired. These may be input separately, e.g., as "N-linked glycan characterization", "N-linked glycan quantification", or in a hierarchical fashion, e.g., by first selecting N-linked glycan study (e.g., from a graphical control element of a GUI), followed by selection of characterization and quantification options (e.g., also via graphical control elements of a GUI). An additional attribute specifies that a lot-release study is desired (e.g., a user may input "lot-release").
  • the study class attributes have implications for the types of analytical methods that will be included in the determined study design.
  • the characterization attribute implies a need to identify glycan composition and linkage isoforms.
  • the quantification and lot-release attributes imply use of robust HPLC-based separation, rather than capillary electrophoresis.
  • the lot-release attribute also has an impact on the use of a fluorescent label, as will be described below.
  • Target biologic GBAs: preprocessing of the amino acid sequence input by the user is used to obtain target biologic GBAs that include the following:
  • N-linked glycosylation sites are predicted, three of which are located on the fusion portion of the target biologic and one of which is on the Fc end of the target biologic, and
  • a GBA corresponding to a percentage - approximately 50% - of hydrophobic amino acids on the fusion portion of the target biologic.
  • the GBA corresponding to the locations of the N-linked glycosylation sites will help guide data interpretation.
  • the one site that is on the Fc end will exhibit a similar profile to known monoclonal antibodies, whereas the three sites that are on the fusion portion will look markedly different.
  • the approximately 50% fraction of hydrophobic amino acids on the fusion portion GBA is relevant with regard to the inclusion of a protein denaturation step, as described below.
  • GBA corresponds to the molecule type input by the user.
  • An implication based on the molecule type is that Fc-fusion proteins tend to form aggregates during purification steps and during storage. Aggregation propensity is not a universal rule, but it is a good heuristic for Fc-fusion proteins.
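  • The sequence preprocessing that yields these GBAs can be sketched as follows. The disclosure does not say how sites are predicted; this sketch assumes the standard N-glycosylation sequon (Asn-X-Ser/Thr, where X is not Pro) and one common choice of hydrophobic residues. The sequences and function names are hypothetical.

```python
import re
from typing import List

# N-linked glycosylation sequon: Asn-X-Ser/Thr, where X is any residue except Pro.
# The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Assumption: one common choice of hydrophobic residues; other scales exist.
HYDROPHOBIC = set("AVILMFWY")


def predict_n_glyco_sites(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues that start a sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues that belong to the assumed hydrophobic set."""
    return sum(residue in HYDROPHOBIC for residue in sequence) / len(sequence)


# Hypothetical toy sequences, not a real biologic.
sites = predict_n_glyco_sites("MNGTAAPNPSLLNVSK")
fraction = hydrophobic_fraction("AVILGGSS")
print(sites, fraction)
```

  • In the toy sequence, the NPS motif is skipped because proline occupies the middle position, which is why only two sites are reported.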
  • the following series of analytical stage results are determined as appropriate for obtaining the desired structural characterization of the target biologic.
  • sample preparation stages will be guided to comprise protein denaturation and reduction with a mild non-ionic detergent, such as PS-20 at a low 0.1% concentration, to solubilize the protein and open up its 3D structure for complete deglycosylation.
  • PS-20, being mild, will not harm the PNGase F enzyme (relevant for an enzymatic digestion step, listed below).
  • a stronger denaturing agent like SDS could potentially inactivate PNGase F and can also affect downstream mass spectrometry performance.
  • Use of a detergent solution as part of the sample preparation step is informed by the hydrophobic amino acid content and the aggregation propensity GBAs identified above.
  • 5 mM DTT should be used as the reducing agent; a stronger reducing agent like TCEP could potentially harm PNGase F.
  • the glycosylation analysis study class attribute will enroll a deglycosylating class of enzymes for digestion. Most common is peptide N-glycosidase F (PNGase F), which would be used as a starting point.
  • a fluorescence labeling detection approach will be used to detect and quantify glycans.
  • Options for fluorescent labeling agents include 2-AB, 2-AA, APTS, and RapiFluor.
  • 2-AB or 2-AA are compatible with HPLC-based separation and are well-established in quality control (QC) environments, as required due to the lot-release study class attribute.
  • APTS is used with capillary electrophoresis-based separation, and therefore would not be selected as an appropriate fluorescent label.
  • the RapiFluor label also would not be selected due to the lot-release study class attribute, as RapiFluor is a more recent label and therefore more appropriate for early-stage characterization than for lot release.
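  • The label selection reasoning above amounts to filtering candidate labels by separation mode and by suitability for lot release. The sketch below encodes that reasoning as a lookup table; the field names are hypothetical, `None` marks a property the text does not state, and the separation mode listed for RapiFluor is an assumption.

```python
from typing import Dict, List, Optional

# Properties paraphrased from the text; None marks what the text does not state.
LABELS: Dict[str, Dict[str, Optional[object]]] = {
    "2-AB": {"separation": "HPLC", "suited_to_lot_release": True},
    "2-AA": {"separation": "HPLC", "suited_to_lot_release": True},
    "APTS": {"separation": "CE", "suited_to_lot_release": None},
    # Assumption: RapiFluor is treated here as an LC-compatible label.
    "RapiFluor": {"separation": "HPLC", "suited_to_lot_release": False},
}


def select_labels(separation: str, lot_release: bool) -> List[str]:
    """Keep labels matching the separation mode and, if needed, lot release."""
    selected = []
    for name, props in LABELS.items():
        if props["separation"] != separation:
            continue
        if lot_release and props["suited_to_lot_release"] is not True:
            continue
        selected.append(name)
    return selected


lot_release_labels = select_labels("HPLC", lot_release=True)
early_stage_labels = select_labels("HPLC", lot_release=False)
print(lot_release_labels, early_stage_labels)
```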
  • HILIC Hydrophilic Interaction Chromatography
  • HILIC columns and bead size options are based on the instrumentation available (UPLC vs. HPLC, for example) and can be guided iteratively by the quality of the data. Parameters specifying properties of the determined mobile phases would include: Mobile Phase A: pure acetonitrile; and Mobile Phase B: 50 mM ammonium acetate, pH 4.
  • Mass spectrometry parameters determined based on the study class attributes and GBAs include the following:
  • Instrument: any high-resolution ion-trap instrument (such as an Orbitrap) to enable multi-stage MSn analysis,
  • Example 2 is an example of use of the study design and method capture technology described herein for determining a study design result for performing a disulfide linkage study on a Fc-fusion protein.
  • User Input and Target Biologic GBAs
  • a user inputs (i) a molecule type, (ii) bioprocess information, and (iii) an amino acid sequence (nominal primary structure) of the target biologic.
  • the molecule type is specified as a Fc-fusion protein.
  • as bioprocess information, the user input specifies that the target biologic is produced via an E. coli cell culture. The bacterial host cell raises a warning that the subsequent downstream processing could lead to disulfide bond scrambling.
  • the desired structural characterization is a characterization of disulfide linkages and free cysteines, if any are present in the target biologic. Accordingly, study class attributes identify a characterization study, disulfide linkages, and free cysteine characterization.
  • An impact of the study class attributes on sample prep is to avoid reduction/alkylation for the main study group, and use reduction/alkylation for a negative control group.
  • the study class attribute also dictates use of multiple enzymes for digestion.
  • Preprocessing of the amino acid sequence is used to obtain target biologic GBAs that include:
  • a GBA corresponding to a molecular weight of the target biologic with a value of 50 kDa.
  • the molecular weight GBA influences the particular mass spectrometry step that is determined. In particular, it indicates that top-down intact protein MS may be feasible (as opposed to the case for a full-sized monoclonal antibody, where it is considerably more difficult).
  • the number of cysteines GBA is relevant for the likelihood that unpaired cysteines are present in the target biologic. In particular, if the number of cysteines were odd, this would point to an unpaired Cys residue, which would need to be identified as part of the study design as it could be the site of unwanted modifications.
  • the even number in this case does not exclude the possibility of having unpaired cysteines, but it reduces it from a certainty to a possibility.
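  • The parity argument for the number-of-cysteines GBA can be sketched as follows. The function and key names are hypothetical; the logic only restates that an odd count guarantees at least one unpaired cysteine, while an even count leaves it as a possibility.

```python
from typing import Dict, Union


def cysteine_gbas(sequence: str) -> Dict[str, Union[int, bool]]:
    """Derive cysteine-related GBAs from a nominal amino acid sequence."""
    count = sequence.count("C")
    return {
        "num_cysteines": count,
        # An odd count means at least one cysteine cannot be paired.
        "unpaired_cys_certain": count % 2 == 1,
        # Upper bound only: an even count does not prove all are paired.
        "max_disulfide_bonds": count // 2,
    }


# Hypothetical toy sequences, not a real biologic.
odd_case = cysteine_gbas("ACDCKC")
even_case = cysteine_gbas("MCKLCA")
print(odd_case, even_case)
```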
  • a priori disulfide linkages may be known. This would usually be the case for a monoclonal antibody. In the case of a fusion protein, this information may be available from either prior measurements or from homology to a known molecule, in which case it can enable the following 2 GBAs, which will guide the data interpretation.
  • Protein denaturation using 6M GuHCl or 8M urea is identified as an analytical method step. Both are very strong denaturants at these high concentrations, and are identified in order to open up the protein completely to maximize enzymatic digestion. This approach contrasts with Example 1 above, where the glycans are relatively exposed and therefore only in need of mild denaturation. Also, the enzyme choices in the study design results for Example 2, as described below, are more tolerant of strong denaturants than those identified in Example 1.
  • a trypsin enzymatic digestion stage is included in the analytical method result of the determined study design result.
  • the trypsin digestion stage includes exposure to trypsin followed by Glu-C to help elucidate disulfide linkages in single (tryptic) fragments having multiple disulfides.
  • a pepsin enzymatic digestion stage is also included in the determined study design result as an orthogonal assay providing complementary information.
  • a low pH is specified to be used for minimal disulfide bond scrambling.
  • RP-HPLC using either C12 or C18 columns is determined. If the target biologic were identified as having a high hydrophobic content (e.g., via another GBA), a C12 column would be preferable to C18.
  • Mass spectrometry parameters determined based on the study class attributes and GBAs include the following:
  • Polarity: positive ions only,
  • Instrument: any high-resolution ion-trap instrument (such as an Orbitrap) to enable multi-stage MSn analysis,
  • Scan range: 300-2000, data-dependent acquisition mode for top 10 precursor ions
  • a second study design result for intact mass analysis is determined. No sample preparation stage is included in the second study design result.
  • the determined separation stage would identify a C4 column or low hydrophobicity polymer-based column (e.g., PS-DVB) as an intact protein is typically more hydrophobic than peptides.
  • Mass spectrometry parameters determined based on the study class attributes and GBAs include the following:
  • the cloud computing environment 1000 may include one or more resource providers 1002a, 1002b, 1002c (collectively, 1002). Each resource provider 1002 may include computing resources.
  • computing resources may include any hardware and/or software used to process data.
  • computing resources may include hardware and/or software capable of executing algorithms, computer programs, and/or computer applications.
  • exemplary computing resources may include application servers and/or databases with storage and retrieval capabilities.
  • Each resource provider 1002 may be connected to any other resource provider 1002 in the cloud computing environment 1000.
  • the resource providers 1002 may be connected over a computer network 1008.
  • Each resource provider 1002 may be connected to one or more computing devices 1004a, 1004b, 1004c (collectively, 1004), over the computer network 1008.
  • the cloud computing environment 1000 may include a resource manager 1006.
  • the resource manager 1006 may be connected to the resource providers 1002 and the computing devices 1004 over the computer network 1008.
  • the resource manager 1006 may facilitate the provision of computing resources by one or more resource providers 1002 to one or more computing devices 1004.
  • the resource manager 1006 may receive a request for a computing resource from a particular computing device 1004.
  • the resource manager 1006 may identify one or more resource providers 1002 capable of providing the computing resource requested by the computing device 1004.
  • the resource manager 1006 may select a resource provider 1002 to provide the computing resource.
  • the resource manager 1006 may facilitate a connection between the resource provider 1002 and a particular computing device 1004.
  • the resource manager 1006 may establish a connection between a particular resource provider 1002 and a particular computing device 1004. In some implementations, the resource manager 1006 may redirect a particular computing device 1004 to a particular resource provider 1002 with the requested computing resource.
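  • The request, identify, select, and connect sequence performed by the resource manager can be sketched as follows. The class, its methods, and the first-by-name selection policy are illustrative assumptions; the disclosure does not prescribe an implementation.

```python
from typing import Dict, List, Optional, Set, Tuple


class ResourceManager:
    """Toy stand-in for the resource manager 1006."""

    def __init__(self, providers: Dict[str, Set[str]]) -> None:
        # Maps a provider name to the computing resources it offers.
        self.providers = providers

    def identify(self, resource: str) -> List[str]:
        """Providers capable of supplying the requested resource."""
        return sorted(
            name for name, offered in self.providers.items() if resource in offered
        )

    def connect(self, device: str, resource: str) -> Optional[Tuple[str, str]]:
        """Select a capable provider and pair it with the requesting device."""
        candidates = self.identify(resource)
        if not candidates:
            return None
        return (device, candidates[0])  # assumed policy: first provider by name


# Hypothetical provider capabilities keyed by the reference numerals above.
manager = ResourceManager({
    "1002a": {"application server"},
    "1002b": {"database", "application server"},
    "1002c": {"database"},
})
connection = manager.connect("1004a", "database")
print(connection)
```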
  • FIG. 11 shows an example of a computing device 1100 and a mobile computing device 1150 that can be used to implement the techniques described in this disclosure.
  • the computing device 1100 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers.
  • the mobile computing device 1150 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices.
  • the components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to be limiting.
  • the computing device 1100 includes a processor 1102, a memory 1104, a storage device 1106, a high-speed interface 1108 connecting to the memory 1104 and multiple high-speed expansion ports 1110, and a low-speed interface 1112 connecting to a low-speed expansion port 1114 and the storage device 1106.
  • Each of the processor 1102, the memory 1104, the storage device 1106, the high-speed interface 1108, the high-speed expansion ports 1110, and the low-speed interface 1112 are interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 1102 can process instructions for execution within the computing device 1100, including instructions stored in the memory 1104 or on the storage device 1106 to display graphical information for a GUI on an external input/output device, such as a display 1116 coupled to the high-speed interface 1108.
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
  • multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
  • the memory 1104 stores information within the computing device 1100.
  • in some implementations, the memory 1104 is a volatile memory unit or units.
  • in some implementations, the memory 1104 is a non-volatile memory unit or units.
  • the memory 1104 may also be another form of computer-readable medium, such as a magnetic or optical disk.
  • the storage device 1106 is capable of providing mass storage for the computing device 1100.
  • the storage device 1106 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • Instructions can be stored in an information carrier.
  • the instructions, when executed by one or more processing devices (for example, processor 1102), perform one or more methods, such as those described above.
  • the instructions can also be stored by one or more storage devices such as computer- or machine-readable mediums (for example, the memory 1104, the storage device 1106, or memory on the processor 1102).
  • the high-speed interface 1108 manages bandwidth-intensive operations for the computing device 1100, while the low-speed interface 1112 manages lower bandwidth-intensive operations.
  • Such allocation of functions is an example only.
  • the high-speed interface 1108 is coupled to the memory 1104, the display 1116 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 1110, which may accept various expansion cards (not shown).
  • the low-speed interface 1112 is coupled to the storage device 1106 and the low-speed expansion port 1114.
  • the low-speed expansion port 1114, which may include various communication ports (e.g., USB, Bluetooth®, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
  • the computing device 1100 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 1120, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 1122. It may also be implemented as part of a rack server system 1124. Alternatively, components from the computing device 1100 may be combined with other components in a mobile device (not shown), such as a mobile computing device 1150. Each of such devices may contain one or more of the computing device 1100 and the mobile computing device 1150, and an entire system may be made up of multiple computing devices communicating with each other.
  • the mobile computing device 1150 includes a processor 1152, a memory 1164, an input/output device such as a display 1154, a communication interface 1166, and a transceiver 1168, among other components.
  • the mobile computing device 1150 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage.
  • Each of the processor 1152, the memory 1164, the display 1154, the communication interface 1166, and the transceiver 1168, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 1152 can execute instructions within the mobile computing device 1150, including instructions stored in the memory 1164.
  • the processor 1152 may be implemented as a chipset of chips that include separate and multiple analog and digital processors.
  • the processor 1152 may provide, for example, for coordination of the other components of the mobile computing device 1150, such as control of user interfaces, applications run by the mobile computing device 1150, and wireless communication by the mobile computing device 1150.
  • the processor 1152 may communicate with a user through a control interface 1158 and a display interface 1156 coupled to the display 1154.
  • the display 1154 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology.
  • the display interface 1156 may comprise appropriate circuitry for driving the display 1154 to present graphical and other information to a user.
  • the control interface 1158 may receive commands from a user and convert them for submission to the processor 1152.
  • an external interface 1162 may provide communication with the processor 1152, so as to enable near area communication of the mobile computing device 1150 with other devices.
  • the external interface 1162 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.
  • the memory 1164 stores information within the mobile computing device 1150.
  • the memory 1164 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units.
  • An expansion memory 1174 may also be provided and connected to the mobile computing device 1150 through an expansion interface 1172, which may include, for example, a SIMM (Single In Line Memory Module) card interface.
  • the expansion memory 1174 may provide extra storage space for the mobile computing device 1150, or may also store applications or other information for the mobile computing device 1150.
  • the expansion memory 1174 may include instructions to carry out or supplement the processes described above, and may include secure information also.
  • the expansion memory 1174 may be provided as a security module for the mobile computing device 1150, and may be programmed with instructions that permit secure use of the mobile computing device 1150.
  • secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
  • the memory may include, for example, flash memory and/or NVRAM memory (nonvolatile random access memory), as discussed below.
  • instructions are stored in an information carrier; the instructions, when executed by one or more processing devices (for example, processor 1152), perform one or more methods, such as those described above.
  • the instructions can also be stored by one or more storage devices, such as one or more computer- or machine-readable mediums (for example, the memory 1164, the expansion memory 1174, or memory on the processor 1152).
  • the instructions can be received in a propagated signal, for example, over the transceiver 1168 or the external interface 1162.
  • the mobile computing device 1150 may communicate wirelessly through the communication interface 1166, which may include digital signal processing circuitry where necessary.
  • the communication interface 1166 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile Communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), MMS (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), or PDC (Personal Digital Cellular).
  • a GPS (Global Positioning System) receiver module 1170 may provide additional navigation- and location-related wireless data to the mobile computing device 1150, which may be used as appropriate by applications running on the mobile computing device 1150.
  • the mobile computing device 1150 may also communicate audibly using an audio codec 1160, which may receive spoken information from a user and convert it to usable digital information.
  • the audio codec 1160 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 1150.
  • Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on the mobile computing device 1150.
  • the mobile computing device 1150 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 1180. It may also be implemented as part of a smart-phone 1182, personal digital assistant, or other similar mobile device. Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
  • implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • the modules (e.g., biologic preprocessing module, machine learning module) can be separated, combined or incorporated into single or combined modules.
  • the modules depicted in the figures are not intended to limit the systems described herein to the software architectures shown therein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Informatics (AREA)
  • Organic Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Zoology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Wood Science & Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • General Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Automation & Control Theory (AREA)
  • Biomedical Technology (AREA)
  • Hematology (AREA)

Abstract

Presented herein are systems and methods that provide a study design and method capture technology that facilitates determining effective and improved analysis procedures for characterizing biologics. The approaches described herein allow a user to input known or expected information describing the general structures of a target biologic to be analyzed, such as its nominal primary structure, along with study class attributes that identify desired structural characterizations to be obtained for the target biologic. The described study design and method capture platform then determines detailed procedures that, when implemented, allow or enhance the process of obtaining the desired structural characterization of the target biologic.

Description

SYSTEMS AND METHODS FOR AUTOMATED DESIGN OF AN ANALYTICAL STUDY FOR THE STRUCTURAL CHARACTERIZATION OF A BIOLOGIC COMPOSITION
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/506,443, filed May 15, 2017, the content of which is incorporated by reference herein in its entirety. This application is related to International Patent Application No. PCT/US2014/059150, titled, "Mass Spectrometry-Based Method for Identifying and Maintaining Quality Control Factors During the Development and Manufacture of a Biologic", filed October 3, 2014, the content of which is incorporated by reference herein in its entirety. This application is also related to International Patent Application No. PCT/US2016/053434, titled, "Method for Determining the In Vivo Comparability of a Biologic Drug and a Reference Drug", filed September 23, 2016, the content of which is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
This invention relates generally to methods, systems, and architectures for facilitating analysis of biologics. More specifically, in certain embodiments, the techniques described herein facilitate the determination of procedures for obtaining structural characterizations of biologics.
BACKGROUND OF THE INVENTION
Biologics are highly complex molecules whose detailed structural properties are critical to their ability to perform their desired function, as well as their stability over time in storage. Biologics can be expressed and refolded incorrectly based on a range of variations in the biosynthesis or manufacturing process, and can be degraded or chemically changed by proteases, heat, acidic or other environmental conditions to produce fragments and truncated molecules. Some biologics tend to form aggregates, which are inactive and sometimes immunogenic. Biologics can be glycosylated at differing N-linked or O-linked sites, by different amounts, and/or with different sugars (e.g., they may vary by galactose content, afucosylation, sialic acid content, mannose content, etc.), and may include molecular species non-glycosylated at critical or uncritical locations. Proper disulfide bonding within a molecule and/or between molecules typically is critical for efficacy, and wrongly paired or unpaired disulfide bonds can lead to inoperative misfolded contaminants or to aggregates. The product also may be contaminated by one or more host cell proteins such as proteases, DNAs, methotrexate, or other residues from upstream expression, or with leached components from downstream purification. Biologics also may be deaminated, oxidized, methylated or otherwise modified. The molecule may be altered after its release, in storage, or in vivo when exposed to blood-borne enzymes, physiological temperatures, and the like.
Production methods profoundly affect structure in various ways. A master cell bank comprising replicable, recombinant clones that reliably express copious quantities of active biologic is only a beginning. Upstream variables in culture of such cells such as culture duration, pH, amount of dissolved oxygen, concentrations and identities of media components, temperature, initial cell density, pCO2, mixing and gassing strategy, and feeding strategy each may affect not only the quantitative protein yield, but also the structure of the product. Furthermore, contaminants such as host cell proteins, metabolites and the like are inevitably introduced into extracellular broths, as are possible infective agents such as viruses. Similarly, the downstream purification process may introduce variants or contaminants that may alter protein structure. The fine structure of the product can be affected by such aspects as the selection of separation technologies such as affinity columns, anionic or cationic exchange columns, or ultrafiltration apparatus. Also, contaminants may be introduced or product degraded or derivatized by the addition of preservatives, diluents, vehicle, as well as the decision as to when a chromatography resin or filter is replaced, and the temperature or pH of the product during purification, compounding, and storage.
Accordingly, the ability to accurately characterize the detailed fine structure of a biologic, particularly its critical features (e.g., critical quality attributes, or CQAs), is essential to one's ability to effectively manufacture and create new biologics, as well as the ability to maintain structural and functional consistency of biologics from batch to batch.
Sophisticated analytical methods are employed to perform this characterization. An analytical method for a given biologic may include, for example, the following stages, which correspond to experimental steps performed on a sample: extraction, purification, titration, denaturation, reduction and alkylation, fragmentation, separation, detection, mass analysis, and data analysis (e.g., for the purpose of determining molecular weight, amino acid sequence, amino acid modifications, post-translational modifications, higher order structures, and epitope mapping). Currently, analytical methods are designed for a given biologic by highly experienced experts who engage in lengthy, iterative, trial-and-error processes.
Therefore, there exists a need for improved systems and methods for determining procedures to obtain the structural attributes of biologics, especially those attributes that influence their efficacy and stability.
SUMMARY OF THE INVENTION
Presented herein are systems and methods that provide a study design and method capture technology that facilitates determining effective and improved analysis procedures for characterizing biologics. The approaches described herein allow a user to input known or expected information describing the general structures of a target biologic to be analyzed, such as its nominal primary structure, along with study class attributes that identify desired structural characterizations to be obtained for the target biologic. The described study design and method capture platform then determines, based on the target biologic's generalizable attributes and the study class attributes, one or more study design results. The determined study design results provide detailed procedures that, when implemented, allow or enhance the process of obtaining the desired structural characterization of the target biologic. Thus, the techniques provide a tool for improved, faster, and more accurate advanced characterization of complex biologics, thereby improving the quality of the characterization of a new biologic, facilitating faster development of manufacturing techniques and process scale-up, and reducing the time needed to bring the new biologic to market.
Determining a study design for a biologic is non-trivial. A given study design that, when implemented, allows a desired structural characterization of a target biologic to be obtained, comprises one or more analytical methods to be applied to a sample comprising the target biologic. Each analytical method comprises a sequence of specific analytical stages corresponding to specific experimental steps. The specific analytical method, or combination of multiple analytical methods used in a study design, as well as the specific sequence of analytical stages that each analytical method comprises, depends in a complex fashion on underlying properties of the target biologic and the specific desired structural characterization.
The approaches described herein determine study design results for a given target biologic by mapping a set of known or expected attributes of the target biologic to records of analytical methods and/or analytical stages that have been previously applied in various structural characterization studies of known biologics (e.g., which may be different from the target biologic) and are stored in a database. This database and the mapping between generalizable biologic attributes and the records it stores are collectively referred to herein as a method store. The method store codifies domain knowledge and previous experience in applying various analytical methods to characterize biologics.
Notably, in certain embodiments, the manner in which data are stored in the method store goes beyond merely storing records of studies and analytical methods that were previously implemented to characterize various biologics. In particular, the method store includes sets of generalizable biologic attributes (GBAs) that are determined for the known biologics that have been previously characterized using analytical methods that are stored as records in the method store. For a given known biologic, GBAs are determined via a preprocessing step and linked to the records of the various analytical methods that were used in its characterization. As will be described in the following, GBAs are values representing various structural attributes and physicochemical properties of a biologic and allow patterns of similarities between various different biologics to be identified.
In certain embodiments, by virtue of the manner in which records of analytical methods and/or analytical stages are linked with GBAs of known biologics in the method store, the approaches described herein go beyond merely searching a database to identify and return study design results. In particular, the systems and methods described herein may determine, for a given target biologic, a set of target GBAs that are used as features in machine learning algorithms that utilize the method store to identify relevant analytical methods (and analytical stages that they comprise) that can be applied to obtain a desired structural characterization of the target biologic. In this manner, the systems and methods allow for study design results to be determined (e.g., in an improved, faster, and more accurate manner) for target biologics that have not been characterized before. Moreover, the specific sequence of analytical stages as well as combinations of analytical methods included in a given determined study design result also do not need to have been previously performed and stored in the method store. Accordingly, the systems and methods described herein allow for unique study design results to be obtained.
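As a rough illustration of how target GBAs might serve as features for retrieving relevant analytical stages from a method store, the sketch below uses a simple k-nearest-neighbor vote. This is a stand-in chosen for brevity, not the machine learning module of the described system. The store contents, the feature names (`mw_kda`, `n_cys`, `n_glyco_sites`), the study class labels, and the function `suggest_stages` are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical method store: each entry links a known biologic's GBAs to the
# analytical stages used in one study of it. All values are invented.
METHOD_STORE = [
    {"gbas": {"mw_kda": 150.0, "n_cys": 32, "n_glyco_sites": 2},
     "study_class": "disulfide_mapping",
     "stages": ["denaturation", "alkylation", "trypsin_digestion", "lc_ms"]},
    {"gbas": {"mw_kda": 145.0, "n_cys": 30, "n_glyco_sites": 2},
     "study_class": "disulfide_mapping",
     "stages": ["denaturation", "alkylation", "lysc_digestion", "lc_ms"]},
    {"gbas": {"mw_kda": 20.0, "n_cys": 4, "n_glyco_sites": 0},
     "study_class": "disulfide_mapping",
     "stages": ["pepsin_digestion", "lc_ms"]},
    {"gbas": {"mw_kda": 148.0, "n_cys": 32, "n_glyco_sites": 2},
     "study_class": "molecular_weight",
     "stages": ["desalting", "intact_ms"]},
]


def distance(a, b, scales):
    """Euclidean distance over GBA features, each divided by its scale."""
    return sqrt(sum(((a[k] - b[k]) / scales[k]) ** 2 for k in scales))


def suggest_stages(target_gbas, study_class, store, k=2):
    """Return stages used by a majority of the k most similar known biologics."""
    candidates = [r for r in store if r["study_class"] == study_class]
    if not candidates:
        return []
    # Scale each feature by its largest magnitude among the candidates.
    scales = {key: max(abs(r["gbas"][key]) for r in candidates) or 1.0
              for key in target_gbas}
    nearest = sorted(
        candidates, key=lambda r: distance(target_gbas, r["gbas"], scales)
    )[:k]
    votes = Counter(stage for r in nearest for stage in r["stages"])
    ordered = []
    for record in nearest:  # keep the order of first appearance
        for stage in record["stages"]:
            if stage not in ordered and votes[stage] * 2 > len(nearest):
                ordered.append(stage)
    return ordered


target = {"mw_kda": 148.0, "n_cys": 32, "n_glyco_sites": 2}
print(suggest_stages(target, "disulfide_mapping", METHOD_STORE))
```

Because the vote is taken per stage rather than per stored method, the returned sequence need not match any single record in the store, which loosely mirrors the point that a study design result does not have to have been performed before.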
In one aspect, the invention is directed to a method of automatically identifying analytical study design parameters for analysis of a target biologic, the method comprising: (a) receiving, by a processor of a computing device, an input query comprising one or more generalizable biologic attributes (GBAs) of (e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type of; e.g., based on known information, such as a known identification of one or more disulfide linkage sites of) the target biologic, wherein the one or more GBAs comprise one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) accessing, by the processor, a method store comprising a plurality of analytical stage records, each analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein: (i) each analytical stage record comprises an identifier (e.g., a text label) of the corresponding specific analytical stage, (ii) each analytical stage record comprises a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic, and (iii) each analytical stage record is linked to one or more GBAs of the associated known biologic (e.g., wherein the one or more GBAs is determined based on an amino acid sequence of the associated known biologic; e.g., wherein the one or more GBAs are determined based on a molecule type; e.g., wherein the one or more GBAs are determined based on known information, such as a known identification of one or more disulfide linkage sites); (c) determining, by the processor, responsive to the input query, one or more study design results based on (i) the GBAs of the target biologic and (ii) the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical stage records of the method store; and (d) providing (e.g., rendering or providing for rendering), by the processor, the one or more study design results for display and/or further processing.
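The three parts of an analytical stage record recited in step (b), and the input query of step (a), could be represented along the following lines. This is a minimal sketch: the class names, field names, helper function, and every stored value (reagents, temperatures, cysteine counts) are hypothetical and chosen only to show the shape of the data.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyticalStageRecord:
    stage_id: str      # (i) identifier of the analytical stage, e.g. a text label
    parameters: dict   # (ii) parameter values used to implement the stage
    linked_gbas: dict  # (iii) GBAs of the associated known biologic


@dataclass
class InputQuery:
    target_gbas: dict  # GBAs of the target biologic
    study_class_attributes: list = field(default_factory=list)


def records_matching(store, gba_name, value):
    """Return stage records whose linked GBAs carry the given value."""
    return [r for r in store if r.linked_gbas.get(gba_name) == value]


# Invented example records; not taken from any real study.
store = [
    AnalyticalStageRecord(
        "reduction",
        {"reagent": "DTT", "temp_c": 37},
        {"molecule_type": "monoclonal antibody", "n_cys": 32},
    ),
    AnalyticalStageRecord(
        "enzymatic_digestion",
        {"enzyme": "trypsin", "hours": 4},
        {"molecule_type": "fusion protein", "n_cys": 12},
    ),
]

query = InputQuery({"molecule_type": "monoclonal antibody"}, ["disulfide_mapping"])
hits = records_matching(store, "molecule_type", query.target_gbas["molecule_type"])
print([r.stage_id for r in hits])
```

An exact-match lookup like `records_matching` is only a database search; in the described method, step (c) relies on a machine learning module to find patterns across such records.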
In certain embodiments, the one or more GBAs of the target biologic comprise(s) one or more members selected from the group consisting of: (A) a sequence fragment; (B) a molecular weight of the target biologic; (C) a molecule type of the target biologic; (D) a quantification of one or more specific amino acids within the target biologic [e.g., a total number of one or more specific amino acids (e.g., cysteines) within the target biologic; e.g., a fraction of one or more specific amino acids within the target biologic]; (E) a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties; (F) an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications [e.g., positions and/or number of potential sites of oxidation; e.g., positions and/or number of potential sites of deamidation; e.g., positions and/or number of potential sites of post-translational modifications (e.g., N-linked glycosylation; e.g., disulfide bridges; e.g., disulfide knots; e.g., modification of cysteine to formylglycine); e.g., positions and/or number of potential sites of N-terminal modification; e.g., positions and/or number of potential sites of C-terminal modifications; e.g., positions and/or number of potential instances of isomerization; and e.g., positions and/or number of potential instances of racemization]; and (G) one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the target biologic.
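Items (D) and (F) can be illustrated with a few sequence scans. The sketch below assumes a one-letter amino acid sequence as input and uses common textbook heuristics that are not specified in the text: the N-X-S/T sequon (X not proline) for potential N-linked glycosylation, Asn-Gly as a deamidation-prone motif, and methionine as a potential oxidation site. The function names and the example sequence are hypothetical, and positions are 0-based.

```python
import re


def residue_count(seq, residue):
    """(D) Total number of a specific amino acid, e.g. cysteine ('C')."""
    return seq.count(residue)


def residue_fraction(seq, residue):
    """(D) Fraction of a specific amino acid within the sequence."""
    return seq.count(residue) / len(seq)


def n_glycosylation_sites(seq):
    """(F) Positions of Asn in N-X-S/T sequons where X is not proline."""
    # The lookahead keeps matches zero-width so adjacent sequons are all found.
    return [m.start() for m in re.finditer(r"N(?=[^P][ST])", seq)]


def deamidation_sites(seq):
    """(F) Positions of Asn followed by Gly (assumed heuristic)."""
    return [m.start() for m in re.finditer(r"N(?=G)", seq)]


def oxidation_sites(seq):
    """(F) Positions of methionine (assumed heuristic)."""
    return [i for i, aa in enumerate(seq) if aa == "M"]


seq = "MKNGSCNPTCANATM"  # invented 15-residue example
gbas = {
    "n_cys": residue_count(seq, "C"),
    "cys_fraction": residue_fraction(seq, "C"),
    "n_glyco_sites": n_glycosylation_sites(seq),
    "deamidation_sites": deamidation_sites(seq),
    "oxidation_sites": oxidation_sites(seq),
}
print(gbas)
```

In the example, the asparagine at position 6 is skipped by the sequon scan because it is followed by proline.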
In certain embodiments, the one or more GBAs of the target biologic comprises a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of: (i) hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; classified as hydrophobic]; (ii) hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; classified as hydrophilic]; (iii) charge (e.g., having a charge greater than or equal to a specific charge; e.g., having charge less than or equal to a specific charge; e.g., having a positive charge; e.g., having a negative charge; e.g., neutral); (iv) acidity (e.g., classified as acidic; e.g., classified as basic; e.g., classified as neutral); and (v) aromaticity (e.g., classified as aromatic).
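A proportion-by-classification GBA of this kind reduces to counting residues that fall in each class. The sketch below is illustrative only: the residue groupings in `RESIDUE_CLASSES` are one simple assumed convention (other scales and thresholds would give different groupings), and the function name and example sequence are hypothetical.

```python
# Assumed residue classes; real classifications depend on the chosen scale.
RESIDUE_CLASSES = {
    "hydrophobic": set("AVILMFW"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "aromatic": set("FWY"),
}


def class_proportions(seq, classes=RESIDUE_CLASSES):
    """Return the fraction of residues in seq that belong to each class."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(1 for aa in seq if aa in residues) / len(seq)
        for name, residues in classes.items()
    }


proportions = class_proportions("AKDEFWGG")  # invented 8-residue example
print(proportions)
```

A residue may belong to more than one class (phenylalanine and tryptophan are counted as both hydrophobic and aromatic here), so the proportions need not sum to one.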
In certain embodiments, the one or more GBAs of the target biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
In certain embodiments, the one or more GBAs of the target biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics are selected from the group consisting of: (i) a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific (e.g., trypsin; e.g., Lys-C; e.g., Glu-C; e.g., Asp-N; e.g., Arg-C), whether the enzymes are applied singly, serially or simultaneously; (ii) a fragmentation pattern; and (iii) a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
In certain embodiments, the one or more GBAs of the target biologic comprise(s) one or more bioprocess attributes representing parameters of a bioprocess used to produce the target biologic.
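The fragmentation metrics listed above, in particular the fragmentation pattern and the distribution of fragments by length, can be approximated with an in silico digest. The sketch below assumes a simplified trypsin rule (cleave on the C-terminal side of lysine or arginine unless the next residue is proline) and ignores missed cleavages. The function names and the example sequence are hypothetical.

```python
from collections import Counter


def digest_trypsin(seq):
    """Split seq after each K or R that is not immediately followed by P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # trailing fragment without a cleavage site
        fragments.append(seq[start:])
    return fragments


def length_distribution(fragments):
    """Map fragment length to the number of fragments of that length."""
    return Counter(len(f) for f in fragments)


fragments = digest_trypsin("MKTAYRPGSKLLR")  # invented example sequence
lengths = length_distribution(fragments)
mean_length = sum(len(f) for f in fragments) / len(fragments)
print(fragments, dict(lengths), round(mean_length, 2))
```

Other enzymes named in the text (e.g., Lys-C, Glu-C, Asp-N, Arg-C) would need their own cleavage rules, and serial or simultaneous digests could be modeled by applying several rules to the same sequence.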
In certain embodiments, the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of: (A) an identification of a cell culture type used to produce the target biologic (e.g., such as a textual label that identifies a cell culture type); and (B) an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
In certain embodiments, the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of: (A) a sequence fragment; (B) a molecular weight of the associated known biologic; (C) a molecule type of the associated known biologic; (D) a quantification of one or more specific amino acids within the associated known biologic [e.g., a total number of one or more specific amino acids (e.g., cysteines) within the associated known biologic; e.g., a fraction of one or more specific amino acids within the associated known biologic]; (E) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties; (F) an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications [e.g., positions and/or number of potential sites of oxidation; e.g., positions and/or number of potential sites of deamidation; e.g., positions and/or number of potential sites of post-translational modifications (e.g., N-linked glycosylation; e.g., disulfide bridges; e.g., disulfide knots; e.g., modification of cysteine to formylglycine); e.g., positions and/or number of potential sites of N-terminal modification; e.g., positions and/or number of potential sites of C-terminal modifications; e.g., positions and/or number of potential instances of isomerization; and e.g., positions and/or number of potential instances of racemization]; and (G) one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
In certain embodiments, the one or more GBAs of the associated known biologic comprises a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of: (i) hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; classified as hydrophobic]; (ii) hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; classified as hydrophilic]; (iii) charge (e.g., having a charge greater than or equal to a specific charge; e.g., having charge less than or equal to a specific charge; e.g., having a positive charge; e.g., having a negative charge; e.g., neutral); (iv) acidity (e.g., classified as acidic; e.g., classified as basic; e.g., classified as neutral); and (v) aromaticity (e.g., classified as aromatic).
In certain embodiments, the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
In certain embodiments, the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics are selected from the group consisting of: (i) a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific (e.g., trypsin; e.g., Lys-C; e.g., Glu-C; e.g., Asp-N; e.g., Arg-C), whether the enzymes are applied singly, serially or simultaneously; (ii) a fragmentation pattern; and (iii) a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
In certain embodiments, the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
In certain embodiments, the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of: (A) an identification of a cell culture type used to produce the associated known biologic (e.g., such as a textual label that identifies a cell culture type); and (B) an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
In certain embodiments, the method comprises: receiving, by the processor, a user input comprising one or more known or expected structural features (e.g., including substructures) of the target biologic [e.g., an amino acid sequence; e.g., locations of disulfide bonds; e.g., locations and/or types of glycan structures attached to the target biologic (e.g., via glycosylation)]; determining, by the processor, using the one or more known or expected structural features of the target biologic, the one or more GBAs of the target biologic; and providing, by the processor, the determined one or more target biologic GBAs via the input query for automated identification of analytical study design parameters for analysis of the target biologic.
In certain embodiments, the one or more known or expected structural features of the target biologic comprises an amino acid sequence of the target biologic (e.g., a nominal amino acid sequence of the target biologic). In certain embodiments, the received user input comprises an identification of a molecule type of the target biologic (e.g., a recombinant protein, a fusion protein, a monoclonal antibody, an antibody-drug conjugate) and the identification of the molecule type is used as a GBA of the one or more GBAs of the target biologic.
In certain embodiments, the input query comprises one or more bioprocess parameters that represent properties of the bioprocess used to produce the target biologic (e.g., an identification of a cell culture type; e.g., an identification of a purification technique) and the one or more study design results are determined in step (c) based further on the one or more bioprocess parameters.
In certain embodiments, at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type (e.g., a specific type of structural characterization study) selected from the group consisting of: (i) determination of a molecular weight of the target biologic; (ii) determination of a primary structure of the target biologic; (iii) determination of post-translational modifications (e.g., intra- and inter-chain disulfide bonds; e.g., disulfide knots; e.g., glycosylation nature and sites; e.g., chemical post-translational modifications); (iv) determination of one or more higher order structures of the target biologic (e.g., secondary structure; e.g., tertiary structure; e.g., quaternary structure); (vi) comparison of the target biologic with a reference biologic (e.g., for characterization of the target biologic as a biosimilar); (vii) a lot comparison study; (viii) determination of a critical quality attribute (CQA) map of the target biologic; and (ix) determination of an in vivo comparability profile of the target biologic.
In certain embodiments, each of one or more of the analytical stage records of the method store comprises or is linked to one or more prior study class attributes that identify the structural characterization study in which the analytical stage that the analytical stage record represents was implemented. In certain embodiments, each of one or more of the analytical stage records of the method store corresponds to a specific analytical stage selected from the group consisting of: (i) a separation stage; (ii) a detection stage; (iii) a mass spectrometry stage; (iv) a digestion strategy [e.g., an enzymatic digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest); e.g., a chemical digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest)]; and (v) a sample preparation stage [e.g., a sample preprocessing stage (e.g., dilution, enrichment, buffer exchange, desalting, stress, or titration); e.g., fractionation; e.g., denaturation; e.g., reduction; e.g., alkylation].
In certain embodiments, at least a portion of the plurality of analytical stage records were created from published documents (e.g., published journal articles of structural characterization studies) via automated processing using text mining and/or natural language processing.
In certain embodiments, at least a portion of the plurality of analytical stage records were created from published documents (e.g., published journal articles of structural characterization studies) via automated processing in combination with a user interaction.
In certain embodiments, at least a portion of the analytical stage records were created from in-house studies in an automated fashion via dedicated software as part of a laboratory information management system.
In certain embodiments, each determined study design result comprises a set of analytical stage results, each representing a specific analytical stage to be applied to the target biologic, and comprising a list of parameters to be used when applying the analytical stage that the analytical stage result represents to the target biologic, and the analytical stage results of a given study design result are determined via a machine learning module that receives as input the GBAs of the target biologic and determines the set of analytical stage results and, for each analytical stage result, parameter values associated with that stage, based on patterns identified using the GBAs associated with analytical stage records of the method store [e.g., for a given target biologic, the machine learning module computes relevant analytical stage records by matching the GBAs of the target biologic with GBAs of a subset of the analytical stage records according to an identified pattern in GBAs of the analytical stage records (e.g., determined via clustering analysis)].
In certain embodiments, the machine learning module implements a supervised machine learning technique [e.g., wherein the machine learning module determines study design results using a set of training data that comprises a plurality of examples of correct output results (e.g., previously determined study design results; e.g., previously determined analytical stage results) and for each correct output result, a set of input features used to determine the correct output (e.g., for each of a plurality of previously determined study design results, GBAs that were used as input; e.g., for each of a plurality of previously determined analytical stage results of a plurality of previously determined study design results, GBAs that were used as input); e.g., an artificial neural network technique; e.g., a decision tree; e.g., one or more regression models; e.g., a k-nearest neighbor technique].
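By way of illustration only, a k-nearest neighbor technique of the kind listed above might be sketched as follows. The feature names, the toy records, and the function `knn_suggest` are hypothetical and do not appear in the disclosure; the sketch assumes the GBAs have already been expressed as numeric values on comparable scales.

```python
import math
from collections import Counter


def gba_distance(a, b, keys):
    """Euclidean distance between two GBA dictionaries over the given keys."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in keys))


def knn_suggest(target_gbas, training_records, k=3):
    """Return the label chosen most often among the k nearest training records.

    Each training record pairs the GBAs used as input ("gbas") with a
    previously determined output ("digestion"), as in supervised learning.
    """
    keys = sorted(target_gbas)
    ranked = sorted(
        training_records,
        key=lambda rec: gba_distance(target_gbas, rec["gbas"], keys),
    )
    votes = Counter(rec["digestion"] for rec in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data: made-up values for illustration, not measured results.
TRAINING = [
    {"gbas": {"fraction_hydrophobic": 0.30, "fraction_cysteine": 0.02}, "digestion": "trypsin"},
    {"gbas": {"fraction_hydrophobic": 0.32, "fraction_cysteine": 0.03}, "digestion": "trypsin"},
    {"gbas": {"fraction_hydrophobic": 0.55, "fraction_cysteine": 0.10}, "digestion": "Lys-C then trypsin"},
    {"gbas": {"fraction_hydrophobic": 0.60, "fraction_cysteine": 0.12}, "digestion": "Lys-C then trypsin"},
]

target = {"fraction_hydrophobic": 0.31, "fraction_cysteine": 0.02}
suggestion = knn_suggest(target, TRAINING, k=3)
```

In this toy case two of the three nearest records used a single trypsin digest, so that label is returned. A practical module would likely weight or normalize features and predict full parameter lists rather than a single label.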
In certain embodiments, the machine learning module implements a reinforcement machine learning technique [e.g., wherein the machine learning module determines study design results using a set of training data that comprises a plurality of example outputs (e.g., analytical stage records), and for each example output a performance measure of the example output (e.g., a performance index that an analytical stage record comprises or is linked to)].
In certain embodiments, the machine learning module implements an unsupervised machine learning technique (e.g., a clustering method; e.g., k-means clustering; e.g., self-organizing maps).
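As a minimal sketch of the k-means clustering named above, the following pure-Python example groups points in a two-dimensional GBA feature space and then assigns a new point to the nearest group. The deterministic initialization (first k points), the optional distance cutoff `max_distance`, and the toy coordinates are assumptions made for illustration only.

```python
import math


def kmeans(points, k, max_iter=50):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = list(points[:k])
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def assign(point, centroids, max_distance=None):
    """Index of the nearest centroid, or None if it lies beyond max_distance."""
    idx = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    if max_distance is not None and math.dist(point, centroids[idx]) > max_distance:
        return None
    return idx


# Toy feature vectors standing in for GBAs linked to stored records.
POINTS = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
CENTROIDS = kmeans(POINTS, k=2)
```

A target whose features fall near one centroid is assigned to that group; with a cutoff, a target far from every centroid is reported as belonging to no group. The function `math.dist` requires Python 3.8 or later.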
In certain embodiments, the machine learning module implements the unsupervised machine learning technique as a precursor to a supervised machine learning technique.
In certain embodiments, the method comprises: receiving, by the processor, a user input corresponding to a modification of a particular study design result of the one or more determined study design results; updating, by the processor, the particular study design result according to the user input; and storing, by the processor, one or more analytical stage results of the updated study design result as analytical stage records in the method store.
In certain embodiments, the method comprises using the stored analytical stage results of the updated study design result as training data in a supervised machine learning technique (e.g., an artificial neural network technique; e.g., a decision tree; e.g., one or more regression models; e.g., a k-nearest neighbor technique) implemented by the machine learning module of step (c).
In certain embodiments, step (d) comprises, for at least one study design result of the determined study design results: (A) causing, by the processor, display of at least one graphical control element corresponding to an analytical stage result of the study design result; (B) receiving, by the processor, via a user interaction with the graphical control element, a user input corresponding to a modification of the analytical stage result; (C) responsive to the received user input, updating, by the processor, the particular study design result according to the user input; and (D) storing, by the processor, one or more analytical stage results of the updated study design result as analytical stage records in the method store.
In certain embodiments, the one or more study design results comprises one or more documents corresponding to software file(s) that specify parameters for analytical instruments (e.g., parameters for analytical instruments such as chromatography workstations, mass spectrometry workstations, and the like).
In another aspect, the invention is directed to a method of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, the method comprising: (a) creating, by a processor of a computing device, an analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein the analytical stage record comprises: (i) an identifier (e.g., a text label) of the corresponding specific analytical stage, and (ii) a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic; (b) storing, by the processor, the analytical stage record in the method store; (c) storing, by the processor, one or more GBAs of the associated known biologic in the method store; and (d) linking, by the processor, the one or more known biologic GBAs with the analytical stage record.
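Steps (a) through (d) can be illustrated with a small in-memory sketch. The class names, field names, and example values below are hypothetical and chosen only to mirror the wording of the steps; an actual method store would likely be a persistent database.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyticalStageRecord:
    identifier: str   # (i) a text label naming the analytical stage
    parameters: dict  # (ii) parameter values used to implement the stage


@dataclass
class MethodStore:
    stage_records: dict = field(default_factory=dict)  # record id -> stage record
    gbas: dict = field(default_factory=dict)           # biologic id -> GBA dict
    links: set = field(default_factory=set)            # (biologic id, record id)

    def store_stage_record(self, record):
        """Step (b): store the record and return its id."""
        record_id = len(self.stage_records) + 1
        self.stage_records[record_id] = record
        return record_id

    def store_gbas(self, biologic_id, gbas):
        """Step (c): store GBAs of the associated known biologic."""
        self.gbas.setdefault(biologic_id, {}).update(gbas)

    def link(self, biologic_id, record_id):
        """Step (d): link stored GBAs with a stored stage record."""
        if biologic_id not in self.gbas or record_id not in self.stage_records:
            raise KeyError("store the record and the GBAs before linking")
        self.links.add((biologic_id, record_id))

    def records_for(self, biologic_id):
        """Stage records linked to the GBAs of one known biologic."""
        return [self.stage_records[r] for b, r in sorted(self.links) if b == biologic_id]


store = MethodStore()
# Step (a): create the record (illustrative values only).
record = AnalyticalStageRecord(
    identifier="enzymatic digestion",
    parameters={"enzyme": "trypsin", "digest_type": "single"},
)
record_id = store.store_stage_record(record)
store.store_gbas("known-biologic-1", {"molecule_type": "monoclonal antibody"})
store.link("known-biologic-1", record_id)
```

Keeping the links separate from the records lets one set of GBAs be associated with many stage records, and the reverse, without duplicating either.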
In certain embodiments, the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of: (A) a sequence fragment; (B) a molecular weight of the associated known biologic; (C) a molecule type of the associated known biologic; (D) a quantification of one or more specific amino acids within the associated known biologic [e.g., a total number of one or more specific amino acids (e.g., cysteines) within the associated known biologic; e.g., a fraction of one or more specific amino acids within the associated known biologic]; (E) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties; (F) an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications [e.g., positions and/or number of potential sites of oxidation; e.g., positions and/or number of potential sites of deamidation; e.g., positions and/or number of potential sites of post-translational modifications (e.g., N-linked glycosylation; e.g., disulfide bridges; e.g., disulfide knots; e.g., modification of cysteine to formylglycine); e.g., positions and/or number of potential sites of N-terminal modification; e.g., positions and/or number of potential sites of C-terminal modifications; e.g., positions and/or number of potential instances of isomerization; and e.g., positions and/or number of potential instances of racemization]; and (G) one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
In certain embodiments, the one or more GBAs of the associated known biologic comprises a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of: (i) hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); e.g., having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; e.g., classified as hydrophobic]; (ii) hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); e.g., having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; e.g., classified as hydrophilic]; (iii) charge (e.g., having a charge greater than or equal to a specific charge; e.g., having charge less than or equal to a specific charge; e.g., having a positive charge; e.g., having a negative charge; e.g., neutral); (iv) acidity (e.g., classified as acidic; e.g., classified as basic; e.g., classified as neutral); and (v) aromaticity (e.g., classified as aromatic).
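Several of the sequence-derived GBAs described above can be computed directly from a one-letter amino acid sequence. The sketch below is illustrative only: the residue groupings are one common classification among several, the motif rules are simplified (N-X-S/T with X not proline for N-linked glycosylation, N-G for deamidation, methionine for oxidation), and the function name `sequence_gbas` is hypothetical.

```python
import re

# Assumed residue classes; other hydrophobicity scales would give other sets.
HYDROPHOBIC = set("AVILMFWC")
ACIDIC = set("DE")
BASIC = set("KRH")
AROMATIC = set("FWY")


def sequence_gbas(sequence):
    """Compute a few illustrative GBAs from a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def fraction(residues):
        return sum(aa in residues for aa in seq) / len(seq)

    # A lookahead lets overlapping N-X-S/T sequons (X not P) all be found.
    sequons = [m.start() for m in re.finditer(r"(?=N[^P][ST])", seq)]
    return {
        "length": len(seq),
        "cysteine_count": seq.count("C"),
        "fraction_hydrophobic": fraction(HYDROPHOBIC),
        "fraction_acidic": fraction(ACIDIC),
        "fraction_basic": fraction(BASIC),
        "fraction_aromatic": fraction(AROMATIC),
        "n_glycosylation_sequon_positions": sequons,  # 0-based positions of N
        "potential_oxidation_sites": seq.count("M"),
        "potential_deamidation_sites": seq.count("NG"),
    }


# Short made-up peptide used only to exercise the function.
example = sequence_gbas("MNGTCNPSC")
```

For the made-up peptide, the N-G-T sequon at position 1 is reported while N-P-S is excluded because proline occupies the middle position.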
In certain embodiments, the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation. In certain embodiments, the one or more GBAs of the associated known biologic comprises one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics are selected from the group consisting of: (i) a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific (e.g., trypsin; e.g., Lys-C; e.g., Glu-C; e.g., Asp-N; e.g., Arg-C), whether the enzymes are applied singly, serially or simultaneously; (ii) a fragmentation pattern; (iii) a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
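The fragmentation metrics described above can be approximated in silico. The following sketch assumes idealized, complete cleavage under simplified rules (trypsin after K or R but not before P; Lys-C after K; Arg-C after R; Glu-C after E; Asp-N before D); real specificity depends on reaction conditions. The names `CLEAVAGE_RULES`, `digest`, and `length_distribution` are hypothetical.

```python
import re
from collections import Counter

# Simplified cleavage rules expressed as zero-width regular expressions.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "lys-c": r"(?<=K)",
    "arg-c": r"(?<=R)",
    "glu-c": r"(?<=E)",
    "asp-n": r"(?=D)",
}


def digest(sequence, enzymes):
    """Apply each enzyme in turn and return the resulting peptide fragments."""
    fragments = [sequence.upper()]
    for enzyme in enzymes:
        pattern = re.compile(CLEAVAGE_RULES[enzyme])
        fragments = [
            piece
            for fragment in fragments
            for piece in pattern.split(fragment)
            if piece  # drop empty strings produced at the termini
        ]
    return fragments


def length_distribution(fragments):
    """Distribution of fragments by fragment length."""
    return Counter(len(fragment) for fragment in fragments)


# Made-up sequence used only to exercise the functions.
tryptic = digest("AKRPGKDE", ["trypsin"])
serial = digest("MKDAR", ["lys-c", "asp-n"])
```

Under the complete-cleavage assumption, serial and simultaneous application of the same enzymes yield the same fragments; modeling missed cleavages would be needed to distinguish them. Splitting on zero-width matches requires Python 3.7 or later.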
In certain embodiments, the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
In certain embodiments, the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of: (A) an identification of a cell culture type used to produce the associated known biologic (e.g., such as a textual label that identifies a cell culture type); and (B) an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
In certain embodiments, the one or more GBAs of the associated known biologic comprise one or more study class attributes, each of which identifies a type of analytical study performed on the associated known biologic using an analytical method comprising the analytical stage that the analytical stage record represents.
In certain embodiments, at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type (e.g., a specific type of structural characterization study) selected from the group consisting of: (i) determination of a molecular weight of the target biologic; (ii) determination of a primary structure of the target biologic; (iii) determination of post-translational modifications (e.g., intra- and inter-chain disulfide bonds; e.g., disulfide knots; e.g., glycosylation nature and sites; e.g., chemical post-translational modifications); (iv) determination of one or more higher order structures of the target biologic (e.g., secondary structure; e.g., tertiary structure; e.g., quaternary structure); (vi) comparison of the target biologic with a reference biologic (e.g., for characterization of the target biologic as a biosimilar); (vii) a lot comparison study; (viii) determination of a critical quality attribute (CQA) map of the target biologic; and (ix) determination of an in vivo comparability profile of the target biologic.
In certain embodiments, the analytical stage record of the method store corresponds to a specific analytical stage selected from the group consisting of: (i) a separation stage; (ii) a detection stage; (iii) a mass spectrometry stage; (iv) a digestion strategy [e.g., an enzymatic digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest); e.g., a chemical digestion stage (e.g., single digest; e.g., serial digest; e.g., cocktail digest)]; and (v) a sample preparation stage [e.g., a sample preprocessing stage (e.g., dilution, enrichment, buffer exchange, desalting, stress, or titration); e.g., fractionation; e.g., denaturation; e.g., reduction; e.g., alkylation].
In certain embodiments, creating the analytical stage record comprises extracting at least one of (i) the identifier and (ii) one or more parameter values of the series of parameter values from published documents (e.g., published journal articles of structural characterization studies) via automated processing using text mining and/or natural language processing.
In certain embodiments, creating the analytical stage record comprises extracting at least one of (i) the identifier and (ii) one or more parameter values of the series of parameter values from published documents (e.g., published journal articles of structural characterization studies) via automated processing in combination with a user interaction.
In certain embodiments, creating the analytical stage record comprises obtaining the identifier and series of parameter values from an in-house study in an automated fashion via dedicated software as part of a laboratory information management system.
In another aspect, the invention is directed to a method of automatically identifying analytical study design parameters for analysis of a target biologic, the method comprising: (a) receiving, by a processor of a computing device, an input query comprising one or more generalizable biologic attributes (GBAs) of [e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type of; e.g., based on known information, such as a known identification of one or more disulfide linkage sites of] the target biologic, wherein the one or more GBAs comprise one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) accessing, by the processor, a method store comprising a plurality of analytical method records, each analytical method record corresponding to a specific analytical method used in an analytical study for structural characterization of an associated known biologic, wherein: (i) each analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; and (ii) each analytical method record is linked to one or more GBAs of the associated known biologic (e.g., wherein the one or more GBAs is determined based on an amino acid sequence of the associated known biologic; e.g., wherein the one or more GBAs are determined based on a molecule type; e.g., wherein the one or more GBAs are determined based on known information, such as a known identification of one or more disulfide linkage sites); (c) determining, by the processor, responsive to the input query, one or more study design results based on (i) the GBAs of the target biologic and (ii) the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical method records of the method store; and (d) providing (e.g., rendering or providing for rendering), by the processor, the one or more study design results for display and/or further processing.
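The data flow of steps (a) through (d) at the level of analytical method records can be sketched as follows. This is a deliberately simple stand-in that filters by study class attribute and ranks by the number of matching GBAs; it is not the machine learning module of step (c). The names `MethodRecord` and `candidate_methods` and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodRecord:
    stages: tuple             # ordered identifiers of the analytical stages
    gbas: dict                # GBAs of the associated known biologic
    study_classes: frozenset  # study class attributes linked to the method


def candidate_methods(query_gbas, study_classes, method_store):
    """Filter method records by study class, then rank by shared GBA values."""
    wanted = set(study_classes)
    eligible = [m for m in method_store if wanted & m.study_classes]

    def shared(method):
        return sum(1 for key, value in query_gbas.items() if method.gbas.get(key) == value)

    # sorted() is stable, so ties keep their order in the method store.
    return sorted(eligible, key=shared, reverse=True)


# Illustrative method store with made-up records.
METHOD_STORE = [
    MethodRecord(
        ("reduction", "alkylation", "enzymatic digestion", "mass spectrometry"),
        {"molecule_type": "monoclonal antibody", "cell_culture": "CHO"},
        frozenset({"primary structure"}),
    ),
    MethodRecord(
        ("separation", "mass spectrometry"),
        {"molecule_type": "monoclonal antibody"},
        frozenset({"molecular weight"}),
    ),
    MethodRecord(
        ("denaturation", "enzymatic digestion", "mass spectrometry"),
        {"molecule_type": "fusion protein", "cell_culture": "CHO"},
        frozenset({"primary structure"}),
    ),
]

query = {"molecule_type": "monoclonal antibody", "cell_culture": "CHO"}
results = candidate_methods(query, ["primary structure"], METHOD_STORE)
```

Because each method record carries its stages as an ordered sequence, a returned record can be read directly as a proposed workflow for the target biologic.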
In another aspect, the invention is directed to a method of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, the method comprising: (a) creating, by a processor of a computing device, an analytical method record corresponding to a specific analytical method having been used in an analytical study for structural characterization of an associated known biologic, wherein the analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; (b) storing, by the processor, the analytical method record in the method store; (c) storing, by the processor, one or more GBAs of the associated known biologic in the method store; and (d) linking, by the processor, the one or more known biologic GBAs with the analytical method record.
In another aspect, the invention is directed to a system for automatically identifying analytical study design parameters for analysis of a target biologic. In certain embodiments, the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to: (a) receive an input query comprising one or more generalizable biologic attributes (GBAs) of [e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type of; e.g., based on known information, such as a known identification of one or more disulfide linkage sites of] the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) access a method store comprising a plurality of analytical stage records, each analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein: (i) each analytical stage record comprises an identifier (e.g., a text label) of the corresponding specific analytical stage, (ii) each analytical stage record comprises a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic, and (iii) each analytical stage record is linked to one or more GBAs of the associated known biologic (e.g., wherein the one or more GBAs is determined based on an amino acid sequence of the associated known biologic; e.g., wherein the one or more GBAs are determined based on a molecule type; e.g., wherein the one or more GBAs are determined based on known information, such as a known identification of one or more disulfide linkage sites); (c) determine, responsive to the input query, one or more study design results based on (i) the GBAs of the target biologic and (ii) the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical stage records of the method store; and (d) provide (e.g., render or provide for rendering) the one or more study design results for display and/or further processing.
In another aspect, the invention is directed to a system for populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics. In certain embodiments, the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to: (a) create an analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein the analytical stage record comprises: (i) an identifier (e.g., a text label) of the corresponding specific analytical stage, and (ii) a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic; (b) store the analytical stage record in the method store; (c) store one or more generalizable biologic attributes (GBAs) of the associated known biologic in the method store; and (d) link the one or more known biologic GBAs with the analytical stage record.
In another aspect, the invention is directed to a system for automatically identifying analytical study design parameters for analysis of a target biologic. In certain embodiments, the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to: (a) receive an input query comprising one or more generalizable biologic attributes (GBAs) of [e.g., determined based on an amino acid sequence of; e.g., determined based on a molecule type of; e.g., based on known information, such as a known identification of one or more disulfide linkage sites of] the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes [e.g., textual labels that identify particular types of analytical studies to be performed on the target biologic (e.g., to obtain specific sets of information about the target biologic)]; (b) access a method store comprising a plurality of analytical method records, each analytical method record corresponding to a specific analytical method used in an analytical study for structural characterization of an associated known biologic, wherein: (i) each analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; and (ii) each analytical method record is linked to one or more GBAs of the associated known biologic (e.g., wherein the one or more GBAs is determined based on an amino acid sequence of the associated known biologic; e.g., wherein the one or more GBAs are determined based on a molecule type; e.g., wherein the one or more GBAs are determined based on known information, such as a known identification of one or more disulfide linkage sites); (c) determine, responsive to the input query, one or more study design results based on (i) the GBAs of the target biologic and (ii) the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical method records of the method store; and (d) provide (e.g., render or provide for rendering) the one or more study design results for display and/or further processing.
In another aspect, the invention is directed to a system for populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics. In certain embodiments, the system includes: a processor; and a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to: (a) create an analytical method record corresponding to a specific analytical method having been used in an analytical study for structural characterization of an associated known biologic, wherein the analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; (b) store the analytical method record in the method store; (c) store one or more generalizable biologic attributes (GBAs) of the associated known biologic in the method store; and (d) link the one or more known biologic GBAs with the analytical method record.
Embodiments described with respect to one aspect of the invention may be applied to another aspect of the invention (e.g., features of embodiments described with respect to one independent claim are contemplated to be applicable to other embodiments of other independent claims).
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, aspects, features, and advantages of the present disclosure will become more apparent and better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1A is a block diagram showing a process for populating a method store, according to an illustrative embodiment.
FIG. 1B is a block diagram showing a process for populating a method store, including examples of determined generalizable biologic attributes, according to an illustrative embodiment.
FIG. 1C is a block diagram showing a process for populating a method store, including an example organization of analytic stages, according to an illustrative embodiment.
FIG. 1D is a block flow diagram illustrating a process for populating an analytical method store, according to an illustrative embodiment.
FIG. 2A is a block diagram showing a process for determining study design results, according to an illustrative embodiment.
FIG. 2B is a block diagram showing a process for determining an analytical method for a target biologic, according to an illustrative embodiment.
FIG. 3 is a portion of pseudocode for instantiating an analytical stage record, according to an illustrative embodiment.
FIG. 4 is a diagram showing an example hierarchical organization of various analytical stages and relevant parameters, according to an illustrative embodiment.
FIG. 5 is a block diagram showing an organization of analytical method records and their links to generalizable biologic attributes, according to an illustrative embodiment.
FIG. 6 is a block diagram showing an example process for creating analytical stage records and analytical method records from previously carried out structural characterization studies, according to an illustrative embodiment.
FIG. 7 is a block flow diagram illustrating a process for determining study design results, according to an illustrative embodiment.
FIG. 8A is a graph illustrating use of cluster analysis to determine groups of related analytical method records, wherein each of two example target biologics is identified as belonging to a determined group of related analytical methods, according to an illustrative embodiment.
FIG. 8B is a graph illustrating use of cluster analysis to determine groups of related analytical method records, wherein an example target biologic is identified as not belonging to any of the determined groups, according to an illustrative embodiment.
FIG. 9 is a set of graphs illustrating use of cluster analysis to determine groups of related analytical method records, wherein a non-linear transform is used, according to an illustrative embodiment.
FIG. 10 is a block diagram of an exemplary cloud computing environment, used in certain embodiments.
FIG. 11 is a block diagram of an example computing device and an example mobile computing device, used in certain embodiments.
FIG. 12 is a flow chart showing three different analytical methods, namely, bottom-up (Fragmentation1: produced by enzymatic digestion using multiple enzymes), middle-down (Fragmentation2: produced by enzymatic digestion using a single enzyme or chemical), or top-down (Fragmentation3: produced via gas phase cleavage for example using collision induced dissociation (CID) and electron transfer dissociation (ETD)), that may be used for characterizing the primary sequence of a biologic, for example, a protein, according to an illustrative embodiment.
The features and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements.
DEFINITIONS
In this application, the use of "or" means "and/or" unless stated otherwise. As used in this application, the term "comprise" and variations of the term, such as "comprising" and "comprises," are not intended to exclude other additives, components, integers or steps. As used in this application, the terms "about" and "approximately" are used as equivalents. Any numerals used in this application with or without about/approximately are meant to cover any normal fluctuations appreciated by one of ordinary skill in the relevant art. In certain embodiments, the term "approximately" or "about" refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).
Biologic: As used herein, the term "biologic" refers to a composition that is produced by recombinant DNA technologies, peptide synthesis, or purified from natural sources and that has a desired biological activity. The biologic can be, for example, a protein, peptide, glycoprotein, polysaccharide, a mixture of proteins or peptides, a mixture of glycoproteins, a mixture of polysaccharides, a mixture of one or more of a protein, peptide, glycoprotein or polysaccharide, or a derivatized form of any of the foregoing entities. The molecular weight of biologics can vary widely, from about 1000 Da for small peptides such as peptide hormones to one thousand kDa or more for complex polysaccharides, mucins, and other heavily glycosylated proteins. The biologic subject of the process of this invention can have a molecular weight of 1 kDa to 1000 kDa, more typically 20 kDa to 200 kDa, and often 30 kDa to 150 kDa. By way of example, desmopressin, oxytocin, angiotensin and bradykinin each have a molecular weight of about 1 kDa, calcitonin is 3.5 kDa, insulin is 5.8 kDa, kineret is 17.3 kDa, erythropoietin is about 30 kDa, ontak is 58 kDa, orencia is 92 kDa, and antibodies are approximately 150 kDa (Rituxan 145 kDa, Erbitux 152 kDa). Hyaluronic acids and salts have an average molecular weight often greater than 1000 kDa.
In certain embodiments, a biologic is a drug used for treatment of diseases and/or medical conditions. Examples of biologic drugs include, for example, native or engineered antibodies or antigen binding fragments thereof, and antibody-drug conjugates, which comprise an antibody or antigen binding fragments thereof conjugated directly or indirectly (e.g., via a linker) to a drug of interest, such as a cytotoxic drug or toxin.
In certain embodiments, a biologic is a diagnostic, used to diagnose diseases and/or medical conditions. For example, allergen patch tests utilize biologics (e.g., biologics manufactured from natural substances) that are known to cause contact dermatitis.
Diagnostic biologics may also include medical imaging agents, such as proteins that are labelled with agents that provide a detectable signal that facilitates imaging such as fluorescent markers, dyes, radionuclides, and the like.
Reference biologic: As used herein, the terms "reference biologic" and "reference biologic drug" refer to a biologic that is representative of the biologic drug under development or that has been approved for marketing, and provides a reference standard for the biologic drug with, for example, the appropriate, pre-determined composition, purity and/or biological activity.
Study, Structural characterization study, analytical study: As used herein, the terms "study", and "structural characterization study" refer to an experimental study that is performed on a biologic and obtains information (e.g., data; e.g., data derived from mass spectrometry analysis) about specific types of structural features of the biologic. Examples of structural characterization studies include a molecular weight study, a primary structure study, amino acid modifications studies, post-translational modification studies, higher order structure determination studies, critical quality attribute (CQA) mapping studies, in vivo comparability profile determination studies, and biosimilar/reference lot comparison studies. As described in detail in the following, a molecular weight study determines a molecular weight of a given biologic, a primary structure study determines a primary structure - a measured amino acid sequence - of a given biologic, amino acid modification studies are used to obtain information about one or more specific types of amino acid modifications that may occur for a given biologic, post-translational modification studies are used to obtain information about one or more specific types of post-translational modifications that may occur for a given biologic, higher order structure studies are used to determine one or more higher order structures, such as secondary, tertiary, and quaternary structures, of a given biologic, CQA mapping studies are used to determine a CQA map of a given biologic, in vivo comparability profile studies are used to determine an in vivo comparability profile for a given biologic, biosimilar/reference comparison study, and a lot comparison study.
Analytical Stage: As used herein, the term "analytical stage" refers to a specific isolated experimental processing step used, typically in combination with other analytical stages, to obtain a structural characterization of a biologic. Examples of types of analytical stages include (i) sample extraction, (ii) sample purification, (iii) sample preparation, (iv) digestion, (v) separation, (vi) detection, and (vii) mass spectrometry, each of which is described in detail in the following.
Analytical method: As used herein, the term "analytical method" refers to a sequence of one or more analytical stages that is applied to a given biologic in order to obtain a dataset used in obtaining a particular desired structural characterization of a biologic. In certain embodiments, the dataset obtained by applying an analytical method to characterize a given biologic comprises mass spectrometry data. In certain embodiments, determining or identifying a given analytical method for use in characterizing a given biologic comprises determining a sequence of one or more analytical stages of the given analytical method.
Study design: As used herein, the term "study design" refers to an identification of a specific combination of one or more analytical methods that are used to carry out a particular desired structural characterization study of a specific biologic. A study design depends on both the type of structural characterization study and the specific biologic it is used to characterize. In certain embodiments, determining or identifying a particular study design for a given biologic may comprise determining or identifying one or more analytical methods and, accordingly, for each of the one or more analytical methods determining or identifying a sequence of one or more analytical stages.
In certain embodiments, a study design comprises a single analytical method. In certain embodiments, a study design comprises two or more analytical methods and datasets obtained by applying each of the analytical methods to characterize the biologic are combined to obtain the desired structural characterization of the biologic.
Analytical stage record: As used herein, the term "analytical stage record" refers to a data structure that represents a specific analytical stage having been implemented in the characterization of a specific known biologic. An analytical stage record comprises an identifier of the specific analytical stage it represents and a series of parameter values used to implement the represented analytical stage for characterizing the specific known biologic. An analytical stage record may be implemented in software in a variety of ways, such as a database entry, an instance of a class (e.g., as in object-oriented programming), or a combination of data elements (e.g., arrays, structures, integer variables, string variables, and the like).
Analytical method record: As used herein, the term "analytical method record" refers to a data structure that represents a specific analytical method. An analytical method record comprises one or more analytical stage records, each of which represents a specific analytical stage that the represented analytical method comprises. An analytical method record may be implemented in software in a variety of ways, such as a database entry, an instance of a class (e.g., as in object-oriented programming), or a combination of data elements (e.g., arrays, structs, integer variables, string variables, and the like).
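By way of illustration only, the two record types defined above might be sketched in Python as instances of classes. The class names, field names, stage identifiers, and parameter values below are hypothetical and are not taken from any particular embodiment; the sketch merely shows an analytical method record holding an ordered sequence of analytical stage records, each carrying an identifier and a set of parameter values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

ParamValue = Union[str, int, float]


@dataclass
class AnalyticalStageRecord:
    """One analytical stage as applied to a specific known biologic."""
    stage_id: str      # identifier of the represented analytical stage
    biologic_id: str   # the known biologic the stage was applied to
    parameters: Dict[str, ParamValue] = field(default_factory=dict)


@dataclass
class AnalyticalMethodRecord:
    """An analytical method: an ordered sequence of stage records."""
    method_id: str
    stages: List[AnalyticalStageRecord] = field(default_factory=list)

    def stage_sequence(self) -> List[str]:
        """Return the stage identifiers in the order they are applied."""
        return [stage.stage_id for stage in self.stages]


# Hypothetical example values, for illustration only.
reduction = AnalyticalStageRecord(
    stage_id="reduction_alkylation",
    biologic_id="known-mAb-001",
    parameters={"reducing_agent": "2-mercaptoethanol", "temperature_c": 37.0},
)
digestion = AnalyticalStageRecord(
    stage_id="enzymatic_digestion",
    biologic_id="known-mAb-001",
    parameters={"enzyme": "trypsin", "ph": 8.0, "incubation_h": 4},
)
method = AnalyticalMethodRecord("bottom_up_example", [reduction, digestion])
print(method.stage_sequence())
```

A database entry or a combination of plain data elements would serve equally well; classes are used here only because they make the containment relationship explicit.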
Analytical stage result: As used herein, the term "analytical stage result" refers to a computer representation of a specific analytical stage, as output by the study design and method capture tool described herein.
Analytical method result: As used herein, the term "analytical method result" refers to a computer representation of a specific analytical method, as output by the study design and method capture tool described herein. In certain embodiments, an analytical method result comprises a sequence of analytical stage results, just as an analytical method is a sequence of analytical stages. In certain embodiments, steps such as determining an analytical method result, providing an analytical method result, rendering an analytical method result, and the like comprise performing the same steps for analytical stage results that the analytical method result comprises.
Study design result: As used herein, the term "study design result" refers to a computer representation of a study design, as output by the study design and method capture tool described herein. In certain embodiments, a study design result comprises a single analytical method result. In certain embodiments, a study design result comprises two or more analytical method results. In certain embodiments, steps such as determining a study design result, providing a study design result, rendering a study design result, and the like comprise performing the same steps for one or more analytical method results that the study design result comprises.
Generalizable Biologic Attributes: As used herein, the term "generalizable biologic attributes", refers to sets of features from biologic molecules associated via heuristic and/or domain knowledge with recommended analytical methods. In particular, a generalizable biologic attribute (GBA) of a given biologic is a value that is derived from and represents any one of (i) structural features of the given biologic, (ii) properties of a biomanufacturing process used to produce the given biologic, and (iii) attributes of a structural characterization study that either has previously been performed on the given biologic, or is to be performed for the given biologic.
Examples of GBAs representing structural features of a biologic include sequence fragments, molecular weight, and molecule type. For example, a GBA corresponding to a molecular weight of a given biologic may be a value of the molecular weight of the given biologic.
GBAs representing structural features of a biologic may also include a quantification of (e.g., a number of; e.g., a fraction of) one or more specific amino acids [e.g., Arginine (also referred to as Arg, or R); e.g., Lysine (also referred to as Lys, or K); e.g., cysteine (also referred to as Cys, or C)] within the biologic. GBAs may also include an identification and/or quantification of patterns of amino acid motifs associated with propensity towards certain types of modification [e.g., a position and/or number of potential (e.g., predicted) or known sites of oxidation; e.g., a position and/or number of potential (e.g., predicted) or known sites of deamidation; a position and/or number of potential (e.g., predicted) or known sites of various post-translational modifications (e.g., N-linked glycosylation; e.g., disulfide bridges; e.g., disulfide knots; e.g., modification of cysteine to formylglycine)].
GBAs representing structural features of a given biologic also include proportions of amino acids of various properties (e.g., hydrophobicity, hydrophilicity, charge, acidity, aromaticity, and the like). For example, GBAs representing proportions of amino acids of various properties within a given biologic include a proportion of amino acids having a particular classification based on hydrophobicity [e.g., having at least a given level of hydrophobicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophobicity; e.g., having a level of hydrophobicity within a particular range; classified as hydrophobic], a proportion of amino acids having a particular classification based on hydrophilicity [e.g., having at least a given level of hydrophilicity (e.g., as measured on a predefined scale); having less than or equal to a given level of hydrophilicity; e.g., having a level of hydrophilicity within a particular range; classified as hydrophilic], a proportion of amino acids having a particular classification based on charge (e.g., having a charge greater than or equal to a specific charge; e.g., having charge less than or equal to a specific charge; e.g., having a positive charge; e.g., having a negative charge; e.g., neutral), a proportion of amino acids having a particular classification based on acidity (e.g., classified as acidic; e.g., classified as basic; e.g., classified as neutral), and a proportion of amino acids having a particular classification based on aromaticity (e.g., classified as aromatic).
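As a minimal sketch of how composition-based GBAs of the kind described above could be derived from a one-letter amino acid sequence, the following Python example counts specific residues (Arg, Lys, Cys) and computes proportions of residues by class. The function name, the dictionary keys, the residue groupings, and the example sequence are assumptions made for illustration; in practice the classification would follow whatever predefined scale or grouping a given embodiment adopts.

```python
from collections import Counter
from typing import Dict

# Illustrative residue classes (assumption: one simple grouping among many).
HYDROPHOBIC = "AVILMFW"
AROMATIC = "FWY"
ACIDIC = "DE"
BASIC = "KRH"


def composition_gbas(sequence: str) -> Dict[str, float]:
    """Derive simple composition-based attribute values from a sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)

    def fraction(residues: str) -> float:
        # Proportion of positions occupied by any residue in the class.
        return sum(counts[r] for r in residues) / n

    return {
        "length": float(n),
        "count_R": float(counts["R"]),
        "count_K": float(counts["K"]),
        "count_C": float(counts["C"]),
        "frac_hydrophobic": fraction(HYDROPHOBIC),
        "frac_aromatic": fraction(AROMATIC),
        "frac_acidic": fraction(ACIDIC),
        "frac_basic": fraction(BASIC),
    }


# Hypothetical eight-residue sequence, for illustration only.
gbas = composition_gbas("ACDKRFWC")
for name, value in gbas.items():
    print(f"{name}: {value}")
```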
In certain embodiments, GBAs of a given biologic include one or more metrics describing predicted or previously known results of applying a fragmentation stage [e.g., enzymatic digestion (e.g., applied in solution and/or in-gel); e.g., chemical digestion (e.g., applied in solution and/or in-gel); e.g., gas-phase fragmentation; e.g., any combination of one or more enzymatic digestion, chemical digestion, and gas-phase fragmentation methods, applied serially or in a cocktail] to the given biologic.
For example, a GBA corresponding to a likelihood of enzymatic cleavage resulting from digestion with one or more select proteolytic enzymes for which cleavage sites are highly specific (e.g., enzymes including trypsin, Lys-C, Glu-C, Asp-N, Arg-C, and other enzymes), whether the one or more enzymes are applied singly, serially or simultaneously, may be determined. GBAs corresponding to fragmentation patterns resulting from enzymatic digestion, chemical digestion, gas-phase fragmentation, and combinations thereof may be determined. Examples of enzymes used in enzymatic digestion for which fragmentation patterns may be determined include trypsin, Lys-C, Glu-C, Asp-N, Arg-C, and other enzymes. Examples of chemicals used in chemical digestion for which fragmentation patterns may be determined include cyanogen bromide (CNBr), 2-nitro-5-thiocyanobenzoate (NTCB), hydroxylamine, formic acid (FA), and other chemicals. GBAs corresponding to combinations of enzymatic and chemical cleavage patterns may also be determined.
Fragmentation patterns for enzymatic digestion, chemical digestion, and combinations thereof may be determined for in-solution and/or in-gel digestion. GBAs corresponding to gas-phase fragmentation patterns include fragmentation patterns from gas-phase fragmentation techniques carried out in an electronic instrument such as a mass spectrometer. Such gas-phase fragmentation techniques include collision induced dissociation (CID), higher-energy collisional dissociation (HCD), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multiphoton dissociation (IRMPD), and other fragmentation techniques.
GBAs corresponding to metrics describing predicted or previously known results of applying a fragmentation stage to a given biologic also include statistical distributions of fragments by fragment length and/or by molecular weight (e.g., a histogram of fragment lengths; e.g., a histogram of fragment molecular weights; e.g., an average fragment length; e.g., an average fragment molecular weight).
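The predicted fragmentation metrics described above may be illustrated with a simplified in silico digestion. The Python sketch below assumes idealized cleavage rules (cleavage C-terminal to the listed residues, with no missed cleavages and no sequence-context exceptions), which is a deliberate simplification rather than a description of how any embodiment predicts cleavage. The function names, the rule table, and the example sequence are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Simplified cleavage rules (assumption): each enzyme cleaves on the
# C-terminal side of the listed residues; real specificity is more nuanced.
CLEAVE_AFTER: Dict[str, str] = {
    "trypsin": "KR",
    "Lys-C": "K",
    "Arg-C": "R",
    "Glu-C": "E",
}


def digest(sequence: str, enzyme: str) -> List[str]:
    """Split a sequence into fragments under the idealized rule set."""
    sites = CLEAVE_AFTER[enzyme]
    fragments: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in sites:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that follows the last cleavage site.
        fragments.append(sequence[start:])
    return fragments


def fragment_length_gbas(fragments: List[str]) -> Dict[str, object]:
    """Summarize fragments as a length histogram and an average length."""
    lengths = [len(f) for f in fragments]
    return {
        "n_fragments": len(lengths),
        "length_histogram": dict(Counter(lengths)),
        "mean_length": sum(lengths) / len(lengths),
    }


# Hypothetical sequence, for illustration only.
sequence = "MKTAYIAKQRQISFVK"
for enzyme in ("trypsin", "Lys-C"):
    fragments = digest(sequence, enzyme)
    print(enzyme, fragments, fragment_length_gbas(fragments))
```

A histogram of fragment molecular weights could be obtained in the same way by replacing the length of each fragment with a computed mass.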
Other GBAs corresponding to features related to physico-chemical properties of biologic molecules may also be determined.
In certain embodiments, GBAs of a given biologic include one or more study class attributes that identify one or more particular structural characterization studies that have been performed on the given biologic (e.g., wherein the given biologic is a known, previously characterized biologic), or identify one or more particular desired structural characterization studies to be performed on the given biologic (e.g., wherein the given biologic is a target biologic to be characterized).
In certain embodiments, GBAs of a given biologic include one or more bioprocess attributes that represent parameters of a biomanufacturing process used to produce the given biologic. For example, bioprocess attributes may include an identification of a cell culture type used to produce the associated known biologic (e.g., such as a textual label that identifies a cell culture type), and an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
Link: As used herein, the terms "link", and "linked", as in a first data structure or data element is linked to a second data structure or data element, refer to a computer representation of an association between two data structures or data elements that is stored electronically (e.g. in computer memory).
DETAILED DESCRIPTION
It is contemplated that systems, architectures, devices, methods, and processes of the claimed invention encompass variations and adaptations developed using information from the embodiments described herein. Adaptation and/or modification of the systems, architectures, devices, methods, and processes described herein may be performed, as contemplated by this description.
Throughout the description, where articles, devices, systems, and architectures are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are articles, devices, systems, and architectures of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps.
It should be understood that the order of steps or order for performing certain action is immaterial so long as the invention remains operable. Moreover, two or more steps or actions may be conducted simultaneously.
The mention herein of any publication, for example, in the Background section, is not an admission that the publication serves as prior art with respect to any of the claims presented herein. The Background section is presented for purposes of clarity and is not meant as a description of prior art with respect to any claim.
Documents are incorporated herein by reference as noted. Where there is any discrepancy in the meaning of a particular term, the meaning provided in the Definition section above is controlling.
Headers are provided for the convenience of the reader - the presence and/or placement of a header is not intended to limit the scope of the subject matter described herein.
In certain embodiments, the systems and methods described herein facilitate the design of analytical studies for characterizing various structural properties of biologics. In particular, in certain embodiments, the approaches described herein allow a user to submit (e.g., to a computer program) input describing a target biologic that they are interested in characterizing and a particular type of desired structural characterization that they would like to obtain for the target biologic. The approaches described herein then provide a specific detailed procedure for obtaining, from a sample comprising the target biologic, the desired structural characterization in the form of a study design result.
In certain embodiments, the study design result determined by the tool comprises a single analytical method result, which in turn comprises a sequence of analytical stage results that represent specific analytical stages to be implemented in order to obtain the desired structural characterization of the target biologic. In certain embodiments, a study design result comprises two or more analytical method results that represent orthogonal analytical methods which provide complementary data with respect to each other.
For example, structural characterization studies for determining a primary structure of a biologic may utilize multiple analytical methods in order to obtain a high level of sequence coverage. An example 1200 of such an approach is shown in FIG. 12, which illustrates three different analytical methods that can be combined in a study for determining a primary structure of a biologic. As shown in the figure, the three different analytical methods - a top-down approach, a bottom-up approach, and a middle-down approach - use different sample digestion strategies (including no digestion in the case of the top-down approach) to obtain different datasets for a given target biologic to be characterized. Accordingly, each of the three analytical methods comprises a different sequence of analytical stages. The top-down approach includes a protein biologics production stage 1202, a separation stage 1204, a mass spectrometry stage 1206, a fragmentation stage 1208, a data process and analysis stage 1210, and a comprehensive coverage of protein structure stage 1212. Compared to the top-down approach, the bottom-up approach includes an additional fragmentation stage 1214 and a small peptides and glycan fragments stage 1216. The middle-down approach includes an additional fragmentation stage 1218 and a large peptides and glycan fragments stage 1220. An example study design result for determining a primary structure of a target biologic that combines the three approaches shown in FIG. 12 would include three analytical method results, each representing one of the illustrated top-down, bottom-up, and middle-down approaches.
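The three approaches described above might be represented, purely for illustration, as a study design result holding three ordered stage lists. In the Python sketch below, the stage labels follow the reference numerals of FIG. 12, but the position at which the additional stages of the bottom-up and middle-down approaches are placed (immediately after the production stage) is an assumption, as are the variable and function names.

```python
from typing import Dict, List

# Stage labels follow the reference numerals used for FIG. 12.
TOP_DOWN: List[str] = [
    "protein biologics production (1202)",
    "separation (1204)",
    "mass spectrometry (1206)",
    "fragmentation (1208)",
    "data process and analysis (1210)",
    "comprehensive coverage of protein structure (1212)",
]


def with_additional_stages(base: List[str], extra: List[str]) -> List[str]:
    """Return a new sequence with extra stages after the first stage.

    Assumption: the additional digestion-related stages follow production
    and precede separation; the exact placement is illustrative only.
    """
    return base[:1] + extra + base[1:]


BOTTOM_UP = with_additional_stages(
    TOP_DOWN,
    ["fragmentation (1214)", "small peptides and glycan fragments (1216)"],
)
MIDDLE_DOWN = with_additional_stages(
    TOP_DOWN,
    ["fragmentation (1218)", "large peptides and glycan fragments (1220)"],
)

# A study design result combining three analytical method results.
study_design_result: Dict[str, List[str]] = {
    "top-down": TOP_DOWN,
    "bottom-up": BOTTOM_UP,
    "middle-down": MIDDLE_DOWN,
}

for name, stages in study_design_result.items():
    print(f"{name}: {len(stages)} stages")
```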
Study design results may be provided to the user in a variety of fashions, for example as one or more documents (e.g., an automatically generated document) and/or via a graphical user interface (GUI) that allows the user to edit various steps and parameters of the provided study design result in an interactive fashion. The documents provided via a study design result may include generated software files that specify parameters for analytical instruments such as liquid chromatography workstations, mass spectrometry workstations, and the like. Such generated software files may be loaded by analytical instruments in order to set up determined parameters to be used in carrying out the desired structural characterization study.
Determining a study design for a biologic molecule is non-trivial. A given study design that, when implemented, allows a desired structural characterization of a target biologic to be obtained, comprises one or more analytical methods, each comprising a sequence of specific analytical stages. The specific sequence of analytical stages of an appropriate analytical method depends in a complex fashion on underlying properties of the target biologic and the specific desired structural characterization. As described in greater detail below, the systems and methods described herein utilize a database - a method store - that codifies previous experience in applying various analytical methods (and the analytical stages they comprise) to characterize biologics.
An example 100 of organization and procedure for building a method store is shown in FIGS. 1A-1C. Notably, in certain embodiments, the manner in which data are stored in the method store goes beyond merely storing records of studies and analytical methods that were previously implemented to characterize various biologics. In particular, as shown in FIGS. 1A-1C, the method store 120 includes sets of generalizable biologic attributes (GBAs) 122 that are determined for the known biologics using analytical methods and analytical stages thereof that are stored as records in the method store. For a given known biologic, GBAs are determined via a preprocessing step 110 and linked to the records of the various analytical methods and/or analytical stages 124 that were used in its characterization. As will be described in the following, GBAs are values representing various features of a biologic and allow similarities between various different biologics to be identified.
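To make the linkage between GBA sets and method records concrete, the following Python sketch stores, for each known biologic, a set of GBA values together with identifiers of the analytical method records linked to it, and then finds the known biologic whose GBAs are closest to those of a target. The scaled Euclidean distance used here is a stand-in chosen for brevity and is not asserted to be the technique used by any embodiment; the store contents, attribute names, scale factors, and method identifiers are all hypothetical.

```python
import math
from typing import Dict, List

Gbas = Dict[str, float]

# Hypothetical method store entries: GBA sets linked to method record ids.
METHOD_STORE: List[dict] = [
    {
        "biologic": "known-A",
        "gbas": {"mw_kda": 150.0, "frac_hydrophobic": 0.40, "n_cys": 32.0},
        "method_ids": ["bottom_up_trypsin", "middle_down_single_enzyme"],
    },
    {
        "biologic": "known-B",
        "gbas": {"mw_kda": 5.8, "frac_hydrophobic": 0.45, "n_cys": 6.0},
        "method_ids": ["top_down_etd"],
    },
]

# Per-attribute scale factors (assumption) so no single attribute dominates.
SCALES: Gbas = {"mw_kda": 100.0, "frac_hydrophobic": 0.1, "n_cys": 10.0}


def gba_distance(a: Gbas, b: Gbas, scales: Gbas) -> float:
    """Scaled Euclidean distance over the attributes both sets share."""
    shared = sorted(set(a) & set(b))
    return math.sqrt(sum(((a[k] - b[k]) / scales[k]) ** 2 for k in shared))


def most_similar(target: Gbas, store: List[dict], scales: Gbas) -> dict:
    """Return the store entry whose GBAs are closest to the target's."""
    return min(store, key=lambda e: gba_distance(target, e["gbas"], scales))


# Hypothetical target biologic, for illustration only.
target_gbas: Gbas = {"mw_kda": 145.0, "frac_hydrophobic": 0.38, "n_cys": 30.0}
match = most_similar(target_gbas, METHOD_STORE, SCALES)
print(match["biologic"], match["method_ids"])
```

Because each entry carries the identifiers of its linked records, the methods previously applied to the nearest known biologic can be read directly from the matching entry.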
In certain embodiments, by virtue of the manner in which analytical stage and/or method records are linked with GBAs of known biologics in the method store, the approaches described herein go beyond merely searching a database to identify and return study design results. In particular, the systems and methods described herein may determine, for a given target biologic, a set of target GBAs that are used as features in machine learning algorithms that analyze the method store to identify relevant analytical methods that can be applied to obtain a desired structural characterization of the target biologic. This approach is illustrated in the block diagram of FIG. 2A. In this manner, the systems and methods allow for study design results to be determined for target biologics that may not have been characterized before. A determined study design result includes analytical method results that represent analytical methods and comprise sequences of analytical stage results, representing specific analytical stages. Notably, the specific sequence of analytical stages represented by the analytical stage results of a given determined analytical method result does not need to have been previously performed and stored in the method store. Similarly, a given determined study design result may include analytical method results that represent new and unique combinations of analytical methods. Accordingly, the systems and methods described herein allow for novel and unique study design results to be obtained. Embodiments of specific approaches for obtaining user input, determining GBAs, representing data, and determining and providing study design results are described in detail in the following.
A. Structural Characterization of Biologics
The structure of biological molecules can be highly complex, and vary significantly between different biologics. A wide variety of structural characterization studies - including determining molecular weight, a measured primary structure, characterizing and quantifying amino acid and post-translational modifications, and characterizing higher order structures (HOS) (e.g., secondary, tertiary, and quaternary structures) - are used to determine and identify structural features of biologics that are relevant to their efficacy and/or stability and may be influenced, e.g., by processes for manufacturing the biologic. In certain embodiments, different combinations of analytical methods, each comprising different sequences and combinations of analytical stages, are utilized in different structural characterization studies. Moreover, in certain embodiments, the particular analytical methods and analytical stages that they comprise that will be successful in allowing a desired structural characterization of a given biologic to be obtained depend in a complex fashion on the structure of the given biologic itself. Embodiments of various structural characterization studies and relevant analytical method approaches are described in detail in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are hereby incorporated by reference in their entirety. Several embodiments of various analytical stages are summarized in Sections A.i and A.ii below. Several embodiments of structural characterization studies that apply various combinations of analytical methods are summarized in Section A.iii.
A.i Sample Extraction, Purification, and Preparation Analytical Stages
A.i.a Extraction
In certain embodiments, a given biologic of interest (e.g., a target biologic; e.g., a known biologic) to be characterized is initially obtained as a component within a biological sample, such as a cell, tissue, or bodily fluid (e.g., blood, urine, saliva, and the like). In certain embodiments, a particular extraction procedure used depends on the particular sample in which the biologic is present. For example, if the biologic of interest is present in blood of a subject, extraction may be effectuated by removing a sample of blood from the subject.
A.i.b Purification
In certain embodiments, a sample comprising a given biologic of interest (e.g., a target biologic; e.g., a known biologic) is purified in order to obtain a desired level of purity of the biologic of interest (e.g., the biologic of interest comprises a specific percentage by weight of all components in the sample). For example, in certain embodiments, a sample comprising a biologic is purified to at least 50% purity (e.g., such that the biologic of interest comprises at least 50% by weight of all components in the sample).
In certain embodiments, purification steps remove impurities such as process-related contaminants, unrelated biological macromolecules, misfolded proteins, and the like. In certain embodiments, a cell sample comprises a biologic of interest and purification steps are used to purify a biologic from other macromolecules present in the cell sample, such as unrelated nucleic acids, lipids, glycolipids, polysaccharides, lipopolysaccharides, proteins, or even misfolded and/or misprocessed forms of the biological molecule of interest. In certain embodiments, immunocapture (also known as immunoaffinity) purification techniques are used. Non-antibody based purification techniques such as protein precipitation, gel filtration, ion exchange chromatography and gel electrophoresis may alternatively (or also) be used. Immunocapture and non-antibody based purification techniques are described in additional detail in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
A.i.c Denaturation
In order to characterize higher order structures of a biologic of interest, a denaturation step may be performed. Denaturation methods may involve exposure of the biologic of interest to elevated temperatures, chemical denaturants, and mechanical stress (e.g., freeze-thaw processes).
A.i.d Reduction and Alkylation
In certain embodiments, analytical methods used to characterize a biologic include a reduction and alkylation analytical stage. A reduction and alkylation stage may be used to cleave disulfide bridges in order to characterize biologics that include disulfide bridges, such as monoclonal antibodies. In certain embodiments, a reduction and alkylation stage comprises exposure of a biologic to a reducing agent (e.g., 2-mercaptoethanol) followed by exposure to an alkylating agent (e.g., iodoacetic acid; e.g., iodoacetamide (IAA); e.g., N-ethylmaleimide (NEM)). Parameters relevant for carrying out a reduction and alkylation stage include concentrations of the reducing and alkylating agents, as well as temperatures of each of the solutions comprising the reducing and alkylating agents and times for which the biologic of interest should be exposed to (e.g., immersed in) the solutions.
A.i.e Fragmentation
In certain embodiments, a structural characterization study of a given biologic includes one or more fragmentation stages. A variety of fragmentation techniques may be used and can be executed in solution (e.g., solution digestion), in a gel (e.g., in-gel digestion), or in a gas phase (e.g., gas-phase fragmentation).
A.i.e.1 In Solution Digestion
In certain embodiments, enzymatic digestion in solution is used for fragmentation. For example, enzymatic digestion comprises exposure of a biologic to one or more digestion enzymes that cleave the biologic at various positions along its primary structure sequence (e.g., at specific sites within or adjacent to specific residues and/or combinations of residues; e.g., in between specific combinations of residues; e.g., at random positions along its primary structure sequence). Examples of digestion enzymes include, without limitation, various proteolytic enzymes, such as trypsin, Lys-C, Glu-C, Asp-N, and Arg-C, various
deglycosylating enzymes, such as PNGaseF, O-glyocosidase, sialidase, glucosaminidase, and beta-galactosidase, as well as other enzymes, such as pepsin, papain, chymotrypsin, aminopeptidases, and carboxypeptidases.
In certain embodiments, an enzymatic digestion step is a single digest, wherein a single enzyme is used for fragmentation. Alternatively, an enzymatic digestion step may be a serial digest, wherein multiple enzymes are used one after the other, in a serial fashion. Various enzymes used in a serial digest may be added to a solution comprising the biologic one after another. Prior to adding a given enzyme to the solution in a serial digest, a composition may be added to the solution to terminate the reaction between the biologic and the previously added enzyme. In certain embodiments, a given enzyme is added to the solution following buffer exchange. In certain embodiments, an enzymatic digestion step is a cocktail digest, wherein a biologic of interest is exposed to multiple enzymes in a single reaction mixture. Examples of cocktails of enzymes that are viable include, without limitation, (i) a mixture of trypsin and Lys-C, (ii) a mixture of trypsin, Lys-C, and Asp-N, and (iii) a mixture of trypsin, Lys-C, Asp-N, and PNGase. In certain embodiments, relevant parameters used for in solution enzymatic digestion stages include pHs of solutions, buffer compositions, temperatures, and incubation times.
Chemical digestion may be used for fragmentation, wherein a biologic of interest is exposed to a solution comprising one or more particular digestion chemicals. Examples of digestion chemicals include cyanogen bromide (CNBr), 2-nitro-5-thiocyanobenzoate (NTCB), hydroxylamine, and formic acid (FA). In certain embodiments, as with enzymatic digestion, a chemical digestion step can be implemented using a single chemical, multiple chemicals in a serial fashion, or a cocktail of multiple chemicals. In certain embodiments, relevant parameters used for in solution chemical digestion steps include pHs of solutions, buffer compositions, temperatures, and incubation times.
In certain embodiments, fragmentation is achieved via a combination of (i) one or more chemicals with (ii) one or more enzymes. Combinations of chemical and enzymatic digestion may be applied in a serial fashion.
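By way of non-limiting illustration, the difference between a single digest and a multi-enzyme digest can be sketched in silico. The following Python sketch is not part of the described system: the names `CLEAVAGE_RULES`, `cut_sites`, and `digest`, the example sequence, and the simplified cleavage rules (trypsin cutting C-terminal to Lys or Arg, Lys-C cutting C-terminal to Lys, Glu-C cutting C-terminal to Glu, Asp-N cutting N-terminal to Asp) are assumptions made for brevity. Missed cleavages and sequence-context exceptions are not modeled.

```python
# Simplified, assumed cleavage rules: (target residues, side of the cut).
# "C" cuts after the residue, "N" cuts before it.
CLEAVAGE_RULES = {
    "trypsin": ("KR", "C"),
    "Lys-C": ("K", "C"),
    "Glu-C": ("E", "C"),
    "Asp-N": ("D", "N"),
}


def cut_sites(sequence, enzyme):
    """Return the set of indices at which a new fragment starts."""
    residues, side = CLEAVAGE_RULES[enzyme]
    sites = set()
    for i, aa in enumerate(sequence):
        if aa in residues:
            pos = i + 1 if side == "C" else i
            # Ignore cuts at the very ends of the sequence.
            if 0 < pos < len(sequence):
                sites.add(pos)
    return sites


def digest(sequence, enzymes):
    """Idealized complete digest with one enzyme or with several enzymes."""
    sites = set()
    for enzyme in enzymes:
        sites |= cut_sites(sequence, enzyme)
    bounds = [0] + sorted(sites) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical example sequence.
single = digest("AKDGRSEDK", ["trypsin"])
cocktail = digest("AKDGRSEDK", ["trypsin", "Asp-N"])
```

In this idealized model, in which every cleavage goes to completion, a serial digest and a cocktail digest with the same enzymes yield the same final fragments. In practice they can differ, because buffer conditions, reaction termination, and incubation times are set per enzyme in a serial digest.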
A.i.e.2 In Gel Digestion
In certain embodiments, an in gel digestion stage is used for fragmentation. In gel digestion comprises subjecting a biologic of interest that is captured in a gel band to digestion in situ, that is, within the gel band. In certain embodiments, digestion enzymes and/or digestion chemicals are added to a gel band comprising the biologic of interest in a manner similar to that described above with respect to in solution digestion. In particular, digestion via a single enzyme or chemical, or multiple enzymes and/or chemicals (e.g., using serial digestion wherein multiple enzymes and/or chemicals are added one after another in a serial fashion; e.g., using a cocktail comprising the multiple enzymes and/or chemicals) may be used for in gel digestion, as with in solution digestion.
A.i.e.3 Gas Phase Fragmentation
Biologics may also be fragmented via gas-phase fragmentation techniques. Gas-phase fragmentation may be carried out as part of mass spectrometry analysis. Examples of gas-phase fragmentation used in mass spectrometers include collision induced dissociation (CID), higher energy collisional dissociation (HCD), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multi-photon dissociation (IRMPD), ultraviolet photodissociation, and CID of the isolated charge-reduced ions following ETD (CRCID). A particular gas-phase fragmentation technique used may depend on the type of mass spectrometry instrument used. A particular gas-phase fragmentation technique used may also depend on the nature of the biologic molecule and the desired peptide fragments. For example, low energy CID fragmentation typically occurs at amide bonds of the peptide backbone of proteins and typically generates b and y sequencing ions. ECD and ETD fragmentation can cleave proteins at the N-Cα bond within their peptide backbone and generate c and z ions. In certain embodiments, a particular type of gas-phase fragmentation to be used is identified as a parameter in a mass analysis analytical method step.
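As a hedged illustration of the b and y sequencing ions mentioned above, the following Python sketch computes singly protonated, monoisotopic b and y fragment m/z values for an unmodified peptide. The function name `by_ions` and the example peptide are hypothetical; the residue masses are standard monoisotopic values, and only a subset of residues is tabulated for brevity. The c and z ions produced by ECD and ETD are not modeled.

```python
# Standard monoisotopic residue masses in Da (subset, for brevity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056   # Da, added once per intact peptide or y ion
PROTON = 1.00728   # Da, charge carrier for a 1+ ion


def by_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b_i holds the first i residues; y_i holds the last i residues plus
    the C-terminal water. Fragments of full peptide length are omitted.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


# Hypothetical tetrapeptide used only as an example.
b_ions, y_ions = by_ions("GASK")
```

A useful consistency check on such a table is that each complementary pair (b_i and y_(n-i)) sums to the neutral peptide mass plus two protons.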
A.ii Separation, Detection, and Mass Spectrometry
A.ii.a Separation
In certain embodiments, a separation stage is used. Examples of separation stages include, without limitation, gel electrophoresis (GE), sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), 2-dimensional GE, isoelectric focusing (IEF), high-performance liquid chromatography (HPLC), reverse-phase HPLC (RP-HPLC), hydrophobic interaction chromatography (HIC), hydrophilic interaction chromatography (HILIC), size-exclusion chromatography (SEC), ion-exchange chromatography (IEC), anion exchange chromatography (AEX), cation exchange chromatography (CEX), capillary electrophoresis (CE), and 2D-HPLC. Relevant separation method parameters include column selection, mobile phase composition, flow rates, and gradients.
A.ii.b Detection
In certain embodiments, a detection stage is included to measure retention times. Examples of detection methods include, without limitation, fluorescence detection (FL), ultraviolet to visible absorbance detection (UV/Vis), multiple-wavelength diode array detection (DAD), and light-scattering spectroscopy.
A.ii.c Mass Spectrometry
In certain embodiments, mass spectrometry is used for characterization of complex biologics. Mass spectrometers directly read the mass fingerprints (mass/charge ratios, m/z) of intact or fragmented proteins or molecules. Types of mass spectrometers used for structural characterization of biologics include quadrupole, ion trap, time-of-flight (TOF), Orbitrap, and Fourier transform ion cyclotron resonance (FTICR). Modern hybrid mass analyzers such as a hybrid linear ion trap-Orbitrap (e.g., Thermo LTQ Orbitrap Elite™) and a quadrupole-time-of-flight (e.g., Agilent Q-TOF mass spectrometers) have been developed for structural characterization of biologics to support biopharmaceutical discovery and development pipelines.
In certain embodiments, mass spectrometry includes use of a gas-phase fragmentation technique that is carried out within the mass spectrometer. Examples of gas-phase fragmentation techniques include collision induced dissociation (CID), higher-energy collisional dissociation (HCD), electron capture dissociation (ECD), electron transfer dissociation (ETD), infrared multi-photon dissociation (IRMPD), and CID of the isolated charge-reduced ions following ETD (CRCID). A particular type of gas-phase fragmentation used may depend on the type of mass spectrometer used (Scigelova, PRACTICAL PROTEOMICS (2006) 1-2: 16-21; Elviri, TANDEM MASS SPECTROMETRY - APPLICATIONS AND PRINCIPLES (2012) 162-178). For example, CID, HCD, and ETD are available on a hybrid linear ion trap-Orbitrap mass spectrometer; CID and ECD are available on FTICR MS; low-energy CID is available on Q-TOF MS; and high-energy CID is available on TOF/TOF MS.
Additionally, as described above, and in detail in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated by reference herein in their entirety, selection of a particular fragmentation technique, such as CID fragmentation or ECD fragmentation, also depends on the nature of the biologic molecule and the desired peptide fragments.
In certain embodiments, to permit the introduction of biologics, such as proteins and peptides, into a mass spectrometer, an ionization source is used. Examples of ionization source technologies employed on mass spectrometers include electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI). A particular ionization technology to be used in a mass spectrometry stage varies with the type of structural characterization desired, the nature of the biologic to be characterized, as well as other analytical stages that are used, such as a type of separation stage implemented prior to introduction of a sample into a mass spectrometer. For example, an ESI interface enables on-line introduction of samples (analytes) using HPLC, CE, or an infusion pump to deliver analytes from solution phase into gas phase on a mass analyzer. Mass spectrometric methods using ESI ionization with CID, HCD, ETD, and CRCID fragmentation techniques can be used to facilitate structural characterization of biologics. A MALDI interface is especially beneficial for a sample where the amount is limited. For example, a protein or peptide sample (1 μL) typically is spotted on a MALDI plate having a matrix such as α-cyano-4-hydroxycinnamic acid or sinapinic acid to form a crystal prior to MS analysis.
Accordingly, parameters relevant to a mass spectrometry analytical stage include identifiers of acquisition mode, fragmentation technique, and ionization technology, as well as various instrument settings (e.g., ESI source temperature and voltage, polarity, capillary temperature, gas flow rate, scan range, scan resolution, collision energy, AGC target ion values, injection times, isolation windows, ion mode).
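For illustration only, the parameters of a mass spectrometry analytical stage could be held in a small record that is checked against the instrument-specific fragmentation options given as examples above. The following Python sketch is an assumption about one possible representation, not the representation used by the described system: the class name `MassSpecStage`, its field names, the acquisition mode string, and the settings keys are all hypothetical, and low-energy and high-energy CID are both recorded simply as "CID".

```python
from dataclasses import dataclass, field

# Fragmentation options per instrument type, following the examples in
# the text; low- and high-energy CID are both recorded as "CID".
SUPPORTED_FRAGMENTATION = {
    "linear ion trap-Orbitrap": {"CID", "HCD", "ETD"},
    "FTICR": {"CID", "ECD"},
    "Q-TOF": {"CID"},
    "TOF/TOF": {"CID"},
}
IONIZATION_SOURCES = {"ESI", "MALDI"}


@dataclass
class MassSpecStage:
    """Hypothetical parameter record for one mass spectrometry stage."""
    instrument: str
    acquisition_mode: str
    fragmentation: str
    ionization: str
    settings: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of problems; an empty list means consistent."""
        problems = []
        allowed = SUPPORTED_FRAGMENTATION.get(self.instrument)
        if allowed is None:
            problems.append(f"unknown instrument: {self.instrument}")
        elif self.fragmentation not in allowed:
            problems.append(
                f"{self.fragmentation} is not listed for {self.instrument}"
            )
        if self.ionization not in IONIZATION_SOURCES:
            problems.append(f"unknown ionization source: {self.ionization}")
        return problems


# Hypothetical stage definitions; the settings keys are illustrative.
ok_stage = MassSpecStage(
    "linear ion trap-Orbitrap", "data-dependent", "HCD", "ESI",
    settings={"polarity": "positive", "scan_range_mz": (300, 2000)},
)
bad_stage = MassSpecStage("FTICR", "data-dependent", "ETD", "ESI")
```

Checking a stage definition before it is placed in an analytical method catches combinations, such as ETD on an FTICR instrument, that the examples above do not list.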
A.iii Structural Characterization Studies
In certain embodiments, a user decides which of a variety of structural characterization studies to perform in the characterization of a particular biologic of interest. The automated approaches described herein are then used to systematically identify a set of one or more real world analytical methods, each in turn comprising a sequence of analytical stages that should be performed to carry out the chosen characterization study. A given structural characterization study is used to obtain a particular set of data and information that conveys particular structural information about the biologic of interest. The particular type of structural characterization study will, along with the particular biologic, influence the specific set of analytical methods that are automatically determined by the standardized techniques described herein, which, once determined, can be carried out in the real world in order to obtain the desired structural characterization. Examples of various types of structural characterization studies are summarized below. PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated by reference herein in their entirety, include detailed discussion of structural characterization studies such as the below, and discuss analytical methods and stages that are of relevance to particular studies.
A.iii.a Molecular Weight
Molecular weight is an important aspect of biologics of interest. A variety of techniques can be used to measure a molecular weight of a biologic of interest. In certain embodiments, mass spectrometry is used to determine a molecular weight of an intact protein, following purification.
A.iii.b Primary Structure (Amino Acid Sequence)
The primary structure of a protein refers to the amino acid sequence of the polypeptide. It is very important to confirm the amino acid sequence of the protein backbone since amino acid modifications may occur during manufacture or storage of biologics, which can result in the loss of stability and/or biological function. A profile of amino acid modifications is one of the major quality characteristics for protein biologics.
Mass spectrometry-based platforms using multiple fragmentation techniques can be used to obtain a comprehensive coverage of peptide sequence on protein biologics. In particular, characterization of a primary structure of a protein may utilize a single analytical method or, as described above with respect to FIG. 12, two or more analytical methods, each employing different fragmentation stages, in combination. A detailed discussion of various mass spectrometry-based approaches, including analytical methods corresponding to bottom-up, middle-down, and top-down approaches, is included in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
For example, in an analytical method for characterizing the primary structure of a protein, the protein is digested into large peptide fragments, which are subsequently fragmented using CID and/or ETD, on a mass spectrometer. In general, use of bottom-up and middle-down approaches covers most peptide sequences on proteins. Yet, a full coverage of protein sequence may not be possible for some proteins. A top-down approach can be used as an alternative to bottom-up and middle-down methods. An intact protein without digestion can be directly measured by mass analyzers and subsequently fragmented by CID, HCD and ETD. Top-down sequencing allows location of post-translational modifications and differentiating isomers which could be lost in bottom-up and/or middle-down approaches. The use of two or more of these approaches of mapping peptide fragments enables high probabilities of a full coverage of peptide sequences on proteins.
A.iii.c Amino Acid Modifications
Amino acid modifications can occur in biological molecules during manufacture, formulation, or storage as a consequence of protein degradation or post-translational modifications. Protein degradation can occur as a result of chemical and physical modification. Chemical modification can change peptide backbone amino acids through oxidation, deamidation, isomerization, and racemization. Physical modification can trigger unfolding, misfolding, or aggregation on proteins. The altered or modified amino acids can be detected via peptide sequencing using enzymes, chemicals, and MS fragmentation techniques, as described herein. The key for identifying amino acid modifications is to locate modification sites which can be found according to mass differences between modified (observed MS spectra) and un-modified (theoretically predicted MS spectra) amino acids. To confirm the modification sites, the peptide containing modified amino acid(s) is subjected to MS/MS analysis. Use of MS/MS methods depends upon the modification site of interest. Examples of particular amino acid modifications and their characterization are described in detail in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
(i) N-Terminal Modifications
N-terminal modifications include, for example, acetylation, methylation, formylation, cyclization of glutamine, myristoylation, phosphorylation, and glycosylation (Meinnel et al., PROTEOMICS (2008) 8: 626-649). For example, acetylation takes place mostly at a lysine (Lys) residue; formylation is often observed on an N-terminal methionine (Met) residue; cyclization converts glutamine (Gln) to pyroglutamic acid (pGlu), which is observed in mAbs; and myristoylation usually occurs at a glycine (Gly) residue. Based on the type of enzyme chosen for digestion, the digested protein containing peptide fragments including the N-terminal peptide is subjected to LC-MS/MS analysis. The mass of the N-terminal peptide with a modification on an amino acid can be monitored by LC-MS/MS. The type of N-terminal modification can be distinguished by mass differences (for example, +42 for acetylation; +14 for methylation; +28 for formylation; -17 for cyclization of glutamine to pyroglutamic acid, which releases ammonia; +210 for myristoylation; +80 for phosphorylation) through the MS full scan following LC separation. Once the predicted mass of the modified N-terminal peptide is detected, the N-terminal peptide containing the amino acid modification is selected as a precursor ion and then subjected to CID-MS2 and/or ETD-MS2 to produce more fragments. Generally, a modification on the N-terminal amino acid residue can be detected by additional masses on b fragment ions observed from CID-MS2 (for example, +42 for acetylation) and no mass shift on y fragment ions.
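As a non-limiting sketch of how the mass differences listed above might be used, the following Python example matches the difference between an observed and a theoretical N-terminal peptide mass against a table of nominal shifts. The names `NOMINAL_SHIFTS` and `classify_n_terminal_shift`, the tolerance, and the example masses are assumptions made for illustration. The pyroglutamic acid entry is recorded as a loss of 17 Da because cyclization of glutamine releases ammonia.

```python
# Nominal mass shifts in Da for the N-terminal modifications named in
# the text. Pyroglutamate formation from Gln is a loss (release of NH3).
NOMINAL_SHIFTS = {
    42: "acetylation",
    14: "methylation",
    28: "formylation",
    -17: "pyroglutamate formation from N-terminal Gln",
    210: "myristoylation",
    80: "phosphorylation",
}


def classify_n_terminal_shift(observed_mass, theoretical_mass, tolerance=0.5):
    """Return candidate modifications whose nominal shift matches.

    The tolerance (in Da) is an assumed value suited to nominal masses;
    an accurate-mass workflow would use monoisotopic shifts and a much
    tighter window.
    """
    delta = observed_mass - theoretical_mass
    return [
        name for shift, name in NOMINAL_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    ]


# Hypothetical peptide masses used only as examples.
acetylated = classify_n_terminal_shift(1042.0, 1000.0)
cyclized = classify_n_terminal_shift(983.0, 1000.0)
unmatched = classify_n_terminal_shift(1005.0, 1000.0)
```

A match on precursor mass alone is only a candidate assignment. As described above, confirmation comes from MS2, where a modification on the N-terminal residue shifts the b ions but not the y ions.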
(ii) C-Terminal Modifications
C-terminal heterogeneities often occur in recombinant monoclonal antibodies (Liu et al, JOURNAL OF PHARMACEUTICAL SCIENCES (2008) 97(7): 2426-2447). One of the most common C-terminal heterogeneities (Liu (2008) supra) is the incomplete C-terminal lysine processing of the heavy chain during production of monoclonal antibodies, which produces three antibody species containing zero, one, and two C-terminal lysine residues. To characterize C-terminal lysine modification, the antibody is subjected to digestion using enzymes and then separation of heavy chains from light chains using molecular weight cut-off centrifugation filters (for example, 10,000 Da cut-off). The heavy-chain fragments containing C-terminal lysine-processed peptide species are subjected to LC separation (e.g., use of a reverse-phase LC column), followed by MS/MS analysis. Generally, peptides containing heterogeneous C-terminal lysine residues can be separated by LC and then identified by MS. For example, a reduction of 128 Da in mass indicates removal of one C-terminal lysine residue. The positive charge state of a peptide from which one C-terminal lysine has been removed is decreased by 1 unit as well. Therefore, various C-terminal lysine species can be identified by LC-MS based on the charge state and the masses. C-terminal amidation species are also often noted in the heavy chain of monoclonal antibodies (Tsubaki et al., INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES (2013) 52: 139-147). Like C-terminal lysine modification, amidation (a reduction of 1 Da) occurs as a result of post-translational modification. Peptidylglycine α-amidating monooxygenase (PAM) can cleave C-terminal glycine (Gly) and amidate the penultimate amino acid, resulting in a reduction of 58 Da in mass if the amino acid preceding the Gly residue is proline (Pro) or leucine (Leu). Similar to C-terminal lysine species, C-terminal amidation species can be identified by analyzing the heavy-chain peptide fragments by MS using CID-MS2 following LC separation.
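By way of illustration, the expected signals for a C-terminal peptide with and without its C-terminal lysine can be computed from the two observations above: removal of one lysine lowers the mass by about 128 Da and lowers the positive charge state by one. The function names `mz` and `lysine_variants` and the example peptide mass are hypothetical; the lysine residue mass used is the standard monoisotopic value (128.09496 Da, nominally 128 Da).

```python
LYS_RESIDUE = 128.09496  # Da, monoisotopic lysine residue (nominally 128)
PROTON = 1.00728         # Da


def mz(mass, charge):
    """m/z of a neutral mass carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def lysine_variants(mass_with_lys, charge_with_lys):
    """Return (label, mass, charge, m/z) for the peptide with and without
    its C-terminal Lys. The charge of the clipped form is reduced by one,
    but never below 1+ so that it remains an observable ion.
    """
    clipped_charge = max(1, charge_with_lys - 1)
    variants = [
        ("with C-terminal Lys", mass_with_lys, charge_with_lys),
        ("without C-terminal Lys", mass_with_lys - LYS_RESIDUE, clipped_charge),
    ]
    return [(label, m, z, mz(m, z)) for label, m, z in variants]


# Hypothetical C-terminal peptide observed as a 2+ ion.
variants = lysine_variants(788.40, 2)
```

Predicting both the mass and the charge state in this way gives two independent handles for assigning the lysine variants in an LC-MS run.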
(iii) Oxidation
Biologics can be oxidized if oxygen radicals or metals are present in the environment. In proteins, the most common oxidation occurs to amino acids containing a sulfur atom such as methionine (Met) and cysteine (Cys) or an aromatic ring such as histidine (His), tyrosine (Tyr), tryptophan (Trp), and phenylalanine (Phe) (Patal et al, BIOPROCESS INTERNATIONAL (2011) 20-31). For example, the sulfur atom (S) on Met reacts with oxygen radicals in solution to form methionine sulfoxide (S=O) and methionine sulfone (O=S=O). Cys oxidation opens disulfide links and forms new disulfide bonds, leading to mispaired disulfide bonds and scrambled disulfide bridges. Spontaneous oxidation (also known as auto-oxidation) may cause Cys to form sulfinic acid (SOOH) and cysteic acid (SOOOH) if metal ions are present in solution. His oxidation occurs through reaction at the imidazole ring to generate oxidation products such as 2-oxo-histidine (2-oxo-His), aspartic acid (Asp), and asparagine (Asn). Trp can be oxidized by light (also known as photo-oxidation) to form oxidation products such as N-formylkynurenine and kynurenine (Li et al, BIOTECHNOLOGY AND BIOENGINEERING (1995) 48: 490-500). Photo-oxidation of Tyr can form 3,4-dihydroxyphenylalanine (DOPA) and dityrosine, resulting in covalent aggregation through forming Tyr-Tyr cross links. Protein oxidation can be measured through LC-MS analysis of a protein digest. Use of theoretically predicted masses of peptides containing potential oxidation products (for example, +16 Da or +32 Da) and the fragmentation pattern observed on peptide fragments can identify oxidation products on a protein.
(iv) Deamidation, Isomerization, and Racemization
Deamidation occurs in many recombinant proteins by removing an amide group from an amino acid such as asparagine (Asn) and glutamine (Gln) (Patal (2011) supra). Deamidation is a non-enzymatic process that can take place spontaneously in proteins or peptides in in vivo or in vitro systems. Consequently, proteins undergo isomerization and racemization after deamidation. For example, Asn is initially converted to aspartic acid (Asp) by the non-enzymatic deamidation process, which can be identified through a mass shift of +1 Da on a mass spectrometer. Isoaspartic acid (isoAsp), as the most commonly found deamidation product, is then formed via isomerization of Asp. The isoAsp and Asp peptide products are normally separated by LC and subsequently identified by MS/MS. In addition, the succinimide intermediate generated during the Asn deamidation process can be converted to D-Asp (i.e., racemization). Overall, the rate of deamidation on an intact protein is very slow, whereas the deamidation rate can be increased significantly for peptides under alkaline conditions (Hao et al, (2011) MOLECULAR & CELLULAR PROTEOMICS 10.10). For example, D,L-Asp and D,L-isoAsp peptides (predominantly isoAsp peptides) are formed as a consequence of deamidation, isomerization, and racemization during trypsin digestion using buffer at pH 8. Normally, deamidation of Gln is much slower compared to the deamidation of Asn. It may be important to avoid inducing in vitro deamidation during sample preparation while identifying in vivo deamidation sites on proteins. Modified sample preparation procedures may be needed to identify deamidation modifications on proteins. For example, a protein sample can be subjected to trypsin digestions at pH 6.5 and pH 8, respectively. In vivo deamidation products can be distinguished from in vitro products by profiling the digested proteins (pH 6.5 vs. pH 8) by LC-MS. Peptides obtained from protein digestion at pH 6.5 serve as a control to filter out in vitro induced deamidation peptide products formed at pH 8.
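The pH 6.5 versus pH 8 comparison described above amounts to a set comparison of the deamidation sites detected in each digest. The following Python sketch shows one hedged reading of that comparison: sites also detected in the pH 6.5 control are treated as candidates for in vivo deamidation, and sites detected only at pH 8 are treated as likely induced during sample preparation. The function name `classify_deamidation_sites` and the site labels are hypothetical, and a real workflow would compare quantitative deamidation levels and not presence or absence alone.

```python
def classify_deamidation_sites(sites_ph65, sites_ph8):
    """Split deamidation sites using the pH 6.5 digest as a control.

    sites_ph65: sites with a +1 Da shift detected in the pH 6.5 digest.
    sites_ph8:  sites with a +1 Da shift detected in the pH 8 digest.
    """
    control = set(sites_ph65)
    alkaline = set(sites_ph8)
    return {
        # Seen under the mild control conditions: likely pre-existing.
        "in_vivo_candidates": sorted(control),
        # Seen only under alkaline digestion: likely preparation-induced.
        "in_vitro_induced": sorted(alkaline - control),
    }


# Hypothetical site labels (residue and position).
result = classify_deamidation_sites(
    sites_ph65={"N55"},
    sites_ph8={"N55", "N318", "N387"},
)
```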
A.iii.d Post-Translational Modifications
Post-translational modifications play an essential role in protein functions that regulate cellular processes. Post-translational modifications occur after the translation of mRNA. A post-translational modification is a biochemical process where amino acid residues are covalently modified by removing or adding molecules in a protein. These modifications can change a protein's folding, biological function, immunogenicity, and/or stability (Farley et al, METHODS IN ENZYMOLOGY (2009) 463: 725-762; Walsh et al, NATURE BIOTECHNOLOGY (2006) 24(10): 1241-1252).
Post-translational modifications include, but are not limited to, acetylation, acylation, γ-carboxylation, β-hydroxylation, disulfide bond formation, glycosylation, methylation, phosphorylation, proteolysis processing, and sulfation. Among these modifications acetylation, methylation, amidation, phosphorylation, and glycosylation are commonly found in approved therapeutic protein drugs and candidates in discovery or clinical trial stages. These modifications may take place in N-terminal, C-terminal, or side chain amino acids.
Heterogeneous species can be formed after post-translational modifications, such as glycosylation or amidation, which may or may not alter protein folding and function. Post-translational modifications usually occur during production of biologics in a cell or cell system. Accordingly, characterization of post-translational modifications provides structural insight to allow associating structure with biologic functions, as well as maintaining control over biologic manufacturing procedures. Mass spectrometric methods are used for characterization of post-translational modifications on proteins, and MS fragmentation techniques play critical roles in producing specific types of fragments to allow identifying post-translational modification sites in which amino acid residues are modified. Examples of characterization of protein post-translational modifications are described in detail in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
(i) Methylation and Acetylation
For example, methylation involves adding one or more methyl groups onto amino acids. For example, N-methylation can be found at the N-terminal alanine, isoleucine, leucine, methionine, phenylalanine, proline, tyrosine and/or the side chains of lysine, arginine, glutamine, asparagine or the imidazole ring of histidine residues (Paik et al, YONSEI MEDICAL JOURNAL (1986) 27(3): 159-177). O-methylation can be observed either at a C-terminal cysteine, leucine, lysine or at the side chain of glutamic acid and aspartic acid residues (Paik (1986) supra). S-methylation can be noted at the side chains of methionine and/or cysteine residues. Acetylation transfers an acetyl group to the side chain of lysine (also known as lysine acetylation) or the N-terminal amino acid residue (also known as N-terminal acetylation). In general, methylation and acetylation modifications on proteins remain unchanged during sample preparation (for example, digestion by enzymes). After protein digestion, peptide fragments in solution including methylated and/or acetylated peptides are subjected to LC-MS analysis. Methylated peptide species can be identified by additional masses (for example, +14 Da for mono-methylation, +28 Da for di-methylation) following LC separation. Though trimethylation and acetylation modifications provide the same additional nominal mass of 42 Da, the two can be distinguished by CID-MS2 fragmentation. For example, a unique immonium ion of m/z 126 can be observed for acetylated lysine but is not present for tri-methylated lysine residues (Farley (2009) supra). In addition, a neutral loss of 59 Da, corresponding to the loss of trimethylamine, is unique to a tri-methylated lysine.
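The decision logic described above for lysine methylation and acetylation can be sketched as a small rule set. The following Python example is illustrative only: the function name `classify_lysine_modification`, its parameter names, and the tolerance are assumptions, and the two boolean inputs stand for whether the m/z 126 immonium ion and the 59 Da neutral loss were observed in the CID-MS2 spectrum.

```python
def classify_lysine_modification(
    delta_mass,
    has_mz126_immonium=False,
    has_neutral_loss_59=False,
    tolerance=0.5,
):
    """Assign a lysine modification from a nominal mass shift in Da.

    The +42 Da case is ambiguous by mass alone, so the diagnostic
    features named in the text are used to resolve it.
    """
    def near(shift):
        return abs(delta_mass - shift) <= tolerance

    if near(14):
        return "mono-methylation"
    if near(28):
        return "di-methylation"
    if near(42):
        if has_mz126_immonium and not has_neutral_loss_59:
            return "acetylation"
        if has_neutral_loss_59 and not has_mz126_immonium:
            return "tri-methylation"
        return "acetylation or tri-methylation (ambiguous)"
    return "unassigned"


# Hypothetical observations used only as examples.
calls = [
    classify_lysine_modification(14.0),
    classify_lysine_modification(42.0, has_mz126_immonium=True),
    classify_lysine_modification(42.0, has_neutral_loss_59=True),
    classify_lysine_modification(42.0),
]
```

Returning an explicit ambiguous result when neither or both diagnostic features are present avoids forcing an assignment that the spectrum does not support.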
(ii) Phosphorylation
Phosphorylation typically occurs at serine, threonine or tyrosine residues of proteins or peptides. In general, phosphorylation is a reversible post-translational modification occurring in cellular processes to control protein activities. Phosphorylation is one of the labile post-translational modifications. The phosphate groups on serine and threonine residues can compete with the peptide bonds as preferred cleavage sites. Upon CID activation, peptides containing phosphorylated amino acid residues tend to lose the phosphate groups before they fragment along the peptide backbone. As a result, mixed fragments are obtained, in which unmodified and phosphorylated peptides cannot be differentiated. To avoid obtaining ambiguous peptide sequences from phosphorylated proteins, a combination of CID (CID-MS2), ETD (ETD-MS2) and CRCID (CID-MS3) fragmentation techniques can be used to characterize phosphorylation modifications on proteins (Wu et al, JOURNAL OF PROTEOME RESEARCH (2007) 6: 4230-4244).
For example, after denaturation, reduction, and alkylation, a phosphorylated protein sample is subjected to digestion using Lys-C to generate large proteolytic peptides. A monolithic LC column with a large pore size, such as polystyrene-divinylbenzene (PS-DVB, 50 μm i.d. x 10 cm), can be used to separate large peptides including unmodified and phosphorylated peptides. CID and ETD are performed under both dependent and independent modes on a mass spectrometer (for example, Thermo LTQ Orbitrap Elite ETD). Under an independent mode, lower-intensity precursor ions (usually phosphorylated peptides), which are normally missed during data-dependent experiments, can be manually selected for subsequent fragmentation (for example, CID-MS3). Furthermore, fewer fragment ions (c and z ions) are obtained for a large peptide with a low charge state (for example, +2) in the ETD-MS2 scan, resulting in insufficient fragmentation for peptide assignment. A combination of ETD (ETD-MS2) followed by CRCID (CID-MS3), performed by isolating a product ion produced in the ETD scan step, can produce substantial c and z ion series along with phosphorylation sites on peptides. The peptide assignment for phosphorylated peptides and unmodified peptides is achieved using software (for example, PepFinder and/or Proteome Discoverer). In addition to mapping fragment ions, HPLC retention times and masses (for example, a loss of 98 Da as the signature of a phosphorylated peptide) are key to assigning peptide identity.
A.iii.e Higher Order Structures (HOS)
Higher order structures (HOS) of a biologic protein include the secondary, tertiary, and quaternary structures. HOS provide a three-dimensional (3D) conformation, which plays an important role in the protein's biological function. HOS are considered to be critical quality attributes because changes in HOS may affect efficacy or safety of biologic drugs.
Characterizing HOS of a biologic protein is required by regulatory agencies (for example, US FDA quality by design (QbD) and ICH Q5E (ICH HARMONISED TRIPARTITE GUIDELINE "Comparability of Biotechnological/Biological Products Subject to Changes in Their Manufacturing Process," Q5E, Current Step 4 Version, dated November 18, 2004)). HOS characterization is often required during manufacturing of biologics (for example, comparability evaluations), formulation, stability assessment, and process development. Circular dichroism (CD) spectroscopy (Li et al, JOURNAL OF PHARMACEUTICAL SCIENCES (2011) 100(11): 4642-4654), X-ray crystallography (Harris et al, J. MOL. BIO. (1998) 275: 861-872), and nuclear magnetic resonance (NMR) (Amezcua et al, JOURNAL OF PHARMACEUTICAL SCIENCES (2013) 102(6): 1724-1733) are the conventional tools used to analyze HOS of a protein. Hydrogen/deuterium exchange coupled with mass spectrometry (HDX MS) can be used to probe HOS of a biologic. Unlike CD spectroscopy, HDX MS can provide a 3D conformation of an intact molecule (Engen, ANAL. CHEM. (2009) 81(19): 7870-7875) and a local conformation of fragments of a biologic, such as peptides (for example, peptide epitopes) (Coales et al, RAPID COMM. MASS SPECTROM. (2009) 23: 639-647).
Advantages of HDX MS over X-ray crystallography and NMR spectroscopy are: 1) that it provides dynamic conformational information of native biologics in solution; 2) that it is unlimited by the size of proteins or biologics being interrogated; and 3) sensitivity (i.e., less material required for HDX MS analysis) (Berkowitz et al., NATURE REVIEWS DRUG DISCOVERY (2012) 11: 527-540). HOS of biologics also include disulfide bonds, disulfide knots, and glycosylation, which are described in detail in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
(i) Disulfide Bonds
Disulfide bonds (-S-S-) primarily control the folding of three-dimensional protein structure, and generally fall into three groups: 1) intra-chain disulfide bonds; 2) inter-chain disulfide bonds; and 3) disulfide knots. In general, intra-chain disulfide bonds stabilize the tertiary structure and inter-chain disulfide bonds are involved in stabilizing quaternary structure. Disulfide knots can improve protein structural stability. Any modifications to the process of producing a biologic (e.g., changes in cell lines, cell culture medium, agitation force etc.) have the potential to cause protein conformational changes due to disulfide bond rearrangements (for example, unpaired or mispaired disulfide bonds). Thus, disulfide bonds are critical structural attributes, which should be monitored for quality control purposes during manufacture or storage of biologic or biologic reference. Intra-chain disulfide bonds occur within a single polypeptide whereas inter-chain disulfide bonds are formed between two polypeptide chains through oxidation of thiol (-SH) groups on cysteine residues. A conventional approach for characterizing disulfide bonds includes comparing reduced and non-reduced peptide maps to help locate disulfide bonds on peptide backbones. A protein sample is digested by enzyme with and without reduction and alkylation to generate two protein digests (for example, protein digest 1 (PD1) with reduction and alkylation; protein digest 2 (PD2) without reduction and alkylation). PD1 and PD2 are subjected to LC-MS analysis. The disulfide-linked peptide (DSLP) can be found in the PD2 sample using a theoretical mass to locate its retention time on the LC-MS mass chromatogram. Then, the sequence of the DSLP can be determined using ETD to cleave the disulfide bridge followed by CID to break peptide amide bonds. As expected, the DSLP should not be detected in the PD1 sample.
However, LC-MS analysis cannot differentiate an intra-chain disulfide bond from an inter-chain disulfide bond. To make this distinction, the sample can be subjected to SDS-PAGE gel electrophoresis under reducing and non-reducing conditions. For a sample containing an intra-chain disulfide bond, no additional band can be discerned in either the reducing or the non-reducing gel. If it is an inter-chain disulfide bond, two additional bands (lower molecular weights) can be found in an SDS-PAGE gel run under reducing conditions.
If a molecule of interest contains multiple disulfide bonds, the analysis of disulfide bonds is more complicated. Depending on the protein sequence and the locations of disulfide bonds, a strategy of using multiple enzymes and multi-fragmentation techniques to digest proteins into peptides containing only a single disulfide bond is ideal for mapping the disulfide bonds (Wu et al, ANAL. CHEM. (2009) 81(1): 112-122; Wu et al, ANAL. CHEM. (2010) 82(12): 5296-5303).
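By way of non-limiting illustration, the theoretical mass used to locate a DSLP in the non-reduced digest can be sketched in Python as follows. The sketch assumes standard monoisotopic residue masses and a single inter-peptide disulfide bond, whose formation removes two hydrogen atoms. The function names (peptide_mass, dslp_mass, mz) and the peptide sequences are hypothetical and do not appear in the present disclosure.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # H2O accounting for the free N- and C-termini
HYDROGEN = 1.007825   # one hydrogen atom
PROTON = 1.007276     # charge carrier used when converting mass to m/z


def peptide_mass(sequence):
    """Monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def dslp_mass(peptide_a, peptide_b):
    """Mass of two peptides joined by one disulfide bond (loses 2 H)."""
    if "C" not in peptide_a or "C" not in peptide_b:
        raise ValueError("each peptide needs a cysteine to form the bond")
    return peptide_mass(peptide_a) + peptide_mass(peptide_b) - 2 * HYDROGEN


def mz(mass, charge):
    """m/z of the protonated species at the given positive charge state."""
    return (mass + charge * PROTON) / charge


# Hypothetical peptide pair, used only to exercise the functions.
linked = dslp_mass("ACK", "GCR")
print(round(linked, 4), round(mz(linked, 2), 4))
```

A mass computed in this way would be searched for in the PD2 chromatogram and would be expected to be absent from PD1, where the two reduced and alkylated peptides appear separately.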
(ii) Disulfide Knots
Disulfide knots are structural motifs often found in proteins and typically comprise at least three disulfide bonds (six cysteine residues), where one disulfide bond passes through the ring formed by the other two disulfide bonds. Some therapeutic protein biologics (for example, recombinant human arylsulfatase A) contain disulfide knots, which can be scrambled or shuffled during expression, purification, or storage. It can be difficult to verify that a protein bearing disulfide knots has the correct arrangement since there are many ways to arrange a disulfide knot (Ni et al, J. AM. SOC. MASS SPECTROM. (2012) 24: 125-133). Enzymes or CID typically do not cut the peptide backbone disposed within a disulfide knot. The generation of desirable sizes of peptide fragments is important in the successful characterization of biologics containing disulfide knots.
(iii) Glycosylation
Glycosylation is also important for the production of biologics because, for example, more than 90% of the protein drugs such as monoclonal antibodies are glycoproteins. Furthermore, glycosylation is the most complex post-translational modification, where sugar moieties play roles in protein binding, conformation, stability, and activity (Walsh (2006) supra). Glycosylation can significantly impact the potency, pharmacokinetics, or immunogenicity of a biological drug if any modifications (for example, changing cell lines) occur during the manufacturing process. Additionally, it can be difficult and impractical to produce a homogeneously glycosylated protein. Although the production of biologics is monitored under a good manufacturing process (GMP), heterogeneous species of glycoproteins (for example, different forms of glycan linked with a protein) can only be minimized. Thus, glycosylation is a critical attribute for a therapeutic protein.
Based on the glycosidic linkage between protein and glycan, glycosylation can be grouped into five types: N-linked (glycan attached to the amino group of asparagine), O-linked (glycan bound to the hydroxyl group of serine or threonine), C-linked (glycan added onto the indole ring of tryptophan), phospho-linked (glycan linked to serine through a phosphodiester bond), and glypiation (a glycosylphosphatidylinositol anchor linking a phospholipid and a protein through a glycan linkage) (Ni (2012) supra). Among these five glycosylation types, the most common types are N-linked and O-linked. Characterization of glycosylation involves four steps: 1) glycan removal (known as deglycosylation); 2) glycosylation site determination; 3) peptide sequencing; and 4) glycan analysis.
Deglycosylation is essential for identification of the peptide and the site of glycosylation. After producing the peptide backbone via deglycosylation, the mass of the glycan attached to the peptide (the glycan-bearing peptide being known as a glycopeptide) can be predicted by subtracting the molecular weight of the peptide from that of the glycopeptide.
Based on the type of glycosylation, various approaches can be used to remove glycans from a glycoprotein. For example, PNGase F can remove most N-linked glycans except for a fucose-α(1-3) bound to the Asn-GlcNAc linkage. N-glycosidase A can be used to release oligosaccharides containing an α(1-3) fucose core. There is no enzyme like PNGase F that can remove "intact" O-linked glycans. Rather, the removal of O-linked glycans can be achieved using a series of exoglycosidases to hydrolyze various types of monosaccharides until only the Gal-β(1,3)-GalNAc core remains. O-glycosidase (endo-α-N-acetylgalactosaminidase) can then release the Gal-β(1,3)-GalNAc core structure from the serine or threonine residues (Iwase et al, METHODS IN MOLECULAR BIOLOGY (1993) 14: 151-159). Determination of glycosylation site can be accomplished in parallel with peptide sequencing because N-linked asparagine or O-linked serine/threonine residues are the known glycosylation sites. For glycan analysis, glycan can be collected after enzymatic or chemical digestion.
A.iii.g Critical Quality Attribute (CQA) Mapping
Critical Quality Attributes (CQAs) refer to key features in the structure of a biologic that are critical to its stability, safety, purity, efficacy, and/or potency. A CQA mapping study generates a map of such features in a given biologic. The generated CQA map identifies, among other things, what structural features of the given biologic are critical for efficacy, what substructures are susceptible to degradation or modification, and what areas may induce aggregation.
Accordingly, a CQA map for a target biologic directly informs development of reproducible manufacture and control protocols, and permits the prediction of the effect of selection of particular starting materials or manufacturing process steps. Such a map also provides developers, manufacturers, and potentially regulators, with both prospective and retrospective assurances that product quality specifications will be and have been met, and permits the development and production of consistent quality product. The map also can inform development of in-process testing, release testing, process monitoring, characterization and/or comparability testing, and regulatory program criteria and design relating to both the most relevant and least relevant structural features of the biologic.
In certain embodiments, a CQA map is generated by characterizing the target biologic directly. In certain embodiments, a CQA map for a target biologic is determined by characterizing a reference biologic that is representative of the target biologic. Accordingly, generating a CQA map for a target biologic may comprise performing the steps described below, and in additional detail in PCT/US2014/059150, the contents of which is incorporated herein by reference in its entirety, directly on the target biologic itself, or on a reference biologic for the target biologic.
Generating the CQA map for a target biologic includes, in a first phase, determining structural features of the target biologic and/or a reference biologic. Such structural features include, without limitation, a molecular weight of the target and/or reference biologic, a primary structure of the target and/or reference biologic, identification and/or quantification of amino acid modifications, identification and/or quantification of various post-translational modifications, and determination of HOS of the target and/or reference biologic. Such structural features may be determined via appropriate structural characterization studies, as described herein as well as in PCT/US2014/059150 and PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
In certain embodiments, in a second phase, the target and/or reference biologic is subjected to various conditions that (a) stress the molecule potentially to result in its modification, degradation, denaturation, contamination, instability or aggregation, and/or (b) assess the efficacy/safety of the target and/or reference biologic including in cell-based, in vivo or clinical assays in order to determine overall stability and/or efficacy/safety profiles. Thus, for example, the target and/or reference biologic is subjected to high temperature, physiological temperature, light, pH changes, enzymes that are commonly present in production broths, lyophilization and reconstitution, changes in ionic environment, mechanical stresses such as filtration and other separation techniques, accelerated aging conditions, and/or conditions that the molecule might encounter in vivo, such as physiological temperature, and various biomolecules (e.g., proteases), ions, or dissolved gases.
The as-stressed target and/or reference biologic, and/or its derivatives, fragments, or degradation products are themselves analyzed, for example, to determine the effect, if any, of the stress on its structure. The structures then are compared to that of the intact, target and/or reference biologic, and optionally, as may be necessary, activity assays are conducted on the intact and/or as-stressed molecular species which exhibit an alteration in structure. This process results in (for example, by mass spectrometric analysis) data indicative of the structures of the as-stressed reference biologic, and optionally derivatives, fragments, or degradation products thereof. The data then can be analyzed computationally to determine which operational parameters used in the expression, purification, formulation or storage of the reference biologic result in or pose a risk of degrading, modifying, or contaminating the biological.
Thereafter, these data are used to create a record of structural features of the biologic, that is, a map which identifies and characterizes parts of the molecular structure at risk of modification when exposed to various conditions. The generated CQA map can reveal which particular attributes or modifications affect the molecule's stability or activity, which attributes or modifications are innocuous, and what specific conditions induce the modifications. The map thus allows determination of (a) which attributes actually relate to and/or impart function, or not, and (b) which processing parameters degrade or risk degrading the structural features of the biologic known to be material to its function and safety. Stated differently, the map enables determination of which structural features of the protein matter and which don't, which features are susceptible to alteration or degradation caused by selection of particular raw materials used and/or processing steps employed in its expression, purification, or formulation, and from this directly informs downstream decisions in development, regulatory manufacturing, and Quality Assurance (QA)/Quality Control (QC) processes. Further details regarding CQA mapping are described in PCT/US2014/059150, the contents of which is incorporated herein by reference in its entirety.
A.iii.h In vivo Comparability Profile
When a biologic is administered to a subject, various structural features of the biologic may or may not be modified during its in vivo residence within the subject.
An in vivo comparability profile provides a measure of in vivo comparability - that is, a degree of structural similarity between two biologic molecules and/or their metabolites after a period of in vivo residence in a subject. In particular, an in vivo comparability profile is a compilation of comparative data (including statistically processed data) indicating a structural feature or set of structural features of a biologic molecule and/or one or more of its metabolites after a period of in vivo residence in a subject when compared to the same structural feature or set of structural features in a reference biologic molecule and/or one or more of its metabolites.
Determining an in vivo comparability profile of a target biologic comprises obtaining data indicative of the structure of the target biologic or a metabolite thereof following its extraction from a sample (e.g., a cell; e.g., a tissue; e.g., a bodily fluid) removed from a subject at a specific time interval following administration of the target biologic to the subject. The data indicative of the structure of the target biologic is compared to data indicative of the structure of the reference biologic drug or a metabolite thereof generated through analysis of a sample taken at the same time interval from a subject to whom the reference biologic drug had been administered, thereby to produce an in vivo comparability profile of the candidate biologic drug to the reference biologic drug.
Structural features of the target biologic and the reference biologic can be analyzed following in vivo residence using any of the approaches described above.
In addition to analyzing the effect of in vivo residence on structure of the biologic, it can be helpful to determine if the structural changes result in a measurable effect on safety, efficacy and potency. The structural changes may vary from having little or no effect on function to having a profound effect on function. As a result, the biological information can further define those aspects of the structure of the candidate and reference biologics that are critical to safety, purity, and/or potency.
Once the structural and the optional functional features of the candidate and reference biologics have been determined, the resulting data can be analyzed to determine the effect of in vivo residence on the biologic. The data preferably is analyzed computationally using a conventional computer or computer system to identify structural changes, and to identify critical structure or substructures of the biologic.
This information can then be assimilated to produce a comparative chart that identifies and may quantitatively or statistically compare one or more structural features, or groups of structural features, of the candidate and reference biologics that are modified in vivo, which may affect the function of the biologic, and/or which may have limited or no effect on the function of the biologic. The in vivo comparability profile may include data indicative of whether a feature is more or less common in the candidate biologic drug or a metabolite thereof, relative to the reference biologic drug or a corresponding metabolite thereof. For example, the profile may indicate that the candidate biologic drug is phosphorylated at a different amino acid residue than in the reference biologic drug, or it may indicate that the candidate biologic drug is phosphorylated more or less often at a particular residue than in the reference biologic drug. Similarly, structural similarities between the candidate biologic or a metabolite thereof and a reference biologic or a corresponding metabolite thereof can be used to support the comparability and similarity of the candidate biologic and the reference biologic. Further details regarding determination of in vivo comparability profiles are described in PCT/US2016/053434, the contents of which are incorporated herein by reference in their entirety.
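By way of non-limiting illustration, the comparison of how often a residue is modified in the candidate versus the reference can be sketched in Python as follows. The sketch assumes that each molecule has already been reduced to a mapping from a site label to a fractional occupancy; the function name, the tolerance value, the treatment of an unreported site as zero occupancy, and the example data are all hypothetical and are not taken from the present disclosure.

```python
def comparability_profile(candidate, reference, tolerance=0.05):
    """Compare per-site modification occupancy (fractions between 0 and 1).

    candidate / reference: dict mapping a site label to fractional occupancy.
    tolerance: hypothetical threshold below which a difference is ignored.
    """
    profile = {}
    for site in sorted(set(candidate) | set(reference)):
        cand = candidate.get(site, 0.0)  # assumption: absent means not observed
        ref = reference.get(site, 0.0)
        diff = cand - ref
        if abs(diff) <= tolerance:
            verdict = "comparable"
        elif diff > 0:
            verdict = "more common in candidate"
        else:
            verdict = "less common in candidate"
        profile[site] = {"candidate": cand, "reference": ref, "verdict": verdict}
    return profile


# Made-up phosphorylation occupancies, for illustration only.
candidate = {"S45": 0.30, "T80": 0.10}
reference = {"S45": 0.10, "Y12": 0.20}
for site, row in comparability_profile(candidate, reference).items():
    print(site, row["verdict"])
```

A site present in only one mapping corresponds to the case of modification at a different residue, while a site present in both with differing occupancy corresponds to modification more or less often at the same residue.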
A.iii.i Biosimilar/Reference Comparison and Lot Comparison
A biosimilar/reference comparison study aims to demonstrate biosimilarity between a target biologic and a reference biologic, also referred to as an "originator biologic" within the context of demonstrating biosimilarity. Biosimilar/reference comparison studies leverage analytical methods to structurally characterize both the target biologic and the reference biologic and compare structural features of the target biologic with structural features of the reference biologic. Structural characterization of the target and reference biologics may be carried out via any combination of one or more of the structural characterization studies described in Sections A.iii.a - A.iii.h above.
Biosimilar/reference comparison studies address the expiration of originator biologic drug patent protection and the advent of the biosimilar 351(k) approval pathway, which have triggered a number of changes to the characterization and the regulation of these products. Previously, Biologics License Application (BLA) reviews have primarily focused on the manufacturing process to ensure consistent production of the "same" product through control and validation of process conditions. Because biologic drugs are produced in living cells, small changes in the production conditions can result in larger changes to the drug substance that may impact safety or efficacy. Thus, biosimilar manufacturers may not be able to exactly replicate the originator's manufacturing process for a variety of reasons. Accordingly, biosimilarity to the originator biologic needs to be demonstrated through analytical similarity assessments - that is, the structural characterization of target and reference biologics determined through the use of analytical methods as described above.
A draft guidance document from the US FDA states: "A meaningful assessment as to whether the proposed biosimilar product is highly similar to the reference product depends on, among other things, the capabilities of available state-of-the-art analytical assays to assess, for example, the molecular weight of the protein, complexity of the protein (higher order structure and post-translational modifications), degree of heterogeneity, functional properties, impurity profiles, and degradation profiles denoting stability." (see U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), "Quality Considerations in Demonstrating Biosimilarity of a Therapeutic Protein Product to a Reference Product," April, 2015). This guidance indicates that the design of structural characterization studies for determining structural similarity between a target biologic and a reference corresponding to an originator biologic is a key step for demonstrating analytical similarity and defining residual uncertainty, both in the choice of analytical methods (and the specific analytical stages they comprise) implemented and in the interpretation of the resulting data. Statistical tools for assessing biosimilarity from analytical data obtained via use of analytical methods to characterize a biologic are typically the same ones used to demonstrate comparability between pre- and post-change commercial processes, and include tests of statistical equivalence, statistical tests for differences, statistical intervals, and visualization and summary statistics. Thus, structural characterization of biosimilars and comparisons with the structural characterization of their reference products using analytical methods that comprise various analytical stages such as those described in A.i and A.ii is an essential component of the development and approval process.
A lot comparison study aims to compare structural features of two or more biologics manufactured in different lots. In the lifecycle of a typical biologic product, manufacturing changes to the drug substance and/or the drug product are common. The manufacturer is responsible for demonstrating adequate and appropriate comparability between pre-change and post-change product lots. Lot comparison studies provide structural characterization data that allow a manufacturer of a biologic to demonstrate that the changes (for example, changes in upstream processing such as cell lines or cell culture conditions, or changes in downstream processing such as purification resins, buffers, or formulation, or changes of manufacturing facility or equipment) do not have an adverse effect on the quality, safety and efficacy of the manufactured biologic products. Examples of product quality attributes that may be impacted by manufacturing changes include: glycosylation profile, charge distribution, product impurities, process impurities, aggregates, particulate matter, stability and potency. As with biosimilar/reference comparison studies, any combination of one or more of the structural characterization studies described in Sections A.iii.a - A.iii.h above may be used to obtain structural characterization data used to demonstrate that relevant structural features of biologics are maintained from lot to lot. Accordingly, determining appropriate study designs via selection of appropriate combinations of analytical stages, such as those described in Sections A.i and A.ii above, for characterization of biologics produced in different lots is an essential component of the development and approval process.
B. User Input
In certain embodiments, the study design and method capture technology described herein receives as input a user query comprising information regarding (i) the target biologic and (ii) the particular desired structural characterization.
B.i Target Biologic Input
In certain embodiments, the information regarding the target biologic comprises at least a portion (up to all) of a nominal primary structure of the target biologic. The nominal primary structure is the nominal amino acid sequence that is coded for via the nucleic acid sequence (e.g., a DNA sequence; e.g., an RNA sequence) used to produce the target biologic. The nominal amino acid sequence may cover the full target biologic, or a portion thereof, such as a specific fusion portion of a fusion protein to be analyzed.
In certain embodiments, a user provides the nominal primary structure via a text file that comprises an ordered list of the nominal amino acid sequence of the target biologic. In certain embodiments, other forms of input are used. For example, a user may input a reference number to a public or proprietary database. In certain embodiments, information regarding the target biologic comprises results from previously carried out structural characterization of the target biologic or from generally known information about the biologic (e.g., a molecule type). For example, if a primary structure of the target biologic has been previously characterized via an experimental study, then a user may input the measured primary structure of the target biologic (e.g., in addition to the nominal primary structure; e.g., instead of the nominal primary structure). In certain embodiments, other results from other types of previously carried out studies are included as information about the target biologic. For example, previous measurements of disulfide linkages or glycosylation of the target biologic may be included in the user input query. As shown in FIG. 2A, for example, such information (e.g., glycans, post-translational modifications, disulfide bonds) may be directly input by the user.
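By way of non-limiting illustration, reading a nominal primary structure from the contents of a text file can be sketched in Python as follows. The disclosure does not specify the file layout, so the sketch assumes one-letter amino acid codes, tolerates whitespace and FASTA-style header lines beginning with ">", and rejects anything outside the 20 standard residues. The function name parse_sequence is hypothetical.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes


def parse_sequence(text):
    """Return the amino acid sequence contained in the text of an input file."""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):  # skip blanks and header lines
            continue
        residues.extend(ch.upper() for ch in line if not ch.isspace())
    sequence = "".join(residues)
    if not sequence:
        raise ValueError("no amino acid sequence found in input")
    unexpected = sorted(set(sequence) - VALID_RESIDUES)
    if unexpected:
        raise ValueError(f"unexpected residue codes: {unexpected}")
    return sequence


# Hypothetical file contents, for illustration only.
example = ">target biologic, chain A\nACD EF\nghik\n"
print(parse_sequence(example))
```

Validating the sequence at input time keeps later steps, such as predicting digest fragments, from failing on malformed characters.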
In certain embodiments, the tool receives as input additional information about the target biologic, such as an identifier of a molecule type. Molecule types that may be identified include, but are not limited to, a recombinant protein, a fusion protein, a monoclonal antibody, and an antibody-drug conjugate. A user may input an identifier of a molecule type through a variety of user interactions. For example, molecule type may be identified via a textual label that refers to one or more entries in a molecule type dictionary stored in memory. For example, the label "Fc-fusion" may be used to identify an Fc-fusion protein molecule type. Accordingly, a user may thus provide text input identifying the molecule type at a command line prompt. User input of a molecule type may also be provided via a GUI. For example, a user may select one or more molecule types from a drop down list, or other types of graphical control elements (e.g., radio boxes, check boxes, and the like).
In certain embodiments, the tool receives as input additional information about the bioprocess used to produce the target biologic. Bioprocess information may include an identifier of a cell culture type that is used to produce the target biologic. A user may input an identifier of a cell culture type through a variety of user interactions. For example, a cell culture type may be identified via a textual label that refers to one or more entries in a cell culture type dictionary stored in memory. For example, the label "CHO" may be used to identify that a Chinese Hamster Ovary cell culture was used to produce the target biologic, while the label "E-coli" may be used to identify that the target biologic was produced using an E. coli cell culture. In certain embodiments, various levels of specificity are used to identify a cell culture type. For example, a cell culture type may be identified as bacterial or not, or mammalian or not. In certain embodiments, a user provides text input identifying the cell culture type at a command line prompt. User input of a cell culture type may also be provided via a GUI. For example, a user may select one or more cell culture types from a drop down list, or other types of graphical control elements (e.g., radio boxes, check boxes, and the like). Bioprocess information may also include information about purification stages subsequent to harvesting the cell culture supernatant. For example, bioprocess information may identify purification steps such as Protein A initial purification followed by hydrophobic interaction chromatography. Such steps may be identified by various parameters, such as textual labels of particular purification steps.
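By way of non-limiting illustration, resolving textual labels against a molecule type dictionary and a cell culture type dictionary can be sketched in Python as follows. Only the labels "Fc-fusion", "CHO", and "E-coli" come from the text above; the remaining labels, the dictionary layout, the case-insensitive matching, and the function name resolve_label are assumptions made for the sketch.

```python
MOLECULE_TYPES = {
    "Fc-fusion": "Fc-fusion protein",
    "mAb": "monoclonal antibody",       # hypothetical label
    "ADC": "antibody-drug conjugate",   # hypothetical label
}

# Each entry also carries coarser descriptors, so a cell culture type can be
# characterized at different levels of specificity.
CELL_CULTURE_TYPES = {
    "CHO": {"name": "Chinese Hamster Ovary", "bacterial": False, "mammalian": True},
    "E-coli": {"name": "Escherichia coli", "bacterial": True, "mammalian": False},
}


def resolve_label(label, dictionary, kind):
    """Look up a user-entered label, ignoring case and surrounding spaces."""
    wanted = label.strip().lower()
    for key, value in dictionary.items():
        if key.lower() == wanted:
            return value
    raise KeyError(f"unknown {kind} label: {label!r}")


print(resolve_label("Fc-fusion", MOLECULE_TYPES, "molecule type"))
print(resolve_label(" cho ", CELL_CULTURE_TYPES, "cell culture type")["name"])
```

The same lookup could back either a command line prompt or a drop down list, since both ultimately supply a label drawn from the stored dictionary.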
B.ii Study Class Input
In certain embodiments, the tool also receives as input one or more study class attributes that serve as identifiers of one or more particular desired structural characterizations.
A variety of different structural characterizations are relevant for analysis of biologics. For example, for a given target biologic of interest, a user may be interested in determining one or more of (i) molecular weight, (ii) primary structure, (iii) identification, site localization, and/or quantification of various specific post-translational modifications, (iv) characterization of higher order structures (HOS) (e.g., secondary structures; e.g., tertiary structures; e.g., quaternary structures), (vi) a map of critical quality attributes (CQAs), and (vii) an in vivo comparability profile. In certain embodiments, a user may also be interested in (viii) a biosimilar to reference product comparison, and/or (ix) a lot comparison study (e.g., a lot release study; e.g., lot-to-lot comparison study).
For each of (i) - (ix) above, as well as other types of structural characterization studies, a particular set of data is measured for the target biologic, and, accordingly, a particular sequence of analytical methods is used to obtain the desired measurement data. In order to allow an appropriate study design result to be determined for a given desired type of structural characterization, study class attributes are used to identify one or more types of structural characterization studies. In certain embodiments, the tool uses a dictionary of predefined study class attributes to allow a user to select the particular one or more types of structural characterizations that they would like to obtain for a target biologic. The user may input the study class attributes by inputting specific keywords at a command prompt for entering an input query or selecting options via a GUI (e.g., from a dropdown menu; e.g., via check-boxes).
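By way of non-limiting illustration, checking keywords entered at a command prompt against a dictionary of predefined study class attributes can be sketched in Python as follows. The dictionary entries are drawn from study types named in this section, but the exact label strings, the semicolon separator, the case-insensitive matching, and the function name parse_study_classes are assumptions made for the sketch.

```python
STUDY_CLASS_DICTIONARY = {
    "molecular weight", "primary structure", "disulfide bonds",
    "disulfide knots", "glycosylation", "epitope mapping",
    "biosimilar comparison", "lot comparison", "CQA mapping",
    "in vivo comparability profile",
}


def parse_study_classes(query, separator=";"):
    """Split a query into known study class attributes, in entry order."""
    canonical = {label.lower(): label for label in STUDY_CLASS_DICTIONARY}
    selected, unknown = [], []
    for token in query.split(separator):
        key = " ".join(token.split()).lower()  # normalize spacing and case
        if not key:
            continue
        if key in canonical:
            if canonical[key] not in selected:
                selected.append(canonical[key])
        else:
            unknown.append(token.strip())
    if unknown:
        raise ValueError(f"unrecognized study class attributes: {unknown}")
    if not selected:
        raise ValueError("at least one study class attribute is required")
    return selected


print(parse_study_classes("molecular weight; cqa mapping"))
```

Rejecting unrecognized keywords up front means that a study design is only ever requested for a structural characterization the tool has a definition for.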
For example, a user may enter the text "molecular weight" to indicate that the desired structural characterization that they are interested in is a measurement of a molecular weight of the target biologic. Similar textual labels may be used as study class attributes in order to identify any of the various types of structural characterizations [e.g., molecular weight; e.g., primary structure; e.g., amino acid modifications (e.g., N-terminal modifications; e.g., C-terminal modifications; e.g., oxidation; e.g., deamidation, isomerization, and racemization); e.g., post-translational modifications (e.g., methylation and acetylation; e.g., phosphorylation); e.g., higher order structures (e.g., secondary structure motifs; e.g., tertiary structure; e.g., quaternary structure; e.g., disulfide bonds; e.g., disulfide knots; e.g., glycosylation); e.g., epitope mapping; e.g., biosimilar comparison; e.g., lot comparison; e.g., CQA mapping; e.g., in vivo comparability profile].
In certain embodiments, multiple study class attributes may be entered. In certain embodiments, study class attributes are combined, to add further specificity to a type of desired structural characterization. For example, additional attributes such as "characterization" and/or "quantification" can be used to specify whether a characterization (e.g., a map identifying locations and/or nature of specific types of structural features) is desired, or a quantification study is desired. Additional examples of study class attributes include combinations of structural characterization studies together with study objectives. For example, additional attributes such as "lot release testing" can be combined with "glycosylation analysis" to indicate the purpose of the study. These combinations provide additional specificity with direct impact on study design and method development, as will be illustrated in the detailed examples that follow.
C. Determination of Generalizable Biologic Attributes
In certain embodiments, the study design and method capture technology described herein determines a set of GBAs for the target biologic. The set of target biologic GBAs is determined from received input corresponding to information about the target biologic, such as the nominal primary structure of the target biologic. A variety of GBAs may be determined from a target biologic's nominal primary structure.
For example, numbers or fractions of specific amino acids (e.g., Arginine (also referred to as Arg, or R); e.g., Lysine (also referred to as Lys, or K)), and numbers or fractions of particular types of amino acids (e.g., aromatic amino acids; e.g., hydrophobic amino acids; e.g., hydrophilic amino acids), can be determined and included in the set of target biologic GBAs.
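By way of non-limiting illustration, counting specific amino acids and classes of amino acids in a nominal primary structure can be sketched in Python as follows. The residue groupings are one common simplification and are an assumption of the sketch (classification schemes differ, for example on whether histidine counts as aromatic); the function name composition_gbas and the example sequence are hypothetical.

```python
from collections import Counter

AROMATIC = set("FWY")          # assumption: histidine is not counted
HYDROPHOBIC = set("AVILMFW")   # assumption: one simplified grouping


def composition_gbas(sequence):
    """Counts and fractions of selected residues and residue classes."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    length = len(sequence)

    def group_count(members):
        return sum(counts[aa] for aa in members)

    return {
        "length": length,
        "count_R": counts["R"],
        "fraction_R": counts["R"] / length,
        "count_K": counts["K"],
        "fraction_K": counts["K"] / length,
        "fraction_aromatic": group_count(AROMATIC) / length,
        "fraction_hydrophobic": group_count(HYDROPHOBIC) / length,
    }


# Hypothetical eight-residue sequence, for illustration only.
print(composition_gbas("ARKFWYAA"))
```

Because arginine and lysine are the residues after which trypsin cleaves, their counts also give a first indication of how many tryptic fragments to expect.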
In certain embodiments, characteristics of expected disulfide bridges and knots (e.g., both intra-chain and inter-chain) are predicted using a target biologic's molecule type or may be known a priori from other sources. Characteristics of expected disulfide bridges and knots include a predicted nature and/or location of disulfide bridges and/or knots, as well as an extent of disulfide bridges and/or knots in the target biologic (e.g., number of disulfide bridges and/or knots; e.g., pairs of nested cysteines, unpaired cysteines, and the post-translational modification of cysteine to formylglycine). In certain embodiments, these determined characteristics of disulfide bridges and/or knots are included in the set of target biologic GBAs.
In certain embodiments, metrics characterizing predicted results of various enzymatic digestion steps are determined and used as target biologic GBAs. For example, in certain embodiments, a number of or average size of (e.g., monoisotopic molecular weight of digest fragments; e.g., average molecular weight of digest fragments; e.g., sequence length expressed in terms of number of amino acids) peptide fragments resulting from one or more specific enzymatic digestion steps can be predicted using the target biologic's nominal primary structure, and included in the set of target biologic GBAs. Likewise, statistical distributions (e.g., frequency of fragment lengths or weights) of peptide fragments resulting from one or more specific enzymatic digestion steps can be predicted using the target biologic's nominal structure, and included in the set of target biologic GBAs. In certain embodiments, these predictions of enzymatic digests may be built on deterministic models of enzyme specificity. In other embodiments, these predictions of enzymatic digests may be built on stochastic models of enzyme activity. In certain embodiments, where a molecule type of the target biologic is received as input, the molecule type is used as a target biologic GBA (e.g., included in the set of target biologic GBAs).
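By way of non-limiting illustration, a deterministic prediction of an enzymatic digest and the resulting fragment statistics can be sketched in Python as follows. The sketch assumes trypsin as the enzyme and the commonly used simplified rule that cleavage occurs on the C-terminal side of lysine or arginine unless the next residue is proline, with no missed cleavages. The function names and the example sequence are hypothetical.

```python
from collections import Counter
from statistics import mean


def tryptic_digest(sequence):
    """Cleave after K or R, except when the following residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal fragment without K/R at its end
        fragments.append(sequence[start:])
    return fragments


def digest_gbas(fragments):
    """Number, mean length, and length distribution of digest fragments."""
    lengths = [len(fragment) for fragment in fragments]
    return {
        "fragment_count": len(fragments),
        "mean_length": mean(lengths),
        "length_distribution": dict(Counter(lengths)),
    }


# Hypothetical sequence, for illustration only.
fragments = tryptic_digest("AKRPGKMMA")
print(fragments)
print(digest_gbas(fragments))
```

A stochastic model of enzyme activity could be layered on the same cleavage-site list, for example by treating each site as cleaved with some probability, which is not attempted here.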
Accordingly, in certain embodiments, the set of target biologic GBAs includes GBAs that characterize structural elements that either do or are likely to occur in the target biologic. These structural elements influence the applicability of particular analytical methods to the target biologic. Accordingly, for a given target biologic, the tool utilizes a determined set of target biologic GBAs to identify analytical methods that can be used to obtain a desired structural characterization of the target biologic, as identified by the input study class. In particular, as described in the following, a set of target biologic GBAs is compared with GBAs associated with analytical method records stored in the method store in order to determine the sequences of analytical methods that form the study design results.
In certain embodiments, GBAs of a target biologic are determined via a preprocessing step, which may be implemented by a preprocessing module, as shown in FIG. 2A. The preprocessing module may be the same preprocessing module that is used to determine GBAs of known biologics, for inclusion in the method store as described below, and illustrated in FIG. 1A and FIG. 1B. FIG. 1B shows an example of output from a preprocessing module that illustrates several GBAs, including identification of cleavage sites by particular enzymes in a protein primary sequence, identification of N-linked glycosylation motifs, digest fragment statistics and mass tabulations, amino acid statistics, and various physicochemical properties.
In certain embodiments, GBAs of the target biologic include one or more study class attributes, input by the user as described above. In certain embodiments, GBAs of a given target include one or more bioprocess attributes that represent parameters of a biomanufacturing process used to produce the given biologic. Such bioprocess attributes may be input directly by a user, as described above.
D. Method Store
FIG. ID is a block flow diagram illustrating an exemplary process 140 for populating a method store. At step 142, an analytical stage record is created. The analytical stage record corresponds to a specific analytical stage that has been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic. At step 144, the analytical stage record is stored in the method store. At step 146, one or more GBAs of the associated known biologic are stored in the method store. At step 146, the one or more known biologic GBAs are linked with the analytical stage record.
D.i Analytical Method Records
In certain embodiments, the method store is a database that stores a plurality of analytical method records and/or a plurality of analytical stage records. In certain embodiments, each analytical method record is a data structure that represents a particular analytical method that was (e.g., previously) applied to characterize a specific associated biologic. Similarly, in certain embodiments, each analytical stage record is a data structure that represents a particular analytical stage that was (e.g., previously) applied (e.g., as part of an analytical method) to characterize a specific associated known biologic.
An analytical stage record includes data representing (i) an identifier of the particular analytical stage it represents and (ii) a series of parameter values that were used in the application of the analytical stage that the analytical stage record represents. For example, an analytical stage record representing an enzymatic digestion stage may be represented by a data structure such as the one shown in FIG. 3. The analytical stage record, "AS1", includes a series of field/value pairs that identify the particular type of analytical stage along with a set of parameters used in application of the stage. In particular, the "stage" field stores a text label (e.g., a string value) that identifies the particular stage. In the example of FIG. 3, the label "single digest" identifies a single digest enzymatic digestion stage. Example parameter fields shown in FIG. 3 include an "enzyme map" field that stores a text label identifying the particular type of enzyme used (e.g., trypsin in the case of FIG. 3), an "incubation time" field that stores a value specifying the amount of time (e.g., in minutes; e.g., via a numeric value) of incubation, an "incubation temp" field that stores a value specifying a temperature at which incubation was performed (e.g., in degrees Celsius; e.g., via a numeric value), and an "incubation pH" field that stores a value specifying a pH at which the incubation was performed (e.g., via a numeric value).
FIG. 1C and FIG. 4 show example hierarchical organizations of various analytical stages, along with relevant parameters for various stages that may be represented by analytical stage records. In certain embodiments, various other analytical stages may be utilized and represented by analytical stage records.
An analytical method record may include a sequence of analytical stage records representing the sequence of analytical stages used in the analytical method that the analytical method record represents.
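The record structures described above can be sketched in Python as follows. This is a minimal illustration, not the schema of FIG. 3: the class names, the `record_id` field, and the identifier "AM1" are hypothetical, and the numeric incubation values are placeholders chosen for the example, since the figure's actual values are not reproduced in the text.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyticalStageRecord:
    """One analytical stage plus the parameters used when it was applied."""
    record_id: str
    stage: str  # text label identifying the stage type
    parameters: dict = field(default_factory=dict)


@dataclass
class AnalyticalMethodRecord:
    """An ordered sequence of stage records forming one analytical method."""
    record_id: str
    stages: list = field(default_factory=list)

    def stage_labels(self) -> list:
        # Preserve the order in which the stages were applied.
        return [s.stage for s in self.stages]


as1 = AnalyticalStageRecord(
    record_id="AS1",
    stage="single digest",
    parameters={
        "enzyme map": "trypsin",
        "incubation time": 240,   # minutes (placeholder value)
        "incubation temp": 37.0,  # degrees Celsius (placeholder value)
        "incubation pH": 8.0,     # placeholder value
    },
)

am1 = AnalyticalMethodRecord(
    record_id="AM1",
    stages=[
        AnalyticalStageRecord("AS0", "sample preparation"),
        as1,
        AnalyticalStageRecord("AS2", "mass spectrometry"),
    ],
)
```

Keeping the parameters in a free-form dictionary lets different stage types carry different parameter fields without changing the record type.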
Accordingly, the method store may store information about analytical methods used to characterize known biologics by virtue of a plurality of analytical method records, each of which represents a particular analytical method comprising a sequence of analytical stages, or via a plurality of individual analytical stage records, each of which represents an analytical stage used in an analytical method applied in characterizing a known biologic. Approaches for linking GBAs with analytical method records, described below, may be applied equivalently to analytical stage records. Similarly, machine learning techniques described for identifying groups of related analytical methods represented by analytical method records may be applied to identify groups of related analytical stages represented by analytical stage records.
An example hierarchical organization 400 of various analytical stages is shown in FIG. 4. The organization 400 includes a sample preparation stage 410, a digestion strategy stage 420, a separation stage 430, a detection stage 440, and a mass spectrometry stage 450. Each stage can be associated with an analytical stage record which includes detailed process step information and parameters.
D.ii Association of Analytical Stage and Method Records with GBAs
In certain embodiments, in order to allow patterns in different analytical methods and analytical stages thereof to be identified and used to determine analytical method results to be included in various study design results for characterizing target biologics, each analytical method record is associated (e.g., linked, as in a stored association in computer code/memory) with one or more (e.g., a plurality of) GBAs of the biologic with which it is associated. Similarly, in certain embodiments, analytical stage records are stored in the method store, and each analytical stage record is associated (e.g., linked, as in a stored association in computer code/memory) with one or more (e.g., a plurality of) GBAs of the biologic with which it is associated.
An example process for building a method store, including the determination of GBAs, is shown in FIG. 1A and FIG. 1B. As shown in the figures, a preprocessing step ("Biologic molecule pre-processing") is used to determine, for a given known biologic, a set of GBAs. The set of GBAs may be determined from an amino acid sequence of the known biologic. Additionally, because the known biologic has been characterized, additional information such as experimentally characterized glycans, post-translational modifications, disulfide bonds, and the like, may also be used to determine GBAs of the known biologic.
In certain embodiments, for a given analytical method record, the one or more GBAs associated with the analytical method record are determined and stored when an analytical method record is created. Once the GBAs associated with a particular analytical method record are determined, they may be stored within the analytical method record, or elsewhere, and linked with the analytical method record. FIG. 5 shows an example organization of analytical method records and their links to GBAs in the method store. GBA sets 502, 504, 506, and 508 are linked with analytical methods 512, 514, 516, and 518, respectively. As will be described below, each analytical method record is not necessarily linked to a unique set of GBAs.
In certain embodiments, once the GBAs associated with a particular analytical method record are determined it is no longer necessary for any record of an identification of a specific target biologic associated with the analytical method record to be maintained. Accordingly, in certain embodiments, a given analytical method record in the method store is linked to one or more GBAs of an associated biologic, but other identifying information of the associated biologic (e.g., a nominal primary structure of the associated biologic; e.g., any measured data from a study performed on the associated biologic) is not stored. In this manner, in certain embodiments, linking a given analytical method record with GBAs of an associated biologic provides for safeguards relevant to data security and confidentiality considerations.
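The linking scheme described above, in which a record is linked to GBAs while identifying information about the associated biologic is not retained, can be sketched as follows. This is a toy in-memory layout under stated assumptions: the `MethodStore` class, its attribute names, the "GBA1"-style identifiers, and the two GBAs computed by `compute_gbas` are all hypothetical and stand in for whatever preprocessing and storage an implementation actually uses.

```python
def compute_gbas(sequence: str) -> dict:
    """Toy preprocessing step: derive two illustrative GBAs from a sequence."""
    return {"length": len(sequence), "cysteine_count": sequence.count("C")}


class MethodStore:
    """Hypothetical in-memory method store that keeps GBAs but not sequences."""

    def __init__(self):
        self.method_records = {}   # method id -> list of stage labels
        self.gba_sets = {}         # GBA set id -> dict of GBAs
        self.links = {}            # method id -> GBA set id
        self._ids_by_content = {}  # GBA content -> GBA set id

    def add_method(self, method_id: str, stages: list, sequence: str) -> str:
        gbas = compute_gbas(sequence)
        content = tuple(sorted(gbas.items()))
        gba_set_id = self._ids_by_content.get(content)
        if gba_set_id is None:
            # First time this GBA set is seen: store it under a new id.
            gba_set_id = f"GBA{len(self.gba_sets) + 1}"
            self._ids_by_content[content] = gba_set_id
            self.gba_sets[gba_set_id] = gbas
        self.method_records[method_id] = list(stages)
        self.links[method_id] = gba_set_id
        # The sequence itself is deliberately not stored anywhere.
        return gba_set_id
```

Because GBA sets are keyed by content, two method records whose associated biologics yield the same GBAs share one stored set, consistent with records not necessarily being linked to unique sets of GBAs.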
In certain embodiments, an analytical method record stores, or is linked to information about the associated biologic (e.g., a nominal or measured primary structure of the associated biologic; e.g., a molecule type of the associated biologic). In this manner, one or more GBAs for the analytical method record can be determined as needed. Accordingly, in certain embodiments, if new or different sets of GBAs are relevant for a given application (e.g., addressing a given user input query), they can be determined using the stored or linked information about the associated biologic.
The same approaches described above for linking GBAs with analytical method records can be applied similarly to link GBAs with analytical stage records. In particular, in certain embodiments, for a given analytical stage record, the one or more GBAs associated with the analytical stage record are determined and stored when an analytical method record is created. Once the GBAs associated with a particular analytical stage record are determined, they may be stored within the analytical stage record, or elsewhere, and linked with the analytical stage record. As will be described below, each analytical stage record is not necessarily linked to a unique set of GBAs.
In certain embodiments, once the GBAs associated with a particular analytical stage record are determined it is no longer necessary for any record of an identification of a specific target biologic associated with the analytical stage record to be maintained. Accordingly, in certain embodiments, a given analytical stage record in the method store is linked to one or more GBAs of an associated biologic, but other identifying information of the associated biologic (e.g., a nominal primary structure of the associated biologic; e.g., any measured data from a study performed on the associated biologic) is not stored. In this manner, in certain embodiments, linking a given analytical stage record with GBAs of an associated biologic provides for safeguards relevant to data security and confidentiality considerations.
In certain embodiments, an analytical stage record stores, or is linked to information about the associated biologic (e.g., a nominal or measured primary structure of the associated biologic; e.g., a molecule type of the associated biologic). In this manner, one or more GBAs for the analytical stage record can be determined as needed. Accordingly, in certain embodiments, if new or different sets of GBAs are relevant for a given application (e.g., addressing a given user input query), they can be determined using the stored or linked information about the associated biologic.
In certain embodiments, an analytical method record stores, or is linked with, GBAs corresponding to study class attributes that identify the particular type of structural characterization study and/or study objective in which the analytical method that the analytical method record represents was applied. Similarly, in certain embodiments, an analytical stage record stores, or is linked with, GBAs corresponding to study class attributes that identify the particular type of structural characterization study and/or study objective in which the analytical stage that the analytical stage record represents was applied.
In certain embodiments, the study class attributes are textual labels. In particular, in certain embodiments, the study class attributes are selected from a set of predefined study class attributes (e.g., stored in a dictionary). For example, an analytical stage record such as the analytical stage record shown in FIG. 3 may also include or be linked with the name/value pair:
"study class": "di-sulfide linkage";.
The name/value pair accordingly indicates that the particular analytical stage represented by the analytical stage record represents an analytical stage (e.g., a trypsin digestion step) that was used in a study characterizing di-sulfide linkages in an associated biologic. An analytical method record representing an analytical method used in a study characterizing di-sulfide linkages in an associated known biologic might store, or be linked with a similar name/value pair.
In certain embodiments, analytical method records and/or analytical stage records are linked with, but need not store, study class attributes. For example, links between (i) analytical method records and/or analytical stage records and (ii) various study class attributes may be established via a series of lists (e.g., represented as arrays), each corresponding to a particular study class attribute. A list corresponding to a particular study class attribute may include identifiers (e.g., textual labels; e.g., pointers) of (i) analytical method records that represent analytical methods used in the particular type of structural characterization study that the corresponding particular study class attribute identifies and/or (ii) analytical stage records that represent analytical stages used in the particular type of structural characterization study that the corresponding particular study class attribute identifies.
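The list-based linking described above can be sketched as an index that maps each study class attribute to a list of record identifiers. The function names and the record identifiers "AS1", "AM1", and "AS7" are hypothetical, and the "glycosylation site" label is used only as a second example attribute; the "di-sulfide linkage" label follows the name/value pair given above.

```python
from collections import defaultdict


def link_study_class(index: dict, study_class: str, record_id: str) -> None:
    """Add a record identifier to the list kept for one study class attribute."""
    if record_id not in index[study_class]:
        index[study_class].append(record_id)


def records_for(index: dict, study_class: str) -> list:
    """Return identifiers of records linked with the given study class."""
    return list(index.get(study_class, []))


# One list per study class attribute; records themselves store nothing extra.
study_class_index = defaultdict(list)
link_study_class(study_class_index, "di-sulfide linkage", "AS1")
link_study_class(study_class_index, "di-sulfide linkage", "AM1")
link_study_class(study_class_index, "glycosylation site", "AS7")
```

The same index shape would serve for other attributes that are linked rather than stored, such as a cell culture type.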
In certain embodiments, an analytical method record stores or is linked to GBAs representing additional information about the bioprocess used to produce the associated biologic that the analytical method represented by the analytical method record was used to characterize. In certain embodiments, an analytical stage record stores or is linked to GBAs representing additional information about the bioprocess used to produce the associated biologic that the analytical stage represented by the analytical stage record was used to characterize.
As with bioprocess information about a target biologic queried by a user, bioprocess information about an associated biologic may include an identifier of a cell culture type that was used to produce the associated biologic. In certain embodiments, a cell culture type may be identified via a textual label, and stored as a name/value pair in the analytical method record. In certain embodiments, analytical method records and/or analytical stage records are linked with, but need not store, particular cell culture types. For example, links between (i) analytical method records and/or analytical stage records and (ii) various cell culture types may be established via a series of lists (e.g., represented as arrays), each corresponding to a particular cell culture type. A list corresponding to a particular cell culture type may include identifiers (e.g., textual labels; e.g., pointers) of (i) analytical method records that represent analytical methods applied to biologics produced using the particular cell culture type and/or (ii) analytical stage records that represent analytical stages applied to biologics produced using the particular cell culture type. In certain embodiments, other additional bioprocess information may be stored in or linked with analytical method records and/or analytical stage records in a similar fashion. For example, additional bioprocess information may include information about primary recovery, initial purification, polishing, and/or formulation stages of the biomanufacturing process used in the manufacture of the biologic molecule.
D.iii Performance Indices
In certain embodiments, an analytical method record may also include (e.g., be linked with or comprise) one or more performance indices that quantify performance of the analytical method that the analytical method record represents. For example, performance indices associated with particular analytical method records may include values representing a percent coverage of the biologic molecule's primary amino acid sequence, a sensitivity of detection of specific modifications (e.g., threshold quantification of modified peptides as a percentage of the corresponding unmodified peptides), a sensitivity of detection of sequence variants (e.g., as a percentage of the unmodified sequence), and a sensitivity of detection of impurities such as host cell proteins (e.g., in parts per million or ppm). As with analytical method records, in certain embodiments, analytical stage records may be linked with or comprise performance indices. Performance indices linked with or comprised in a given analytical stage record may be performance indices of an analytical method comprising the analytical stage that the given analytical stage record represents, or the performance indices may be representative of the analytical stage record itself.
D.iv Populating the Method Store
The analytical method records and/or analytical stage records in the method store may be obtained from a variety of sources. For example, analytical method records and/or analytical stage records may be created from publicly available records of studies (e.g., published literature using article databases such as PubMed (http://www.ncbi.nlm.nih.gov/pubmed/); e.g., public data repositories such as the Proteomics Identifications [PRIDE] Archive database (https://www.ebi.ac.uk/pride/archive/) or the UniProtKB/Swiss-Prot database of annotated functional information on proteins (http://www.uniprot.org/)) or from in-house studies.
In certain embodiments, a given study of a particular biologic may include one or more analytical methods, each comprising multiple analytical stages. Accordingly, a given study can be used to generate one or more analytical method records, each corresponding to a respective analytical method of the given study, as well as multiple analytical stage records, each representing a respective analytical stage of an analytical method used in the given study.
For example, as shown in FIG. 6, a first study 610 ("Study A") employs a single analytical method 612 ("Anal. Meth. A1"), which comprises N analytical stages, including a sample preparation stage ("Anal. Stage A1.1"), an enzymatic digestion stage ("Anal. Stage A1.2"), and a mass spectrometry stage ("Anal. Stage A1.N"). The various analytical stages included in the first study are used to create corresponding analytical stage records that are stored in the method store. Each of the corresponding analytical stage records stores, as described above, the particular type of analytical stage that it represents along with the parameters used in application of the stage in the first study ("Study A"). In certain embodiments, the analytical stage records created from the first study ("Study A") also store or are linked with additional information, such as one or more study class attributes that identify the particular structural characterization carried out in the first study, a bioprocess used to produce a biologic characterized in the study, and/or performance indices determined directly within the study, or obtained from data produced by the study. In certain embodiments, an analytical method record representing the analytical method used in the first study is created, and comprises a sequence of analytical stage records representing the stages of the analytical method. As with analytical stage records created from the first study, the analytical method record created from the first study may store or be linked with one or more study class attributes, bioprocess information, and performance indices determined within the study.
Data stored in analytical stage records and/or analytical method records can be obtained from a source that describes a study in a variety of ways. In certain embodiments, the source is a published document, and an analytical stage record and/or analytical method record is created from the source manually, by an expert or a technician who reads and interprets the study, and inputs data stored in the analytical stage record and/or analytical method records manually. In certain embodiments, analytical stage records and/or analytical method records may be created from published documents via automated processing using text mining and/or natural language processing. In certain embodiments, a hybrid combination of interaction with a user and automated processing is used to create analytical stage records and/or analytical method records from published documents. In certain embodiments, analytical stage records and/or analytical method records generated from in-house studies are created in an automated fashion via dedicated software as part of a laboratory information management system. In certain embodiments, data (e.g., type of analytical stage; e.g., parameters; e.g., bioprocess information; e.g., performance indices) stored in analytical stage records and/or analytical method records are directly obtained from a study source. In certain embodiments, data stored in analytical stage records and/or analytical method records is computed from data in a study source. In certain embodiments, GBAs are determined for one or more known biologics characterized in a study, and linked with the analytical stage records and/or analytical method records created from the study. In particular, for a given known biologic characterized in a given study, GBAs may be determined based on information about the known biologic and the particular study. For example, as shown in FIG. 6, GBA sets 632, 634, and 636 are linked to analytical methods 612, 622, and 624, respectively.
In certain embodiments, a nominal primary structure of the known biologic is used to determine a variety of GBAs in a manner similar to that described above with regard to determination of GBAs from nominal primary structure of a target biologic.
In certain embodiments, known information about the known biologic is used to determine GBAs. Such known information may include results of the study itself, other information included in the study source, information from other studies performed on the same known biologic, and generally available information, such as information stored in public databases. For example, results of the study itself may be used to determine GBAs of the known biologic: a particular structural characterization study that identifies glycosylation sites of a particular known biologic can be used to directly provide GBAs corresponding to glycosylation sites. Such GBAs may be used instead of, or in addition to, GBAs corresponding to predicted glycosylation sites.
In certain embodiments, a particular study employs multiple analytical methods to characterize a given known biologic, each analytical method comprising a sequence of analytical stages. For example, as shown in FIG. 6, a second study 620 ("Study B") includes two analytical methods 622 and 624. An analytical stage record representing each analytical stage used in each analytical method of the study ("Study B") may be created and stored in the method store. Similarly, an analytical method record can be created for each analytical method of the study. Each analytical stage record and analytical method record used in the characterization of the known biologic of the study can be linked with GBAs determined for the known biologic that they were used to characterize, as described above.
In certain embodiments, a particular study source comprises multiple studies, performed on multiple known biologics. Accordingly, analytical stage records and/or analytical method records can be created for each analytical stage and/or method of each study, and linked with GBAs of the known biologic that they were used to characterize.
E. Identification of Analytical Study Parameters
In certain embodiments, the systems and methods described herein utilize (e.g., via a machine learning approach) the method store to determine one or more study design results in response to a user query. In particular, the approaches described herein use the GBAs associated with analytical method records and/or analytical stage records of the method store and determined GBAs of a target biologic to identify relevant analytical method records and/or analytical stage records. Relevant analytical method records and/or analytical stage records can be used to determine analytical stage results that represent specific analytical stages to be applied to the target biologic in its characterization. Sequences of determined analytical stage results may be combined to form analytical method results, representing specific analytical methods to be applied. Determined study design results may include a single analytical method result, or a combination of two or more analytical method results.
FIG. 7 is a block flow diagram illustrating an example process 700 for determining study design results. In certain embodiments, a user input is received 710. As described above, the user input comprises information about the target biologic along with one or more study class parameters that identify particular desired structural characterizations of the target biologic. In another step, target biologic GBAs are determined 720 using the information about the target biologic input by the user. For example, as described above, the information about the target biologic may comprise a nominal primary structure from which target biologic GBAs are determined. In certain embodiments, once the target biologic GBAs are determined, the method store is accessed, and the target biologic GBAs are compared with known biologic GBAs linked to analytical method records and/or analytical stage records of the method store in order to determine study design results 730.
FIG. 2A illustrates an example interaction 200 between various components (e.g., modules, databases, and data elements) in determining study design results. A received user input 202 includes GBAs corresponding to study class attributes ("Study Class"), information about the target biologic, and information about bioprocess parameters.
Information about the target biologic input by the user includes a nominal amino acid sequence of the target biologic ("Nominal AA Sequence"). The biologic preprocessing module 110 determines, using the user input nominal amino acid sequence, a plurality of GBAs corresponding to structural features of the target biologic. Additional user inputs of study class attributes, an identification of the biologic molecule type, and bioprocess parameters are directly used as GBAs of the target biologic. A machine learning module 204 takes the GBAs of the target biologic (e.g., the GBAs determined from the nominal amino acid sequence by the biologic molecule preprocessing module, the study class attributes, the user input molecule type identifier, and the bioprocess parameters) as input. The machine learning module 204 compares the GBAs of the target biologic with GBAs of known biologics for which analytical method records and/or analytical stage records are stored in the method store 120, and, based on this comparison, identifies relevant analytical method records and/or analytical stage records and uses them to create a study design result. In particular, the machine learning module 204 identifies patterns within the GBAs linked to the analytical method records and/or analytical stage records of the method store, and uses the identified patterns to identify relevant analytical method records and/or analytical stage records based on GBAs of the target biologic. The identified relevant analytical method records and/or analytical stage records can be used to determine analytical stage results of one or more analytical method results, thereby determining a study design result 206.
In this manner, as illustrated in the example 220 in FIG. 2B, from a machine learning perspective, the known biologics 224 act as training examples used to construct the method store 226. The patterns of GBA and analytical method record relationships within the method store 226 act as a "hypothesis set" provided to a learning algorithm 228 when determining the analytical methods 230 for a target biologic (e.g., the "final hypothesis", which is an approximation to the unknown "target function" 222).
In certain embodiments, comparing target biologic GBAs with known biologic GBAs to determine study design results comprises performing classification and/or cluster analysis. For example, as shown in FIG. 8A and FIG. 8B, cluster analysis is used to categorize analytical method records in the method store as belonging to one or more particular analytical method groups based on the set of known biologic GBAs that they are linked to. In particular, in certain embodiments, one or more GBA vectors are determined for each set of known biologic GBAs.
In certain embodiments, a GBA vector is determined as a weighted sum of a subset of a given set of GBAs (e.g., a set of target biologic GBAs; e.g., a set of known biologic GBAs). Different GBA vectors are determined using different weightings and/or different subsets of GBAs. For a given set of GBAs, determined values of each of the GBA vectors identify a point in a multi-dimensional space (e.g., the number of dimensions corresponding to the number of GBA vectors). Accordingly, each set of known biologic GBAs maps to a point in a multi-dimensional space.
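The mapping described above, from a set of GBAs to a point in a multi-dimensional space, can be sketched as follows. The sketch assumes that the GBAs involved are numeric and that each GBA vector is defined by a dictionary of weights over a subset of GBA names; the function names, GBA names, and weights are hypothetical and chosen only for illustration.

```python
def gba_vector_value(gbas: dict, weights: dict) -> float:
    """Weighted sum over the subset of GBAs named in `weights`."""
    return sum(weight * gbas[name] for name, weight in weights.items())


def map_to_point(gbas: dict, vector_definitions: list) -> tuple:
    """Evaluate every GBA vector; N definitions give a point in N dimensions."""
    return tuple(gba_vector_value(gbas, weights) for weights in vector_definitions)


# Two GBA vectors, each using a different subset and weighting of the GBAs.
vector_definitions = [
    {"cysteine_count": 0.5, "n_glycosylation_motifs": 1.0},
    {"mean_fragment_length": 0.5},
]

known_biologic_gbas = {
    "cysteine_count": 4,
    "n_glycosylation_motifs": 2,
    "mean_fragment_length": 10.0,
}

point = map_to_point(known_biologic_gbas, vector_definitions)
```

Categorical GBAs such as a molecule type would need a numeric encoding before they could enter a weighted sum; that step is outside this sketch.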
FIG. 8A shows a graph illustrating use of cluster analysis to group related analytical method records based on the GBAs with which they are linked. As described above, an analytical method record represents a specific analytical method and is linked with GBAs of an associated known biologic that was characterized using the specific analytical method. In FIG. 8A, for each known biologic, values of two or more GBA vectors, GBA_i and GBA_j, are determined. The determined values of the two or more GBA vectors map each set of known biologic GBAs to a point in the illustrated two-dimensional space, as identified by the green "x's". FIG. 8A shows three identified analytical method clusters: AMU, AMV, and AMW. As shown in the figure, each cluster corresponds to a region of the two-dimensional space represented via the two GBA vectors.
In certain embodiments, in order to identify relevant analytical methods for a given target biologic, values of the two or more GBA vectors are determined for the set of GBAs of the target biologic. The values of the two or more GBA vectors for the target biologic thus map the GBAs of the target biologic to a point in two- or higher-dimensional space. In certain embodiments, the target biologic can be identified as belonging to a particular analytical method cluster based on whether its GBAs map to a point within the region in space to which the analytical method cluster corresponds. Two such examples are shown in FIG. 8A, wherein the GBAs of two different target biologics map to two different points in two-dimensional space, as indicated by the red "+'s". As shown in the figure, a first target biologic is associated with analytical method cluster AMV, and a second target biologic is associated with analytical method cluster AMU. In this manner, various combinations of analytical methods relevant to a given target biologic may be identified by cluster analysis. Multiple different sets of GBA vectors can be used, either in combination (e.g., such that N GBA vectors define an N-dimensional space) or in multiple rounds (e.g., using a first set of N GBA vectors in a first round and a second set of M GBA vectors in a second round).
Different sets of GBA vectors can be used to define different multi-dimensional spaces and to map various known and target biologies to different points in these spaces based on the specific combinations and weightings of various GBAs from which the different GBA vectors are computed.
In certain embodiments, use of multiple different sets (e.g., in multiple rounds) of GBA vectors is valuable if an analytical method cluster cannot be identified for a given target biologic using a first set of GBA vectors. For example, as shown in FIG. 8B, in certain embodiments, a particular set of GBA vectors maps GBAs of a target biologic to a point that does not fall within any of the regions corresponding to identified analytical method clusters (e.g., as illustrated via the red "x" in FIG. 8B). Accordingly, another set of GBA vectors may be used to associate the target biologic with a particular analytical method cluster.
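The multi-round cluster lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the circular cluster regions, the GBA names `g1` to `g3`, the weights, and the helper names `project`, `assign_cluster`, and `assign_in_rounds` are all hypothetical. Each GBA vector value is computed as a weighted sum of numeric GBAs, and a second set of GBA vectors is tried only when the first set maps the target biologic outside every cluster region.

```python
import math

# Hypothetical GBA-vector definitions: each coordinate is a weighted sum of GBAs.
WEIGHTS_1 = [{"g1": 1.0}, {"g2": 1.0}]
WEIGHTS_2 = [{"g1": 1.0, "g3": 1.0}, {"g2": 0.5, "g3": 2.0}]

# Hypothetical cluster regions per round: name -> (center, radius).
ROUND_1 = {"AMU": ((1.0, 4.0), 1.0), "AMV": ((4.0, 4.0), 1.0), "AMW": ((2.5, 1.0), 1.0)}
ROUND_2 = {"AMU": ((0.0, 0.0), 2.0), "AMV": ((5.0, 5.0), 2.0)}

ROUNDS = [(WEIGHTS_1, ROUND_1), (WEIGHTS_2, ROUND_2)]


def project(gbas, weight_sets):
    """Map numeric GBAs to a point; one coordinate per GBA vector."""
    return tuple(
        sum(w * gbas.get(name, 0.0) for name, w in weights.items())
        for weights in weight_sets
    )


def assign_cluster(point, clusters):
    """Return the cluster whose region contains the point, or None."""
    best, best_dist = None, math.inf
    for name, (center, radius) in clusters.items():
        d = math.dist(point, center)
        if d <= radius and d < best_dist:
            best, best_dist = name, d
    return best


def assign_in_rounds(gbas, rounds):
    """Try each set of GBA vectors in turn; return (round index, cluster) or None."""
    for index, (weight_sets, clusters) in enumerate(rounds):
        cluster = assign_cluster(project(gbas, weight_sets), clusters)
        if cluster is not None:
            return index, cluster
    return None


target = {"g1": 2.5, "g2": 2.8, "g3": 1.5}
print(assign_in_rounds(target, ROUNDS))
```

With these illustrative numbers, the first set of GBA vectors places the target between the three regions, and the second set places it inside the region assumed for cluster AMV.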
In certain embodiments, a non-linear transform is used in combination with computation of GBA vectors to separate groups of analytical methods of known biologics into different clusters. FIG. 9 shows an example of an approach wherein a non-linear transform is applied to values of GBA vectors determined for a series of known biologics to determine transformed biologic attribute (TBA) vectors. As shown in the figure, applying a non-linear transform in this manner allows attributes that were previously non-separable (e.g., as shown in the left-hand graph) to be separated via a linear function (e.g., a line in two-dimensional space, as shown in the right-hand graph).
In certain embodiments, the same cluster analysis approaches described herein can be applied to identify analytical stages relevant to a given target biologic. In particular, in a manner similar to that described above with respect to identification of analytical methods, GBA vector values determined for known biologics can be used to identify groups of related analytical stages (e.g., as analytical stage clusters) and associate target biologics with particular analytical stage clusters based on a mapping of target biologic GBAs to points in multi-dimensional spaces defined by various sets of GBA vectors. Multiple sets of GBA vectors and non-linear transformation approaches may also be utilized for identification of relevant analytical stages.
In certain embodiments, various other machine learning approaches may be utilized, in combination with or in place of the cluster analysis approach described with respect to FIG. 8A, FIG. 8B, and FIG. 9. For example, in addition to cluster analysis, a variety of other unsupervised machine learning techniques, such as k-means clustering and self-organizing maps, may be used. Unsupervised machine learning techniques are useful where training data, such as the analytical method and/or analytical stage records of known biologics, does not include output information that characterizes the performance or suitability of a particular analytical method or stage (represented by a record in the method store) for a given known biologic. In this manner, unsupervised learning is viewed as the task of finding patterns and structure in input data - a way of creating a higher-level representation of the data.
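A small worked sketch may help make the effect of the non-linear transform concrete. The specific transform (squaring each coordinate), the sample points, and the names `to_tba` and `linear_side` are assumptions chosen for illustration; the source does not specify which transform is used. Here one group of points surrounds the other, so no straight line separates them in the original space, yet after the transform the circle x² + y² = 2 becomes the line u + v = 2.

```python
# Hypothetical GBA-vector values for two groups of known biologics.
# The inner group is surrounded by the outer group, so no single line
# separates the two groups in the original space.
inner = [(0.5, 0.0), (0.0, -0.6), (-0.4, 0.4), (0.3, -0.3)]
outer = [(2.0, 0.0), (0.0, -2.2), (-1.5, 1.5), (1.6, -1.4)]


def to_tba(gba_point):
    """Assumed non-linear transform: square each coordinate."""
    x, y = gba_point
    return (x * x, y * y)


def linear_side(tba_point, weights=(1.0, 1.0), bias=-2.0):
    """Evaluate a linear function in TBA space; True means the 'outer' side."""
    u, v = tba_point
    return weights[0] * u + weights[1] * v + bias > 0


for label, group in (("inner", inner), ("outer", outer)):
    print(label, [linear_side(to_tba(p)) for p in group])
```

Every inner point falls on one side of the line and every outer point on the other, which is the kind of linear separability the right-hand graph of FIG. 9 is described as showing.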
In certain embodiments, performance indices included within and/or linked to analytical method records and/or analytical stage records are used as a measure of performance of the particular analytical method or stage that a given analytical method record or analytical stage record, respectively, represents. This allows reinforcement learning machine learning techniques to be used.
In certain embodiments, through use of the study design and method capture platform described herein, analytical method records and analytical stage records can be generated from corresponding analytical method results and analytical stage results, respectively. Analytical method records and analytical stage records generated in this fashion can be linked with the GBAs of the target biologic that were received as input in order to determine the corresponding analytical method and analytical stage results. Thus, such analytical method and analytical stage records represent examples of desired, or correct, outputs for known inputs - the target biologic GBAs that were used to determine them.
Accordingly, analytical method records and analytical stage records can be used as training data for supervised machine learning techniques, such as artificial neural networks, decision trees, regression models, and k-nearest neighbor techniques.
In certain embodiments, multiple machine learning techniques are used in combination. For example, in certain embodiments, an unsupervised machine learning technique (e.g., cluster analysis) is used as a precursor to a supervised machine learning technique.
In this manner, by virtue of the specific manner in which the method store codifies knowledge regarding previously applied analytical methods and stages thereof via analytical method records and/or analytical stage records that are linked with GBAs of known biologics, the systems and methods described herein allow a variety of machine learning techniques to be utilized to determine study design results for characterizing target biologics.
F. Study Design Results and Iterative Refinement of Study Design Results
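As one illustration of the supervised case, a k-nearest neighbor lookup over stored records can be sketched in a few lines. The record contents, the feature values, the labels `AM-1` and `AM-2`, and the function name `knn_predict` are hypothetical placeholders; in practice the features would need scaling so that no single GBA dominates the distance.

```python
import math
from collections import Counter

# Hypothetical training data: (numeric GBA features, analytical method label).
# Each pair stands in for an analytical method record linked with the GBAs
# of the biologic for which that method was determined.
RECORDS = [
    ((0.50, 4.0), "AM-1"),
    ((0.45, 3.0), "AM-1"),
    ((0.55, 5.0), "AM-1"),
    ((0.20, 0.0), "AM-2"),
    ((0.25, 1.0), "AM-2"),
    ((0.30, 0.0), "AM-2"),
]


def knn_predict(query, records, k=3):
    """Return the majority label among the k records nearest to the query."""
    nearest = sorted(records, key=lambda rec: math.dist(query, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((0.5, 4.2), RECORDS))
print(knn_predict((0.2, 0.5), RECORDS))
```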
In certain embodiments, once determined, study design results may be provided to a user in a variety of forms. A study design for complex biologic molecules is typically arrived at iteratively rather than as one simple recipe. For example, the first output (e.g., a first study design) from the study design and method capture technology may represent an interim method, which a human expert can then examine and adjust. In certain embodiments, the iterative adjustment of study design results by a human expert can provide a basis for method research and improvement. For example, once a given study design result is determined, and then adjusted by the human expert, analytical method results and/or analytical stage results of the adjusted study design result may be extracted and stored as analytical method records and/or analytical stage records in the method store. In this manner, the technology "learns" a new method to be applied to a new molecule, and settings for the prior methods may then be re-adjusted accordingly. For example, as described above in Section E, analytical method records and/or analytical stage records generated from previously determined study design results can be used as training data for supervised machine learning techniques.
G. Constructive Examples
G.i Example 1
Example 1 is an example of use of the study design and method capture technology described herein for determining a study design result for performing an N-linked glycan characterization and quantitation study on a Fc-fusion protein.
G. i. a User Input and Target Biologic GBAs
In Example 1, a user inputs (i) a molecule type, (ii) bioprocess information, and (iii) an amino acid sequence (nominal primary structure) of the target biologic. The molecule type is specified as a Fc-fusion protein. One GBA implication of this molecule type is that the glycan profile is likely to be more complex and heterogeneous than for a monoclonal antibody (which tends to have one N-glycosylation site in the CH2 domain of each heavy chain at Asn297). Consequently, the analysis method will be experimentally iterative as one works to optimize the method. Another implication of this molecule type is for the present technology to raise a warning flag for aggregation potential.
With regard to bioprocess information, the user input specifies that the target biologic is produced via a Chinese Hamster Ovary (CHO) cell culture and purified via protein A purification followed by hydrophobic interaction chromatography. An implication of this method of production is that data analysis and interpretation would need to use a CHO N-glycan database. By contrast, if the biologic were to originate from a human clinical sample, an additional human N-glycan database would be needed to narrow down the search space.
The protein A purification and hydrophobic interaction chromatography bioprocess steps have no direct implication on the analysis method per se, but may be used themselves as GBAs that indicate aggregation potential, or may be used to determine a specific GBA corresponding to aggregation potential (e.g., as described below).
The desired structural characterization is a characterization and quantitation of N-linked glycans for potential QC-lot release. Accordingly, study class attributes input by the user include attributes specifying an N-linked glycan study and that both characterization and quantification are desired. These may be input separately, e.g., as "N-linked glycan characterization", "N-linked glycan quantification", or in a hierarchical fashion, e.g., by first selecting N-linked glycan study (e.g., from a graphical control element of a GUI), followed by selection of characterization and quantification options (e.g., also via graphical control elements of a GUI). An additional attribute specifies that a lot-release study is desired (e.g., a user may input "lot-release").
The study class attributes have implications for the types of analytical methods that will be included in the determined study design. For example, the characterization attribute implies a need to identify glycan composition and linkage isoforms. The quantification and lot-release attributes imply use of robust HPLC-based separation, rather than capillary electrophoresis. The lot-release attribute also has an impact on the use of a fluorescent label, as will be described below.
Preprocessing of the amino acid sequence input by the user is used to obtain target biologic GBAs that include the following:
• A GBA corresponding to the locations of likely N-linked glycosylation sites. In particular, four likely N-linked glycosylation sites are predicted, three of which are located on the fusion portion of the target biologic and one of which is on the Fc end of the target biologic, and
• A GBA corresponding to a percentage - approximately 50% - of hydrophobic amino acids on the fusion portion of the target biologic.
The GBA corresponding to the locations of the N-linked glycosylation sites will help guide data interpretation. In particular the one site that is on the Fc end will exhibit a similar profile to known monoclonal antibodies, whereas the three sites that are on the fusion portion will look markedly different.
The approximately 50% fraction of hydrophobic amino acids on the fusion portion GBA is relevant with regard to the inclusion of a protein denaturation step, as described below.
An additional GBA corresponds to the molecule type input by the user. An implication based on the molecule type is that Fc-fusion tends to generate aggregates during purification steps and during storage. Aggregation propensity is not a universal rule, but a good heuristic with Fc-fusion proteins.
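The sequence preprocessing described above can be sketched as follows. The source does not say how likely N-linked glycosylation sites are predicted or which residues count as hydrophobic; this sketch assumes the widely used N-X-S/T sequon rule (X being any residue except proline) and one common choice of hydrophobic residue set. The function names and the toy sequence are hypothetical.

```python
import re

# Assumption: likely N-glycosylation sites follow the N-X-S/T sequon,
# where X is any residue except proline. The lookahead lets overlapping
# sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Assumption: one common choice of hydrophobic residues (single-letter codes).
HYDROPHOBIC = set("AVILMFW")


def n_glycosylation_sites(sequence):
    """Return 1-based positions of Asn residues that start a sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


def hydrophobic_fraction(sequence):
    """Return the fraction of residues in the assumed hydrophobic set."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(residue in HYDROPHOBIC for residue in sequence) / len(sequence)


toy = "MKNVTAANPSLLNGSW"  # toy sequence, not a real biologic
print(n_glycosylation_sites(toy))
print(round(hydrophobic_fraction(toy), 2))
```

In the toy sequence, the Asn followed by proline is skipped, which is the one exclusion built into the assumed sequon rule. Applying the two functions separately to the fusion portion and the Fc portion would give per-region GBAs of the kind listed above.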
G. i. b Analytical Stage Results (Study Design Results)
Based on the GBAs of the target biologic, as described above, including the bioprocess information and study class attributes, the following series of analytical stage results are determined as appropriate for obtaining the desired structural characterization of the target biologic.
• Analytical Stages: Sample preparation (protein denaturation and reduction)
As a result of the GBAs of this biologic, sample preparation stages will be guided to comprise protein denaturation and reduction with a mild non-ionic detergent, such as PS-20 at a low 0.1% concentration, to solubilize the protein and open up its 3D structure for complete deglycosylation. PS-20, being mild, will not harm the PNGase F enzyme (relevant for an enzymatic digestion step, listed below). By contrast, a stronger denaturing agent like SDS could potentially inactivate PNGase F and can also affect downstream mass spectrometry performance. Use of a detergent solution as part of the sample preparation step is informed by the hydrophobic amino acid content and the aggregation propensity GBAs identified above.
In a reduction stage, 5mM DTT should be used as the reducing agent; a stronger reducing agent like TCEP could potentially harm PNGase F.
• Analytical Stages: Enzymatic digestion
The glycosylation analysis study class attribute will enroll a deglycosylating class of enzymes for digestion. Most common is peptide N-glycosidase F (PNGase F), which would be used as a starting point.
• Analytical Stages: Detection
Based on the glycosylation analysis study class attribute, a fluorescence labeling detection approach will be used to detect and quantify glycans. Options for fluorescent labeling agents include 2-AB, 2-AA, APTS, and RapiFluor. Based on additional study class attributes, either 2-AB or 2-AA should be used. In particular, 2-AB and 2-AA are compatible with HPLC-based separation and are well-established in quality control (QC) environments, as required due to the lot-release study class attribute. APTS is used with capillary electrophoresis-based separation, and therefore would not be selected as an appropriate fluorescent label. The RapiFluor label also would not be selected due to the lot-release study class attribute, as RapiFluor is a more recent label, and therefore more appropriate for early-stage characterization rather than lot release.
• Analytical Stages: Separation
A Hydrophilic Interaction Chromatography (HILIC) separation stage would be determined based on the 2-AB labeled glycan analysis. RPLC has the downside of a shallow gradient ramp-up, which implies a high delivery gradient and therefore potential reproducibility issues; accordingly, it is not a good fit given the lot-release study class attribute.
Specific HILIC columns and bead size options are based on the instrumentation available (UPLC vs HPLC for example) and can be guided iteratively by the quality of the data. Parameters specifying properties of the mobile phases determined would include: Mobile Phase A: pure acetonitrile; and Mobile Phase B: 50mM ammonium acetate, pH 4.
• Analytical Stages: Mass spectrometry
Mass spectrometry parameters determined based on the study class attributes and GBAs include the following:
o Positive polarity, to characterize the glycan composition and provide linkage information,
o Additional negative-polarity mass spec could provide complementary information.
o Instrument: any high-resolution ion-trap instrument (such as Orbitrap) to enable multi-stage MSn analysis,
o 300-2000 scan range, data-dependent acquisition mode for the top 10 precursor ions.
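The way study class attributes narrow the fluorescent label options in the Detection stage of Example 1 can be sketched as a simple rule filter. The table encodes only what the text states about each label; `None` marks a property the text does not address. The field names, the attribute strings, and the function `select_labels` are hypothetical, and the rules are an illustrative reading of the example rather than the disclosed decision logic.

```python
# Properties taken from the Example 1 text; None means "not stated there".
LABELS = {
    "2-AB": {"hplc_compatible": True, "qc_established": True},
    "2-AA": {"hplc_compatible": True, "qc_established": True},
    "APTS": {"hplc_compatible": False, "qc_established": None},  # used with CE
    "RapiFluor": {"hplc_compatible": None, "qc_established": False},  # more recent
}


def select_labels(study_class_attributes):
    """Return labels that survive the rules implied by the study class attributes."""
    attrs = set(study_class_attributes)
    # Quantification and lot-release imply HPLC-based separation.
    needs_hplc = bool(attrs & {"quantification", "lot-release"})
    # Lot-release additionally requires a label established in QC environments.
    needs_qc = "lot-release" in attrs

    selected = []
    for name, props in LABELS.items():
        if needs_hplc and props["hplc_compatible"] is False:
            continue
        if needs_qc and props["qc_established"] is not True:
            continue
        selected.append(name)
    return selected


print(select_labels({"characterization", "quantification", "lot-release"}))
```

For the attributes of Example 1, the filter leaves 2-AB and 2-AA, matching the outcome described in the Detection stage.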
G. ii Example 2
Example 2 is an example of use of the study design and method capture technology described herein for determining a study design result for performing a disulfide linkage study on a Fc-fusion protein.
G. ii. a User Input and Target Biologic GBAs
In Example 2, a user inputs (i) a molecule type, (ii) bioprocess information, and (iii) an amino acid sequence (nominal primary structure) of the target biologic. The molecule type is specified as a Fc-fusion protein. With regard to bioprocess information, the user input specifies that the target biologic is produced via an E. coli cell culture. The bacterial host cell raises a warning that the subsequent downstream processing could lead to disulfide bond scrambling.
The desired structural characterization is a characterization of disulfide linkages and free cysteines, if any are present in the target biologic. Accordingly study class attributes identify a characterization study, disulfide linkages, and free cysteine characterization. An impact of the study class attributes on sample prep is to avoid reduction/alkylation for the main study group, and use reduction/alkylation for a negative control group. The study class attribute also dictates use of multiple enzymes for digestion.
Preprocessing of the amino acid sequence is used to obtain target biologic GBAs that include:
• A GBA corresponding to a molecular weight of the target biologic with a value of 50 kDa.
• A GBA corresponding to a total number of cysteines of the target biologic, specifying that there are 12 cysteines.
The molecular weight GBA influences the particular mass spectrometry step that is determined. In particular, it indicates that top-down intact protein MS may be feasible (as opposed to the case for a full-sized monoclonal antibody, where it is considerably more difficult).
The number of cysteines GBA is relevant for the likelihood that unpaired cysteines are present in the target biologic. In particular, if the number of cysteines were odd, this would point to an unpaired Cys residue, which would need to be identified as part of the study design as it could be the site of unwanted modifications. The even number in this case does not exclude the possibility of having unpaired cysteines, but it reduces it from a certainty to a possibility.
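The parity reasoning for the cysteine-count GBA can be written out as a short sketch. The function name, the returned field names, and the toy sequences are hypothetical; the logic follows the text only in that an odd count makes an unpaired cysteine certain, while an even, non-zero count leaves it merely possible.

```python
def cysteine_gba(sequence):
    """Derive a cysteine-count GBA and what its parity says about unpaired Cys."""
    count = sequence.count("C")
    if count == 0:
        unpaired = "none"      # no cysteines, so no disulfides or free thiols
    elif count % 2 == 1:
        unpaired = "certain"   # an odd count cannot be fully paired
    else:
        unpaired = "possible"  # an even count may still include free cysteines
    return {"cysteine_count": count, "unpaired_cysteine": unpaired}


print(cysteine_gba("ACDCEC"))   # toy sequence with three cysteines
print(cysteine_gba("ACDCECC"))  # toy sequence with four cysteines
```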
Additionally, a priori disulfide linkages may be known. This would usually be the case for a monoclonal antibody. In the case of a fusion protein, this information may be available from either prior measurements or from homology to a known molecule, in which case it can enable the following 2 GBAs, which will guide the data interpretation.
• Enzymatic digestion with trypsin yields 3 peptides containing 1 disulfide each, and 1 peptide containing 3 disulfides - this will facilitate matching of experimental and predicted results, increasing confidence in data interpretation.
• Enzymatic digestion with Glu-C yields 1 peptide with 1 disulfide and 1 peptide with 2 disulfides - same benefit as above.
G. ii. b Analytical Stage Results (Analytical Method and Study Design Results)
Based on the GBAs of the target biologic, as described above, along with the bioprocess information and study class attributes, two study design results are determined.
Study 1; Peptide mapping
• Analytical Stages: Sample preparation (protein denaturation and reduction)
Protein denaturation using 6M GuHCl or 8M urea is identified as an analytical method step. Both are very strong denaturants at these high concentrations, and are identified in order to open up the protein completely to maximize enzymatic digestion. This approach contrasts with Example 1 above, where the glycans are relatively exposed and therefore only in need of mild denaturation. Also, the enzyme choices in the study design results for Example 2, as described below, are more tolerant of strong denaturants than those identified in Example 1.
No reduction or alkylation step is used for the main study group. Reduction or alkylation is only used for the negative control group.
• Analytical Stages: Enzymatic digestion
A trypsin enzymatic digestion stage is included in the analytical method result of the determined study design result. The trypsin digestion stage includes exposure to trypsin followed by Glu-C to help elucidate disulfide linkages in single (tryptic) fragments having multiple disulfides.
A pepsin enzymatic digestion stage is also included in the determined study design result as an orthogonal assay providing complementary information. In the pepsin digestion step, a low pH is specified to be used for minimal disulfide bond scrambling.
• Analytical Stages: Detection
No detection stages are needed for liquid chromatography (LC), based on the study class attributes. If the study class attributes had included QC/lot release as an attribute, then an LC detection technique would have been recommended (e.g., UV-Vis detection).
• Analytical Stages: Separation
As a separation stage, RP-HPLC, using either C12 or C18 columns, is determined. If the target biologic were identified as having a high hydrophobic content (e.g., via another GBA), a C12 column would be preferable to C18.
The mobile phase compositions and gradients for typical RP-HPLC would be recommended as a starting point.
• Analytical Stages: Mass spectrometry
Mass spectrometry parameters determined based on the study class attributes and GBAs include the following:
o Polarity: positive ions only
o Instrument: any high-resolution ion-trap instrument (such as Orbitrap) to enable multi-stage MSn analysis,
o Scan range: 300-2000, data-dependent acquisition mode for top 10 precursor ions
Use 1st round data to identify low-confidence peaks for targeted MS/MS as in the following parameter (Targeted MS/MS).
o Targeted MS/MS for improved sequence coverage
CID, HCD and ETD fragmentations to yield complementary information
Study 2; Intact mass analysis (top-down analysis without enzyme digestion to map the disulfide linkages)
A second study design result for intact mass analysis is determined. No sample preparation stage is included in the second study design result.
• Analytical Stage: Separation
The determined separation stage would identify a C4 column or low hydrophobicity polymer-based column (e.g., PS-DVB) as an intact protein is typically more hydrophobic than peptides.
• Analytical Stages: Mass spectrometry
Mass spectrometry parameters determined based on the study class attributes and GBAs include the following:
o Instrument: need highest resolution possible to enable proper interpretation of intact protein spectra - for example Orbitrap Lumos-class instruments.
o CID, HCD, ETD fragmentations to yield complementary information.
H. Computer System and Network Environment
FIG. 10 shows an implementation of a network environment 1000 for use in providing the systems and methods for determining study design results described herein. In brief overview, FIG. 10 is a block diagram of an exemplary cloud computing environment 1000. The cloud computing environment 1000 may include one or more resource providers 1002a, 1002b, 1002c (collectively, 1002). Each resource provider 1002 may include computing resources. In some implementations, computing resources may include any hardware and/or software used to process data. For example, computing resources may include hardware and/or software capable of executing algorithms, computer programs, and/or computer applications. In some implementations, exemplary computing resources may include application servers and/or databases with storage and retrieval capabilities. Each resource provider 1002 may be connected to any other resource provider 1002 in the cloud computing environment 1000. In some implementations, the resource providers 1002 may be connected over a computer network 1008. Each resource provider 1002 may be connected to one or more computing devices 1004a, 1004b, 1004c (collectively, 1004), over the computer network 1008.
The cloud computing environment 1000 may include a resource manager 1006. The resource manager 1006 may be connected to the resource providers 1002 and the computing devices 1004 over the computer network 1008. In some implementations, the resource manager 1006 may facilitate the provision of computing resources by one or more resource providers 1002 to one or more computing devices 1004. The resource manager 1006 may receive a request for a computing resource from a particular computing device 1004. The resource manager 1006 may identify one or more resource providers 1002 capable of providing the computing resource requested by the computing device 1004. The resource manager 1006 may select a resource provider 1002 to provide the computing resource. The resource manager 1006 may facilitate a connection between the resource provider 1002 and a particular computing device 1004. In some implementations, the resource manager 1006 may establish a connection between a particular resource provider 1002 and a particular computing device 1004. In some implementations, the resource manager 1006 may redirect a particular computing device 1004 to a particular resource provider 1002 with the requested computing resource.
FIG. 11 shows an example of a computing device 1100 and a mobile computing device 1150 that can be used to implement the techniques described in this disclosure. The computing device 1100 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The mobile computing device 1150 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to be limiting.
The computing device 1100 includes a processor 1102, a memory 1104, a storage device 1106, a high-speed interface 1108 connecting to the memory 1104 and multiple high-speed expansion ports 1110, and a low-speed interface 1112 connecting to a low-speed expansion port 1114 and the storage device 1106. Each of the processor 1102, the memory 1104, the storage device 1106, the high-speed interface 1108, the high-speed expansion ports 1110, and the low-speed interface 1112, are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 1102 can process instructions for execution within the computing device 1100, including instructions stored in the memory 1104 or on the storage device 1106 to display graphical information for a GUI on an external input/output device, such as a display 1116 coupled to the high-speed interface 1108. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system). Thus, as the term is used herein, where a plurality of functions are described as being performed by "a processor", this encompasses embodiments wherein the plurality of functions are performed by any number of processors (one or more) of any number of computing devices (one or more). Furthermore, where a function is described as being performed by "a processor", this encompasses embodiments wherein the function is performed by any number of processors (one or more) of any number of computing devices (one or more) (e.g., in a distributed computing system).
The memory 1104 stores information within the computing device 1100. In some implementations, the memory 1104 is a volatile memory unit or units. In some implementations, the memory 1104 is a non-volatile memory unit or units. The memory 1104 may also be another form of computer-readable medium, such as a magnetic or optical disk.
The storage device 1106 is capable of providing mass storage for the computing device 1100. In some implementations, the storage device 1106 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 1102), perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices such as computer- or machine-readable mediums (for example, the memory 1104, the storage device 1106, or memory on the processor 1102).
The high-speed interface 1108 manages bandwidth-intensive operations for the computing device 1100, while the low-speed interface 1112 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In some implementations, the high-speed interface 1108 is coupled to the memory 1104, the display 1116 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 1110, which may accept various expansion cards (not shown). In the implementation, the low-speed interface 1112 is coupled to the storage device 1106 and the low-speed expansion port 1114. The low-speed expansion port 1114, which may include various communication ports (e.g., USB, Bluetooth®, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
The computing device 1100 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 1120, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 1122. It may also be implemented as part of a rack server system 1124. Alternatively, components from the computing device 1100 may be combined with other components in a mobile device (not shown), such as a mobile computing device 1150. Each of such devices may contain one or more of the computing device 1100 and the mobile computing device 1150, and an entire system may be made up of multiple computing devices communicating with each other.
The mobile computing device 1150 includes a processor 1152, a memory 1164, an input/output device such as a display 1154, a communication interface 1166, and a transceiver 1168, among other components. The mobile computing device 1150 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage. Each of the processor 1152, the memory 1164, the display 1154, the communication interface 1166, and the transceiver 1168, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
The processor 1152 can execute instructions within the mobile computing device 1150, including instructions stored in the memory 1164. The processor 1152 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor 1152 may provide, for example, for coordination of the other components of the mobile computing device 1150, such as control of user interfaces, applications run by the mobile computing device 1150, and wireless communication by the mobile computing device 1150.
The processor 1 152 may communicate with a user through a control interface 1158 and a display interface 1 156 coupled to the display 1 154. The display 1 154 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 1156 may comprise appropriate circuitry for driving the display 1 154 to present graphical and other information to a user. The control interface 1 158 may receive commands from a user and convert them for submission to the processor 1152. In addition, an external interface 1 162 may provide communication with the processor 1 152, so as to enable near area communication of the mobile computing device 1 150 with other devices. The external interface 1162 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used. The memory 1 164 stores information within the mobile computing device 1 150. The memory 1164 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 1174 may also be provided and connected to the mobile computing device 1150 through an expansion interface 1172, which may include, for example, a SIMM (Single In Line Memory Module) card interface. The expansion memory 1 174 may provide extra storage space for the mobile computing device 1 150, or may also store applications or other information for the mobile computing device 1 150. Specifically, the expansion memory 1 174 may include instructions to carry out or supplement the processes described above, and may include secure information also. 
Thus, for example, the expansion memory 1174 may be provided as a security module for the mobile computing device 1150, and may be programmed with instructions that permit secure use of the mobile computing device 1150. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.
The memory may include, for example, flash memory and/or NVRAM memory (nonvolatile random access memory), as discussed below. In some implementations, instructions are stored in an information carrier such that the instructions, when executed by one or more processing devices (for example, processor 1152), perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices, such as one or more computer- or machine-readable mediums (for example, the memory 1164, the expansion memory 1174, or memory on the processor 1152). In some implementations, the instructions can be received in a propagated signal, for example, over the transceiver 1168 or the external interface 1162.
The mobile computing device 1150 may communicate wirelessly through the communication interface 1166, which may include digital signal processing circuitry where necessary. The communication interface 1166 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others. Such communication may occur, for example, through the transceiver 1168 using a radio-frequency. In addition, short-range communication may occur, such as using a Bluetooth®, Wi-Fi™, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 1170 may provide additional navigation- and location-related wireless data to the mobile computing device 1150, which may be used as appropriate by applications running on the mobile computing device 1150.
The mobile computing device 1150 may also communicate audibly using an audio codec 1160, which may receive spoken information from a user and convert it to usable digital information. The audio codec 1160 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 1150. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on the mobile computing device 1150.
The mobile computing device 1150 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 1180. It may also be implemented as part of a smart-phone 1182, personal digital assistant, or other similar mobile device. Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
In some implementations, the modules (e.g. biologic preprocessing module, machine learning module) described herein can be separated, combined or incorporated into single or combined modules. The modules depicted in the figures are not intended to limit the systems described herein to the software architectures shown therein.
Throughout the description, where apparatus and systems are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are apparatus and systems of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps.
It should be understood that the order of steps or order for performing certain actions is immaterial so long as the invention remains operable. Moreover, two or more steps or actions may be conducted simultaneously.
While the invention has been particularly shown and described with reference to specific preferred embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

What is claimed is:
1. A method of automatically identifying analytical study design parameters for analysis of a target biologic, the method comprising:
(a) receiving, by a processor of a computing device, an input query comprising one or more generalizable biologic attributes (GBAs) of the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes;
(b) accessing, by the processor, a method store comprising a plurality of analytical stage records, each analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein:
(i) each analytical stage record comprises an identifier of the corresponding specific analytical stage,
(ii) each analytical stage record comprises a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic, and
(iii) each analytical stage record is linked to one or more GBAs of the associated known biologic;
(c) determining, by the processor, responsive to the input query, one or more study design results based on the GBAs of the target biologic and the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical stage records of the method store; and
(d) providing, by the processor, the one or more study design results for display and/or further processing.
2. The method of claim 1, wherein the one or more GBAs of the target biologic from the input query are determined based on one or more members selected from the group consisting of an amino acid sequence, a molecule type, and a known identification of one or more disulfide linkage sites.
3. The method of claim 1, wherein the one or more GBAs of the target biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the target biologic;
a molecule type of the target biologic;
a quantification of one or more specific amino acids within the target biologic;
a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the target biologic.
4. The method of claim 3, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of instances of racemization.
5. The method of claim 4, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation, disulfide bridges, disulfide knots, and modification of cysteine to formylglycine.
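By way of illustration only, the positions and number of potential modification sites recited in claims 4 and 5 could be derived from an amino acid sequence by motif scanning. The following is a minimal sketch, not the claimed implementation. The motif definitions (the N-X-S/T sequon with X other than proline, Asn-Gly and Asn-Ser as deamidation-prone pairs, methionine as an oxidation site) and the names `MOTIFS` and `potential_sites` are assumptions introduced for this example.

```python
import re

# Illustrative motif definitions (assumptions, not taken from the claims).
# Lookaheads are used so that overlapping motifs are all reported.
MOTIFS = {
    # N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
    "n_glycosylation": re.compile(r"(?=N[^P][ST])"),
    # Asn-Gly and Asn-Ser are commonly cited deamidation-prone pairs.
    "deamidation": re.compile(r"(?=N[GS])"),
    # Methionine is a common oxidation site.
    "oxidation": re.compile(r"(?=M)"),
}


def potential_sites(sequence):
    """Return, per motif, the 1-based positions of the motif's first residue."""
    seq = sequence.upper()
    return {
        name: [match.start() + 1 for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


def site_counts(sequence):
    """Return, per motif, the number of potential sites."""
    return {name: len(pos) for name, pos in potential_sites(sequence).items()}
```

Either the position lists or the counts could serve as numeric GBAs; which representation is preferable would depend on the machine learning module used.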
6. The method of any one of the preceding claims, wherein the one or more GBAs of the target biologic comprise(s) a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of:
hydrophobicity;
hydrophilicity;
charge;
acidity; and
aromaticity.
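As a hedged illustration of the proportion-based GBAs of claim 6, the sketch below computes, for each property class, the fraction of residues in a sequence that belong to that class. The residue groupings in `RESIDUE_CLASSES` are one common convention and are assumptions made for this example; the claims do not fix any particular grouping, and the function name `class_proportions` is hypothetical.

```python
# Illustrative residue groupings (assumptions; other conventions exist).
RESIDUE_CLASSES = {
    "hydrophobic": set("AVILMFW"),
    "hydrophilic": set("RKDENQHST"),
    "charged": set("DEKR"),
    "acidic": set("DE"),
    "aromatic": set("FWY"),
}


def class_proportions(sequence):
    """Return the fraction of residues falling in each property class."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(residue in members for residue in seq) / len(seq)
        for name, members in RESIDUE_CLASSES.items()
    }
```

Because the classes overlap (for example, acidic residues are also charged), the proportions are not expected to sum to one.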
7. The method of any one of the preceding claims, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
8. The method of any one of the preceding claims, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
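The fragmentation metrics of claim 8 can be illustrated with an in-silico digestion. The sketch below is an assumption-laden example rather than the claimed method: it applies the commonly used simplified trypsin rule (cleavage C-terminal to lysine or arginine unless the next residue is proline) and tabulates the resulting fragments by length. The function names are hypothetical.

```python
import re
from collections import Counter


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (simplified rule).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    seq = sequence.upper()
    # Split point: preceded by K or R, and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    # A cleavage site at the very end yields an empty trailing piece; drop it.
    return [piece for piece in pieces if piece]


def length_distribution(fragments):
    """Count fragments by length (a simple statistical distribution)."""
    return Counter(len(fragment) for fragment in fragments)
```

A distribution by fragment molecular weight could be built the same way by replacing `len` with a mass calculation, which is omitted here.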
9. The method of any one of the preceding claims, wherein the one or more GBAs of the target biologic comprise(s) one or more bioprocess attributes representing parameters of a bioprocess used to produce the target biologic.
10. The method of claim 9, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the target biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
11. The method of any one of the preceding claims, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
12. The method of claim 11, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
13. The method of claim 12, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation, disulfide bridges, disulfide knots, and modification of cysteine to formylglycine.
14. The method of any one of the preceding claims, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of:
hydrophobicity;
hydrophilicity;
charge;
acidity; and
aromaticity.
15. The method of any one of the preceding claims, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
16. The method of any one of the preceding claims, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
17. The method of any one of the preceding claims, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
18. The method of claim 17, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
19. The method of any one of the preceding claims, wherein the method comprises:
receiving, by the processor, a user input comprising one or more known or expected structural features of the target biologic;
determining, by the processor, using the one or more known or expected structural features of the target biologic, the one or more GBAs of the target biologic; and
providing, by the processor, the determined one or more target biologic GBAs via the input query for automated identification of analytical study design parameters for analysis of the target biologic.
20. The method of claim 19, wherein the one or more known or expected structural features of the target biologic comprise(s) at least one feature selected from the group consisting of an amino acid sequence, locations of disulfide bonds, locations and/or types of glycan structures attached to the target biologic.
21. The method of either of claims 19 or 20, wherein the received user input comprises an identification of a molecule type of the target biologic and the identification of the molecule type is used as a GBA of the one or more GBAs of the target biologic.
22. The method of any one of the preceding claims, wherein the input query comprises one or more bioprocess parameters that represent properties of the bioprocess used to produce the target biologic and the one or more study design results are determined in step (c) based further on the one or more bioprocess parameters.
23. The method of any one of the preceding claims, wherein at least one member of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
24. The method of any one of the preceding claims, wherein each of one or more of the analytical stage records of the method store comprises or is linked to one or more prior study class attributes that identify the structural characterization study in which the analytical stage that the analytical stage record represents was implemented.
25. The method of any one of the preceding claims, wherein each of one or more of the analytical stage records of the method store corresponds to a specific analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
26. The method of any one of the preceding claims, wherein at least a portion of the plurality of analytical stage records is created from published documents via automated processing using text mining and/or natural language processing.
27. The method of any one of the preceding claims, wherein at least a portion of the plurality of analytical stage records is created from published documents via automated processing in combination with a user interaction.
28. The method of any one of the preceding claims, wherein at least a portion of the analytical stage records is created from in-house studies in an automated fashion via dedicated software as part of a laboratory information management system.
29. The method of any one of the preceding claims, wherein:
each determined study design result comprises a set of analytical stage results, each representing a specific analytical stage to be applied to the target biologic, and comprising a list of parameters to be used when applying the analytical stage that the analytical stage result represents to the target biologic, and
the analytical stage results of a given study design result are determined via a machine learning module that receives as input the GBAs of the target biologic and determines the set of analytical stage results and, for each analytical stage result, parameter values associated with that stage, based on patterns identified using the GBAs associated with analytical stage records of the method store.
30. The method of claim 29, further comprising:
computing, by the machine learning module, relevant analytical stage record by matching the GBAs of the target biologic with GBAs of a subset of the analytical stage records according to an identified pattern in GBAs of the analytical stage records.
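The matching step of claim 30 can be sketched with a nearest-neighbor lookup over numeric GBAs. This is a deliberately simple stand-in chosen for illustration; the claims do not specify this technique, and the class `StageRecord`, the functions `gba_distance` and `relevant_records`, and the example attribute names are all assumptions. In practice the GBA values would likely need scaling so that no single attribute dominates the distance.

```python
import math
from dataclasses import dataclass


@dataclass
class StageRecord:
    """Hypothetical in-memory form of an analytical stage record."""
    stage_id: str      # identifier of the analytical stage
    parameters: dict   # parameter values used to implement the stage
    gbas: dict         # numeric GBAs of the associated known biologic


def gba_distance(a, b):
    """Euclidean distance over the GBA names that both mappings share."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))


def relevant_records(target_gbas, records, k=2):
    """Return the k records whose GBAs lie closest to the target's GBAs."""
    ranked = sorted(records, key=lambda r: gba_distance(target_gbas, r.gbas))
    return ranked[:k]
```

The parameter values of the returned records could then seed the analytical stage results proposed for the target biologic.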
31. The method of any one of the preceding claims, wherein the machine learning module implements a supervised machine learning technique.
32. The method of claim 31, wherein the plurality of examples of correct output results comprise previously determined study design results or previously determined analytical stage results.
33. The method of claim 31, wherein the set of input features comprises one or more GBAs.
34. The method of any one of the preceding claims, wherein the machine learning module implements a reinforcement machine learning technique.
35. The method of claim 34, wherein the reinforcement machine learning technique comprises:
determining study design results using a set of training data that comprises a plurality of analytical stage records; and
for each example outputting a performance index.
36. The method of any one of the preceding claims, wherein the machine learning module implements an unsupervised machine learning technique.
37. The method of claim 36, wherein the machine learning module implements the unsupervised machine learning technique as a precursor to a supervised machine learning technique.
38. The method of any one of the preceding claims, comprising:
receiving, by the processor, a user input corresponding to a modification of a particular study design result of the one or more determined study design results;
updating, by the processor, the particular study design result according to the user input; and
storing, by the processor, one or more analytical stage results of the updated study design result as analytical stage records in the method store.
39. The method of claim 38, comprising using the stored analytical stage results of the updated study design result as training data in a supervised machine learning technique implemented by the machine learning module of step (c).
40. The method of any one of the preceding claims, wherein step (d) comprises, for at least one study design result of the determined study design results:
causing, by the processor, display of at least one of a graphical control element corresponding to an analytical stage result of the study design result;
receiving, by the processor, via a user interaction with the graphical control element, a user input corresponding to a modification of the analytical stage result;
responsive to the received user input, updating, by the processor, the particular study design result according to the user input; and
storing, by the processor, one or more analytical stage results of the updated study design result as analytical stage records in the method store.
41. The method of any one of the preceding claims, wherein the one or more study design results comprise(s) one or more documents corresponding to software file(s) that specify parameters for analytical instruments.
42. A method of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, the method comprising:
(a) creating, by a processor of a computing device, an analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein the analytical stage record comprises:
(i) an identifier of the corresponding specific analytical stage, and
(ii) a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic;
(b) storing, by the processor, the analytical stage record in the method store;
(c) storing, by the processor, one or more generalized biologic attributes (GBAs) of the associated known biologic in the method store; and
(d) linking, by the processor, the one or more known biologic GBAs with the analytical stage record.
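Steps (a) to (d) of claim 42 can be illustrated with a small relational layout. The sketch below uses Python's standard `sqlite3` module; the table names, column names, and the example stage parameters are assumptions made for illustration and do not reflect any schema disclosed in the specification.

```python
import json
import sqlite3

# Hypothetical schema: one table per entity plus a link table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stage_record (
    id INTEGER PRIMARY KEY,
    stage_identifier TEXT NOT NULL,
    parameters TEXT NOT NULL           -- JSON-encoded parameter values
);
CREATE TABLE IF NOT EXISTS gba (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    value TEXT NOT NULL                -- JSON-encoded attribute value
);
CREATE TABLE IF NOT EXISTS record_gba (
    record_id INTEGER NOT NULL REFERENCES stage_record(id),
    gba_id INTEGER NOT NULL REFERENCES gba(id),
    PRIMARY KEY (record_id, gba_id)
);
"""


def open_method_store(path=":memory:"):
    """Open (or create) a method store and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_stage_record(conn, stage_identifier, parameters, gbas):
    """Create and store a stage record, store its GBAs, and link them."""
    with conn:  # commit on success, roll back on error
        record_id = conn.execute(
            "INSERT INTO stage_record (stage_identifier, parameters) VALUES (?, ?)",
            (stage_identifier, json.dumps(parameters)),
        ).lastrowid                                   # steps (a) and (b)
        for name, value in gbas.items():
            gba_id = conn.execute(
                "INSERT INTO gba (name, value) VALUES (?, ?)",
                (name, json.dumps(value)),
            ).lastrowid                               # step (c)
            conn.execute(
                "INSERT INTO record_gba (record_id, gba_id) VALUES (?, ?)",
                (record_id, gba_id),
            )                                         # step (d)
    return record_id
```

A call such as `add_stage_record(conn, "separation", {"column": "C18"}, {"molecule_type": "mAb"})` would store one record with one linked GBA; the values shown are placeholders.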
43. The method of claim 42, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
44. The method of claim 43, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
45. The method of claim 44, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation, disulfide bridges, disulfide knots, and modification of cysteine to formylglycine.
46. The method of any one of claims 42 to 45, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of:
hydrophobicity;
hydrophilicity;
charge;
acidity; and
aromaticity.
47. The method of any one of claims 42 to 46, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
48. The method of any one of claims 42 to 47, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
49. The method of any one of claims 42 to 48, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
50. The method of claim 49, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
51. The method of any one of claims 42 to 50, wherein the one or more GBAs of the associated known biologic comprise(s) one or more study class attributes, each of which identifies a type of analytical study performed on the associated known biologic using an analytical method comprising the analytical stage that the analytical stage record represents.
52. The method of claim 51, wherein at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
53. The method of any one of claims 42 to 52, wherein the analytical stage record of the method store corresponds to a specific analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
54. The method of any one of claims 42 to 53, wherein creating the analytical stage record comprises extracting at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing using text mining and/or natural language processing.
55. The method of any one of claims 42 to 54, wherein creating the analytical stage record comprises extracting at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing in combination with a user interaction.
56. The method of any one of claims 42 to 55, wherein creating the analytical stage record comprises obtaining the identifier and series of parameter values from an in-house study in an automated fashion via dedicated software as part of a laboratory information management system.
57. A method of automatically identifying analytical study design parameters for analysis of a target biologic, the method comprising:
(a) receiving, by a processor of a computing device, an input query comprising one or more generalizable biologic attributes (GBAs) of the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes;
(b) accessing, by the processor, a method store comprising a plurality of analytical method records, each analytical method record corresponding to a specific analytical method used in an analytical study for structural characterization of an associated known biologic, wherein:
(i) each analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; and
(ii) each analytical method record is linked to one or more GBAs of the associated known biologic;
(c) determining, by the processor, responsive to the input query, one or more study design results based on the GBAs of the target biologic and the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical method records of the method store; and
(d) providing, by the processor, the one or more study design results for display and/or further processing.
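As a rough illustration of the data model recited in steps (a) and (b) of claim 57, the following Python sketch represents a method store as a list of analytical method records, each holding an ordered sequence of stage records and the GBAs of the associated known biologic. All class, field, and function names (`AnalyticalStageRecord`, `AnalyticalMethodRecord`, `InputQuery`, `candidates_for`) and all example values are hypothetical and are not taken from the claims. The study-class filter shown is a simple pre-selection step assumed for illustration; it is not the machine learning module of step (c).

```python
from dataclasses import dataclass, field


@dataclass
class AnalyticalStageRecord:
    # Identifier of the stage, e.g. "digestion" (hypothetical label).
    stage_id: str
    # Parameter values used to run the stage (hypothetical keys).
    parameters: dict = field(default_factory=dict)


@dataclass
class AnalyticalMethodRecord:
    # Ordered sequence of stage records making up the method, as in (b)(i).
    stages: list
    # GBAs of the associated known biologic, as in (b)(ii).
    gbas: dict
    # Study class attributes under which the method was used.
    study_classes: set


@dataclass
class InputQuery:
    # GBAs of the target biologic plus study class attributes, as in (a).
    gbas: dict
    study_classes: set


def candidates_for(query, method_store):
    """Return method records sharing at least one study class with the query."""
    return [r for r in method_store if r.study_classes & query.study_classes]


# Made-up records for illustration only.
method_store = [
    AnalyticalMethodRecord(
        stages=[
            AnalyticalStageRecord("digestion", {"enzyme": "trypsin"}),
            AnalyticalStageRecord("separation", {"column": "C18"}),
            AnalyticalStageRecord("mass_spectrometry", {"mode": "MS/MS"}),
        ],
        gbas={"molecule_type": "mAb", "molecular_weight_kda": 148.0},
        study_classes={"primary_structure"},
    ),
    AnalyticalMethodRecord(
        stages=[AnalyticalStageRecord("mass_spectrometry", {"mode": "intact"})],
        gbas={"molecule_type": "mAb", "molecular_weight_kda": 150.0},
        study_classes={"molecular_weight"},
    ),
]

query = InputQuery(
    gbas={"molecule_type": "mAb", "molecular_weight_kda": 149.0},
    study_classes={"primary_structure"},
)
shortlist = candidates_for(query, method_store)
stage_ids = [s.stage_id for s in shortlist[0].stages]
print(stage_ids)
```

Keeping the stage records as an ordered list inside each method record preserves the sequence of stages, which a downstream learner could use or ignore as needed.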
58. The method of claim 57, wherein the one or more GBAs of the target biologic from the input query are determined based on one or more members selected from the group consisting of an amino acid sequence; a molecule type; and a known identification of one or more disulfide linkage sites.
59. The method of claim 57, wherein the one or more GBAs of the target biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the target biologic;
a molecule type of the target biologic;
a quantification of one or more specific amino acids within the target biologic;
a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the target biologic.
60. The method of claim 59, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
61. The method of claim 60, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
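One possible way to obtain the positions and number of potential modification sites recited in claims 60 and 61 is to scan the amino acid sequence for short motifs. The sketch below assumes three widely used rules of thumb that are not stated in the claims: methionine as an oxidation site, asparagine followed by glycine (NG) as a deamidation site, and the N-X-S/T sequon (X not proline) for N-linked glycosylation. The names `MOTIFS` and `modification_sites` and the example sequence are hypothetical.

```python
import re

# Assumed motif definitions (common simplifications, not taken from the claims).
# Lookaheads are used so that overlapping sites are all reported.
MOTIFS = {
    "oxidation": r"M",                  # methionine only; other residues ignored
    "deamidation": r"N(?=G)",           # asparagine followed by glycine
    "n_glycosylation": r"N(?=[^P][ST])",  # sequon N-X-S/T with X != P
}


def modification_sites(sequence):
    """Return 1-based positions of potential modification sites per motif."""
    return {
        name: [m.start() + 1 for m in re.finditer(pattern, sequence)]
        for name, pattern in MOTIFS.items()
    }


seq = "MKNGSANPTNVTM"  # made-up sequence for illustration
sites = modification_sites(seq)
counts = {name: len(positions) for name, positions in sites.items()}
print(sites)
print(counts)
```

Both the position lists and the counts could serve as GBAs; the counts are sequence-length dependent, so a real system might normalize them.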
62. The method of any one of claims 57 to 61, wherein the one or more GBAs of the target biologic comprise(s) a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of: hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
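The proportion-type GBA of claim 62 can be sketched as a simple fraction of residues falling in each property class. The residue groupings below are one common textbook classification chosen as an assumption for illustration; the claims do not specify which residues belong to which class, and other scales exist. `PROPERTY_CLASSES` and `property_proportions` are hypothetical names.

```python
# Assumed residue classes (one of several possible groupings).
PROPERTY_CLASSES = {
    "hydrophobic": set("AVILMFWC"),
    "aromatic": set("FWY"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def property_proportions(sequence):
    """Return the fraction of residues in each assumed property class."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(residue in members for residue in sequence) / n
        for name, members in PROPERTY_CLASSES.items()
    }


seq = "ACDEFKWY"  # made-up sequence for illustration
proportions = property_proportions(seq)
print(proportions)
```

Because a residue may belong to more than one class (for example, phenylalanine is counted as both hydrophobic and aromatic here), the proportions need not sum to one.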
63. The method of any one of claims 57 to 62, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
64. The method of any one of claims 57 to 63, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
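The fragmentation metrics of claim 64 can be approximated in silico. The sketch below assumes trypsin as the proteolytic enzyme and the commonly used rule that it cleaves after lysine or arginine unless the next residue is proline; neither the enzyme nor the rule is specified in the claims. It returns the fragment list (a fragmentation pattern) and a histogram of fragment lengths (a statistical distribution by fragment length). The function names and example sequence are hypothetical.

```python
import re
from collections import Counter


def tryptic_fragments(sequence):
    """Split after K or R, except when the next residue is P (assumed rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # A cleavage point at the very end of the sequence yields an empty piece.
    return [p for p in pieces if p]


def length_distribution(fragments):
    """Return a mapping from fragment length to number of fragments."""
    return dict(Counter(len(f) for f in fragments))


seq = "AAKRPGGRCCK"  # made-up sequence for illustration
fragments = tryptic_fragments(seq)
length_hist = length_distribution(fragments)
print(fragments)
print(length_hist)
```

A distribution by fragment molecular weight would follow the same pattern with a residue mass table in place of `len`; that table is omitted here.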
65. The method of any one of claims 57 to 64, wherein the one or more GBAs of the target biologic comprise(s) one or more bioprocess attributes representing parameters of a bioprocess used to produce the target biologic.
66. The method of claim 65, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the target biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
67. The method of any one of claims 57 to 66, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
68. The method of claim 67, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
69. The method of claim 68, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
70. The method of any one of claims 57 to 67, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
71. The method of any one of claims 57 to 70, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
72. The method of any one of claims 57 to 71, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
73. The method of any one of claims 57 to 72, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
74. The method of claim 73, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
75. The method of any one of claims 57 to 74, wherein the method comprises:
receiving, by the processor, a user input comprising one or more known or expected structural features of the target biologic;
determining, by the processor, using the one or more known or expected structural features of the target biologic, the one or more GBAs of the target biologic; and
providing, by the processor, the determined one or more target biologic GBAs via the input query for automated identification of analytical study design parameters for analysis of the target biologic.
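The flow of claim 75 (user-supplied structural features, derived GBAs, input query) might be sketched as follows. The particular GBAs derived here (sequence length, cysteine and methionine counts, number of disulfide bonds) are illustrative choices only, and `derive_gbas`, `build_input_query`, and the dictionary keys are hypothetical names.

```python
from collections import Counter


def derive_gbas(sequence, molecule_type, disulfide_bonds=()):
    """Derive a few simple GBAs from known or expected structural features."""
    residue_counts = Counter(sequence)
    return {
        "molecule_type": molecule_type,
        "length": len(sequence),
        "cysteine_count": residue_counts["C"],
        "methionine_count": residue_counts["M"],
        "disulfide_count": len(disulfide_bonds),
    }


def build_input_query(gbas, study_classes):
    """Package derived GBAs and study class attributes as an input query."""
    return {"gbas": gbas, "study_classes": sorted(study_classes)}


# Made-up user input: a short sequence with one disulfide bond between
# the cysteines at 1-based positions 3 and 6.
gbas = derive_gbas("MKCAACM", "peptide", disulfide_bonds=[(3, 6)])
input_query = build_input_query(gbas, {"primary_structure"})
print(input_query)
```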
76. The method of claim 75, wherein the one or more known or expected structural features of the target biologic comprise(s) at least one feature selected from the group consisting of an amino acid sequence; locations of disulfide bonds; locations and/or types of glycan structures attached to the target biologic.
77. The method of either of claims 75 or 76, wherein the received user input comprises an identification of a molecule type of the target biologic and the identification of the molecule type is used as a GBA of the one or more GBAs of the target biologic.
78. The method of any one of claims 57 to 77, wherein the input query comprises one or more bioprocess parameters that represent properties of the bioprocess used to produce the target biologic and the one or more study design results are determined in step (c) based further on the one or more bioprocess parameters.
79. The method of any one of claims 57 to 78, wherein at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
80. The method of any one of claims 57 to 79, wherein each of one or more of the analytical method records of the method store comprises or is linked to one or more prior study class attributes that identify the structural characterization study in which the analytical method that the analytical method record represents was implemented.
81. The method of any one of claims 57 to 80, wherein each of one or more of the analytical method records of the method store corresponds to a specific analytical method comprising at least one analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
82. The method of any one of claims 57 to 81, wherein at least a portion of the plurality of analytical method records were created from published documents via automated processing using text mining and/or natural language processing.
83. The method of any one of claims 57 to 82, wherein at least a portion of the plurality of analytical method records were created from published documents via automated processing in combination with a user interaction.
84. The method of any one of claims 57 to 83, wherein at least a portion of the analytical method records were created from in-house studies in an automated fashion via dedicated software as part of a laboratory information management system.
85. The method of any one of claims 57 to 84, wherein:
each determined study design result comprises a set of analytical method results, each representing a specific analytical method to be applied to the target biologic, and comprising a list of parameters to be used when applying the analytical method that the analytical method result represents to the target biologic, and
the analytical method results of a given study design result are determined via a machine learning module that receives as input the GBAs of the target biologic and determines the set of analytical method results and, for each analytical method result, parameter values associated with that method, based on patterns identified using the GBAs associated with analytical method records of the method store.
86. The method of claim 85, further comprising:
computing, by the machine learning module, relevant analytical stage records by matching the GBAs of the target biologic with GBAs of a subset of the analytical stage records according to an identified pattern in GBAs of the analytical stage records.
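The matching step of claim 86 can be illustrated with a plain nearest-neighbour lookup over numeric GBAs. This is a deliberately simple stand-in chosen for illustration, not the claimed machine learning module. It assumes the numeric GBAs have already been scaled to comparable ranges; the keys `mw_scaled` and `hydrophobic_fraction`, the function names, and the example records are all hypothetical.

```python
import math

NUMERIC_KEYS = ("mw_scaled", "hydrophobic_fraction")  # assumed, pre-scaled GBAs


def gba_distance(a, b, keys=NUMERIC_KEYS):
    """Euclidean distance between two GBA dictionaries over the given keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def nearest_stage_records(target_gbas, stage_records, k=1):
    """Return the k stage records whose linked GBAs are closest to the target."""
    ranked = sorted(
        stage_records, key=lambda r: gba_distance(target_gbas, r["gbas"])
    )
    return ranked[:k]


# Made-up digestion stage records linked to GBAs of known biologics.
stage_records = [
    {"stage_id": "digestion", "parameters": {"enzyme": "trypsin"},
     "gbas": {"mw_scaled": 0.90, "hydrophobic_fraction": 0.40}},
    {"stage_id": "digestion", "parameters": {"enzyme": "Lys-C"},
     "gbas": {"mw_scaled": 0.20, "hydrophobic_fraction": 0.55}},
    {"stage_id": "digestion", "parameters": {"enzyme": "Glu-C"},
     "gbas": {"mw_scaled": 0.50, "hydrophobic_fraction": 0.10}},
]

target = {"mw_scaled": 0.85, "hydrophobic_fraction": 0.42}
best = nearest_stage_records(target, stage_records, k=1)
top_two = [r["parameters"]["enzyme"] for r in nearest_stage_records(target, stage_records, k=2)]
print(best[0]["parameters"], top_two)
```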
87. The method of any one of claims 57 to 86, wherein the machine learning module implements a supervised machine learning technique.
88. The method of claim 87, wherein the plurality of examples of correct output results comprise previously determined study design results or previously determined analytical stage results.
89. The method of claim 87, wherein the set of input features comprises one or more GBAs.
90. The method of any one of claims 57 to 89, wherein the machine learning module implements a reinforcement machine learning technique.
91. The method of claim 90, wherein the reinforcement machine learning technique comprises:
determining study design results using a set of training data that comprises a plurality of analytical stage records; and
for each example, outputting a performance index.
92. The method of any one of claims 57 to 91, wherein the machine learning module implements an unsupervised machine learning technique.
93. The method of claim 92, wherein the machine learning module implements the unsupervised machine learning technique as a precursor to a supervised machine learning technique.
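One way to read claim 93 is a two-step pipeline: first group the GBA vectors of known biologics without labels, then learn a label for each group. The sketch below uses a tiny k-means as the unsupervised step and a per-cluster majority vote as a very simple supervised step. Both choices, the deterministic initialization, the feature values, and the method labels are assumptions made for illustration and are not drawn from the claims.

```python
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    dists = [sum((x - y) ** 2 for x, y in zip(point, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(points, k, iterations=10):
    """Tiny k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


def majority_label_per_cluster(assignment, labels):
    """Supervised step: most frequent label among the members of each cluster."""
    votes = {}
    for cluster, label in zip(assignment, labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}


# Made-up, pre-scaled GBA vectors and the separation method used for each.
points = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25), (0.85, 0.90)]
labels = ["RP-HPLC", "SEC", "RP-HPLC", "SEC"]

centroids, assignment = kmeans(points, k=2)
cluster_label = majority_label_per_cluster(assignment, labels)
predicted = cluster_label[nearest((0.20, 0.10), centroids)]
print(assignment, predicted)
```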
94. The method of any one of claims 57 to 93, comprising:
receiving, by the processor, a user input corresponding to a modification of a particular study design result of the one or more determined study design results;
updating, by the processor, the particular study design result according to the user input; and
storing, by the processor, one or more analytical method results of the updated study design result as analytical method records in the method store.
95. The method of claim 94, comprising using the stored analytical method results of the updated study design result as training data in a supervised machine learning technique implemented by the machine learning module of step (c).
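The feedback loop of claims 94 and 95 (user modification, update, storage, reuse as training data) might look like the following sketch. The dictionary layout, the function names, and the example parameters are hypothetical; the method store is modelled as a plain list for brevity.

```python
import copy


def apply_user_modification(study_design_result, method_index, parameter_updates):
    """Return an updated copy of the study design result; the original is untouched."""
    updated = copy.deepcopy(study_design_result)
    updated["methods"][method_index]["parameters"].update(parameter_updates)
    return updated


def store_as_records(updated_result, target_gbas, method_store):
    """Append each analytical method result of the updated design to the store."""
    for method in updated_result["methods"]:
        method_store.append(
            {"gbas": dict(target_gbas), "method": copy.deepcopy(method)}
        )


def training_examples(method_store):
    """Build (input features, correct output) pairs for a supervised learner."""
    return [(record["gbas"], record["method"]) for record in method_store]


# Made-up study design result and user edit.
result = {
    "methods": [
        {"name": "peptide_mapping",
         "parameters": {"enzyme": "trypsin", "gradient_min": 60}},
    ]
}
updated = apply_user_modification(result, 0, {"gradient_min": 90})

method_store = []
store_as_records(updated, {"molecule_type": "mAb"}, method_store)
examples = training_examples(method_store)
print(examples)
```

Copying rather than mutating the original result keeps the machine-generated design available for comparison with the user-corrected one.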
96. The method of any one of claims 57 to 95, wherein step (d) comprises, for at least one study design result of the determined study design results:
causing, by the processor, display of at least one graphical control element corresponding to an analytical method result of the study design result;
receiving, by the processor, via a user interaction with the graphical control element, a user input corresponding to a modification of the analytical method result;
responsive to the received user input, updating, by the processor, the particular study design result according to the user input; and
storing, by the processor, one or more analytical method results of the updated study design result as analytical method records in the method store.
97. The method of any one of claims 57 to 96, wherein the one or more study design results comprise(s) one or more documents corresponding to software file(s) that specify parameters for analytical instruments.
98. A method of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, the method comprising:
(a) creating, by a processor of a computing device, an analytical method record corresponding to a specific analytical method having been used in an analytical study for structural characterization of an associated known biologic, wherein the analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents;
(b) storing, by the processor, the analytical method record in the method store;
(c) storing, by the processor, one or more generalized biologic attributes (GBAs) of the associated known biologic in the method store; and
(d) linking, by the processor, the one or more known biologic GBAs with the analytical method record.
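Steps (a) to (d) of claim 98 can be sketched against a small relational schema. The example below uses Python's standard `sqlite3` module with an in-memory database; the table and column names, the JSON encoding of the stage sequence, and the example values are assumptions made for illustration and are not taken from the claims.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE method_record (
        id INTEGER PRIMARY KEY,
        stages_json TEXT NOT NULL          -- ordered sequence of stage records
    );
    CREATE TABLE gba (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        value TEXT NOT NULL
    );
    CREATE TABLE method_gba (              -- link table for step (d)
        method_id INTEGER REFERENCES method_record(id),
        gba_id INTEGER REFERENCES gba(id)
    );
    """
)


def populate(conn, stages, gbas):
    """Create and store a method record, store its GBAs, and link them."""
    cur = conn.cursor()
    # Steps (a) and (b): create the record and store it.
    cur.execute(
        "INSERT INTO method_record (stages_json) VALUES (?)",
        (json.dumps(stages),),
    )
    method_id = cur.lastrowid
    for name, value in gbas.items():
        # Step (c): store the GBA of the associated known biologic.
        cur.execute(
            "INSERT INTO gba (name, value) VALUES (?, ?)", (name, str(value))
        )
        # Step (d): link the GBA with the method record.
        cur.execute(
            "INSERT INTO method_gba (method_id, gba_id) VALUES (?, ?)",
            (method_id, cur.lastrowid),
        )
    conn.commit()
    return method_id


method_id = populate(
    conn,
    stages=[
        {"stage_id": "digestion", "parameters": {"enzyme": "trypsin"}},
        {"stage_id": "mass_spectrometry", "parameters": {"mode": "MS/MS"}},
    ],
    gbas={"molecule_type": "mAb", "molecular_weight_kda": 148.0},
)
linked = conn.execute(
    "SELECT g.name, g.value FROM gba g "
    "JOIN method_gba mg ON mg.gba_id = g.id "
    "WHERE mg.method_id = ? ORDER BY g.name",
    (method_id,),
).fetchall()
print(linked)
```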
99. The method of claim 98, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
100. The method of claim 99, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
101. The method of claim 100, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
102. The method of any one of claims 98 to 101, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
103. The method of any one of claims 98 to 102, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
104. The method of any one of claims 98 to 103, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
105. The method of any one of claims 98 to 104, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
106. The method of claim 105, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
107. The method of any one of claims 98 to 106, wherein the one or more GBAs of the associated known biologic comprise(s) one or more study class attributes, each of which identifies a type of analytical study performed on the associated known biologic using an analytical method that the analytical method record represents.
108. The method of claim 107, wherein at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
109. The method of any one of claims 98 to 108, wherein the analytical method record of the method store corresponds to a specific analytical method comprising at least one analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
110. The method of any one of claims 98 to 109, wherein creating the analytical method record comprises extracting at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing using text mining and/or natural language processing.
111. The method of any one of claims 98 to 110, wherein creating the analytical method record comprises extracting at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing in combination with a user interaction.
112. The method of any one of claims 98 to 111, wherein creating the analytical method record comprises obtaining the identifier and series of parameter values from an in-house study in an automated fashion via dedicated software as part of a laboratory information management system.
113. A system of automatically identifying analytical study design parameters for analysis of a target biologic, comprising:
a processor; and
a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to:
(a) receive an input query comprising one or more generalizable biologic attributes (GBAs) of the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes;
(b) access a method store comprising a plurality of analytical stage records, each analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein:
(i) each analytical stage record comprises an identifier of the corresponding specific analytical stage,
(ii) each analytical stage record comprises a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic, and
(iii) each analytical stage record is linked to one or more GBAs of the associated known biologic;
(c) determine, responsive to the input query, one or more study design results based on the GBAs of the target biologic and the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical stage records of the method store; and
(d) provide the one or more study design results for display and/or further processing.
114. The system of claim 113, wherein the one or more GBAs of the target biologic from the input query are determined based on one or more members selected from the group consisting of an amino acid sequence; a molecule type; and a known identification of one or more disulfide linkage sites.
115. The system of claim 113, wherein the one or more GBAs of the target biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the target biologic;
a molecule type of the target biologic;
a quantification of one or more specific amino acids within the target biologic;
a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the target biologic.
116. The system of claim 115, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
117. The system of claim 116, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
118. The system of any one of claims 113 to 117, wherein the one or more GBAs of the target biologic comprise(s) a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
119. The system of any one of claims 113 to 118, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
120. The system of any one of claims 113 to 119, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
121. The system of any one of claims 113 to 120, wherein the one or more GBAs of the target biologic comprise(s) one or more bioprocess attributes representing parameters of a bioprocess used to produce the target biologic.
122. The system of claim 121, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the target biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
123. The system of any one of claims 113 to 122, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
124. The system of claim 123, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
125. The system of claim 124, wherein the one or more specific types of amino acid modifications comprise(s) position and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
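The identification of motifs associated with specific modifications can be sketched as a pattern scan over the amino acid sequence. The motif definitions below are simplified assumptions chosen for illustration (the N-X-S/T sequon with X not P for N-linked glycosylation, Asn followed by Gly or Ser for deamidation, Met and Trp for oxidation); the names `MOTIFS` and `find_sites` are hypothetical.

```python
import re
from typing import Dict, List

# Assumed, simplified motif definitions (illustrative only).
# Lookaheads are used so that overlapping sites are all reported.
MOTIFS = {
    "n_glycosylation": r"(?=(N[^P][ST]))",  # N-X-S/T sequon, X is not P
    "deamidation": r"(?=(N[GS]))",          # Asn followed by Gly or Ser
    "oxidation": r"(?=([MW]))",             # Met and Trp residues
}


def find_sites(sequence: str) -> Dict[str, List[int]]:
    """Return 1-based positions of the first residue of each motif match."""
    return {
        name: [m.start() + 1 for m in re.finditer(pattern, sequence)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical sequence, for illustration only.
sites = find_sites("MANGTWNPSQNGK")
counts = {name: len(positions) for name, positions in sites.items()}
print(sites)
print(counts)
```

The positions and the counts correspond to the "positions and/or number of potential sites" wording of the claims.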
126. The system of any one of claims 113 to 125, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
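A proportion of amino acids per property class can be computed directly from the sequence, as in the sketch below. The residue groupings in `PROPERTY_CLASSES` are one common convention and are an assumption here; the claims do not fix which residues belong to each class.

```python
from typing import Dict

# Assumed residue groupings (conventions vary; illustrative only).
PROPERTY_CLASSES = {
    "hydrophobic": set("AVILMFWC"),
    "charged": set("DEKR"),
    "acidic": set("DE"),
    "aromatic": set("FWY"),
}


def property_proportions(sequence: str) -> Dict[str, float]:
    """Fraction of residues in the sequence that fall in each property class."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = len(sequence)
    return {
        name: sum(residue in members for residue in sequence) / total
        for name, members in PROPERTY_CLASSES.items()
    }


# Hypothetical 10-residue sequence, for illustration only.
proportions = property_proportions("ADEKFWLLGG")
print(proportions)
```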
127. The system of any one of claims 113 to 126, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
128. The system of any one of claims 113 to 127, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
129. The system of any one of claims 113 to 128, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
130. The system of claim 129, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
131. The system of any one of claims 113 to 130, wherein the instructions further cause the processor to:
receive a user input comprising one or more known or expected structural features of the target biologic;
determine using the one or more known or expected structural features of the target biologic, the one or more GBAs of the target biologic; and
provide the determined one or more target biologic GBAs via the input query for automated identification of analytical study design parameters for analysis of the target biologic.
132. The system of claim 131, wherein the one or more known or expected structural features of the target biologic comprise(s) at least one feature selected from the group consisting of an amino acid sequence; locations of disulfide bonds; locations and/or types of glycan structures attached to the target biologic.
133. The system of either of claims 131 or 132, wherein the received user input comprises an identification of a molecule type of the target biologic and the identification of the molecule type is used as a GBA of the one or more GBAs of the target biologic.
134. The system of any one of claims 113 to 133, wherein the input query comprises one or more bioprocess parameters that represent properties of the bioprocess used to produce the target biologic and the one or more study design results are determined in step (c) based further on the one or more bioprocess parameters.
135. The system of any one of claims 113 to 134, wherein at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
136. The system of any one of claims 113 to 135, wherein each of one or more of the analytical stage records of the method store comprises or is linked to one or more prior study class attributes that identify the structural characterization study in which the analytical stage that the analytical stage record represents was implemented.
137. The system of any one of claims 113 to 136, wherein each of one or more of the analytical stage records of the method store corresponds to a specific analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
138. The system of any one of claims 113 to 137, wherein at least a portion of the plurality of analytical stage records were created from published documents via automated processing using text mining and/or natural language processing.
139. The system of any one of claims 113 to 138, wherein at least a portion of the plurality of analytical stage records is created from published documents via automated processing in combination with a user interaction.
140. The system of any one of claims 113 to 139, wherein at least a portion of the analytical stage records is created from in-house studies in an automated fashion via dedicated software as part of a laboratory information management system.
141. The system of any one of claims 113 to 140, wherein:
each determined study design result comprises a set of analytical stage results, each representing a specific analytical stage to be applied to the target biologic, and comprising a list of parameters to be used when applying the analytical stage that the analytical stage result represents to the target biologic, and
the analytical stage results of a given study design result are determined via a machine learning module that receives as input the GBAs of the target biologic and determines the set of analytical stage results and, for each analytical stage result, parameter values associated with that stage, based on patterns identified using the GBAs associated with analytical stage records of the method store.
142. The system of claim 141, wherein the instructions further cause the processor to: compute, by the machine learning module, a relevant analytical stage record by matching the GBAs of the target biologic with GBAs of a subset of the analytical stage records according to an identified pattern in GBAs of the analytical stage records.
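Matching the GBAs of a target biologic against GBAs linked to stored analytical stage records can be illustrated with a scaled nearest-neighbor lookup. This is a deliberately simple stand-in for the machine learning module, not the technique the claims require; the store contents, feature names, and scale factors below are all hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical method-store entries: a stage record with its linked GBAs.
METHOD_STORE = [
    {"stage": "digestion", "params": {"enzyme": "trypsin"},
     "gbas": {"mw_kda": 148.0, "hydrophobic": 0.40}},
    {"stage": "digestion", "params": {"enzyme": "Lys-C"},
     "gbas": {"mw_kda": 25.0, "hydrophobic": 0.55}},
    {"stage": "digestion", "params": {"enzyme": "chymotrypsin"},
     "gbas": {"mw_kda": 66.0, "hydrophobic": 0.35}},
]

# Assumed per-feature scales so that no single GBA dominates the distance.
SCALES = {"mw_kda": 100.0, "hydrophobic": 0.1}


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Scaled Euclidean distance over the GBAs listed in SCALES."""
    return math.sqrt(sum(((a[k] - b[k]) / s) ** 2 for k, s in SCALES.items()))


def nearest_records(target_gbas: Dict[str, float], store: List[dict], k: int = 1) -> List[dict]:
    """Return the k stage records whose linked GBAs are closest to the target."""
    return sorted(store, key=lambda rec: distance(target_gbas, rec["gbas"]))[:k]


target = {"mw_kda": 150.0, "hydrophobic": 0.42}
matches = nearest_records(target, METHOD_STORE, k=2)
print([m["params"]["enzyme"] for m in matches])
```

The parameter values of the closest records could then seed the analytical stage results proposed for the target biologic.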
143. The system of any one of claims 113 to 142, wherein the machine learning module implements a supervised machine learning technique.
144. The system of claim 143, wherein the plurality of examples of correct output results comprise previously determined study design results or previously determined analytical stage results.
145. The system of claim 143, wherein the set of input features comprises one or more GBAs.
146. The system of any one of claims 113 to 145, wherein the machine learning module implements a reinforcement machine learning technique.
147. The system of claim 146, wherein the reinforcement machine learning technique comprises:
determining study design results using a set of training data comprising a plurality of analytical stage records; and
for each example, outputting a performance index.
148. The system of any one of claims 113 to 147, wherein the machine learning module implements an unsupervised machine learning technique.
149. The system of claim 148, wherein the machine learning module implements the unsupervised machine learning technique as a precursor to a supervised machine learning technique.
150. The system of any one of claims 113 to 149, wherein the instructions further cause the processor to:
receive a user input corresponding to a modification of a particular study design result of the one or more determined study design results;
update the particular study design result according to the user input; and
store one or more analytical stage results of the updated study design result as analytical stage records in the method store.
151. The system of claim 150, wherein the instructions further cause the processor to use the stored analytical stage results of the updated study design result as training data in a supervised machine learning technique implemented by the machine learning module of step (c).
152. The system of any one of claims 113 to 151, wherein the instructions cause the processor to, for at least one study design result of the determined study design results:
cause display of at least one graphical control element corresponding to an analytical stage result of the study design result;
receive, via a user interaction with the graphical control element, a user input corresponding to a modification of the analytical stage result;
responsive to the received user input, update the particular study design result according to the user input; and
store one or more analytical stage results of the updated study design result as analytical stage records in the method store.
153. The system of any one of claims 113 to 152, wherein the one or more study design results comprise(s) one or more documents corresponding to software file(s) that specify parameters for analytical instruments.
154. A system of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, comprising:
a processor; and
a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to:
(a) create an analytical stage record corresponding to a specific analytical stage having been implemented as a step of an analytical method used in an analytical study for structural characterization of an associated known biologic, wherein the analytical stage record comprises:
(i) an identifier of the corresponding specific analytical stage, and
(ii) a series of parameter values used to implement the corresponding analytical stage for characterizing the associated known biologic;
(b) store the analytical stage record in the method store;
(c) store one or more generalized biologic attributes (GBAs) of the associated known biologic in the method store; and
(d) link the one or more known biologic GBAs with the analytical stage record.
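Steps (a) to (d) can be sketched as a small in-memory data structure. The sketch assumes a list-backed store with integer indices as links; the class names, fields, and example values are hypothetical, and a real method store would more likely be a database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalyticalStageRecord:
    stage_id: str                       # (i) identifier of the analytical stage
    parameters: Dict[str, str]          # (ii) parameter values used for the stage
    gba_links: List[int] = field(default_factory=list)  # indices of linked GBA sets


class MethodStore:
    """Minimal in-memory stand-in for the method store."""

    def __init__(self) -> None:
        self.stage_records: List[AnalyticalStageRecord] = []
        self.gba_sets: List[Dict[str, object]] = []

    def add_stage_record(self, record: AnalyticalStageRecord) -> int:
        self.stage_records.append(record)      # step (b)
        return len(self.stage_records) - 1

    def add_gbas(self, gbas: Dict[str, object]) -> int:
        self.gba_sets.append(gbas)              # step (c)
        return len(self.gba_sets) - 1

    def link(self, record_idx: int, gba_idx: int) -> None:
        self.stage_records[record_idx].gba_links.append(gba_idx)  # step (d)

    def gbas_for(self, record_idx: int) -> List[Dict[str, object]]:
        return [self.gba_sets[i] for i in self.stage_records[record_idx].gba_links]


store = MethodStore()
# Step (a): create the record (values are illustrative only).
rec = store.add_stage_record(
    AnalyticalStageRecord("separation", {"column": "C18", "gradient_min": "60"}))
gba = store.add_gbas({"molecule_type": "monoclonal antibody", "mw_kda": 148.0})
store.link(rec, gba)
print(store.gbas_for(rec))
```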
155. The system of claim 154, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
156. The system of claim 155, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
157. The system of claim 156, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
158. The system of any one of claims 154 to 157, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
159. The system of any one of claims 154 to 158, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
160. The system of any one of claims 154 to 159, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
161. The system of any one of claims 154 to 160, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
162. The system of claim 161, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
163. The system of any one of claims 154 to 162, wherein the one or more GBAs of the associated known biologic comprise(s) one or more study class attributes, each of which identifies a type of analytical study performed on the associated known biologic using an analytical method comprising the analytical stage that the analytical stage record represents.
164. The system of claim 163, wherein at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
165. The system of any one of claims 154 to 164, wherein the analytical stage record of the method store corresponds to a specific analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
166. The system of any one of claims 154 to 165, wherein the instructions cause the processor to extract at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing using text mining and/or natural language processing.
167. The system of any one of claims 154 to 166, wherein the instructions cause the processor to extract at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing in combination with a user interaction.
168. The system of any one of claims 154 to 167, wherein the instructions cause the processor to obtain the identifier and series of parameter values from an in-house study in an automated fashion via dedicated software as part of a laboratory information management system.
169. A system of automatically identifying analytical study design parameters for analysis of a target biologic, comprising:
a processor; and
a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to:
(a) receive an input query comprising one or more generalizable biologic attributes (GBAs) of the target biologic, wherein the one or more GBAs comprise(s) one or more study class attributes;
(b) access a method store comprising a plurality of analytical method records, each analytical method record corresponding to a specific analytical method used in an analytical study for structural characterization of an associated known biologic, wherein:
(i) each analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents; and
(ii) each analytical method record is linked to one or more GBAs of the associated known biologic;
(c) determine responsive to the input query, one or more study design results based on the GBAs of the target biologic and the one or more study class attributes, wherein step (c) is performed using a machine learning module that identifies patterns linking the GBAs with the analytical method records of the method store; and
(d) provide the one or more study design results for display and/or further processing.
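The flow of steps (a) to (d) can be sketched as a query over method records, each holding an ordered sequence of stages. The example filters by study class attribute and then ranks by GBA similarity; the sum of absolute differences used here is only a placeholder for the machine learning module, and every name and value is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MethodRecord:
    study_class: str          # study class attribute linked to the record
    stages: List[str]         # ordered sequence of analytical stages
    gbas: Dict[str, float]    # GBAs of the associated known biologic


def design_study(query_gbas: Dict[str, float], study_class: str,
                 store: List[MethodRecord], top_n: int = 1) -> List[List[str]]:
    """Return stage sequences of the closest records in the requested study class."""
    candidates = [m for m in store if m.study_class == study_class]

    def gap(m: MethodRecord) -> float:
        # Placeholder similarity: sum of absolute GBA differences.
        return sum(abs(query_gbas[k] - m.gbas[k]) for k in query_gbas)

    return [m.stages for m in sorted(candidates, key=gap)[:top_n]]


# Hypothetical method store contents.
STORE = [
    MethodRecord("primary_structure",
                 ["sample preparation", "digestion", "separation", "mass spectrometry"],
                 {"mw_kda": 148.0}),
    MethodRecord("primary_structure", ["digestion", "mass spectrometry"], {"mw_kda": 20.0}),
    MethodRecord("molecular_weight", ["separation", "mass spectrometry"], {"mw_kda": 150.0}),
]

results = design_study({"mw_kda": 150.0}, "primary_structure", STORE)
print(results)
```

Filtering on the study class first keeps a record from a different study type from being returned merely because its GBAs are closest.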
170. The system of claim 169, wherein the one or more GBAs of the target biologic from the input query are determined based on one or more members selected from the group consisting of an amino acid sequence, a molecule type; and a known identification of one or more disulfide linkage sites.
171. The system of claim 169, wherein the one or more GBAs of the target biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the target biologic;
a molecule type of the target biologic;
a quantification of one or more specific amino acids within the target biologic;
a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the target biologic.
172. The system of claim 171, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
173. The system of claim 172, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
174. The system of any one of claims 169 to 173, wherein the one or more GBAs of the target biologic comprise(s) a proportion of amino acids within the target biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
175. The system of any one of claims 169 to 174, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
176. The system of any one of claims 169 to 175, wherein the one or more GBAs of the target biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
177. The system of any one of claims 169 to 176, wherein the one or more GBAs of the target biologic comprise(s) one or more bioprocess attributes representing parameters of a bioprocess used to produce the target biologic.
178. The system of claim 177, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the target biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
179. The system of any one of claims 169 to 178, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
180. The system of claim 179, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
181. The system of claim 180, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation; disulfide bridges; disulfide knots; modification of cysteine to formylglycine.
182. The system of any one of claims 169 to 181, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
183. The system of any one of claims 169 to 182, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
184. The system of any one of claims 169 to 183, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
185. The system of any one of claims 169 to 184, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
186. The system of claim 185, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
187. The system of any one of claims 169 to 186, wherein the instructions further cause the processor to:
receive a user input comprising one or more known or expected structural features of the target biologic;
determine using the one or more known or expected structural features of the target biologic, the one or more GBAs of the target biologic; and
provide the determined one or more target biologic GBAs via the input query for automated identification of analytical study design parameters for analysis of the target biologic.
188. The system of claim 187, wherein the one or more known or expected structural features of the target biologic comprise(s) at least one feature selected from the group consisting of an amino acid sequence; locations of disulfide bonds; locations and/or types of glycan structures attached to the target biologic.
189. The system of either of claims 187 or 188, wherein the received user input comprises an identification of a molecule type of the target biologic and the identification of the molecule type is used as a GBA of the one or more GBAs of the target biologic.
190. The system of any one of claims 169 to 189, wherein the input query comprises one or more bioprocess parameters that represent properties of the bioprocess used to produce the target biologic and the one or more study design results are determined in step (c) based further on the one or more bioprocess parameters.
191. The system of any one of claims 169 to 190, wherein at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
192. The system of any one of claims 169 to 191, wherein each of one or more of the analytical method records of the method store comprises or is linked to one or more prior study class attributes that identify the structural characterization study in which the analytical method that the analytical method record represents was implemented.
193. The system of any one of claims 169 to 192, wherein each of one or more of the analytical method records of the method store corresponds to a specific analytical method comprising at least one analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
194. The system of any one of claims 169 to 193, wherein at least a portion of the plurality of analytical method records were created from published documents via automated processing using text mining and/or natural language processing.
195. The system of any one of claims 169 to 194, wherein at least a portion of the plurality of analytical method records were created from published documents via automated processing in combination with a user interaction.
196. The system of any one of claims 169 to 195, wherein at least a portion of the analytical method records were created from in-house studies in an automated fashion via dedicated software as part of a laboratory information management system.
197. The system of any one of claims 169 to 196, wherein:
each determined study design result comprises a set of analytical method results, each representing a specific analytical method to be applied to the target biologic, and comprising a list of parameters to be used when applying the analytical method that the analytical method result represents to the target biologic, and
the analytical method results of a given study design result are determined via a machine learning module that receives as input the GBAs of the target biologic and determines the set of analytical method results and, for each analytical method result, parameter values associated with that method, based on patterns identified using the GBAs associated with analytical method records of the method store.
198. The system of claim 197, wherein the instructions further cause the processor to: compute, by the machine learning module, a relevant analytical stage record by matching the GBAs of the target biologic with GBAs of a subset of the analytical stage records according to an identified pattern in GBAs of the analytical stage records.
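By way of non-limiting illustration, the matching step of claim 198 can be read as a nearest-neighbour lookup over GBAs. The following is a minimal sketch only, assuming the GBAs are numeric and already scaled to comparable ranges. The claims do not specify a distance measure, so the Euclidean metric, the function names `gba_distance` and `match_stage_records`, the record layout, and the example values are all hypothetical.

```python
import math


def gba_distance(target, candidate, keys):
    """Euclidean distance over the given numeric GBA keys (an assumed metric)."""
    return math.sqrt(sum((target[k] - candidate[k]) ** 2 for k in keys))


def match_stage_records(target_gbas, stage_records, k=1):
    """Return the k stage records whose GBAs lie closest to the target's GBAs."""
    keys = sorted(target_gbas)
    ranked = sorted(
        stage_records,
        key=lambda rec: gba_distance(target_gbas, rec["gbas"], keys),
    )
    return ranked[:k]


# Hypothetical analytical stage records, each linked to GBAs of a known biologic.
records = [
    {"stage": "digestion: trypsin",
     "gbas": {"hydrophobic_fraction": 0.30, "cysteine_fraction": 0.02}},
    {"stage": "digestion: pepsin",
     "gbas": {"hydrophobic_fraction": 0.55, "cysteine_fraction": 0.05}},
]
target = {"hydrophobic_fraction": 0.50, "cysteine_fraction": 0.04}

best = match_stage_records(target, records, k=1)
print(best[0]["stage"])
```

A learned model could replace the fixed distance function; the sketch shows only the shape of the lookup.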
199. The system of any one of claims 169 to 198, wherein the machine learning module implements a supervised machine learning technique.
200. The system of claim 199, wherein the plurality of examples of correct output results comprise previously determined study design results or previously determined analytical stage results.
201. The system of claim 200, wherein the set of input features comprises one or more GBAs.
202. The system of any one of claims 169 to 201, wherein the machine learning module implements a reinforcement machine learning technique.
203. The system of claim 202, wherein the reinforcement machine learning technique comprises:
determining study design results using a set of training data comprising a plurality of analytical stage records; and
for each example outputting a performance index.
204. The system of any one of claims 169 to 203, wherein the machine learning module implements an unsupervised machine learning technique.
205. The system of claim 204, wherein the machine learning module implements the unsupervised machine learning technique as a precursor to a supervised machine learning technique.
206. The system of any one of claims 169 to 205, wherein the instructions further cause the processor to:
receive a user input corresponding to a modification of a particular study design result of the one or more determined study design results;
update the particular study design result according to the user input; and
store one or more analytical method results of the updated study design result as analytical method records in the method store.
207. The system of claim 206, wherein the instructions cause the processor to use the stored analytical method results of the updated study design result as training data in a supervised machine learning technique implemented by the machine learning module of step (c).
208. The system of any one of claims 169 to 207, wherein the instructions cause the processor to, for at least one study design result of the determined study design results:
cause display of at least one graphical control element corresponding to an analytical method result of the study design result;
receive, via a user interaction with the graphical control element, a user input corresponding to a modification of the analytical method result;
responsive to the received user input, update the particular study design result according to the user input; and
store one or more analytical method results of the updated study design result as analytical method records in the method store.
209. The system of any one of claims 169 to 208, wherein the one or more study design results comprise(s) one or more documents corresponding to software file(s) that specify parameters for analytical instruments.
210. A system of populating a method store corresponding to a database of records representing analytical methods and/or analytical stages thereof having been previously applied in structural characterization of a plurality of known biologics, comprising:
a processor; and
a memory having instructions stored thereon, wherein the instructions, when executed by the processor cause the processor to:
(a) create an analytical method record corresponding to a specific analytical method having been used in an analytical study for structural characterization of an associated known biologic, wherein the analytical method record comprises a sequence of analytical stage records, each representing a specific analytical stage used in the analytical method that the analytical method record represents;
(b) store the analytical method record in the method store;
(c) store one or more generalized biologic attributes (GBAs) of the associated known biologic in the method store; and
(d) link the one or more known biologic GBAs with the analytical method record.
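By way of non-limiting illustration, steps (a) to (d) of claim 210 might be sketched with simple in-memory records. The class names, field names, and the dictionary-based store below are assumptions made for this example; the claims do not prescribe a schema or storage technology.

```python
from dataclasses import dataclass, field


@dataclass
class AnalyticalStageRecord:
    stage_type: str   # e.g. "sample preparation", "separation", "mass spectrometry"
    parameters: dict  # parameter name -> value used in that stage


@dataclass
class AnalyticalMethodRecord:
    method_id: str
    stages: list      # ordered sequence of AnalyticalStageRecord


@dataclass
class MethodStore:
    methods: dict = field(default_factory=dict)  # method_id -> method record
    gbas: dict = field(default_factory=dict)     # biologic_id -> GBA dict
    links: dict = field(default_factory=dict)    # method_id -> biologic_id

    def add(self, record, biologic_id, gbas):
        self.methods[record.method_id] = record     # step (b): store the record
        self.gbas[biologic_id] = dict(gbas)         # step (c): store the GBAs
        self.links[record.method_id] = biologic_id  # step (d): link GBAs to record

    def gbas_for(self, method_id):
        """Look up the GBAs linked to a stored method record."""
        return self.gbas[self.links[method_id]]


# Step (a): create a method record as a sequence of stage records (toy values).
record = AnalyticalMethodRecord(
    method_id="method-001",
    stages=[
        AnalyticalStageRecord("protein reduction", {"reagent": "DTT"}),
        AnalyticalStageRecord("enzymatic digestion", {"enzyme": "trypsin"}),
    ],
)
store = MethodStore()
store.add(record, "biologic-A", {"molecule_type": "Fc-fusion protein"})
```

Keeping the GBAs keyed by biologic rather than by method lets several method records share one set of attributes for the same known biologic.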
211. The system of claim 210, wherein the one or more GBAs of the associated known biologic comprise(s) one or more members selected from the group consisting of:
a sequence fragment;
a molecular weight of the associated known biologic;
a molecule type of the associated known biologic;
a quantification of one or more specific amino acids within the associated known biologic;
a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties;
an identification and/or quantification of patterns of amino acid motifs associated with propensity towards one or more specific types of amino acid modifications; and
one or more metrics corresponding to a result of applying one or more sample handling techniques, including one or more fragmentation techniques, to the associated known biologic.
212. The system of claim 211, wherein the one or more specific types of amino acid modifications comprise(s) at least one member selected from the group consisting of (i) to (vii) as follows: (i) positions and/or number of potential sites of oxidation; (ii) positions and/or number of potential sites of deamidation; (iii) positions and/or number of potential sites of post-translational modifications; (iv) positions and/or number of potential sites of N-terminal modification; (v) positions and/or number of potential sites of C-terminal modifications; (vi) positions and/or number of potential instances of isomerization; and (vii) positions and/or number of potential instances of racemization.
213. The system of claim 212, wherein the one or more specific types of amino acid modifications comprise(s) positions and/or number of potential sites of post-translational modifications, wherein the potential sites of post-translational modifications comprise at least one member selected from the group consisting of N-linked glycosylation, disulfide bridges, disulfide knots, and modification of cysteine to formylglycine.
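By way of non-limiting illustration, the positions and number of potential modification sites recited in claims 212 and 213 could be estimated by scanning the amino acid sequence for short motifs. The sketch below assumes three commonly used heuristics that the claims themselves do not state: the N-X-S/T sequon (X not proline) for N-linked glycosylation, asparagine followed by glycine or serine for deamidation, and methionine or tryptophan for oxidation. The names `MOTIFS` and `potential_sites` are hypothetical.

```python
import re

# Assumed sequence heuristics; lookaheads allow overlapping motifs to be found.
MOTIFS = {
    "n_glycosylation": r"(?=N[^P][ST])",  # N-X-S/T sequon, X != P
    "deamidation": r"(?=N[GS])",          # Asn followed by Gly or Ser
    "oxidation": r"(?=[MW])",             # Met or Trp
}


def potential_sites(sequence):
    """Return 1-based positions of each motif's first residue in the sequence."""
    seq = sequence.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(pattern, seq)]
        for name, pattern in MOTIFS.items()
    }


sites = potential_sites("ANGTNPSMNG")  # toy sequence
counts = {name: len(positions) for name, positions in sites.items()}
```

Both the position lists and the counts can serve as GBAs, matching the "positions and/or number" wording of the claim.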
214. The system of any one of claims 210 to 213, wherein the one or more GBAs of the associated known biologic comprise(s) a proportion of amino acids within the associated known biologic having a particular classification based on one or more specific properties, wherein the one or more specific properties comprise(s) one or more members selected from the group consisting of hydrophobicity, hydrophilicity, charge, acidity, and aromaticity.
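By way of non-limiting illustration, the proportion-based GBAs of claim 214 can be computed directly from a one-letter amino acid sequence. Residue classifications vary between hydrophobicity scales, so the class memberships in `PROPERTY_CLASSES` are an assumption for this sketch, as is the function name `residue_proportions`.

```python
# Assumed residue classes; other scales assign some residues differently.
PROPERTY_CLASSES = {
    "hydrophobic": set("AVILMFWC"),
    "aromatic": set("FWY"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def residue_proportions(sequence):
    """Fraction of residues in each property class (classes may overlap)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(aa in members for aa in seq) / len(seq)
        for name, members in PROPERTY_CLASSES.items()
    }


props = residue_proportions("AAFWDEKK")  # toy sequence
```

Because phenylalanine and tryptophan fall in both the hydrophobic and aromatic classes here, the proportions need not sum to one.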
215. The system of any one of claims 210 to 214, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more fragmentation techniques comprise(s) one or more members selected from the group consisting of enzymatic digestion, chemical digestion, and gas-phase fragmentation.
216. The system of any one of claims 210 to 215, wherein the one or more GBAs of the associated known biologic comprise(s) one or more metrics corresponding to a result of applying one or more fragmentation techniques to the target biologic, wherein the one or more metrics comprise(s) one or more members selected from the group consisting of:
a likelihood of enzymatic cleavage resulting from digestion with select proteolytic enzymes for which cleavage sites are highly specific, whether the enzymes are applied singly, serially or simultaneously;
a fragmentation pattern; and
a statistical distribution of fragments by fragment length and/or by fragment molecular weight.
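By way of non-limiting illustration, the fragmentation metrics of claim 216 could be derived from an in silico digestion. The sketch below assumes a simplified trypsin rule (cleavage C-terminal to lysine or arginine, except when the next residue is proline) and ignores missed cleavages. The function names `tryptic_digest` and `length_distribution` are hypothetical.

```python
from collections import Counter


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    seq = sequence.upper()
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # C-terminal remainder without K/R
    return fragments


def length_distribution(fragments):
    """Count fragments by length, one possible 'statistical distribution' metric."""
    return Counter(len(fragment) for fragment in fragments)


fragments = tryptic_digest("AKRPGKCC")  # toy sequence
distribution = length_distribution(fragments)
```

Serial or simultaneous use of several enzymes could be modelled by applying further cleavage rules to each fragment in turn.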
217. The system of any one of claims 210 to 216, wherein the one or more GBAs of the associated known biologic comprise(s) one or more bioprocess attributes representing parameters of a biomanufacturing process used to produce the associated known biologic.
218. The system of claim 217, wherein the one or more bioprocess attributes comprise(s) one or more members selected from the group consisting of:
an identification of a cell culture type used to produce the associated known biologic; and
an identification of a purification stage performed subsequently to harvesting a cell culture supernatant.
219. The system of any one of claims 210 to 218, wherein the one or more GBAs of the associated known biologic comprise(s) one or more study class attributes, each of which identifies a type of analytical study performed on the associated known biologic using an analytical method that the analytical method record represents.
220. The system of claim 219, wherein at least one of the one or more study class attributes corresponds to an identifier of a specific analytical study type selected from the group consisting of:
determination of a molecular weight of the target biologic;
determination of a primary structure of the target biologic;
determination of post-translational modifications;
determination of one or more higher order structures of the target biologic;
comparison of the target biologic with a reference biologic;
a lot comparison study;
determination of a critical quality attribute (CQA) map of the target biologic; and
determination of an in vivo comparability profile of the target biologic.
221. The system of any one of claims 210 to 220, wherein the analytical method record of the method store corresponds to a specific analytical method comprising at least one analytical stage selected from the group consisting of:
a separation stage;
a detection stage;
a mass spectrometry stage;
a digestion strategy; and
a sample preparation stage.
222. The system of any one of claims 210 to 221, wherein the instructions cause the processor to extract at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing using text mining and/or natural language processing.
223. The system of any one of claims 210 to 222, wherein the instructions cause the processor to extract at least one of the identifier and one or more parameter values of the series of parameter values from published documents via automated processing in combination with a user interaction.
224. The system of any one of claims 210 to 223, wherein the instructions cause the processor to obtain the identifier and series of parameter values from an in-house study in an automated fashion via dedicated software as part of a laboratory information management system.
225. A computer-implemented method of automatically identifying analytical stage parameters for analysis of a target biologic, the method comprising:
(a) receiving, by a processor of a computing device, an input query comprising (i) a molecule type, (ii) one or more bioprocess steps, (iii) an amino acid sequence of the target biologic, and (iv) one or more study class attributes;
(b) determining, by the processor, one or more generalized biologic attributes (GBAs) of the target biologic based on the input query;
(c) accessing, by the processor, a method store comprising a plurality of analytical stage records;
(d) determining, by the processor, responsive to the input query and the one or more GBAs of the target biologic, (i) one or more analytical stages selected from the plurality of analytical stage records in the method store and (ii) one or more analytical stage parameters associated with the one or more analytical stages; and
(e) providing, by the processor, the one or more analytical stages and the one or more analytical stage parameters for display and/or further processing.
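By way of non-limiting illustration, the method of claim 225 can be sketched as a small pipeline: derive GBAs from the input query, filter the stored analytical stage records, and return the matching stages with their parameters. The record layout, the `gba_ranges` filter, the hydrophobic residue set, and all function names are assumptions made for this example. The reagent values are borrowed from the dependent claims (DTT at 5 mM, GuHCl at 6 M), and the sequence is a toy string.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed residue classification


def derive_gbas(query):
    """Step (b): compute a few numeric GBAs from the query's amino acid sequence."""
    seq = query["sequence"].upper()
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq)
    return {
        "hydrophobic_pct": 100.0 * hydrophobic / len(seq),
        "cysteine_count": seq.count("C"),
    }


def record_applies(record, query, gbas):
    """A stage record applies if molecule type, study class, and GBA ranges all match."""
    if record["molecule_type"] != query["molecule_type"]:
        return False
    if not set(record["study_classes"]) & set(query["study_classes"]):
        return False
    return all(
        low <= gbas[name] <= high
        for name, (low, high) in record["gba_ranges"].items()
    )


def identify_stage_parameters(query, method_store):
    gbas = derive_gbas(query)
    # Steps (c) and (d): access the store and select stages with their parameters.
    hits = [r for r in method_store if record_applies(r, query, gbas)]
    # Final step: provide the stages and parameters for display or processing.
    return [(r["stage"], r["parameters"]) for r in hits]


method_store = [
    {"stage": "protein reduction",
     "parameters": {"reagent": "DTT", "concentration": "5 mM"},
     "molecule_type": "Fc-fusion protein",
     "study_classes": ["N-linked glycan characterization"],
     "gba_ranges": {"hydrophobic_pct": (40.0, 60.0)}},
    {"stage": "protein denaturation",
     "parameters": {"reagent": "GuHCl", "concentration": "6 M"},
     "molecule_type": "Fc-fusion protein",
     "study_classes": ["disulfide linkage characterization"],
     "gba_ranges": {"cysteine_count": (1, 100)}},
]
query = {
    "molecule_type": "Fc-fusion protein",
    "bioprocess_steps": ["CHO cell culture", "protein A purification"],  # unused in this sketch
    "sequence": "AVILKDEN",
    "study_classes": ["N-linked glycan characterization"],
}
result = identify_stage_parameters(query, method_store)
```

The rule-based filter stands in for whatever selection logic an implementation uses; a trained model could take its place without changing the inputs and outputs.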
226. The method of claim 225, wherein the molecule type is an Fc-fusion protein.
227. The method of claim 226, wherein the one or more bioprocess steps comprise(s) one or more members selected from the group consisting of production via a Chinese Hamster Ovary cell culture, protein A purification, and hydrophobic interaction chromatography.
228. The method of claim 227, wherein the one or more study class attributes comprise(s) one or more members selected from the group consisting of N-linked glycan characterization, N-linked glycan quantification, and lot-release.
229. The method of claim 228, wherein the one or more GBAs comprise(s) one or more members selected from the group consisting of locations of N-linked glycosylation sites and a percentage of hydrophobic amino acids on the fusion portion of the target biologic.
230. The method of claim 229, wherein the percentage of hydrophobic amino acids on the fusion portion of the target biologic is between 40% and 60%.
231. The method of claim 229, wherein the one or more analytical stages comprises one or more members selected from the group consisting of a protein denaturation stage, a protein reduction stage, an enzymatic digestion stage, a detection stage, a separation stage, and a mass spectrometry stage.
232. The method of claim 231, wherein the protein denaturation stage is performed with PS-20 detergent.
233. The method of claim 232, wherein a concentration of the PS-20 detergent is 0.1%.
234. The method of claim 231, wherein the protein reduction stage is performed with a DTT solution.
235. The method of claim 234, wherein a concentration of the DTT solution is 5 mM.
236. The method of claim 231, wherein the detection stage is performed with 2-AB fluorescent labeling agent.
237. The method of claim 231, wherein the separation stage is performed with acetonitrile and ammonium acetate.
238. The method of claim 226, wherein the one or more bioprocess steps comprise(s) production via an E. coli cell culture.
239. The method of claim 238, wherein the one or more study class attributes comprise(s) one or more members selected from the group consisting of a disulfide linkage characterization and a free cysteine characterization.
240. The method of claim 239, wherein the one or more GBAs comprises one or more members selected from the group consisting of a molecular weight and a total number of cysteines of the target biologic.
241. The method of claim 240, wherein the molecular weight is 50 kDa.
242. The method of claim 240, wherein the total number of cysteines of the target biologic is 12.
243. The method of claim 240, wherein the one or more analytical stages comprises one or more members selected from the group consisting of a protein denaturation stage, a protein reduction stage, an enzymatic digestion stage, a detection stage, a separation stage, and a mass spectrometry stage.
244. The method of claim 243, wherein the protein denaturation stage is performed with either a GuHCl solution or a urea solution.
245. The method of claim 244, wherein a concentration of the GuHCl solution is 6 M.
246. The method of claim 244, wherein a concentration of the urea solution is 8 M.
247. The method of claim 243, wherein the enzymatic digestion stage is performed with trypsin and/or pepsin.
248. The method of claim 243, wherein the separation stage is performed with either a C12 column or a C18 column.
249. The method of claim 243, wherein the separation stage is performed with either a C4 column or a low hydrophobicity polymer-based column.
PCT/US2018/032217 2017-05-15 2018-05-11 Systems and methods for automated design of an analytical study for the structural characterization of a biologic composition WO2018213112A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762506443P 2017-05-15 2017-05-15
US62/506,443 2017-05-15

Publications (1)

Publication Number Publication Date
WO2018213112A1 true WO2018213112A1 (en) 2018-11-22

Family

ID=64274831

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/032217 WO2018213112A1 (en) 2017-05-15 2018-05-11 Systems and methods for automated design of an analytical study for the structural characterization of a biologic composition

Country Status (1)

Country Link
WO (1) WO2018213112A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021176106A1 (en) * 2020-03-06 2021-09-10 National Institute For Bioprocessing Research And Training A system for producing a biopharmaceutical product
WO2023154943A1 (en) * 2022-02-14 2023-08-17 Venn Biosciences Corporation De novo glycopeptide sequencing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6108635A (en) * 1996-05-22 2000-08-22 Interleukin Genetics, Inc. Integrated disease information system
US20060047616A1 (en) * 2004-08-25 2006-03-02 Jie Cheng System and method for biological data analysis using a bayesian network combined with a support vector machine
US20080228722A1 (en) * 2007-03-16 2008-09-18 Expanse Networks, Inc. Attribute Prediction Using Attribute Combinations
US20110137841A1 (en) * 2008-08-05 2011-06-09 Fujitsu Limited Sample class prediction method, prediction program, and prediction apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JENKINS ET AL.: "In silico target fishing: Predicting biological targets from chemical structure", DRUG DISCOVERY TODAY: TECHNOLOGIES, vol. 3, no. 4, 2006, pages 413 - 421, XP005873827, [retrieved on 20180809] *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18803018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18803018

Country of ref document: EP

Kind code of ref document: A1