WO2023191787A1 - Recommendation for operations and asset failure prevention background - Google Patents


Info

Publication number
WO2023191787A1
WO2023191787A1 (PCT/US2022/022679)
Authority
WO
WIPO (PCT)
Prior art keywords
sensor data
contributing
failure prediction
features
prediction model
Application number
PCT/US2022/022679
Other languages
French (fr)
Inventor
Yongqiang Zhang
Hareesh Kumar Reddy Kommepalli
Wei Lin
Original Assignee
Hitachi Vantara Llc
Application filed by Hitachi Vantara Llc
Priority to PCT/US2022/022679
Publication of WO2023191787A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/045 Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Definitions

  • Automated failure prediction may be relevant to many complex industrial systems in many different industrial contexts (e.g., industries).
  • the different industrial contexts may include, but are not limited to, manufacturing, entertainment (e.g., theme parks), hospitals, airports, utilities, mining, oil & gas, warehouse, and transportation systems.
  • Two major failure types may be defined by how distant in time a failure is from its symptoms.
  • a short-term failure type may relate to symptoms and failures that are close in terms of time (e.g., several hours or days).
  • overloading failures on conveyor belts may be a short-term failure type associated with symptoms that are close in time.
  • a long-term failure type may relate to symptoms that are distant in time from the failures (e.g., several weeks, months, or years).
  • a long-term failure type of system-component failure often develops slowly and chronically, and may have a wider negative impact, such as shutting down a whole system including the failed component.
  • a long-term failure may include a fracture or crack in a dam, or a component failure due to metal fatigue.
  • Failures in industrial systems may be rare, but the cost of such failures may incur significant (or massive) financial (e.g., operational, maintenance, repair, logistics, etc.) costs, reputational (e.g., marketing, market share, sale, quality, etc.) costs, human (e.g., scheduling, skill set, etc.) costs, and/or liability (e.g., safety, health, etc.) costs.
  • Some industrial systems may not have a failure detection process in place. In other industrial systems, detection of failures is performed manually, based on domain knowledge applied to real-time sensor data. Manual failure prediction and/or detection may pose great challenges when dealing with high-frequency data (e.g., vibration sensor data, IoT data, etc.).
  • Even if the failures can be detected and/or predicted, the prediction may come too late to remediate or recover from the failure, as it may have already occurred or be very close to occurring. Thus, there is a need to predict a failure at a time that allows operators or technicians enough time to respond to, remediate, or even avoid the failure. Prediction of failures ahead of time can help avoid the failures and reduce the loss and/or the negative impacts resulting from the failures. Failure data, in some aspects, may be costly to collect, and the data related to the failure may not be collected at all, or the failure data may be inaccurate, incomplete, and/or unreliable. This poses a challenge to building a supervised solution that relies on the failure data as labels.
  • the system may further identify a set of root causes for the failures and may use the root causes as additional information to help diagnose the failures, and take actions to remediate or avoid the failures.
  • the root cause analysis may be performed by the system without manual inspection of the components or visualizations of the sensor data and metrics (e.g., because the manual inspection may be time-consuming, costly, and/or error prone).
  • the set of root causes in some aspects, is a set of root causes identified at the sensor data level (e.g., data collected regarding a temperature, a level of vibration, or some other measured/monitored characteristic associated with a failure).
SUMMARY

  • Example implementations described herein include an innovative method.
  • the method may include collecting a set of physical sensor data.
  • the method may further include generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data.
  • the method may also include identifying a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics.
  • the method may further include identifying, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold.
  • the method may further include generating at least one failure prediction model based on the first set of machine-trained models.
  • the method may also include applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.
  • Example implementations described herein include an innovative computer-readable medium storing computer executable code.
  • the computer executable code may include instructions for collecting a set of physical sensor data.
  • the computer executable code may also include instructions for generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data.
  • the computer executable code may further include instructions for identifying a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics.
  • the computer executable code may also include instructions for identifying, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold.
  • the computer executable code may also include instructions for generating at least one failure prediction model based on the first set of machine-trained models.
  • the computer executable code may further include instructions for applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.
  • Example implementations described herein include an innovative apparatus.
  • the apparatus may include a memory and at least one processor configured to collect a set of physical sensor data.
  • the at least one processor may also be configured to generate a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data.
  • the at least one processor may further be configured to identify a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics.
  • the at least one processor may also be configured to identify, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold.
  • the at least one processor may also be configured to generate at least one failure prediction model based on the first set of machine-trained models.
  • the at least one processor may further be configured to apply the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.
  • FIG. 1 is a diagram illustrating a basic concept of a circular economy in accordance with some aspects of the disclosure.
  • FIG. 2A is a diagram illustrating components of a physics-based model in accordance with some aspects of the disclosure.
  • FIG. 2B is a diagram illustrating a feature engineering module in accordance with some aspects of the disclosure.
  • FIG. 3 is a flow diagram illustrating an example workflow for optimizing at least one of the sampling rate and the aggregation statistics in accordance with some aspects of the disclosure.
  • FIG. 4 is a diagram illustrating a set of components of an implementation of an anomaly detection module (e.g., an unsupervised anomaly detection model) in accordance with some aspects of the disclosure.
  • FIG. 5 is a diagram illustrating a set of operations associated with short-term failure prediction.
  • FIG. 6 is a diagram illustrating a set of operations associated with generating a long-term failure prediction score to identify and predict the potential long-term failures, in accordance with some aspects of the disclosure.
  • FIG. 7 is a diagram illustrating a set of operations for identifying a set of root causes associated with a short-term failure in accordance with some aspects of the disclosure.
  • FIG. 8 is a flow diagram illustrating a set of operations for identifying a set of contributing factors associated with a long-term failure regarding an asset in one or more asset classes in accordance with some aspects of the disclosure.
  • FIG. 9 is a flow diagram illustrating a method for a system performing a failure prediction operation for an industrial process.
  • FIG. 10 is a flow diagram illustrating a method for identifying a set of root causes (e.g., contributing factors) for a failure prediction, in accordance with some aspects of the disclosure.
  • FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations.

DETAILED DESCRIPTION

  • The following detailed description provides details of the figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity.
  • a system, an apparatus, and a method are presented that addresses the problem of automated prediction of failures (both short-term failures and long-term failures) and identification of root causes for the predicted failures in an industrial system with unlabeled high-frequency sensor data.
  • a system, an apparatus, and a method are presented that provide techniques related to a failure prediction operation for an industrial process.
  • the method may include collecting a set of physical sensor data.
  • the method may further include generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data.
  • the method may also include identifying a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics.
  • the method may further include identifying, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors (e.g., root causes) to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold.
  • the method may also include generating at least one failure prediction model based on the first set of machine-trained models.
  • the method may further include applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.
  • the system, apparatus, and method described herein may be directed to predicting both short-term and long-term failures and to deriving root causes for the predicted failures in order to mitigate or avoid the negative impacts before the failure.
  • the failure prediction and prevention solutions may reduce unplanned downtime and operating delays while increasing productivity, output, and operational effectiveness.
  • the failure prediction and prevention solutions may optimize yields and increase margins/profits.
  • the failure prediction and prevention solutions in some aspects, may maintain consistency of production and product quality.
  • the failure prediction and prevention solutions may reduce unplanned cost for logistics, scheduling maintenance, labor, and repair costs.
  • the failure prediction and prevention solutions may reduce damage to the assets and the whole industrial system.
  • the failure prediction and prevention solutions may reduce accidents to operators and improve the health and safety of the operators.
  • the proposed solutions generally provide benefits to all the entities that get involved within the industrial systems, including but not limited to: operators, supervisors/managers, maintenance technicians, SME/domain experts, assets, and the system itself.
  • failure data may not be collected or may be inaccurate or incomplete
  • a process to collect failure data may not be in place or may not collect sufficient data to perform useful analysis
  • Manual processes to collect failures by labeling the sensor data based on the domain knowledge may be inaccurate, inconsistent, unreliable, and time consuming.
  • the processes in place to collect failure related data by industrial system operators in some aspects, may not be sufficiently complete to identify and investigate the root cause.
  • the insufficient data collection may often be due to a lack of understanding of how the data can help identify root cause at the time of data collection. Accordingly, some aspects provide an automated and standard process or approach to detect and collect failures accurately, effectively, and efficiently in the industrial systems.
  • Conventional failure prediction solutions do not perform well for rare failure events with an associated lead time (e.g., a lead time sufficient to implement a solution to remediate or prevent the predicted failure).
  • a failure prediction solution may fail to perform well for a rare event because the sensor data may be heavily biased toward good operating conditions. Due to the rarity of such failures, in some aspects, it is very hard to build a supervised machine learning model with high precision and sensitivity.
  • an industrial system usually runs in a normal state and failures are usually rare events, so it may be hard to capture patterns related to a limited number of failures and thus hard to predict such failures. Accordingly, in some aspects, it may be difficult to build the correct relationship between normal cases and rare failure events in temporal order because it is difficult to capture the sequence pattern of the progression of rare failures.
  • the system, apparatus, and/or method may identify the correct signals (e.g., features) for failure prediction within optimal feature windows to provide the failure prediction with a sufficient lead time to address the predicted failure.
  • the system, apparatus, and/or method may be capable of building/identifying the correct relationships between normal cases and rare failures, and the progression of rare failures, for one or both of short-term failures and long-term failures.
  • the system, apparatus, and/or method may provide automated root cause analysis. Root cause analysis of failures may be performed manually based on domain knowledge and data visualization, which may be subjective, time consuming, and prone to errors. In some cases, the root causes may be associated with raw sensor data that is not addressed by domain knowledge or the data visualizations used for the manual root cause analysis.
  • the system, apparatus, and/or method may provide automated root cause analysis based on a standardized approach to identify the root causes of the predicted failures, and may output the root causes at different levels (including the raw sensor data level).
  • sensor data (e.g., IoT sensor data, vibrational data) may be high-frequency data (e.g., 1000 Hz to 3000 Hz).
  • High-frequency data poses challenges to building a solution for the failure prediction problem.
  • high-frequency data may be associated with high levels of noise or with long or resource-consuming analysis (e.g., computing) times.
  • a sampling frequency or aggregation window may require optimization for accurately predicting one of a short-term failure or a long-term failure.
  • the system, apparatus, and/or method may provide a window optimization operation to identify an optimized window and/or aggregation statistics for a failure prediction.
  • the physical sensor data may not be able to capture all the signals that may be useful for monitoring the system due to the severe environment for sensor installation, the cost of the sensors, and/or the functions of the sensors. As a result, the collected data may not be sufficient to monitor the system health and capture the potential risks and failures. In some aspects, this inability to capture all the potentially useful signals may pose challenges to building a failure prediction solution. Accordingly, the system, apparatus, and/or method may enrich the physical sensor data in order to capture necessary signals to help with the system monitoring and building failure prediction solution.
  • the physical sensor data may be processed by a set of physics-based models to generate virtual sensor data.
  • the system, apparatus, and/or method may implement and/or include several techniques to generate one or more failure prediction models and/or to identify a set of root causes.
  • the techniques may include a semi-empirical approach utilizing one or more of a physics-based model and a data-driven machine learning model.
  • a physics-based model in some aspects, may be used to enrich the sensor data with additional features based on the physics of the system.
  • a torque on a system component may be calculated based on a set of physical data sensors including one or more of an accelerometer (linear or rotational), a force sensor, an IoT motor, or other such sensor associated with related components (e.g., a motor, a boom end, etc.).
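As a non-limiting illustration of such a physics-based calculation, the following sketch derives a virtual torque signal from a rotational acceleration reading and an assumed moment of inertia; the column names, the inertia value, and the sample readings are hypothetical and not taken from the disclosure.

```python
import pandas as pd

# Hypothetical physical sensor readings (time series), e.g., from a rotational
# accelerometer mounted on a motor shaft.
physical = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", periods=5, freq="s"),
    "angular_acceleration_rad_s2": [0.10, 0.12, 0.50, 0.11, 0.09],
})

# Assumed (design-time) moment of inertia of the rotating component, in kg*m^2.
MOMENT_OF_INERTIA = 2.4

# Physics-based relation torque = I * alpha; the result acts as a "virtual
# torque sensor" that enriches the physical sensor data.
physical["virtual_torque_nm"] = (
    MOMENT_OF_INERTIA * physical["angular_acceleration_rad_s2"]
)
print(physical)
```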
  • a sampling and aggregation optimization approach may be used to sample and aggregate the high-frequency data to derive features from aggregated data.
  • the techniques may further include one or more unsupervised failure prediction techniques and/or solutions.
  • unsupervised failure prediction techniques and/or solutions may be based on sensor data without relying on historical failure data.
  • an unsupervised ensemble anomaly detection model may be used to derive (1) anomaly scores as labels and/or (2) features for the failure prediction models.
  • the techniques and/or solutions may further include a supervised surrogate model for the anomaly detection model, used for feature selection, root cause analysis, and model evaluation.
  • a short-term failure prediction model may be based on the ensemble anomaly scores, selected features and root causes identified using the anomaly detection model, and may use one or more window-based feature derivation techniques to derive aggregated features and predict failures with lead time by using machine learning, e.g., a deep learning sequence prediction model (such as long short-term memory (LSTM) or gated recurrent units (GRU)).
  • a long-term failure prediction model may be based on the aggregate ensemble anomaly scores, selected features, and root causes from the anomaly detection model for each asset in a set of assets of the system.
  • the long-term failure prediction model may then be built based on aggregated features from multiple assets.
  • a root cause analysis for short-term failure and long-term failure may further be performed, in some aspects, to identify root causes for the predicted failures (short-term failure and long-term failure) at different levels. For example, root causes may be identified based on a detected anomaly score, a set of selected features, or sensor data (e.g., physical or virtual sensor data), by using a chain of explainable AI models and aggregation/ranking algorithms.
  • Diagram 100 includes a set of sensor data 110 that is collected from multiple physical sensors (e.g., high-frequency data from IoT sensors).
  • the physical sensor data 110 may be provided to a physics-based model 120.
  • the physics-based model 120 may be applied to the physical sensor data (or a subset of the physical sensor data) to generate virtual sensor data (e.g., data that enriches a data set for failure prediction and root cause analysis).
  • the physical sensor data and the virtual sensor data may be provided to a feature engineering module 130 to sample and aggregate the sensor data (physical and virtual) and derive new features based on an optimized sampling rate and/or an optimized set of aggregation statistics/characteristics.
  • the feature engineering module 130 may perform an optimization function (e.g., a machine learning based optimization) for one or more of the sampling rate or the set of aggregation statistics/characteristics before performing the sampling and aggregation operations. For example, in some aspects, one of the sampling rate or the aggregation statistics/characteristics may be provided by a user based on domain knowledge while the other may be optimized.
  • Data generated and/or processed by the feature engineering module 130 may be provided to an anomaly detection module 140.
  • the anomaly detection module 140 may use the data from the feature engineering module 130 to build multiple anomaly detection models and an ensemble of the anomaly detection models.
  • the anomaly detection module 140 may build the multiple anomaly detection models and the ensemble of the anomaly detection models using machine learning operations during a first learning phase and may use them in a second prediction (inference) phase.
  • the anomaly detection module 140 may use the multiple anomaly detection models and the ensemble of the anomaly detection models to generate an ensemble anomaly score and to derive a root cause (e.g., an associated physical or virtual sensor) for each data point and to select a feature through a surrogate supervised model for the ensemble anomaly detection model.
  • Data processed by the anomaly detection module 140 may be provided to a short-term failure prediction module 150.
  • the short-term failure prediction module 150 may derive features with a look-back feature window and use a deep learning sequence prediction model (e.g., LSTM or GRU) to predict failures ahead of time. Based on the output of the short-term failure prediction module 150, the system may identify and/or derive a set of root causes using a root cause analysis for short-term failure module 160. For example, for each predicted failure, the root cause analysis for short-term failure module 160 may derive the root causes with a chain of explainable AI techniques and ranking/aggregation algorithms. Similarly, data processed by the anomaly detection module 140 may be provided to a long-term failure prediction module 170.
  • the long-term failure prediction module 170 may derive features with aggregation techniques based on the anomaly scores, root causes, and selected features.
  • the long-term failure prediction module 170 may use at least one additional anomaly detection model to identify and predict long term failures.
  • the system may identify and/or derive a set of root causes using a root cause analysis for long-term failure module 180.
  • the root cause analysis for long-term failure module 180 may, in some aspects, build a surrogate supervised model for another anomaly detection model.
  • the root cause analysis for long-term failure module 180 may use a surrogate supervised model to derive a set of root causes with a chain of explainable AI techniques and ranking/aggregation algorithms.
  • each component in the solution architecture is discussed in detail below, including the specific methodologies used in association with the feature engineering module 130, the anomaly detection module 140, the short-term failure prediction module 150, the root cause analysis for short-term failure module 160, the long-term failure prediction module 170, and the root cause analysis for long-term failure module 180.
  • the sensor data 110 may be collected by IoT sensors installed on a set of assets-of-interest and used to collect data to monitor the health status and the performance of the asset and the whole system.
  • sensors are designed to collect different types of data for different industries, different assets, and/or different tasks.
  • the sensors are discussed generically with the assumption that the methods for data processing are applicable to different types of sensor data with minor adjustments.
  • Some examples of the sensors that may be used to collect the sensor data 110 may include temperature sensors, pressure sensors, vibration sensors, acoustic sensors, motion sensors, optical sensors, LIDAR sensors, infrared (IR) sensors, acceleration sensors, gas sensors, smoke sensors, humidity sensors, level sensors, image sensors (cameras), proximity sensors, water quality sensors, and/or chemical sensors.
  • sensors installed in the system may also be used to build the models (e.g., anomaly detection models, failure prediction models, and/or root cause models) for the particular asset-of-interest.
  • data collected from a set of sensors installed on assets or system components that are upstream and/or downstream of the particular asset-of-interest may be used to build a set of failure prediction models and/or may be identified as associated with a root cause of a predicted failure.
  • the selection of sensors considered when building the different models may be based on domain knowledge.
  • FIG. 2A is a diagram 200 illustrating components of a physics-based model in accordance with some aspects of the disclosure.
  • the physics-based model 220 in some aspects corresponds to the physics-based model 120 of FIG. 1.
  • physical sensors may not capture a complete set of relevant signals and/or metrics to support the monitoring of the system health.
  • a physical sensor may not capture a set of expected signals due to the physical limitations of the hardware, or a physical sensor may not be able to be installed in a severe environment, for example, in a location with high levels of radiation or with a pressure or temperature that is outside a range in which the sensor is able to function.
  • the set of physical sensors may not capture data at the expected frequency.
  • a software-based approach may be used to obtain the expected signals.
  • the physics-based model 224 may be a representation of the governing laws of nature that innately embeds the concepts of time, space, causality, and generalizability. These laws of nature, in some aspects, define how physical, chemical, biological, and geological processes evolve.
  • the physics-based model 224 may be a function which takes multiple inputs (e.g., physical sensor data 222) and generates multiple outputs (e.g., virtual sensor data 226).
  • the inputs can come from a predefined profile (such as a motion profile) during the design time, or from physical sensors (e.g., physical sensor data 222) during the operation time.
  • the outputs may include multiple variables and may represent a set of virtual sensors.
  • a virtual sensor is a type of information derived in software from the available information, which represents, or is associated with, data that a corresponding physical device would collect. For example, data collected by a physical acceleration sensor may be combined with a known mass of an associated component to calculate the (virtual) output of a virtual force sensor for the associated component.
  • the virtual data associated with the virtual sensor may be used in the same way as the data from physical sensors, to derive the insights and build models and/or solutions for the downstream tasks. Benefits of the use of the virtual sensors, in some aspects, may include data complementation and data validation.
  • Data complementation may include using virtual sensors to collect data (e.g., capture signals) that may not otherwise be captured by a physical sensor, for example, when the physical sensors are unable to capture data due to the hardware limitations of the physical sensor or a severe environment that is incompatible with the installation or functioning of a physical sensor. In some aspects, when there is not much data available at the early stage of physical sensor installation, virtual sensors may be used to derive insights and build models. Virtual sensors, in some aspects, may generate high-frequency data that the set of physical sensors may not be able to capture. Data validation, in some aspects, may include using virtual sensor data to validate physical sensor data when collecting a same, or correlated, set of data.
  • the virtual sensor data may be used as an “expected” value while physical sensor data may be used as an “observed” value and the variance or difference between them can be used as a signal to detect abnormal behaviors or anomalies in the system.
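A minimal sketch of this "expected versus observed" comparison is shown below, assuming a known component mass and hypothetical accelerometer and force-sensor readings; the resulting residual series is the kind of signal that could feed later anomaly detection.

```python
import numpy as np

# Hypothetical physical sensor streams for one component.
mass_kg = 12.0                                              # assumed known component mass
acceleration = np.array([0.9, 1.0, 1.1, 1.0, 3.2])          # m/s^2, physical accelerometer
observed_force = np.array([11.0, 12.1, 13.0, 12.2, 14.0])   # N, physical force sensor

# Virtual force sensor: F = m * a gives the "expected" value from the physics model.
expected_force = mass_kg * acceleration

# Residual between observed and expected values can serve as an anomaly signal.
residual = observed_force - expected_force
print(residual)  # a large residual at the last point hints at abnormal behavior
```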
  • physics-based models and machine learning models may be combined into a semi-empirical approach, which may take advantage of both domain knowledge (through the physics-based models) and a data driven approach (through the machine learning models).
  • the physics model is theoretically self-consistent and has demonstrated successes in providing experimental predictions.
  • the physics-based model usually works well during the system design time.
  • FIG. 2A shows how the physics-based model 224 is applied to physical sensor data 222 in time-series format to derive the virtual sensor data 226.
  • only a subset of physical sensor data may be used as the input physical sensor data 222 to the physics-based models 224.
  • the physical sensor data 222 may be preprocessed before feeding into the physics-based model 224.
  • physical sensors may capture the position of an asset as it moves; however, the physics-based model may be based on the velocity and acceleration of the asset as it moves.
  • the preprocessing, or the physics-based model, may include a first-order derivative used to calculate the velocity data associated with the position data and a second-order derivative used to calculate the acceleration data associated with the position data.
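For example, such preprocessing might be sketched with simple numerical differentiation, assuming a hypothetical fixed-rate position series.

```python
import numpy as np

# Hypothetical position samples (meters) captured at a fixed sampling interval.
dt = 0.01  # assumed 100 Hz sampling interval, in seconds
position = np.array([0.00, 0.05, 0.11, 0.18, 0.26, 0.35])

# First-order derivative of position -> velocity (m/s).
velocity = np.gradient(position, dt)

# Second-order derivative (derivative of velocity) -> acceleration (m/s^2).
acceleration = np.gradient(velocity, dt)

print(velocity)
print(acceleration)
```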
  • FIG. 2B is a diagram 250 illustrating a feature engineering module 230 in accordance with some aspects of the disclosure.
  • the feature engineering module 230 in some aspects, is an implementation of feature engineering module 130 of FIG. 1.
  • several feature engineering techniques may be introduced to derive features from the sensor data (e.g., the physical or virtual sensor data, high-frequency IoT data, etc.).
  • Diagram 250 illustrates several steps in a feature engineering module 230.
  • Both physical sensor data and virtual sensor data may be used to derive features.
  • the sensor data, in some aspects, may be high-frequency (such as 1000 Hz or 3000 Hz) time-series data that may be down-sampled and/or aggregated before feature derivation.
  • the down-sampling and/or aggregation operations may be performed at a sampling and aggregation module 234 that receives sensor data (e.g., physical and virtual sensor data 232).
  • the output of the sampling and aggregation module 234 may be provided to feature derivation module 236 to identify a first set of features 238.
  • a sampling rate and aggregation statistics/characteristics may be determined before performing the down-sampling operation and/or the aggregation operation for the high-frequency sensor data.
  • the sampling rate may relate to how much data is retained when providing the data to the feature derivation module 236. For example, if the sampling rate is 0.01, then 1 percent of the original data will be retained in the result data while, if the sampling rate is 0.1, then 10 percent of the original data will be retained in the result data for feature derivation.
  • the aggregation statistics/characteristics may be provided for the original high-frequency sensor data over each of a set of time windows.
  • the aggregation statistics/characteristics may include, but are not limited to, a minimum value over the time window, a maximum value over the time window, a mean value over the time window, a standard deviation, a value associated with the 1st percentile, a value associated with the 99th percentile, a value associated with the 25th percentile, a value associated with the 50th percentile, a value associated with the 75th percentile, and a trend.
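A minimal sketch of the down-sampling and window-aggregation step is shown below, assuming a hypothetical 1000 Hz vibration stream and a subset of the aggregation statistics listed above; the pandas-based implementation is illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical high-frequency sensor stream (1 kHz for 10 seconds).
idx = pd.date_range("2022-01-01", periods=10_000, freq="ms")
raw = pd.DataFrame(
    {"vibration": np.random.default_rng(0).normal(size=len(idx))}, index=idx
)

# Down-sampling: keep a fraction of the original rows (e.g., sampling rate 0.01 -> 1%).
sampling_rate = 0.01
downsampled = raw.iloc[:: int(1 / sampling_rate)]

# Aggregation: compute selected statistics over fixed time windows (here 1 second).
aggregated = raw["vibration"].resample("1s").agg(
    ["min", "max", "mean", "std",
     lambda s: s.quantile(0.01), lambda s: s.quantile(0.99)]
)
print(downsampled.shape, aggregated.shape)
```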
  • sampling rate and aggregation statistics may be suggested based on domain knowledge. However, the suggested values may not be optimal for the downstream solutions. Accordingly, an optimization approach to optimize the sampling rate and aggregation statistics for the downstream solutions may be performed before any down-sampling or aggregation operations for real-time failure prediction (e.g., the inference phase after the models are trained and root causes have been identified).
  • FIG. 3 is a flow diagram 300 illustrating an example workflow for optimizing at least one of the sampling rate and the aggregation statistics in accordance with some aspects of the disclosure.
  • the optimization may be performed by the feature engineering module 130 or 230 (or more specifically by the sampling and aggregation module 234).
  • a set/list of aggregation statistics may be generated identifying the statistics and/or characteristics to be included in the results of the aggregation operation.
  • the aggregation statistics/characteristics may include, but are not limited to, a minimum value over the time window, a maximum value over the time window, a mean value over the time window, a standard deviation, a value associated with the 1st percentile, a value associated with the 99th percentile, a value associated with the 25th percentile, a value associated with the 50th percentile, a value associated with the 75th percentile, and a trend.
  • the space of possible aggregation statistics and sampling rates may be explored based on at least one of a plurality of optimization approaches (e.g., Bayesian optimization, a grid search, or a random search).
  • the workflow may include randomly selecting a subset of aggregation statistics from the set/list of aggregation statistics generated at 302. Randomly selecting the subset may include generating a random binary value of length N (e.g., an N-bit binary value), where N is the number of elements of the set/list of aggregation statistics, such that elements of the set/list that correspond to a “0” in the N-bit binary value are not included in the randomly selected subset of aggregation statistics, while elements that correspond to a “1” value are included.
  • the workflow may include selecting a sampling rate.
  • the sampling rate may be selected, at 306, randomly or based on domain knowledge or a business requirement. For example, a sampling rate may be selected based on knowledge of a characteristic time scale associated with an asset failure.
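The random selections at 304 and 306 might be sketched as follows, where the candidate statistic list, the candidate sampling rates, and the random seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Full list of candidate aggregation statistics (generated at 302).
aggregation_statistics = ["min", "max", "mean", "std",
                          "p01", "p25", "p50", "p75", "p99", "trend"]

# 304: random N-bit binary mask; a "1" keeps the statistic, a "0" drops it.
mask = rng.integers(0, 2, size=len(aggregation_statistics))
selected_statistics = [s for s, bit in zip(aggregation_statistics, mask) if bit == 1]

# 306: sampling rate chosen randomly from a candidate range
# (or fixed from domain knowledge / a business requirement).
sampling_rate = rng.choice([0.01, 0.05, 0.1, 0.2])

print(mask, selected_statistics, sampling_rate)
```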
  • the workflow may include performing the sampling operation and the aggregation operation on the sensor data (e.g., the physical sensor data and/or the virtual sensor data).
  • the subset of aggregation statistics and the sampling rate used to perform the sampling operation and the aggregation operation at 308, in some aspects, may be the subset of aggregation statistics selected at 304 and the sampling rate selected at 306. As described in relation to feature engineering module 230 of FIG. 2B, performing the sampling and aggregation operations may provide data for a feature derivation operation performed by feature derivation module 236.
  • the workflow may then include building, at 310, at least one model based on the sampled and aggregated sensor data (and/or any identified features in a first set of identified features).
  • the at least one model may include one or more of an anomaly detection/scoring model, a short-term failure prediction model, or a long-term failure prediction model.
  • the model built based on the subset of aggregation statistics selected at 304 and the sampling rate selected at 306 may be evaluated based on some configured model performance metrics, such as overall accuracy, precision, and recall.
  • building the models at 310 may include training machine learning models based on the sampled and aggregated data.
  • the workflow may include determining whether a configured and/or desired number of repetitions of the selection of a sampling rate and the aggregation statistics has been reached.
  • the configured number of repetitions may be based on a grid-based search of the sampling rate and aggregation statistics space or on some other property of the system or models. If the workflow determines, at 312, that the configured and/or desired number of repetitions has not been reached, the process may return to 304 to select another subset of the aggregation statistics. If, however, the workflow determines that the configured and/or desired number of repetitions has been met, the workflow may proceed to train a model (e.g., a Gaussian regression model) at 314. The trained model may be a surrogate model based on the results from 310.
  • the features for the surrogate model may be the binary representation of the aggregation statistics plus the sampling rate, and the target may be related to the performance metrics.
  • the surrogate model may be one of a Gaussian process model for Bayesian optimization or a Tree Parzen Estimators (TPE) model.
  • TPE Tree Parzen Estimators
  • the workflow may include defining an acquisition function to help choose an optimized set of features for the Gaussian regression model (a binary representation of aggregation statistics plus the sampling rate).
  • the acquisition function in some aspects, may be one of probability of improvement, expected improvement, Bayesian expected losses, upper confidence bounds (UCB), Thompson sampling, or a hybrid of one or more of these.
  • each different acquisition function may be associated with a different trade-off between exploration and exploitation and may be selected so as to minimize the number of function queries.
  • the workflow may include training one or more machine learning models with the optimal set of values obtained based on the acquisition function employed at 316.
  • the training, at 318, may include identifying performance metrics and a running time. Based on the output of the training of the one or more machine learning models at 318, the workflow may determine, at 320, whether the running time is above a threshold time.
  • if the running time is determined, at 320, to be above the threshold, the workflow may return to randomly selecting a subset of aggregation statistics from the set/list of aggregation statistics at 304. If the running time is determined, at 320, to be below the threshold, the workflow may further determine, at 322, whether one or more criteria for stopping (e.g., a stop criteria) has been met.
  • if the workflow determines, at 322, that the stop criteria has not been met (e.g., the training performance metrics have not met the predefined criteria), the last selected optimal set of features (a binary representation of aggregation statistics plus the sampling rate) and the model performance metrics may, at 324, be added to the training data set for the Gaussian regression model, and the workflow may return to an additional round of training, at 314, as illustrated.
  • if the workflow determines, at 322, that the stop criteria has been met, the workflow may end.
  • the stop criteria may relate to a number of rounds, model metrics, a variance or entropy reduction rate, or other relevant criteria.
  • the workflow may use other surrogate functions for particular business/industrial problems.
  • the workflow may utilize TPEs or, if the machine learning model for the downstream tasks is too complex, a simpler machine learning model (e.g., a linear model or a tree-based model) as a surrogate of the complex machine learning model.
  • different acquisition functions may be implemented at 316, including but not limited to a probability of improvement, an expected improvement, Bayesian expected losses, a UCB, Thompson sampling, and/or a hybrid of one or more of the acquisition functions.
  • Each different acquisition function may be associated with a trade-off between exploration and exploitation, and a particular acquisition function may be selected so as to minimize the number of function queries.
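A minimal sketch of the surrogate-plus-acquisition loop (314-318) under a Bayesian optimization approach is shown below; the evaluate_configuration placeholder, the candidate grid, and the expected-improvement acquisition function are illustrative assumptions rather than the specific implementation of the disclosure.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def evaluate_configuration(x):
    """Placeholder for 308-310: sample/aggregate the data with this
    configuration, build the downstream model, and return its metric."""
    return float(x.sum()) / len(x)

# Training data accumulated so far: each row is the binary statistic mask
# plus the (scaled) sampling rate; the target is the model performance metric.
rng = np.random.default_rng(0)
n_stats = 10
X = rng.integers(0, 2, size=(8, n_stats + 1)).astype(float)
y = np.array([evaluate_configuration(x) for x in X])

surrogate = GaussianProcessRegressor().fit(X, y)   # 314: Gaussian process surrogate

def expected_improvement(candidates, best_y, xi=0.01):
    """316: acquisition function choosing the next configuration to evaluate."""
    mu, sigma = surrogate.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

candidates = rng.integers(0, 2, size=(100, n_stats + 1)).astype(float)
next_x = candidates[np.argmax(expected_improvement(candidates, y.max()))]
print(next_x)  # 318: train the downstream model with this configuration next
```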
  • two phases of optimization may be defined.
  • a first phase may involve fixing the aggregation statistics (e.g., the selected subset of aggregation statistics) and optimizing the sampling rate, while a second phase may involve fixing the sampling rate to optimize the set of aggregation statistics.
  • the fixed parameter (e.g., one of the sampling rate or subset of aggregation statistics) may be determined based on domain knowledge.
  • the first and second phases may be performed iteratively or in different orders. For example, based on domain knowledge, a sampling rate (or a set of possible sampling rates) may be identified and the second phase of the optimization may be performed in accordance with the workflow of FIG. 3. For example, during the second phase the sampling rate may be selected, at 306, based on the sampling rate or set of sampling rates identified based on the domain knowledge.
  • the first phase may be performed (e.g., for a first time or an i-th time) to optimize the sampling rate for the optimized subset of aggregation statistics identified in the second phase.
  • the first and second phases may be iterated to perform a set of “one-dimensional” optimization operations (e.g., only varying one set of parameters) to converge on an optimized set of parameters for sampling rate and aggregation statistics.
  • a set of historical sensor data (e.g., physical and/or virtual sensor data) may be processed as described above, and the processed sensor data may be used, e.g., by a feature engineering module 130 or 230 (or a feature derivation module 236), to derive a first set of features by applying techniques for time-series data.
  • the techniques may include, but are not limited to, one or more of a moving average, a moving variance, or a differencing associated with a rate of change of a value in time-series data (e.g., a first-order or a second-order derivative).
  • a feature derivation operation may define a look-back feature window to derive some statistics (e.g., in the set of aggregation statistics) about the data in the feature window and may use the derived statistics as additional features for the current time point (e.g., may associate the derived statistics with the time point for an enhanced data set).
  • the length of the feature window may (at least initially) be determined based on domain knowledge and may further be optimized with optimization techniques (such as grid search, random search, or Bayesian optimization as described above in relation to FIG. 3).
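A short sketch of such time-series feature derivation, assuming a hypothetical aggregated temperature series and a 30-minute look-back window, might look like the following.

```python
import numpy as np
import pandas as pd

# Hypothetical aggregated sensor feature at a 1-minute resolution.
idx = pd.date_range("2022-01-01", periods=240, freq="min")
df = pd.DataFrame(
    {"temperature": np.random.default_rng(1).normal(60, 2, len(idx))}, index=idx
)

window = "30min"  # look-back feature window (initially from domain knowledge)

# Moving average and moving variance over the look-back window.
df["temp_ma"] = df["temperature"].rolling(window).mean()
df["temp_var"] = df["temperature"].rolling(window).var()

# Differencing: first- and second-order rates of change.
df["temp_diff1"] = df["temperature"].diff()
df["temp_diff2"] = df["temp_diff1"].diff()

print(df.tail())
```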
  • FIG. 4 is a diagram illustrating a set of components of an implementation of an anomaly detection module 140 (e.g., an unsupervised anomaly detection model) in accordance with some aspects of the disclosure.
  • the unsupervised anomaly detection model may incorporate (or be based on) a set of features 410 (e.g., corresponding to features 238).
  • a set of anomaly detection models 420 may be used, in some aspects, to generate a set of anomaly scores 430 (e.g., including a set of anomaly scores 430-1, 430-2, and 430-K) for each time point.
  • an anomaly score may indicate how likely there is an anomaly at each time point.
  • an anomaly score may be defined to be in a range from 0 to 1, where larger values may represent a greater likelihood of an anomaly.
  • the set of anomaly scores 430 may be used as labels and features to build the failure prediction models (e.g., models associated with short-term failure prediction module 150 or long-term failure prediction module 170 of FIG. 1).
  • Multiple anomaly detection approaches represented by anomaly detection models in the set of anomaly detection models 420, in some aspects, may be applied to the features 410 with each anomaly detection model in the set of anomaly detection models 420 generating an anomaly score in the set of anomaly scores 430 at each time point.
  • the set of anomaly scores 430 from multiple models may be ensembled (e.g., into ensemble anomaly scores 440) to remove a bias that may be embedded in the anomaly detection models in the set of anomaly detection models 420.
  • a supervised surrogate model 450 may be built based on the features 410 and ensemble anomaly scores 440 in order to select features 475, explain the anomaly scores (e.g., using explainable AI model 460), and evaluate the anomaly detection model.
  • a description of a workflow associated with the elements of FIGs. 1, 2, and 4 for performing an anomaly detection to generate a set of anomaly scores 430 (or, collectively, ensemble anomaly scores 440), root causes for anomaly scores 480, and selected features 475 is provided below.
  • a feature engineering module 130 or 230 may be used to generate a set of features 410 based on a set of physical sensor data 110 or 222 and/or virtual sensor data 226 generated by a physics-based model 120 or 220.
  • the workflow may, in some aspects, include selecting multiple anomaly detection model algorithms (e.g., anomaly detection models in the set of anomaly detection models 420) and applying each selected model algorithm to the features 410 (or a subset of the features 410) to generate an anomaly score (e.g., anomaly score 430-1, 430-2, and 430-K) from each model. For each time point, the anomaly scores 430 generated by the set of anomaly detection models 420 may be ensembled into one anomaly score.
  • the workflow may include using the features 410 as features and the ensemble anomaly scores 440 as labels to build a supervised surrogate model 450.
  • an explainable AI model 460 may be used to explain each ensemble anomaly score 440 and derive a set of root causes for the ensemble anomaly scores 480.
  • Each root cause (alternatively referred to as a contributing factor), in some aspects, may be identified by a feature or factor name and its weight contributing to the ensemble anomaly score.
  • open source libraries, such as ELI5 (https://eli5.readthedocs.io/) and SHAP (https://shap.readthedocs.io/), may be used to explain the prediction results for machine learning models.
  • the supervised surrogate model may be used to select (e.g., via a model-based feature selection module 470) important features.
  • the model-based feature selection module 470 may use one or more feature selection techniques such as forward selection, backward selection, or model-based feature selection (based on a feature importance technique).
  • a set of features may be associated with a calculated importance score indicating the magnitude of the contribution to at least one model utilized in the workflow (e.g., by an anomaly detection module 140 that may be associated with the data and models illustrated in FIG. 4).
  • the output of the anomaly detection module 140 may include, in some aspects, one or more of the individual anomaly scores 430 (or the ensemble anomaly scores 440), the root causes for the anomaly scores (identified by the explainable AI model 460), and/or the selected features 475 identified by the model-based feature selection module 470.
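A minimal sketch of the ensemble anomaly scoring, supervised surrogate model, SHAP-based root cause explanation, and model-based feature selection described above is shown below; the particular detectors, feature names, and scaling choices are illustrative assumptions, and the shap library is assumed to be installed.

```python
import numpy as np
import pandas as pd
import shap  # assumed available (https://shap.readthedocs.io/)
from sklearn.ensemble import GradientBoostingRegressor, IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical feature matrix (410) derived by the feature engineering module.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 4)),
                 columns=["temp", "vibration", "pressure", "torque"])

def to_unit_range(s):
    """Scale a raw anomaly score array into [0, 1]."""
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

# 420/430: multiple anomaly detectors, each producing a per-point score in [0, 1].
score_iforest = to_unit_range(-IsolationForest(random_state=0).fit(X).score_samples(X))
score_lof = to_unit_range(-LocalOutlierFactor().fit(X).negative_outlier_factor_)

# 440: ensemble the individual scores (simple mean) to reduce per-model bias.
ensemble_score = (score_iforest + score_lof) / 2

# 450: supervised surrogate model with the features as inputs and the ensemble score as label.
surrogate = GradientBoostingRegressor(random_state=0).fit(X, ensemble_score)

# 460/480: explain each ensemble score; per-feature SHAP weights act as root-cause candidates.
shap_values = shap.TreeExplainer(surrogate).shap_values(X)

# 470/475: model-based feature selection using the surrogate's feature importances.
importances = pd.Series(surrogate.feature_importances_, index=X.columns)
selected_features = importances[importances > importances.mean()].index.tolist()
print(selected_features)
```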
  • the sensor data is collected to predict at least a short-term failure.
  • a short-term failure, by design, may be a failure that develops in a short time period, such as hours or days.
  • the anomaly detection module 140 may generate ensemble anomaly scores, a set of root causes (e.g., contributing factors), as well as the selected important features based on a set of historical sensor data.
  • FIG. 5 is a diagram 500 illustrating a set of operations associated with short-term failure prediction.
  • the set of operations may include a first subset of operations (e.g., operations 502, 504, 506, and 508) for model building/training and a second subset of operations (e.g., operations 510, 512, and 514) for a model-based inference/prediction using the trained model on real-time (or near- real-time) collected data.
  • a system may generate, at 502, one or more of a set of individual anomaly scores (or the ensemble anomaly scores), the root causes (contributing factors) for the anomaly scores (identified by an explainable AI model), and/or the selected features identified by the model-based feature selection module 470.
  • the system may, based on the anomaly scores, root causes, and features, for each time point, define a set of parameters associated with one or more of a look-back feature window and/or a lead-time window to derive features.
  • the set of parameters associated with one or more of the look-back feature window and/or the lead-time window may be defined based on domain knowledge or may be optimized based on some optimization algorithms, such as grid search and random search.
  • the set of parameters for the look-back feature window and the lead-time window may include a duration of a look-back feature window in time from the current time and a separation in time between the current time and a potential failure occurrence time (i.e., the lead-time window). Such parameters, in some aspects, may be based on a type of failure to be predicted and a desired lead time for identifying a potential failure to give a user sufficient time to address the predicted failure (e.g., preparing a replacement part, performing maintenance, or otherwise mitigating or avoiding the failure). Based on the set of parameters associated with the look-back feature window and/or a lead-time window, the system, at 506, may derive the features for building the short-term failure prediction model.
  • the system may, at 506, derive the features for building the short-term failure prediction model by calculating and/or identifying one or more of the selected features, the ensemble anomaly scores, and their root causes within the defined look-back feature window.
  • the selected features, the ensemble anomaly scores, and their root causes within a particular defined look-back feature window may be concatenated with the time order (e.g., the time point) and used as features for generating, training, or validating a short-term failure prediction model.
  • deriving the features may also include an aggregation function based on a selected subset of aggregation statistics.
  • the selected aggregation statistics may include a minimum value over the time window, a maximum value over the time window, a mean value over the time window, a standard deviation, a value associated with the 1st percentile, a value associated with the 99th percentile, a value associated with the 25th percentile, a value associated with the 50th percentile, a value associated with the 75th percentile, and a trend, and may be selected as described above in relation to FIG. 3.
  • the lead-time window may further be used to generate, at 506, ensemble anomaly scores for association with the time point associated with the lead-time window (and the look-back window).
  • the generated set of ensemble anomaly scores may be used as a set of target data for a subsequent training operation for building/training a short-term failure prediction model.
  • the system may, for each time point, define a look-ahead lead-time window and use the ensemble anomaly scores associated with the lead-time window as the target (e.g., ground truth for the predicted values) for building the short-term failure prediction model.
  • the anomaly scores are continuous values, and using a continuous value as the prediction target mitigates issues associated with rare failures in a classification approach.
  • the system may build (or train using a machine learning operation) a short-term failure prediction model with a time series sequence prediction model.
  • deep learning recurrent neural network (RNN) models may be used to build and/or train the short-term failure prediction model.
  • Other approaches, such as auto-regressive integrated moving average (ARIMA) or other appropriate machine learning methods, may be used in some aspects.
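  • a minimal sequence-prediction sketch in Keras is shown below; the use of TensorFlow/Keras, the layer sizes, and the function name are assumptions (the disclosure only names LSTM, GRU, and ARIMA generically).

```python
import tensorflow as tf

def build_short_term_model(lookback: int, n_features: int) -> tf.keras.Model:
    """LSTM regressor mapping a look-back window of derived features to a
    continuous predicted failure (anomaly) score at the lead-time horizon
    (illustrative sketch only)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(lookback, n_features)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),  # continuous score, not a class label
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# model.fit(X_train, y_train, validation_split=0.2, epochs=50) would then
# train on the window/target pairs derived above.
```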
  • the short-term failure prediction model may represent the end of the first model building/training subset of operations.
  • the system, at 510, may calculate one or more predicted failure scores based on data collected by the physical sensors and, in some aspects, the virtual sensor data.
  • the predicted failure scores may be converted, at 512, into categorical failure risk levels.
  • the categorical failure risk levels may include a low risk level, a medium risk level, and a high risk level, which may be more understandable for a user attempting to decide whether action should be taken based on the predicted failure score.
  • the conversion may be based on a set of thresholds for predicted failure scores associated with different risk levels. For example, when defining three different risk levels, a low risk level may be associated with a predicted failure score below 0.2, a medium risk level may be associated with a predicted failure score from 0.2 up to 0.6, and a high risk level may be associated with a failure prediction score above 0.6.
  • the labels may indicate a recommended action, e.g., the low, medium, and high risk levels of the previous example may be replaced and/or labeled as “no action to be taken”, “monitor performance”, and “repair/replace”, respectively. Accordingly, at 514, the predicted failure score and/or the categorical risk level may be reported to a user. A simple thresholding sketch is shown after this list item.
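  • the following sketch applies the example thresholds of 0.2 and 0.6 to map a predicted failure score to a risk level and a recommended action; the function name and the return format are illustrative assumptions.

```python
def risk_level(score: float):
    """Map a predicted failure score to a categorical risk level and a
    recommended action, using the example thresholds from the text."""
    if score < 0.2:
        return "low", "no action to be taken"
    if score < 0.6:
        return "medium", "monitor performance"
    return "high", "repair/replace"

# Example: risk_level(0.73) -> ("high", "repair/replace")
```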
  • the sensor data is collected to predict at least a long-term failure.
  • a long-term failure may be a failure that develops over a long time period, such as weeks, months or even years. Some examples of long-term failures include big mechanical failures and big electrical failures.
  • FIG. 6 is a diagram 600 illustrating a set of operations associated with generating a long-term failure prediction score to identify and predict the potential long-term failures, in accordance with some aspects of the disclosure.
  • a separate set of anomaly detection operations may be performed for each asset in a set of assets associated with a long-term failure prediction for a system or subsystem within a larger system.
  • the set of operations may include anomaly detection operations performed at 602.
  • a first set of anomaly detection operations 602A may be performed based on sensor data as described in relation to operation 502 of FIG. 5.
  • the output of the anomaly detection operations performed at 602 includes ensemble anomaly scores, root causes for the ensemble anomaly scores and selected features. As described above in relation to FIG. 5, for each asset in a set of multiple assets (tens to thousands of assets), one or more of the selected features, the ensemble anomaly scores, and their root causes within the defined time windows may be derived. Aggregation statistics similar to those discussed above in relation to FIG. 5 may also be generated at each of the multiple time scales. The aggregation statistics for the multiple time scales, in some aspects, may be concatenated per each asset.
  • the anomaly detection operations performed at 602 may further include obtaining, at 602B, physical design data for each asset, which, in some aspects, may include, but may not be limited to, an expected asset life, an asset material, an asset make, an asset model, or other relevant physical attributes.
  • the system may perform a feature engineering operation including an aggregation operation across the data in a look-back feature window for each of the multiple assets to derive a set of features and root causes (contributing factors). For example, for each asset and each of a set of defined time windows, the system, at 604, may calculate one or more of an average value for the anomaly scores, a variance of the anomaly scores or other aggregation statistics as described above.
  • the aggregation reduces a “dimensionality” of the problem to be modeled for the long-term failure prediction and, by considering a system or components of a subsystem, may generate a failure prediction score that reflects or considers a redundancy in the system or subsystem.
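  • one way to perform the per-asset, windowed aggregation described above is sketched below; the pandas-based layout (a DatetimeIndex with one column per asset), the 30-day window, and the function name are assumptions made for illustration.

```python
import pandas as pd

def asset_window_features(scores: pd.DataFrame, window: str = "30D") -> pd.DataFrame:
    """Aggregate per-asset ensemble anomaly scores over a long look-back window.

    `scores` is assumed to have a DatetimeIndex and one column per asset; the
    result holds the mean and variance per asset per window, reducing the
    dimensionality fed to the system-level anomaly detection model."""
    resampled = scores.resample(window)
    features = pd.concat({"mean": resampled.mean(), "var": resampled.var()}, axis=1)
    # Flatten the (statistic, asset) column index into "asset_statistic" names.
    features.columns = [f"{asset}_{stat}" for stat, asset in features.columns]
    return features
```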
  • an anomaly detection operation for the ensemble of multiple models may be determined based on the output of the feature engineering.
  • the feature engineering at 604 may generate a second set of system-level (aggregated) features, a set of system-level ensemble anomaly scores, and a set of system-level root causes (contributing factors for the anomaly scores generated by the detection model).
  • an anomaly detection model at the system-level may be applied at 606 to produce a long-term failure prediction score 608.
  • the long-term failure prediction score 608 may include a set of scores associated with different time horizons. As described in relation to FIG. 5, the long-term prediction score may be categorized to allow for more intuitive interpretation of the calculated long-term failure prediction score. Once a short-term failure is predicted by the model, the system may be able to derive a root cause of the failure to help diagnose and remediate the short-term failure.
  • FIG. 7 is a diagram 700 illustrating a set of operations for identifying a set of root causes (e.g., a root cause analysis for predicted short-term failure 701) associated with a short-term failure in accordance with some aspects of the disclosure.
  • FIG. 8 is a diagram 800 illustrating a set of operations for identifying a set of root causes (e.g., a root cause analysis for predicted long-term failure 801) associated with a long-term failure in accordance with some aspects of the disclosure.
  • Diagram 700 illustrates that, for each predicted failure represented by predicted failure scores 702 associated with a failure prediction model 704 (generated as described in relation to FIG. 5), an explainable AI model – failure prediction module 706 may identify a first feature set (“first feature set” 708) based on the failure prediction model 704 and the predicted failure score 702.
  • the first feature set 708 may be a set of features identified as being important features which contribute the most to the predicted failure score, and such features include a subset of root causes for anomaly scores 480 and selected features 475.
  • the important features may be identified based on a relative importance (e.g., a preconfigured number of features with a largest impact on, or contribution to, the predicted failure score) or an absolute importance (e.g., based on a threshold value associated with a measure of a contribution to the predicted failure score 702).
  • each feature is associated with a feature importance score to indicate how much it contributes to the predicted failure score.
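  • a minimal sketch of selecting important features by relative importance (a preconfigured top-k) or absolute importance (a score threshold) is shown below; the function name and the dictionary input format are illustrative assumptions.

```python
def select_important_features(importances, top_k=None, threshold=None):
    """Keep features by relative importance (largest top_k contributions) or
    absolute importance (score at or above threshold), sorted descending.

    `importances` maps feature name -> feature importance score."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if threshold is not None:
        ranked = [(name, score) for name, score in ranked if score >= threshold]
    return ranked
```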
  • the explainable AI model – failure prediction module 706 may identify the detected anomaly scores 710 (e.g., the anomaly scores 430 or ensemble anomaly scores 440) as important features for each predicted failure score 702.
  • the detected anomaly scores 710 and a supervised surrogate model – detection 712 (e.g., corresponding to supervised surrogate model 450) related to the identified first feature set 708 may then be processed by the explainable AI model – anomaly detection module 714 to identify a second set of important features (second feature set 716).
  • each feature may be associated with a feature importance score to indicate how much it contributes to the detected failure score.
  • the inputs to the explainable AI model – failure prediction module 808 are slightly different from the inputs to the explainable AI model – failure prediction module 706.
  • the operations for identifying a set of root causes associated with a long-term failure 834 may include building a supervised surrogate model – prediction 806 (where supervised surrogate functions have been discussed above in relation to FIG. 4) for long-term failure prediction model 804 by using features for the long-term failure prediction model 804 as features and the anomaly scores from the long-term failure prediction model 804 as a target.
  • the explainable AI model – failure prediction module 808 may process the supervised surrogate model – prediction 806 and the predicted long-term failure score 802 to identify the important features which contribute the most to the predicted long-term failure score 802.
  • Each feature in the first feature set 810 may be associated with a feature importance score to indicate how much it contributes to the predicted long-term failure score 802.
  • the operations may include identifying detected anomaly scores 812 and the supervised surrogate model – detection 814 related to the identified first feature set 810 and using the explainable AI model – anomaly detection module 816 to identify a second set of important features (second feature set 818).
  • Each feature in first feature set 810 and second feature set 818 may be associated with a feature importance score to indicate how much it contributes to the detected failure score. Subsequent operations are significantly overlapping for short-term and long-term root cause analysis with the understanding that the features and sensors identified using the following operations may be different in nature due to the different underlying systems and/or components being analyzed (e.g., components related to small scale-failures versus large-scale failures).
  • a feature aggregation and ranking module 718 (820), in some aspects, merges the features from first feature set 708 (810) and second feature set 716 (818) into a single set and sorts the features based on a feature importance score.
  • the merging removes redundant features such that a feature that appears in both first feature set 708 (810) and second feature set 716 (818) may only be represented once in the merged list.
  • a feature importance score in second feature set 716 (818) may be calculated by multiplying the feature importance score from explainable AI model – failure prediction module 706 (808) and the feature importance score from explainable AI model – anomaly detection module 714 (816).
  • the system may merge the duplicated features by using the aggregated feature importance score with aggregation statistics, which may include, but is not limited to, a summation of the feature importance scores, a maximum of the feature importance scores, and an average of the feature importance scores.
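  • a sketch of this importance chaining and duplicate merging is shown below; the data structures (a per-feature score dictionary at the prediction level and a (parent feature, feature)-keyed dictionary at the detection level) and the default aggregation choice are assumptions made for illustration.

```python
from collections import defaultdict

def merge_feature_importances(prediction_level, detection_level, aggregate=max):
    """Chain detection-level importances by multiplying them with the
    importance of the prediction-level feature they explain, then merge
    duplicated features with an aggregation statistic (sum, max, or mean)."""
    merged = defaultdict(list)
    for feature, score in prediction_level.items():
        merged[feature].append(score)
    for (parent_feature, feature), score in detection_level.items():
        chained = prediction_level.get(parent_feature, 0.0) * score
        merged[feature].append(chained)
    aggregated = {feature: aggregate(scores) for feature, scores in merged.items()}
    # Sort in descending order of aggregated feature importance.
    return sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)
```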
  • the feature aggregation and ranking module 718 (820) may sort the features in the merged list based on the feature importance scores in descending order. For each feature in the above result set, the system may map the feature to one or more physical or virtual sensors via a module for mapping features to sensors 720 (822). For features that map to physical sensors directly, a first set of physical sensors (first physical sensor set 722 (824)) may be identified.
  • for features that map to virtual sensors, a further mapping may be performed by a module for mapping virtual sensors to physical sensors 726 (828) to identify a second set of physical sensors (a second physical sensor set 728 (830)) that is associated with the feature.
  • Sensor aggregation and ranking module 730 (832) may then perform a merging and ranking operation on first physical sensor set 722 (824) and second physical sensor set 728 (830) where each physical sensor is associated with an importance score based on the corresponding feature.
  • aggregating the sets of identified physical sensors may merge importance scores for physical sensors that correspond to more than one feature based on, e.g., one or more of a summation of the sensor importance scores, a maximum of the sensor importance scores, and an average of the sensor importance scores.
  • the physical sensors in the aggregated list may also be ranked by the sensor aggregation and ranking module 730 (832).
  • the ranked list may include a set of physical sensors and a corresponding weight indicating a magnitude of a contribution to the ensemble anomaly score sorted in descending order.
  • the list may then be provided as a set of root causes for predicted failures 732 (834).
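  • the sensor-level aggregation and ranking described above might be sketched as follows; the mapping dictionaries, the default aggregation by summation, and the function name are assumptions, not the disclosed implementation.

```python
from collections import defaultdict

def rank_sensor_root_causes(feature_scores,
                            feature_to_physical,
                            feature_to_virtual,
                            virtual_to_physical,
                            aggregate=sum):
    """Map ranked features to physical sensors (directly, or through virtual
    sensors) and aggregate an importance score per physical sensor."""
    sensor_scores = defaultdict(list)
    for feature, score in feature_scores.items():
        for sensor in feature_to_physical.get(feature, []):
            sensor_scores[sensor].append(score)
        for vsensor in feature_to_virtual.get(feature, []):
            for sensor in virtual_to_physical.get(vsensor, []):
                sensor_scores[sensor].append(score)
    ranked = {sensor: aggregate(scores) for sensor, scores in sensor_scores.items()}
    # Descending order: the strongest contributing physical sensors come first.
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)
```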
  • the first feature set 708 (810), the second feature set 716 (818), and the set of virtual sensors 724 (826) may also be output to provide additional operational insights to a user. While the above discussion relates to sensor data for inanimate components of a system, the method may be extended, in some aspects, to use data about operators to enhance failure prediction, if such data is available.
  • the operators, in some aspects, may include a human, a robot, or a motion profile.
  • the role of the operator is critical in the operation of the machine or the system, and operator performance directly impacts the system’s performance, which can be measured as production yield, failure rate, or user experience; considering the role of the operator may therefore provide better prediction accuracy.
  • the data about operators, in some aspects, may include operation trajectories (e.g., position, velocity, and acceleration), attributes of an operator (e.g., seniority), and performance of an operator (e.g., historical performance metrics).
  • FIG. 9 is a flow diagram illustrating a method 900 for a system performing a failure prediction operation for an industrial process. The method may be performed by a set of processing units associated with one or more computing devices associated with an industrial system including a set of components and a set of sensors for monitoring the components of the system.
  • the system may collect (or obtain) a set of physical sensor data.
  • the set of physical sensor data may include a set of high-frequency data that is sampled by the sampling operation.
  • the set of physical sensor data may include sensor data from one or more of temperature sensors, pressure sensors, vibration sensors, acoustic sensors, motion sensors, optical sensors, LIDAR sensors, IR sensors, acceleration sensors, gas sensors, smoke sensors, humidity sensors, level sensors, image sensors (cameras), proximity sensors, water quality sensors, and/or chemical sensors.
  • the sensor data 110 may be obtained from a set of physical sensors that monitor conditions or characteristics of components of the system.
  • the system may generate a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data.
  • a virtual sensor is a type of software-derived information, computed from the available information, that represents, or is associated with, data that a corresponding physical device would collect.
  • a physics-based model 120 or 220 may obtain physical sensor data 110 or 222 and may generate a set of virtual sensor data 226 based on a physics-based model (e.g., physics-based model 224).
  • the physics-based model 224 may be a representation of the governing laws of nature that innately embeds the concepts of time, space, causality, and generalizability. These laws of nature, in some aspects, define how physical, chemical, biological, and geological processes evolve.
  • the physics-based model 224 may be a function which takes multiple inputs (e.g., physical sensor data 222) and generates multiple outputs (e.g., virtual sensor data 226).
  • the inputs can come from a predefined profile (such as a motion profile) during the design time, or from physical sensors (e.g., physical sensor data 222) during the operation time.
  • the outputs may include multiple variables and may represent a set of virtual sensors.
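  • as a hedged illustration of how a physics-based model might derive a virtual sensor, the sketch below computes a joint torque signal from rotational acceleration samples using tau = I * alpha; the specific formula, the function name, and the array-based interface are assumptions and do not reflect the particular physics-based models of the disclosure.

```python
import numpy as np

def torque_virtual_sensor(angular_acceleration, moment_of_inertia: float) -> np.ndarray:
    """Illustrative virtual sensor: joint torque inferred from rotational
    acceleration samples and a known moment of inertia (tau = I * alpha)."""
    return moment_of_inertia * np.asarray(angular_acceleration, dtype=float)

# Applied sample-by-sample to physical sensor data, this produces a virtual
# torque channel alongside the measured channels.
```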
  • the system may identify a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics.
  • the set of physical sensor data, in some aspects, may include a set of high-frequency data, and the optimized sampling rate may be associated with a sampling operation to reduce the volume of data. For example, referring to FIGs. 2B and 4, a feature engineering module 230 may generate a set of features 238 or 410 using one or more of a sampling and aggregation module 234 and/or a feature derivation module 236 applied to the sensor data 232.
  • the optimized sampling rate and/or the optimized aggregation statistics may be based on one or more optimization algorithms.
  • the at least one of the optimized sampling rate or the optimized aggregation statistics may be computed using one or more of a Bayesian optimization, a grid search, or a random search.
  • a first of the optimized sampling rate or the optimized aggregation statistics is based on domain knowledge and a second of the optimized sampling rate or the optimized aggregation statistics is computed based on one or more of a Bayesian optimization, a grid search, or a random search.
  • optimizing the first of the optimized sampling rate or the optimized aggregation statistics and the second of the optimized sampling rate or the optimized aggregation statistics includes cycling between, or iterating, these optimizations based on the results of a previous optimization as described in relation to FIG. 3.
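  • one possible search over the sampling rate and the aggregation-statistic subset is sketched below as a plain grid search; the `evaluate` callback (assumed to rebuild the features and models and return a validation score), the candidate lists, and the iteration strategy are assumptions, and Bayesian optimization or random search could equally be substituted.

```python
import itertools

def optimize_sampling_and_aggregation(candidate_rates, candidate_stat_sets, evaluate):
    """Grid search over sampling rates and aggregation-statistic subsets.

    `evaluate(rate, stats)` is assumed to retrain with the candidate
    configuration and return a validation score (higher is better)."""
    best_rate, best_stats, best_score = None, None, float("-inf")
    for rate, stats in itertools.product(candidate_rates, candidate_stat_sets):
        score = evaluate(rate, stats)
        if score > best_score:
            best_rate, best_stats, best_score = rate, stats, score
    return best_rate, best_stats, best_score

# In the iterated variant, one quantity is fixed from domain knowledge while
# the other is searched, and the roles are then swapped until convergence.
```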
  • the optimized sampling and aggregation operations may be implemented by a first set of machine-trained models, e.g., trained by the method described above in relation to FIG. 3.
  • the system may identify, based on the first set of machine-trained models (e.g., the trained sampling and aggregation models that may correspond to anomaly detection models) applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold.
  • the first set of machine-trained models may include a set of anomaly detection models that, in some aspects, produce a set of anomaly detection scores.
  • the set of anomaly detection models, in some aspects, may be trained based on the set of physical sensor data collected at 910 and the set of virtual sensor data generated at 920.
  • the system may identify (1) a set of anomaly detection scores 430 (e.g., anomaly scores 430-1, 430-2, and 430-K, or, collectively, ensemble anomaly scores 440), (2) the root causes for anomaly scores 480 (e.g., the first set of contributing factors to the set of anomaly detection scores), and (3) the selected features 475 (e.g., a second set of features that have a feature importance score that is above a threshold) based on the set of anomaly detection models 420 applied to the features 410, the ensemble anomaly scores 440, the supervised surrogate model 450, the explainable AI model 460, and the model-based feature selection 470.
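  • a compact sketch of the ensemble detection and surrogate-explanation idea is given below; the choice of IsolationForest detectors averaged into an ensemble score and a random-forest regressor as the supervised surrogate are illustrative assumptions, since the disclosure does not prescribe specific model families at this level.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor

def ensemble_scores_and_importances(features: np.ndarray, n_models: int = 5):
    """Average several unsupervised detectors into an ensemble anomaly score,
    then fit a supervised surrogate on (features, ensemble score) so that
    standard feature-importance tools can explain the unsupervised output."""
    scores = []
    for seed in range(n_models):
        detector = IsolationForest(random_state=seed).fit(features)
        # Negate score_samples so that larger values mean "more anomalous".
        scores.append(-detector.score_samples(features))
    ensemble_score = np.mean(scores, axis=0)

    surrogate = RandomForestRegressor(random_state=0)
    surrogate.fit(features, ensemble_score)
    return ensemble_score, surrogate.feature_importances_
```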
  • the system may generate at least one failure prediction model based on the first set of features and a first set of machine-trained models.
  • the at least one failure prediction model may include at least one short-term failure prediction model based on a machine-learning operation applied to (1) a set of anomaly detection scores, (2) a sampling operation and/or aggregation operation, or (3) the first set of identified features.
  • the machine-learning operation used to build and/or train the short-term failure prediction model, in some aspects, may include one or more deep learning RNN models (e.g., LSTM, GRU), ARIMA, or other appropriate machine learning methods.
  • the at least one short-term failure prediction model may include a short-term failure prediction model generated for each individual asset in a set of assets associated with the system.
  • the system may generate (e.g., build/train), at 508, the short-term failure prediction model based on (1) operation 502 based on a set of anomaly detection models, (2) operation 504 based on a set of parameter optimization operations, and (3) operation 506 based on a set of feature derivation and transformation operations.
  • the at least one failure prediction model may include at least one long-term failure prediction model. Generating the at least one long-term failure prediction model, in some aspects, may further be based on a second set of machine-trained models.
  • the second set of machine-trained models may be based on the first set of machine-trained models.
  • the system may generate the at least one long-term failure prediction model based on an additional set of anomaly detection models (e.g., a second set of machine-trained models) as applied, at 606, to a set of data generated based on the first set of machine-trained models applied to each of a plurality of assets at 602.
  • the system may apply the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset. Additional operations that may be performed are indicated by the letter “A”.
  • FIG. 10 is a flow diagram illustrating a method 1000 for identifying a set of root causes (e.g., contributing factors) for a failure prediction, in accordance with some aspects of the disclosure. As indicated by the letter “A”, the method of FIG. 10 may be performed after the operations described in relation to the method illustrated in FIG. 9.
  • the method 1000, in some aspects, is performed by a root-cause analysis module (e.g., root-cause analysis for short-term failure module 160 or root-cause analysis for long-term failure module 180 of FIG. 1).
  • the root-cause analysis module may, in some aspects, identify a set of contributing features for the at least one failure prediction model by applying the first explanatory model for the at least one failure prediction model to the set of predicted failure scores and applying a second explanatory model for the first set of machine-trained models.
  • the term contributing features may refer to abstract features identified by models in the system while the term contributing factors may relate to physical components of the system that are identified as contributing to the failure prediction models.
  • explainable AI model – failure prediction module 706 (or 808) may obtain a set of predicted short-term (or long-term) failure scores 702 (or 802) to identify first feature set 708 (810) associated with the detected anomaly scores 710 (812).
  • the explainable AI model – anomaly detection module 714 may also be used to identify second feature set 716 (818).
  • the first feature set 708 (810) and the second feature set 716 (818) may together make up the set of contributing features for the at least one failure prediction model.
  • the root-cause analysis module may, in some aspects, calculate a second importance score for each contributing feature in the set of contributing features.
  • the calculated second importance score, in some aspects, reflects a contribution to one or more failure prediction scores or anomaly scores.
  • explainable AI model – failure prediction module 706 (or 808) and the explainable AI model – anomaly detection module 714 (816) may calculate or output a weight associated with each contributing feature.
  • the root cause analysis module (e.g., root cause analysis for predicted short-term failure 701 or root cause analysis for predicted long-term failure 801) may produce a list of features associated with a corresponding weight and a feature aggregation and ranking module 718 (or 820) may generate a ranked list of features and aggregated weights (e.g., for features that contribute to multiple anomaly scores and/or predictions).
  • the root-cause analysis module may, in some aspects, map each contributing feature in the set of contributing features to one or more contributing physical sensors.
  • the mapping may include a mapping of features to physical sensors and a mapping of features to virtual sensors. Mapping each contributing feature in the set of contributing features to one or more contributing physical sensors may additionally include mapping the virtual sensors to one or more physical sensors.
  • the module for mapping features to sensors 720 may map the contributing factors in the first set of factors to a set of physical sensors 722 (824) and 728 (830), e.g., via an intermediate mapping to a set of virtual sensors 724 (826) that is in turn mapped to the set of physical sensors 728 (830).
  • the root-cause analysis module may, in some aspects, calculate a third importance score for each of the one or more contributing physical sensors based on the second importance score of at least one contributing feature in the set of contributing features mapped to the physical sensor of the one or more contributing physical sensors.
  • the calculation of the third importance score may be based on aggregating the weights associated with a same physical sensor based on different features or mappings.
  • sensor aggregation and ranking module 730 may perform a merging operation on first physical sensor set 722 (824) and second physical sensor set 728 (830) where each physical sensor is associated with an importance score based on one or more corresponding features.
  • the root cause analysis module may identify a subset of contributing physical sensors of the one or more contributing physical sensors as a second set of contributing factors for the at least one failure prediction model based on the third importance score. For example, referring to FIGs. 7 and 8, sensor aggregation and ranking module 730 (832) may perform a ranking operation on the merged list of contributing factors based on the first physical sensor set 722 (824) and second physical sensor set 728 (830).
  • the physical sensors in the aggregated list may be ranked by the sensor aggregation and ranking module 730 (832).
  • the ranked list may include a set of physical sensors and a corresponding weight indicating a magnitude of a contribution to the ensemble anomaly score sorted in descending order.
  • the system may provide one or more of the following benefits.
  • the disclosure above introduces an automated, unsupervised, data-driven solution for failure/fault control in an industrial system for one or more of short-term failures (in hours or days) or long-term failures (in weeks, months or years).
  • the disclosure above provides a solution that can predict failures with some lead time and identify the root causes for the failures so that the operators/technicians can have enough time to respond, and have additional root cause information to help diagnose the failures.
  • the disclosure above reduces or eliminates the need for (manually) labeled data.
  • the disclosure relates to performing the operations discussed above on historical sensor data while historical failure data may be optional in some aspects.
  • the disclosure also introduces a semi-empirical approach through a combination of physics-based model and machine learning model and an optimization strategy for a sampling rate and aggregation statistics for high-frequency sensor data.
  • the proposed anomaly prediction approach performs well on the prediction of rare failure events ahead of time by leveraging deep learning sequence prediction and using continuous values as the prediction target.
  • the disclosure also relates to root causes at several levels (the detected anomaly score level, the feature level, and the raw sensor data level) that are identified for the predicted failures through a chain of explainable AI techniques and aggregation/ranking algorithms.
  • FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations.
  • Computer device 1105 in computing environment 1100 can include one or more processing units, cores, or processors 1110, memory 1115 (e.g., RAM, ROM, and/or the like), internal storage 1120 (e.g., magnetic, optical, solid-state storage, and/or organic), and/or IO interface 1125, any of which can be coupled on a communication mechanism or bus 1130 for communicating information or embedded in the computer device 1105.
  • IO interface 1125 is also configured to receive images from cameras or provide images to projectors or displays, depending on the desired implementation.
  • Computer device 1105 can be communicatively coupled to input/user interface 1135 and output device/interface 1140.
  • Either one or both of the input/user interface 1135 and output device/interface 1140 can be a wired or wireless interface and can be detachable.
  • Input/user interface 1135 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, accelerometer, optical reader, and/or the like).
  • Output device/interface 1140 may include a display, television, monitor, printer, speaker, braille, or the like.
  • input/user interface 1135 and output device/interface 1140 can be embedded with or physically coupled to the computer device 1105.
  • computer device 1105 may function as or provide the functions of input/user interface 1135 and output device/interface 1140 for a computer device 1105.
  • Examples of computer device 1105 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
  • Computer device 1105 can be communicatively coupled (e.g., via IO interface 1125) to external storage 1145 and network 1150 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration.
  • Computer device 1105 or any connected computer device can function as, provide services of, or be referred to as a server, client, thin server, general machine, special-purpose machine, or another label.
  • IO interface 1125 can include, but is not limited to, wired and/or wireless interfaces using any communication or IO protocols or standards (e.g., Ethernet, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and network in computing environment 1100.
  • Network 1150 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like).
  • Computer device 1105 can use and/or communicate using computer-usable or computer readable media, including transitory media and non-transitory media.
  • Transitory media include transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like.
  • Non-transitory media include magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid-state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory.
  • Computer device 1105 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments. Computer-executable instructions can be retrieved from transitory media, and stored on and retrieved from non-transitory media.
  • the executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, and others).
  • Processor(s) 1110 can execute under any operating system (OS) (not shown), in a native or virtual environment.
  • One or more applications can be deployed that include logic unit 1160, application programming interface (API) unit 1165, input unit 1170, output unit 1175, and inter-unit communication mechanism 1195 for the different units to communicate with each other, with the OS, and with other applications (not shown).
  • the described units and elements can be varied in design, function, configuration, or implementation and are not limited to the descriptions provided.
  • Processor(s) 1110 can be in the form of hardware processors such as central processing units (CPUs) or in a combination of hardware and software units.
  • when information or an execution instruction is received by API unit 1165, it may be communicated to one or more other units (e.g., logic unit 1160, input unit 1170, output unit 1175).
  • logic unit 1160 may be configured to control the information flow among the units and direct the services provided by API unit 1165, input unit 1170, and output unit 1175 in some example implementations described above. For example, the flow of one or more processes or implementations may be controlled by logic unit 1160 alone or in conjunction with API unit 1165.
  • the input unit 1170 may be configured to obtain input for the calculations described in the example implementations, and the output unit 1175 may be configured to provide an output based on the calculations described in example implementations.
  • Processor(s) 1110 can be configured to collect a set of physical sensor data.
  • the processor(s) 1110 may also be configured to generate a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data.
  • the processor(s) 1110 may further be configured to identify a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics.
  • the processor(s) 1110 may further be configured to identify, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold.
  • the processor(s) 1110 may further be configured to generate at least one failure prediction model based on the first set of machine-trained models.
  • the processor(s) 1110 may also be configured to apply the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.
  • the processor(s) 1110 may also be configured to identify a set of contributing features for the at least one failure prediction model by applying the first explanatory model for the at least one failure prediction model to the set of predicted failure scores and applying a second explanatory model for the first set of machine-trained models.
  • the processor(s) 1110 may also be configured to calculate a second importance score for each contributing feature in the set of contributing features.
  • the processor(s) 1110 may also be configured to map each contributing feature in the set of contributing features to one or more contributing physical sensors.
  • the processor(s) 1110 may also be configured to calculate a third importance score for each of the one or more contributing physical sensors based on the second importance score of at least one contributing feature in the set of contributing features mapped to the physical sensor of the one or more contributing physical sensors.
  • the processor(s) 1110 may also be configured to identify a subset of contributing physical sensors of the one or more contributing physical sensors as a second set of contributing factors for the at least one failure prediction model based on the third importance score.
  • Such computer programs may be stored in a computer readable medium, such as a computer readable storage medium or a computer readable signal medium.
  • a computer readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid-state devices, and drives, or any other types of tangible or non-transitory media suitable for storing electronic information.
  • a computer readable signal medium may include mediums such as carrier waves.
  • the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the example implementations as described herein.
  • the instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
  • the operations described above can be performed by hardware, software, or some combination of software and hardware.
  • Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application.
  • example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software.
  • the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways.
  • the methods may be executed by a processor, such as a general-purpose computer, based on instructions stored on a computer readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
  • aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and example implementations be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.

Abstract

A method for performing a failure prediction operation for an industrial process is disclosed. The method may include collecting a set of physical sensor data and generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data. The method may also include identifying a first set of features from the set of physical sensor data and the set of virtual sensor data. The method may further include identifying, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold. The method may also include generating at least one failure prediction model based on the first set of machine-trained models and applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.

Description

RECOMMENDATION FOR OPERATIONS AND ASSET FAILURE PREVENTION BACKGROUND Field The present disclosure is generally directed to automated failure prediction in industrial systems. Related Art Automated failure prediction may be relevant to many complex industrial systems in many different industrial contexts (e.g., industries). For example, the different industrial contexts may include, but are not limited to, manufacturing, entertainment (e.g., theme parks), hospitals, airports, utilities, mining, oil & gas, warehouse, and transportation systems. Two major failure types may be defined by how distant the failure is in terms of the time of the failure from its symptoms. A short-term failure type may relate to symptoms and failures that are close in terms of time (e.g., several hours or days). For example, overloading failures on conveyor belts may be a short-term failure type associated with symptoms that are close in time. A long-term failure type may relate to symptoms that are distant in time from the failures (e.g., several weeks, months, or years). A long-term failure type of system-component failure often develops slowly and chronically, and may have a wider negative impact, such as shutting down a whole system including the failed component. For example, a long-term failure may include fracture and crack on a dam, or a component failure due to metal fatigue. Failures in industrial systems may be rare, but the cost of such failures may incur significant (or massive) financial (e.g., operational, maintenance, repair, logistics, etc.) costs, reputational (e.g., marketing, market share, sale, quality, etc.) costs, human (e.g., scheduling, skill set, etc.) costs, and/or liability (e.g., safety, health, etc.) costs. Some industrial systems may not have a failure detection process in place. In other industrial systems, detection of failures is performed manually based on domain knowledge based on real-time sensor data. Manual failure prediction and/or detection may pose great challenges when dealing with high frequency data (e.g., vibration sensor data, IoT data, etc.). Even if the failures can be detected and/or predicted, the failure may be too late to remediate or recover from as it may have already occurred or be very close to occurring. Thus, there is a need to predict the failure at a time that allows operators or technicians to have enough time to respond to the failures, remediate the failures, or even avoid the failures. Prediction of failures ahead of time can help avoid the failures and reduce the loss and/or the negative impacts resulting from the failures. Failure data, in some aspects, may be costly to collect and the data related to the failure may not be collected at all, or the failure data may be inaccurate, incomplete, and/or unreliable. This poses a challenge to build supervised solution which relies on the failure data as labels. Accordingly, a system that collects appropriate data and uses the collected data to predict one or more of a short-term or long-term failure in time to reduce or avoid the costs associated with the predicted failures is presented below. In some aspects, the system may further identify a set of root causes for the failures and may use the root causes as additional information to help diagnose the failures, and take actions to remediate or avoid the failures. 
In some aspects, the root cause analysis may be performed by the system without manual inspection of the components or visualizations of the sensor data and metrics (e.g., because the manual inspection may be time-consuming, costly, and/or error prone). The set of root causes, in some aspects, is a set of root causes identified at the sensor data level (e.g., data collected regarding a temperature, a level of vibration, or some other measured/monitored characteristic associated with a failure). SUMMARY Example implementations described herein include an innovative method. The method may include collecting a set of physical sensor data. The method may further include generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data. The method may also include identifying a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics. The method may further include identifying, using a first set of machine- trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold. The method may further include generating at least one failure prediction model based on the first set of machine-trained models. The method may also include applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset. Example implementations described herein include an innovative computer-readable medium storing computer executable code. The computer executable code may include instructions for collecting a set of physical sensor data. The computer executable code may also include instructions for generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data. The computer executable code may further include instructions for identifying a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics. The computer executable code may also include instructions for identifying, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold. The computer executable code may also include instructions for generating at least one failure prediction model based on the first set of machine-trained models. The computer executable code may further include instructions for applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset. 
Example implementations described herein include an innovative apparatus. The apparatus may include a memory and at least one processor configured to collect a set of physical sensor data. The at least one processor may also be configured to generate a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data. The at least one processor may further be configured to identify a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics. The at least one processor may also be configured to identify, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold. The at least one processor may also be configured to generate at least one failure prediction model based on the first set of machine-trained models. The at least one processor may further be configured to apply the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset. BRIEF DESCRIPTION OF DRAWINGS FIG. 1 is a diagram illustrating a basic concept of a circular economy in accordance with some aspects of the disclosure. FIG. 2A is a diagram illustrating components of a physics-based model in accordance with some aspects of the disclosure. FIG.2B is a diagram illustrating a feature engineering module in accordance with some aspects of the disclosure. FIG. 3 is a flow diagram illustrating an example workflow for optimizing at least one of the sampling rate and the aggregation statistics in accordance with some aspects of the disclosure. FIG.4 is a diagram illustrating a set of components of an implementation of an anomaly detection module (e.g., an unsupervised anomaly detection model) in accordance with some aspects of the disclosure. FIG. 5 is a diagram illustrating a set of operations associated with short-term failure prediction. FIG. 6 is a diagram illustrating a set of operations associated with generating a long- term failure prediction score to identify and predict the potential long-term failures, in accordance with some aspects of the disclosure. FIG. 7 is a diagram illustrating a set of operations for identifying a set of root causes associated with a short-term failure in accordance with some aspects of the disclosure. FIG. 8 is a flow diagram illustrating a set of operations for identifying a set of contributing factors associated with a long-term failure regarding an asset in one or more asset classes in accordance with some aspects of the disclosure. FIG. 9 is a flow diagram illustrating a for a system performing a failure prediction operation for an industrial process. FIG.10 is a flow diagram illustrating a method for identifying a set of root causes (e.g., contributing factors) for a failure prediction, in accordance with some aspects of the disclosure. FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations. 
DETAILED DESCRIPTION The following detailed description provides details of the figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term “automatic” may involve fully automatic or semi-automatic implementations involving user or administrator control over certain aspects of the implementation, depending on the desired implementation of one of the ordinary skills in the art practicing implementations of the present application. Selection can be conducted by a user through a user interface or other input means, or can be implemented through a desired algorithm. Example implementations as described herein can be utilized either singularly or in combination and the functionality of the example implementations can be implemented through any means according to the desired implementations. In this disclosure, a system, an apparatus, and a method are presented that addresses the problem of automated prediction of failures (both short-term failures and long-term failures) and identification of root causes for the predicted failures in an industrial system with unlabeled high-frequency sensor data. In this disclosure, a system, an apparatus, and a method are presented that provide techniques related to a failure prediction operation for an industrial process. For example, the method may include collecting a set of physical sensor data. The method may further include generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data. The method may also include identifying a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics. The method may further include identifying, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors (e.g., root causes) to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold. The method may also include generating at least one failure prediction model based on the first set of machine-trained models. The method may further include applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset. In some aspects, the system, apparatus, and method described herein may be directed to predicting both short-term and long-term failures and to deriving root causes for the predicted failures in order to mitigate or avoid the negative impacts before the failure. There may be several major benefits of failure prediction and prevention solutions. The failure prediction and prevention solutions, in some aspects, may reduce unplanned downtime and operating delays while increasing productivity, output, and operational effectiveness. In some aspects, the failure prediction and prevention solutions may optimize yields and increase margins/profits. 
The failure prediction and prevention solutions, in some aspects, may maintain consistency of production and product quality. In some aspects, the failure prediction and prevention solutions may reduce unplanned cost for logistics, scheduling maintenance, labor, and repair costs. The failure prediction and prevention solutions, in some aspects, may reduce damage to the assets and the whole industrial system. In some aspects, the failure prediction and prevention solutions may reduce accidents to operators and improve the health and safety of the operators. The proposed solutions generally provide benefits to all the entities that get involved within the industrial systems, including but not limited to: operators, supervisors/managers, maintenance technicians, SME/domain experts, assets, and the system itself. Several problems (limitations and restrictions) of conventional systems and methods are discussed below. Techniques to solve these problems are discussed herein. For example, conventional systems/methods may rely heavily on accurate historical failure data. However, the accurate historical failure data is usually not available for several reasons. For example, historical failure-related data may not be collected or may be inaccurate or incomplete, a process to collect failure data may not be in place or may not collect sufficient data to perform useful analysis, and/or the collected data (e.g., IoT data) may be too voluminous for manual processing, detection, and identification of failure data. Additionally, in some aspects, there may be no standard process to effectively and efficiently detect and classify both common and rare events. Manual processes to collect failures by labeling the sensor data based on the domain knowledge, in some aspects, may be inaccurate, inconsistent, unreliable, and time consuming. The processes in place to collect failure related data by industrial system operators , in some aspects, may not be sufficiently complete to identify and investigate the root cause. In some aspects, the insufficient data collection may often be due to a lack of understanding of how the data can help identify root cause at the time of data collection. Accordingly, some aspects provide an automated and standard process or approach to detect and collect failures accurately, effectively, and efficiently in the industrial systems. Conventional failure prediction solutions, in some aspects, do not perform well for rare failure events with an associated lead time (e.g., a lead time sufficient to implement a solution to remediate or prevent the predicted failure). In some aspects, a failure prediction solution may fail to perform well for a rare event because sensor data may be very biased toward good operating conditions. Due to rarity of such failures, in some aspects, it is very hard to build a supervised machine learning modeling with high precision and sensitivity. For example, in some aspects, it may be difficult to determine one or more optimal windows to collect features/evidence related to failures or to identify signals that may be used to predict failure. In some aspects, there may not be enough data to identify patterns from the limited amount of failure data. For example, an industrial system usually runs in a normal state and failures are usually rare events, and may be hard to capture patterns related to a limited number of the failures and thus hard to predict such failures. 
Accordingly, in some aspects, it may be difficult to build the correct relationship between normal cases and rare failure events in temporal order because it is difficult to capture the sequence pattern of the progression of rare failures. Therefore, the system, apparatus, and/or method may identify the correct signals (e.g., features) for failure prediction within optimal feature windows to provide the failure prediction with a sufficient lead time to address the predicted failure. The system, apparatus, and/or method may be capable of building/identifying the correct relationships between normal cases and rare failures, and the progression of rare failures, for one or both of short-term failures and long-term failures. The system, apparatus, and/or method may provide automated root cause analysis. Root cause analysis of failures may be performed manually based on domain knowledge and data visualization, which may be subjective, time consuming, and prone to errors. In some cases, the root causes may be associated with raw sensor data that is not addressed by domain knowledge or the data visualizations used for the manual root cause analysis. The system, apparatus, and/or method may provide automated root cause analysis based on a standardized approach to identify the root cause of the predicted failures, and may output the root causes at different levels (including the raw sensor data level). In some aspects, sensor data (e.g., IoT sensor data, vibrational data) may be high frequency data (e.g., 1000 Hz to 3000 Hz). High frequency data, in some aspects, poses challenges to building a solution for the failure prediction problem. For example, high frequency data may be associated with high levels of noise or with long, resource-consuming analysis (e.g., computing) times. A sampling frequency or aggregation window may require optimization for accurately predicting one of a short-term failure or a long-term failure. Accordingly, the system, apparatus, and/or method may provide a window optimization operation to identify an optimized window and/or aggregation statistics for a failure prediction. In some aspects, the physical sensor data may not be able to capture all the signals that may be useful for monitoring the system due to the severe environment for sensor installation, the cost of the sensors, and/or the functions of the sensors. As a result, the collected data may not be sufficient to monitor the system health and capture the potential risks and failures. In some aspects, this inability to capture all the potentially useful signals may pose challenges to building a failure prediction solution. Accordingly, the system, apparatus, and/or method may enrich the physical sensor data in order to capture necessary signals to help with system monitoring and with building a failure prediction solution. For example, the physical sensor data may be processed by a set of physics-based models to generate virtual sensor data. In some aspects, the system, apparatus, and/or method may implement and/or include several techniques to generate one or more failure prediction models and/or to identify a set of root causes. The techniques may include a semi-empirical approach utilizing one or more of a physics-based model and a data-driven machine learning model. A physics-based model, in some aspects, may be used to enrich the sensor data with additional features based on the physics of the system.
For example, a torque on a system component (e.g., a joint on a robotic arm) may be calculated based on a set of physical data sensors including one or more of an accelerometer (linear or rotational), a force sensor, an IoT motor, or other such sensor associated with related components (e.g., a motor, a boom end, etc.). Further, a sampling and aggregation optimization approach may be used to sample and aggregate the high-frequency data to derive features from the aggregated data. The techniques may further include one or more unsupervised failure prediction techniques and/or solutions. In some aspects, unsupervised failure prediction techniques and/or solutions may be based on sensor data without relying on historical failure data. For example, an unsupervised ensemble anomaly detection model may be used to derive (1) anomaly scores as labels and/or (2) features for the failure prediction models. The techniques and/or solutions may further include a supervised surrogate model for the anomaly detection model, used for feature selection, root cause analysis, and model evaluation. In some aspects, a short-term failure prediction model may be based on the ensemble anomaly scores, selected features, and root causes identified using the anomaly detection model, and may use one or more window-based feature derivation techniques to derive aggregated features and predict failures with lead time by using machine learning, e.g., a deep learning sequence prediction model (such as long short-term memory (LSTM) or gated recurrent units (GRU)). A long-term failure prediction model, in some aspects, may be based on the aggregate ensemble anomaly scores, selected features, and root causes from the anomaly detection model for each asset in a set of assets of the system. The long-term failure prediction model may then be built based on aggregated features from multiple assets. A root cause analysis for short-term failure and long-term failure may further be performed, in some aspects, to identify root causes for the predicted failures (short-term failures and long-term failures) at different levels. For example, root causes may be identified based on a detected anomaly score, a set of selected features, and sensor data (e.g., physical or virtual sensor data), by using a chain of explainable AI models and aggregation/ranking algorithms. FIG. 1 is a diagram 100 illustrating conceptual elements of a solution architecture for semi-empirical unsupervised failure prediction and root cause analysis in accordance with some aspects of the disclosure. Diagram 100 includes a set of sensor data 110 that is collected from multiple physical sensors (e.g., high-frequency data from IoT sensors). The physical sensor data 110 may be provided to a physics-based model 120. The physics-based model 120 may be applied to the physical sensor data (or a subset of the physical sensor data) to generate virtual sensor data (e.g., data that enriches a data set for failure prediction and root cause analysis). The physical sensor data and the virtual sensor data may be provided to a feature engineering module 130 to sample and aggregate the sensor data (physical and virtual) and derive new features based on an optimized sampling rate and/or an optimized set of aggregation statistics/characteristics. In some aspects, the feature engineering module 130 may perform an optimization function (e.g., a machine learning based optimization) for one or more of the sampling rate or the set of aggregation statistics/characteristics before performing the sampling and aggregation operations.
For example, in some aspects, one of the sampling rate or the aggregation statistics/characteristics may be provided by a user based on domain knowledge while the other of the sampling rate or the aggregation statistics/characteristics may be optimized. Data generated and/or processed by the feature engineering module 130 may be provided to an anomaly detection module 140. The anomaly detection module 140 may use the data from the feature engineering module 130 to build multiple anomaly detection models and an ensemble of the anomaly detection models. The anomaly detection module 140 may build the multiple anomaly detection models and the ensemble of the anomaly detection models using machine learning operations during a first learning phase and may use them in a second prediction (inference) phase. The anomaly detection module 140 may use the multiple anomaly detection models and the ensemble of the anomaly detection models to generate an ensemble anomaly score, to derive a root cause (e.g., an associated physical or virtual sensor) for each data point, and to select a feature through a surrogate supervised model for the ensemble anomaly detection model. Data processed by the anomaly detection module 140 may be provided to a short-term failure prediction module 150. The short-term failure prediction module 150 may derive features with a look-back feature window and use a deep learning sequence prediction model (e.g., LSTM or GRU) to predict failures ahead of time. Based on the output of the short-term failure prediction module 150, the system may identify and/or derive a set of root causes using a root cause analysis for short-term failure module 160. For example, for each predicted failure, the root cause analysis for short-term failure module 160 may derive the root causes with a chain of explainable AI techniques and ranking/aggregation algorithms. Similarly, data processed by the anomaly detection module 140 may be provided to a long-term failure prediction module 170. The long-term failure prediction module 170 may derive features with aggregation techniques based on the anomaly scores, root causes, and selected features. The long-term failure prediction module 170 may use at least one additional anomaly detection model to identify and predict long-term failures. Based on the output of the long-term failure prediction module 170, the system may identify and/or derive a set of root causes using a root cause analysis for long-term failure module 180. The root cause analysis for long-term failure module 180 may, in some aspects, build a surrogate supervised model for another anomaly detection model. In some aspects, for each predicted failure, the root cause analysis for long-term failure module 180 may use a surrogate supervised model to derive a set of root causes with a chain of explainable AI techniques and ranking/aggregation algorithms. In the following sections, each component in the solution architecture is discussed in detail. The specific methodologies used in association with the feature engineering module 130, the anomaly detection module 140, the short-term failure prediction module 150, the root cause analysis for short-term failure module 160, the long-term failure prediction module 170, and the root cause analysis for long-term failure module 180 are discussed below.
In some aspects, the sensor data 110 may be collected by IoT sensors installed on a set of assets-of-interest and used to monitor the health status and the performance of the asset and the whole system. Different types of sensors are designed to collect different types of data for different industries, different assets, and/or different tasks. In the discussion below, the sensors are discussed generically with the assumption that the methods for data processing are applicable to different types of sensor data with minor adjustments. Some examples of the sensors that may be used to collect the sensor data 110 may include temperature sensors, pressure sensors, vibration sensors, acoustic sensors, motion sensors, optical sensors, LIDAR sensors, infrared (IR) sensors, acceleration sensors, gas sensors, smoke sensors, humidity sensors, level sensors, image sensors (cameras), proximity sensors, water quality sensors, and/or chemical sensors. For a particular asset-of-interest, in addition to sensors that are installed on the particular asset-of-interest, other sensors installed in the system may also be used to build the models (e.g., anomaly detection models, failure prediction models, and/or root cause models) for the particular asset-of-interest. For example, data collected from a set of sensors installed on assets or system components that are upstream of the particular asset-of-interest and/or downstream of the particular asset-of-interest may be used to build a set of failure prediction models and/or may be identified as associated with a root cause of a predicted failure. The selection of sensors considered when building the different models, in some aspects, may be based on domain knowledge. In some aspects, a full set of sensors may be used to generate the different models for a particular asset-of-interest, where the feature detection/selection and root cause analysis may be used to narrow the set of sensors associated with a set of trained models (e.g., failure prediction models). For example, data analysis and model-based feature selection may be applied to select sensors associated with a failure prediction model. FIG. 2A is a diagram 200 illustrating components of a physics-based model in accordance with some aspects of the disclosure. The physics-based model 220, in some aspects, corresponds to the physics-based model 120 of FIG. 1. As described above, physical sensors may not capture a complete set of relevant signals and/or metrics to support the monitoring of the system health. This failure to capture the complete set of relevant signals and/or metrics may be due to one or more reasons. For example, a physical sensor may not capture a set of expected signals due to the physical limitations of the hardware, or a physical sensor may not be able to be installed in a severe environment, for example, in a location with high levels of radiation or with a pressure or temperature that is outside the range in which the sensor is able to function. In some aspects, the set of physical sensors may not capture data at the expected frequency. To overcome the limitations of the physical sensors, in some aspects, a software-based approach may be used to obtain the expected signals. The physics-based model 224, in some aspects, may be a representation of the governing laws of nature that innately embeds the concepts of time, space, causality, and generalizability. These laws of nature, in some aspects, define how physical, chemical, biological, and geological processes evolve.
The physics-based model 224, in some aspects, may be a function which takes multiple inputs (e.g., physical sensor data 222) and generates multiple outputs (e.g., virtual sensor data 226). The inputs can come from a predefined profile (such as a motion profile) during design time, or from physical sensors (e.g., physical sensor data 222) during operation time. The outputs (e.g., virtual sensor data 226), in some aspects, may include multiple variables and may represent a set of virtual sensors. In some aspects, a virtual sensor is information derived in software from the available information, which represents, or is associated with, data that a corresponding physical device would collect. For example, data collected by a physical acceleration sensor may be combined with a known mass of an associated component to calculate the (virtual) output of a virtual force sensor for the associated component. In some aspects, the virtual data associated with the virtual sensor may be used in the same way as the data from physical sensors, to derive the insights and build models and/or solutions for the downstream tasks. Benefits of the use of the virtual sensors, in some aspects, may include data complementation and data validation. Data complementation may include using virtual sensors to collect data (e.g., capture signals) that may not otherwise be captured by a physical sensor, for example, when the physical sensors are unable to capture data due to hardware limitations or due to a severe environment that is incompatible with the installation or functioning of a physical sensor. In some aspects, when there is not much data available at the early stage of physical sensor installation, virtual sensors may be used to derive insights and build models. Virtual sensors, in some aspects, may generate high-frequency data that the set of physical sensors may not be able to capture. Data validation, in some aspects, may include using virtual sensor data to validate physical sensor data when collecting the same, or a correlated, set of data. For example, the virtual sensor data may be used as an “expected” value while physical sensor data may be used as an “observed” value, and the variance or difference between them can be used as a signal to detect abnormal behaviors or anomalies in the system. Through the virtual sensors, in some aspects, physics-based models and machine learning models may be combined into a semi-empirical approach, which may take advantage of both domain knowledge (through the physics-based models) and a data driven approach (through the machine learning models). The physics model is theoretically self-consistent and has demonstrated successes in providing experimental predictions. The physics-based model usually works well during the system design time. However, during operation, complex system interactions and situations may arise, and the theoretical physics-based model based on domain knowledge and simulation may fall short of capturing the underlying mechanisms and become less accurate and sensitive. A data-driven approach, on the other hand, can capture the subtle signals and patterns in the complex system (provided enough data is collected) and derive the proper insights for decision making. The physics-based model can complement the machine learning model by incorporating the domain knowledge into the artificial intelligence (AI) and/or machine learning (ML) model, which may be costly to discover with a purely data-driven approach.
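A minimal sketch of the virtual sensor idea described above is given below, assuming (hypothetically) a physical accelerometer stream and a component mass known from design data; the names, constants, and randomly generated readings are illustrative only and are not the specific physics-based model of the disclosure.

```python
# Illustrative sketch of a virtual force sensor derived from a physical
# acceleration sensor; all values and names are hypothetical.
import numpy as np
import pandas as pd

MASS_KG = 12.5  # assumed, known from the component's design data

# Hypothetical physical sensor data: acceleration sampled at 1000 Hz.
t = pd.date_range("2022-01-01", periods=5000, freq="ms")
accel = pd.Series(np.random.normal(0.0, 0.3, len(t)), index=t, name="accel_m_s2")

# Virtual force sensor: F = m * a, derived purely in software.
virtual_force = (MASS_KG * accel).rename("virtual_force_N")

# Data validation: compare the "expected" virtual value against an "observed"
# physical force sensor (if one exists); the residual can flag abnormal behavior.
observed_force = virtual_force + np.random.normal(0.0, 0.5, len(t))  # stand-in reading
residual = (observed_force - virtual_force).rename("force_residual_N")
print(residual.abs().describe())
```

In this sketch the residual between the observed and expected force plays the role of the “signal to detect abnormal behaviors” mentioned above.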
FIG. 2A shows how the physics-based model 224 is applied to physical sensor data 222 in time-series format to derive the virtual sensor data 226. In some aspects, only a subset of physical sensor data (e.g., a subset of sensor data 110 of FIG. 1) may be used as the input physical sensor data 222 to the physics-based models 224. The physical sensor data 222, in some aspects, may be preprocessed before feeding into the physics-based model 224. For example, physical sensors may capture the position of an asset as it moves; however, the physics-based model may be based on the velocity and acceleration of the asset as it moves. Accordingly, the preprocessing, or the physics-based model, may include a first-order derivative used to calculate the velocity data associated with the position data and a second-order derivative used to calculate the acceleration data associated with the position data. The data from both the physical sensors and virtual sensors will be used as input to the next module (e.g., a feature engineering module 130 of FIG. 1) in the solution architecture. The physics-based model, in some aspects, may be built with simulation software and/or tools and may be built based on domain knowledge. FIG. 2B is a diagram 250 illustrating a feature engineering module 230 in accordance with some aspects of the disclosure. The feature engineering module 230, in some aspects, is an implementation of feature engineering module 130 of FIG. 1. In some aspects, several feature engineering techniques may be introduced to derive features from the sensor data (e.g., the physical or virtual sensor data, high-frequency IoT data, etc.). Diagram 250 illustrates several steps in a feature engineering module 230. Both physical sensor data and virtual sensor data, in some aspects, may be used to derive features. Since the sensor data, in some aspects, may be high-frequency (such as 1000 Hz or 3000 Hz) time-series data, a down-sampling operation may be used to convert the data into a lower frequency and/or an aggregation operation may be used to aggregate the data to capture useful signals for downstream tasks and/or analysis, and several techniques may then be used to derive features from the low-frequency and/or aggregated data. The down-sampling and/or aggregation operations may be performed at a sampling and aggregation module 234 that receives sensor data (e.g., physical and virtual sensor data 232). In some aspects, the output of the sampling and aggregation module 234 may be provided to a feature derivation module 236 to identify a first set of features 238. In some aspects, before performing the down-sampling operation and/or the aggregation operation for the high-frequency sensor data, a sampling rate and aggregation statistics/characteristics may be determined. For example, the sampling rate may relate to how much data is retained when providing the data to the feature derivation module 236. For example, if the sampling rate is 0.01, then 1 percent of the original data will be retained in the result data while, if the sampling rate is 0.1, then 10 percent of the original data will be retained in the result data for feature derivation. To compensate for data loss due to the down-sampling operation, in some aspects, the aggregation statistics/characteristics may be provided for the original high-frequency sensor data over each of a set of time windows.
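A condensed sketch of the kind of down-sampling and window aggregation described above is given below, assuming pandas, a 1000 Hz vibration signal, and the example statistics discussed in this section; the column names and the simple slope-based trend estimate are assumptions for illustration, not the specific implementation of the sampling and aggregation module 234.

```python
# Illustrative down-sampling of 1000 Hz data to 1 Hz with per-second
# aggregation statistics; names and statistics are examples only.
import numpy as np
import pandas as pd

t = pd.date_range("2022-01-01", periods=10_000, freq="ms")        # 1000 Hz for 10 s
raw = pd.DataFrame({"vibration": np.random.normal(0, 1, len(t))}, index=t)

def trend(x: pd.Series) -> float:
    # Simple linear-trend (slope) estimate over the aggregation window.
    return float(np.polyfit(np.arange(len(x)), x.to_numpy(), 1)[0]) if len(x) > 1 else 0.0

def p(q: float):
    # Factory for named percentile aggregators (p01, p25, ...).
    def percentile(x: pd.Series) -> float:
        return float(x.quantile(q))
    percentile.__name__ = f"p{round(q * 100):02d}"
    return percentile

aggregated = raw["vibration"].resample("1s").agg(
    ["min", "max", "mean", "std", p(0.01), p(0.25), p(0.50), p(0.75), p(0.99), trend]
)

# Simple window-based derived features on the aggregated data
# (e.g., a moving average and a first difference), as discussed later.
aggregated["mean_ma_5"] = aggregated["mean"].rolling(5).mean()
aggregated["mean_diff"] = aggregated["mean"].diff()
print(aggregated.head())
```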
For example, the aggregation statistics/characteristics may include, but are not limited to, a minimum value over the time window, a maximum value over the time window, a mean value over the time window, a standard deviation, a value associated with the 1st percentile, a value associated with the 99th percentile, a value associated with the 25th percentile, a value associated with the 50th percentile, a value associated with the 75th percentile, and a trend. For example, when the data is sampled from 1000 Hz (1000 data points per second) down to 1 Hz (1 data point per second), for each second, statistics may be calculated against the 1000 Hz data and provided in the resulting data that is passed to the feature derivation module 236. The sampling rate and aggregation statistics, in some aspects, may be suggested based on domain knowledge. However, the suggested values may not be optimal for the downstream solutions. Accordingly, an optimization approach to optimize the sampling rate and aggregation statistics for the downstream solutions may be performed before any down-sampling or aggregation operations for real-time failure prediction (e.g., the inference phase after the models are trained and root causes have been identified). FIG. 3 is a flow diagram 300 illustrating an example workflow for optimizing at least one of the sampling rate and the aggregation statistics in accordance with some aspects of the disclosure. The optimization may be performed by the feature engineering module 130 or 230 (or more specifically by the sampling and aggregation module 234). At 302, a set/list of aggregation statistics may be generated identifying the statistics and/or characteristics to be included in the results of the aggregation operation. As described above, the aggregation statistics/characteristics may include, but are not limited to, a minimum value over the time window, a maximum value over the time window, a mean value over the time window, a standard deviation, a value associated with the 1st percentile, a value associated with the 99th percentile, a value associated with the 25th percentile, a value associated with the 50th percentile, a value associated with the 75th percentile, and a trend. The space of possible aggregation statistics and sampling rates, in some aspects, may be explored based on at least one of a plurality of optimization approaches (e.g., Bayesian optimization, a grid search, or a random search). At 304, the workflow may include randomly selecting a subset of aggregation statistics from the set/list of aggregation statistics generated at 302. Randomly selecting the subset of aggregation statistics from the set/list of aggregation statistics may include generating a random binary value of length N (e.g., an N-bit binary value), where N is the number of elements of the set/list of aggregation statistics, such that elements of the set/list of aggregation statistics that correspond to a “0” in the N-bit binary value are not included in the randomly selected subset of aggregation statistics while elements that correspond to a “1” value are included in the randomly selected subset of aggregation statistics. At 306, the workflow may include selecting a sampling rate. The sampling rate may be selected, at 306, randomly or based on one of domain knowledge or a business requirement. For example, a sampling rate may be selected based on knowledge of a characteristic time scale associated with an asset failure.
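A small sketch of the random selection performed at 304 and 306 is shown below, assuming the example list of aggregation statistics given above and a hypothetical set of candidate sampling rates; it simply illustrates drawing an N-bit mask and a rate, not the full FIG. 3 workflow.

```python
# Illustrative random selection of an aggregation-statistic subset (step 304)
# and a sampling rate (step 306); lists and rates are assumed examples.
import random

aggregation_statistics = ["min", "max", "mean", "std",
                          "p01", "p25", "p50", "p75", "p99", "trend"]
candidate_sampling_rates = [0.001, 0.01, 0.1]   # fraction of data retained (assumed)

mask = [random.randint(0, 1) for _ in aggregation_statistics]      # N-bit binary value
selected_statistics = [s for s, bit in zip(aggregation_statistics, mask) if bit]
sampling_rate = random.choice(candidate_sampling_rates)            # or fixed from domain knowledge
print(mask, selected_statistics, sampling_rate)
```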
At 308, the workflow may include performing the sampling operation and the aggregation operation on the sensor data (e.g., the physical sensor data and/or the virtual sensor data). The subset of aggregation statistics and the sampling rate used to perform the sampling operation and the aggregation operation at 308, in some aspects, may be the subset of aggregation statistics selected at 304 and the sampling rate selected at 306. As described in relation to feature engineering module 230 of FIG. 2B, performing the sampling and aggregation operations may provide data for a feature derivation operation performed by feature derivation module 236. The workflow may then include building, at 310, at least one model based on the sampled and aggregated sensor data (and/or any identified features in a first set of identified features). The at least one model may include one or more of an anomaly detection/scoring model, a short-term failure prediction model, or a long-term failure prediction model. The model built based on the subset of aggregation statistics selected at 304 and the sampling rate selected at 306 may be evaluated based on some configured model performance metrics, such as overall accuracy, precision, and recall. In some aspects, building the models at 310 may include training machine learning models based on the sampled and aggregated data. At 312, the workflow may include determining whether a configured and/or desired number of repetitions of the selection of a sampling rate and the aggregation statistics has been reached. The configured number of repetitions may be based on a grid-based search of the sampling rate-aggregation statistics space or on some other property of the system or models. If the workflow determines, at 312, that the configured and/or desired number of repetitions has not been reached, the process may return to 304 to select another subset of the aggregation statistics. If, however, the workflow determines that the configured and/or desired number of repetitions has been met, the workflow may proceed to train a model (e.g., a Gaussian regression model) at 314. The trained model may be a surrogate model based on the results from 310. In some aspects, the features may be a binary representation of the aggregation statistics plus the sampling rate, and the target may be related to the performance metrics. The surrogate model may be one of a Gaussian process model for Bayesian optimization or a Tree-structured Parzen Estimator (TPE) model. In some aspects, if the machine learning model for the downstream tasks is too complex, a simpler machine learning model (e.g., a linear model or a tree-based model) may be used as a surrogate for the complex machine learning model. At 316, the workflow may include defining an acquisition function to help choose an optimized set of features for the Gaussian regression model (a binary representation of aggregation statistics plus the sampling rate). The acquisition function, in some aspects, may be one of probability of improvement, expected improvement, Bayesian expected losses, upper confidence bounds (UCB), Thompson sampling, or a hybrid of one or more of these. In some aspects, each different acquisition function may be associated with a different trade-off between exploration and exploitation and may be selected so as to minimize the number of function queries. At 318, the workflow may include training one or more machine learning models with the optimal set of values obtained based on the acquisition function employed at 316.
The training, at 318, may include identifying performance metrics and a running time. Based on the output of the training of the one or more machine learning models at 318, the workflow may determine whether the running time is above a threshold time. If the running time is determined, at 320, to be above the threshold, in some aspects, the workflow may return to randomly select a subset of aggregation statistics from the set/list of aggregation statistics at 304. If the running time is determined, at 320, to be below the threshold, the workflow may further determine, at 322, whether one or more criteria for stopping (e.g., a stop criteria) has been met. If the workflow determines, at 322, that the stop criteria has not been met (e.g., the training performance metrics have not met the predefined criteria), the last selected optimal set of features (a binary representation of aggregation statistics plus the sampling rate) and the model performance metrics may, at 324, be added to the training data set for the Gaussian regression model, and the workflow may return to an additional round of training at 314, as illustrated. If the workflow determines, at 322, that the stop criteria has been met, the workflow may end. In some aspects, the criteria may relate to a number of rounds, model metrics, a variance or entropy reduction rate, or other relevant criteria. In some aspects, there may be different algorithms used during the optimization process at different stages of the workflow illustrated in FIG. 3. For example, while the optimization approach described above is related to a Bayesian optimization, other optimization approaches such as grid search (e.g., a coarse-to-fine search) and random search may be used in some aspects. Similarly, while a Gaussian process model may commonly be used as a surrogate function for a Bayesian optimization, the workflow may use other surrogate functions for particular business/industrial problems. As discussed above, in some aspects, the workflow may utilize TPEs or, if the machine learning model for the downstream tasks is too complex, a simpler machine learning model (e.g., a linear model or a tree-based model) as a surrogate for the complex machine learning model. Additionally, different acquisition functions may be implemented at 316, including but not limited to probability of improvement, expected improvement, Bayesian expected losses, UCB, Thompson sampling, and/or a hybrid of one or more of these acquisition functions. Each different acquisition function may be associated with a trade-off between exploration and exploitation, and a particular acquisition function may be selected so as to minimize the number of function queries. In some aspects, two phases of optimization may be defined. A first phase may involve fixing the aggregation statistics (e.g., the selected subset of aggregation statistics) and optimizing the sampling rate, while a second phase may involve fixing the sampling rate to optimize the set of aggregation statistics. In each phase, the fixed parameter (e.g., one of the sampling rate or subset of aggregation statistics) may be determined based on domain knowledge. In some aspects, the first and second phases may be performed iteratively or in different orders. For example, based on domain knowledge, a sampling rate (or a set of possible sampling rates) may be identified, and the second phase of the optimization may be performed in accordance with the workflow of FIG. 3.
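A condensed sketch of a Bayesian-optimization loop of the general shape of FIG. 3 is given below, assuming scikit-learn and SciPy; the evaluate() function is a placeholder standing in for steps 308-310 (sample, aggregate, train, and score a downstream model), and the fixed iteration count stands in for the stop criteria at 312/320/322. It is illustrative only and not the claimed implementation.

```python
# Illustrative Bayesian optimization over an aggregation-statistic mask and a
# sampling rate, with a Gaussian process surrogate and expected improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

N_STATS = 10                         # number of candidate aggregation statistics
RATES = np.array([0.001, 0.01, 0.1]) # candidate sampling rates (assumed)

def evaluate(mask: np.ndarray, rate: float) -> float:
    # Placeholder for steps 308-310: in practice, sample/aggregate the sensor
    # data, train the downstream model, and return a performance metric.
    return float(mask.sum()) / N_STATS - abs(np.log10(rate) + 2) * 0.1

def encode(mask, rate):
    return np.append(mask, np.log10(rate))    # binary mask plus (log) sampling rate

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(8):                            # initial random trials (304-312)
    mask = rng.integers(0, 2, N_STATS)
    rate = float(rng.choice(RATES))
    X.append(encode(mask, rate)); y.append(evaluate(mask, rate))

gp = GaussianProcessRegressor(normalize_y=True)
for _ in range(20):                           # surrogate/acquisition loop (314-324)
    gp.fit(np.array(X), np.array(y))          # Gaussian process surrogate (314)
    cand_masks = rng.integers(0, 2, (256, N_STATS))
    cand_rates = rng.choice(RATES, 256)
    cand = np.array([encode(m, r) for m, r in zip(cand_masks, cand_rates)])
    mu, sigma = gp.predict(cand, return_std=True)
    imp = mu - max(y)
    ei = imp * norm.cdf(imp / (sigma + 1e-9)) + sigma * norm.pdf(imp / (sigma + 1e-9))  # acquisition (316)
    pick = int(np.argmax(ei))
    X.append(cand[pick])                      # train/evaluate the chosen configuration (318)
    y.append(evaluate(cand_masks[pick], float(cand_rates[pick])))

print("best score:", max(y))
```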
Continuing the example, during the second phase the sampling rate may be selected, at 306, based on the sampling rate or set of sampling rates identified based on the domain knowledge. After optimizing the subset of aggregated statistics, the first phase may be performed (e.g., for a first time or an ith time) to optimize the sampling rate for the optimized subset of aggregation statistics identified in the second phase. The first and second phases may be iterated to perform a set of “one-dimensional” optimization operations (e.g., only varying one set of parameters) to converge on an optimized set of parameters for the sampling rate and aggregation statistics. Based on the optimized sampling rate and the aggregated statistics, a set of historical sensor data (e.g., physical and/or virtual sensor data) may be sampled and aggregated to produce processed sensor data. The processed sensor data may be used, e.g., by a feature engineering module 130 or 230 (or a feature derivation module 236), to derive a first set of features by applying techniques for time-series data. The techniques may include, but are not limited to, one or more of a moving average, a moving variance, or a differencing operation associated with a rate of change of a value in the time-series data (e.g., a first-order or second-order derivative). In addition, for each time point, a feature detection may define a look-back feature window to derive some statistics (e.g., in the set of aggregation statistics) about the data in the feature window and may use the derived statistics as additional features for the current time point (e.g., may associate the derived statistics with the time point for an enhanced data set). The length of the feature window may (at least initially) be determined based on domain knowledge and may further be optimized with optimization techniques (such as grid search, random search, or Bayesian optimization, as described above in relation to FIG. 3). Such derived features, in some aspects, may be used together with the aggregated data as features for the downstream solutions. FIG. 4 is a diagram illustrating a set of components of an implementation of an anomaly detection module 140 (e.g., an unsupervised anomaly detection model) in accordance with some aspects of the disclosure. The unsupervised anomaly detection model may incorporate (or be based on) a set of features 410 (e.g., corresponding to features 238). Based on the features 410 (e.g., derived by feature engineering module 130 or 230), a set of anomaly detection models 420 (e.g., including a set of anomaly detection models 420-1, 420-2, and 420-K) may be used, in some aspects, to generate a set of anomaly scores 430 (e.g., including a set of anomaly scores 430-1, 430-2, and 430-K) for each time point. In some aspects, an anomaly score (e.g., an anomaly score 430-1, 430-2, or 430-K) may indicate how likely there is an anomaly at each time point. For example, an anomaly score may be defined to be in a range from 0 to 1, where larger values represent a greater likelihood of an anomaly. The set of anomaly scores 430, in some aspects, may be used as labels and features to build the failure prediction models (e.g., models associated with short-term failure prediction module 150 or long-term failure prediction module 170 of FIG. 1).
Multiple anomaly detection approaches (represented by anomaly detection models in the set of anomaly detection models 420), in some aspects, may be applied to the features 410, with each anomaly detection model in the set of anomaly detection models 420 generating an anomaly score in the set of anomaly scores 430 at each time point. The set of anomaly scores 430 from multiple models, in some aspects, may be ensembled (e.g., into ensemble anomaly scores 440) to remove a bias that may be embedded in the anomaly detection models in the set of anomaly detection models 420. A supervised surrogate model 450 may be built based on the features 410 and ensemble anomaly scores 440 in order to select features 475, explain the anomaly scores (e.g., using explainable AI model 460), and evaluate the anomaly detection model. A description of a workflow associated with the elements of FIGs. 1, 2, and 4 for performing an anomaly detection to generate a set of anomaly scores 430 (or, collectively, ensemble anomaly scores 440), root causes for anomaly scores 480, and selected features 475 is provided below. For example, a feature engineering module 130 or 230 may be used to generate a set of features 410 based on a set of physical sensor data 110 or 222 and/or virtual sensor data 226 generated by a physics-based model 120 or 220. The workflow may, in some aspects, include selecting multiple anomaly detection model algorithms (e.g., anomaly detection models in the set of anomaly detection models 420) and applying each selected model algorithm to the features 410 (or a subset of the features 410) to generate an anomaly score (e.g., anomaly score 430-1, 430-2, and 430-K) from each model. For each time point, the anomaly scores 430 generated by the set of anomaly detection models 420 may be ensembled into one anomaly score. In some aspects, the workflow may include using the features 410 as features and the ensemble anomaly scores 440 as labels to build a supervised surrogate model 450. With the supervised surrogate model 450 and the features 410, an explainable AI model 460 may be used to explain each ensemble anomaly score 440 and derive a set of root causes for the ensemble anomaly scores 480. Each root cause (alternatively referred to as a contributing factor), in some aspects, may be identified by a feature or factor name and its weight contributing to the ensemble anomaly score. In some aspects, open source libraries may be used to explain the prediction results for machine learning models. For example, “ELI5” (https://eli5.readthedocs.io/) and “SHAP” (https://shap.readthedocs.io/) are two open source libraries that may be used to explain the prediction results for machine learning models. Such libraries are designed to explain each result for each time point. The supervised surrogate model, in some aspects, may be used to select (e.g., via a model-based feature selection module 470) important features. In some aspects, the model-based feature selection module 470 may use one or more feature selection techniques such as forward selection, backward selection, or model-based feature selection (based on a feature importance technique). For example, a set of features may be associated with a calculated importance score indicating the magnitude of the contribution to at least one model utilized in the workflow (e.g., by an anomaly detection module 140 that may be associated with the data and models illustrated in FIG. 4).
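A minimal sketch of this flow is given below, assuming scikit-learn and the open-source SHAP library mentioned above: two anomaly detectors are ensembled, a supervised surrogate model is fit to the ensemble scores, and per-point feature contributions and model-based feature importances are derived. The specific detectors, the averaging scheme, and the importance threshold are assumptions for illustration, not the claimed combination.

```python
# Illustrative ensemble anomaly detection with a supervised surrogate model
# and SHAP-based explanation; models and thresholds are example choices.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
features = pd.DataFrame(rng.normal(size=(500, 6)),
                        columns=[f"f{i}" for i in range(6)])   # stand-in for features 410

def to_unit_interval(s):
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

# Two anomaly detection models (higher value = more anomalous after rescaling).
score_1 = to_unit_interval(-IsolationForest(random_state=0).fit(features).score_samples(features))
score_2 = to_unit_interval(-OneClassSVM(nu=0.05).fit(features).score_samples(features))
ensemble_score = (score_1 + score_2) / 2.0                     # stand-in for ensemble scores 440

# Supervised surrogate model: features as inputs, ensemble score as label.
surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(features, ensemble_score)

# Explainable AI step: per-time-point contributions (root-cause weights).
shap_values = shap.TreeExplainer(surrogate).shap_values(features)
root_causes_point_0 = sorted(zip(features.columns, shap_values[0]),
                             key=lambda kv: abs(kv[1]), reverse=True)

# Model-based feature selection: keep features above an importance threshold.
importances = pd.Series(surrogate.feature_importances_, index=features.columns)
selected_features = importances[importances > importances.mean()].index.tolist()
print(root_causes_point_0[:3], selected_features)
```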
Accordingly, the output of the anomaly detection module 140 may include, in some aspects, one or more of the individual anomaly scores 430 (or the ensemble anomaly scores 440), the root causes for anomaly scores (identified by the explainable AI model 460), and/or the selected features 475 identified by the model-based feature selection module 470. In some aspects, the sensor data is collected to predict at least a short-term failure. A short-term failure, as defined herein, may be a failure that develops in a short time period, such as hours or days. Some examples of short-term failures may include system operation failures and/or small mechanical or electrical failures. The anomaly detection module 140, in some aspects, may generate ensemble anomaly scores, a set of root causes (e.g., contributing factors), as well as the selected important features based on a set of historical sensor data. FIG. 5 is a diagram 500 illustrating a set of operations associated with short-term failure prediction. The set of operations may include a first subset of operations (e.g., operations 502, 504, 506, and 508) for model building/training and a second subset of operations (e.g., operations 510, 512, and 514) for a model-based inference/prediction using the trained model on real-time (or near-real-time) collected data. In some aspects, a system may generate, at 502, one or more of a set of individual anomaly scores (or the ensemble anomaly scores), the root causes (contributing factors) for the anomaly scores (identified by an explainable AI model), and/or the selected features identified by the model-based feature selection module 470. At 504, the system may, based on the anomaly scores, root causes, and features, for each time point, define a set of parameters associated with one or more of a look-back feature window and/or a lead-time window to derive features. The set of parameters associated with one or more of the look-back feature window and/or the lead-time window may be defined based on domain knowledge or may be optimized based on some optimization algorithms, such as grid search and random search. The set of parameters for the look-back feature window and the lead-time window may include a duration of the look-back feature window measured back in time from the current time and a separation in time between the current time and a potential failure occurrence time (i.e., the lead-time window). Such parameters, in some aspects, may be based on a type of failure to be predicted and a desired lead time for identifying a potential failure, to give a user sufficient time to address the predicted failure (e.g., by preparing a replacement part, performing maintenance, or otherwise mitigating or avoiding the failure). Based on the set of parameters associated with the look-back feature window and/or the lead-time window, the system, at 506, may derive the features for building the short-term failure prediction model. For example, based on the parameters associated with the look-back feature window, the system may, at 506, derive the features for building the short-term failure prediction model by calculating and/or identifying one or more of the selected features, the ensemble anomaly scores, and their root causes within the defined look-back feature window.
The selected features, the ensemble anomaly scores, and their root causes within a particular defined look-back feature window, in some aspects, may be concatenated with the time order (e.g., the time point) and used as features for generating, training, or validating a short-term failure prediction model. In some aspects, deriving the features may also include an aggregation function based on a selected subset of aggregation statistics. The selected aggregation statistics, in some aspects, may include a minimum value over the time window, a maximum value over the time window, a mean value over the time window, a standard deviation, a value associated with the 1st percentile, a value associated with the 99th percentile, a value associated with the 25th percentile, a value associated with the 50th percentile, a value associated with the 75th percentile, and a trend, and may be selected as described above in relation to FIG. 3. The lead-time window may further be used to generate, at 506, ensemble anomaly scores for association with the time point associated with the lead-time window (and the look-back window). The generated set of ensemble anomaly scores may be used as a set of target data for a subsequent training operation for building/training a short-term failure prediction model. For example, the system, in some aspects, may, for each time point, define a look-ahead lead-time window and use the ensemble anomaly scores associated with the lead-time window as the target (e.g., ground truth for the predicted values) for building the short-term failure prediction model. In some aspects, the anomaly scores are continuous values, and using continuous values as the prediction target mitigates issues associated with rare failures in a classification approach. At 508, the system may build (or train using a machine learning operation) a short-term failure prediction model with a time-series sequence prediction model. In some aspects, deep learning recurrent neural network (RNN) models (e.g., LSTM, GRU) may be used to build and/or train the short-term failure prediction model. Other approaches, such as an auto-regressive integrated moving average (ARIMA) or another appropriate machine learning method, may be used in some aspects. Building, at 508, the short-term failure prediction model may represent the end of the first model building/training subset of operations. During operation of the system, the system, at 510, may calculate one or more predicted failure scores based on data collected by the physical sensors and, in some aspects, the virtual sensor data. In some aspects, the predicted failure scores may be converted, at 512, into categorical failure risk levels. The categorical failure risk levels may include a low risk level, a medium risk level, and a high risk level that may be more understandable for a user attempting to decide whether action should be taken based on the predicted failure score. The conversion may be based on a set of thresholds for predicted failure scores associated with different risk levels. For example, when defining three different risk levels, a low risk level may be associated with a predicted failure score below 0.2, a medium risk level may be associated with a predicted failure score from 0.2 up to 0.6, and a high risk level may be associated with a predicted failure score above 0.6.
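A condensed sketch of the kind of processing described for 506-512 is shown below, assuming TensorFlow/Keras and NumPy: look-back windows of feature data are paired with an ensemble anomaly score taken a lead time ahead, a small LSTM sequence model is trained on those pairs, and predicted scores are bucketed using the example 0.2/0.6 thresholds. The window lengths, model size, and synthetic data are assumptions for illustration only.

```python
# Illustrative look-back/lead-time sequence construction, LSTM training, and
# conversion of predicted failure scores into categorical risk levels.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(2)
n_points, n_features = 2000, 8
features = rng.normal(size=(n_points, n_features))      # selected features, scores, root-cause weights
anomaly_score = rng.uniform(size=n_points)               # stand-in ensemble anomaly scores

LOOKBACK, LEAD = 60, 30                                   # window lengths in time steps (assumed)
X, y = [], []
for t in range(LOOKBACK, n_points - LEAD):
    X.append(features[t - LOOKBACK:t])                    # look-back feature window
    y.append(anomaly_score[t + LEAD])                     # target taken from the lead-time window
X, y = np.stack(X), np.array(y)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(LOOKBACK, n_features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)

def risk_level(score: float) -> str:
    # Example thresholds from the description above; values are illustrative.
    return "low" if score < 0.2 else "medium" if score < 0.6 else "high"

predicted = model.predict(X[-5:], verbose=0).ravel()
print([(round(float(p), 2), risk_level(float(p))) for p in predicted])
```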
A different number of categories may be used in different aspects, and the labels may indicate a recommended action; e.g., the low, medium, and high risk levels of the previous example may be replaced and/or labeled as “no action to be taken”, “monitor performance”, and “repair/replace”. Accordingly, at 514, the predicted failure score and/or the categorical risk level may be reported to a user. In some aspects, the sensor data is collected to predict at least a long-term failure. A long-term failure may be a failure that develops over a long time period, such as weeks, months, or even years. Some examples of long-term failures include major mechanical failures and major electrical failures. Checking the system to identify potential long-term failures can avoid large losses in terms of both assets and human safety. FIG. 6 is a diagram 600 illustrating a set of operations associated with generating a long-term failure prediction score to identify and predict the potential long-term failures, in accordance with some aspects of the disclosure. At 602, a separate set of anomaly detection operations may be performed for each asset in a set of assets associated with a long-term failure prediction for a system or subsystem within a larger system. The set of operations may include the anomaly detection operations performed at 602. In some aspects, a first set of anomaly detection operations 602A may be performed based on sensor data as described in relation to operation 502 of FIG. 5. The output of the anomaly detection operations performed at 602 includes ensemble anomaly scores, root causes for the ensemble anomaly scores, and selected features. As described above in relation to FIG. 5, for each asset in a set of multiple assets (tens to thousands of assets), one or more of the selected features, the ensemble anomaly scores, and their root causes within the defined time windows may be derived. Aggregation statistics similar to those discussed above in relation to FIG. 5 may also be generated at each of the multiple time scales. The aggregation statistics for the multiple time scales, in some aspects, may be concatenated per each asset. The anomaly detection operations performed at 602 may further include obtaining, at 602B, physical design data for each asset, which, in some aspects, may include, but may not be limited to, an expected asset life, an asset material, an asset make, an asset model, or other relevant physical attributes. At 604, the system may perform a feature engineering operation including an aggregation operation across the data in a look-back feature window for each of the multiple assets to derive a set of features and root causes (contributing factors). For example, for each asset and each of a set of defined time windows, the system, at 604, may calculate one or more of an average value for the anomaly scores, a variance of the anomaly scores, or other aggregation statistics as described above. In some aspects, the aggregation reduces a “dimensionality” of the problem to be modeled for the long-term failure prediction and, by considering a system or components of a subsystem, may generate a failure prediction score that reflects or considers a redundancy in the system or subsystem. At 606, an anomaly detection operation for the ensemble of multiple models may be determined based on the output of the feature engineering.
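A small sketch of the per-asset aggregation described for 602-606 is shown below, assuming pandas and scikit-learn: each asset's ensemble anomaly scores are aggregated over look-back windows of several lengths, joined with a static design attribute, and fed to a system-level anomaly detector. The window lengths, design attribute, and detector choice are assumed for illustration and are not the specific long-term model of the disclosure.

```python
# Illustrative per-asset aggregation and system-level scoring for long-term
# failure prediction; data, windows, and models are example stand-ins.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
assets = [f"asset_{i}" for i in range(20)]
scores = pd.DataFrame(rng.uniform(size=(365, len(assets))),
                      index=pd.date_range("2021-01-01", periods=365, freq="D"),
                      columns=assets)                      # daily ensemble anomaly scores per asset

rows = []
for asset in assets:
    s = scores[asset]
    row = {"asset": asset, "expected_life_years": 10.0}     # design data (assumed)
    for window in (7, 30, 90):                              # multiple time scales
        tail = s.tail(window)
        row[f"mean_{window}d"] = tail.mean()
        row[f"var_{window}d"] = tail.var()
    rows.append(row)
asset_features = pd.DataFrame(rows).set_index("asset")

detector = IsolationForest(random_state=0).fit(asset_features)
long_term_score = pd.Series(-detector.score_samples(asset_features),
                            index=asset_features.index, name="long_term_failure_score")
print(long_term_score.sort_values(ascending=False).head())
```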
The feature engineering at 604 may generate a second set of system-level (aggregated) features, a set of system-level ensemble anomaly scores, and a set of system-level root causes (contributing factors for the anomaly scores generated by the detection model). Based on the second set of system-level (aggregated) features, the set of system-level ensemble anomaly scores, and the set of system-level root causes, an anomaly detection model at the system level (or an ensemble of multiple anomaly detection models) may be applied at 606 to produce a long-term failure prediction score 608. The long-term failure prediction score 608 may include a set of scores associated with different time horizons. As described in relation to FIG. 5, the long-term prediction score may be categorized to allow for more intuitive interpretation of the calculated long-term failure prediction score. Once a short-term failure is predicted by the model, the system may be able to derive a root cause of the failure to help diagnose and remediate the short-term failure. FIG. 7 is a diagram 700 illustrating a set of operations for identifying a set of root causes (e.g., a root cause analysis for predicted short-term failure 701) associated with a short-term failure in accordance with some aspects of the disclosure. FIG. 8 is a diagram 800 illustrating a set of operations for identifying a set of root causes (e.g., a root cause analysis for predicted long-term failure 801) associated with a long-term failure in accordance with some aspects of the disclosure. As the operations and elements of FIGs. 7 and 8 significantly overlap, the common elements will be discussed together below. Diagram 700 illustrates that, for each predicted failure represented by predicted failure scores 702 associated with a failure prediction model 704 (generated as described in relation to FIG. 5), an explainable AI model – failure prediction module 706 may identify a first feature set 708 based on the failure prediction model 704 and the predicted failure score 702. In some aspects, the first feature set 708 may be a set of features identified as being important features which contribute the most to the predicted failure score, and such features may include a subset of the root causes for anomaly scores 480 and the selected features 475. The important features may be identified based on a relative importance (e.g., a preconfigured number of features with a largest impact on, or contribution to, the predicted failure score) or an absolute importance (e.g., based on a threshold value associated with a measure of a contribution to the predicted failure score 702). Accordingly, in some aspects, each feature is associated with a feature importance score to indicate how much it contributes to the predicted failure score. The explainable AI model – failure prediction module 706 may identify the detected anomaly scores 710 (e.g., the anomaly scores 430 or ensemble anomaly scores 440) as important features for each predicted failure score 702. The detected anomaly scores 710 and a supervised surrogate model – detection 712 (e.g., corresponding to supervised surrogate model 450) may be provided to an explainable AI model – anomaly detection module 714 to derive a second set of important features (i.e., second feature set 716). As with the features in the first feature set 708, each feature may be associated with a feature importance score to indicate how much it contributes to the detected failure score.
For identifying a set of root causes associated with a long-term failure 834, the inputs to the explainable AI model – failure prediction module 808 are slightly different from the inputs to the explainable AI model – failure prediction module 706. The operations for identifying a set of root causes associated with a long-term failure 834 may include building a supervised surrogate model – prediction 806 (where supervised surrogate models have been discussed above in relation to FIG. 4) for the long-term failure prediction model 804 by using the features of the long-term failure prediction model 804 as features and the anomaly scores from the long-term failure prediction model 804 as the target. For each predicted failure, the explainable AI model – failure prediction module 808 may process the supervised surrogate model – prediction 806 and the predicted long-term failure score 802 to identify the important features which contribute the most to the predicted long-term failure score 802. Each feature in the first feature set 810, in some aspects, may be associated with a feature importance score to indicate how much it contributes to the predicted long-term failure score 802. The operations may include identifying detected anomaly scores 812 and the supervised surrogate model – detection 814 related to the identified first feature set 810 and using the explainable AI model – anomaly detection module 816 to identify a second set of important features (second feature set 818). Each feature in first feature set 810 and second feature set 818, in some aspects, may be associated with a feature importance score to indicate how much it contributes to the detected failure score. Subsequent operations overlap significantly for short-term and long-term root cause analysis, with the understanding that the features and sensors identified using the following operations may be different in nature due to the different underlying systems and/or components being analyzed (e.g., components related to small-scale failures versus large-scale failures). A feature aggregation and ranking module 718 (820), in some aspects, merges the features from first feature set 708 (810) and second feature set 716 (818) into a single set and sorts the features based on a feature importance score. In some aspects, the merging removes redundant features such that a feature that appears in both first feature set 708 (810) and second feature set 716 (818) may only be represented once in the merged list. For example, a feature importance score in second feature set 716 (818) may be calculated by multiplying the feature importance score from explainable AI model – failure prediction module 706 (808) and the feature importance score from explainable AI model – anomaly detection module 714 (816). In some aspects, if there are duplicate features in first feature set 708 (810) and second feature set 716 (818), the system may merge the duplicated features by using the aggregated feature importance score with aggregation statistics, which may include, but are not limited to, a summation of the feature importance scores, a maximum of the feature importance scores, and an average of the feature importance scores. The feature aggregation and ranking module 718 (820) may sort the features in the merged list based on the feature importance scores in descending order. For each feature in the above result set, the system may map the feature to one or more physical or virtual sensors via a module for mapping features to sensors 720 (822).
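A minimal sketch of the feature aggregation/ranking and feature-to-sensor mapping just described is given below, using plain Python dictionaries; the feature names, sensor names, and importance weights are hypothetical, and summation is used as one of the aggregation options mentioned above (maximum or averaging would work similarly).

```python
# Illustrative merging/ranking of two feature sets and mapping of features to
# physical sensors; all names and weights are hypothetical examples.
from collections import defaultdict

first_feature_set = {"vibration_mean": 0.40, "virtual_torque_p99": 0.25}   # from the failure-prediction explainer
second_feature_set = {"vibration_mean": 0.30, "temperature_trend": 0.20}   # from the anomaly-detection explainer

# Merge duplicates by summing importance scores, then rank in descending order.
merged = defaultdict(float)
for feature_set in (first_feature_set, second_feature_set):
    for name, weight in feature_set.items():
        merged[name] += weight
ranked_features = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Map each feature to one or more physical or virtual sensors, then map virtual
# sensors back to the physical sensors they are derived from.
feature_to_sensors = {"vibration_mean": ["vib_sensor_1"],
                      "temperature_trend": ["temp_sensor_3"],
                      "virtual_torque_p99": ["virtual_torque"]}
virtual_to_physical = {"virtual_torque": ["accel_sensor_2", "current_sensor_1"]}

sensor_scores = defaultdict(float)
for feature, weight in ranked_features:
    for sensor in feature_to_sensors.get(feature, []):
        for physical in virtual_to_physical.get(sensor, [sensor]):
            sensor_scores[physical] += weight
root_causes = sorted(sensor_scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked_features, root_causes, sep="\n")
```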
For features that map to physical sensors directly, a first set of physical sensors (first physical sensor set 722 (824)) may be identified. For features that map to a set of virtual sensors 724 (826), a further mapping may be performed by a module for mapping virtual sensors to physical sensors 726 (828) to identify a second set of physical sensors (a second physical sensor set 728 (830)) that is associated with the feature. Sensor aggregation and ranking module 730 (832) may then perform a merging and ranking operation on first physical sensor set 722 (824) and second physical sensor set 728 (830), where each physical sensor is associated with an importance score based on the corresponding feature. Similar to the feature aggregation and ranking, aggregating the sets of identified physical sensors may merge importance scores for physical sensors that correspond to more than one feature based on, e.g., one or more of a summation of the sensor importance scores, a maximum of the sensor importance scores, and an average of the sensor importance scores. The physical sensors in the aggregated list may also be ranked by the sensor aggregation and ranking module 730 (832). The ranked list may include a set of physical sensors and a corresponding weight indicating a magnitude of a contribution to the ensemble anomaly score, sorted in descending order. The list may then be provided as a set of root causes for predicted failures 732 (834). In some aspects, the first feature set 708 (810), the second feature set 716 (818), and the set of virtual sensors 724 (826) may also be output to provide additional operational insights to a user. While the above discussion relates to sensor data for inanimate components of a system, the method may be extended, in some aspects, to use data about operators to enhance failure prediction, if such data is available. The operators, in some aspects, may include a human, a robot, or a motion profile. In some aspects, the role of the operator is critical in the operation of the machine or the system, and the operator's performance directly impacts the system's performance, which can be measured as production yield, failure rate, or user experience; the failure prediction may therefore achieve better accuracy when the role of the operator is considered. Specifically, for failure rate, if data about operators is collected, it can be used as additional features to build the anomaly detection model, the failure prediction model, and the long-term failure prediction model. The data about operators, in some aspects, may include, but is not limited to, one or more of operation trajectories (e.g., position, velocity, and acceleration), an operator's seniority, an operator's historical performance metrics, or demographics of the operator, for example. FIG. 9 is a flow diagram illustrating a method 900 for a system performing a failure prediction operation for an industrial process. The method may be performed by a set of processing units associated with one or more computing devices associated with an industrial system including a set of components and a set of sensors for monitoring the components of the system. At 910, the system may collect (or obtain) a set of physical sensor data. The set of physical sensor data, in some aspects, may include a set of high-frequency data that is sampled by the sampling operation.
The set of physical sensor data may include sensor data from one or more of temperature sensors, pressure sensors, vibration sensors, acoustic sensors, motion sensors, optical sensors, LIDAR sensors, IR sensors, acceleration sensors, gas sensors, smoke sensors, humidity sensors, level sensors, image sensors (cameras), proximity sensors, water quality sensors, and/or chemical sensors. For example, referring to FIG. 1, the sensor data 110 may be obtained from a set of physical sensors that monitor conditions or characteristics of components of the system. At 920, the system may generate a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data. In some aspects, a virtual sensor is a type of information derived in software from the available information, and represents, or is associated with, data that a corresponding physical device would collect. For example, referring to FIGs. 1 and 2A, a physics-based model 120 or 220 (including a physics-based model 224) may obtain physical sensor data 110 or 222 and may generate a set of virtual sensor data 226 based on a physics-based model (e.g., physics-based model 224). The physics-based model 224, in some aspects, may be a representation of the governing laws of nature that innately embeds the concepts of time, space, causality, and generalizability. These laws of nature, in some aspects, define how physical, chemical, biological, and geological processes evolve. The physics-based model 224, in some aspects, may be a function which takes multiple inputs (e.g., physical sensor data 222) and generates multiple outputs (e.g., virtual sensor data 226). The inputs can come from a predefined profile (such as a motion profile) during the design time, or from physical sensors (e.g., physical sensor data 222) during the operation time. The outputs (e.g., virtual sensor data 226), in some aspects, may include multiple variables and may represent a set of virtual sensors. At 930, the system may identify a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics. The set of physical sensor data, in some aspects, may include a set of high-frequency data, and the optimized sampling rate may be associated with a sampling operation to reduce the volume of data. For example, referring to FIGs. 2B and 4, a feature engineering module 230 may generate a set of features 238 or 410 using one or more of a sampling and aggregation module 234 and/or feature derivation module 236 applied to the sensor data 232. As described in relation to FIG. 3, the optimized sampling rate and/or the optimized aggregation statistics may be based on one or more optimization algorithms. For example, the at least one of the optimized sampling rate or the optimized aggregation statistics, in some aspects, may be computed using one or more of a Bayesian optimization, a grid search, or a random search. As described above, in some aspects, a first of the optimized sampling rate or the optimized aggregation statistics is based on domain knowledge and a second of the optimized sampling rate or the optimized aggregation statistics is computed based on one or more of a Bayesian optimization, a grid search, or a random search.
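A minimal sketch of the virtual sensor generation at 920 and the sampling-rate/aggregation-statistic search at 930 follows. It assumes high-frequency readings in a pandas DataFrame with a datetime index and hypothetical columns (torque_nm, speed_rpm, health_index), derives one physics-based virtual sensor (mechanical power), and uses a plain grid search scored by cross-validation; Bayesian optimization, random search, or the iterative cycling described next could replace the two nested loops.

```python
# Illustrative sketch only. Column names, candidate rates/statistics, and the
# health_index target used for scoring are hypothetical, not part of the disclosure.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def prepare_features(raw: pd.DataFrame, rate: str, stat: str) -> pd.DataFrame:
    df = raw.copy()
    # Physics-based virtual sensor (hypothetical): mechanical power [W]
    # from torque [N*m] and shaft speed [rpm].
    df["power_w"] = df["torque_nm"] * df["speed_rpm"] * 2 * np.pi / 60.0
    # Downsample the high-frequency data with the candidate rate and statistic.
    return df.resample(rate).agg(stat).dropna()

def grid_search_sampling(raw: pd.DataFrame,
                         rates=("1min", "5min", "15min"),
                         stats=("mean", "max", "std")):
    best = None
    for rate in rates:
        target = raw["health_index"].resample(rate).mean()
        for stat in stats:
            X = prepare_features(raw.drop(columns=["health_index"]), rate, stat)
            X, y = X.align(target, join="inner", axis=0)
            score = cross_val_score(RandomForestRegressor(random_state=0),
                                    X, y, cv=3).mean()
            if best is None or score > best[0]:
                best = (score, rate, stat)
    return best  # (cv score, optimized sampling rate, optimized aggregation statistic)
```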
In some aspects, optimizing the first of the optimized sampling rate or the optimized aggregation statistics and the second of the optimized sampling rate or the optimized aggregation statistics includes cycling between, or iterating, these optimizations based on the results of a previous optimization, as described in relation to FIG. 3. In some aspects, the optimized sampling and aggregation operations may be implemented by a first set of machine-trained models, e.g., trained by the method described above in relation to FIG. 3. At 940, the system may identify, based on the first set of machine-trained models (e.g., the trained sampling and aggregation models that may correspond to anomaly detection models) applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold. The first set of machine-trained models, in some aspects, may include a set of anomaly detection models that, in some aspects, produce a set of anomaly detection scores. The set of anomaly detection models, in some aspects, may be trained based on the set of physical sensor data collected at 910 and the set of virtual sensor data generated at 920. For example, referring to FIG. 4, the system may identify (1) a set of anomaly detection scores 430 (e.g., anomaly scores 430-1, 430-2, and 430-K, or, collectively, ensemble anomaly scores 440), (2) the root causes for anomaly scores 480 (e.g., the first set of contributing factors to the set of anomaly detection scores), and (3) the selected features 475 (e.g., a second set of features that have a feature importance score that is above a threshold) based on a set of anomaly detection models 420 applied to the features 410, the ensemble anomaly scores 440, the supervised surrogate model 450, the explainable AI model 460, and the model-based feature selection 470. At 950, the system may generate at least one failure prediction model based on the first set of features and a first set of machine-trained models. The at least one failure prediction model, in some aspects, may include at least one short-term failure prediction model based on a machine-learning operation applied to (1) a set of anomaly detection scores, (2) a sampling operation and/or aggregation operation, or (3) the first set of identified features. The machine-learning operation used to build and/or train the short-term failure prediction model, in some aspects, may include one or more deep learning RNN models (e.g., LSTM, GRU), ARIMA, or another appropriate machine learning method. The at least one short-term failure prediction model, in some aspects, may include a short-term failure prediction model generated for each individual asset in a set of assets associated with the system. For example, referring to FIG. 5, the system may generate (e.g., build/train), at 508, the short-term failure prediction model based on (1) a set of anomaly detection models at operation 502, (2) a set of parameter optimization operations at operation 504, and (3) a set of feature derivation and transformation operations at operation 506. In some aspects, the at least one failure prediction model may include at least one long-term failure prediction model.
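A minimal sketch of a short-term failure prediction model of the kind described at 950 is shown below, assuming a small Keras LSTM that predicts a future ensemble anomaly score (taken here as the mean of the per-model anomaly scores) as a continuous target; the window length, horizon, and hyperparameters are illustrative, and a GRU or ARIMA model could be substituted.

```python
# Illustrative sketch only. The disclosure permits LSTM, GRU, ARIMA, or other
# appropriate methods; array shapes and hyperparameters here are assumptions.
import numpy as np
import tensorflow as tf

def make_windows(scores: np.ndarray, window: int, horizon: int):
    """scores: (T, K) anomaly scores per time step from K detection models.
    Build sliding windows and a continuous target: the mean (ensemble)
    anomaly score `horizon` steps past the end of each window."""
    X, y = [], []
    for t in range(len(scores) - window - horizon):
        X.append(scores[t:t + window])
        y.append(scores[t + window + horizon - 1].mean())
    return np.array(X), np.array(y)

def build_short_term_model(window: int, n_scores: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, n_scores)),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Example usage (anomaly_scores is a (T, K) array):
# X, y = make_windows(anomaly_scores, window=48, horizon=12)
# model = build_short_term_model(window=48, n_scores=anomaly_scores.shape[1])
# model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
```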
Generating the at least one long-term failure prediction model, in some aspects, may further be based on a second set of machine-trained models. The second set of machine-trained models, in some aspects, may be based on the first set of machine-trained models. For example, referring to FIG. 6, the system may generate the at least one long-term failure prediction model based on an additional set of anomaly detection models (e.g., a second set of machine-trained models) as applied, at 606, to a set of data generated based on the first set of machine-trained models applied to each of a plurality of assets at 602. At 960, the system may apply the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset. Additional operations that may be performed are indicated by the letter "A". FIG. 10 is a flow diagram illustrating a method 1000 for identifying a set of root causes (e.g., contributing factors) for a failure prediction, in accordance with some aspects of the disclosure. As indicated by the letter "A", the method of FIG. 10 may be performed after the operations described in relation to the method illustrated in FIG. 9. The method 1000, in some aspects, is performed by a root-cause analysis module (e.g., root-cause analysis for short-term failure module 160 or root-cause analysis for long-term failure module 180 of FIG. 1). At 1010, the root-cause analysis module may, in some aspects, identify a set of contributing features for the at least one failure prediction model by applying a first explanatory model for the at least one failure prediction model to the set of predicted failure scores and applying a second explanatory model for the first set of machine-trained models. In some aspects, the term contributing features may refer to abstract features identified by models in the system, while the term contributing factors may relate to physical components of the system that are identified as contributing to the failure prediction models. For example, referring to FIGs. 7 and 8, explainable AI model – failure prediction module 706 (or 808) may obtain a set of predicted short-term (or long-term) failure scores 702 (or 802) to identify first feature set 708 (810) associated with the detected anomaly scores 710 (812). The explainable AI model – anomaly detection module 714 (816) may also be used to identify second feature set 716 (818). The first feature set 708 (810) and the second feature set 716 (818) may together make up the set of contributing features for the at least one failure prediction model. At 1020, the root-cause analysis module may, in some aspects, calculate a second importance score for each contributing feature in the set of contributing features. The calculated second importance score, in some aspects, reflects a contribution to one or more failure prediction scores or anomaly scores. For example, referring to FIGs. 7 and 8, explainable AI model – failure prediction module 706 (or 808) and the explainable AI model – anomaly detection module 714 (816) may calculate or output a weight associated with each contributing feature.
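The chaining of the two explanatory models at 1010 and 1020 can be sketched as follows, where the second importance score of a raw feature is obtained by multiplying the prediction-level importance of each anomaly score by the detection-level importance of the feature for that score and summing over the anomaly scores; the dictionary layout is an assumption made for this example.

```python
# Illustrative sketch only. `pred_importance` weights each anomaly score's
# contribution to the predicted failure score; `det_importance[k]` weights each
# raw feature's contribution to anomaly score k. Both are assumed to come from
# an explainable AI technique of the implementer's choice.
def chain_importances(pred_importance: dict, det_importance: dict) -> dict:
    """Return a second importance score per contributing feature, obtained by
    multiplying prediction-level and detection-level importances and summing
    over the anomaly scores the feature contributes to (sorted descending)."""
    feature_scores = {}
    for score_name, w_pred in pred_importance.items():
        for feature, w_det in det_importance.get(score_name, {}).items():
            feature_scores[feature] = feature_scores.get(feature, 0.0) + w_pred * w_det
    return dict(sorted(feature_scores.items(), key=lambda kv: kv[1], reverse=True))
```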
The root-cause analysis module (e.g., root cause analysis for predicted short-term failure 701 or root cause analysis for predicted long-term failure 801) may produce a list of features, each associated with a corresponding weight, and a feature aggregation and ranking module 718 (or 820) may generate a ranked list of features and aggregated weights (e.g., for features that contribute to multiple anomaly scores and/or predictions). At 1030, the root-cause analysis module may, in some aspects, map each contributing feature in the set of contributing features to one or more contributing physical sensors. The mapping may include a mapping of features to physical sensors and a mapping of features to virtual sensors. Mapping each contributing feature in the set of contributing features to one or more contributing physical sensors may additionally include mapping the virtual sensors to one or more physical sensors. For example, referring to FIGs. 7 and 8, the module for mapping features to sensors 720 (822) may map the contributing features in the set of contributing features to a set of physical sensors 722 (824) and 728 (830), e.g., via an intermediate mapping to a set of virtual sensors 724 (826) that is in turn mapped to the set of physical sensors 728 (830). At 1040, the root-cause analysis module may, in some aspects, calculate a third importance score for each of the one or more contributing physical sensors based on the second importance score of at least one contributing feature in the set of contributing features mapped to the physical sensor of the one or more contributing physical sensors. The calculation of the third importance score, in some aspects, may be based on aggregating the weights associated with a same physical sensor based on different features or mappings. For example, referring to FIGs. 7 and 8, sensor aggregation and ranking module 730 (832) may perform a merging operation on first physical sensor set 722 (824) and second physical sensor set 728 (830), where each physical sensor is associated with an importance score based on one or more corresponding features. Similarly to the feature aggregation and ranking, aggregating the sets of identified physical sensors may merge importance scores for physical sensors that correspond to more than one feature based on, e.g., one or more of a summation of the sensor importance scores, a maximum of the sensor importance scores, and an average of the sensor importance scores. Finally, at 1050, the root-cause analysis module, in some aspects, may identify a subset of contributing physical sensors of the one or more contributing physical sensors as a second set of contributing factors for the at least one failure prediction model based on the third importance score. For example, referring to FIGs. 7 and 8, sensor aggregation and ranking module 730 (832) may perform a ranking operation on the merged list of contributing factors based on the first physical sensor set 722 (824) and second physical sensor set 728 (830). The physical sensors in the aggregated list may be ranked by the sensor aggregation and ranking module 730 (832). The ranked list may include a set of physical sensors and a corresponding weight indicating a magnitude of a contribution to the ensemble anomaly score, sorted in descending order. As presented in the disclosure, the system may provide one or more of the following benefits.
For example, the disclosure above introduces an automated, unsupervised, data-driven solution for failure/fault control in an industrial system for one or more of short-term failures (in hours or days) or long-term failures (in weeks, months, or years). The disclosure above provides a solution that can predict failures with some lead time and identify the root causes for the failures, so that operators/technicians have enough time to respond and have additional root cause information to help diagnose the failures. The disclosure above reduces or eliminates the need for (manually) labeled data. In some aspects, the disclosure relates to performing the operations discussed above on historical sensor data, while historical failure data may be optional. The disclosure also introduces a semi-empirical approach through a combination of a physics-based model and a machine learning model, together with an optimization strategy for a sampling rate and aggregation statistics for high-frequency sensor data. In some aspects, the proposed anomaly prediction approach performs well on the prediction of rare failure events ahead of time by leveraging the sequence prediction power of deep learning and by using continuous values as the target. The disclosure also relates to root causes at several levels (detected anomaly score level, feature level, raw sensor data level) that are identified for the predicted failures through a chain of explainable AI techniques and aggregation/ranking algorithms. FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations. Computer device 1105 in computing environment 1100 can include one or more processing units, cores, or processors 1110, memory 1115 (e.g., RAM, ROM, and/or the like), internal storage 1120 (e.g., magnetic, optical, solid-state storage, and/or organic), and/or IO interface 1125, any of which can be coupled on a communication mechanism or bus 1130 for communicating information or embedded in the computer device 1105. IO interface 1125 is also configured to receive images from cameras or provide images to projectors or displays, depending on the desired implementation. Computer device 1105 can be communicatively coupled to input/user interface 1135 and output device/interface 1140. Either one or both of the input/user interface 1135 and output device/interface 1140 can be a wired or wireless interface and can be detachable. Input/user interface 1135 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, accelerometer, optical reader, and/or the like). Output device/interface 1140 may include a display, television, monitor, printer, speaker, braille, or the like. In some example implementations, input/user interface 1135 and output device/interface 1140 can be embedded with or physically coupled to the computer device 1105. In other example implementations, other computer devices may function as or provide the functions of input/user interface 1135 and output device/interface 1140 for a computer device 1105.
Examples of computer device 1105 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like). Computer device 1105 can be communicatively coupled (e.g., via IO interface 1125) to external storage 1145 and network 1150 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration. Computer device 1105 or any connected computer device can function as, provide services of, or be referred to as a server, client, thin server, general machine, special-purpose machine, or another label. IO interface 1125 can include, but is not limited to, wired and/or wireless interfaces using any communication or IO protocols or standards (e.g., Ethernet, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and networks in computing environment 1100. Network 1150 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like). Computer device 1105 can use and/or communicate using computer-usable or computer-readable media, including transitory media and non-transitory media. Transitory media include transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like. Non-transitory media include magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid-state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory. Computer device 1105 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments. Computer-executable instructions can be retrieved from transitory media, and stored on and retrieved from non-transitory media. The executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, and others). Processor(s) 1110 can execute under any operating system (OS) (not shown), in a native or virtual environment. One or more applications can be deployed that include logic unit 1160, application programming interface (API) unit 1165, input unit 1170, output unit 1175, and inter-unit communication mechanism 1195 for the different units to communicate with each other, with the OS, and with other applications (not shown). The described units and elements can be varied in design, function, configuration, or implementation and are not limited to the descriptions provided. Processor(s) 1110 can be in the form of hardware processors such as central processing units (CPUs) or in a combination of hardware and software units. In some example implementations, when information or an execution instruction is received by API unit 1165, it may be communicated to one or more other units (e.g., logic unit 1160, input unit 1170, output unit 1175).
In some instances, logic unit 1160 may be configured to control the information flow among the units and direct the services provided by API unit 1165, the input unit 1170, and the output unit 1175 in some example implementations described above. For example, the flow of one or more processes or implementations may be controlled by logic unit 1160 alone or in conjunction with API unit 1165. The input unit 1170 may be configured to obtain input for the calculations described in the example implementations, and the output unit 1175 may be configured to provide an output based on the calculations described in the example implementations. Processor(s) 1110 can be configured to collect a set of physical sensor data. The processor(s) 1110 may also be configured to generate a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data. The processor(s) 1110 may further be configured to identify a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics. The processor(s) 1110 may further be configured to identify, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold. The processor(s) 1110 may further be configured to generate at least one failure prediction model based on the first set of machine-trained models. The processor(s) 1110 may also be configured to apply the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset. The processor(s) 1110 may also be configured to identify a set of contributing features for the at least one failure prediction model by applying a first explanatory model for the at least one failure prediction model to the set of predicted failure scores and applying a second explanatory model for the first set of machine-trained models. The processor(s) 1110 may also be configured to calculate a second importance score for each contributing feature in the set of contributing features. The processor(s) 1110 may also be configured to map each contributing feature in the set of contributing features to one or more contributing physical sensors. The processor(s) 1110 may also be configured to calculate a third importance score for each of the one or more contributing physical sensors based on the second importance score of at least one contributing feature in the set of contributing features mapped to the physical sensor of the one or more contributing physical sensors. The processor(s) 1110 may also be configured to identify a subset of contributing physical sensors of the one or more contributing physical sensors as a second set of contributing factors for the at least one failure prediction model based on the third importance score. Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer.
These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In example implementations, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result. Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system’s registers and memories into other data similarly represented as physical quantities within the computer system’s memories or registers or other information storage, transmission or display devices. Example implementations may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer readable storage medium or a computer readable signal medium. A computer readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid-state devices, and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer readable signal medium may include mediums such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation. Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the example implementations as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers. As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Further, some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software. 
Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general-purpose computer, based on instructions stored on a computer readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format. Moreover, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the teachings of the present application. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and example implementations be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.

Claims

WHAT IS CLAIMED:

1. A method comprising:
collecting a set of physical sensor data;
generating a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data;
identifying a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics;
identifying, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold;
generating at least one failure prediction model based on the first set of machine-trained models; and
applying the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.
2. The method of claim 1 further comprising: identifying a second set of contributing factors for the at least one failure prediction model by applying a first explanatory model for the at least one failure prediction model to a set of predicted failure scores generated by the at least one failure prediction model.
3. The method of claim 2, wherein the second set of contributing factors comprises a set of contributing physical sensors and identifying the second set of contributing factors comprises:
identifying a set of contributing features for the at least one failure prediction model by applying the first explanatory model for the at least one failure prediction model to the set of predicted failure scores and applying a second explanatory model for the first set of machine-trained models;
calculating a second importance score for each contributing feature in the set of contributing features;
mapping each contributing feature in the set of contributing features to one or more sensors in the set of contributing physical sensors; and
calculating a third importance score for each physical sensor in the set of contributing physical sensors based on the second importance score of at least one contributing feature in the set of contributing features mapped to the physical sensor in the set of contributing physical sensors.
4. The method of claim 1, wherein the set of physical sensor data includes a set of high- frequency data that is sampled by the sampling operation.
5. The method of claim 1, wherein the first set of machine-trained models comprise a set of anomaly detection models.
6. The method of claim 5, wherein the set of anomaly detection models are trained based on the set of physical sensor data and the set of virtual sensor data.
7. The method of claim 1, wherein the at least one failure prediction model comprises at least one short-term failure prediction model based on a machine-learning operation applied to a set of anomaly detection scores, the first set of contributing factors to the set of anomaly detection scores, and a second set of features generated by the first set of machine-trained models.
8. The method of claim 7, wherein the at least one short-term failure prediction model comprises a short-term failure prediction model generated for each individual asset.
9. The method of claim 1, wherein the at least one failure prediction model comprises at least one long-term failure prediction model, wherein generating the at least one long-term failure prediction model is further based on a second set of machine-trained models, and wherein the second set of machine-trained models are based on the first set of machine-trained models.
10. The method of claim 1, wherein at least one of the optimized sampling rate or the optimized aggregation statistics is computed using one or more of a Bayesian optimization, a grid search, or a random search.
11. The method of claim 1, wherein a first of the optimized sampling rate or the optimized aggregation statistics is based on domain knowledge and a second of the optimized sampling rate or the optimized aggregation statistics is computed based on one or more of a Bayesian optimization, a grid search, or a random search.
12. An apparatus comprising:
a memory; and
a set of processors coupled to the memory that, when executing a program stored in the memory, is configured to:
collect a set of physical sensor data;
generate a set of virtual sensor data by applying a physics-based model to a subset of the set of physical sensor data;
identify a first set of features from the set of physical sensor data and the set of virtual sensor data by performing at least one of a sampling operation, an aggregating operation, or a feature derivation operation on the set of physical sensor data and the set of virtual sensor data based on an optimized sampling rate or optimized aggregation statistics;
identify, using a first set of machine-trained models applied to the first set of features, a set of anomaly detection scores, a first set of contributing factors to the set of anomaly detection scores, and a second set of features that have a feature importance score that is above a threshold;
generate at least one failure prediction model based on the first set of machine-trained models; and
apply the at least one failure prediction model to the set of physical sensor data and the set of virtual sensor data to calculate a likelihood score related to a predicted failure of an asset.
13. The apparatus of claim 12, wherein the set of processors is further configured to:
identify a set of contributing features for the at least one failure prediction model by applying a first explanatory model for the at least one failure prediction model to a set of predicted failure scores and applying a second explanatory model for the first set of machine-trained models;
calculate a second importance score for each contributing feature in the set of contributing features;
map each contributing feature in the set of contributing features to one or more contributing physical sensors;
calculate a third importance score for each of the one or more contributing physical sensors based on the second importance score of at least one contributing feature in the set of contributing features mapped to the physical sensor of the one or more contributing physical sensors; and
identify a subset of contributing physical sensors of the one or more contributing physical sensors as a second set of contributing factors for the at least one failure prediction model based on the third importance score.
14. The apparatus of claim 12, wherein the set of physical sensor data includes a set of high-frequency data that is sampled by the sampling operation.
15. The apparatus of claim 12, wherein the first set of machine-trained models comprise a set of anomaly detection models.
16. The apparatus of claim 15, wherein the set of anomaly detection models are trained based on the set of physical sensor data and the set of virtual sensor data.
17. The apparatus of claim 12, wherein the at least one failure prediction model comprises at least one short-term failure prediction model based on a machine-learning operation applied to a set of anomaly detection scores, the first set of contributing factors to the set of anomaly detection scores, and a second set of features generated by the first set of machine-trained models.
18. The apparatus of claim 17, wherein the at least one short-term failure prediction model comprises a short-term failure prediction model generated for each individual asset.
19. The apparatus of claim 12, wherein the at least one failure prediction model comprises at least one long-term failure prediction model, wherein generating the at least one long-term failure prediction model is further based on a second set of machine-trained models, and wherein the second set of machine-trained models are based on the first set of machine-trained models.
20. The apparatus of claim 12, wherein at least one of the optimized sampling rate or the optimized aggregation statistics is computed using one or more of a Bayesian optimization, a grid search, or a random search, and wherein a first of the optimized sampling rate or the optimized aggregation statistics is based on domain knowledge and a second of the optimized sampling rate or the optimized aggregation statistics is computed based on one or more of the Bayesian optimization, the grid search, or the random search.