WO2021257610A1 - Time series forecasting and visualization methods and systems - Google Patents

Time series forecasting and visualization methods and systems

Info

Publication number
WO2021257610A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
procurement
machine learning
producing
future
Prior art date
Application number
PCT/US2021/037489
Other languages
French (fr)
Inventor
Todd FLORES
Michelle ROJO
Original Assignee
Spartan Capital Intelligence, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spartan Capital Intelligence, Llc filed Critical Spartan Capital Intelligence, Llc
Publication of WO2021257610A1 publication Critical patent/WO2021257610A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0637Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0637Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375Prediction of business process outcome or impact based on a proposed change
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management

Definitions

  • the field of the present invention relates to analysis and planning based upon the vast amounts of publicly available information, such as time series information. More particularly, financial analysis, which incorporates both internal and external factors affecting a company, is typically performed by individuals using spreadsheet programs, but the results are typically not properly aggregated, analyzed, and reported; and inferences made from the information are typically little better than a wild guess.
  • Systems and methods are disclosed for various implementation and embodiments of an artificial intelligence framework, utilizing machine learning, to aggregate publicly available information, format the information into a useful data form, standardize the data, analyze the data, generate new data by making predictions, such as future financial data, and deliver the new data in a useful and beneficial form.
  • procurement data from an organization is an invaluable source of input information that allows highly targeted financial modeling of select supplier organizations.
  • Financial analysis and company valuation are activities that require a high degree of professional training and involve a systematic process of incorporating changes in the economic landscape, significant global events, competitors’ actions, key management decisions into a company’s past financial reports in order to create projected financial forecasts that ultimately lead to the company’s expected future value that, in turn, affects the value at which the company’s stocks are traded at the exchanges.
  • Some embodiments described herein aim at doing these activities by leveraging on the power of machine learning to recognize trends and patterns in qualitative and quantitative formats and generate forecasts based upon the recognized trends and patterns.
  • a machine learning system is configured with instructions that, when executed, cause the system to: receive procurement data associated with a future procurement from a consuming entity; receive historical procurement data from the consuming entity; receive historical data from producing entities, the historical data including one or more of past performance, financial data, economic or macroeconomic data, industry-specific data, news data, and social media data; determine, based upon the procurement data, one or more producing entities that are capable of fulfilling the future procurement; train, using one or more of the historical procurement data and the historical data from producing entities, a machine learning model to correlate the one or more of the historical procurement data and the historical data from producing entities with a likelihood that a first producing entity of the one or more producing entities that are capable of fulfilling the future procurement will be selected for a procurement activity by the consuming entity; determine, based at least in part on executing the machine learning model, a likelihood that the first producing entity will be selected for the future procurement activity by the consuming entity; and determine, based at least in part on the procurement data and the likelihood that the first producing entity will be selected, an impact to the first producing entity.
  • the procurement data comprises one or more of budgetary allocations, contracts open for bid, contract requirements, period of performance, price, and payment terms.
  • the historical procurement data may include one or more of past contract awards, the identity of contract managers, identification of goods or services, contract awardees, and performance ratings of past contract awardees.
  • the financial data may include one or more of stock price, earnings, book value, specific line items and note disclosures from annual or quarterly financial reports, and operating income.
  • the machine learning system determines, based at least in part on the procurement data and the likelihood that a second producing entity will be selected for the procurement activity, an impact to the second producing entity.
  • the impact to the first producing entity may include conducting a discounted cash flow analysis.
  • determining the impact to the first producing entity comprises forecasting one or more of revenue, cost of goods sold, expenses, interest, tax, non-cash expenses, financial ratios, and discount rate.
  • the instructions further cause the system to receive non-standard data and tags associated with the non-standard data and apply one or more machine learning algorithms to correlate the non-standard data tags to previously-defined data tags to standardize the non-standard data.
  • the incoming data may be standardized and stored as standardized data in a format and with tags that the system can readily process at a later time.
  • the instructions cause the system to execute a standardization module and standardize the non-standard data.
  • the data can be stored in a data lake for later consumption.
  • the instructions may further cause the system to apply a taxonomy to the historical data from producing entities, and the historical data from producing entities may be stored in a data lake.
  • the data lake allows data to be stored for later analysis and to create a repository of historical data that can be used to further train one or more machine learning models.
  • one or more of the historical procurement data and the historical data from producing entities is time-series data.
  • the time-series data is weighted based upon a date stamp associated with the time-series data. For example, recent data may be more meaningful than older data and the time-series data can be appropriately weighted.
  • a sentiment analysis of the time-series data is executed to determine subjective information.
  • the subjective information may include, without limitation, public perception, complaints, sales effectiveness, customer satisfaction, opinions, brand recognition, and emotion detection.
  • the system may further determine a veracity score for one or more of the historical procurement data and the historical data from producing entities.
  • the veracity score may be based upon source, time, historical veracity, among other things.
  • determining the impact to the first producing entity includes forecasting one or more of a future stock price, a future EBITDA, or a future cash on hand, among other forecasts.
  • FIG. 1 illustrates an example system architecture of a time series forecasting system, in accordance with some embodiments.
  • FIG. 2 illustrates an example of computing resources for implementing a time-series forecasting system, in accordance with some embodiments.
  • FIG. 3 illustrates a process flow diagram for a time-series forecasting system, in accordance with some embodiments.
  • FIG. 4 illustrates a process flow diagram for a time-series forecasting system, in accordance with some embodiments.
  • FIG. 5 illustrates a process flow diagram for a time-series forecasting system, showing some exemplary machine learning algorithms, in accordance with some embodiments.
  • FIG. 6 is a simplified conceptual diagram of an example process for data capture and aggregation, in accordance with some embodiments.
  • FIG. 7 illustrates an example process for data standardization, in accordance with some embodiments.
  • FIG. 8 shows a representative mapping using the described logic and formulas to standardize the incoming data by mapping the unstandardized XBRL data tags to a standard form, in accordance with some embodiments.
  • the processes described herein are made more efficient by a standard reporting language that may be adopted for procurements, financial reports, business metrics, and others.
  • One such standardized language is the eXtensible Business Reporting Language (XBRL).
  • XBRL is a software standard to harmonize the way that financial information is communicated, which makes it more efficient to compile and share this type of data. It may be considered a domain-specific species of XML.
  • XBRL implements tags to define each piece of data.
  • the taxonomy may correspond to accepted accounting standards, such as generally accepted accounting principles (GAAP) or some variant taxonomy that may incorporate GAAP tags, which can allow data to be easily shared between entities, and even between countries that may have very different business reporting standards.
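The tag-standardization step described above can be sketched as a lookup of previously learned mappings with a fuzzy-match fallback against a standard taxonomy. This is an illustrative sketch only: the tag names, the `KNOWN_MAPPINGS` table, and the use of `difflib` are assumptions, not the mapping logic disclosed in the patent.

```python
import difflib
from typing import Optional

# Hypothetical standard (GAAP-style) tags the system maps onto.
STANDARD_TAGS = ["Revenues", "CostOfGoodsSold", "OperatingIncomeLoss",
                 "Assets", "Liabilities"]

# Previously learned mappings from non-standard tags to standard tags.
KNOWN_MAPPINGS = {"TotalRevenue": "Revenues", "COGS": "CostOfGoodsSold"}

def standardize_tag(tag: str, cutoff: float = 0.6) -> Optional[str]:
    """Map an incoming XBRL tag to a standard tag, or None if no match."""
    if tag in KNOWN_MAPPINGS:
        return KNOWN_MAPPINGS[tag]
    # Fall back to the closest string match against the standard taxonomy.
    matches = difflib.get_close_matches(tag, STANDARD_TAGS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(standardize_tag("COGS"))         # CostOfGoodsSold (learned mapping)
print(standardize_tag("Revenue"))      # Revenues (fuzzy match)
print(standardize_tag("SomethingOdd")) # None (left for manual tagging)
```

Tags that cannot be mapped would, in a real pipeline, be routed to the machine learning correlation step described above rather than discarded.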
  • data may be captured, aggregated, stored, distributed, or otherwise manipulated, generated, created, and shared as XBRL or inline XBRL (iXBRL) tagged data.
  • XBRL-formatted data is used not only by companies but has also been adopted by governments, and it continues to gain momentum as the preferred reporting language of producer entities, consumer entities, media entities, and governmental entities.
  • the adoption of XBRL reporting standards increases the ability to efficiently prepare, validate, publish, exchange, consume, and analyze the data prepared according to the adopted standards.
  • a system 100 is shown that is configured to capture, aggregate, format, standardize, analyze, generate, and distribute predictive data.
  • this system may utilize time-series data in the prediction of future enterprise value, earnings before interest and taxes (EBIT), earnings before interest, taxes, depreciation and amortization (EBITDA), or some other financial metric of an entity.
  • a data lake 102 is a centralized repository used to loosely store structured and unstructured data. The information in the data lake 102 may be populated from numerous sources and contain numerous types of data such as, for example, procurement data 104, company data 106, news data 108, and historical data 110, among others.
  • the data lake 102 may be in communication with one or more remote computing resources 112, which may comprise one or more server computers 114(1), 114(2), 114(P), or a distributed server farm, or cloud storage, or some other computing architecture.
  • the remote computing resources 112 may have one or more processors 116 and memory 118.
  • the memory 118 may store one or more modules 120 that are executed by the processors 116 to carry out many of the instructions, routines, tasks, and operations described herein.
  • the processor(s) 116 may include a central processing unit (CPU), a graphics processing unit (GPU), both a CPU and a GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 116 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
  • the processor(s) 116 may include multiple processors 116 and/or a single processor 116 having multiple cores.
  • the remote computing resources 112 may be a computing infrastructure of processors 116, storage (e.g., memory 118), software (e.g., modules 120), data access, and so forth that is maintained and accessible via a network, such as the internet.
  • the remote computing resources 112 may not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with these remote computing resources 112 may include “on-demand computing”, “software as a service (SaaS)”, “platform computing”, “network-accessible platform”, “cloud services”, “data centers”, and so forth.
  • the memory 118 may include computer readable storage media (CRSM), which may be any available physical media accessible by the processor(s) 116 to execute instructions stored on the memory 118.
  • CRSM may include random access memory (RAM) and Flash memory.
  • CRSM may include, but is not limited to, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable read-only memory (EEPROM), or any other medium which can be used to store the desired information, and which can be accessed by the processor(s) 116.
  • the memory 118 may include an operating system, and one or more modules 120.
  • the memory 118 may be a physical memory device, which physically embodies the modules and instructions, and is non-transitory computer readable memory.
  • the remote computing resources 112 may be communicatively coupled to the data lake 102 via wired technologies (e.g., CAT5, USB, fiber optic cable, etc.), wireless technologies (e.g., RF, cellular, satellite, Bluetooth, etc.), or other suitable connection technologies.
  • the data lake 102 is stored on the memory 118 of the remote computing resources 112.
  • a user device 122 associated with a user 124 may be able to access the remote computing resources 112, the data lake 102, or one or more of the modules 120 stored on the memory.
  • the user device 122 may receive one or more outputs of the one or more modules 120 as will be described herein.
  • Referring to FIG. 2, an example of the remote computing resources 112 is illustrated, in which the remote computing resources 112 have one or more processor(s) 116 and non-transitory computer-readable memory 118 that stores modules.
  • the memory stores an operating system 202 and one or more of an appropriation data module 204, a request for proposal module 206, a funding opportunities module 208, a public financial data module 210, a news and media module 212, a processing engine module 214, a standardization module 216, and a data analysis module 218.
  • the modules may not necessarily be broken down, as some of the instructions may be combined into a single module or shared by multiple modules. Moreover, not all of the illustrated modules may be present in each embodiment. Similarly, some embodiments will have more than the illustrated modules.
  • Some of the modules may cause the processor(s) 116 to collect information to be stored in the data lake 102.
  • the data may be pulled by the one or more modules, such as by searching, data scraping, fetching, or some other method of pulling data, or may be pushed by the reporting system and received by the data lake 102, such as through subscriptions, RSS feeds, or some other pushed data type.
  • the appropriation data module 204 may be configured to cause the processor(s) 116 to acquire data that describes one or more appropriations, i.e., the designation of money for particular uses. In some cases, this appropriation data may be associated with government appropriations, and may be made available by government agencies.
  • the request for proposal module 206 may be configured to cause the processor(s) 116 to acquire data corresponding with an entity’s solicitation of proposals related to procurement of a commodity, service, or asset. Examples may include a published request for proposal (RFP) or request for quotation (RFQ).
  • the funding opportunities module 208 may include instructions that cause the processor(s) 116 to receive data associated with funding opportunities and may include available government grants. The data received by the funding opportunities module 208 may include the goals, deadlines, eligibility, and reporting associated with the funding opportunity.
  • the public financial data module 210 may include instructions that cause the processor(s) to receive data associated with publicly available information regarding an entity, such as a company, an organization, an enterprise, an agency, or some other structure.
  • the publicly available information may include data associated with required reports (e.g., annual reports) and non-required reports (e.g., academic publications, commercial publications, and project reports, which may all be considered “gray literature”).
  • This information may include data produced by the subject entity, or by another entity about the company. In some cases, this information may be provided in an XBRL-formatted report or database. In some cases, this may also include macroeconomic and market data that affects the company, such as, but not limited to, GDP (country or global), market indexes, and industry-wide revenue forecasts that are readily available from trusted public databases.
  • the news and media module 212 may include instructions that cause the processor(s) 116 to receive news and media data from one or more sources.
  • news and media module 212 may perform natural language processing on the incoming data to segment the data into useful pieces of information.
  • the incoming data is analyzed for semantics, sentiment, causation, correlation, sensitivity, and others to provide meaningful analysis of the incoming news and media data. Any type of news data may be analyzed to provide data points for later analysis.
  • news data may include force majeure events, societal events, social unrest events, political events, recent technological innovation, disruption and/or advancements, management decisions, or other events of or surrounding a particular procuring or producing company, as well as others that may have a tendency to cause fluctuations in the future value of one or more companies.
  • the processing engine module 214 may be responsible for converting incoming raw data into a more useful file format. In some cases, the processing engine module 214 processes the incoming data, such as by determining a schema used to create the data, applying a taxonomy or tagonomy, and storing the data in a data warehouse.
  • the standardization module 216 in conjunction with the processing engine module 214, may include instructions that cause the processor(s) 116 to apply a schema to incoming data to format the data into a consistent format for later analysis and correlation.
  • the data analysis module 218 may include instructions that cause the processor(s) 116 to analyze the aggregated data, as described herein.
  • the described modules and operations allow the system to look at thousands, and in some cases, hundreds of thousands, millions, or billions, of data points on a selected company, and perform a future forecast valuation.
  • a discounted cash flow (DCF) valuation is performed on a company to forecast the future value of a present investment into the company.
  • a discounted cash flow is a valuation methodology used to forecast the value of an investment into a company based upon its future cash flow.
  • a DCF analysis attempts to predict the future value of an investment today based on projections of how much money it will generate at a future point in time.
  • the described modules each receive data that can be mined and used as inputs into the DCF valuation to forecast a company’s future cash flow.
  • the formula for a DCF valuation is as follows:
  • DCF = CF1/(1 + r)^1 + CF2/(1 + r)^2 + ... + CFn/(1 + r)^n
  • where CF1 is the cash flow for year one, CF2 is the cash flow for year two, CFn is the cash flow for additional years, and r is the discount rate.
  • One of the disadvantages with typical DCF valuation is that it relies on making numerous assumptions about future cash flows from a project, and further is subject to variability in market demand, the economy, and other unforeseen obstacles.
  • if a system is able to precisely anticipate the market demand and the cash flow stream, and can determine the entities that are eligible to receive the cash flow stream, it can intelligently predict which entity is most likely to receive the cash flow stream and thereby predict a DCF valuation with a certain level of accuracy.
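The DCF formula reduces to discounting each projected cash flow back to the present and summing the results. A minimal sketch in Python; the cash-flow figures and discount rate below are hypothetical, not values from the patent:

```python
def dcf_value(cash_flows, discount_rate):
    """Discounted cash flow: sum of CF_t / (1 + r)^t for t = 1..n.

    cash_flows: projected cash flows for years 1..n (e.g., model outputs).
    discount_rate: r, e.g., a weighted average cost of capital.
    """
    return sum(cf / (1.0 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Three years of projected cash flow at a 10% discount rate.
value = dcf_value([100.0, 110.0, 121.0], 0.10)
print(round(value, 2))  # 272.73
```

In the described system, the `cash_flows` inputs would come from the machine learning forecasts rather than manual assumptions.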
  • FIG. 3 illustrates an example process flow diagram 300 for a time-series forecasting system.
  • time-series data generally refers to a series of data points indexed or ordered by time.
  • time-series data is a sequence of data points taken at spaced points in time.
  • time-series data may be a single data point that is associated with a date or date stamp.
  • the date stamp may be used as a factor in weighting the data; for example, more recent data may carry a greater weight than older historical data.
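One common way to realize such date-stamp weighting is exponential decay, where a data point loses half its weight every fixed interval. The half-life below is an illustrative assumption; the patent does not specify a weighting function:

```python
from datetime import date

def recency_weight(stamp, today, half_life_days=365.0):
    """Exponential decay: a data point loses half its weight per half-life."""
    age_days = (today - stamp).days
    return 0.5 ** (age_days / half_life_days)

today = date(2021, 6, 15)
print(recency_weight(date(2021, 6, 15), today))  # 1.0 (today's data)
print(recency_weight(date(2020, 6, 15), today))  # 0.5 (one year old)
```

The resulting weights could then multiply each observation's contribution when training the forecasting models.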
  • the data may be captured through any suitable method or mechanism, such as those described herein, and may be data that is pushed or pulled into the system by one or more other systems.
  • the time-series data may include, without limitation, financial statements, government database information, contracts open for bid, contract terms of contracts open for bid (e.g., qualifications, statements of work, period of performance, milestones, payment schedules, and identification of contracting officers and program managers, among others), contracts previously awarded, requests for support announcements, funding opportunity announcements, grant announcements, taxation, budgetary allocations, leading economic data, lagging economic data, and coincident economic data.
  • the time-series data may include news and media information such as, without limitation, open source news feeds, subscription feeds, RSS feeds, social media posts, company announcements, and more.
  • the incoming data is very granular and extensive, and may include, without limitation:
  • Share statistics such as: Altman Z score using the average stock information for a period, such as the last twelve months (LTM); Basis weighted average shares outstanding LTM, beta, Beta - 1 year LTM, Company, Volume (e.g., intraday trading), diluted weighted average shares outstanding LTM, ECS total common shares outstanding LTB, ECS total shares outstanding on filing date LTM, Price (e.g., end of day), volume (e.g., end of day), 52- week percent change, price (intraday), change, last close 52-week high, last close 52-week low, last close BS shout, last close indicated dividend or stock price, last close market cap, last close market cap / EBT excluding unusual items, last close price, last close price per earnings, last update time, LT debt/equity, net debt / EBITDA percent shares held by insiders, percent of shares held by institutions, percent change, region, return on total capital, symbol, ticker shares outstanding, total assets, total capital, total cash, short term investments, total common
  • Valuation Measures such as: book value / share, current ratio, EBITDA / interest expense, EBITDA margin %, EBIT / interest expense, market cap (intraday), last close market cap / total revenue, last close price / book value, last close price / tangible book value, last close total enterprise value (TEV) / EBIT, last close TEV / EBITDA, last close TEV / total revenue, operating cash flow to current liabilities, and price/earnings to growth (PEG) ratio, among others.
  • Cash Flow Statements such as: capital expenditure, cash and equivalents, cash from operations, cash from operations, 1 Yr. Growth %, cash income tax paid (refund), cash interest paid, days outstanding inventory, days sales outstanding, levered free cash flow, levered free cash flow, 1 Yr. Growth %, and unlevered free cash flow, among others.
  • ESG Scores such as: environmental score, ESG score, governance score, highest controversy, peer group, and social score, among others.
  • Financial Highlights such as: gross profit margin %, operating income, return on assets, and return on equity %, debt-to-equity ratio, weighted average cost of capital or discount rate, among others.
  • the aforementioned data can be collected, cleaned, standardized, and aggregated, such as in the data lake 102, and at block 304, the time-series data is analyzed.
  • the data analysis may be performed by the remote computing resources 112, or by some other system that is configured with instructions to manipulate, filter, generate, and create data, using any appropriate module or formula, and may include, without limitation, executing inserts and queries 310, smoothing 312, approximation 314, interpolation 316, or a combination of these, and other, operations. Additional operations may include, without limitation, one or more of ratio analysis, cause-and-effect relationships between financial and non-financial variables, decision trees, and random forests, to name a few.
  • the data is cleaned and transformed as preparation for further analyses.
  • this process may include, without limitation, the set-up of the database itself; creating, separating and merging of data tables; as well as adding, cleaning and transforming metadata.
  • the process may include, but is not limited to, cleaning data of leading or trailing spaces; replacing non-standard alphanumeric or other symbols to facilitate machine readability and further processing; standardizing column headers; converting columns to the same unit standard; adding additional columns to enhance information; splitting individual columns into multiple ones; merging different data points together; highlighting and - if deemed necessary - correcting outlier or otherwise inconsistent data; and rounding, approximating, smoothing or interpolating data values in line with the objectives of the analyses.
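Two of the cleaning steps listed above, standardizing column headers and converting columns to the same unit standard, can be sketched as follows. The helper names and normalization rules are illustrative assumptions, not logic disclosed in the patent:

```python
import re

def clean_header(header):
    """Standardize a column header: trim, collapse whitespace, snake_case."""
    header = re.sub(r"\s+", " ", header.strip())
    header = re.sub(r"[^0-9A-Za-z ]+", "", header)  # drop non-alphanumerics
    return header.lower().replace(" ", "_")

def to_millions(value, unit):
    """Convert reported amounts to a single unit standard (millions)."""
    scale = {"thousands": 1e-3, "millions": 1.0, "billions": 1e3}[unit]
    return value * scale

print(clean_header("  Net Revenue ($M) "))  # net_revenue_m
print(to_millions(2500.0, "thousands"))     # 2.5
```

Steps such as outlier correction or interpolation would be layered on top of this kind of normalization before the data is stored for analysis.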
  • other methodologies may also be applied, such as natural language processing (NLP), sentiment analysis, time-series momentum analysis, and trend analysis.
  • the data is stored in a way to make it accessible for further analyses through specialized data analytics and visualization tools, and as a basis for machine learning.
  • the time-series data may additionally have sentiment analysis performed to determine whether the data about a particular company or project is positive or negative, which may provide additional data points that can be used in the subsequent analysis and prediction steps.
  • natural language processing may be applied to search for keywords, conduct key phrase extraction, sentiment detection, sentiment analysis, determine bias, entity recognition, topic modeling, date stamps, and otherwise derive meaningful data points to recognize and predict how a company is performing.
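As a toy illustration of sentiment scoring, a lexicon-based score can be computed as the share of positive minus negative words. A production system would use a trained or pre-trained NLP model as described above; the word lists here are hypothetical:

```python
# Hypothetical finance-tuned lexicons (assumptions, not from the patent).
POSITIVE = {"growth", "beat", "award", "profit", "strong"}
NEGATIVE = {"loss", "lawsuit", "delay", "recall", "weak"}

def sentiment_score(text):
    """Return a score in [-1, 1]: positive minus negative word share."""
    words = text.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits

print(sentiment_score("Strong profit growth despite supply delay."))  # 0.5
```

Scores like this, computed per article or post, become additional time-series data points for the prediction steps.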
  • some pre-trained models and libraries may be used, and in some cases, the pre-trained models may be modified to improve model performance and make them suitable for finance-related contexts.
  • the data may be evaluated for likely veracity.
  • the data may be scored based upon a probability that the received data is from a reliable source or that the information in the data is reliable. For example, a financial report for a particular company that is submitted to the Securities and Exchange Commission (SEC) may be scored as highly reliable, since it is data being submitted to a governmental agency, and falsehoods in the data are known to come with consequences. Similarly, where the data is known to be an annual report from a company, the content of the data may be correlated with other forms of data, and therefore, the content may have a relatively high veracity rating.
  • In some embodiments, a rating system includes one score for the source reliability, and a second score for the information content.
  • the information content is evaluated and scored based upon a predetermined scoring system.
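The two-score rating described above (source reliability plus information content) might be blended into a single veracity score as follows. The reliability table and weighting are illustrative assumptions, not values from the patent:

```python
# Hypothetical source-reliability priors, per the patent's SEC-filing example.
SOURCE_RELIABILITY = {"sec_filing": 0.95, "annual_report": 0.85,
                      "social_media": 0.40}

def veracity_score(source, content_score, source_weight=0.6):
    """Blend a source-reliability score with a content score into one rating."""
    source_score = SOURCE_RELIABILITY.get(source, 0.5)  # unknown source: 0.5
    return source_weight * source_score + (1.0 - source_weight) * content_score

print(round(veracity_score("sec_filing", 0.9), 3))    # 0.93
print(round(veracity_score("social_media", 0.9), 3))  # 0.6
```

Such a score could then down-weight low-veracity data points in the analyses that follow, similar to the date-stamp weighting.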
  • machine learning is applied to the data. Any of a number of machine learning algorithms can be applied singularly or iteratively, to the data to make intelligent predictions.
  • a set of algorithms 318 may be used to implement machine learning on the data set to determine trends 320, patterns 322, anomalies 324, event dates, date stamps, and other characteristics, such as cause-and-effect analysis, by determining which financial and non-financial variables have a strong impact on future financial performance, and by extracting ‘multiplier factors’ that embody the impact of qualitative information related to the economic and political climate and major global events on a company’s value.
  • the machine learning methods in accordance with any of the embodiments disclosed herein may encompass a variety of approaches, including supervised and unsupervised methods.
  • the machine learning methods may be performed using, for example, neural networks, deep neural networks, support vector machines (SVMs), decision trees, Markov models, Bayesian networks, reinforcement-based learning, cluster-based learning, and any other suitable machine learning strategies now known in the art, or later developed.
  • Training Data is a data set that can be used to train a learning machine. Regardless of whether the class of the data is known or unknown, the data may be adequate for training a learning machine if it includes, for example, at least one positive example for each class, and optionally, at least one negative example for each class.
  • This type of binary classifier may be useful, for example, when determining the set of potential producing entities that may be able to meet the procurement requirements, or for determining the outcome of a binary decision, such as whether a particular company’s previous bid was selected in relation to a procurement contract.
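As a sketch of such a binary classifier, the toy example below predicts whether a company's bid is selected using a minimal nearest-neighbour rule. The two features (price competitiveness and past-performance rating) and the training rows are hypothetical, and any of the classifiers named above could be substituted:

```python
import math

# Hypothetical training data:
# (price_competitiveness, past_performance) -> bid selected?
TRAINING = [
    ((0.9, 0.8), True),   # competitive price, strong record: bid won
    ((0.8, 0.9), True),
    ((0.3, 0.4), False),  # weak price and record: bid lost
    ((0.2, 0.6), False),
]

def predict_selected(features: tuple[float, float], k: int = 3) -> bool:
    """Classify a new bid by majority vote of its k nearest neighbours."""
    by_distance = sorted(TRAINING, key=lambda row: math.dist(features, row[0]))
    votes = [label for _, label in by_distance[:k]]
    return votes.count(True) > votes.count(False)
```

A production system would of course train on many more features (contract history, pricing, officer behavior) and examples, but the binary decision structure is the same.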
  • machine learning encompasses any of several methods, devices, and/or other features which are optimized to perform a specific informational task (such as classification, prediction, or regression) using a limited number of examples of data of a given form, and are then capable of exercising this same task on unknown data of the same type and form.
  • the result of the training is a model that can be used to make predictions based on new data.
  • unsupervised learning occurs when training data is not necessarily labelled to reflect the true result, i.e., there is no indication within the data itself as to whether the data belongs to a class or exhibits a pattern.
  • Unsupervised learning techniques may include, without limitation, reinforcement-based learning, association-based rules, cluster-based learning, and the like.
  • time series forecasting can be addressed as a supervised learning problem, and numerous machine learning tools can be used.
  • Supervised learning occurs when training data is labelled to reflect the true result, i.e., that the data belongs to a class or exhibits a pattern.
  • Supervised learning techniques may include neural networks, nearest neighbor, naive Bayes, linear regression, SVMs, decision trees, random forests, XGBoost, hidden Markov models, and Bayesian networks, among others.
  • the systems and methods described herein may utilize unsupervised learning, supervised learning, or a combination of both unsupervised and supervised learning.
  • one or more of unsupervised learning and supervised learning is performed iteratively on the data.
  • the system predicts a future impact of the time-series data.
  • no prediction made with time series forecasting will ever be perfectly accurate; however, quantifying the level of uncertainty is important to a time-series forecast.
  • the level of uncertainty is greatly reduced where the system receives data relating to the consumer and its budget for predetermined goods or services.
  • the goods or services are a known quantity
  • the budget for certain goods or services is likewise a known quantity
  • the predominant variables relate to which producer will receive the business from the consumer.
  • the systems described herein will be able to predict, with a reasonable uncertainty, which producers are likely to receive the business of a known consumer, along with details such as the total cost, the payment schedule, and the cost of goods or raw materials. The system can then forecast the impact on the producer of securing that business of the consumer. The prediction of the future impact may result in forecasting one or more of future stock price, future cash on hand, future EBITDA, or some other financial metric.
  • FIG. 4 illustrates a process flow diagram for a time-series forecasting system 400.
  • the system determines procurement opportunities of an organization. As described, in some embodiments, data associated with procurement opportunities is available, which may include, a description of goods or services desired, a timeframe for acquisition, a budget for acquisition, a payment schedule, qualifications of the producer entity, and so forth.
  • the system 400 determines where the organization historically spends its money. For example, some consumer entities make past acquisition information publicly available.
  • the publicly available details may include, without limitation, an identity of the contracting officer, the identity of the program manager, the total budget for the contract, the payment terms, the statements of work, the contract terms, awardees of government contracts and grants, and other information which may be beneficial to a time series forecasting system.
  • past procurement decisions for individual contract officers or program managers can be determined and used to influence the system 400 to search for patterns in the past procurement decisions.
  • the historical information may be weighted and used by the system 400 to influence its prediction of a future award.
  • the system determines the eligible companies. For example, based upon a scope of work, type of procurement, or identification of goods or services, a finite number of companies are likely to be able to fulfill the contract terms. As an example, where the procurement contract is for a unique, one-of-a-kind or two-of-a-kind product, the number of possible companies that can fill such a contract may be a known quantity, and the specific companies that are able to fulfill the procurement contract can be identified. [0079] The system aggregates data regarding the type of goods and services a company is able to perform and, over time, this data is modified as a company adds or removes product or service offerings.
  • the data may come in the form of past performance, such as contracts won and performed, financial data, such as quarterly or annual reports, media data, and social information, to name a few.
  • a procuring entity releases a request for proposal for a specific good, such as an airplane for example
  • the system will have historical aggregated data relating to which companies are capable of manufacturing and delivering an airplane that meets the proposal requirements.
  • the request for proposal may include a significant amount of data regarding procurement, such as the type of airplane, performance characteristics, time schedule, quantity, and financial remuneration schedule, among other things.
  • the system can aggregate this procurement data and determine a set of companies that are capable of meeting the contract requirements.
  • the system will have historical data regarding whether the companies within the set of eligible companies have completed similar procurement activities in the past, and a measure of their performance on these past procurement activities. For example, the system can determine whether a company associated with a past procurement activity was able to meet the quantity, quality, and cost factors of a procurement contract and whether the procuring agency was satisfied with the performance of the work. Further, the system can determine whether the company was selected for additional procurement activities.
  • the system can determine patterns which help to forecast likely future values.
  • the system may determine that in a procurement contract for a particular type of goods, one individual contracting officer associated with this type of procurement contract has a pattern of selecting one company over another company to perform the work. In this case, the system will determine the pattern which weighs in favor of the contracting officer selecting a particular company a higher percentage of the time. Based on these discovered patterns, the system can estimate the likelihood that the company will be selected again for a future procurement activity and, based upon known procurement contract details, determine a forecasted future impact to the company, such as by using a DCF model.
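A minimal version of this pattern mining is a per-officer award frequency table. The award records below are hypothetical; a real system would weight these rates alongside many other inputs:

```python
from collections import Counter

# Hypothetical history of (contracting_officer, awarded_company) pairs.
AWARDS = [
    ("officer_a", "company_x"), ("officer_a", "company_x"),
    ("officer_a", "company_y"), ("officer_b", "company_y"),
]

def award_rates(officer: str) -> dict[str, float]:
    """Fraction of the officer's past awards going to each company,
    usable as a rough prior on the next award."""
    counts = Counter(company for officer_id, company in AWARDS
                     if officer_id == officer)
    total = sum(counts.values())
    return {company: n / total for company, n in counts.items()}
```

Here officer_a has selected company_x in two of three past awards, so a pattern-based prior would favor company_x for a similar future contract.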
  • the system determines which eligible companies can meet the procurement requirements.
  • the procurement requirements may specify a statement of work, timetables, any cost-sharing, and the goods or services. Based upon this information, the system may narrow the eligible companies that are likely able to satisfy the procurement requirements.
  • the system determines a probability that each of the eligible companies will be awarded the contract or grant. This may be based, at least in part, on the data the system receives as an input to the decision matrix, which includes the historical data associated with the procuring agency, the contracting officer, the program manager, the statement of work, and past performance of each eligible company, etc.
  • the system determines the future impact on the selected company.
  • the future impact may include an expected cashflow from the procurement contract, which will invariably impact the selected company’s financial statements, potentially for many years down the road, such as in the case of a multi-year procurement contract award.
  • a DCF analysis is performed utilizing hundreds, thousands, hundreds of thousands, millions, billions, or more data points collected on a company in order to forecast a future impact on the company.
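As a reference point, the core of a discounted cash flow (DCF) calculation over a projected contract cash flow can be expressed in a few lines. The cash flows and discount rate below are illustrative, and a real analysis would also forecast revenue, expenses, and discount-rate inputs:

```python
def discounted_cash_flow(cash_flows: list[float], rate: float) -> float:
    """Present value of a series of future yearly cash flows.

    cash_flows[0] is assumed to arrive one year from now.
    """
    return sum(cf / (1 + rate) ** (year + 1)
               for year, cf in enumerate(cash_flows))

# Hypothetical multi-year procurement award: 1.0M per year for
# three years, discounted at an assumed 10% rate.
impact = discounted_cash_flow([1_000_000, 1_000_000, 1_000_000], 0.10)
```

The resulting present value (roughly 2.49M here) would then be weighted by the predicted probability that the company is selected for the award.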
  • the United States Government (USG) publicly describes the contract type of available submissions, including scope of projects, identification of goods or services, payment terms, and contract price, according to the DATA Act Information Model Schema (DAIMS).
  • the DAIMS architecture allows the system 400 to capture and analyze this procurement data automatically and very efficiently.
  • the USG provides very granular data which the system can access to draw meaningful inferences as to future awards.
  • the data points on a particular company may range in the hundreds of thousands, and each of these data points can be appropriately weighted and used to predict a future impact on a company.
  • the USG is not the only consuming entity that publishes its procurement activities.
  • Other entities such as universities, public entities, government and private entities of other countries, among others, may all provide procurement information that can be aggregated, analyzed, and used to forecast an impact to a probable contract awardee.
  • the incoming data is multi-dimensional, which has historically been difficult to process.
  • the incoming data is harmonized into a hypercube multi-dimensional data format to capture and aggregate the useful data.
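One lightweight way to picture the hypercube harmonization is a store keyed on every dimension at once. The dimensions shown here (company, metric, period) are assumed for illustration; a production system would use a dedicated multi-dimensional array or OLAP structure:

```python
# A toy multi-dimensional store keyed on (company, metric, period).
hypercube: dict[tuple[str, str, str], float] = {}

def ingest(company: str, metric: str, period: str, value: float) -> None:
    """Place one harmonized data point into the cube."""
    hypercube[(company, metric, period)] = value

def slice_metric(metric: str, period: str) -> dict[str, float]:
    """Slice the cube along one dimension: every company's value
    for a given metric and period."""
    return {company: value for (company, m, p), value in hypercube.items()
            if m == metric and p == period}

ingest("company_x", "revenue", "2020Q4", 5.2e9)
ingest("company_y", "revenue", "2020Q4", 3.1e9)
```

The key point is that a single keyed structure lets the system aggregate data arriving from many sources without deciding up front which dimension is "rows" and which is "columns".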
  • FIG. 5 illustrates a process flow diagram for a time-series forecasting system 500.
  • the system 500 may be similar to, or the same as, the systems shown in FIGS. 1, 2, 3, and 4.
  • data is captured and aggregated according to any method.
  • data is retrieved, received, or both, from publicly available sources that may include financial data, procurement data, news media data, social media data, subscription data, RSS feed data, and other sources.
  • the data can be aggregated in a data lake, as previously described, and may be stored as a hypercube data array or as a simplified data table.
  • the incoming data may be formatted, such as by retrieving a schema associated with the data and storing the data according to the schema.
  • a schema may refer to a taxonomy according to which the data is organized, and is a common way of sharing data presented in the eXtensible Markup Language (XML), such as XBRL.
  • the incoming data will indicate a schema that the data uses; the schema and data can be loaded and associated together, and the data lake can format the incoming data according to the schema.
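A simplified sketch of loading XML data and organizing it according to an associated schema follows; the schema is reduced here to a tag-to-field mapping, and the tag names are hypothetical rather than tags from any actual XBRL taxonomy:

```python
import xml.etree.ElementTree as ET

# Assumed, greatly simplified "schema": maps incoming XML tags
# to standardized field names.
SCHEMA = {"Revenues": "revenue", "Assets": "total_assets"}

def load_with_schema(xml_text: str) -> dict[str, str]:
    """Extract only the fields the schema knows about from an
    XML document, keyed by their standardized names."""
    root = ET.fromstring(xml_text)
    return {SCHEMA[child.tag]: child.text
            for child in root if child.tag in SCHEMA}

doc = "<report><Revenues>100</Revenues><Assets>900</Assets></report>"
row = load_with_schema(doc)
```

Real XBRL processing also resolves namespaces, contexts, and units, but the schema-driven extraction step has this shape.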
  • the data is standardized and normalized.
  • Data is standardized through a series of executable scripts that read through the raw data, organized according to schema in a tabular format, and apply a list of logics and formulas from a continuously developing library that outlines how to treat and transform the raw data into a new set of standardized data necessary for financial analysis and algorithm development.
  • these executable scripts create a new set of standardized data in an easy-to-query format.
  • the logics and formulas library may be manually maintained, with logics and formulas added and edited whenever new data comes in, or may leverage machine learning to understand how the data are classified, renamed, and aggregated and to automatically develop logics and formulas that generate new standardized data from new, unclassified, and unaggregated raw data.
  • a combination of manual and automated processes may be used, and in some cases, a machine learning algorithm may be trained through any suitable training methodology to standardize and normalize incoming raw data.
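The logic-and-formula library described above can be pictured as a table of derivation rules applied to each raw row. The field names and formulas below are illustrative stand-ins for the library's actual contents:

```python
# Assumed raw row as parsed from a filing.
raw = {"revenue": 100.0, "cost_of_goods_sold": 60.0,
       "operating_expenses": 25.0}

# A small "formulas library": each entry derives one standardized
# field needed for financial analysis from the raw fields.
FORMULAS = {
    "gross_profit": lambda r: r["revenue"] - r["cost_of_goods_sold"],
    "operating_income": lambda r: (r["revenue"] - r["cost_of_goods_sold"]
                                   - r["operating_expenses"]),
}

def standardize(row: dict[str, float]) -> dict[str, float]:
    """Apply every formula in the library to produce a new set of
    standardized, easy-to-query data."""
    return {name: formula(row) for name, formula in FORMULAS.items()}
```

Because the library is just data, new logics and formulas can be added as new raw-data shapes arrive, whether by hand or by a learned process.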
  • the data is stored in a data warehouse.
  • the data warehouse may include any suitable data structure, and in some cases, is stored in a relational database or in a relational data stream management system.
  • the database adheres to one or more structured query language (SQL) standards to allow for efficient writing, retrieving, sorting, querying, and displaying of data.
  • the data is stored in a time series database service in which data is stored in a time order form.
  • time series database services offer built-in analytics, such as smoothing, approximation, and interpolations, as well as adaptive query processing to make analyzing time series data highly efficient.
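The built-in analytics mentioned above (smoothing, interpolation) reduce to simple operations on ordered samples. A minimal pure-Python illustration, rather than any particular database service's API:

```python
def moving_average(series: list[float], window: int) -> list[float]:
    """Simple smoothing: mean over a trailing window of samples."""
    return [sum(series[max(0, i - window + 1): i + 1]) /
            len(series[max(0, i - window + 1): i + 1])
            for i in range(len(series))]

def fill_gap(before: float, after: float) -> float:
    """Linear interpolation of a single missing sample between
    two neighboring observations."""
    return (before + after) / 2

smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```

Time series database services perform these operations close to the storage layer, which is what makes analyzing large time-ordered data sets efficient.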
  • one or more machine learning algorithms can be applied.
  • a set of algorithms 512 may be stored in memory and, when executed by one or more processors, cause the processors to perform acts according to the one or more algorithms.
  • Machine learning is a type of artificial intelligence that allows computer systems to gradually become more efficient and proficient at a specific task. In many cases, machine learning is facilitated by large amounts of data to which the computer systems can apply statistical operations to make accurate predictions based on new inputs.
  • One or more algorithms stored in the memory allow collection of large amounts of data over time, and through an iterative process, generate more and more accurate predictions.
  • training data, or sample data is provided to the computer systems to train the system as to which outputs are consistent with the inputs.
  • the machine learning algorithms do not simply review historical data and generate predictions based on current data; rather, they receive data about what an organization is going to do in the future (e.g., procurement data).
  • Procurement data may include data regarding what an organization is going to do, such as requests for proposal, statements of work, funding opportunity announcements, contract budgets, monetary allocations, and the like, and can greatly facilitate arriving at much more accurate predictions than simply relying on historical data to predict future outcomes. For instance, where the computing systems have data indicating that an organization has allocated a specific budget for a particular type of goods, the system can predict which entities are eligible and able to provide the particular type of goods on the requested schedule.
  • the computing system can further determine which of the eligible entities have strong performance records with the procuring organization. Even more granular, the computing system can determine the behavioral history of individuals associated with the procurement activities, such as, for example, determining whether a particular contracting officer or program manager has a tendency to select one entity over another based on historical contract data.
  • the set of algorithms 512 include any suitable algorithms, and may include one or more of neural networks 514, linear regression 516, nearest neighbor 518, Bayesian 520, clustering 522 (e.g., k-means clustering 524), natural language processing 526, sentiment detection 528, or other algorithms, either alone or in combination.
  • Some additional algorithms that may be used singularly or in combination with one or more other algorithms include logistic regression, decision trees, random forest, and dimensionality reduction operations.
  • These various sets of algorithms may be developed by using data coming from a single source (news, contracts, historical financial data); by combining and merging data from the different sources outlined; or by combining data sources and utilizing one or more of the predictions, forecasts, outcomes, or results coming from the algorithms developed at the early stages.
  • the set of algorithms 512 may weight particular data when making predictions.
  • the data may be weighted for any suitable characteristic, such as newness, source, reliability, authenticity, veracity, competency, magnitude, collaboration, and others.
  • the system generates new data associated with future predictions, such as forecasting one or more financial accounts, cashflows or other values of similar nature.
  • the generated data may indicate a present or future value of an investment into a particular company or project.
  • the forecast may result in a stock price of a company at one or more future points in time.
  • the generated data may be delivered for end use, such as consumption by an end user.
  • the generated data may be delivered to a computing device associated with an end user, and may be presented in graphical form, textual form, or both.
  • the generated data may include advice based on the generated data, such as a buy or sell recommendation.
  • the generated data may indicate a forecasted value over time, such as a graph, chart, or some other indicia, that shows likely future trends.
  • the generated data may be fed back into the machine learning system and used to train the one or more machine learning algorithms. For instance, a predicted future impact may be fed back into the machine learning system and used to compare the prediction against ground truth data once it becomes available. In this way, the machine learning system can compare its predicted data against real-world data and modify the one or more machine learning algorithms based upon this comparison.
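The feedback step can be sketched as comparing a stored prediction against later ground truth and nudging a model parameter by the error. The single-weight linear model and learning rate below are assumptions chosen only to make the update rule concrete:

```python
def update_weight(weight: float, feature: float,
                  predicted: float, actual: float,
                  learning_rate: float = 1e-4) -> float:
    """One gradient-style correction of a single linear-model weight
    once real-world (ground truth) data becomes available."""
    error = predicted - actual
    return weight - learning_rate * error * feature

# A prediction of 105 was made with weight 1.05 on a feature value
# of 100; the ground truth later arrived as 102, so the weight is
# nudged downward.
new_weight = update_weight(1.05, 100.0, 105.0, 102.0)
```

Iterating this compare-and-correct loop as predictions mature is what lets the system's forecasts track real-world outcomes over time.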
  • FIG. 6 illustrates an example process 600 for data capture and aggregation.
  • the illustrated process 600 can be used in conjunction with any of the computing systems or methods described herein.
  • Procurement data 602, which is typically publicly available, is captured 604 through any suitable capture process, such as, for example, pushing, pulling, subscription, scraping, crawling, or some other methodology.
  • the captured data is run through a processing engine 606.
  • the processing engine 606 receives data formatted as XBRL according to one or more taxonomies.
  • the processing engine converts received data into an XBRL format based on one or more taxonomies, and may use one or more machine learning algorithms, such as natural language processing, to format the data appropriately.
  • the data is optionally standardized, such as by running the data through a standardization process 610, and the standardized data 612 is stored in a data warehouse 614.
  • the data warehouse 614 may be a data lake, a relational database, a time series database, or some other suitable data structure.
  • the processes 602-614 may be carried out automatically, without human intervention. This allows for millions or billions of data points to be captured, aggregated, processed, standardized, and stored for later analysis.
  • the stored data in the data warehouse 614 can be analyzed according to any suitable machine learning algorithm, and forecasts and predictions can be made 616 and distributed 618 to interested stakeholders.
  • FIG. 7 illustrates an example process for data standardization 700, which may be used with any of the systems and methods described herein.
  • data files such as XBRL files 701, comma separated value files (CSV), spreadsheet files, or other file types, are provided by (or about) various companies.
  • financial statements are one type of data file that may be provided by a company in an XBRL format.
  • These files can be processed, at block 702, to standardize and harmonize the data, such as to create useful data points on the company to facilitate later analysis.
  • the data normalization process may take advantage of a defined schema that identifies and classifies pieces of data into a defined catalog. In some instances, data that is not initially identified can be categorized and associated with the catalog using one or more machine learning algorithms to understand and categorize the data. That is, unstandardized data tags can be identified and correlated with defined data tags.
  • the standardized data points 704 and dimensionality of the data is run through a designer algorithm 706 that applies a selected taxonomy 708, such as any of a number of suitable XBRL taxonomies.
  • the designer algorithm creates new formulas and logic in order to harmonize the data tags so the system can appropriately correlate, categorize, and understand the data tags, even if they are non-standard.
  • an XBRL processing engine 710 receives the data and loads the data into a data warehouse 712.
  • the data warehouse may be any suitable data storage architecture, such as those described herein.
  • a data file 714 is already standardized and comes with an identified taxonomy.
  • the data file 714 may immediately be processed by the XBRL processing engine 710 and stored in the data warehouse 712.
  • Zone maps may be used to facilitate data extraction and enhance data query performance. Some zone maps may be developed based on, but are not limited to, companies, periods, and frequently used raw or standardized financial data.
  • the data stored in the data warehouse 712 may be processed, analyzed, or distributed, as desired.
  • One or more machine learning algorithms may access the data stored in the data warehouse 712 and distribute data generated by the one or more machine learning algorithms.
  • FIG. 8 shows a representative mapping using the described logic and formulas to standardize the incoming data by mapping the various widely used GAAP-based and various types of XBRL extension data tags to a standard form.
  • the column XBRL IDs 802 contains XBRL data associated with an incoming file to the system.
  • the system may already realize that the file contains balance sheet data, or the file may be parsed, such as by reviewing the incoming XBRL data tags, or the data associated with the incoming file to determine that the file contains balance sheet data associated with a company.
  • the XBRL IDs 802 are correlated with Standardized Data Tags 804.
  • the Standardized Data Tags 804 can then be stored in the data lake for later retrieval and analysis.
  • the incoming data will include non-standard tags.
  • Non-standard tags may include tags that are not universally adopted, such as industry-specific tags, or company-specific tags, or country-specific tags.
  • an incoming data file may include an entry with the ID “ba_CustomerFinancingCurrent” 806.
  • This data or data tag, although not used by other companies or other industries, is an allowed extension or customization of any known GAAP-based or other standard XBRL data tags.
  • the system may be trained, or otherwise learn, that this XBRL ID is associated with the standard data tag with ID “notereceivable” 808.
  • other XBRL IDs are associated with standardized data tags and stored in the data lake.
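A minimal version of this tag standardization is a learned or curated mapping from company-specific XBRL IDs to standardized data tags. The first mapping entry mirrors the example from FIG. 8; the second is a hypothetical addition:

```python
# Mapping curated (or learned via machine learning) from incoming
# filings: non-standard extension XBRL IDs -> standardized data tags.
TAG_MAP = {
    "ba_CustomerFinancingCurrent": "notereceivable",  # per FIG. 8
    "us-gaap_Revenues": "revenue",                    # hypothetical entry
}

def standardize_tags(record: dict[str, float]) -> dict[str, float]:
    """Rename known tags to their standard form; keep unknown tags
    untouched so they can be classified later."""
    return {TAG_MAP.get(tag, tag): value for tag, value in record.items()}

row = standardize_tags({"ba_CustomerFinancingCurrent": 12.5, "other": 1.0})
```

Tags that fall through the mapping can be queued for the machine-learning classification step described above, and the mapping grows as new extensions are encountered.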
  • the generated data may be distributed to one or more computing devices associated with one or more users and/or presented on a display device.
  • the generated data may indicate a future stock price, a future range of stock prices, a recommendation to a user, such as a buy/sell recommendation, or some other indicia.
  • the recommendation may be provided as an alert, a text alert, a graphical indicator, and may be pushed to an end user, such as through SMS messaging, email, or some other form of notification.
  • the processor as disclosed herein can be configured with instructions to perform any one or more steps of any method as disclosed herein.
  • illustrated data structures may store more or less information than is described, such as when other illustrated data structures instead lack or include such information, respectively, or when the amount or type of information that is stored is altered.
  • the various methods and systems as illustrated in the figures and described herein represent example implementations. The methods and systems may be implemented in software, hardware, or a combination thereof in other implementations. Similarly, the order of any method may be changed, and various elements may be added, reordered, combined, omitted, modified, etc., in other implementations.


Abstract

Systems and methods aggregate data from a variety of sources, including procurement data, historical data, and company data, and based upon a future procurement from a consuming entity, forecast a financial impact to a producing entity. Data regarding a future procurement can be used to forecast an impact to a multitude of producing entities. XBRL data including future procurement details, contract details, period of performance, qualification, historical performance, and other data can be aggregated, standardized, normalized, weighted, and analyzed; a machine learning system predicts which producing entities are able to satisfy the future procurement requirements, and the system predicts the likelihood of a producing entity being selected to perform the contract. A machine learning enabled financial analysis is performed to determine the financial impact to the producing entity performing the contract.

Description

TIME SERIES FORECASTING AND VISUALIZATION METHODS AND
SYSTEMS
RELATED APPLICATIONS
[0001] This application claims priority to United States Provisional Application
No. 63/039,158, filed June 15, 2020, the disclosure of which is hereby incorporated, in its entirety, by this reference.
BACKGROUND
[0002] The field of the present invention is related to analysis and planning based upon the vast amounts of publicly available information, such as time series information. More particularly, financial analysis, which incorporates both internal and external factors affecting a company, is typically performed by individuals using spreadsheet programs, but the results are not typically properly aggregated, analyzed, and reported; and inferences made from the information are typically little better than a wild guess.
[0003] With the ever-increasing volume and variety of information available, especially in electronic form, there is a need to be able to aggregate the data in a very loose structure, and perform synthetic and automatic analysis on the data to make future predictions with a high degree of accuracy and certainty. A machine learning-based system that automates the data acquisition, standardization, and analysis to make accurate future predictions, such as financial modeling, is highly needed and very powerful.
SUMMARY
[0004] Systems and methods are disclosed for various implementation and embodiments of an artificial intelligence framework, utilizing machine learning, to aggregate publicly available information, format the information into a useful data form, standardize the data, analyze the data, generate new data by making predictions, such as future financial data, and deliver the new data in a useful and beneficial form. In some cases, procurement data from an organization is an invaluable source of input information that allows highly targeted financial modeling of select supplier organizations.
[0005] Financial analysis and company valuation are activities that require a high degree of professional training and involve a systematic process of incorporating changes in the economic landscape, significant global events, competitors’ actions, key management decisions into a company’s past financial reports in order to create projected financial forecasts that ultimately lead to the company’s expected future value that, in turn, affects the value at which the company’s stocks are traded at the exchanges. Some embodiments described herein aim at doing these activities by leveraging on the power of machine learning to recognize trends and patterns in qualitative and quantitative formats and generate forecasts based upon the recognized trends and patterns.
[0006] According to some embodiments, a machine learning system is configured with instructions, that when executed, cause the system to receive procurement data associated with a future procurement from a consuming entity; receive historical procurement data from the consuming entity; receive historical data from producing entities, the historical data including one or more of past performance, financial data, economic or macroeconomic data, industry-specific data, news data, and social media data; determine, based upon the procurement data, one or more producing entities that are capable of fulfilling the future procurement; train, using one or more of the historical procurement data and the historical data from producing entities, a machine learning model to correlate the one or more of the historical procurement data and the historical data from producing entities with a likelihood that a first producing entity of the one or more producing entities that are capable of fulfilling the future procurement will be selected for a procurement activity by the consuming entity; determine, based at least in part on executing the machine learning model, a likelihood that the first producing entity will be selected for the future procurement activity by the consuming entity; and determine, based at least in part on the procurement data and the likelihood that the first producing entity will be selected for the future procurement activity, an impact to the first producing entity.
[0007] In some cases, the procurement data comprises one or more of budgetary allocations, contracts open for bid, contract requirements, period of performance, price, and payment terms. The historical procurement data may include one or more of past contract awards, the identity of contract managers, identification of goods or services, contract awardees, and performance ratings of past contract awardees. According to some embodiments, the financial data may include one or more of stock price, earnings, book value, specific line items and note disclosures from annual or quarterly financial reports, and operating income. [0008] In some instances, the machine learning system determines, based at least in part on the procurement data and the likelihood that a second producing entity will be selected for the procurement activity, an impact to the second producing entity. Determining the impact to the first producing entity may include conducting a discounted cash flow analysis.
[0009] According to some embodiments, determining the impact to the first producing entity comprises forecasting one or more of revenue, cost of goods sold, expenses, interest, tax, non-cash expenses, financial ratios, and discount rate. In some cases, the instructions further cause the system to receive non-standard data and tags associated with the non-standard data and apply one or more machine learning algorithms to correlate the non-standard data tags to previously-defined data tags to standardize the non-standard data. In other words, the incoming data may be standardized and stored as standardized data in a format and with tags that the system can readily process at a later time.
[0010] In some cases, the instructions cause the system to execute a standardization module and standardize the non-standard data. The data can be stored in a data lake for later consumption. The instructions may further cause the system to apply a taxonomy to the historical data from producing entities, and the historical data from producing entities may be stored in a data lake. The data lake allows data to be stored for later analysis and to create a repository of historical data that can be used to further train one or more machine learning models.
[0011] In some cases, one or more of the historical procurement data and the historical data from producing entities is time-series data. In some examples, the time-series data is weighted based upon a date stamp associated with the time-series data. For example, recent data may be more meaningful than older data and the time-series data can be appropriately weighted. In some embodiments, a sentiment analysis of the time-series data is executed to determine subjective information. The subjective information may include, without limitation, public perception, complaints, sales effectiveness, customer satisfaction, opinions, brand recognition, and emotion detection.
[0012] The system may further determine a veracity score for one or more of the historical procurement data and the historical data from producing entities. The veracity score may be based upon source, time, and historical veracity, among other things. [0013] In some cases, determining the impact to the first producing entity includes forecasting one or more of a future stock price, a future EBITDA, or a future cash on hand, among other forecasts.
BRIEF DESCRIPTION OF THE DRAWINGS [0014] A better understanding of the features, advantages and principles of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, and the accompanying drawings of which:
[0015] FIG. 1 illustrates an example system architecture of a time series forecasting system, in accordance with some embodiments;
[0016] FIG. 2 illustrates an example of computing resources for implementing a time-series forecasting system, in accordance with some embodiments;
[0017] FIG. 3 illustrates a process flow diagram for a time-series forecasting system, in accordance with some embodiments;
[0018] FIG. 4 illustrates a process flow diagram for a time-series forecasting system, in accordance with some embodiments;
[0019] FIG. 5 illustrates a process flow diagram for a time-series forecasting system, showing some exemplary machine learning algorithms, in accordance with some embodiments;
[0020] FIG. 6 is a simplified conceptual diagram of an example process for data capture and aggregation, in accordance with some embodiments;
[0021] FIG. 7 illustrates an example process for data standardization, in accordance with some embodiments; and
[0022] FIG. 8 shows a representative mapping using the described logic and formulas to standardize the incoming data by mapping the unstandardized XBRL data tags to a standard form, in accordance with some embodiments.
DETAILED DESCRIPTION
[0023] The following detailed description provides a better understanding of the features and advantages of the inventions described in the present disclosure in accordance with the embodiments disclosed herein. Although the detailed description includes many specific embodiments, these are provided by way of example only and should not be construed as limiting the scope of the inventions disclosed herein.
[0024] According to some embodiments, the processes described herein are made more efficient by a standard reporting language that may be adopted for procurements, financial reports, business metrics, and other data. One such standardized language is eXtensible Business Reporting Language (XBRL). XBRL is a software standard that harmonizes the way that financial information is communicated, which makes it more efficient to compile and share this type of data. It may be considered a domain-specific dialect of XML.
[0025] In many instances, XBRL implements tags to define each piece of data. In some cases, the taxonomy may correspond to accepted accounting standards, such as generally accepted accounting principles (GAAP) or some variant taxonomy that may incorporate GAAP tags, which can allow data to be easily shared between entities, and even between countries that may have very different business reporting standards. According to some implementations, data may be captured, aggregated, stored, distributed, or otherwise manipulated, generated, created, and shared as XBRL or inline XBRL (iXBRL) tagged data. [0026] Data formatted as XBRL is used not only by companies, but has also been adopted by governments, and is continuing to gain momentum as the preferred reporting language of producer entities, consumer entities, media entities, and governmental entities. The adoption of XBRL reporting standards increases the ability to efficiently prepare, validate, publish, exchange, consume, and analyze the data prepared according to the adopted standards.
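By way of illustration, the tag-based structure described above can be sketched as follows; the namespace, tag names, and values in this fragment are hypothetical and are not drawn from an official XBRL taxonomy:

```python
# Illustrative sketch only: parses a minimal XBRL-like XML fragment and
# extracts each tagged fact into a dictionary. Real XBRL instance documents
# are considerably richer (contexts, units, dimensions).
import xml.etree.ElementTree as ET

SAMPLE = """
<report xmlns:us-gaap="http://example.com/us-gaap">
  <us-gaap:Revenues contextRef="FY2020" unitRef="USD">1200000</us-gaap:Revenues>
  <us-gaap:OperatingIncomeLoss contextRef="FY2020" unitRef="USD">300000</us-gaap:OperatingIncomeLoss>
</report>
"""

def extract_facts(xml_text):
    """Return {local_tag_name: numeric_value} for each tagged fact."""
    root = ET.fromstring(xml_text)
    facts = {}
    for el in root:
        # Element tags arrive as '{namespace}LocalName'; keep the local name.
        local = el.tag.split('}')[-1]
        facts[local] = float(el.text)
    return facts
```

Because every fact carries its own tag, a consumer of this data can locate a line item by name rather than by its position in a report, which is what makes cross-entity aggregation practical.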
[0027] The adoption of a standard way of reporting data allows the creation of taxonomies that capture the meaning of all the reporting terms and provides reusable and authoritative definitions of terms. It further allows for multi-lingual and multi-currency reporting by the implementation of language or currency conversion.
[0028] With reference to FIG. 1, a system 100 is shown that is configured to capture, aggregate, format, standardize, analyze, generate, and distribute predictive data. In some embodiments, this system may utilize time-series data in the prediction of future enterprise value, earnings before interest and taxes (EBIT), earnings before interest, taxes, depreciation and amortization (EBITDA), or some other financial metric of an entity. [0029] A data lake 102 is a centralized repository used to loosely store structured and unstructured data. The information in the data lake 102 may be populated from numerous sources and contain numerous types of data such as, for example, procurement data 104, company data 106, news data 108, and historical data 110, among others.
[0030] The data lake 102 may be in communication with one or more remote computing resources 112, which may comprise one or more server computers 114(1), 114(2), 114(P), or a distributed server farm, or cloud storage, or some other computing architecture. The remote computing resources 112 may have one or more processors 116 and memory 118.
The memory 118 may store one or more modules 120 that are executed by the processors 116 to carry out many of the instructions, routines, tasks, and operations described herein. In some embodiments, the processor(s) 116 may include a central processing unit (CPU), a graphics processing unit (GPU), both a CPU and a GPU, or other processing units or components known in the art. Additionally, each of the processor(s) 116 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems. The processor(s) 116 may include multiple processors 116 and/or a single processor 116 having multiple cores.
[0031] In some instances, the remote computing resources 112 may be a computing infrastructure of processors 116, storage (e.g., memory 118), software (e.g., modules 120), data access, and so forth that is maintained and accessible via a network, such as the internet. The remote computing resources 112 may not require end-user knowledge of the physical location and configuration of the system that delivers the services. Common expressions associated with these remote computing resources 112 may include “on-demand computing”, “software as a service (SaaS)”, “platform computing”, “network-accessible platform”, “cloud services”, “data centers”, and so forth.
[0032] Those skilled in the art will appreciate that embodiments described herein can be practiced on or in conjunction with other computer system configurations beyond those described herein, including multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, handheld computers, personal digital assistants, e-readers, mobile telephone devices, tablet computing devices, special- purposed hardware devices, network appliances, and the like. The configurations described herein can also be practiced in distributed computing environments, such as a distributed computing network, where tasks can be performed by remote computing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote memory storage devices.
[0033] The memory 118 may include computer readable storage media (CRSM), which may be any available physical media accessible by the processor(s) 116 to execute instructions stored on the memory 118. In one basic implementation, CRSM may include random access memory (RAM) and Flash memory. In other implementations, CRSM may include, but is not limited to, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable read-only memory (EEPROM), or any other medium which can be used to store the desired information, and which can be accessed by the processor(s) 116. As will be discussed in additional detail, the memory 118 may include an operating system, and one or more modules 120. The memory 118 may be a physical memory device, which physically embodies the modules and instructions, and is non-transitory computer readable memory.
[0034] It should be appreciated that the subject matter presented herein can be implemented as a computer process, a computer-controlled apparatus, a computing system, or an article of manufacture, such as a computer-readable storage medium. While the subject matter described herein is presented in the general context of program modules that execute on one or more computing devices, those skilled in the art will recognize that other implementations can be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.
[0035] The remote computing resources 112 may be communicatively coupled to the data lake 102 via wired technologies (e.g., CAT5, USB, fiber optic cable, etc.), wireless technologies (e.g., RF, cellular, satellite, Bluetooth, etc.), or other suitable connection technologies. In some cases, the data lake 102 is stored on the memory 118 of the remote computing resources 112.
[0036] A user device 122 associated with a user 124 may be able to access the remote computing resources 112, the data lake 102, or one or more of the modules 120 stored on the memory. The user device 122 may receive one or more outputs of the one or more modules 120 as will be described herein. [0037] With reference to FIG. 2, an example of the remote computing resources 112 is illustrated in which the remote computing resources 112 have one or more processor(s) 116, and non-transitory computer-readable memory 118 that stores modules. In some embodiments, the memory stores an operating system 202 and one or more of an appropriation data module 204, a request for proposal module 206, a funding opportunities module 208, a public financial data module 210, a news and media module 212, a processing engine module 214, a standardization module 216, and a data analysis module 218. The modules need not be separated as illustrated, as some of the instructions may be combined into a single module or shared by multiple modules. Moreover, not all of the illustrated modules may be present in each embodiment. Similarly, some embodiments will have more than the illustrated modules.
[0038] Some of the modules may cause the processor(s) 116 to collect information to be stored in the data lake 102. The data may be pulled by the one or more modules, such as by conducting searching, data scraping, fetching, or some other source of pulling data, or may be pushed by the reporting system and received by the data lake 102, such as by subscriptions, RSS feeds, or some other pushed data type.
[0039] The appropriation data module 204 may be configured to cause the processor(s) 116 to acquire data that describes one or more appropriations, i.e., the designation of money for particular uses. In some cases, this appropriation data may be associated with government appropriations, and may be made available by government agencies.
[0040] The request for proposal module 206 may be configured to cause the processor(s) 116 to acquire data corresponding with an entity’s solicitation of proposals related to the procurement of a commodity, service, or asset. Examples may include published requests for proposal (RFPs) or requests for quotation (RFQs). [0041] The funding opportunities module 208 may include instructions that cause the processor(s) 116 to receive data associated with funding opportunities, which may include available government grants. The data received by the funding opportunities module 208 may include the goals, deadlines, eligibility, and reporting associated with the funding opportunity.
[0042] The public financial data module 210 may include instructions that cause the processor(s) to receive data associated with publicly available information regarding an entity, such as a company, an organization, an enterprise, an agency, or some other structure. The publicly available information may include data associated with required reports (e.g., annual reports), non-required reports (e.g. academic publications, commercial publications, project reports, which may all be considered “gray literature”). This information may include data produced by the subject entity, or by another entity about a company. In some cases, this information may be provided in an XBRL formatted report or database. In some cases, this may also include macroeconomic and market data that affects the company such as but not limited to GDP (country or global), market indexes and industry-wide revenue forecasts that are readily available from trusted public databases.
[0043] The news and media module 212 may include instructions that cause the processor(s)
116 to receive news through media channels, which may include television broadcasts, radio broadcasts, print news media, and social media, among others. This data may be received in a natural language format, and the news and media module 212 (or some other module) may perform natural language processing on the incoming data to segment the data into useful pieces of information. In some cases, the incoming data is analyzed for semantics, sentiment, causation, correlation, sensitivity, and others to provide meaningful analysis of the incoming news and media data. Any type of news data may be analyzed to provide data points for later analysis. For instance, news data may include force majeure events, societal events, social unrest events, political events, recent technological innovation, disruption and/or advancements, management decisions, or other events of or surrounding a particular procuring or producing company, as well as others that may have a tendency to cause fluctuations in the future value of one or more companies.
[0044] The processing engine module 214 may be responsible for converting incoming raw data into a more useful file format. In some cases, the processing engine module 214 processes the incoming data, such as by determining a schema used to create the data, applying a taxonomy or tagonomy, and storing the data in a data warehouse.
[0045] The standardization module 216, in conjunction with the processing engine module 214, may include instructions that cause the processor(s) 116 to apply a schema to incoming data to format the data into a consistent format for later analysis and correlation.
[0046] The data analysis module 218 may include instructions that cause the processor(s)
116 to analyze the data in the data lake 102 and determine correlations, sentiments, cause and effect relationships, and clustering, among others. [0047] In some embodiments, the described modules and operations allow the system to look at thousands, and in some cases, hundreds of thousands, millions, or billions, of data points on a selected company, and perform a future forecast valuation. In some cases, a discounted cash flow (DCF) valuation is performed on a company to forecast the future value of a present investment into the company.
[0048] A discounted cash flow is a valuation methodology used to forecast the value of an investment into a company based upon its future cash flow. A DCF analysis attempts to predict the future value of an investment today based on projections of how much money it will generate at a future point in time. The described modules each receive data that can be mined and used as inputs into the DCF valuation to forecast a company’s future cash flow. [0049] In simple terms, the formula for a DCF valuation is as follows:
DCF = CF1 / (1 + r)^1 + CF2 / (1 + r)^2 + ... + CFn / (1 + r)^n
[0050] Where CF1 is the cash flow for year one, CF2 is the cash flow for year two, CFn is the cash flow for additional years, and r is the discount rate. One of the disadvantages of typical DCF valuation is that it relies on making numerous assumptions about future cash flows from a project, and it is further subject to variability in market demand, the economy, and other unforeseen obstacles. However, as described herein, where a system is able to precisely anticipate the market demand and the cash flow stream, and it can determine the entities that are eligible to receive the cash flow stream, it can intelligently predict which entity is most likely to receive the cash flow stream and thereby predict a DCF valuation with a certain level of accuracy.
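The DCF formula described above can be expressed as a short helper function; this is an illustrative sketch only, not a complete valuation implementation:

```python
# Minimal discounted cash flow helper: sums CF_t / (1 + r)**t over the
# projected years, where r is the discount rate.
def dcf_value(cash_flows, discount_rate):
    """cash_flows: projected cash flow per year, with year 1 first."""
    return sum(cf / (1.0 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))
```

For example, a single cash flow of 110 one year out, discounted at 10 percent, has a present value of 100.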
[0051] FIG. 3 illustrates an example process flow diagram 300 for a time-series forecasting system. At block 302, time-series data is captured. Time-series data generally refers to a series of data points indexed or ordered by time. In some examples, time-series data is a sequence of data points taken at spaced points in time. In some cases, time-series data may be a single data point that is associated with a date or date stamp. In some instances, the date stamp may be used as a factor in weighting the data; for example, more recent data may carry a greater weight than older historical data. The data may be captured through any suitable method or mechanism, such as those described herein, and may be data that is pushed or pulled into the system by one or more other systems. [0052] According to some embodiments, the time-series data may include, without limitation, financial statements, government database information, contracts open for bid, contract terms of contracts open for bid (e.g., qualification, statements of work, period of performance, milestones, payment schedules, identification of contracting officers, program managers, among others), contracts previously awarded, requests for support announcements, funding opportunity announcements, grant announcements, taxation, budgetary allocations, leading economic data, lagging economic data, and coincident economic data. In addition, the time-series data may include news and media information such as, without limitation, open source news feeds, subscription feeds, RSS feeds, social media posts, company announcements, and more.
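The date-stamp weighting mentioned above may, for example, take the form of an exponential decay, so that newer observations count more than older ones; the half-life parameter below is an illustrative assumption, not a value prescribed by this disclosure:

```python
# Sketch of recency weighting: each observation's weight decays
# exponentially with its age in days, so more recent data carries a
# greater weight than older historical data.
def recency_weight(age_days, half_life_days=365.0):
    """Weight in (0, 1]; an observation half_life_days old gets weight 0.5."""
    return 0.5 ** (age_days / half_life_days)
```

With a one-year half-life, today's data receives a weight of 1.0, year-old data 0.5, and two-year-old data 0.25.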
[0053] In some embodiments, the incoming data is very granular and extensive, and may include, without limitation:
[0054] Share statistics such as: Altman Z score using the average stock information for a period, such as the last twelve months (LTM); Basis weighted average shares outstanding LTM, beta, Beta - 1 year LTM, Company, Volume (e.g., intraday trading), diluted weighted average shares outstanding LTM, ECS total common shares outstanding LTM, ECS total shares outstanding on filing date LTM, Price (e.g., end of day), volume (e.g., end of day), 52-week percent change, price (intraday), change, last close 52-week high, last close 52-week low, last close BS shout, last close indicated dividend or stock price, last close market cap, last close market cap / EBT excluding unusual items, last close price, last close price per earnings, last update time, LT debt/equity, net debt / EBITDA, percent shares held by insiders, percent of shares held by institutions, percent change, region, return on total capital, symbol, ticker shares outstanding, total assets, total capital, total cash, short term investments, total common equity, total common shares outstanding, total current assets, total current liabilities, total debt, total debt / EBITDA, total debt/equity, total equity, total shares outstanding, and total shares outstanding on filing date, among others.
[0055] Income such as: basic earnings per share (EPS) - continuing operations, diluted EPS 1 Yr. Growth %, diluted EPS - continuing operations, EBIT, EBITDA, EBITDA, 1 Yr. Growth %, EBT, Excluding unusual items, gross profit, net EPS - Basic, net EPS - diluted, net income, 1 Yr. Growth %, Net Income _ (IS), net income margin %, net income to common excluded extra items, quarterly revenue growth, SG&A margin, total revenues, and total revenues, 1 Yr. Growth %, among others.
[0056] Valuation Measures such as: book value / share, current ratio, EBITDA / Interest Expense, EBITDA / margin %, EBIT / interest expense, market cap (intraday), last close market cap / total revenue, last close price / book value, last close price / tangible book value, last close total enterprise value (TEV) / EBIT, last close TEV / EBITDA, at close TEV / total revenue, operating cash flow to current liabilities, price/earnings to growth (PEG) ratio (e.g.,
5 yrs expected), trailing P/E, price/book, and quick ratio, among others.
[0057] Cash Flow Statements such as: capital expenditure, cash and equivalents, cash from operations, cash from operations, 1 Yr. Growth %, cash income tax paid (refund), cash interest paid, days outstanding inventory, days sales outstanding, levered free cash flow, levered free cash flow, 1 Yr. Growth %, and unlevered free cash flow, among others.
[0058] ESG Scores such as: environmental score, ESG score, governance score, highest controversy, peer group, and social score, among others.
[0059] Earnings such as EPS growth.
[0060] Financial Highlights such as: gross profit margin %, operating income, return on assets, and return on equity %, debt-to-equity ratio, weighted average cost of capital or discount rate, among others.
[0061] Sector and industry information.
[0062] The aforementioned data can be collected, cleaned, standardized, and aggregated, such as in the data lake 102, and at block 304, the time-series data is analyzed. The data analysis may be performed by the remote computing resources 112, or by some other system that is configured with instructions to manipulate, filter, generate, and create data, using any appropriate module or formula, and may include, without limitation, executing inserts and queries 310, smoothing 312, approximation 314, interpolation 316, or a combination of these and other operations. Additional operations may include, without limitation, one or more of ratio analysis, cause and effect relationships between financial and nonfinancial variables, decision trees, and random forests, to name a few.
[0063] The data is cleaned and transformed as preparation for further analyses. On a database level, this process may include, without limitation, the set-up of the database itself; creating, separating and merging of data tables; as well as adding, cleaning and transforming metadata. On a table-level, the process may include, but is not limited to, cleaning data of leading or trailing spaces; replacing non-standard alphanumeric or other symbols to facilitate machine readability and further processing; standardizing column headers; converting columns to the same unit standard; adding additional columns to enhance information; splitting individual columns into multiple ones; merging different data points together; highlighting and - if deemed necessary - correcting outlier or otherwise inconsistent data; and rounding, approximating, smoothing or interpolating data values in line with the objectives of the analyses. Additional features may be created or added from other sources, such as text-based analysis, natural language processing (NLP), sentiment analysis, time-series momentum analysis, trend analysis, and other methodologies. Ultimately, the data is stored in a way to make it accessible for further analyses through specialized data analytics and visualization tools, and as a basis for machine learning.
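A few of the table-level cleaning steps listed above (trimming stray spaces, standardizing column headers, and converting values to a common unit standard) can be sketched as follows; the particular unit-conversion rule is an invented example, not one specified by this disclosure:

```python
# Illustrative table-level cleaning pass: strips leading/trailing spaces,
# normalizes column headers, and converts a 'K'-suffixed numeric string
# (thousands) to a plain number so all rows share one unit standard.
def clean_row(headers, row):
    cleaned = {}
    for header, value in zip(headers, row):
        key = header.strip().lower().replace(' ', '_')  # standardize header
        value = value.strip()                           # trim stray spaces
        if value.endswith('K'):                         # e.g. '120K' -> 120000.0
            value = float(value[:-1]) * 1000
        cleaned[key] = value
    return cleaned
```

In practice each rule would be driven by the schema determined for the incoming data, but the shape of the transformation is the same: raw cells in, consistently keyed and unit-normalized values out.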
[0064] The time-series data may additionally have sentiment analysis performed to determine whether the data about a particular company or project is positive or negative, which may provide additional data points that can be used in the subsequent analysis and prediction steps. In addition, natural language processing may be applied to search for keywords, conduct key phrase extraction, sentiment detection and analysis, bias determination, entity recognition, topic modeling, and date-stamp extraction, and otherwise derive meaningful data points to recognize and predict how a company is performing. To assess the impact of the news or any text-based data on the future financial performance of the company, some pretrained models and libraries may be used, and in some cases, the pretrained models may be modified to improve model performance and make them suitable for a finance-related context. [0065] As part of block 304, the data may be evaluated for likely veracity. For example, the data may be scored based upon a probability that the received data is from a reliable source or that the information in the data is reliable. For example, a financial report for a particular company that is submitted to the Securities and Exchange Commission (SEC) may be scored to be a highly reliable source since it is data being submitted to a governmental agency, and falsehoods in the data are known to come with consequences. Similarly, where the data is known to be an annual report from a company, the content of the data may be correlated with other forms of data, and therefore, the content may have a relatively high veracity rating. [0066] In some embodiments, a rating system includes one score for the source reliability, and a second score for the information content.
For example, a source reliability score may include letter scores similar to the following: A = Reliable: no doubt of authenticity, trustworthiness, or competency; has a history of valid information most of the time; B = Usually Reliable: minor doubt about authenticity, trustworthiness, or competency; has a history of valid information most of the time; C = Fairly Reliable: doubt about authenticity, trustworthiness, or competency, but has provided valid information in the past; D = Not Usually Reliable: significant doubt about authenticity, trustworthiness, or competency, but has provided valid information in the past; E = Unreliable: lacking in authenticity, trustworthiness, and competency; history of invalid information; and F = Cannot Be Judged: no basis exists for evaluating the reliability of the source. Any of these scores may be applied to incoming data from a particular source and may be updated from time to time as reliability tends to change over time and as more data is analyzed and verified with other sources and other data points.
[0067] In some cases, the information content is evaluated and scored based upon a predetermined scoring system. For example, a content evaluation score may include number scores that indicate: 1 = Confirmed: confirmed by other independent sources; logical in itself; consistent with other information on the subject; 2 = Probably True: not confirmed; logical in itself; consistent with other information on the subject; 3 = Possibly True: not confirmed; reasonably logical in itself; agrees with some other information on the subject; 4 =
Doubtfully True: not confirmed; possible but not logical; no other information on the subject; 5 = Improbable: not confirmed; not logical in itself; contradicted by other information on the subject; and 6 = Cannot be judged: no basis exists for evaluating the validity of the information.
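One possible way to combine the source reliability scale (A-F) and the content evaluation scale (1-6) described above into a single numeric veracity score is sketched below; the particular mapping onto a 0-to-1 range is an illustrative choice, not one prescribed by this disclosure:

```python
# Combine the two rating axes into one veracity score in [0, 1]:
# source reliability letter (A best .. F unjudgeable) and information
# content number (1 best .. 6 unjudgeable). 'A1' data scores 1.0; 'F6', 0.0.
SOURCE_SCORES = {'A': 1.0, 'B': 0.8, 'C': 0.6, 'D': 0.4, 'E': 0.2, 'F': 0.0}
CONTENT_SCORES = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2, 6: 0.0}

def veracity_score(source_rating, content_rating):
    """Average the two axes into a single score."""
    return (SOURCE_SCORES[source_rating] + CONTENT_SCORES[content_rating]) / 2
```

A combined score of this form can then be used to weight (or exclude) data points in the downstream analyses.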
[0068] At block 306, machine learning is applied to the data. Any of a number of machine learning algorithms can be applied, singularly or iteratively, to the data to make intelligent predictions. For example, a set of algorithms 318 may be used to implement machine learning on the data set, which may be used to determine trends 320, patterns 322, anomalies 324, event dates, date stamps, and other characteristics, such as cause and effect analysis by determining which financial and non-financial variables have a strong impact on future financial performance, and extracting ‘multiplier factors’ that embody the impact of qualitative information related to the economic and political climate and major global events on a company’s value.
[0069] The machine learning methods in accordance with any of the embodiments disclosed herein may encompass a variety of approaches, including supervised and unsupervised methods. The machine learning methods may be performed using, for example, neural networks, deep neural networks, support vector machines (SVMs), decision trees, Markov models, Bayesian networks, reinforcement-based learning, cluster-based learning, and any other suitable machine learning strategies now known in the art, or later developed.
[0070] According to some embodiments, one or more machines are trained using training data. Training data is a data set that can be used to train a learning machine. Regardless of whether the class of the data is known or unknown, the data may be adequate for training a learning machine if it includes, for example, at least one positive example for each class, and optionally, at least one negative example for each class. This type of binary classifier may be useful, for example, when determining the set of potential producing entities that may be able to meet the procurement requirements, or for determining the outcome of a binary decision, such as whether a particular company’s previous bid was selected in relation to a procurement contract.
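As an illustration of such a binary classifier, a simple nearest-neighbor rule can stand in for any of the learners named herein; the features (e.g., attributes of past bids) and the selected/not-selected labels below are invented for the example:

```python
# Toy binary classifier in the spirit described above: given past examples
# labeled selected (1) or not selected (0), predict the label for a new
# example by copying the label of its closest training example.
def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training example closest to x (squared Euclidean)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]
```

Training here amounts to storing the labeled examples; more capable learners (SVMs, random forests, neural networks) replace this rule with a fitted decision function but consume training data of the same positive/negative form.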
[0071] As used herein, machine learning encompasses any of several methods, devices, and/or other features which are optimized to perform a specific informational task (such as classification, prediction, or regression) using a limited number of examples of data of a given form, and are then capable of exercising this same task on unknown data of the same type and form. The machine (e.g., a computer) will learn, for example, by identifying patterns, categories, statistical relationships, and the like exhibited by training data. The result of the training is a model that can be used to make predictions based on new data.

[0072] In some embodiments, unsupervised learning occurs when training data is not necessarily labelled to reflect the true result, i.e., there is no indication within the data itself as to whether the data belongs to a class or exhibits a pattern. Unsupervised learning techniques may include, without limitation, reinforcement-based learning, association-based rules, cluster-based learning, and the like.
[0073] In some instances, there may be a disconnect between machine learning approaches and time series forecasting, since time series forecasting problems are typically not framed in the way standard machine learning problems are approached. In some embodiments, time series forecasting can be addressed as a supervised learning problem, and numerous machine learning tools can be used. Supervised learning occurs when training data is labelled to reflect the true result, i.e., that the data belongs to a class or exhibits a pattern. Supervised learning techniques may include neural networks, nearest neighbor, naive Bayes, linear regression, SVMs, decision trees, random forests, XGBoost, hidden Markov models, and Bayesian networks, among others.
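As a non-authoritative sketch of how a time series can be recast as a supervised learning problem, each training example can pair a sliding window of past observations (the features) with the next value (the label); the window length below is an arbitrary illustrative choice, not a value from the disclosure:

```python
# Recasting a time series as a supervised learning problem: each training
# example pairs a window of past observations (features) with the next
# value (label).

def make_supervised(series, window=3):
    """Turn [x0, x1, ...] into (features, label) pairs using a sliding window."""
    examples = []
    for i in range(len(series) - window):
        features = series[i:i + window]   # the `window` most recent values
        label = series[i + window]        # the value to predict
        examples.append((features, label))
    return examples

pairs = make_supervised([10, 12, 11, 13, 15, 14], window=3)
# Each pair: (three prior values, next value)
```

The resulting pairs can then be fed to any of the supervised techniques listed above.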
[0074] The systems and methods described herein may utilize unsupervised learning, supervised learning, or a combination of both unsupervised and supervised learning. In some embodiments, one or more of unsupervised learning and supervised learning is performed iteratively on the data.
[0075] At block 308, the system predicts a future impact of the time-series data. In many cases, a time-series forecast will never be perfectly accurate; however, the level of uncertainty is important to a time-series forecast. According to embodiments described herein, the level of uncertainty is greatly reduced where the system receives data relating to the consumer and its budget for predetermined goods or services. In these cases, the goods or services are a known quantity, the budget for certain goods or services is likewise a known quantity, and the predominant variables relate to which producer will receive the business from the consumer. In some embodiments, the systems described herein will be able to predict, with a reasonable uncertainty, which producers are likely to receive the business of a known consumer, which may include the total cost, the payment schedule, the cost of goods or raw materials, etc. Then the system can forecast the impact on the producer of securing that business of the consumer. The prediction of the future impact may result in forecasting one or more of future stock price, future cash on hand, future EBITDA, or some other financial metric.
[0076] FIG. 4 illustrates a process flow diagram for a time-series forecasting system 400. At block 402, the system determines procurement opportunities of an organization. As described, in some embodiments, data associated with procurement opportunities is available, which may include a description of goods or services desired, a timeframe for acquisition, a budget for acquisition, a payment schedule, qualifications of the producer entity, and so forth.

[0077] At block 404, the system 400 determines where the organization historically spends its money. For example, some consumer entities make past acquisition information publicly available. As an example, in government procurement contracts and government grant awards, the publicly available details may include, without limitation, an identity of the contracting officer, the identity of the program manager, the total budget for the contract, the payment terms, the statements of work, the contract terms, awardees of government contracts and grants, and other information which may be beneficial to a time series forecasting system. For example, past procurement decisions of individual contracting officers or program managers can be determined and used to influence the system 400 to search for patterns in the past procurement decisions. Similarly, where a follow-on procurement is associated with a prior procurement award, the historical information may be weighted and used by the system 400 to influence its prediction of a future award.
[0078] At block 406, the system determines the eligible companies. For example, based upon a scope of work, type of procurement, or identification of goods or services, a finite number of companies are likely to be able to fulfill the contract terms. As an example, where the procurement contract is for a unique, one-of-a-kind, or two-of-a-kind product, the number of possible companies that can fill such a contract may be a known quantity, and the specific companies that are able to fulfill the procurement contract can be identified.

[0079] The system aggregates data regarding the type of goods and services a company is able to perform and, over time, this data is modified as a company adds or removes product or service offerings. The data may come in the form of past performance, such as contracts won and performed, financial data, such as quarterly or annual reports, media data, and social information, to name a few. For example, where a procuring entity releases a request for proposal for a specific good, such as an airplane for example, the system will have historical aggregated data relating to which companies are capable of manufacturing and delivering an airplane that meets the proposal requirements. The request for proposal may include a significant amount of data regarding procurement, such as the type of airplane, performance characteristics, time schedule, quantity, and financial remuneration schedule, among other things.
[0080] The system can aggregate this procurement data and determine a set of companies that are capable of meeting the contract requirements. In addition, the system will have historical data regarding whether the companies within the set of eligible companies have completed similar procurement activities in the past, and a measure of their performance on these past procurement activities. For example, the system can determine whether a company associated with a past procurement activity was able to meet the quantity, quality, and cost factors of a procurement contract and whether the procuring agency was satisfied with the performance of the work. Further, the system can determine whether the company was selected for additional procurement activities. The system can determine patterns which help to forecast likely future values. As an example, the system may determine that, in procurement contracts for a particular type of goods, one individual contracting officer associated with this type of procurement contract has a pattern of selecting one company over another company to perform the work. In this case, the system will identify the pattern indicating that the contracting officer selects a particular company a higher percentage of the time. Based on these discovered patterns, the system can estimate the likelihood that the company will be selected again for a future procurement activity and, based upon known procurement contract details, determine a forecasted future impact to the company, such as by using a DCF model.
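The selection-pattern idea described above can be sketched as a simple frequency count over historical awards. The field names (`officer`, `goods_type`, `awardee`) are assumptions made for illustration, not a format from the disclosure:

```python
from collections import Counter

# Hypothetical sketch: estimate, from past awards, how often a contracting
# officer selected each company for a given type of goods.

def selection_rates(past_awards, officer, goods_type):
    """Fraction of this officer's awards (for this goods type) won by each company."""
    wins = Counter(
        a["awardee"]
        for a in past_awards
        if a["officer"] == officer and a["goods_type"] == goods_type
    )
    total = sum(wins.values())
    return {company: count / total for company, count in wins.items()} if total else {}

awards = [
    {"officer": "CO-1", "goods_type": "aircraft", "awardee": "AcmeAir"},
    {"officer": "CO-1", "goods_type": "aircraft", "awardee": "AcmeAir"},
    {"officer": "CO-1", "goods_type": "aircraft", "awardee": "BetaJet"},
]
rates = selection_rates(awards, "CO-1", "aircraft")
# AcmeAir has been selected in 2 of 3 comparable awards
```

In a real system such rates would be one weighted input among many, not a standalone probability of award.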
[0081] At block 408, the system determines which eligible companies can meet the procurement requirements. The procurement requirements may specify a statement of work, timetables, any cost-sharing, and the goods or services. Based upon this information, the system may narrow the eligible companies that are likely able to satisfy the procurement requirements.
[0082] At block 410, the system determines a probability that each of the eligible companies will be awarded the contract or grant. This may be based, at least in part, on the data the system receives as an input to the decision matrix, which includes the historical data associated with the procuring agency, the contracting officer, the program manager, the statement of work, and past performance of each eligible company, etc.
[0083] At block 412, once the system determines the probability that a company will be awarded the procurement, the system determines the future impact on the selected company. The future impact may include an expected cashflow from the procurement contract, which will invariably impact the selected company's financial statements, potentially for many years down the road, such as in the case of a multi-year procurement contract award. In some embodiments, a DCF analysis is performed utilizing hundreds, thousands, hundreds of thousands, millions, billions, or more data points collected on a company in order to forecast a future impact on the company.
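One minimal way to sketch a probability-weighted DCF estimate of that future impact is shown below; the discount rate, award probability, and cash flow figures are illustrative assumptions, not values from the disclosure:

```python
# Minimal discounted-cash-flow (DCF) sketch: the present value of the
# expected cash flows from a multi-year contract, weighted by the
# estimated probability of award.

def expected_contract_value(annual_cashflows, discount_rate, p_award):
    """Probability-weighted present value of a stream of future cash flows."""
    pv = sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(annual_cashflows, start=1)
    )
    return p_award * pv

# Three-year, $1M/year contract, 10% discount rate, 60% chance of award
impact = expected_contract_value([1_000_000] * 3, 0.10, 0.60)
```

A production model would of course use far richer inputs (cost of goods, payment schedules, tax effects), but the structure is the same: discount each expected cash flow back to present value and weight by the award probability.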
[0084] One example of a consumer that makes its procurement decisions public is the United States Government (USG). The USG publishes its architecture through the DATA Act Information Model Schema (DAIMS), which requires proposing companies to submit documents in a particular format. The USG publicly describes the contract type of available submissions, including the scope of projects, identification of goods or services, payment terms, and contract price. By aggregating this data, the system 400 can determine how and when the USG will distribute funds. The DAIMS architecture allows the system 400 to capture and analyze this procurement data automatically and very efficiently. Historically, the USG provides very granular data, which the system can access to make meaningful inferences as to future awards.
[0085] In some embodiments, the data points on a particular company may range in the hundreds of thousands, and each of these data points can be appropriately weighted and used to predict a future impact on a company.
[0086] Of course, the USG is not the only consuming entity that publishes its procurement activities. Other entities, such as universities, public entities, government and private entities of other countries, among others, may all provide procurement information that can be aggregated, analyzed, and used to forecast an impact to a probable contract awardee.
[0087] In some embodiments, the incoming data is multi-dimensional, which has historically been difficult to process. In some embodiments, the incoming data is harmonized into a hypercube multi-dimensional data format to capture and aggregate the useful data.
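A minimal sketch of the hypercube idea follows: each cell is addressed by a tuple of dimension values, so a measure can be aggregated along any axis. The dimension and measure names are assumptions for illustration only:

```python
from collections import defaultdict

# Illustrative sketch of harmonizing records into a multi-dimensional
# ("hypercube") structure: each cell is keyed by a tuple of dimension
# values, and a numeric measure is accumulated per cell.

def build_cube(records, dims, measure):
    """Aggregate `measure` into cells keyed by the values of `dims`."""
    cube = defaultdict(float)
    for r in records:
        key = tuple(r[d] for d in dims)
        cube[key] += r[measure]
    return dict(cube)

records = [
    {"company": "AcmeAir", "year": 2020, "amount": 5.0},
    {"company": "AcmeAir", "year": 2020, "amount": 2.0},
    {"company": "BetaJet", "year": 2021, "amount": 3.0},
]
cube = build_cube(records, dims=("company", "year"), measure="amount")
# cube[("AcmeAir", 2020)] aggregates both AcmeAir 2020 records
```

Real hypercube stores add slicing, dicing, and roll-up operations on top of this cell addressing, but the keyed-by-dimension-tuple structure is the core idea.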
[0088] FIG. 5 illustrates a process flow diagram for a time-series forecasting system 500.
The system 500 may be similar to, or the same system as, that shown in FIGS. 1, 2, 3, and 4. At block 502, data is captured and aggregated according to any method. In some instances, data is retrieved, received, or both, from publicly available sources that may include financial data, procurement data, news media data, social media data, subscription data, RSS feed data, and others. The data can be aggregated in a data lake, as previously described, and may be stored as a hypercube data array or as a simplified data table.

[0089] At block 504, the incoming data may be formatted, such as by retrieving a schema associated with the data and storing the data according to the schema. A schema may refer to a taxonomy according to which the data is organized, and is a common way of sharing data presented in an eXtensible Markup Language (XML), such as XBRL. In some instances, the incoming data will indicate a schema that the data uses; the schema and data can be loaded and associated together, and the data lake can format the incoming data according to the schema.
[0090] At block 506, the data is standardized and normalized. Data is standardized through a series of executable scripts that read through the raw data, organized according to schema in a tabular format, and apply a list of logics and formulas from a continuously developing library that outlines how to treat and transform the raw data into a new set of standardized data necessary for financial analysis and algorithm development. Ultimately, these executable scripts create a new set of standardized data in an easy-to-query format. The logics and formulas library may be manually maintained, with the logics and formulas added and edited whenever new data comes in, or may leverage machine learning to understand how the data are classified, renamed, and aggregated, and automatically develop logics and formulas to generate new standardized data from new, unclassified, and unaggregated raw data. In some embodiments, a combination of manual and automated processes may be used, and in some cases, a machine learning algorithm may be trained through any suitable training methodology to standardize and normalize incoming raw data.
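The "logics and formulas" library described above might be sketched as a mapping from raw schema fields to standardized fields, with a transform applied along the way. The rule contents and field names here are hypothetical:

```python
# Sketch of a small standardization rule library: each rule maps a raw
# field name to a standardized name and a transform to apply.

RULES = {
    # standardized name: (raw field, transform)
    "revenue_usd": ("Revenues", lambda v: float(v)),
    "cash_usd": ("CashAndCashEquivalents", lambda v: float(v)),
}

def standardize(raw_record):
    """Apply the rule library to one raw record; fields with no rule are skipped."""
    out = {}
    for std_name, (raw_field, transform) in RULES.items():
        if raw_field in raw_record:
            out[std_name] = transform(raw_record[raw_field])
    return out

row = standardize({"Revenues": "1250.5", "CashAndCashEquivalents": "300"})
```

In the manual workflow the `RULES` table is edited by hand as new data arrives; in the automated workflow a trained model proposes new entries for it.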
[0091] At block 508, the data is stored in a data warehouse. The data warehouse may include any suitable data structure, and in some cases, is stored in a relational database or in a relational data stream management system. In some cases, the database adheres to one or more structured query language (SQL) standards to allow for efficient writing, retrieving, sorting, querying, and displaying of data.
[0092] In some embodiments, the data is stored in a time series database service in which data is stored in a time order form. Some time series database services offer built-in analytics, such as smoothing, approximation, and interpolations, as well as adaptive query processing to make analyzing time series data highly efficient.
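A built-in analytic such as smoothing can be illustrated with a simple trailing moving average; the window length below is an arbitrary choice, and real time series database services typically offer richer variants (exponential smoothing, interpolation, and the like):

```python
# Trailing moving average over a time-ordered series of values.

def moving_average(values, window):
    """Average each value with its window-1 predecessors; the first
    window-1 points produce no output."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

smoothed = moving_average([10.0, 20.0, 30.0, 40.0], window=2)
# → [15.0, 25.0, 35.0]
```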
[0093] At block 510, one or more machine learning algorithms can be applied. For example, a set of algorithms 512 may be stored in memory and, when executed by one or more processors, cause the processors to perform acts according to the one or more algorithms. Machine learning is a type of artificial intelligence that allows computer systems to gradually become more efficient and proficient at a specific task. In many cases, machine learning is facilitated by large amounts of data to which the computer systems can apply statistical operations to make accurate predictions based on new inputs. One or more algorithms stored in the memory allow collection of large amounts of data over time and, through an iterative process, generate more and more accurate predictions. In some cases, training data, or sample data, is provided to the computer systems to train the system as to which outputs are consistent with the inputs.
[0094] In some embodiments, the machine learning algorithms do not simply review historical data and generate predictions based on current data, but rather receive data about what an organization is going to do in the future (e.g., procurement data). Procurement data may include data regarding what an organization is going to do, such as requests for proposal, statements of work, funding opportunity announcements, contract budgets, monetary allocations, and the like, and can greatly facilitate arriving at much more accurate predictions than simply relying on historical data to predict future outcomes. For instance, where the computing systems have data indicating that an organization has allocated a specific budget for a particular type of goods, the system can predict which entities are eligible and able to provide the particular type of goods on the requested schedule. The computing system can further determine which of the eligible entities have strong performance records with the procuring organization. At an even more granular level, the computing system can determine the behavioral history of individuals associated with the procurement activities, such as, for example, determining whether a particular contracting officer or program manager has a tendency to select one entity over another based on historical contract data.
[0095] The set of algorithms 512 may include any suitable algorithms, including one or more of neural networks 514, linear regression 516, nearest neighbor 518, Bayesian 520, clustering 522 (e.g., k-means clustering 524), natural language processing 526, sentiment detection 528, or other algorithms, either alone or in combination. Some additional algorithms that may be used singularly or in combination with one or more other algorithms include logistic regression, decision trees, random forest, and dimensionality reduction operations. These various sets of algorithms may be developed by using data coming from a single source (news, contracts, historical financial data); by combining and merging data from the different sources outlined; or by combining data sources and utilizing one or more of the predictions, forecasts, outcomes, or results coming from the algorithms developed at earlier stages.
[0096] The set of algorithms 512 may weight particular data when making predictions. The data may be weighted for any suitable characteristic, such as newness, source, reliability, authenticity, veracity, competency, magnitude, corroboration, and others.
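One way to sketch such weighting is a weighted average over per-characteristic scores for a data point; the characteristic names and weights below are chosen purely for illustration:

```python
# Combine per-characteristic scores (each in 0..1) into a single weighted
# score for a data point.

def weighted_score(datapoint, weights):
    """Weighted average of a data point's characteristic scores."""
    total_weight = sum(weights.values())
    return sum(datapoint[name] * w for name, w in weights.items()) / total_weight

score = weighted_score(
    {"newness": 0.9, "reliability": 0.6, "veracity": 0.8},
    {"newness": 2.0, "reliability": 1.0, "veracity": 1.0},
)
# Newness counts twice as much as each of the other characteristics
```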
[0097] At block 530, the system generates new data associated with future predictions, such as forecasting one or more financial accounts, cashflows or other values of similar nature.
The generated data may indicate a present or future value of an investment into a particular company or project. In some cases, the forecast may result in a stock price of a company at one or more future points in time.
[0098] At block 532, the generated data may be delivered for end use, such as consumption by an end user. The generated data may be delivered to a computing device associated with an end user, and may be presented in graphical form, textual form, or both. The generated data may include advice based on the generated data, such as a buy or sell recommendation. The generated data may indicate a forecasted value over time, such as a graph, chart, or some other indicia, that shows likely future trends. In addition, the generated data may be fed back into the machine learning system and used to train the one or more machine learning algorithms. For instance, a predicted future impact may be fed back into the machine learning system and used to compare the prediction against ground truth data once it becomes available. In this way, the machine learning system can compare its predicted data against real-world data and modify the one or more machine learning algorithms based upon this comparison.
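The feedback step can be sketched as computing an error measure between stored predictions and later-observed ground truth; the retraining threshold below is an assumption for illustration, not a value from the disclosure:

```python
# Compare stored predictions against observed outcomes and keep an error
# measure that can drive retraining of the model.

def evaluate_predictions(predictions, ground_truth):
    """Mean absolute error between matched predictions and observed outcomes."""
    errors = [abs(predictions[k] - ground_truth[k]) for k in ground_truth]
    return sum(errors) / len(errors)

mae = evaluate_predictions(
    {"AcmeAir": 105.0, "BetaJet": 48.0},   # forecast stock prices
    {"AcmeAir": 100.0, "BetaJet": 50.0},   # prices actually observed
)
needs_retraining = mae > 2.0  # hypothetical threshold for triggering retraining
```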
[0099] FIG. 6 illustrates an example process 600 for data capture and aggregation. The illustrated process 600 can be used in conjunction with any of the computing systems or methods described herein.
[0100] Procurement data 602, which is typically publicly available, is captured 604 through any suitable capture process, such as, for example, pushing, pulling, subscription, scraping, crawling, or some other methodology. The captured data is run through a processing engine 606. In some cases, the processing engine 606 receives data formatted as XBRL according to one or more taxonomies. In some cases, the processing engine converts received data into an XBRL format based on one or more taxonomies, and may use one or more machine learning algorithms, such as natural language processing, to format the data appropriately. At block 608, the data is optionally standardized, such as by running the data through a standardization process 610, and the standardized data 612 is stored in a data warehouse 614. According to some embodiments, the data warehouse 614 may be a data lake, a relational database, a time series database, or some other suitable data structure.
[0101] In many cases, the processes 602-614 may be carried out automatically, without human intervention. This allows for millions or billions of data points to be captured, aggregated, processed, standardized, and stored for later analysis.
[0102] The stored data in the data warehouse 614 can be analyzed according to any suitable machine learning algorithm, and forecasts and predictions can be made 616 and distributed 618 to interested stakeholders.
[0103] FIG. 7 illustrates an example process for data standardization 700, which may be used with any of the systems and methods described herein. According to some embodiments, data files, such as XBRL files 701, comma separated value files (CSV), spreadsheet files, or other file types, are provided by (or about) various companies. For example, financial statements are one type of data file that may be provided by a company in an XBRL format. These files can be processed, at block 702, to standardize and harmonize the data, such as to create useful data points on the company to facilitate later analysis. The data normalization process may take advantage of a defined schema that identifies and classifies pieces of data into a defined catalog. In some instances, data that is not initially identified can be categorized and associated with the catalog using one or more machine learning algorithms to understand and categorize the data. That is, unstandardized data tags can be identified and correlated with defined data tags.
[0104] The standardized data points 704 and dimensionality of the data is run through a designer algorithm 706 that applies a selected taxonomy 708, such as any of a number of suitable XBRL taxonomies. In some instances, the designer algorithm creates new formulas and logic in order to harmonize the data tags so the system can appropriately correlate, categorize, and understand the data tags, even if they are non-standard. Once the XBRL taxonomy is applied to the data, an XBRL processing engine 710 receives the data and loads the data into a data warehouse 712. The data warehouse may be any suitable data storage architecture, such as those described herein.
[0105] In some embodiments, a data file 714 is already standardized and comes with an identified taxonomy. In this case, the data file 714 may immediately be processed by the XBRL processing engine 710 and stored in the data warehouse 712. Zone maps may be used to facilitate data extraction and enhance data query performance. Some zone maps may be developed based on but are not limited to companies, period and frequently used raw or standardized financial data.
[0106] From here, the data stored in the data warehouse 712 may be processed, analyzed, or distributed, as desired. One or more machine learning algorithms may access the data stored in the data warehouse 712 and distribute data generated by the one or more machine learning algorithms.
[0107] FIG. 8 shows a representative mapping using the described logic and formulas to standardize the incoming data by mapping the various widely used GAAP-based and various types of XBRL extension data tags to a standard form. As illustrated, the column of XBRL IDs 802 contains XBRL data associated with an incoming file to the system. The system may already realize that the file contains balance sheet data, or the file may be parsed, such as by reviewing the incoming XBRL data tags or the data associated with the incoming file, to determine that the file contains balance sheet data associated with a company. As shown, the XBRL IDs 802 are correlated with Standardized Data Tags 804. The Standardized Data Tags 804 can then be stored in the data lake for later retrieval and analysis.
[0108] In some instances, the incoming data will include non-standard tags. Non-standard tags may include tags that are not universally adopted, such as industry-specific tags, company-specific tags, or country-specific tags. For example, an incoming data file may include an entry with the ID "ba:CustomerFinancingCurrent" 806. This data or data tag, although not used by other companies or other industries, is an allowed extension or customization of any known GAAP-based or other standard XBRL data tags. As such, the system may be trained, or otherwise learn, that this XBRL ID is associated with the standard data tag with ID "notereceivable" 808. Similarly, other XBRL IDs, whether previously known or unknown, are associated with standardized data tags and stored in the data lake.

[0109] As with any embodiment described herein, the generated data may be distributed to one or more computing devices associated with one or more users and/or presented on a display device. The generated data may indicate a future stock price, a future range of stock prices, a recommendation to a user, such as a buy/sell recommendation, or some other indicia. The recommendation may be provided as an alert, a text alert, or a graphical indicator, and may be pushed to an end user, such as through SMS messaging, email, or some other form of notification.
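The tag-harmonization step described above, mapping non-standard XBRL extension tags to a standardized catalog, might be sketched as a lookup with an "unmapped" bucket for tags requiring later classification; the specific tag IDs and mappings below are illustrative assumptions:

```python
# Map raw XBRL tag IDs (standard or company-specific extensions) onto a
# standardized tag catalog; unknown tags are collected for later handling.

TAG_CATALOG = {
    "us-gaap:NotesReceivableCurrent": "notereceivable",
    "ba:CustomerFinancingCurrent": "notereceivable",  # company-specific extension
    "us-gaap:CashAndCashEquivalents": "cash",
}

def map_tags(incoming):
    """Replace raw tag IDs with standardized tags; collect unknowns separately."""
    mapped, unmapped = {}, []
    for tag, value in incoming.items():
        if tag in TAG_CATALOG:
            std = TAG_CATALOG[tag]
            mapped[std] = mapped.get(std, 0.0) + value
        else:
            unmapped.append(tag)
    return mapped, unmapped

mapped, unknown = map_tags({"ba:CustomerFinancingCurrent": 12.0, "xyz:Custom": 1.0})
```

In the described system, the unmapped bucket would be fed to the machine learning classifier that learns new tag-to-catalog correlations.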
[0110] A person of ordinary skill in the art will recognize that any process or method disclosed herein can be modified in many ways. The process parameters and sequence of the steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed.
[0111] The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or comprise additional steps in addition to those disclosed. Further, a step of any method as disclosed herein can be combined with any one or more steps of any other method as disclosed herein.
[0112] Unless otherwise noted, the terms "connected to" and "coupled to" (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms "a" or "an," as used in the specification and claims, are to be construed as meaning "at least one of." Finally, for ease of use, the terms "including" and "having" (and their derivatives), as used in the specification and claims, are interchangeable with and shall have the same meaning as the word "comprising."
[0113] The processor as disclosed herein can be configured with instructions to perform any one or more steps of any method as disclosed herein.
[0114] As used herein, the term "or" is used inclusively to refer to items in the alternative and in combination.
[0115] As used herein, characters such as numerals refer to like elements.
[0116] Embodiments of the present disclosure have been shown and described as set forth herein and are provided by way of example only. One of ordinary skill in the art will recognize numerous adaptations, changes, variations and substitutions without departing from the scope of the present disclosure. Several alternatives and combinations of the embodiments disclosed herein may be utilized without departing from the scope of the present disclosure and the inventions disclosed herein. Therefore, the scope of the presently disclosed inventions shall be defined solely by the scope of the appended claims and the equivalents thereof.
[0117] Those skilled in the art will appreciate that, in some implementations, the functionality provided by the processes and systems discussed above may be provided in alternative ways, such as being split among more software modules or routines or consolidated into fewer modules or routines. Similarly, in some implementations, illustrated processes and systems may provide more or less functionality than is described, such as when other illustrated processes instead lack or include such functionality respectively, or when the amount of functionality that is provided is altered. In addition, while various operations may be illustrated as being performed in a particular manner (e.g., in serial or in parallel) and/or in a particular order, those skilled in the art will appreciate that in other implementations the operations may be performed in other orders and in other manners. Those skilled in the art will also appreciate that the data structures discussed above may be structured in different manners, such as by having a single data structure split into multiple data structures or by having multiple data structures consolidated into a single data structure.

[0118] Similarly, in some implementations, illustrated data structures may store more or less information than is described, such as when other illustrated data structures instead lack or include such information respectively, or when the amount or types of information that is stored is altered. The various methods and systems as illustrated in the figures and described herein represent example implementations. The methods and systems may be implemented in software, hardware, or a combination thereof in other implementations. Similarly, the order of any method may be changed, and various elements may be added, reordered, combined, omitted, modified, etc., in other implementations.
[0119] From the foregoing, it will be appreciated that, although specific implementations have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the appended claims and the elements recited therein. In addition, while certain aspects are presented below in certain claim forms, the inventors contemplate the various aspects in any available claim form. For example, while only some aspects may currently be recited as being embodied in a particular configuration, other aspects may likewise be so embodied. Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended to embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.

Claims

WHAT IS CLAIMED IS:
1. A machine learning system configured with instructions that, when executed, cause the system to:
receive procurement data associated with a future procurement from a consuming entity;
receive historical procurement data from the consuming entity;
receive historical data from producing entities, the historical data including one or more of past performance, financial data, economic or macroeconomic data, industry-specific data, news data, and social media data;
determine, based upon the procurement data, one or more producing entities that are capable of fulfilling the future procurement;
train, using one or more of the historical procurement data and the historical data from producing entities, a machine learning model to generate a likelihood that a first producing entity of the one or more producing entities that are capable of fulfilling the future procurement will be selected for a procurement activity by the consuming entity;
determine, based at least in part on executing the machine learning model, a likelihood that the first producing entity will be selected for the future procurement by the consuming entity; and
determine, based at least in part on the procurement data and the likelihood that the first producing entity will be selected for the future procurement, an impact to the first producing entity.
2. The machine learning system of claim 1, wherein the procurement data comprises one or more of budgetary allocations, contracts open for bid, contract requirements, period of performance, price, and payment terms.
3. The machine learning system of claim 1, wherein the historical procurement data comprises one or more of past contract awards, an identity of contract managers, an identification of goods or services, contract awardees, and performance ratings of past contract awardees.
4. The machine learning system of claim 1, wherein the financial data comprises one or more of stock price, earnings, book value, annual financial reports, quarterly financial reports, and operating income.
5. The machine learning system of claim 1, wherein the instructions further cause the system to determine, based at least in part on the procurement data and the likelihood that a second producing entity will be selected for the procurement activity, an impact to the second producing entity.
6. The machine learning system of claim 1, wherein determining the impact to the first producing entity comprises forecasting one or more of revenue, cost of goods sold, expenses, interest, tax, non-cash expenses, financial ratios, and discount rate.
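Where a discount rate is among the forecast quantities, the forecast cash flows can be reduced to a present value in the usual way. The helper below is an illustrative sketch only; the cash flows and the 10% rate are made-up numbers, not values from the disclosure.

```python
def discounted_value(cash_flows, discount_rate):
    """Present value of forecast cash flows, one per period, at the given rate."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Three forecast periods of 100 each, discounted at 10% per period.
pv = discounted_value([100.0, 100.0, 100.0], 0.10)
```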
7. The machine learning system of claim 1, further comprising instructions that cause the system to receive non-standard data and tags associated with the non-standard data and apply one or more machine learning algorithms to correlate the non-standard data tags to previously-defined data tags to standardize the non-standard data.
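The tag-correlation step of claim 7 can be illustrated with a minimal sketch. Plain string similarity stands in here for the claimed machine learning algorithms, and the standard tag names are invented for the example; unmatched tags are left unmapped rather than guessed.

```python
import difflib

# Hypothetical previously-defined data tags (illustrative only).
STANDARD_TAGS = ["contract_value", "award_date", "awardee_name",
                 "period_of_performance"]

def standardize_tags(nonstandard_tags, cutoff=0.6):
    """Map each incoming non-standard tag to the closest previously-defined tag.

    Tags are normalized (lowercased, separators unified) and matched by
    string similarity; tags below the similarity cutoff map to None so a
    reviewer (or a richer model) can handle them.
    """
    mapping = {}
    for tag in nonstandard_tags:
        key = tag.lower().replace(" ", "_").replace("-", "_")
        match = difflib.get_close_matches(key, STANDARD_TAGS, n=1, cutoff=cutoff)
        mapping[tag] = match[0] if match else None
    return mapping

mapping = standardize_tags(["Contract Value", "AwardDate", "xyz123"])
```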
8. The machine learning system of claim 7, further comprising instructions that cause the system to execute a standardization module and standardize the non-standard data.
9. The machine learning system of claim 1, further comprising instructions that cause the system to apply a taxonomy to the historical data from producing entities.
10. The machine learning system of claim 9, wherein the historical data from producing entities is stored in a data lake.
11. The machine learning system of claim 1, wherein one or more of the historical procurement data and the historical data from producing entities is time-series data.
12. The machine learning system of claim 11, wherein the time-series data is weighted based upon a date stamp associated with the time-series data.
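One common way to weight time-series observations by their date stamps, as recited in claim 12, is exponential recency decay: an observation loses half its weight every half-life. The one-year half-life below is an assumption for illustration; the claim does not prescribe a weighting scheme.

```python
from datetime import date

def recency_weights(stamps, as_of, half_life_days=365):
    """Exponential-decay weights for date-stamped observations.

    Each observation's raw weight halves every `half_life_days` of age
    relative to `as_of`; weights are normalized to sum to 1.
    """
    raw = [0.5 ** ((as_of - s).days / half_life_days) for s in stamps]
    total = sum(raw)
    return [w / total for w in raw]

stamps = [date(2019, 6, 15), date(2020, 6, 15), date(2021, 6, 15)]
weights = recency_weights(stamps, as_of=date(2021, 6, 15))
```

Recent observations dominate: here the newest point carries roughly four times the weight of the two-year-old one.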
13. The machine learning system of claim 11, further comprising instructions that cause the system to execute a sentiment analysis of the time-series data to determine subjective information.
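A minimal lexicon-based scorer illustrates the sentiment analysis of claim 13. The word lists below are invented for the example; a practical system would use a trained sentiment model or a domain-specific lexicon.

```python
# Illustrative lexicons; a real system would use far richer word lists.
POSITIVE = {"strong", "growth", "award", "exceeds", "reliable"}
NEGATIVE = {"delay", "loss", "breach", "default", "decline"}

def sentiment_score(text):
    """Score text in [-1, 1]: (positive hits - negative hits) / total hits.

    Returns 0.0 for text with no lexicon hits, i.e. neutral.
    """
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

s1 = sentiment_score("Strong growth after the contract award.")
s2 = sentiment_score("Schedule delay and a contract breach.")
```

Applied to date-stamped news or social media items, such scores become another time series that can feed the forecasting model.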
14. The machine learning system of claim 1, further comprising instructions that cause the system to determine a veracity score for one or more of the historical procurement data and the historical data from producing entities.
15. The machine learning system of claim 1, wherein determining the impact to the first producing entity comprises forecasting one or more of a future stock price, a future EBITDA, or a future cash on hand.
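Tying the pieces together, the impact determination of claims 1 and 15 can be sketched as a probability-weighted contribution: the selection likelihood times the procurement's value, flowed through to revenue and an operating-profit line. The contract value, win probability, and 15% margin below are made-up illustrative numbers.

```python
def expected_impact(contract_value, win_probability, margin=0.15):
    """Probability-weighted impact of a future procurement on a producing
    entity's forecast: expected revenue and the expected operating-profit
    contribution (an input toward, e.g., a future-EBITDA forecast).
    """
    expected_revenue = win_probability * contract_value
    expected_profit = expected_revenue * margin
    return expected_revenue, expected_profit

# A 10M procurement with a 40% modeled selection likelihood.
rev, profit = expected_impact(10_000_000, win_probability=0.4)
```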
PCT/US2021/037489 2020-06-15 2021-06-15 Time series forecasting and visualization methods and systems WO2021257610A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063039158P 2020-06-15 2020-06-15
US63/039,158 2020-06-15

Publications (1)

Publication Number Publication Date
WO2021257610A1 true WO2021257610A1 (en) 2021-12-23

Family

ID=76959058

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/037489 WO2021257610A1 (en) 2020-06-15 2021-06-15 Time series forecasting and visualization methods and systems

Country Status (1)

Country Link
WO (1) WO2021257610A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110106743A1 (en) * 2008-01-14 2011-05-05 Duchon Andrew P Method and system to predict a data value

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220147895A1 (en) * 2020-11-06 2022-05-12 Citizens Financial Group, Inc. Automated data forecasting using machine learning
US11803793B2 (en) * 2020-11-06 2023-10-31 Citizens Financial Group, Inc. Automated data forecasting using machine learning
US20220164886A1 (en) * 2020-11-24 2022-05-26 VFD SAAS Technology, Ltd. Artificial intelligence financial analysis and reporting platform
CN115204535A (en) * 2022-09-16 2022-10-18 湖北信通通信有限公司 Purchasing business volume prediction method based on dynamic multivariate time sequence and electronic equipment

Similar Documents

Publication Publication Date Title
Ardia et al. Questioning the news about economic growth: Sparse forecasting using thousands of news-based sentiment values
US11354747B2 (en) Real-time predictive analytics engine
Cerchiello et al. Big data analysis for financial risk management
US11257161B2 (en) Methods and systems for predicting market behavior based on news and sentiment analysis
US20040215551A1 (en) Value and risk management system for multi-enterprise organization
US20080027841A1 (en) System for integrating enterprise performance management
Yan et al. Uncertainty and IPO initial returns: evidence from the tone analysis of China’s IPO prospectuses
US20080015871A1 (en) Varr system
Hisano et al. High quality topic extraction from business news explains abnormal financial market volatility
WO2021257610A1 (en) Time series forecasting and visualization methods and systems
US20220343433A1 (en) System and method that rank businesses in environmental, social and governance (esg)
Aprigliano et al. The power of text-based indicators in forecasting Italian economic activity
US20150221038A1 (en) Methods and system for financial instrument classification
Brown et al. Financial statement adequacy and firms' MD&A disclosures
Nissim Big data, accounting information, and valuation
CN114303140A (en) Analysis of intellectual property data related to products and services
Chi et al. The use and usefulness of big data in finance: Evidence from financial analysts
Hsu et al. Does industry competition influence analyst coverage decisions and career outcomes?
Yao et al. Using social media information to predict the credit risk of listed enterprises in the supply chain
Altaf Two decades of big data in finance: systematic literature review and future research agenda
Pratap et al. Macroeconomic effects of uncertainty: a Google trends-based analysis for India
Elena News sentiment in bankruptcy prediction models: Evidence from Russian retail companies
CN110968622B (en) Accounting report customization method, platform and terminal
Kelly News, sentiment and financial markets: A computational system to evaluate the influence of text sentiment on financial assets
US20170061548A1 (en) Advice engine

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21742948

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21742948

Country of ref document: EP

Kind code of ref document: A1