Forecast#

Overview#

Forecast provides a systematic way to make predictions across numerous data environments:

Single time series (e.g., trading volume for a single grain company), with or without “exogenous” (“independent”) data (e.g., weather and advertising).
Multiple time series (e.g., trading volume of multiple grain companies that would likely be correlated with each other), with or without “exogenous” (“independent”) data (e.g., weather and advertising).
Cross-sectional data across many different units (e.g., regions).
“Fixed effects” data that allows for both time and categorical controls.
Data pooled across the cross-sectional and time dimensions.

Forecast includes many advanced estimation features:

Rigorously handles categorical (non-numerical) Exogenous (“right-hand side”) and Endogenous (“left-hand side”) data.
“Confidence intervals” (numerical Endogenous data) or “prediction ranges” (categorical Endogenous data) reflecting uncertainty in the forecast.
Recommends the best forecast methods consistent with your data. Methods are chosen from a range of advanced techniques available from the fields of statistics and machine learning.
When appropriate, calibrates (tunes) the “best” statistical method (e.g., determines the optimal (p, d, q) combination for time series models) against a suitable “information criterion.”

Examples#

Clicking on Forecast produces the following form (with some selections filled for a specific example):

Step 1: Select Data#

Select the relevant data.

Explanation:

“Endogenous data to forecast” points to the Excel location of the data to be forecast, either into the future or across different scenarios. In statistics, Endogenous data is sometimes called “dependent” or “left-hand side” data. There are two optional settings for reading in the data:
- “Top row is the header row” Click this option if the top row in the Endogenous data should be treated as the header.
- “The leftmost column is an index” Click this option if the leftmost column in the Endogenous data should be treated as the index column.
“Is there exogenous data?” Click this option to allow the statistical forecast model to include additional Exogenous data that might help determine the values of the Endogenous data. Exogenous data is sometimes called “independent” or “right-hand side” data. This selection is optional because some Forecast methods do not require Exogenous data. The best fitting statistical models, however, typically use Exogenous data. The presence of Exogenous data also allows for statistical forecasting against hypothetical future changes. Clicking this option unlocks several more choices:
- “Exogenous data” points to the Excel location of the Exogenous data.
- “Preforecast exogenous data” holds the Excel location of the Exogenous data consistent with the forecast. This information is required if Exogenous data is provided.
- “One-hot encoding of non-numerical exogenous data?” Click this option to allow categorical (non-numerical) Exogenous data to be included in the forecast by converting categorical data into a set of “dummy” (fixed effects) variables. If this option is not clicked, Finplicity will attempt to convert data stored as Excel text into numerical format (e.g., “500.3” becomes 500.3), generating an error if it fails (e.g., “dog” has no numerical value).
- “Endogenous data should be treated as categorical” Click this option to force Endogenous data to be treated as categorical even if it is numerical. For example, a firm might have four distinct plant operations, labeled as 1, 2, 3, and 4. This option allows these plant identifiers to be treated appropriately as separate plants instead of, for example, representing four different levels of output of the same plant that varies with the exogenous data.
- “Join rule if Endogenous data index and Exogenous data index ever differ:” Endogenous and Exogenous data must have a common table index (e.g., date) to line up. Select the index rule to be used if the actual indexes ever differ between Endogenous and Exogenous data.
  - Endogenous: select this option to insert or delete rows in Exogenous data so that its index values exactly match those of the Endogenous data. If rows are added, empty values can be filled in with the interpolation discussed below.
  - Exogenous: select this option to insert or delete rows in Endogenous data so that its index values exactly match those of the Exogenous data. If rows are added, empty values can be filled in with the interpolation discussed below.
  - Intersection: select this option to include only rows with index values common to both Endogenous and Exogenous data. All data is dropped if there are no index values in common.
  - Union: select this option to insert rows in both Exogenous and Endogenous data so that each includes every index value found in either. If rows are added, empty values can be filled in with the interpolation discussed below.
- “Select missing continuous data rule (missing categorical data is always dropped)” Decide how to treat missing data, either with linear interpolation using the surrounding data in the same column, or by dropping the missing data.
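
The one-hot encoding, join rules, and missing-data rule above can be sketched with pandas. This is an illustration under stated assumptions, not Finplicity's internal implementation: the tables, the `join_index` mapping, and the `align` helper are hypothetical, and the column names merely echo the example data used later in this document.

```python
import pandas as pd

# Hypothetical tables whose date indexes only partly overlap.
endog = pd.DataFrame(
    {"vol1": [100.0, 110.0, 120.0, 130.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
)
exog = pd.DataFrame(
    {"Close": [10.0, None, 12.0, 13.0], "Weather": ["Calm", "Storm", "Calm", "Storm"]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# One-hot encode the categorical column into "dummy" variables.
exog_encoded = pd.get_dummies(exog, columns=["Weather"], dtype=float)

# Each join rule determines the index that both tables should share.
join_index = {
    "Endogenous": endog.index,
    "Exogenous": exog.index,
    "Intersection": endog.index.intersection(exog.index),
    "Union": endog.index.union(exog.index),
}


def align(rule):
    """Reindex both tables to the index implied by the chosen join rule."""
    idx = join_index[rule]
    return endog.reindex(idx), exog_encoded.reindex(idx)


endog_u, exog_u = align("Union")

# Missing continuous data: interpolate linearly within the column, or drop it.
close_interp = exog_u["Close"].interpolate(method="linear")
close_dropped = exog_u["Close"].dropna()
print(close_interp)
```

Note that pandas linear interpolation fills only gaps that have a known value before them, so a leading blank (here, the first Union row) stays empty.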

Step 2: Select Model Options#

Select the desired statistical forecast model, then choose the forecast output and where to place it.

Explanation:

“Select model” Finplicity automatically presents statistical models that are appropriate for your data, listed in the order in which they should be tried. Because the list depends on the data provided, the models you see may differ from the ones shown below.
“Output location for (out-sample) forecast” Select the location for the (“out-of-sample”) forecast of the Endogenous data based on the chosen “Number of Forecast Periods”. The number of forecast periods can be either fixed (static) as an “Input value” or read from an Excel “Input cell”. Selecting “Input cell” allows the number of forecast periods to be changed in Excel either directly on the worksheet or using the Control Board discussed below. This information is always required.
“Output location for (in-sample) fit” Select this option to examine how the statistical model would have performed (“fitted”) over the “in-sample” historical data used to form the statistical model.
“Confidence / Prediction Interval: in-sample fit” Select this option to show confidence intervals (if Endogenous data is numerical) or prediction intervals (if Endogenous data is categorical) for the in-sample fit.
“Confidence / Prediction Interval: out-sample forecast” Select this option to show confidence intervals (if Endogenous data is numerical) or prediction intervals (if Endogenous data is categorical) for the out-sample forecast.
“Launch Control Board” Select this option to launch the Control Board discussed below.

Step 3: Advanced Settings#

Choose advanced settings specific to the forecast method chosen.

Methods: Numerical Endogenous Data#

Forecast methods can be decomposed into two distinct sets of methods, depending on whether the Endogenous (“dependent” or “left-hand side”) data is numerical or categorical. This section reviews the methods available when the Endogenous data is numerical. A complete example is provided in Example: Numerical Endogenous Data.

The figure above shows the first few rows of a data set. The Endogenous data includes trading volumes for two firms (vol1 and vol2), indexed by the Date column. The Exogenous data includes intra-day prices (High, Open, Close) and a subjective categorical characterization of the weather that can take the values Calm or Storm. The data is messy, with blanks (shown in red), and the Exogenous data discussed below might have fewer rows than the Endogenous data. Still, Finplicity is robust to these and other issues.

Finplicity supports multiple types of numerical Endogenous data:

Case: Single-variable Endogenous Data with no Exogenous data#

Suppose we want to forecast future values of variable vol1 but do not have the Exogenous data shown. Select the range A3:B63 (where 63 is the last row of data) in Step 1, but do not check the exogenous data indicator.

Only one appropriate forecasting model is shown in Step 2, SARIMA, which is documented in more detail within the Python “statsmodels” library, https://www.statsmodels.org.

Case: Multi-variable Endogenous Data with no Exogenous data#

Suppose we want to forecast future values of variables vol1 and vol2 together to take advantage of their correlation, which can improve forecasting power. We do not have access to Exogenous data. Now, select the range A3:C63 in Step 1 but do not check the exogenous data indicator. In this case, only one appropriate model, VAR, is shown in Step 2. It is documented in more detail within the Python “statsmodels” library, https://www.statsmodels.org.

Case: Single-variable Endogenous Data with Exogenous data#

Suppose we want to forecast future values of variable vol1, and we now do have access to the Exogenous data. (Preforecast exogenous data might have been previously forecasted from the Exogenous historical data using the “Multi-variable Endogenous Data with no Exogenous Data” case discussed above.) Step 1 now looks like the following form:

In this case, several more models in Step 2 are appropriate:

OLS, Ordinary Least Squares, as documented in more detail within the Python “statsmodels” library, https://www.statsmodels.org.
SARIMAX, as documented in more detail within the Python “statsmodels” library, https://www.statsmodels.org.
Random Forest (Continuous), as documented in more detail within the Python “scikit-learn” library, https://scikit-learn.org.
SVR, Support Vector Regression, as documented in more detail within the Python “scikit-learn” library, https://scikit-learn.org.

Case: Multi-variable Endogenous Data with Exogenous data#

Suppose we want to forecast future values of variables vol1 and vol2 together to take advantage of their correlation, which can improve forecasting power. We now do have access to the Exogenous data. Step 1 looks like the following form:

In this case, the only appropriate model in Step 2 is VARMAX, as documented in more detail within the Python “statsmodels” library, https://www.statsmodels.org.

Methods: Categorical Endogenous Data#

Forecast methods can be decomposed into two distinct sets of methods, depending on whether the Endogenous (“dependent” or “left-hand side”) data is numerical or categorical. This section reviews the methods available when the Endogenous data is categorical. A complete example is provided in Example: Categorical Endogenous Data.

The figure above shows the first few rows of a data set. The Exogenous and Preforecast data is the same as before, but the Endogenous data is now categorical and indicates a decision (Buy, Sell, or Hold) previously made. The objective is to forecast the best future decision for the Preforecast data. Two rules must always be satisfied with categorical Endogenous data:

Endogenous data must be a single column, as there is no reliable statistical method that supports multiple columns or a mix of categorical and numerical Endogenous data.
Exogenous data (and, hence, Preforecasted exogenous data) must be provided, as there are no reliable statistical methods that permit forecasting of categorical data without Exogenous data.

The form for Step 1 looks like the following figure:

Notice that “Endogenous data should be treated as categorical” (see red arrow) must be checked.

The form for Step 2 should look like the following figure, with the forecasting options shown:

Several methods are available, as documented in more detail within the Python “scikit-learn” library, https://scikit-learn.org.
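
As one illustration of forecasting a categorical decision with scikit-learn, the sketch below trains a random forest classifier on synthetic data and predicts labels for hypothetical Preforecast rows. The choice of classifier, the rule used to generate the Buy/Sell/Hold labels, and the use of class probabilities to express uncertainty are all assumptions made for this example; the methods Finplicity actually offers may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
X_hist = rng.normal(size=(90, 3))     # stand-in for numerical Exogenous data
score = X_hist[:, 0] - X_hist[:, 1]   # made-up rule that generates the labels
y_hist = np.where(score > 0.5, "Buy", np.where(score < -0.5, "Sell", "Hold"))
X_future = rng.normal(size=(5, 3))    # stand-in for Preforecast exogenous data

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_hist, y_hist)

labels = clf.predict(X_future)        # forecasted decision for each row
proba = clf.predict_proba(X_future)   # columns follow the order of clf.classes_

for label, row in zip(labels, proba):
    print(label, dict(zip(clf.classes_, np.round(row, 2))))
```

The per-class probabilities give a sense of how confident the classifier is in each forecasted label.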

Example: Numerical Endogenous Data#

This section considers a complete example where the Endogenous data is numerical.

The above figure shows the first few rows of a data set. The Endogenous data includes trading volumes for two firms (vol1 and vol2), indexed by the Date column. The Exogenous data includes intra-day prices (High, Open, Close) and a subjective categorical characterization of the weather that can take the values Calm or Storm. The data is messy, with blanks (shown in red), and the Exogenous data discussed below might have fewer rows than the Endogenous data. Still, Finplicity is robust to these issues.

This example shows how to forecast vol1 data into the future dates listed in the Preforecast table.