Simulate#

Overview#

Simulate provides a systematic way to create a model in Excel that associates model outcomes (e.g., profits, earnings per share) with model inputs (e.g., number of customers, costs per customer, and prices); associate data (either historical or projected values) with model inputs; randomly select input values from that data using statistically sound methods; and generate model outcomes consistent with the random variation in input values. We currently provide two-way to draw from random variables. The default way is to draw different data rows for each Random variable. And “Row bootstrap each random variable data series with the same length” will ensure that the same row of data for each Random Variable is chosen within each single random draw. Also, it helps user have a better understanding of their data before doing future analysis. The user can open Simulate Control Board to visualize the data (see details in Simulate Control Board).

Videos#

Note

Video goes here.

Examples#

Here is a step-by-step example of using simulate to generating synthetic data by considering the following simple Excel model of profits and earnings per share for Ace Hospital Co.

  1. Prepare an Excel sheet with data to simulate. In this example, each green highlighted cell represents a Random Variable with a specific value from Data. Each peach color cell holds a formula that computes an Objective Function. Intermediate formulas are not highlighted, as Ace ultimately cares about the potential variation in “Net income after taxes” and “Earnings per Share” across their 5,000 assumed outstanding shares. The blue highlighted cell represents ranges to restricts sampling to values in shown [low value, high value] intervals separated by semicolons.

  2. Open Finplicity, and then click Simulate. The form of Simulate will appear, as shown below.

  3. Click “Add Objective Function” button to include an objective function. A new window will pop up shown below. “Label” refers to text that describes the model outcome (e.g., profit). “Variable location” is the Excel equation that calculates the output (e.g., =C13-C14). In this example, “Net income after taxes” and “Earnings per Share” are both objective functions. Select B17 in “Label” and select C17 in “Variable location.” Then click “Add” to add “Net income after taxes” objective function to table.

  4. Click “Add Objective Function” button to include “Earnings per Share” objective function. A new window will pop up shown below. Select B19 in “Label” and select C19 in “Variable location.” Then click “Add” to add “Earnings per Share” objective function to table.

  5. Click “Add Random Variable” button to include a model input. A new window will pop up shown below. “Label” refers to text that describes the model input (e.g., costs). “Variable location” is the Excel value that holds the input (e.g., =1000). “Data location” is the Excel data location randomly sampled to replace the value held at “Variable location” for “Number of simulations” times. “Interval bounds” restricts sampling to values in shown [low value, high value] intervals separated by semicolons. In this example, we have Patients, Reimbursement pp, Costs pp three random variables to add. Select B5 in “Label,” select C5 in “Variable location,” select G5:G32 in “Data location,” and select D5 in “Internal bounds.” Then click “Add” to add Patients random variable to table.

  6. Click “Add Random Variable” button to add Reimbursement pp random variable. A new window will pop up shown below. Select B7 in “Label,” select C7 in “Variable location,” select H5:H32 in “Data location,” and select D7 in “Internal bounds.” Then click “Add” to add Reimbursement pp random variable to table.

  7. Click “Add Random Variable” button to add Costs pp random variable. A new window will pop up shown below. Select B10 in “Label,” select C10 in “Variable location,” and select I5:I32 in “Data location.” Since Costs pp doesn’t have internal bounds. We can leave it as blank. Then click “Add” to add Costs pp random variable to table.

  8. Check “Row bootstrap each random variable data series with the same length” if you want to ensure that the same row of data for each Random Variable is chosen within each single random draw. Otherwise, uncheck it. The default way is that different data rows for each Random variable might be chosen for a single draw. In this example, we will check “Row bootstrap each random variable data series with the same length.”

  9. Select “Number of simulations” which is the sample size generated either as hard-coded “Input a number” or as an Excel location.”Input a number” means that the number of simulations is hardcoded as a fixed number. “Select a number” means that the number of simulations is read from Excel sheet and can vary. This option is most useful if the Simulate task is saved as part of a workflow and additional flexibility is desired. In this example, select “Select a number” and select C21 (value is 50).

  10. Check “Callback workflow” and select a workflow from the dropdown if you want to run the workflow when this simulates task has been executed. In this example, we will uncheck it.

  11. Select an output location for simulated result. In this example, select K1.

  12. Check “Open control board to visualize and curate the data” if you want to open Simulate Control Board. This step is optional. For more information, please see the next part.

  13. Now you have finished all the required fields in the Simulate form, and you can click “Run” to generate the simulated result or click “Save” to save the task. After clicking “Run,” the spreadsheet model will appear to “flicker” as the simulation randomly considers many different possible input values from Data for the Random Variables. Since the “Row bootstrap …” option was chosen, an entire row in Data is taken at random, as shown in image below with one of the rows highlighted with a dashed-lined box corresponding to values chosen and shown in the green highlighted Random Variables. (If “Row bootstrap …” option was not chosen then each column in Data is sample independently of each other.) This process is repeated “Number of simulations” times, equal to 50 in this example. The same row might be randomly selected multiple times, a process known as “sampling with replacement,” which is the most statistically sound method to sample data. The output at location K1 in Excel will look something like the following table, where we only show some rows for brevity. (Since the simulation is random, your results with the sample example will be different).

Simulate Control Board#

Simulate Control Board is a built-in visualization tool for simulate function. Random variables plot, relationship between random variables plot, objective function histogram, and data summary table are provided on Simulate Control Board. The Control Board allows the user to visualize the simulation results, and have some interaction between Simulate Control Board and Excel worksheet by “Rerun simulation,” “Update bounds,” and “Reset bounds.”

Open Simulate Control Board#

Here is a step-by-step example of opening a Simulate Control Board using the above-mentioned dataset.

  1. Fill the required fields as demonstrated above.

  2. Check “Open control board to visualize and curate the data” checkbox.

  3. Now you can click “Run” to generate the result simulate tables and simulate control board or click “Save” to save the task. Below is Simulate Control Board in this example (Since the simulation is random, your results with the sample example will be different).

Functions on Simulate Control Board#

  1. Filter Random Variables

    This area changes the constraints applied to each random variable.

    • Click on “Random variables” dropdown to add a random variable range to the pane. Add up to six random variables.

    • Click on “Bins” dropdown to control the resolution of the bins based on optimal bin size theory.

    • Use your computer mouse to use “box select” (left click and hold to draw a box) on a random variable to change its constraint range, as shown below for the random variable “Patients” from the previous example. Repeat as desired for other random variables.

    • Click “Rerun simulation” button to update the data shown in Select Objective Function.

    • Click “Update constraints” button to overwrite the original constraints in Excel specified in the Simulate Form with the box select values. If the Simulate Form values were saved as a task, these new constraints will be used in future when that task is run in the future. This feature requires specifying a constraint location for each random variable in the Simulate Form; otherwise, an error is produced. For random variables with no constraints, specify a blank cell that can hold the location of the updated constraint written back to Excel.

    • Click “Reset constraints” to return the constraints in Excel to their original values specified in the Simulate Form.

  2. Relationship between two random variables

    This area shows a correlogram indicate the intensity of common value pairs between two chosen random variables. The correlogram automatically reflects the constraints selected in Filter Random Variables.

  3. Select Objective Function

    This area shows the simulation outcome for a chosen objective function. If desired:

    • Use “box select” to limit the outcome range. The probability bar below the objective function histogram will update. The data summary table for this objective function automatically updates where the column “Box Select” summarizes the information in the random variable both without and with box select, as shown here:

    • Click on “Normalized” button reformulate the values on the vertical axis as percent, thereby showing the percent of time that the value of the objective function falls in the chosen horizontal bin range.

FAQ#

  • When should I use Simulate versus Forecast?

Simulate and Forecast are not mutually exclusive but are used for different purposes. Forecast is used to predict a “best fit” value based purely on statistical relationships that tend to minimize the difference between actual. Simulate is used along with a model to estimate the variation in model outcomes based on variations in model inputs. Use Forecast when you only care about the best guess of some variable based on previous data. Use Simulate when you have a model and want to test its sensitivity to variation in its inputs.

For example, as airline, Forecast can estimate the number of likely domestic and international customers next month based on historical trends, time of the year, and economic factors that influence purchase decisions. Forecast estimates a “best fit” statistical relationship across different data series but does not consider whether those relationships are consistent with an actual model.

In some cases, a model be postulated by science (laws of physics) or defined by law, such as an accounting relationship between inputs and net income and earning per share, as in the previous Example. It might be sensible to forecast future model inputs, like number of patients and Costs per patient, based on economic conditions and other factors that influence these variables. However, data for Random Variable inputs should never be forecasted together with data of an Objective Function (e.g., net income) since an existing model already determines the relationship of the Objective Function with the inputs. In fact, trying to do so will generate an error—known in statistics as a singularity or multi-collinearity. In layman’s terms, Forecast sees an ambiguity in how to predict the future Objective Function (e.g., net income) because including it in Forecast is redundant.

Simulate considers how to estimate uncertainty of model outcomes (Objective Functions) based on uncertainty in model parameters. The Data for model inputs (Random Variables) can be based on a Forecast based on past data. Alternatively, Data could be purely hypothetical and unrelated to the past or even time. So, simulation can be used to “stress test” a model including hypothetical worse case scenarios or to test which model inputs are the most important for determining success.

  • How do I specify a “one-sided” constraint for a Random Variable?

Suppose that you want to limit the randomness from Data for a Random Variable to all values less than 10.5. The constraint can be written as “[,10.5]” in the Excel cell. The constraints “[20.3,]” limits the randomness from Data for a Random Variable to all values greater than 20.3. These constraints can even be combined for the same Random Variable if separated by a semicolon: “[,10.5]; [20.3,].” An in-between range can also be included, for example, “[,10.5]; [15.0,18.5], [20.3,].”

  • What happens if I do not include a constraint for a Random Variable?

The randomness from Data for a Random Variable is then not limited in any way.

  • Why are constraints specified in an Excel cell rather than hard-coded in a Finplicity form?

To allow for additional curation when selecting the Control Board option.

  • For Data for a Random Variable, what happens if I just select the top row number?

Simulate will expand the read of the Data. If the top row happens to a header value instead of a number, Simulate will attempt to fix by expanding numerical values below it. If the Data expands beyond a single column to encompass an entire table, only the first column is used.

  • Can Data for a Random Variable be organized as a row instead of a column?

Yes, Simulate will read this data and transpose it into a column to be consistent with the “Row bootstrap” option. Still, we recommend storing your Data as columns to make your spreadsheet more readable. You can transpose your data using Data Basics > Transpose Data.

  • When should I check “Row bootstrap each random variable data series of the same length” ?

“Row bootstrap each random variable data series of the same length” ensures that the same row of data for each Random Variable is chosen within each single random draw; otherwise, different data rows for each Random variable might be chosen for a single draw. The data columns for each Random Variable must be the same length but do not have to adjacent, as in a table. Select this option will enforce statistical correlation properties of Data across multiple Random Variables. But this selection comes at the cost of reducing variation in the randomness of the simulation; do not select this option if the data correlation is unimportant for the application.

  • Why do I see the error: “Constraints of one or more random variables are not in a valid format.”

A constraint for a Random Variable takes the [low value, high value] format. Separate multiple constraints for the same Random Variable with a semicolon. For example, the constraint “[1.0,5.9]; [9.1,21.3]” tells Simulate to only consider randomness from Data for a Random Variable that falls in the ranges 1.0 – 5.9 and 9.1 – 21.3. If the constraint area is empty, verify that there are no stray characters in the selected cell.

  • Why do I see the error: “The intended constraint(s) of the following random variable(s) are not in a valid format.”

A constraint must be specified in [low value, high value], with multiple constraints separated by semicolons. If the constraint area is empty, verify that there are no stray characters in the selected cell.

  • Why do I see the error: “Data selected for random variable [name here] contains multiple columns and rows? Select data as a single column or row without an index”

Organize Data for a Random Variable as a single row or column. Do not include an index.

  • Why do I see the error: “Data selected for random variable [name here] contains non-numeric data? Select numerical data without a column header.”

Simulate tries to convert numbers in Excel stored as text. Simulate converts this text to numbers, but there still might be some text in Excel where the numerical conversion is not obvious. Recheck your data selection to ensure it is correct.

  • Why do I see the error: “There is no viable sample that satisfies your constraints …”

The constraints you have specified have eliminated any reasonable random distribution available in your Data. If you cannot increase the Data size, consider relaxing your constraints.

  • What about error messages not shown above?

If you are unable to understand an error message, the error box gives you the choice to send the message to us for our review. If we see that many users obtain the same error, we will work to “trap” that error to improve the messaging to the user.