Summarize#

Overview#

Summarize is a tool that generates summary tables of statistical information (e.g., a correlation matrix) about user-provided data. The summary tables help the user look at the data in new ways and understand it better before further analysis. The user can also open Summarize Control Board to visualize the data (see details in Summarize Control Board).

Each run of Summarize in Finplicity produces six result tables:

  • Correlation matrix: measures the strength and direction of the linear relationship between each pair of variables.

  • Covariance matrix: describes the spread (magnitude) and direction of multivariate data by measuring how each pair of variables varies together.

  • Mixed correlation and covariance table: a single table that combines the correlation matrix and the covariance matrix.

  • Moments table: quantitative measures of each variable’s distribution.

  • Basic statistics table: basic statistics for each variable (e.g., mean and median).

  • Percentile table: for each variable, the values below which a given percentage of observations fall.
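
As a rough illustration of what these six tables contain, here is a minimal Python sketch that uses only the standard library. It is not Finplicity's implementation. The sample data, the helper names, the layout of the mixed table (covariance on and below the diagonal, correlation above it), the choice of moments, and the percentile interpolation rule are all assumptions made for this example.

```python
from statistics import mean, median, stdev

# Hypothetical sample data: header -> column of numeric values.
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.1, 5.9, 8.2, 9.8],
}

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation coefficient."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

def central_moment(x, k):
    """k-th central moment (population form, divides by n)."""
    m = mean(x)
    return sum((v - m) ** k for v in x) / len(x)

def percentile(x, p):
    """p-th percentile (0-100), linear interpolation between sorted values."""
    s = sorted(x)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

names = list(data)
cov = {r: {c: covariance(data[r], data[c]) for c in names} for r in names}
corr = {r: {c: correlation(data[r], data[c]) for c in names} for r in names}

# Assumed mixed layout: covariance on and below the diagonal, correlation above.
mixed = {
    r: {c: (corr[r][c] if j > i else cov[r][c]) for j, c in enumerate(names)}
    for i, r in enumerate(names)
}

moments = {n: {k: central_moment(v, k) for k in (2, 3, 4)} for n, v in data.items()}
basic = {
    n: {"mean": mean(v), "median": median(v), "stdev": stdev(v),
        "min": min(v), "max": max(v)}
    for n, v in data.items()
}
percentiles = {n: {p: percentile(v, p) for p in (25, 50, 75)} for n, v in data.items()}
```

Each result is a nested dictionary keyed by variable name, which mirrors the row and column labels of the corresponding table.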

Summarize requires a set of numerical values with a header. The header can be in the top row or the first column. Nonnumerical data can only be used as the index variable or the header. Leave enough blank space in the Excel worksheet for the output.
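
The input rules above can be sketched as a small validation step. The function below is a hypothetical helper, not part of Finplicity. It assumes the selected range arrives as a list of rows, that the index (if any) sits opposite the header, and that any nonnumerical cell in the table body should be rejected.

```python
def parse_range(cells, header="top row", has_index=True):
    """Split a 2D list of cell values into (index_labels, {name: [floats]}).

    `header` is "top row" or "first column". When `has_index` is True, the
    index is read from the first column after the header has been normalized
    to the top row. Hypothetical helper for illustration only.
    """
    if header == "first column":
        # Transpose so that the header always ends up in the top row.
        cells = [list(row) for row in zip(*cells)]
    elif header != "top row":
        raise ValueError(f"unknown header location: {header!r}")

    names, *rows = cells
    start = 1 if has_index else 0
    labels = [row[0] for row in rows] if has_index else None

    columns = {}
    for j, name in enumerate(names[start:], start=start):
        values = [row[j] for row in rows]
        for v in values:
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(v, bool) or not isinstance(v, (int, float)):
                raise ValueError(f"nonnumerical value {v!r} in column {name!r}")
        columns[name] = [float(v) for v in values]
    return labels, columns


# Header in the top row, index in the first column.
top_row = [["Date", "A", "B"], ["2024-01-01", 1, 2.5], ["2024-01-02", 2, 3.5]]
labels, columns = parse_range(top_row, header="top row")

# The same data with the header in the first column.
first_col = [["Date", "2024-01-01", "2024-01-02"], ["A", 1, 2], ["B", 2.5, 3.5]]
labels_t, columns_t = parse_range(first_col, header="first column")
```

Both calls yield the same labels and columns, which shows that the two header locations describe the same table in different orientations.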

Examples#

Here is a step-by-step example of using Summarize to generate summary tables for two columns of data with a date index.

  1. Prepare the data. In this example, the header is the top row and the “Date” column is the index.

  2. Open Finplicity, and then click Summarize. The Summarize form appears, as shown below.

  3. In “Data location with header,” select the Excel range that contains the data. In this example, select cells A1:C11.

  4. In the “Header variable” part, select “first column” if the header is located in the first column, or “top row” if it is located in the top row. In this example, select “top row.”

  5. In the “Index variable” part, select “first column” if the index is located in the first column, “top row” if it is located in the top row, or “none” if no index is needed. In this example, select “first column.”

  6. In “Output location,” select the Excel cell for the output summary tables. In this example, select the cell E1.

  7. Check “Open control board to visualize and curate the data” if you want to open Summarize Control Board. This step is optional. For more information, please see the next part.

  8. Now that you have filled in all the required fields in the Summarize form, click “Run” to generate the summary tables or click “Save” to save the task. The resulting summary tables for this example are shown below.

Summarize Control Board#

Summarize Control Board is a built-in visualization tool for the summarize function. It provides a data summary plot, a relationship plot between two selected variables, a data summary table, a correlation coefficient matrix, and a covariance matrix. The user can curate data points on the data summary plot, and the other plots on Summarize Control Board are refreshed with the selected data points. The user can also have Summarize Control Board write results back to the Excel worksheet by providing a “Data write back location” and/or a “Summary write back location.”

  • Data write back: the data points selected in the summary plot.

  • Summary write back: the result summary tables, recalculated using the data points selected in the summary plot.

Open Summarize Control Board#

Here is a step-by-step example of opening a Summarize Control Board using the above-mentioned dataset.

  1. Fill in the required fields as demonstrated above.

  2. Check the “Open control board to visualize and curate the data” checkbox.

  3. Check the “Data write back location” checkbox and select the Excel cell for data write back. This step is optional. Leave it unchecked if not needed. In this example, select “N1.”

  4. Check the “Summary write back location” checkbox and select the Excel cell for summary write back. This step is optional. Leave it unchecked if not needed. In this example, select “R1.”

  5. Now you can click “Run” to generate the result summary tables or click “Save” to save the task. Below is Summarize Control Board in this example.

Functions on Summarize Control Board#

  1. Data curation on summary plot

    If you curate data columns and/or data points on the summary plot, the relationship plot between two selected variables, the data summary table, the Pearson correlation coefficient matrix, and the covariance matrix are updated with the curated data. If you have set the “Data write back location,” the curated data is written to that location in the Excel worksheet. If you have set the “Summary write back location,” the summary results recalculated from the curated data are written to that location.

    • Data columns curation

      Curate data columns by clicking on and off the legend on the right-hand side of the Summarize Control Board.

    • Data points curation

      Curate data points by hovering over the summary plot and clicking the “Box Select” button on the navigation bar (see below). Then draw a box on the summary plot to include the data points that you want to curate.

    • Data reset

      Click empty space on the summary plot to reset the current curation to the original data.
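
Conceptually, a box selection keeps the points that fall inside a rectangle, and the summary values are then recalculated from those points only. The following Python sketch illustrates that idea under stated assumptions: each point is a (row position, value) pair, the box edges are inclusive, and `box_select` is a hypothetical helper rather than a Finplicity function.

```python
from statistics import mean

def box_select(points, x_min, x_max, y_min, y_max):
    """Keep the (x, y) points that fall inside the box, edges included."""
    return [
        (x, y) for x, y in points
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]

# Hypothetical data: (row position, value); 50.0 stands in for an outlier.
points = [(0, 1.0), (1, 2.0), (2, 50.0), (3, 4.0), (4, 5.0)]

# Draw a box that covers every row but only values between 0 and 10.
selected = box_select(points, 0, 4, 0.0, 10.0)

# Summary statistics recalculated from the curated points.
mean_before = mean(y for _, y in points)
mean_after = mean(y for _, y in selected)

# A reset simply goes back to the original, unfiltered points.
reset = list(points)
```

In this sketch the outlier falls outside the box, so the recalculated mean reflects only the four remaining points.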

Visualizations on Summarize Control Board#

  1. Relationship between two selected variables plot

    • Use the “X-axis” dropdown to select the variable you want to plot on the X axis.

    • Use the “Y-axis” dropdown to select the variable you want to plot on the Y axis.

  2. Data summary table

    • Use the “Rounding” dropdown to change the number of decimal places shown in the data summary table.

FAQ#

  • What if the raw data contains nonnumerical values?

The summarize function doesn’t support nonnumerical values in the table body. Nonnumerical values can only be used as the header or the index. You can use functions in Data Wrangling to pre-process your data before running Summarize.

  • Can I use data without header?

No, there must be a header variable. The summarize function will take the “Header variable” radio button value as the header location.

  • Can I run Summarize without opening Summarize Control Board?

Yes, Summarize Control Board is optional. Just uncheck the “Open control board to visualize and curate the data” check box. The “Data write back location” and “Summary write back location” check boxes are then unchecked automatically.