File Reader#

Overview#

File Reader is a tool to import data from CSV or TXT files stored on the disk or from external web links to Excel worksheets with (optional) data sampling and (optional) advanced settings. This tool is commonly used when the existing data source is in CSV or TXT format.

File Reader requires an input file in CSV or TXT format and an output worksheet. Data sampling can also be added with various settings, together with multiple advanced settings. File Reader reads the data from the input file, samples the data, and writes the sampled data with selected advanced settings into the output worksheet.

Videos#

Examples#

Example 1: Read CSV Files#

Here is a step-by-step example of using File Reader to import data from a CSV file stored on the disk to an Excel worksheet with data sampling. File Reader assumes that CSV files are comma-separated, and TXT files are tab-separated.

Prepare the input data in a CSV/TXT file and an Excel worksheet.
Open Finplicity, and then click Data Reader > From CSV/TXT File. The form of file reader will appear as below.
In Step 1, click “Browse” to select the file path of the input CSV file and then click “Connect” to load the file. You should see the headers listed in the left column.
In Step 1, select the columns you want to import by adding corresponding headers from the left column to the right column. You can click left/right arrow to move the selected headers to the left/right side or click “Remove All” / “Add All” to move all headers to the left/right side. In this example, click “Add All” to select all headers. Then, select the “index.” Imported data will be sorted based on the selected index. If nothing is selected, the original order of the data will be preserved. In this example, select “columnA”. After that, select the “output area.” Imported data will be written in the selected area. In this example, select A1 (as the top-left cell). Now you have finished all the required steps in this form as shown below.
Now you have finished all the required steps in this form, and you can click “Run” to import data or click “Save” to save this task. Below is the imported data. As you can see, “columnA” column is the index to sort the data. Then we will learn how to sample the data in Step 2 and how to set advanced settings in Step 3.
In Step 2, change the settings for sampling. This step is optional and there is no sampling by default. In this example, use the following settings.
- Import data with sampling: whether to enable sampling
  - Sample Without Replacement: do not allow to sample the same row for multiple times.
    - Block Length: the number of consecutive rows selected at a time
    - Optimal Block Length: optimal block length based on total number of rows
  - Sample With Replacement: allow to sample the same row for multiple times.
    - Block Length: the number of consecutive rows selected at a time
    - Optimal Block Length: optimal block length based on total number of rows
  - Sample Size: the number of imported rows
  - Sample Fraction: the number of imported rows as a fraction of total number of rows
  - Random Seed: use a random seed to control sampling (which yields random results)
  - Set Seed: use a custom seed to control sampling (which yields deterministic results)
In Step 3, change the advanced settings or use the default settings. This step is optional and there are some settings by default. In this example, use the following settings.
After choosing the settings at Step 2 and Step 3, now you have reached the end of the form. Go back to Step 1 to change the output location to I1, and then click “Run” button to generate the data. You will see the below result.

Example 2: Read CSV Files from the External Link#

Here is a step-by-step example of using File Reader to import data from a CSV file from the external web link to an Excel worksheet.

Prepare the input data from external web link and an Excel worksheet. In this example, we will try to read data from “https://www2.census.gov/geo/docs/reference/state.txt,” which contains the following data separated by “|.” File Reader assumes that CSV files are comma-separated, and TXT files are tab-separated. The data looks like below.
Open Finplicity, and then click Data Reader > From CSV/TXT File. The form of file reader will appear as below.
In Step 1, copy and paste the web link into “CSV/TXT File Path” field, select “Custom” Delimiter, and enter “|” into the input box, and then click “Connect” to load the file. You should see the headers listed in the left column.
In Step 1, select the columns you want to import by adding corresponding headers into the right column. You can click left/right arrow to move the selected headers to the left/right side or click “Clear All” / “Add All” to move all headers to the left/right side. In this example, click “Add All” to select all headers. Select the “index” to be “STATE.” And select the “output location” to be A1. Now you have finished all the required steps in this form as shown below.
Now you have finished all the required steps in this form, and you can click “Run” to import data or click “Save” to save this task. Below is the imported data. It is the same as in Example 1 to sample the data in Step 2 and set advanced settings in Step 3. Here we skip those two optional steps.

FAQ#

What are the requirements for the input file?

The input file must (1) be in CSV/TXT format; (2) have headers.

How to select the output area if I do not know its exact size?

Select a single cell as the top-left cell.

What does “block length” mean in Step 2: Data Sampling?

Block length is the number of consecutive rows selected at a time. This parameter is introduced for block bootstrapping, a sampling technique to better preserve data characteristics. Suppose an input file has 10 rows of data (from row 1 to row 10), the sample size is 7, and the block length is 3. It works as below.

The first randomly picked row is row 4. Since the block length is 3, rows 4, 5 and 6 are all selected as samples.
The second randomly picked row is row 9. As row 11 does not exist, only rows 9 and 10 are selected.
The third randomly picked row is row 3. The total sample size is 7, so only 2 samples need to be selected, and they are row 3 and 4.
Therefore, the samples are rows 4, 5, 6, 9, 10, 3 and 4.

Why do I get different imported data every time?

Sampling results depends on the seed for random number generation. If you select “random seed”, the result will be random each time you run File Reader; but if you provide a custom seed, like 123 in the example, the result will be the same each time.