dice and randomization

Randomize Excel Data

Randomize Excel Data

To randomize a dataset, all you need to do is add, to the dataset, a formula column containing the RAND() function then sort by that column. Pick the top n rows (where n = the size of the sample you are after) and you get a quick random sample of the data (see screenshot below). I have also included a workbook here with before and after versions of a small dataset to illustrate.

Before and After Randomizing

randomize excel data
A contiguous dataset before randomizing

 

 

randomize an excel dataset
Dataset after randomizing after adding the column containing the RAND() function and sorting the dataset by it

The Uses of Sampling

This is a quick one I use often. In my previous job I used to do a lot of root cause analysis on large metering datasets. It was occasionally helpful to randomize the data and take a small sample (say, 10 out of 5000 records) then interrogate the path of those records through the various systems we used (SAP, Access Databases, billing data, etc.) in order to get a more complete picture of how a record was moving through the business.

Of course, if the data resided in SAS I could randomize it using the proc surveyselect function. If the data is in Access you can use Rnd([seed]) function combined with TOP (e.g. select top 5 * from d_table order by rnd_column) to generate the sample–that is pretty much┬áthe same thing as what is shown here in Excel.

Share your thoughts below :)