I recently learned about the Linux command line utility shuf from browsing The Art of Command Line.

This could be useful for random sampling.

Given just a file name, shuf randomly permutes the lines of the file.

With the option -n you can specify how many lines to return.

So it’s doing sampling without replacement.

For example, shuf -n 10 foo.txt would select 10 lines from foo.txt.
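Here is a minimal sketch of that, assuming GNU shuf is on the path. The temporary directory and the 100-line test file built with seq are my own illustration, not anything shuf requires.

```shell
# Hypothetical setup: a 100-line file, one number per line.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/foo.txt"

# Draw 10 lines at random, without replacement.
sample=$(shuf -n 10 "$workdir/foo.txt")
printf '%s\n' "$sample"

# Count the lines drawn and how many of them are distinct.
# Both counts are 10 here, since no line can be picked twice.
total=$(printf '%s\n' "$sample" | wc -l)
distinct=$(printf '%s\n' "$sample" | sort -u | wc -l)
echo "drew $total lines, $distinct distinct"
```

The input lines are all different, so the distinct count matching the total is a quick check that nothing was drawn twice.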


Actually, it would select at most 10 lines.

You can’t select 10 lines without replacement from a file with fewer than 10 lines.

If you ask for more lines than the file has, the -n option is effectively ignored and you get every line of the file in random order.
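A quick way to see this for yourself, again assuming GNU shuf; the three-line file short.txt is a made-up example.

```shell
# Hypothetical three-line input file.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/short.txt"

# Ask for 10 lines from a file that only has 3.
out=$(shuf -n 10 "$workdir/short.txt")
printf '%s\n' "$out"

# Only 3 lines come back: the whole file, shuffled.
count=$(printf '%s\n' "$out" | wc -l)
echo "got $count lines"
```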

You can also sample with replacement using the -r option.

In that case you can select more lines than are in the file since lines may be reused.

For example, you could run shuf -r -n 10 foo.txt to select 10 lines drawn with replacement from foo.txt, regardless of how many lines foo.txt has.
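As a rough sketch of what sampling with replacement looks like in practice, the snippet below draws 10 lines from a three-line file and tallies how often each one came up. The file contents and the use of sort and uniq -c for the tally are my own choices for illustration.

```shell
# Hypothetical three-line input file.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/foo.txt"

# Draw 10 lines with replacement; repeats are expected.
draws=$(shuf -r -n 10 "$workdir/foo.txt")
n_draws=$(printf '%s\n' "$draws" | wc -l)
echo "got $n_draws lines"

# Tally how many times each line was drawn.
printf '%s\n' "$draws" | sort | uniq -c
```

The tallies will differ from run to run, but they always add up to 10.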

For example, when I ran the command above on a file containing

    alpha
    beta
    gamma

I got the output

    beta
    gamma
    gamma
    beta
    alpha
    alpha
    gamma
    gamma
    beta

I don’t know how shuf seeds its random generator.

Maybe from the system time.

But if you run it twice you will get different results.
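If you do want repeatable results, GNU shuf has a --random-source=FILE option that reads its random bytes from a file instead. The sketch below assumes GNU coreutils. The seed file built with seq is just a convenient stand-in for "a fixed file with plenty of bytes" and is not a good source of randomness.

```shell
# Hypothetical input file and a fixed file to serve as the source of "random" bytes.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/foo.txt"
seq 1 10000 > "$workdir/seed.txt"

# Same input and same random source, run twice.
first=$(shuf -n 10 --random-source="$workdir/seed.txt" "$workdir/foo.txt")
second=$(shuf -n 10 --random-source="$workdir/seed.txt" "$workdir/foo.txt")

if [ "$first" = "$second" ]; then
    echo "same sample both times"
else
    echo "samples differ"
fi
```

With the same bytes feeding it, shuf should make the same choices each time, which is handy when a sample needs to be reproducible.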


