Saturday, 15 April 2017

Split contents of a file according to a list

The following script reads a list from a first file and checks a second file for lines that start with any item in the list. It appends the matching lines from the second file to a third file.

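# Arguments: $1 = list file, $2 = file to search, $3 = output file
while IFS= read -r p; do
  # Append every line of $2 that starts with the list item p
  grep -- "^$p" "$2" >> "$3"
done < "$1"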
The script is useful if you have a dataset split file and want to extract the relevant lines from another file (e.g., a detector's bounding box outputs) according to that split. For example, you may want to run
./ train.txt output_trainval.txt output_train.txt
./ val.txt output_trainval.txt output_val.txt
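Note that the script makes no assumption about the ordering of lines in the second file. Also note that grep interprets each list item as a regular expression, and that matches are appended to the third file rather than overwriting it.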
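If the second file is large, a single-pass variant may be worth trying. The following is only a sketch, not the script above: the function name split_by_list is made up for illustration, it assumes awk is available and that the list file is non-empty, and it treats list items as literal prefixes rather than regular expressions. It also overwrites the output file, writes lines in the order they appear in the second file, and writes a line only once even if it matches several items.

```shell
# Hypothetical helper: split_by_list LIST_FILE INPUT_FILE OUTPUT_FILE
split_by_list() {
  awk '
    # First file (the list): remember every non-empty item.
    NR == FNR { if ($0 != "") items[$0] = 1; next }
    # Second file: print the line if it starts with any remembered item.
    { for (p in items) if (index($0, p) == 1) { print; break } }
  ' "$1" "$2" > "$3"
}
```

With the file names from the example above, this would be called as split_by_list train.txt output_trainval.txt output_train.txt.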