
I would like to whittle down a large database from the command line to N files, very similar to what this question does. The only difference is that most of my files are in subdirectories, so I was wondering if there is a quick fix to my problem or if it will require more in-depth action. Currently, my command looks like this (with (N+1) replaced with the appropriate number):

find . -type f | sort -R | tail -n +(N+1) | xargs rm

I originally thought this would work because find is recursive by nature. I then tried adding the -r (recursive) flag to rm, since the output indicates that files are being randomly selected but cannot be found for deletion. Any ideas?
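(For context: plain `xargs rm` also mishandles filenames containing whitespace, since xargs splits its input on spaces as well as newlines — presumably why `-print0` gets suggested. A minimal demonstration in a throwaway directory:)

```shell
# xargs splits on any whitespace by default, so a name with a space
# arrives at rm as two separate arguments, neither of which exists.
dir=$(mktemp -d) && cd "$dir"
touch 'my file.txt'
printf '%s\n' './my file.txt' | xargs rm 2>/dev/null || true
ls    # 'my file.txt' is still there
```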

EDIT: My new command looks like this:

find . -type f -print0 | sort -R | tail -n +(N+1) | xargs -0 rm

and now I get the error rm: missing operand. Also, I am on a CentOS machine, so the -z flag is unavailable to me.
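(The "missing operand" error most likely comes from GNU xargs's default of running its command once even when it receives no input at all; -r, i.e. --no-run-if-empty, suppresses that. A quick demonstration:)

```shell
# GNU xargs invokes the command once even on empty input,
# so rm gets called with no operands; -r prevents the empty run.
printf '' | xargs echo ran        # echo still runs once, printing "ran"
printf '' | xargs -r echo ran     # with -r, echo is never invoked
```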

EDIT #2 This command runs:

find . -type f -print0 | sort -R | tail -n +(N+1) | xargs -0 -r rm

but when I execute find . -type f | wc -l to get the number of files in the directory (which should be N if the command worked correctly), the count has not changed from the starting number of files.
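(For posterity, the likely reason the command deletes nothing: sort and tail are line-based, so the NUL-delimited stream from -print0 contains no newlines and is treated as one giant "line"; tail -n +2 or higher then discards it entirely, and xargs -0 -r quietly does nothing. A demonstration with a tiny NUL-delimited stream:)

```shell
# sort/tail are line-based: a NUL-delimited stream with no newlines
# is a single "line", so tail -n +2 throws the whole thing away.
printf 'a\0b\0c\0' | wc -l                  # 0 newlines -> one "line"
printf 'a\0b\0c\0' | tail -n +2 | wc -c     # 0 bytes left after tail
```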


2 Answers


If you need to use find … -print0 and you cannot use -z with sort and/or tail, there is a possible, if cumbersome, workaround (substitute (N+1) as usual):

find . -type f -printf "%i\n" | sort | uniq | sort -R | tail -n +(N+1) |
   while read i; do
      find . -type f -inum "$i" -delete
   done

The dirty trick is that we use inode numbers instead of paths.

The inner find removes all files with the given inode number under the current directory, so if some files are hardlinked one to another then you will either lose them all or keep them all.

The preliminary sort | uniq deduplicates inode numbers, avoiding the mishap of losing too much when hardlinks produce duplicates. You may still end up with more than N filenames, pointing to at most N distinct inodes in total.

In case your find doesn't understand -delete, use -exec rm {} +.
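Here is a self-contained run of this approach in a scratch directory, with five files in a subdirectory and N=3 (the directory layout and N are illustrative):

```shell
# Build a scratch tree with 5 files in a subdirectory, then keep a random 3.
N=3
dir=$(mktemp -d) && cd "$dir"
mkdir sub
for f in a b c d e; do echo x > "sub/$f"; done
# List inodes, dedup, shuffle, and delete everything past the first N.
find . -type f -printf "%i\n" | sort | uniq | sort -R | tail -n +$((N+1)) |
   while read i; do
      find . -type f -inum "$i" -delete
   done
find . -type f | wc -l    # prints 3
```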


I did this on OS X like so:

find . -type f -print | sort | uniq | sort --random-sort | tail -n +1000 | xargs rm -f

Where my N was 1000. You can then double-check that the right number of files remains with ls | wc -l. See also https://stackoverflow.com/a/20307392/630752