
I am looking for a command or bash script to delete all first-level subfolders except those that directly contain a file of a specific type (*.pdf).

folder01
  a.txt
  y.txt

folder02
  b.pdf
  z.txt

folder03
  h.txt
  folder03.1
    c.pdf

In the example above, folder01 and folder03 need to be deleted.

My attempt:

#!/bin/bash

shopt -s globstar

# Loop through every subdirectory.
for d in **/; do
  f=("$d"/*)
  if [[ -f "$f" && ! "${f##*/}" =~ ^.pdf$ ]]; then
    # echo to ensure a test run; remove when verified.
    echo rm -r -- "$d"
  fi
done

Joel
  • 123

3 Answers

1

The following command prints the pathnames of the directories about to be deleted:

# cd to the right directory first
find . -type d ! -name . \( -exec [ -r {} ] \; -o ! -prune \) \
   -exec sh -c '
      set -- "$1"/*.pdf
      ! [ -e "$1" ]
   ' find-sh {} \; -prune -print

If the result looks right, append -exec rm -r {} + after -print. Even if your find supports -delete, do not use it, as it cannot delete non-empty directories.
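To try the dry run safely, the example tree from the question can be recreated in a scratch directory first. This is a minimal sketch, assuming `mktemp -d` is available (it is common but not specified by POSIX); the variable names `tmp` and `candidates` are only illustrative.

```shell
# Build the example tree from the question in a throwaway directory.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir -p folder01 folder02 folder03/folder03.1
touch folder01/a.txt folder01/y.txt
touch folder02/b.pdf folder02/z.txt
touch folder03/h.txt folder03/folder03.1/c.pdf

# Dry run: only print the directories that would be deleted.
candidates=$(
    find . -type d ! -name . \( -exec [ -r {} ] \; -o ! -prune \) \
        -exec sh -c '
            set -- "$1"/*.pdf
            ! [ -e "$1" ]
        ' find-sh {} \; -prune -print | sort
)
printf '%s\n' "$candidates"
```

With this tree the listing should contain ./folder01 and ./folder03 but not ./folder02, which matches what the question asks for.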

The code works by running one shell per directory under consideration. The shell uses globbing to detect files matching *.pdf in the directory. A few remarks:

  • -prune near the end prevents find from descending into directories that will be deleted anyway. E.g. there is no point in checking ./folder03/folder03.1 after we qualify ./folder03 for deletion. And to be clear: deleting ./folder03 with rm -r implies deleting ./folder03/folder03.1, even if there's a file matching *.pdf in folder03.1.

  • ! -name . is the POSIX equivalent of the (non-portable) -mindepth 1 of GNU find, provided the starting path is the current directory (.). This is easy to do portably when the starting path is ., not so easy otherwise. Therefore I designed the solution so that you need to cd to the right directory beforehand.

  • *.pdf does not match hidden files (dot files). Your attempt also uses globbing, so I guess it's fine for you.

  • *.pdf is case-sensitive. A case-insensitive pattern is *.[pP][dD][fF].

  • *.pdf matches files of any type, not necessarily regular files. It's by name only. In one of your comments you wrote "it's safe to use *.pdf as the main rule". So be it.

  • If there is no matching file, *.pdf remains in its literal form in the POSIX shell; so there is at least one "match" and we don't really know if it's a match. In a shell with more features (e.g. in Bash) you can do something about it, but I wanted my code to be portable. This is why I test if the first "match" does not exist in the filesystem (! [ -e "$1" ]) instead of relying on the number of matches.

  • You don't need to be able to cd to every directory being tested.

  • If you are not allowed to read a directory, then the shell code won't find any *.pdf in it (even if such a file is really there). An attempt to rm -r the directory will fail (unless the directory is already empty) and generate error messages. -exec [ -r {} ] \; -o ! -prune prevents find from trying to read the contents of such a directory, and therefore from testing it and trying to delete it. You may want to adjust this part of the solution to your needs if directories you are not allowed to read are an issue.

  • find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
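The remark about the pattern staying literal when nothing matches can be checked in isolation. The sketch below wraps the same two lines in a function; the name `no_pdf` and the scratch directories are hypothetical and only serve the demonstration, and `mktemp -d` is assumed to be available.

```shell
# no_pdf DIR: succeed when nothing in DIR matches *.pdf (hypothetical helper).
no_pdf() {
    # The positional parameters become the matches, or the literal
    # pattern "DIR/*.pdf" when there is no match.
    set -- "$1"/*.pdf
    # A literal, unexpanded pattern names no existing file.
    ! [ -e "$1" ]
}

tmp=$(mktemp -d) || exit 1
mkdir "$tmp/empty" "$tmp/docs"
touch "$tmp/docs/a.pdf" "$tmp/docs/b.pdf"

for d in "$tmp/empty" "$tmp/docs"; do
    if no_pdf "$d"; then
        echo "${d##*/}: no match"
    else
        echo "${d##*/}: has a match"
    fi
done
```

Only the first "match" is tested, so the check behaves the same with one PDF or with many.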

1

This seems to work well (EDIT: only if it has a single pdf file):

for d in */; do
  if ! [ -f $d/*.pdf ]; then 
    echo "Will remove $d"
  fi
done

(-f looks for a file at the specified path; -e would more generally look for something at that path)

EDIT: to account for paths with spaces and multiple PDF files in a single directory, you will probably need to use find, for example:

for d in */; do
  if [[ -z $(find "$d" -maxdepth 1 -name "*.pdf" -type f) ]]; then 
    echo "Will remove $d"
  fi
done

I changed it from **/ to */ because for your use case, I believe you do not want globstar and **/ - these will make it loop through subdirectories, for example:

> for d in **/; do echo $d; done
folder01/
folder02/
folder03/
folder03/folder03.1/

In the test case this doesn't seem to change the final result but if you're only interested in a .pdf in first level subdirectories, you don't need it looping over any subdirectories.

If you wanted to delete directories that have no PDFs at any level, you could change the if statement to the following (this needs globstar enabled):

if ! [ -f $d/**/*.pdf ]; then

EDIT: or remove the -maxdepth 1 from the find command.
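For comparison, the first-level check can also be written without find and without relying on a single match, by looping over the glob results. This is a sketch under assumptions: the helper name `has_pdf` and the sample directory names are made up for illustration, and `mktemp -d` is assumed to exist.

```shell
# has_pdf DIR: succeed if DIR directly contains at least one regular file
# (or a symlink to one) whose name matches *.pdf. Hypothetical helper.
has_pdf() {
    for f in "$1"/*.pdf; do
        # If nothing matched, f holds the literal pattern and -f fails.
        [ -f "$f" ] && return 0
    done
    return 1
}

tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir "folder 01" folder02 folder03 folder03/folder03.1
touch "folder 01/a.txt" folder02/b.pdf folder02/c.pdf folder03/folder03.1/d.pdf

for d in */; do
    if ! has_pdf "${d%/}"; then
        # Dry run only; replace echo with rm -r -- "$d" once verified.
        echo "Will remove $d"
    fi
done
```

Quoting "$1" and "$f" keeps names with spaces intact, and the loop stops at the first regular file, so several PDFs in one directory are not a problem.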

0

Let us say the specific file type is *.pdf:

  1. Put the directories containing *.pdf (the ones you don't want to delete) into a file remove.txt

    find . -name '*.pdf' -exec dirname {} ';' > temp && sed 's!^\./!!' temp | sed 's![^/]$!&/!' > remove.txt

  2. List all the directories in the current path in a file current.txt

    ls -d */>current.txt

  3. Compare current.txt and remove.txt, and delete the directories listed in current.txt that are not in remove.txt

    comm -23 <(sort current.txt) <(sort remove.txt)|sed 's/^/"/g' | sed 's/$/"/g' | xargs rm -r

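Note: you can append && rm current.txt remove.txt to clean up the list files. If you instead want to keep only the directories with *.pdf and delete all other files and directories in the current path, use ls -p > current.txt in the second step (-p appends / to directory names, so they still match the entries in remove.txt). This also removes every regular file in the current path, including the ones created during the process.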
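The comparison in step 3 is a set difference: comm -23 prints the lines that occur only in its first (sorted) input. A small sketch with hand-written lists shows the effect without touching any real directories; the list contents mirror the example tree from the question, and `tmp` and `doomed` are illustrative names (`mktemp -d` is assumed to be available).

```shell
tmp=$(mktemp -d) || exit 1

# All first-level directories, as "ls -d */" would list them.
printf '%s\n' folder01/ folder02/ folder03/ | sort > "$tmp/current.txt"

# Directories that hold a PDF, as step 1 would record them.
printf '%s\n' folder02/ folder03/folder03.1/ | sort > "$tmp/remove.txt"

# Lines unique to the first file: first-level directories without a PDF.
doomed=$(comm -23 "$tmp/current.txt" "$tmp/remove.txt")
printf '%s\n' "$doomed"
```

folder03/ ends up in the result because folder03/folder03.1/ is a different line, so a PDF in a nested directory does not protect its first-level parent.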