69

What is the best and simplest way to compare two directory structures without actually comparing the data in files? This works fine:

diff -qr dir1 dir2

But it's really slow because it compares the file contents too. Is there a switch for diff, or another simple CLI tool, that does this?

Jonah

13 Answers

44

The following should do what you're looking for, and quickly (substitute your first directory for directory1 and your second for directory2):

find directory1 -type d -printf "%P\n" | sort > file1
find directory2 -type d -printf "%P\n" | sort | diff - file1

The underlying principle is that it prints all of the directories, including subdirectory paths, relative to the base directory (%P strips the starting directory from each path), so the two sorted lists can be compared directly.

This could fall down (produce weird output) if some of the directory names contain carriage returns (or newlines) but others don't.
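A minimal sketch of the same idea without the temporary file, assuming bash (for process substitution) and GNU find (for -printf). The function name compare_dirs and the demo directories are hypothetical, made up for illustration:

```shell
#!/usr/bin/env bash
# compare_dirs is a hypothetical helper name. Like the two commands above,
# it lists directories only and never reads file contents.
compare_dirs() {
    # %P prints each path relative to the starting directory, so the
    # two listings line up even though the root names differ.
    diff <(find "$1" -type d -printf '%P\n' | sort) \
         <(find "$2" -type d -printf '%P\n' | sort)
}

# Demo on throwaway trees: dir1 has c/, dir2 has d/ instead.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/a/b" "$tmp/dir1/c" "$tmp/dir2/a/b" "$tmp/dir2/d"
compare_dirs "$tmp/dir1" "$tmp/dir2"
rm -rf "$tmp"
```

Lines starting with < exist only under the first directory, lines starting with > only under the second. Dropping -type d from both find commands makes it list files as well.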

37
vimdiff <(cd dir1 && find . | sort) <(cd dir2 && find . | sort)

will give you a nice side-by-side display of the two directory hierarchies with any common sections folded.
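If you want a similar side-by-side view without opening vim (for example in a script), GNU diff's -y (--side-by-side) and --suppress-common-lines options can stand in for vimdiff; common lines are hidden rather than folded. A sketch assuming bash and GNU diff; side_by_side is a hypothetical name:

```shell
#!/usr/bin/env bash
# side_by_side is a hypothetical helper; assumes GNU diff and bash.
side_by_side() {
    # Same listings as the vimdiff command above; && makes sure find
    # never runs in the wrong directory if a cd fails.
    diff -y --suppress-common-lines \
        <(cd "$1" && find . | sort) \
        <(cd "$2" && find . | sort)
}

tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/shared" "$tmp/dir2/shared"
touch "$tmp/dir1/only_here" "$tmp/dir2/only_there"
side_by_side "$tmp/dir1" "$tmp/dir2"
rm -rf "$tmp"
```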

garyjohn
31

I usually use rsync for this task:

rsync -nav --delete DIR1/ DIR2

BE VERY CAREFUL to always use the -n, aka --dry-run, option, or it will synchronize (change the contents of) the directories.

This will compare files based on file modification times and sizes... I think that's what you really want, or at least you don't mind if it does that? I got the sense that you just want it to happen faster, not that you need it to ignore the difference between file contents. If you do want it to not list differing files with identical names, I think the addition of the --ignore-existing option will do that.

Also be aware that not putting a / at the end of DIR1 will cause it to compare the directory DIR1 with the contents of DIR2.

The output ends up being a bit verbose, but it will show you which files/directories differ. Files/directories present in DIR2 and not in DIR1 will be prefaced with the word deleting.

For some situations, @slartibartfast's answer may be more appropriate, though you'll need to remove the -type d option to enable the listing of non-directory files. rsync will be faster if you've got a significant number of files/directories to compare.

intuited
24

Similar to the ls answer, but if you install tree, you can:

tree dir1 > out1
tree dir2 > out2
diff out1 out2
digit
8

This worked for my specific need to find missing files in trees expected to match.

diff <(cd dir1 && find * | sort) <(cd dir2 && find * | sort)
amhest
3

I was just looking for a solution to this problem. The solution I liked most was:

comm <(ls DIR1) <(ls DIR2)

It gives you three columns: 1 - files only in DIR1, 2 - files only in DIR2, 3 - files in both. For more details look at this blog post.
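ls only looks at the top level of each directory. Below is a recursive variant of the same comm idea, as a sketch assuming bash and a find that supports -mindepth (GNU and BSD find both do); tree_comm and list_tree are hypothetical names. comm requires both inputs to be sorted in the same order, so both listings are sorted under the C locale and comm runs under it too:

```shell
#!/usr/bin/env bash
# list_tree / tree_comm are hypothetical helper names.
list_tree() {
    # Paths relative to the directory (./a/b); -mindepth 1 skips "." itself.
    (cd "$1" && find . -mindepth 1 | LC_ALL=C sort)
}

tree_comm() {
    # Column 1: only in $1. Column 2 (tab-indented): only in $2.
    # -3 suppresses column 3, the entries present in both.
    LC_ALL=C comm -3 <(list_tree "$1") <(list_tree "$2")
}

tmp=$(mktemp -d)
mkdir -p "$tmp/DIR1/sub" "$tmp/DIR2/sub"
touch "$tmp/DIR1/sub/old.txt" "$tmp/DIR2/sub/new.txt"
tree_comm "$tmp/DIR1" "$tmp/DIR2"
rm -rf "$tmp"
```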

kyrisu
2
ls dir1 > dir1.txt

ls dir2 > dir2.txt

Then just diff the two lists:

diff dir1.txt dir2.txt
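The same top-level comparison can be done without temporary files, using diff's exit status (0 when the listings match, 1 when they differ, 2 on trouble) to get a yes/no answer. A sketch assuming bash; same_top_level is a hypothetical name, and like the commands above it does not descend into subdirectories:

```shell
#!/usr/bin/env bash
# same_top_level is a hypothetical helper name.
same_top_level() {
    # ls -A also lists hidden entries (but not . and ..).
    # diff's exit status: 0 = identical listings, 1 = differences.
    diff <(ls -A "$1") <(ls -A "$2") > /dev/null
}

tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2"
touch "$tmp/dir1/a" "$tmp/dir2/a" "$tmp/dir2/.hidden"
# Any non-zero status (including errors) falls through to "names differ".
if same_top_level "$tmp/dir1" "$tmp/dir2"; then
    echo "same names"
else
    echo "names differ"
fi
rm -rf "$tmp"
```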

MDMarra
2

This is the optimal solution:

diff --brief -r dir1 dir2

The --brief switch reports only whether files differ, not the details of the differences.

jkshah
1

2020 update: combining ideas from the answers above while avoiding the scary --delete, I went with:

rsync -a --dry-run --itemize-changes source/ destination

gerardw
0

I have two very large directories (about 2 TB each, with tons of subdirectories) that I keep in sync with rsync. Sometimes rsync fails to sync them properly, and I need to find the differences between the two.

Since the directories are so large, diff is not practical: it would compare the files too, which would take a century.

I tried the current top answer; after 10 minutes of runtime it had given me no result (no idea how long it would have taken if I hadn't stopped it).

Here is what I used to find the differences between the two in under 5 minutes:

du /D1/ | sort > 1.txt && sed -i 's/D1/D2/g' 1.txt
du /D2/ | sort > 2.txt
diff 1.txt 2.txt

du lists every directory and subdirectory with its disk usage (in KB by default; add -a if you also want individual files listed). The output is piped to sort and written to 1.txt and 2.txt for the D1 and D2 directories respectively.

sed -i 's/D1/D2/g' 1.txt

This command replaces every D1 in 1.txt with D2. We need to do this because we use diff to find the differences between the two text files; if we don't, every line will be reported as a difference.

Finally, diff 1.txt 2.txt shows the differences between the two directories.
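A variation on the same idea, as a sketch assuming bash and GNU du: running du inside each directory makes every path start with ".", so the sed step that rewrites D1 is unnecessary (and cannot accidentally touch a D1 that appears elsewhere in a path). -a adds individual files, and -b reports apparent sizes in bytes rather than allocated blocks. du_listing and du_diff are hypothetical names. Directory lines include the size of the directory entry itself, which can vary between filesystems, so expect some noise on those lines:

```shell
#!/usr/bin/env bash
# du_listing / du_diff are hypothetical helper names; assumes GNU du.
du_listing() {
    # -a: list files as well as directories.
    # -b: apparent size in bytes (--apparent-size --block-size=1).
    # sort -k2 orders lines by path so both listings line up.
    (cd "$1" && du -ab . | LC_ALL=C sort -k2)
}

du_diff() {
    diff <(du_listing "$1") <(du_listing "$2")
}

tmp=$(mktemp -d)
mkdir -p "$tmp/D1/sub" "$tmp/D2/sub"
printf 'abc' > "$tmp/D1/sub/f"
printf 'abcdef' > "$tmp/D2/sub/f"
du_diff "$tmp/D1" "$tmp/D2"
rm -rf "$tmp"
```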

Dany
0

Here's a function that:

  • Compares file sizes
  • Doesn't use temporary files
  • Uses a relative comparison
  • Actually recurses into each folder
  • Doesn't have to slowly examine file contents
  • Doesn't require vim
  • Doesn't show identical lines

Copy and paste this function into the terminal:

quickdiff(){ f(){ find "$1" -mindepth 1 -type f -printf '%P %s\n' | sort; }; comm -3 <(f "$1") <(f "$2"); }

Then run it easily with:

quickdiff dir1 dir2

I'm using process substitution with a repeated find command to list the files and their sizes recursively, then comm -3 to show only the lines that appear in one directory's listing but not the other's. It only examines files, not folders, because find foldername -printf '%s\n' will produce inconsistent sizes on different filesystems (for example, a .zip mounted on gvfs will show size-0 folders). This means it won't show empty folders that exist in only one of the two directories, but it will show all files that differ in size or name.

You can add %TY-%Tm-%Td %TH:%TM to the find command if you want to compare date/time to the minute.
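A sketch of quickdiff with that suggestion applied, assuming GNU find (for the %T directives) and bash; quickdiff_mtime is a hypothetical name. A file whose name and size match but whose modification minute differs now shows up on both sides:

```shell
#!/usr/bin/env bash
# quickdiff_mtime is a hypothetical variant of quickdiff above.
quickdiff_mtime() {
    f() {
        # %P relative path, %s size in bytes,
        # %TY-%Tm-%Td %TH:%TM modification time to the minute (local time).
        find "$1" -mindepth 1 -type f -printf '%P %s %TY-%Tm-%Td %TH:%TM\n' | sort
    }
    # comm -3 drops lines that appear in both listings.
    comm -3 <(f "$1") <(f "$2")
}

tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2"
echo hi > "$tmp/dir1/a"
echo hi > "$tmp/dir2/a"
touch -d '2020-01-01 00:00' "$tmp/dir1/a"   # GNU touch -d
touch -d '2021-06-15 12:30' "$tmp/dir2/a"
quickdiff_mtime "$tmp/dir1" "$tmp/dir2"
rm -rf "$tmp"
```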

SurpriseDog
0

Use "diff -qr" to get the differing files, then filter out the file-comparison lines with grep so that you only get the names that exist in only one of the directories.

diff -qr dir1 dir2 | grep -v "Files.*differ" 
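A slightly stricter version of the same filter keeps only the lines diff prints for entries that exist on one side. This is a sketch assuming GNU diff's message format; LC_ALL=C keeps those messages in English so the pattern matches regardless of locale, and only_in is a hypothetical name. Note that diff still reads the contents of files present in both trees; grep only hides those results:

```shell
#!/usr/bin/env bash
# only_in is a hypothetical helper name; assumes GNU diff's message format.
only_in() {
    # "Only in DIR: NAME" marks an entry missing from the other tree.
    LC_ALL=C diff -qr "$1" "$2" | grep '^Only in '
}

tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2"
echo one > "$tmp/dir1/shared"
echo two > "$tmp/dir2/shared"
touch "$tmp/dir2/extra"
only_in "$tmp/dir1" "$tmp/dir2"
rm -rf "$tmp"
```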
-4

I think only rsync is useful here. Why?

diff is only useful for trees that contain plain files and directories. diff does not give adequate exit codes when symlinks are involved: in that situation it can return exit status 2 even if the source and destination are identical (times, sizes, names, timestamps, link targets, etc.).

Regarding dir: the filesystem does not guarantee file ordering, even if the directory contents of source and destination are identical, so you may need to sort the ls output. But plain ls displays only node names.

A script combining diff, cmp, and test -X for node types might be useful, but remember the overhead of running many test/cmp processes. Such a script will be very slow.

As usual, if you just want a simple answer to "are the directories identical or not", use rsync with the -n (dry-run) option. If you want to find what is different, use the diff command.

harrymc
Znik