I would like to benefit from the full potential of the parallel command on macOS (there seem to be two versions, GNU parallel, written by Ole Tange, and the moreutils parallel, but I am not sure).
With the following command:
parallel -j8 find {} ::: *
I will get good parallel performance if I am in a directory containing 8 subdirectories. But if all of these subdirectories have little content except for a single one, only one thread will end up working, on that unique "big" directory.
Is there a way to continue the parallelization inside this "big" directory? I mean, can the one remaining thread be helped by the other threads (the ones that previously worked on the small subdirectories)?
The ideal case would be for parallel to "switch" threads automatically once all the small subdirectories have been processed by the find command in the command line above. Maybe I am asking too much?

Another potential optimization, if it exists: considering a typical directory tree structure, is there a way, similar for example to the command
make -j8, to assign each running thread to a sub-(sub-(sub-...)) directory, and once a thread has finished exploring its directory (remember, I mostly want this optimization for the find command), have it explore another sub-(sub-(sub-...)) directory? Of course, the total number of running threads would never exceed the number given to the
parallel command (parallel -j8 in my example above): if the number of tree nodes (1 node = 1 directory) is greater than the number of threads, we cannot go over that number. I know that parallelizing in a recursive context is tricky, but maybe I can gain a significant factor when searching for a file in a big tree structure?
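To make the idea concrete, here is a rough sketch of what I imagine (the directory names are made up for the demo; I use xargs -P here only because it ships with macOS, and I assume GNU parallel's -j8 job queue would balance the work the same way):

```shell
# Demo tree: one "big" branch and two small ones (illustrative names).
mkdir -p tree/big/x/y tree/small1 tree/small2
touch tree/big/x/y/needle tree/small1/a tree/small2/b

# Pass 1: files in the shallow levels (cheap, single-threaded).
find tree -maxdepth 2 -type f

# Pass 2: split the tree at depth 2; each directory becomes one job in
# the queue, so a worker that finishes a small directory immediately
# takes the next pending one instead of staying idle.
find tree -mindepth 2 -maxdepth 2 -type d -print0 \
  | xargs -0 -P 8 -I {} find {} -type f
```

Together the two passes print every file exactly once; the deeper the split, the finer the load balancing, at the cost of more job startups.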
That is why I took the example of the command
make -j8: I don't know how it is coded, but it makes me think that we could do the same with the parallel/find command line at the beginning of my post.
Finally, I would like your advice on these 2 questions, and more generally on which of these optimization suggestions are currently possible and which are not, in order to find a file more quickly with the classic find command.
UPDATE 1: As @OleTange said, I don't know a priori the structure of the directories that I want gupdatedb to index, so it is difficult to know the maxdepth in advance. Your solution is interesting, but the first execution of find is not multithreaded; you don't use the parallel command there. I am a little surprised that a multithreaded version of gupdatedb does not exist: on paper it is feasible, but once you try to code it into the GNU gupdatedb script on macOS 10.15, it is more difficult.
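To clarify what I mean by a multithreaded gupdatedb, here is a toy sketch (paths invented, xargs -P again standing in for parallel -j8): the single-threaded first pass only enumerates directories, and the per-directory file listing, which touches most of the inodes, is the part I would like to run in parallel.

```shell
# Toy tree standing in for the filesystem gupdatedb would index.
mkdir -p idx/a idx/b idx/c
touch idx/a/1 idx/b/2 idx/c/3

# Single-threaded pass: enumerate directories only (a small fraction of
# all inodes, so usually not the bottleneck).
# Parallel pass: each worker lists the direct children of one directory,
# so every file is emitted exactly once.
find idx -type d -print0 \
  | xargs -0 -P 8 -I {} find {} -mindepth 1 -maxdepth 1 -type f \
  | sort > filelist.txt   # sorted, like a locate-style index
```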
If someone has other suggestions, I will gladly take them!