5

Assume there are two files on Linux, fileA and fileB, each containing a different list of fruits. I run the command below:

diff fileA fileB > file.diff

Next, I run the command below:

patch fileA 0< file.diff

The above command patches (corrects) the original file (fileA) using the input from file.diff and writes the result back to fileA (this is what I understand; I may be wrong). In other words, fileA and fileB now match.

"0<" is known as a redirection symbol for standard input (as far as I understand). Now, since the standard input is a keyboard, shouldn't patch command read from the keyboard and not from file.diff? How does the above command work?

pravi
  • 129

4 Answers

10

This answer was tested using a Bash shell.

The commands below are the same. The 1 is the default and therefore can be omitted. Basically, the standard output from diff is redirected from the screen to the file file.diff.

diff fileA fileB > file.diff
diff fileA fileB 1> file.diff
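
If you want to check this equivalence yourself, here is a minimal sketch. The scratch directory, the sample fruit lists and the names implicit.diff and explicit.diff are my own assumptions for illustration; only the two redirection forms come from the commands above.

```shell
# Hypothetical scratch directory and sample files, created only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple\nbanana\ncherry\n' > fileA
printf 'apple\nblueberry\ncherry\n' > fileB

# diff exits with status 1 when the files differ, which is expected here,
# so "|| true" keeps the script going even under "set -e".
diff fileA fileB > implicit.diff || true
diff fileA fileB 1> explicit.diff || true

# cmp -s is silent and succeeds only if both files are byte-for-byte identical.
if cmp -s implicit.diff explicit.diff; then
    same_output=yes
else
    same_output=no
fi
echo "identical: $same_output"
```

Both redirections write to file descriptor 1, so the two diff files should come out the same.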

The commands below are the same. The 0 is the default and therefore can be omitted. Basically, the standard input to patch is redirected from the keyboard to be from the file file.diff instead.

patch fileA 0< file.diff
patch fileA < file.diff
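
The same kind of check works for input. In this sketch wc -l is only a stand-in for patch (any program that reads standard input would do), and fruits.txt is a hypothetical sample file.

```shell
# Hypothetical scratch directory and sample file, created only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple\nbanana\ncherry\n' > fruits.txt

# wc -l reads standard input when it is given no file name,
# so in both cases it counts the lines of fruits.txt.
implicit=$(wc -l < fruits.txt)
explicit=$(wc -l 0< fruits.txt)
echo "implicit: $implicit, explicit: $explicit"
```

Both forms open the file on file descriptor 0, so the two counts should agree.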

I will try to explain by the following. When I enter the command tty, I get the following output.

/dev/pts/0

This means the Terminal window has been assigned the file name /dev/pts/0. Standard input and standard output are assigned the file names /dev/fd/0 and /dev/fd/1, respectively.

The command below tests if standard input (/dev/fd/0) and the Terminal window (/dev/pts/0) have the same device and inode values. In other words, test to see if they are the same. In this case the output is true.

if [[ /dev/fd/0 -ef /dev/pts/0 ]]; then echo "true"; else echo "false"; fi

The command below tests if standard input (/dev/fd/0) and the file file.diff have the same device and inode values. In this case the output is false.

if [[ /dev/fd/0 -ef file.diff ]]; then echo "true"; else echo "false"; fi

However, if standard input is redirected to come from the file file.diff, as shown below, then the output is true.

if [[ /dev/fd/0 -ef file.diff ]]; then echo "true"; else echo "false"; fi < file.diff

My answer up to this point has explained the behavior of redirection when using a Bash shell. This behavior should be consistent across all operating systems. I have avoided implementation details, because these can vary between operating systems. You may be interested in some implementation details, so I present the following for Ubuntu Linux.

The output below shows standard input (/dev/fd/0) and standard output (/dev/fd/1) are symbolic links to the Terminal window (/dev/pts/0).

[Image: terminal output, no redirection]

Below are the results when standard input is redirected to come from the file file.diff. Now, standard input (/dev/fd/0) has changed to a symbolic link to the file file.diff.

[Image: terminal output, redirection]
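
A text-only way to see the symbolic link change is sketched below. It assumes Linux, where /dev/fd is itself a link to /proc/self/fd, and it uses a sample file.diff created in a hypothetical scratch directory.

```shell
# Hypothetical scratch directory and sample file, created only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'sample\n' > file.diff

# readlink prints the target of the symbolic link for its own descriptor 0.
# Without the redirection this would be the terminal, for example /dev/pts/0.
target=$(readlink /proc/self/fd/0 < file.diff)
echo "stdin -> $target"
```

With the redirection in place, the printed target should be the full path of file.diff rather than the terminal device.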

9

"0<" is known as a redirection symbol for standard input (as far as I understand). Now, since the standard input is a keyboard, shouldn't patch command read from the keyboard and not from file.diff? How does the above command work?

No. Standard input is connected to the keyboard (or more accurately to the 'tty' device through which the OS provides keyboard input). At any time, the connection can be closed and something else opened in its place – the term "standard input" refers to the specific connection "slot" and not to where it goes. (Which is why it is called "standard input" and not "keyboard input".)

The numbers are not just shell syntax; they represent how programs themselves work with open files. Within each process, every "open file" is represented by a number (the file descriptor, or the handle as Windows calls it) and all read/write calls work on that number. By standard convention, whichever open file is assigned the file descriptor 0 is the "standard input".

If you start the program from a terminal, then the terminal's "tty" is pre-opened as FD 0 – or rather, inherited from the shell where it was already open – and is therefore the program's stdin. (Same for 1 being stdout and 2 being stderr.) Then the 'diff' program will itself need to open some files, so it will call open("fileA", ...) and fileA will be open as FD 3 and so on.

But just like the program can close any file it has opened itself (e.g. it can close fileA by doing close(3)), it can also do close(0) to close its stdin and open something else in its place; as long as the newly open file receives file descriptor 0 it is by definition "standard input".

The shell can do the same right before it starts the program. Using <file.diff or 0<file.diff means the shell will close its original stdin file descriptor with close(0) and open the file with open("file.diff", ...) as the new FD 0, which then becomes the new stdin, to be inherited by the 'patch' program. (This happens in the child process that the shell creates for starting 'patch', without affecting the main shell process.) Now, when patch calls read(0, …) it will read from the file.
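
The close-then-open sequence described above can be imitated from within the shell itself. This is only a sketch: read_first_line and notes.txt are hypothetical names, and the shell's exec redirections stand in for the close() and open() calls rather than showing what any particular shell does internally.

```shell
# Hypothetical scratch directory and sample file, created only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'first line\nsecond line\n' > notes.txt

# A function body in parentheses runs in a subshell,
# so the calling shell keeps its own stdin.
read_first_line() (
    exec 0<&-            # close descriptor 0, the inherited stdin
    exec 0< "$1"         # open the named file on descriptor 0 in its place
    IFS= read -r line    # read uses descriptor 0, so it now reads the file
    printf '%s\n' "$line"
)

first=$(read_first_line notes.txt)
echo "read from new stdin: $first"
```

Nothing in read changed; it still reads "standard input", which is simply whatever is open as descriptor 0 at that moment.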


Side notes:

Usually open() uses the lowest-available file descriptor, e.g. if 0 was just closed, then the next open()ed file will be 0 again. If more precise control is needed, dup2() can be used to choose a specific FD.

It is possible to redirect any file descriptor, e.g. 5>file.txt will give the program a pre-opened FD 5 corresponding to that file, but that's only useful if the program expects to receive one; otherwise it'll remain open but unused. (Some programs have options like gpg --status-fd= which can be used to pass additional FDs.)

Windows has similar concepts and its cmd.exe even has the same 2> syntax, but Windows file handles aren't actually numbered starting from 0 (they're memory pointers), so cmd.exe only implements 0/1/2 and translates them to Windows-style standard handles. (Meanwhile PowerShell is its own strange world.)
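
The second side note (handing a program an extra pre-opened descriptor such as 5>file.txt) can be tried with a small sketch. The inner sh -c command is a stand-in for a program that expects to find descriptor 5 open, and status.txt is a hypothetical file name.

```shell
# Hypothetical scratch directory, created only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The outer shell opens status.txt on descriptor 5 before starting the inner sh.
# The inner sh then writes to descriptor 5 without knowing which file is behind it.
sh -c 'echo "status: ok" >&5' 5> status.txt

received=$(cat status.txt)
echo "descriptor 5 received: $received"
```

A program that never touches descriptor 5 would simply leave it open and unused, as noted above.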


Similar to the example David Anderson gave, ls -l /proc/self/fd – on Linux only – can be used to inspect its own file descriptors, or ls -l /proc/<pid>/fd for any other process. For example:

$ ls -l /proc/self/fd 
total 0
lrwx------ 1  users 64 Feb 24 14:38 0 -> /dev/pts/4
lrwx------ 1  users 64 Feb 24 14:38 1 -> /dev/pts/4
lrwx------ 1  users 64 Feb 24 14:38 2 -> /dev/pts/4
lr-x------ 1  users 64 Feb 24 14:38 3 -> /proc/833587/fd/

Here file descriptors 0 (stdin), 1 (stdout), 2 (stderr) are currently connected to the terminal (all inherited from the shell), while 3 was opened by 'ls' itself and is of course the directory being listed. If you redirect stdin using <, you get:

$ ls -l /proc/self/fd < ~/test.c
total 0
lr-x------ 1  users 64 Feb 24 14:39 0 -> /home/grawity/test.c
lrwx------ 1  users 64 Feb 24 14:39 1 -> /dev/pts/4
lrwx------ 1  users 64 Feb 24 14:39 2 -> /dev/pts/4
lr-x------ 1  users 64 Feb 24 14:39 3 -> /proc/833693/fd/

And the same with a few useless file descriptors given to the program:

$ ls -l /proc/self/fd 5< ~/test.c 9> /dev/null
total 0
lrwx------ 1  users 64 Feb 24 14:41 0 -> /dev/pts/4
lrwx------ 1  users 64 Feb 24 14:41 1 -> /dev/pts/4
lrwx------ 1  users 64 Feb 24 14:41 2 -> /dev/pts/4
lr-x------ 1  users 64 Feb 24 14:41 3 -> /proc/833741/fd/
lr-x------ 1  users 64 Feb 24 14:41 5 -> /home/grawity/test.c
l-wx------ 1  users 64 Feb 24 14:41 9 -> /dev/null
grawity
  • 501,077
4

In Linux and other Unix-like OSes:

When the shell wants to run another program, it clones a copy of itself with a call to fork() (this may seem like a lot of work, but through the magic of copy-on-write the effort and resource cost are much reduced). That copy then replaces itself with the program it wants to run by a call to exec() (the actual OS call is execve()).

When input or output redirection is needed, the forked copy knows this, and the files or streams are opened between the fork() and the exec(). Thus the original shell retains its connection to the original I/O devices, but the environment into which the new program is launched has been changed.
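
One way to observe that the original shell keeps its own connection is sketched below. It assumes Linux (it relies on /proc), and input.txt is a hypothetical sample file; cat is just a stand-in for the launched program.

```shell
# Hypothetical scratch directory and sample file, created only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'from the file\n' > input.txt

# $$ is the PID of this shell, so /proc/$$/fd/0 is the shell's own descriptor 0.
before=$(readlink "/proc/$$/fd/0")

# The redirection is set up in the child created for cat, not in this shell.
child_saw=$(cat < input.txt)

after=$(readlink "/proc/$$/fd/0")
echo "child read: $child_saw"
echo "shell stdin before: $before"
echo "shell stdin after:  $after"
```

The child reads the file, while the shell's descriptor 0 should point to the same place before and after the command.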

This is explained in detail in LPG

Jasen
  • 1,666
0

The term "redirection" can be a bit confusing. What is redirected is the read call of the process, because reading from a file descriptor like STDIN is pull, not push: the process pulls data from "STDIN".

Usually there is a connection from STDIN to your keyboard. Everything you type on the keyboard is held in a buffer, and when the process reads from "STDIN" it reads from this buffer and gets the keys you typed.

Process --read-> STDIN --read-> Keyboard-Buffer

When you start the process with proc 0< file.diff, the reads that the process makes from STDIN are redirected to file.diff. There is now a connection from STDIN to file.diff instead of the keyboard.

Process --read-> STDIN --read-> file.diff

The process could still read directly from the keyboard buffer by accessing something like /dev/pts/0 (depending on the terminal), but a read from STDIN will now read from file.diff instead.

Falco
  • 502