 The Answer Gang
 Linux 'read'
From Curtis J Blank to tag on Mon, 21 Aug 2000
Answered by: Jim Dennis, Dan Wilder
I've run into a problem where Linux's 'read' is not reading input from stdin when it is piped to. Here's a quick example:
[Jim Dennis] Of course it is. Try:
ps wax | while read pid tty x x cmd args ; do
echo $pid $cmd $args
done
(Note that the whole while loop is done within the subshell, so the values are available to use between the do and the done tokens).
In your example using awk, naturally the awk print function is being executed from within awk's process. So the variable being read is within scope in this case.
#!/bin/ksh
#
dafunc()
{
echo 1 2 3
}
#
# MAIN
#
dafunc | read a b c
echo $a $b $c
#
Running this script produces a blank line instead of '1 2 3'.
I also tried this command line and it did not work that way either:
echo 1 2 3 | read a b c ; echo $a $b $c
But piping to awk works:
echo 1 2 3 | awk '{print $2}'
2
I've tried this using the 2.2.14 kernel, on both SuSE 6.4 and Red Hat 6.2. I've used this technique on Solaris UNIX and Tru64 UNIX just fine, but for some reason the Linux 'read' from stdin is not picking this up.
Any ideas why or what I'm overlooking?
[Jim Dennis] When studying shell scripting it's also useful to learn that shell and environment variables are not the same thing. A shell variable is "local" in the sense that it is not "inherited" by the programs that process executes. When teaching shell scripting one of the first concepts I introduce to my students is the memory map of their process. I point out that the shell is constantly spawning child processes (through the fork() system call) and that it is frequently executing external programs (through the exec*() family of system calls). I then explain how a fork() simply makes a clone or copy of our process, and how the exec() overwrites MOST of the clone's memory with a new executable. I draw pictures of this, and label the part that is NOT over-written as the "environment."
The export command simply moves a shell variable and value from the "local" region of memory (that would get over-written by an exec() call) into the environment (a region of memory that is preserved through the exec() system call).
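A minimal sketch of that difference, assuming a POSIX-style sh is on the PATH. The variable names local_only and exported_var are made up for the illustration; the child is started with sh -c, which goes through fork() and exec().

```shell
#!/bin/sh
# A plain shell variable: it lives only in this shell's own memory.
local_only="not exported"

# An exported variable: it is placed in the environment, which survives exec().
exported_var="exported"
export exported_var

# sh -c runs a new program (fork + exec), so only the environment carries over.
child_view=$(sh -c 'echo "local_only=[$local_only] exported_var=[$exported_var]"')
echo "$child_view"
```

The child should report an empty local_only and the exported value for exported_var.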
Using this set of pictures (by now I've filled the whiteboard with a couple of copies of our hypothetical processes and their memory blocks) it becomes obvious why changing the value of an environment variable in a child process doesn't affect any copies of that variable in OTHER processes. Just to drive that point home I then write the following reminder in big letters:
The environment is NOT a shared memory mechanism!
(Then I might also explain a little bit about SysV shared memory --- generally pointing out that the shell doesn't provide features for accessing these IPCs).
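The reminder above can be checked with a short experiment. This is only a sketch, assuming a POSIX-style sh; the variable name color is hypothetical.

```shell
#!/bin/sh
color="red"
export color

# The child receives its own copy of the environment and changes only that copy.
child_saw=$(sh -c 'color="blue"; echo "$color"')

echo "child saw:  $child_saw"
echo "parent has: $color"
```

The child prints its modified copy, while the parent's value is untouched once the child exits.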
Incidentally, if you really want to do something similar to your examples but using bash, try this sort of command:
read a b c < <( echo 1 2 3 ) ; echo $b
In this case we are using "process substitution" (and perfectly normal redirection). Since our read command is happening in the current process (and the echo 1 2 3 command is in a sub-process) the variable's value is accessible to us.
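The same idea extends to loops. The following is a sketch for bash specifically (it relies on the <( ) syntax, so it will not run under a plain Bourne shell); the variable names sum and n are invented for the example.

```shell
#!/bin/bash
sum=0

# The loop runs in the current shell; only the printf runs in a sub-process.
while read -r n; do
    sum=$((sum + n))
done < <(printf '1\n2\n3\n')

# The total is still visible here because the loop was never in a subshell.
echo "sum=$sum"
```

Because the loop is fed by redirection rather than by a pipe, the accumulated value is still set after the done.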
Process substitution is a feature of bash (zsh and ksh93 also provide it). Basically it uses /proc/self/fd/* (or /dev/fd/*) entries, or temporary named pipes (depending on the features supported by your version of UNIX) to provide a file name that's associated with the output of the sub-process. If you do a command like:
echo <( echo )
... you should get a response like: /dev/fd/63 (On a Linux system using glibc).
I suspect that process substitution could be used in just about any case where you would have used the Korn shell semantics.
Nonetheless I would like to see the next version of bash support the Korn shell (and zsh) semantics (putting the subshell on the "left" of the pipe operator). I'd also like to see them add associative arrays (where the index to our shell variable arrays can be an arbitrary string rather than a scalar/integer) and co-processes (where we can start a process with a |& operator, which puts it in the background, and then use a series of print -p and read -p commands to write commands to that process and read responses back). Co-processes are handy for shell scripts which need to access something like the bc command: feeding it operations, reading back results, and doing other work with those results, possibly in a loop.
[Dan Wilder] I think you'll find this a ksh (or pdksh) problem, not a Linux problem.
To quote the pdksh man page:
BUGS
[ ... ]
BTW, the most frequently reported bug is
    echo hi | read a; echo $a   # Does not print hi
I'm aware of this and there is no need to report it.
[Jim Dennis] Actually it's just a consequence of the way that pdksh, bash, and older versions of ksh (Korn's original and '88 versions) handle the pipe operator (|).
Whenever you see a pipe in a command line you should understand that a subprocess has implicitly been created. That must exist in order for there to be an un-named pipe. Remember that the pipe is an "interprocess communication mechanism" (IPC). Therefore we have to have multiple processes between/among which to communicate.
In most shells (including Bourne, older Korn, bash, and pdksh) the subprocess is created to handle the commands on the right of the pipe operator. Thus our 'read' command (in these examples) is happening in a subshell. Naturally that shell exits after completing its commands, and the variables it has set are lost. The subshell can only affect its own copies of any shell and environment variables.
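A small experiment makes the lost assignments visible. This sketch assumes a shell that puts the right-hand side of the pipe in a subshell (bash with default settings, pdksh, or dash); the names count and line are made up for the example.

```shell
#!/bin/sh
count=0

# The while loop sits on the right of the pipe, so it runs in a subshell.
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
    echo "inside the loop: $count"
done

# Back in the parent shell: the subshell's copy of count is gone.
echo "after the loop: $count"
```

Under those shells the final line reports 0, even though the loop counted to 3. A shell that runs the last pipeline stage in the current process would report 3 instead.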
With newer versions of ksh and zsh we find that the subshell is usually created on the left of the pipe. This allows us to use commands like "echo foo bar bang | read a b c ; echo $a $b $c" with the effect that most people would expect. Note that the following will work under bash, pdksh, etc.: "echo foo bar bang | ( read a b c ; echo $a $b $c )" (we have to do everything with our variables within the subshell).
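If the variables are needed after the read, one alternative that avoids the pipe altogether is a here-document wrapped around a command substitution. This is a suggestion added for illustration rather than something from the discussion above, and it assumes a POSIX-style shell.

```shell
#!/bin/sh
# No pipe here: read runs in the current shell and takes its input
# from a here-document whose body is the output of the echo command.
read -r a b c <<EOF
$(echo foo bar bang)
EOF

# The variables are still set, because no subshell did the reading.
echo "$a $b $c"
```

The command substitution still runs echo in a sub-process, but the read itself stays in the current shell, so a, b and c keep their values.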
All of this is really quite obvious once you realize that a | operator is necessarily creating a subprocess.
[Dan Wilder] Try #!/bin/sh.