Unix Essentials Class
File naming conventions · Listing files · Hidden files · Viewing files · Printing files · File maintenance commands · Permissions and file security
As pointed out earlier, everything in UNIX is a file. There are four basic types of files: ordinary files (which include executable programs), directories, links and device files.
In this section, we will consider some of the important aspects of files: naming conventions, basic file management, viewing files, and the UNIX permission scheme on which file security is based.
A. File naming conventions
Case is significant
The first difference between UNIX and other common systems that takes getting used to is case significance. UNIX treats uppercase and lowercase letters as distinct, so the following are all different file names to UNIX:
Unix UNIX unix UnIx uNiX UniX uNIx
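You can check case significance yourself by creating names that differ only in case. This sketch works in a throwaway directory made with mktemp -d; on a case-insensitive filesystem (such as the macOS default) you would see fewer files.

```shell
# Work in a scratch directory so nothing else is touched
cd "$(mktemp -d)" || exit 1

# Three names that differ only in letter case
touch Unix UNIX unix

# On a case-sensitive filesystem these are three separate files
ls
ls | wc -l
```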
At least the first 14 characters are significant
Most modern UNIX systems allow file names of up to 255 characters, which is convenient when a name has to be descriptive enough to support rapid visual searches. Imagine 1,000,000 similar log files in a directory tree: you are trying to find the one with the information you need, and you need it quickly. Long file names have their disadvantages, but they let you pack a lot of useful information about a file into its name. The maximum length is system dependent, but you can count on the system distinguishing at least the first 14 characters of a file name. On systems where only the first 14 characters are significant, the rest of the name is ignored by the computer and is for the user's benefit only, so the first 14 characters of each name must differ for the computer to tell the files apart. The more modern a system is, the more likely it is to support longer names.
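To find the limit on a particular system, the POSIX getconf utility can report the NAME_MAX value for a directory's filesystem. A sketch; the number printed depends on the filesystem (255 is typical on Linux).

```shell
# Longest single file name (in bytes) allowed in the current directory's filesystem
getconf NAME_MAX .

# POSIX guarantees at least 14, the historical limit mentioned above
limit=$(getconf NAME_MAX .)
if [ "$limit" -ge 14 ]; then
    echo "This filesystem allows names of up to $limit bytes"
fi
```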
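Allowed filename characters
Filenames can contain any combination of uppercase letters (A-Z), lowercase letters (a-z), digits (0-9) and the following punctuation: period (.), underscore (_), hyphen (-), comma (,) and plus sign (+).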
So, for example, these are all valid filenames:
Quack hello 020301 h3ll0.c log-3214.txt
Lost+found .profile bad_progs beef,jerky you.silly.fool
Some other special characters are legal but risky; spaces and tabs are particularly poor choices, because the shell uses them to separate words. In extreme cases you can create files that are very hard to open or remove, for example a name beginning with a hyphen, which commands mistake for an option. Sticking to the above list is strongly recommended. If you adopt a naming convention that uses special symbols, it is a good idea to put a README file in the same directory to remind yourself of what you did.
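The sketch below shows why some legal names are risky: a name containing a space must be quoted, and a name starting with a hyphen must be written as ./-name (or placed after --) so commands don't read it as an option. The file names are made up for the example.

```shell
cd "$(mktemp -d)" || exit 1

# A space in the name: without quotes the shell would see two names
touch "my file"
ls -l "my file"
rm "my file"

# A leading hyphen: "touch -silly" would be parsed as options, so use ./ or --
touch ./-silly
ls -l -- -silly
rm ./-silly

ls -A    # directory is empty again
```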
Note that these conventions apply to all types of files. It's a very good idea to look at how other UNIX users approach file naming and try to stick with standard practice. That way, other people who have to use your work can tell what you were doing.
B. Listing files
We have used the "ls" command once or twice so far; by default it lists the contents of the current directory. This is one of the most useful commands in UNIX, so it's a good idea to get to know it well.
At its most basic level, ls simply lists the names of the (non-hidden) files in a directory:
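To follow along without touching your real home directory, you can build a small practice directory. The file names match this section's examples; their contents are invented.

```shell
cd "$(mktemp -d)" || exit 1

# Create a few small practice files
echo "first file" > file1
echo "second file" > file2
echo "third file" > file3

# List the names in the current directory
ls
```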
We can list more details about our files by including a runtime option -l:
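The long listing can be tried in a scratch directory as follows. The umask 022 line is an assumption that makes new files come out as rw-r--r--; the owner, group, sizes and dates will differ on your system.

```shell
cd "$(mktemp -d)" || exit 1
umask 022                      # new files get rw-r--r--
echo "first file" > file1
echo "second file" > file2

# Long listing: a "total" block count, then one line per file
ls -l

# The first character of a line is the file type ("-" = ordinary file)
ls -l file1 | cut -c1
```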
The output here is worth looking at in detail. The first line returned by the command is the block count ("total"): the listed files take up 20 blocks of disk space (don't worry about the block size, which varies between systems; this number is not very useful).
Next the files are listed one to a line. The first column gives the file type and permissions. We know these are ordinary files because the first character is a dash "-". We'll go into the details of the permissions below.
The next column shows a 1 - this is the number of links to that file. In this case there is only 1 link - the filename itself. Next in line is the owner of the file (student), the group membership of the file (student again), the number of bytes, date and time of last modification, and finally the file name.
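With either of the two listing formats above, files are listed in lexical order. In the traditional ASCII ordering, most punctuation characters come first, then digits, then uppercase letters, then lowercase letters (a few punctuation characters, such as the underscore, fall between the uppercase and lowercase letters). On modern systems the order can also depend on your locale settings.
The first character is compared using this ordering, then the second, and so on. Names whose first character is a period (or "dot") are an exception: by default, ls does not show them at all.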
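The traditional ordering is easiest to see with the locale forced to plain ASCII (LC_ALL=C), since modern systems otherwise sort according to language settings. The file names are invented, one from each character group.

```shell
cd "$(mktemp -d)" || exit 1
touch alpha Zeta _x 10

# Byte (ASCII) order: digits, then uppercase, then "_", then lowercase
LC_ALL=C ls
```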
C. Hidden files
We can see that our directory /home/student contains exactly four files, right? Wrong! Some of the files aren't showing up. We can see all of the files in the directory by using the runtime option -a:
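A hidden file is simply one whose name starts with a dot. This sketch creates one in a scratch directory (the .profile contents are invented) and compares plain ls with ls -a.

```shell
cd "$(mktemp -d)" || exit 1
echo "visible" > file1
echo "# startup settings" > .profile

ls        # shows only file1
ls -a     # also shows ., .. and .profile
```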
The files that don't show up all have something in common: their names begin with a dot. This convention tells ls that they are hidden files, not to be displayed under normal circumstances. Most of them are configuration files that should be left alone or edited with caution, so they stay out of view unless you deliberately ask to see them. In addition to hidden configuration files such as .bash_profile, every directory contains two special entries, . and .., which represent the current directory and the parent directory respectively. These are the minimum contents of any UNIX directory and are required so the user can navigate the filesystem.
D. Viewing files
We've already seen that we can take a quick peek at the contents of small files with cat:
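For a small file, cat is all you need. A sketch with an invented two-line file:

```shell
cd "$(mktemp -d)" || exit 1
printf 'line one\nline two\n' > file1

# cat copies the whole file to the screen at once, with no paging
cat file1
```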
Larger files require one of the pager programs, more or less: cat has no paging function, so most of a long file scrolls off the screen before you can read it. It is worth examining these two pagers in a little more detail.
Of the two pagers, more is the standard UNIX file pager, included in every distribution. It displays the file one screenful at a time: press the space bar to advance a page, or the Enter key to advance one line. To search forward for text that has not been displayed yet, type a slash followed by the text you are looking for:
Notice where I entered "/yp.conf": it was typed over the "--More--(52%)" prompt that marked the end of the first page of text. In response, more skipped ahead to the matching text (near the end of the file) and displayed it. When more reaches the end of the file, it exits automatically and a command prompt returns. Typing "q" at any time before then exits more immediately and returns a command prompt.
More has two big disadvantages. Most versions of more can't go backwards: you can only page forward through the file. It also has far less functionality than less.
Less, on the other hand, has a great deal of sophisticated functionality. It can search backwards for text and move backwards through the file, and many other functions make it a very useful viewing program. The biggest disadvantage of less is that it is not installed by default on many flavors of UNIX. Less is a GNU program, so it is included in Linux distributions, and because the source code is freely available, it can usually be built and installed at no cost where it is missing.
E. Printing files
Printing files in UNIX can be a real pain if you are the one who has to configure the printer. For text-only files there are simple programs such as "lp" (line printer) and "lpr"; for more complicated files you might have to use PostScript formatting programs such as "enscript". How printing works is not only machine-dependent but also network-dependent. When you go to work in a UNIX shop, one of the first "dumb" questions you will have to ask is: How do I print? Write down what you are told!
F. File maintenance commands
In Windows, most file maintenance is done with My Computer (a Macintosh-like interface), Explorer (similar to the old File Manager in Windows 3.x and to file managers on other windowing systems), or possibly "by hand" at the DOS or CMD prompt. These interfaces let you copy, rename, move or delete files: the basics of file maintenance.
We have already seen how to list the files in a directory using "ls" and display information about them, so now we need to learn how to do these basic maintenance operations the UNIX way.
Copying a file
The file copying program in UNIX is called "cp". Take a minute and examine the man page for cp. Copying a file with cp is pretty straightforward:
cp syntax: cp old_file_name new_file_name
So, for example:
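One way to reproduce this example in a scratch directory (the text inside file1 is invented):

```shell
cd "$(mktemp -d)" || exit 1
echo "contents of file1" > file1

cp file1 newfile    # make a copy called newfile
cat newfile         # same text as file1
ls                  # both files are now present
```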
What we just did here was to make a copy of file1 called newfile. Then we looked at newfile with cat to show it had the same content, and ran ls to show the new file was in our home directory.
You can also use cp to make a copy of a file in a different location. Suppose we make a new directory "temp" inside our home directory and want to put an exact copy of file1 inside it. The syntax is similar. First, make the new directory with mkdir:
Now just copy the file into temp and verify that it is there and has the expected content:
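Making the directory and copying into it might look like this, again in a scratch directory with an invented file1:

```shell
cd "$(mktemp -d)" || exit 1
echo "contents of file1" > file1

mkdir temp          # the new directory
cp file1 temp       # second argument is a directory, so this creates temp/file1
ls temp
cat temp/file1
```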
There are several new wrinkles here - we'll get into the details of directories in the next section. The point is, if the second argument to cp is a directory name, cp knows to make a file of the same name inside the directory rather than trying to make a copy named temp in the same directory. Temp is NOT overwritten. Had there been no directory named temp and we tried the same command, cp would have made a copy of file1 named temp. The distinction is important - cp is context sensitive and (for this purpose) smart.
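To see the context sensitivity directly, the sketch runs the same cp command twice: once where temp is a directory and once where no temp exists. Two separate scratch directories keep the cases apart.

```shell
# Case 1: temp is an existing directory, so the copy goes inside it
dir1=$(mktemp -d) && cd "$dir1" || exit 1
echo "contents of file1" > file1
mkdir temp
cp file1 temp
ls -l temp            # lists temp/file1

# Case 2: no temp exists, so cp creates an ordinary file named temp
dir2=$(mktemp -d) && cd "$dir2" || exit 1
echo "contents of file1" > file1
cp file1 temp
ls -l temp            # temp is now a regular file, a copy of file1
```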
Many commands in UNIX are context-sensitive. The exact same command responds differently depending on the environment that it finds. Be careful!
Renaming a file
Often a file needs to be renamed. We do this in UNIX with a command called mv, short for move. The syntax is similar to that for cp:
mv syntax: mv old_name new_name
So, for example:
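A scratch-directory version of the rename (the file contents are invented):

```shell
cd "$(mktemp -d)" || exit 1
echo "contents of newfile" > newfile

mv newfile file4     # rename: newfile disappears, file4 appears
ls
cat file4            # contents are unchanged
```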
We simply renamed newfile to file4. The contents are unchanged.
Moving a file to a different location
Suppose we wanted to put file4 in the temp directory. The same command we just used to rename the file will also move it.
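The move described here can be sketched in a scratch directory as follows; the commands follow the order explained next.

```shell
cd "$(mktemp -d)" || exit 1
echo "contents of file4" > file4
mkdir temp

ls            # file4 and temp
ls temp       # empty so far
mv file4 temp # move file4 into the temp directory
ls            # only temp remains
ls temp       # file4 is now here
```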
First, we listed the files in the current directory with ls. Then we used ls temp to view the contents of the temp directory, and moved file4 into temp with mv. Finally, we used ls again to view the contents of the current directory (file4 is now missing) and of temp (file4 is now there).
Note that moving a file differs from copying it: moving removes the file from the old location and places it in the new one, while copying places a duplicate in the new location, so we end up with two copies of the file.
What happens if we try to "step on" a file that is already there? Let's try to move another file to temp, but with the name file4:
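The sketch below reproduces the accident in a scratch directory, then shows mv's -i (interactive) option, which asks before overwriting. The "n" answer is fed in with printf so the example runs unattended; normally you would type it. The || true guards against mv returning a nonzero status when it declines.

```shell
cd "$(mktemp -d)" || exit 1
mkdir temp
echo "original file4" > temp/file4
echo "contents of file2" > file2

mv file2 temp/file4     # silently replaces the old temp/file4
cat temp/file4          # now holds the text from file2

# With -i, mv asks first; answering "n" leaves the target alone
echo "contents of file3" > file3
printf 'n\n' | mv -i file3 temp/file4 || true
cat temp/file4          # unchanged; file3 stays where it was
```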
This illustrates one of the dangers of UNIX - we moved a file to a new location and it destroyed a file that was already there with the same name. The program just silently did what we asked it to do, and did not ask for a confirmation.
This is typical of the behavior of UNIX programs - you have to be very deliberate and careful, because the commands (and the entire design philosophy of UNIX, for that matter) assume you know exactly what you are doing. This can occasionally get the unwary user into trouble.
The moral: Think before you act!
Deleting files
If you were thinking you could get in trouble with mv, you can really screw things up with the file deletion program, "rm". The syntax is very simple:
rm syntax: rm file_name
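A scratch-directory sketch of rm, followed by rm -i, which asks for confirmation first. As with mv -i, the "n" answer is piped in here only so the example runs unattended.

```shell
cd "$(mktemp -d)" || exit 1
echo "contents of file3" > file3
echo "contents of file2" > file2

rm file3            # no question asked; file3 is gone for good
ls                  # only file2 remains

# rm -i asks first; answering "n" keeps the file
printf 'n\n' | rm -i file2 || true
ls                  # file2 is still there
```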
Again, no request for confirmation - the file was just silently deleted and is now gone forever. There is no "undelete" operation possible. Be careful!!
The ultimate horror story in UNIX: as root, execute rm -rf / (remove everything, recursively and without asking, starting at the root directory).
This would delete every file on the computer, wiping the entire thing as clean as a whistle. Don't ever try it: at best you will have to reinstall the operating system, and at worst you will get fired. The command is pointed out only so you will know to avoid it and be very, very careful.
Shortcuts can make your work more efficient, and UNIX has plenty of them.
For now, we will be interested in only two so-called wildcards, the asterisk (*, "splat") and the question mark (?). In addition, we will show how to match any one of a class of characters.
The asterisk matches any string of zero or more characters. Before the command runs, the shell replaces the pattern with the list of every file name that matches it:
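The expansion can be made visible with echo, which simply prints whatever arguments the shell hands it. The scratch directory below mirrors the situation described next (two files and a directory named temp); the contents are invented.

```shell
cd "$(mktemp -d)" || exit 1
echo two > file2
echo three > file3
mkdir temp
echo one > temp/file1

echo *        # the shell expands * to: file2 file3 temp
ls *          # same as typing: ls file2 file3 temp
```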
The splat was matched by the files file2 and file3 in the directory; these were then duly listed. Then, the splat matched the directory temp, which was equivalent to the command ls temp:
The program was smart, recognizing that it had found a matching directory, and so let us know with the "temp:" notation that it was listing the contents of a directory instead of just listing files.
If we want to limit our matches more narrowly, we can use the question mark to match a single character:
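A sketch with made-up names; file10 is included to show that ? matches exactly one character, no more.

```shell
cd "$(mktemp -d)" || exit 1
touch file2 file3 file10
mkdir temp

echo file?    # one character after "file": file2 file3
ls file?
```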
This time, the wildcard only matched files of the form file?, and so skipped the temp directory and its contents.
Last, we show how to match character classes with square brackets.
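One possible version of the bracket example, in a scratch directory with invented names (fileA shows what the range excludes):

```shell
cd "$(mktemp -d)" || exit 1
touch file1 file2 file3 fileA

echo file[0-9]    # one digit after "file": file1 file2 file3, not fileA
ls file[0-9]
```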
The square brackets define a range of characters, in this case 0 through 9, that are legal matches for the single character represented by the brackets. One can also use ranges of letters of the alphabet. Suppose we wanted to see all files in the /bin directory that began with the letters d through m.
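The /bin example might look like this. The -d option keeps ls from listing the contents of any matching directories, and the output depends entirely on what is installed.

```shell
# Every entry in /bin whose name begins with a letter from d to m
ls -d /bin/[d-m]*

# Count them instead of listing them
ls -d /bin/[d-m]* | wc -l
```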
Notice that the brackets here were used to match only the first character - the rest were matched with a splat (*).
One warning is in order: ranges work because characters are represented as integers, and a range matches every character whose code falls between the two endpoints. In ASCII, six punctuation characters ([ \ ] ^ _ `) lie between Z and a, so [A-z] matches those characters as well as letters. To match only alphabetical characters, use [A-Za-z].
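To see the pitfall, create one name starting with an underscore (one of the characters between Z and a in ASCII) alongside names starting with letters. A sketch; bracket ranges can behave differently in non-C locales, which is one more reason to spell out [A-Za-z].

```shell
cd "$(mktemp -d)" || exit 1
touch Apple banana _notes

echo [A-z]*       # with ASCII ranges this also matches _notes
echo [A-Za-z]*    # letters only: Apple banana
```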
Square brackets can also list individual character choices. For example, suppose we just wanted to match everything in the /etc directory that begins with "D", "a" or "f". We simply list them in the brackets:
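Since the contents of /etc vary from system to system, this sketch uses made-up names in a scratch directory; the same pattern works on /etc itself.

```shell
cd "$(mktemp -d)" || exit 1
touch DIR_COLORS aliases fstab hosts passwd

echo [Daf]*     # names starting with D, a or f
# On a real system: ls -d /etc/[Daf]*
```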
These are examples of more general pattern matching capabilities included in almost all UNIX programs. Another, more advanced course will deal with these and the regular expressions which allow very sophisticated pattern matching using many UNIX utilities. This capability is one of the strong points of UNIX and accounts for a great deal of the flexibility of the operating system.
G. Permissions and file security
Now it is time to review part of the strange output of ls -l in more detail and talk about the file-level security which has been a built-in feature of UNIX almost from the beginning.
We are concerned here with the first, third and fourth columns of data.
If you look at the first column, you'll notice that file2 and file3 have entries that start with a dash. This indicates they are ordinary files. On the other hand, temp has a "d" in this location, which means temp is a directory.
This column of data identifies both the file type (ordinary, directory, link or device) and the permissions of the file. Notice that after the first character, there are nine more characters with one of "r", "w", "x" or "-". These are the permission bits. The permission bits are either unset (with a value of "-") or set. A fully set permission would look like "rwxrwxrwx". Notice that this is really three sets of three bits - one each for read, write and execute permissions for each of three security groupings. The first set of three bits is for the owner of the file - that's the user named in the third column of ls -l.
The second set of three bits is for the group members named in the fourth column of ls -l. The third set of three bits is for everyone else - formally other(s). Please note that these permissions are relative to the owner and group - they have no absolute meaning by themselves.
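The sketch below splits the first column of ls -l into its four parts with cut, which makes the owner/group/others grouping easy to see. The file name and the mode 754 (octal notation, covered at the end of this section) are arbitrary choices.

```shell
cd "$(mktemp -d)" || exit 1
touch file2
chmod 754 file2

mode=$(ls -l file2 | cut -c1-10)    # e.g. -rwxr-xr--
echo "type:   $(printf '%s\n' "$mode" | cut -c1)"
echo "owner:  $(printf '%s\n' "$mode" | cut -c2-4)"
echo "group:  $(printf '%s\n' "$mode" | cut -c5-7)"
echo "others: $(printf '%s\n' "$mode" | cut -c8-10)"
```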
We can modify the permission bits with a special command called chmod - short for 'change mode'. The permission set for a file is often called the "mode" of the file.
First, let's create a test file called test:
Look at the permissions of test:
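Here is one way to create and inspect the test file. The umask 002 setting is an assumption that reproduces the rw-rw-r-- permissions described next; many systems default to 022, which gives rw-r--r-- instead. The file's text is invented.

```shell
cd "$(mktemp -d)" || exit 1
umask 002                          # owner and group get write permission on new files

echo "This is a test file" > test  # create the file
ls -l test                         # first column: -rw-rw-r--
```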
The owner of the file is student, and student is also the name of the group. According to the permissions, user student and members of the student group can read and write this file, while others can only read it. (Note that root falls into "others" here; however, root bypasses the permission bits and can write to the file anyway!)
Let's remove the write permissions from the file and see what happens.
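Removing write permission can be sketched like this; a-w takes w away from all three classes. The scratch file and umask match the assumptions used for creating test.

```shell
cd "$(mktemp -d)" || exit 1
umask 002
echo "This is a test file" > test

chmod a-w test       # remove write permission for user, group and others
ls -l test           # now -r--r--r--
```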
We will need to learn a new command to make writing to this file easier. The command "echo" writes its arguments to the standard output:
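A minimal sketch of echo on its own and with its output sent to a file by ">" (the file name out is made up):

```shell
cd "$(mktemp -d)" || exit 1

echo Hello                 # prints Hello on the terminal
echo "Hello, world" > out  # > sends the output to a file instead
cat out
```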
We can add text to the end of the file with a special redirection operator: ">>" appends to the end of the file, whereas ">" overwrites the file. Let's try it:
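A sketch of the attempt. On an ordinary account the shell reports "Permission denied"; as root the append succeeds, because root is not bound by the permission bits, so the if/else simply reports which happened.

```shell
cd "$(mktemp -d)" || exit 1
umask 002
echo "This is a test file" > test
chmod a-w test

# >> appends; without write permission the shell cannot open the file
if echo "A second line" >> test; then
    echo "append succeeded (probably running as root)"
else
    echo "append refused: no write permission"
fi
cat test
```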
We cannot write to test because we don't have write permission. So, add write permission back:
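Restoring write access and appending might look like this. Using ug+w (rather than only u+w) is an assumption that brings the file back to its original rw-rw-r-- mode.

```shell
cd "$(mktemp -d)" || exit 1
umask 002
echo "This is a test file" > test
chmod a-w test

chmod ug+w test               # give write permission back to owner and group
echo "A second line" >> test  # the append now works
cat test
ls -l test                    # -rw-rw-r-- again
```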
As you can see, we have now added a new line of text to the test file.
What happens if we try to execute this file? To do this, I need to tell the shell where the file is located. This seems silly, but the shell only looks for commands in the directories listed in the PATH variable, which usually does not include the current directory. The file is in the current directory, so if I type "./test", the system will try to execute it:
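While no execute bit is set, running the file by its path fails, even for root. In this sketch an if/else reports the failure instead of stopping the script.

```shell
cd "$(mktemp -d)" || exit 1
umask 002
echo "This is a test file" > test

# ./test means "the file named test in the current directory"
if ./test; then
    echo "ran"
else
    echo "could not run ./test (no execute permission)"
fi
```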
We don't have execute permission. More important, there is no executable content in the file. Let's try putting something in that might execute.
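Putting a command into the file does not help on its own, because the execute bit is still missing. The #!/bin/sh first line is an addition in this sketch (it tells the system which interpreter to use) and may not match the original example.

```shell
cd "$(mktemp -d)" || exit 1
umask 002

# Overwrite test with a tiny shell script
printf '#!/bin/sh\necho Hello\n' > test

if ./test; then
    echo "ran"
else
    echo "still no execute permission, so Hello was not printed"
fi
```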
We still did not get any output. Permission to execute was denied. Let's try adding execute permission to the file:
Something has changed: the file now has "x" permissions in all three slots, and the file name has a splat after it (the marker ls adds to executable files when given the -F option). So:
Now the file executes, printing "Hello" to the screen.
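A sketch of this step. With the umask 002 assumed here, chmod +x turns on all three x bits, and ls -F marks the executable with a trailing *. If /tmp is mounted noexec on your system, run it from a scratch directory elsewhere.

```shell
cd "$(mktemp -d)" || exit 1
umask 002
printf '#!/bin/sh\necho Hello\n' > test

chmod +x test     # add execute permission (all three classes, given this umask)
ls -lF test       # -rwxrwxr-x ... test*
./test            # prints Hello
```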
Now's a good time to look at the man page for chmod.
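Here are some details of how the command works. The users of the system are divided into classes: u (the user who owns the file), g (the file's group), o (others) and a (all three classes).
There are operators that are used to assign permissions: + (add the listed permissions), - (remove them) and = (set exactly the listed permissions, clearing the rest).
The basic permissions are r (read), w (write) and x (execute).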
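Putting the pieces together, a symbolic mode is written as class, operator, permissions, and several such clauses can be joined with commas. A short sketch on a scratch file (the modes chosen are arbitrary):

```shell
cd "$(mktemp -d)" || exit 1
touch test

chmod u=rwx,g=rx,o= test   # owner: all; group: read+execute; others: nothing
ls -l test                 # -rwxr-x---

chmod g-x,o+r test         # take x from group, give r to others
ls -l test                 # -rwxr--r--
```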
Suppose we wanted only the owner of the file to have all permissions. First, we would unset all permissions:
Now we can't look at the file:
We can't write to the file:
We can't execute the file:
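The steps above can be sketched as follows. As an ordinary user all three attempts fail; as root, reading and writing still succeed (root bypasses the r and w bits), but executing does not, since no x bit is set.

```shell
cd "$(mktemp -d)" || exit 1
printf '#!/bin/sh\necho Hello\n' > test

chmod a= test        # clear every permission bit: ----------
ls -l test

cat test || echo "cannot read test"
echo "more" >> test || echo "cannot write test"
./test || echo "cannot execute test"
```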
Let's add the permissions back:
We haven't yet tried writing to the file since restoring the permissions. Since it is executable, we need to be careful to write only executable content. So, let's put something else there to show we can now write to it:
Only the owner of the file and root can modify the permissions of the file. It is possible to create odd permissions, such as locking yourself out of your own file:
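Restoring the owner's permissions and writing new executable content might look like this; the "Goodbye" script is an invented replacement.

```shell
cd "$(mktemp -d)" || exit 1
printf '#!/bin/sh\necho Hello\n' > test
chmod a= test

chmod u+rwx test                           # owner gets everything back: -rwx------
printf '#!/bin/sh\necho Goodbye\n' > test  # writing works again; the mode is kept
./test                                     # prints Goodbye
```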
but this is a pretty silly thing to do.
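One way to lock yourself out, sketched on a scratch file: give the owner no permissions while group and others keep full access. The system checks the owner's bits first and stops there, so the owner is refused even though everyone else is allowed (root excepted). The owner can still repair the mode, because chmod requires ownership, not read or write permission.

```shell
cd "$(mktemp -d)" || exit 1
echo "my data" > test

chmod u=,go=rwx test    # owner: nothing; group and others: everything
ls -l test              # ----rwxrwx
cat test || echo "the owner is locked out"

chmod u=rw test         # the owner can still change the mode
cat test
```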
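There is another way to set permissions: octal (base 8) notation. Each permission has a numeric value: read = 4, write = 2 and execute = 1. Adding the values for one class gives a single digit from 0 to 7, and the three digits are written in the order owner, group, others.
Octal numbers are conventionally written with a leading "0" to distinguish them from decimal numbers; chmod always reads a numeric mode as octal, so 0644 and 644 mean the same thing. Add the values for the permissions you want in each class and use the resulting number as the first argument to chmod: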
No permissions for anybody: chmod 0000 test
Read permissions for everybody: chmod 0444 test
Read and execute permissions for owner and group only: chmod 0550 test
All permissions for owner, read and execute permissions for group, execute only for others: chmod 0751 test
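To make the arithmetic concrete, here is a small shell function that converts a nine-character permission string into the octal number chmod expects, adding 4 for r, 2 for w and 1 for x within each group of three. The function name perm_to_octal is made up for this sketch; it is not a standard command.

```shell
# perm_to_octal rwxr-x--x  ->  751
perm_to_octal() {
    perms=$1
    result=""
    for start in 1 4 7; do                      # owner, group, others
        triple=$(printf '%s\n' "$perms" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac
        case $triple in ?w?) digit=$((digit + 2)) ;; esac
        case $triple in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    echo "$result"
}

perm_to_octal rwxr-x--x    # 751
perm_to_octal r--r--r--    # 444
perm_to_octal r-xr-x---    # 550

# Apply one of the results to a scratch file and check it
cd "$(mktemp -d)" || exit 1
touch test
chmod 0751 test
ls -l test                 # -rwxr-x--x
```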
Octal notation is fast and is the preferred way to change mode among experienced UNIX users; fluency with it is usually expected of anyone working on UNIX systems, so it pays to practice until you have mastered it. Octal notation is also used elsewhere in UNIX (for example, with umask), so it has applications beyond chmod.