Project 2: Implement a Simple File System
(c) 1996, Howard E. Motteler
Assigned Tue Nov 12; Due Thu Dec 5; 100 pts
Project Goals
The purpose of this project is to learn about file systems and to
gain hands-on experience with specific details of file system
design and implementation.
The Project
You are to design and implement a simple file system. You will
write a set of routines to create, open, close, read, write, list
and delete files stored on a simulated hard disk, supplied by the
instructor.
Specifications
You are to implement the following eight procedures:
-
int open_fs(char *fname)
Open an existing file for reading. If the open is successful,
open_fs() returns a file descriptor or channel value that is used by
getc_fs(), and close_fs(). If file fname does not exist, then
open_fs() returns -1 to indicate an error.
-
int create_fs(char *fname)
Creates the file fname with initial size 0 and returns a channel,
opened for writing. If the file fname already exists, or if there
is no more space, then create_fs() returns -1.
-
int getc_fs(int chan)
Returns the next character from the file opened for reading on
channel chan and advances the read pointer one byte, so that
successive calls return successive characters from the file.
Returns -1 at end of file or if channel chan is not opened for
reading.
-
int putc_fs(int c, int chan)
Writes character c on channel chan, and advances the write pointer.
Returns c if the write is successful, and -1 if chan was not opened
for writing or if there is no more space.
-
int close_fs(int chan)
Closes the file opened on channel chan. Causes any buffered,
unwritten values to be written to the disk. Returns 0 for a
successful close and -1 if the buffered values can't be
written.
-
int del_fs(char *fname)
Deletes the file fname; returns 0 if the delete was successful
and -1 if the file does not exist.
-
void dir_fs(int *dsize, struct dir_info *dinfo)
Returns the number of entries in the current directory and an array
records containing file names and sizes. The structure dir_info is
defined in the include file fs.h.
-
void format_fs()
Initialize or reset the file system. Any stored data is lost.
All the procedures except dir_fs() and format_fs() report errors by
returning -1. You can use printed messages for your own debugging,
but the final version of the routines that you submit should not do
any printing.
These eight functions and procedures are the interface to your file
system and should be named exactly as above, so your code can be
linked with test procedures in fstest. The file "fs.h" contains
prototypes for the procedures and the definition of the "dir_info"
structure. It also contains parameters for your file system such as
the longest file name (FNLEN) and the maximum number of open
channels (NCHAN).
Each process should be able to have up to NCHAN files open at a
time. A process should be able to open a single file for reading
more than once, and to read at different positions in a file, for
each open channel. Two or more processes should be able to read the
same file at the same time. The result of multiple writers and/or
simultaneous readers and writers is not specified.
While it is OK to have a sector-sized buffer for each open file,
this is not an exercise in caching, and aside from your
directory structures (and FAT table, i-nodes, etc.) you should not
buffer more than one sector at a time. Do not use anything but
diskop() for storage; in particular, do not use the Unix open(),
read(), write(), etc., to implement your routines.
The diskop() procedure
The provided diskop() procedure emulates a low-level ROM or BIOS
call to a hard disk. Diskop() is called as follows:
diskop(sec, buf, fun);
sec is an integer sector address, from 0 to TOTSECT-1
buf is a char pointer to an array of SECSIZ characters
fun is an integer: either READ or WRITE
The file "disk.h" defines values for SECSIZE, NSECT, NTRACK, NSURF,
that determine the size of the disk, as well as values for READ and
WRITE, and TOTSECT. You should use these names, rather than the
corresponding values, in your project. Although the disk parameters
are subject to change, you can assume (1) that the simulated disk
will never have more that 256 sectors, so that a pointer to a sector
will always fit in one byte, (2) that SECSIZ will not change, and
(3) that the total number of sectors will not vary greatly.
Diskop() simulates a real disk with a file "SIMDISK" that is created
in your home directory. Each call to diskop() opens this file, does
the read or write, and then closes the file. If a file called
SIMDISK does not exist in your home directory, diskop() creates it.
See the file "diskop.c" for more details.
You should not edit the SIMDISK file, or modify it in any way,
except via the diskop() procedure. If you change the length of
SIMDISK, the diskop() routine may give you "seek errors." You can
examine the SIMDISK directly (for example, with hexdump) but in
general the best approach to debugging is to write procedures to
print your data structures in an orderly fashion.
Discussion and Suggestions
The first step in doing this project is to read the documentation
carefully. The next step is to choose a design; FAT tables,
i-nodes, CP/M-style systems, and various other schemes can all be
made to work effectively. A demonstration solution was implemented
using "file index blocks," a sort of large floating i-node that
included file name, attributes, and pointers to either data blocks
or other index blocks. While this particular design is not
necessarily recommended, it does suggest the variety of approaches
that are possible.
The diskop() procedure reads and writes data SECSIZ bytes at a time.
To read and write data structures such as i-nodes, FAT tables, and
directory records, you must map them onto one or more sector-sized
chunks. You can do this formally in C with a union, or you can
simply use casts; for example you could cast an array of structures
as a char*, and use this pointer (possibly with one or more
sector-sized offsets) as the buffer address for diskop(). The
procedure format_fs() creates your file system, and should
initialize the SIMDISK with your directory data structures, free
list, and FAT table or whatever other data structures you are using.
In addition to the eight file system interface procedures defined
above, you will need some procedures of your own, for example to
maintain a list of free blocks or to search your open file table.
You will also need data declarations for things such as file
descriptors and buffers. To aid in debugging your file system, you
should write procedures to print out your open file table, directory
records, and any other data structures you are using.
You will need an "open file table" of some kind, to keep track of
open files. The "channel" value returned by open_fs() and
create_fs() is an index into this table. The open file table
belongs to a particular process, and can be an array of structures,
defined in the file containing your other procedures. A typical
open file table might look something like this:
/* open file table structure
*/
struct {
char name[FNLEN]; /* file name */
int size; /* file size */
int rwpos; /* read or write position in file */
int ptr; /* 1-st block, i-node, whatever */
int mode; /* indicates read or write */
char rwbuf[SECSIZ]; /* buffer for this channel */
} open_file_table[NCHAN];
Some of this open file table information may duplicated in your
directory structure; you must decide what information goes
where.
A good file system design shouldn't have too much space taken up by
directories, i-nodes, index blocks, or whatever data-structures you
are using. In a small file system, at least 80% of the disk space
should be available for user data. Ideally, if you had 100 sectors,
it should be possible to have either a large file of 80 sectors or
80 small 1-sector files. The latter may be difficult. For example,
suppose we have 100 64-byte sectors and are using a DOS/FAT type
file system. The FAT table would take up two sectors, leaving only
18*64/80 = 14 bytes per directory record, which is probably not
enough. A few more sectors would have to be reserved for directory
records, and we would not be able to have 80 small files.
A useful trick if the sector size is too small for a particular file
system design is to work with "blocks" that are some fixed multiple
of the sector size, and to write simple procedures to read and write
to the disk in terms of blocks rather than sectors.
There is a tradeoff between head motion and storage efficiency. For
example, allocating files contiguously would minimize head motion,
but would also waste a lot of disk space. Both storage efficiency
and head motion will be taken into account in evaluating your
project.
Real Unix File File System Calls
In Unix, open() returns a "file descriptor" which is used by read()
and write(), while fopen() returns a "file pointer," a value of type
FILE *, which is used by getc() and putc(). The Unix fopen() is
more "user friendly" than open(). For this project we have only
"file descriptors," which are used both by getc_fs() and putc_fs().
You may want to read the sections in man(2) on the unix routines
open(), creat(), read(), and write(), and the sections in man(3)
about fopen(), getc() and putc(), to see how the real unix i/o
works.
Compiling and Testing Your Program
Do not include a "main()" in the code you submit; that is what
fstest is for. A makefile is provided that will compile a file
"fs.c" containing your procedures and data structures. The makefile
will link your routines with the program "fstest." Fstest allows
your file system to interact with unix files, and can perform a
number of more through tests as well. Fstest also prints out stats
for head moves and diskop() calls. See the documentation at the
beginning of the file "fstest.c" for more information.
Grading
Make sure you have read the general
information on programming projects. Approximately 20% of your
grade is for documentation, another 20% percent is for design, and
the remainder is based on how well your project works. You should
describe your file system's design and implementation at the
beginning of your project; this initial description is worth 15% of
the project grade. Describe your data structures, and briefly say
how each relevant routine acts on those data structures. Do not
ramble incoherently, and do not simply echo the specifications in
this document. The description should be about a page long, and
should not be more than two pages long; unless you have something
very interesting to say, longer descriptions will loose points.
You project will be tested with fstest; the more tests your file
system passes, the more points it will get. Crashing, losing data,
etc., will cost points; for example, crashing when your file system
is full will cost about 10 points. Frenzied head movement and
excessive calls to diskop() will be penalized. In particular, you
should not have to call diskop() for every call of getc_fs() or
putc_fs(). Filesystems that do not work at all, or that hold only
one file at a time, or that call diskop() for every character, will
get very few points; projects that do not compile get 0 points.
You program should work with the supplied versions of diskop.c and
fstest.c, disk.h and fs.h; when I compile and test you program for
grading, I will use the original versions, with possible minor
changes to disk.h as discussed above. The supplied makefile will
compile your file systems procedures and link them to fstest. If
you want to modify the makefile, say to link other libraries, check
with me first, and submit your makefile in addition to your other
files.
You must do the project described here. Doing some other similar or
dissimilar project, no matter how difficult or clever, may be worth
0 points. This is not a group project; please do your own work, and
be careful about sharing your code. It is OK to discuss design
issues, and/or to use some real filesystem, such as Unix, CP/M, or
MS/DOS, as a source of ideas, but in your documentation you should
give credit to your sources.
Extra points for hierarchical directories
Hierarchical directories (in contrast to the default "flat"
directory structure implicit in the preceding specifications) are
worth 10 extra points. Procedures mkdir_fs() and rmdir_fs() create
and delete directories, and cd_fs() changes the current working
directory; these are outlined in "fs_skel.c". For hierarchical
directories, the procedures open_fs(), create_fs(), del_fs(), and
dir_fs() should all work with respect to the current working
directory.
What to turn in
Use the UCS submit program to turn in
a single file "fs.c" that contains your file system implementation,
including your data declarations and related procedures. The
command to submit your project is: submit cs421_0101 proj2
fs.c
Getting Started
A tar file is provided, containing files for
this project. You should read this document, the "man page"-style
comments at the start of diskop.c and fstest.c, and the header
files fs.h and diskop.h. The file fs_skel.c contains an outline of
the project and a number of hints; you may want to rename it fs.c
and use it as a starting point for your own coding.
Files
The files in the tar file are:
diskop.c code for the diskop() routine
disk.h simulated disk parameters
fs.h file system parameters, structures, and prototypes
fs_skel.c skeletal fs.c with a few hints
fstest.c program to test your file system
Makefile makefile to compile everything
Hints and Tips
- A perfect project (100 points) that has working hierarchical
directories (+10 points) and is turned in two or more days early (10%
bonus) is worth 121 points.
- For the current parameters in disk.h, I will test your project
with up to 60 "small" files, of less than 128 bytes each; with NCHAN
"medium" files, no more than 2800 bytes each; and with one "large"
file of no more than 14000 bytes. If the particular data structures
you have chosen don't leave you with enough space to successfully
complete all these tests, you should mention this in your comments
and suggest values that will work; it is not too serious if you
can't quite reach all the limits.
- Do not modify the supplied header files. If you absolutely
must use your own header file, give it a new name, e.g., "myfs.h",
and don't forget to submit it along with your fs.c file.
- The structure "dir_info" is probably not what you will want to
use for storing your directories on disk; dir_info is only intended
to give the directory routine "dir_fs()" something specific to
return.
- If you are implementing hierarchical directories, your cd_fs()
procedure should recognize ".." as the parent directory, but you
don't have to parse things like "../.." You will need some way of
finding the parent directory; you might have an explicit directory
record for ".." or you can simply "hard-code" this feature in your
cd_fs() procedure.
- Since all the reads and writes are sequential, you only need to
buffer to buffer one block of file data at a time. You should also
keep copies of things like directory records, i-nodes, or FAT tables
in memory, but be careful to write things back to disk in a reliable
manner. If your filesystem works correctly in the fstest
"interactive" mode but not in the "command line" mode, it probably
means your are failing to write or read some file system
data-structures (such as your directory records) to or from the
SIMDISK.
- A sample run of the "test1" script is
available. Please note that you do not want to try and
match the SIMDISK stats there exactly; the values for diskop() calls
and track distance can vary quite a lot, depending on the data
structures you are using. If your stats are more than about 1.5
times the sample values in the "grind" tests, you should probably
consider a more efficient approach. It is also possible to get
considerably lower diskop() stats than in the sample run,
as the demo program doesn't try to cache i-nodes or directory
structures, but this is not required.
- You can use the fstest "stat" command and stat_fs() prototype
to implement dump or test routines, as an aid in debugging.
- If something isn't in the spec, you don't have to do it! You
don't need: random access; file date, owner, or protection codes;
there is no "open for write" aside from the "create" command; child
processes do not have to share file position pointers with
the parent process, as in Unix; and so on.