Project 2: Building a Simple File System

(c) 1996, Howard E. Motteler

Assigned Wed Apr 3; Due Mon Apr 29; 100 pts


Project Goals

The purpose of this project is to learn about file systems and to gain hands-on experience with specific details of file system design and implementation.

The Project

You are to design and implement a simple file system. This involves writing a set of routines to create, open, close, read, write, list and delete files that are stored on a simulated hard disk. This simulated disk is implemented in a procedure "diskop()" suppled by the instructor. A program "fstest.c" for testing your file system procedures is also provided.

Specifications

You are to implement the following eight procedures:

  1. int fs_open(char *fname)
    Open an existing file for reading. If the open is successful, fs_open() returns a file descriptor or channel value, an integer that is used by fs_putc(), fs_getc(), and fs_close(). If file fname does not exist, then fs_open() returns -1, to indicate an error.

  2. int fs_create(char *fname)
    Creates the file fname, with initial size 0, and returns a channel value, opened for writing. If the file fname already exists, or if there is no more space, then fs_create() returns -1.

  3. int fs_getc(int chan)
    Returns the next character from the file opened for reading on channel chan and advances the read pointer so that successive calls return successive characters from the file. fs_getc() returns -1 at end of file or if channel chan is not opened for reading.

  4. int fs_putc(int c, int chan)
    Writes character c on channel chan, and advances the write pointer. Returns c if the write is successful, and -1 if chan was not opened for writing or if there is no more space.

  5. int fs_close(int chan)
    Closes the file opened on channel chan. Causes any buffered, unwritten values to be written to the disk. Returns 0 for a successful close and -1 if the buffered values can't be written.

  6. int fs_del(char *fname)
    Deletes the file fname; returns 0 if the delete was successful and -1 if the file does not exist.

  7. struct dir_list* fs_dir()
    Returns a pointer to a structure holding directory information (a list of file names and sizes). This structure dir_list is defined in the include file fs.h.

  8. void fs_format()
    Initialize or reset the file system. Any stored data is lost.

All the procedures except fs_dir() and fs_format() should report errors by returning -1.

These eight functions and procedures form the interface to your file system and should be named exactly as above, so that your code can be linked with test procedures in fstest. The file "fs.h" contains prototypes for the procedures described above, and the definition of the "dir_list" structure. It also contains parameters for your file system such as the longest file name (FNLEN) and the maximum number of open channels (NCHAN).

A single process should be able to have up to NCHAN files open at a time, should be able to open a single file for reading more than once, and should be able to read at different positions in a file for each open channel. Two or more processes should be able to open the same file for reading, and should be able to read it correctly. The result of multiple writers and/or simultaneous readers and writers is not specified.

You can use printed messages for your own debugging, but the final version of the routines that you submit should not do any printing, except optionally to report error conditions (other than end-of-file from fs_getc) that cause a procedure to return an error. Please print any such message to stdout, not stderr.

NOTE: While it is OK to have sector-sized buffers for each open file, this is not an exercise in caching, and aside from your directory structures you should not buffer more than two sectors at a time. Do not use anything but diskop() for storage; in particular, do not use the Unix open(), read(), write(), etc., to implement your routines.

The diskop() procedure

Diskop() is called as follows:
    diskop(sec, buf, fun);

    sec   is an integer sector address, from 0 to TOTSECT-1
    buf   is a char pointer to an array of SECSIZ characters
    fun   is an integer: either READ or WRITE
The file "diskop.h" defines values for SECSIZE, NSECT, NTRACK, NSURF, that determine the size of the disk, as well as values for READ and WRITE, and TOTSECT. You should use these names, rather than the corresponding values, in your project. Although the disk parameters are subject to change, you can assume (1) that the simulated disk will never have more that 256 sectors, so that a pointer to a block will always fit in one byte, and (2) that SECSIZ (the number of bytes in a sector) will be a power of 2, and will be at least 64.

Diskop() simulates a real disk with a file, called SIMDISK, that is created in your home directory. Each call to diskop() opens this file, does the read or write, and then closes the file. If a file called SIMDISK does not exist in your home directory, it is first created. See the file "diskop.c" for more details.

You almost certainly should not edit the SIMDISK file, or modify it in any way, except by using the diskop() procedure. If you change the length of SIMDISK to less that the defined value, the diskop() routine will give you seek errors. It may be useful to examine SIMDISK in debugging your procedures; to help in testing, you can save versions of SIMDISK by renaming it or by moving it to a subdirectory.

Discussion & Suggestions

The first step in doing this project is to read the documentation carefully. The second step is to choose a design--FAT tables, i-nodes, CP/M-style systems, and various other schemes can all be made to work effectively. A demonstration solution was implemented using "file index blocks", a sort of large floating i-node that included file names, attributes, and pointers to either data blocks or other index blocks; this design is not particularly recommended, but does suggest the variety of approaches that are possible.

In addition to the eight file system interface procedures you will probably want to write some procedures of your own, for example to maintain a list of free blocks or to search your open file table. You will also need some data declarations for things such as file descriptors and buffers. You will need an "open file table" of some kind, to keep track of open files. The "channel" value returned by fs_open() and fs_create() is an index into this table. The open file table belongs to a particular process, and it can simply be an array of structures, defined in the file containing your other procedures. A typical open file table might look something like the following:

/*  open file table structure
 */
struct {
	char name[FNLEN];    /* file name */
	int size;	     /* file size */
	int rwpos;	     /* read or write position in file */
	int ptr;	     /* 1-st block, i-node, whatever */
	int mode;	     /* indicates read or write */
	char rwbuf[SECSIZ];  /* buffer for this channel */

	} open_file_table[NCHAN];
You will also need to choose data structures for your file blocks, directory, and free list. For whatever directory structure you choose, you will probably want a corresponding "in-memory" directory record structure that you can copy to and from the disk as a unit. Some of the open file table information may duplicated in this directory structure; you must decide what information to put where. To aid in debugging your file system, you may want to write procedures to print out your open file table, directory records, and any other data structures you are using.

The procedure fs_format() creates your file system; this procedure should initialize your directory data structures, free list (if you have one), etc. If the file SIMDISK does not exist when when fs_format() first calls diskop(), diskop() will create it.

In designing your file system, note that there is a tradeoff between head motion and storage efficiency. For example, allocating files contiguously would minimize head motion, but would also waste a lot of disk space. Both storage efficiency and head motion will be taken into account in evaluating your project.

A good file system shouldn't have too much overhead taken up by internal structures such as i-nodes, index blocks, or whatever structures you are using. In a small file system, at least 80% of the disk space should be available for user data, so if a disk has for example 100 sectors then no more than about 20 sectors should be used for the file system structures. Ideally in such circumstances you could have either a single large file of 80 sectors or 80 small 1-sector files. The latter may not be possible; for example suppose we have 64-byte sectors and are using a DOS/FAT type file system. The FAT table would need two sectors, leaving only 18*64/80 = 14 bytes per directory record, which is probably not enough.

A useful trick if the sector size is too small for a particular file system design is to work with "blocks" that are some fixed multiple of the sector size, rather than directly with sectors.

Real Unix File File System Calls

In Unix, open() returns a "file descriptor" which is used by read() and write(), while fopen() returns a "file pointer," a value of type FILE *, which is used by getc() and putc(). The Unix fopen() is more "user friendly" than open(). For this project we have only "file descriptors," which are used both by fs_getc() and fs_putc().

You may want to read the sections in man(2) on the unix routines open(), creat(), read(), and write(), and the sections in man(3) about fopen(), getc() and putc(), to see how the real unix i/o works.

Compiling and Testing Your Program

Do not include a "main()" in the code you submit; that is what fstest is for. A makefile is provided that will compile a file "fs.c" containing your procedures and data structures. The makefile will link your routines with the main program "fstest." Fstest allows your file system to interact with unix files, and can perform a number of special, more intensive "thrash" tests as well. Fstest prints out stats for head moves and diskop() calls. See the comments at the beginning of the file "fstest.c" for more information.

Documentation and Coding Style

You should describe your file system design and implementation at the beginning of your project. This description is worth 20% of the project grade. Describe your data structures, and briefly say how each relevant routine acts on those data structures. Do not ramble incoherently, and do not simply echo the specifications in this document. Try to keep the description to not more than a page or two of text; unless you have something very interesting to say, longer descriptions will loose points.

Please include your name, SSN, and user-id at the start of your file. You can use any coding style you like, as long as you are consistent with it, and as long as block structure is reflected in the level of indentation. Procedures and major sections need at least some description, and the more tricky a section of code or procedure is, the more it needs careful documentation.

Grading

Make sure you have read the general information on programming projects. Approximately 20% of your grade is for documentation, another 20% percent is for design, and the remainder is based on how well things work.

You project will be tested with fstest; the more tests your file system passes, the more points it will get. Crashing, losing data, etc. will cost points; for example, crashing when your file system is full will cost about 10 points. Frenzied head movement and excessive calls to diskop() will also be penalized. In particular, you should not have to call diskop() for every call of fs_getc() or fs_putc(). Filesystems that do not work at all, or that hold only one file at a time, or call diskop() for every character, will get very few points; projects that do not compile get 0 points.

You program should work with the supplied routines diskop.c and fstest.c, and the corresponding header files diskop.h and fs.h; when I compile and test you program for grading, I will be using the original versions. The supplied makefile will compile your file systems procedures and link them to fstest. If you want to use C++, or have some other valid reason for modifying the makefile, check with me first, and submit your makefile in addition to your other files.

You must do the project described here. Doing some other similar or dissimilar project, no matter how difficult or clever, may be worth 0 points. This is not a group project; please do your own work, and be careful about sharing your code. It is OK to discuss design issues, and/or to use some real filesystem, such as Unix, CP/M, or MS/DOS, as a source of ideas, but in your documentation you should give credit to your sources.

Extra points for hierarchical directories

Hierarchical directories (in contrast to the default "flat" directory structure implicit in the preceding specifications) are worth 10 extra points. The procedures fs_mkdir() and fs_rmdir() to create and delete directories, and fs_cd() to change the current working directory are outlined in "fs_help.c". Note that if you implement hierarchical directories, the procedures fs_open(), fs_create(), fs_del(), and fs_dir() should all work with respect to the current working directory.

What to turn in

Use the UCS submit program to turn in a single file "fs.c" that contains your file system implementation, including your data declarations and related procedures. The command to submit your project is: submit cs421_0101 proj2 fs.c

Getting Started

A tar file is provided, containing files for this project. You should read this document, the "man page"-style comments at the start of diskop.c and fstest.c, and the header files fs.h and diskop.h. The file fs_help.c contains an outline of the project and a number of hints; you may want to rename it fs.c and use it as a starting point.

Files

Relevant files in the tar file are:
    diskop.c       code for the diskop() routine
    diskop.h       simulated disk parameters
    fs.h           file system parameters, structures, and prototypes       
    fs_help.c      skeletal fs.c with a few hints
    fstest.c       program to test your file system
    Makefile       makefile to compile everything