Project 2: Implement a Simple File System

(c) 1996, Howard E. Motteler

Assigned Tue Nov 12; Due Thu Dec 5; 100 pts


Project Goals

The purpose of this project is to learn about file systems and to gain hands-on experience with specific details of file system design and implementation.

The Project

You are to design and implement a simple file system. You will write a set of routines to create, open, close, read, write, list and delete files stored on a simulated hard disk, supplied by the instructor.

Specifications

You are to implement the following eight procedures:

  1. int open_fs(char *fname)
    Open an existing file for reading. If the open is successful, open_fs() returns a file descriptor or channel value that is used by getc_fs(), and close_fs(). If file fname does not exist, then open_fs() returns -1 to indicate an error.

  2. int create_fs(char *fname)
    Creates the file fname with initial size 0 and returns a channel, opened for writing. If the file fname already exists, or if there is no more space, then create_fs() returns -1.

  3. int getc_fs(int chan)
    Returns the next character from the file opened for reading on channel chan and advances the read pointer one byte, so that successive calls return successive characters from the file. Returns -1 at end of file or if channel chan is not opened for reading.

  4. int putc_fs(int c, int chan)
    Writes character c on channel chan, and advances the write pointer. Returns c if the write is successful, and -1 if chan was not opened for writing or if there is no more space.

  5. int close_fs(int chan)
    Closes the file opened on channel chan. Causes any buffered, unwritten values to be written to the disk. Returns 0 for a successful close and -1 if the buffered values can't be written.

  6. int del_fs(char *fname)
    Deletes the file fname; returns 0 if the delete was successful and -1 if the file does not exist.

  7. void dir_fs(int *dsize, struct dir_info *dinfo)
    Returns the number of entries in the current directory and an array records containing file names and sizes. The structure dir_info is defined in the include file fs.h.

  8. void format_fs()
    Initialize or reset the file system. Any stored data is lost.

All the procedures except dir_fs() and format_fs() report errors by returning -1. You can use printed messages for your own debugging, but the final version of the routines that you submit should not do any printing.

These eight functions and procedures are the interface to your file system and should be named exactly as above, so your code can be linked with test procedures in fstest. The file "fs.h" contains prototypes for the procedures and the definition of the "dir_info" structure. It also contains parameters for your file system such as the longest file name (FNLEN) and the maximum number of open channels (NCHAN).

Each process should be able to have up to NCHAN files open at a time. A process should be able to open a single file for reading more than once, and to read at different positions in a file, for each open channel. Two or more processes should be able to read the same file at the same time. The result of multiple writers and/or simultaneous readers and writers is not specified.

While it is OK to have a sector-sized buffer for each open file, this is not an exercise in caching, and aside from your directory structures (and FAT table, i-nodes, etc.) you should not buffer more than one sector at a time. Do not use anything but diskop() for storage; in particular, do not use the Unix open(), read(), write(), etc., to implement your routines.

The diskop() procedure

The provided diskop() procedure emulates a low-level ROM or BIOS call to a hard disk. Diskop() is called as follows:
    diskop(sec, buf, fun);

    sec   is an integer sector address, from 0 to TOTSECT-1
    buf   is a char pointer to an array of SECSIZ characters
    fun   is an integer: either READ or WRITE
The file "disk.h" defines values for SECSIZE, NSECT, NTRACK, NSURF, that determine the size of the disk, as well as values for READ and WRITE, and TOTSECT. You should use these names, rather than the corresponding values, in your project. Although the disk parameters are subject to change, you can assume (1) that the simulated disk will never have more that 256 sectors, so that a pointer to a sector will always fit in one byte, (2) that SECSIZ will not change, and (3) that the total number of sectors will not vary greatly.

Diskop() simulates a real disk with a file "SIMDISK" that is created in your home directory. Each call to diskop() opens this file, does the read or write, and then closes the file. If a file called SIMDISK does not exist in your home directory, diskop() creates it. See the file "diskop.c" for more details.

You should not edit the SIMDISK file, or modify it in any way, except via the diskop() procedure. If you change the length of SIMDISK, the diskop() routine may give you "seek errors." You can examine the SIMDISK directly (for example, with hexdump) but in general the best approach to debugging is to write procedures to print your data structures in an orderly fashion.

Discussion and Suggestions

The first step in doing this project is to read the documentation carefully. The next step is to choose a design; FAT tables, i-nodes, CP/M-style systems, and various other schemes can all be made to work effectively. A demonstration solution was implemented using "file index blocks," a sort of large floating i-node that included file name, attributes, and pointers to either data blocks or other index blocks. While this particular design is not necessarily recommended, it does suggest the variety of approaches that are possible.

The diskop() procedure reads and writes data SECSIZ bytes at a time. To read and write data structures such as i-nodes, FAT tables, and directory records, you must map them onto one or more sector-sized chunks. You can do this formally in C with a union, or you can simply use casts; for example you could cast an array of structures as a char*, and use this pointer (possibly with one or more sector-sized offsets) as the buffer address for diskop(). The procedure format_fs() creates your file system, and should initialize the SIMDISK with your directory data structures, free list, and FAT table or whatever other data structures you are using.

In addition to the eight file system interface procedures defined above, you will need some procedures of your own, for example to maintain a list of free blocks or to search your open file table. You will also need data declarations for things such as file descriptors and buffers. To aid in debugging your file system, you should write procedures to print out your open file table, directory records, and any other data structures you are using.

You will need an "open file table" of some kind, to keep track of open files. The "channel" value returned by open_fs() and create_fs() is an index into this table. The open file table belongs to a particular process, and can be an array of structures, defined in the file containing your other procedures. A typical open file table might look something like this:

/*  open file table structure
 */
struct {
	char name[FNLEN];    /* file name */
	int size;	     /* file size */
	int rwpos;	     /* read or write position in file */
	int ptr;	     /* 1-st block, i-node, whatever */
	int mode;	     /* indicates read or write */
	char rwbuf[SECSIZ];  /* buffer for this channel */

	} open_file_table[NCHAN];
Some of this open file table information may duplicated in your directory structure; you must decide what information goes where.

A good file system design shouldn't have too much space taken up by directories, i-nodes, index blocks, or whatever data-structures you are using. In a small file system, at least 80% of the disk space should be available for user data. Ideally, if you had 100 sectors, it should be possible to have either a large file of 80 sectors or 80 small 1-sector files. The latter may be difficult. For example, suppose we have 100 64-byte sectors and are using a DOS/FAT type file system. The FAT table would take up two sectors, leaving only 18*64/80 = 14 bytes per directory record, which is probably not enough. A few more sectors would have to be reserved for directory records, and we would not be able to have 80 small files.

A useful trick if the sector size is too small for a particular file system design is to work with "blocks" that are some fixed multiple of the sector size, and to write simple procedures to read and write to the disk in terms of blocks rather than sectors.

There is a tradeoff between head motion and storage efficiency. For example, allocating files contiguously would minimize head motion, but would also waste a lot of disk space. Both storage efficiency and head motion will be taken into account in evaluating your project.

Real Unix File File System Calls

In Unix, open() returns a "file descriptor" which is used by read() and write(), while fopen() returns a "file pointer," a value of type FILE *, which is used by getc() and putc(). The Unix fopen() is more "user friendly" than open(). For this project we have only "file descriptors," which are used both by getc_fs() and putc_fs(). You may want to read the sections in man(2) on the unix routines open(), creat(), read(), and write(), and the sections in man(3) about fopen(), getc() and putc(), to see how the real unix i/o works.

Compiling and Testing Your Program

Do not include a "main()" in the code you submit; that is what fstest is for. A makefile is provided that will compile a file "fs.c" containing your procedures and data structures. The makefile will link your routines with the program "fstest." Fstest allows your file system to interact with unix files, and can perform a number of more through tests as well. Fstest also prints out stats for head moves and diskop() calls. See the documentation at the beginning of the file "fstest.c" for more information.

Grading

Make sure you have read the general information on programming projects. Approximately 20% of your grade is for documentation, another 20% percent is for design, and the remainder is based on how well your project works. You should describe your file system's design and implementation at the beginning of your project; this initial description is worth 15% of the project grade. Describe your data structures, and briefly say how each relevant routine acts on those data structures. Do not ramble incoherently, and do not simply echo the specifications in this document. The description should be about a page long, and should not be more than two pages long; unless you have something very interesting to say, longer descriptions will loose points.

You project will be tested with fstest; the more tests your file system passes, the more points it will get. Crashing, losing data, etc., will cost points; for example, crashing when your file system is full will cost about 10 points. Frenzied head movement and excessive calls to diskop() will be penalized. In particular, you should not have to call diskop() for every call of getc_fs() or putc_fs(). Filesystems that do not work at all, or that hold only one file at a time, or that call diskop() for every character, will get very few points; projects that do not compile get 0 points.

You program should work with the supplied versions of diskop.c and fstest.c, disk.h and fs.h; when I compile and test you program for grading, I will use the original versions, with possible minor changes to disk.h as discussed above. The supplied makefile will compile your file systems procedures and link them to fstest. If you want to modify the makefile, say to link other libraries, check with me first, and submit your makefile in addition to your other files.

You must do the project described here. Doing some other similar or dissimilar project, no matter how difficult or clever, may be worth 0 points. This is not a group project; please do your own work, and be careful about sharing your code. It is OK to discuss design issues, and/or to use some real filesystem, such as Unix, CP/M, or MS/DOS, as a source of ideas, but in your documentation you should give credit to your sources.

Extra points for hierarchical directories

Hierarchical directories (in contrast to the default "flat" directory structure implicit in the preceding specifications) are worth 10 extra points. Procedures mkdir_fs() and rmdir_fs() create and delete directories, and cd_fs() changes the current working directory; these are outlined in "fs_skel.c". For hierarchical directories, the procedures open_fs(), create_fs(), del_fs(), and dir_fs() should all work with respect to the current working directory.

What to turn in

Use the UCS submit program to turn in a single file "fs.c" that contains your file system implementation, including your data declarations and related procedures. The command to submit your project is: submit cs421_0101 proj2 fs.c

Getting Started

A tar file is provided, containing files for this project. You should read this document, the "man page"-style comments at the start of diskop.c and fstest.c, and the header files fs.h and diskop.h. The file fs_skel.c contains an outline of the project and a number of hints; you may want to rename it fs.c and use it as a starting point for your own coding.

Files

The files in the tar file are:
    diskop.c       code for the diskop() routine
    disk.h         simulated disk parameters
    fs.h           file system parameters, structures, and prototypes       
    fs_skel.c      skeletal fs.c with a few hints
    fstest.c       program to test your file system
    Makefile       makefile to compile everything

Hints and Tips