UMBC CMSC 313, Computer Organization & Assembly Language,
Spring 2002, Section 0101
Project: Base64 Content-Transfer-Encoding
Due: Tuesday March 12, 2002
Objective
The objectives of this programming assignment are 1) to gain some
familiarity with data manipulation at the bit level, and 2) to develop
further experience using Linux system calls.
Background
Exchanging binary files by email is not quite straightforward because
many mail servers were designed to handle text, not binary data. Attempts
to send binary files through these servers can result in mangled files.
For example, some mail servers might ignore the most significant bit of
each byte, since standard ASCII encoding uses only 7 bits. Other mail
servers truncate all data beyond the 80th character of each line. In fact,
the whole concept of a line is meaningless when we work with binary files.
To complicate matters, email is often routed through several servers, so
the problem might not be with either the sender's mail server or the
receiver's mail server.
The MIME (Multipurpose Internet Mail Extensions) standard defined in Internet
RFC 1521 is a comprehensive mechanism for formatting Internet messages. For
many people, MIME is synonymous with email attachments. We are interested in
just one section of this standard: the Base64 Content-Transfer-Encoding, which
specifies how binary files should be converted into a text file that can be
sent intact through most mail servers. The complete specifications of the
Base64 standard are (what else) attached at the end of this project
description.
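To make the conversion concrete, here is a minimal C sketch of how one group
of up to three input bytes maps to four output characters under the Base64
rules of RFC 1521. It is an illustration only, not the assembly solution and
not part of the original handout; the name encode_group and its parameter
layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* The 64-character alphabet from RFC 1521: value 0 maps to 'A', 63 to '/'. */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode n input bytes (1 <= n <= 3) into exactly
 * 4 output characters, padding with '=' when fewer than 3 bytes remain. */
void encode_group(const unsigned char *in, size_t n, char out[4])
{
    assert(n >= 1 && n <= 3);

    /* Pack the bytes into a 24-bit value, first byte most significant.
     * Missing bytes contribute zero bits. */
    unsigned long bits = (unsigned long)in[0] << 16;
    if (n > 1) bits |= (unsigned long)in[1] << 8;
    if (n > 2) bits |= (unsigned long)in[2];

    /* Slice the 24 bits into four 6-bit indices, left to right. */
    out[0] = B64_ALPHABET[(bits >> 18) & 0x3F];
    out[1] = B64_ALPHABET[(bits >> 12) & 0x3F];
    out[2] = (n > 1) ? B64_ALPHABET[(bits >> 6) & 0x3F] : '=';
    out[3] = (n > 2) ? B64_ALPHABET[bits & 0x3F] : '=';
}
```

For example, the three bytes "Man" encode to "TWFu", while the two bytes "Ma"
encode to "TWE=" and the single byte "M" encodes to "TQ==". The shifts and
masks here correspond directly to SHL, SHR and AND in assembly.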
Remark: Unlike other organizations (e.g., ANSI, ISO), which publish
standards with lofty-sounding titles, the Internet Engineering Task Force
(IETF) publishes its standards, for historical reasons, as Requests for
Comments (RFCs). Although not all RFCs are standards, the specifications of
just about every Internet protocol can be found in an RFC. For more
information on RFCs and how they are published, check out http://www.rfc-editor.org.
Assignment
Your assignment is to write an assembly language program that prompts the
user for the file names of an input file and an output file. The program
must transform the data in the input file into a text file in a manner that
complies with the Base64 Content-Transfer-Encoding. The output of the
program must be stored in the output file.
As a reference standard, we will use the mimencode command on
linux.gl.umbc.edu. Using mimencode with the -u option, we can convert the
output of your program back to binary. If your program works correctly, the
output of mimencode -u should be identical to the original input file.
For 15% extra credit, write an assembly language program that reverses the
process of your first program. That is, the second program prompts the user
for an input file and an output file. If the input file is a properly
formatted text file that conforms to the Base64 standard, your program
should store the corresponding binary file in the output file.
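The reverse direction can be sketched in C as follows. This is a hedged
illustration of decoding a single four-character group, not the required
assembly program; the names b64_value and decode_group are assumptions, and
the character arithmetic assumes an ASCII character set. A complete decoder
would also have to skip line breaks before grouping characters into fours.

```c
#include <assert.h>
#include <stddef.h>

/* Map one Base64 character to its 6-bit value, or -1 if it is not in the
 * alphabet. Assumes ASCII, where letters and digits are contiguous. */
int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical helper: decode one 4-character group into out[0..2].
 * Returns the number of valid bytes (1 to 3), or -1 if malformed. */
int decode_group(const char in[4], unsigned char out[3])
{
    assert(in != NULL && out != NULL);

    int pad = 0;
    unsigned long bits = 0;
    for (int i = 0; i < 4; i++) {
        int v;
        if (in[i] == '=') {
            if (i < 2) return -1;     /* padding only in the last two slots */
            pad++;
            v = 0;
        } else {
            if (pad > 0) return -1;   /* data after '=' is malformed */
            v = b64_value(in[i]);
            if (v < 0) return -1;
        }
        bits = (bits << 6) | (unsigned long)v;
    }

    /* Split the 24-bit value back into three bytes. */
    out[0] = (unsigned char)((bits >> 16) & 0xFF);
    out[1] = (unsigned char)((bits >> 8) & 0xFF);
    out[2] = (unsigned char)(bits & 0xFF);
    return 3 - pad;
}
```

With this sketch, "TWFu" decodes to the three bytes "Man", "TWE=" to the two
bytes "Ma", and "TQ==" to the single byte "M". Only the first 3 - pad bytes
of out are meaningful.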
Implementation Issues
- All of the file conversion must be done by your program. You
are, of course, not allowed to make a system call to
mimencode.
- Files can be opened for reading using a system call to the
open() function. The C function
prototype of open() is:
int open(const char *pathname, int flags);
According to the Linux system call convention, the syscall number for
open() should be stored in EAX, a pointer to a null-terminated
string with the name of the file to be opened should be stored in EBX and
the flag O_RDONLY should be stored in ECX. The return value,
stored in EAX, is a file descriptor (a 4-byte integer) that can be used in
subsequent syscalls to read(). Further information on
open() can be obtained from the Linux man pages. Type 'man 2
open'.
- Symbolic constants for syscall numbers, flags, etc. can be found in a
file called stddefs.mac in the directory:
afs/umbc.edu/users/c/h/chang/pub/cs313. Copy this file into your
own directory. Then, the file can be included in your assembly language
program using the NASM directive:
%include "stddefs.mac"
- To open a file for writing, a syscall to creat() is more
appropriate. The C function prototype for creat() is:
int creat(const char *pathname, mode_t mode);
Calling creat() is very similar to calling open(). The
difference is that the file is opened for writing and the file is created
if it does not already exist. If a file with the same name already exists,
it is overwritten. As before, the return value stored in EAX is a file
descriptor. The second argument to creat() is used to set the
permissions of the newly created file (as in the chmod Unix
command). You will most likely want to allow the user to read and write to
the file, so store the expression S_IREAD|S_IWRITE in the ECX
register. S_IREAD and S_IWRITE are defined in
stddefs.mac.
- Remember to close all open files before your program terminates. This
is accomplished with a syscall to close() with the file descriptor
as the sole argument. The close() function has the following
function prototype:
int close(int fd);
- Once a file is opened, you can read from and write to it using the
read() and write() syscalls as you have done with
stdin and stdout.
- Despite what the man pages say, you can tell that you have reached the
end of a file you are reading when read() returns 0.
- Recall that read() stores the characters read at the address
provided and returns the number of characters read. The string read in is
not null-terminated. Also, if the string is read from stdin, the
last character is a '\n'. Thus, some massaging of the string is
needed before it can be used as a file name.
- You should not assume that the file has run out of bytes when
read() does not return the maximum number of bytes requested.
- It is inefficient to read 3 bytes at a time, since every call to
read() is a separate system call. Read a larger block into a buffer instead.
- The functions open(), creat() and read()
return the value -1 if an error is encountered. The cause of the error is
given as an error code in the global variable errno. If you wish
to examine these values, you must declare errno to be an external
label. Symbolic names for some of the possible values for errno
can be found in stddefs.mac. Consult the Linux man pages for the
meaning of each error. If you reference errno, then you must link
your program using 'gcc -nostartfiles' instead of ld.
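- Recall that the Intel Pentium CPU is little-endian. If you
move multiple bytes from memory into a register, the byte at the lowest
address ends up in the least significant byte of the register, so the bytes
might not be ordered the way you would like.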
- Assembly language instructions that you might find useful
include: AND, OR, SHL, SHR, XCHG.
- A common task that you will want to perform is: add a new character to
the output buffer, then write out the buffer if it is full. You will
probably want to write a subroutine to do this. Invent your own parameter
passing conventions.
- Read the Base64 specifications for handling the last few bytes of
input carefully. The output may need to be padded with 1 or 2 '='
as appropriate.
- If you want to have your output appear identical to the output from
mimencode, print out 72 characters per line.
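Several of the points above (block reads, short reads, end of file, a
buffered output routine, '=' padding and 72-character lines) can be sketched
together in C. This is only one possible arrangement and not the required
assembly program. All function names, the struct out_buf type and the buffer
sizes are assumptions; S_IRUSR | S_IWUSR are the POSIX names standing in for
the S_IREAD|S_IWRITE constants from stddefs.mac; and ending a partial last
line with a newline is an assumption about the desired output.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

#define IN_CHUNK   3072  /* arbitrary read size; any size of at least 3 works */
#define OUT_CAP    4096  /* arbitrary output buffer size */
#define LINE_WIDTH 72    /* line length that matches mimencode, per the text */

static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

struct out_buf { int fd; size_t used; size_t col; char data[OUT_CAP]; };

/* Write out everything buffered so far, retrying after partial writes. */
static int flush_out(struct out_buf *ob)
{
    size_t done = 0;
    while (done < ob->used) {
        ssize_t n = write(ob->fd, ob->data + done, ob->used - done);
        if (n < 0) return -1;
        done += (size_t)n;
    }
    ob->used = 0;
    return 0;
}

/* Add one character to the buffer, writing the buffer out first if full. */
static int put_char(struct out_buf *ob, char c)
{
    if (ob->used == OUT_CAP && flush_out(ob) < 0) return -1;
    ob->data[ob->used++] = c;
    return 0;
}

/* Encode n input bytes (1..3) as 4 characters, padding with '='. */
static int put_group(struct out_buf *ob, const unsigned char *in, size_t n)
{
    assert(n >= 1 && n <= 3);
    unsigned long bits = (unsigned long)in[0] << 16;
    if (n > 1) bits |= (unsigned long)in[1] << 8;
    if (n > 2) bits |= (unsigned long)in[2];
    char q[4];
    q[0] = ALPHABET[(bits >> 18) & 0x3F];
    q[1] = ALPHABET[(bits >> 12) & 0x3F];
    q[2] = (n > 1) ? ALPHABET[(bits >> 6) & 0x3F] : '=';
    q[3] = (n > 2) ? ALPHABET[bits & 0x3F] : '=';
    for (int i = 0; i < 4; i++)
        if (put_char(ob, q[i]) < 0) return -1;
    ob->col += 4;
    if (ob->col < LINE_WIDTH) return 0;
    ob->col = 0;                  /* line is full: start a new one */
    return put_char(ob, '\n');
}

/* Encode everything readable from in_fd onto out_fd; 0 on success, -1 on error. */
int encode_fd(int in_fd, int out_fd)
{
    struct out_buf ob;
    unsigned char in[IN_CHUNK];
    size_t have = 0;              /* bytes in in[] not yet encoded */

    ob.fd = out_fd; ob.used = 0; ob.col = 0;
    for (;;) {
        ssize_t n = read(in_fd, in + have, IN_CHUNK - have);
        if (n < 0) return -1;
        if (n == 0) break;        /* only a return of 0 means end of file */
        have += (size_t)n;
        size_t full = have - have % 3;
        for (size_t i = 0; i < full; i += 3)
            if (put_group(&ob, in + i, 3) < 0) return -1;
        for (size_t i = full; i < have; i++)  /* keep the 0..2 leftover bytes */
            in[i - full] = in[i];
        have -= full;
    }
    if (have > 0 && put_group(&ob, in, have) < 0) return -1;
    if (ob.col > 0 && put_char(&ob, '\n') < 0) return -1;
    return flush_out(&ob);
}

/* Open both files by name, encode, and close them again. */
int encode_file(const char *in_name, const char *out_name)
{
    int in_fd = open(in_name, O_RDONLY);
    if (in_fd < 0) return -1;
    int out_fd = creat(out_name, S_IRUSR | S_IWUSR);
    if (out_fd < 0) { close(in_fd); return -1; }
    int rc = encode_fd(in_fd, out_fd);
    close(in_fd);
    if (close(out_fd) < 0) rc = -1;
    return rc;
}
```

Note that a short read does not end the loop; leftover bytes that do not fill
a 3-byte group are carried over to the next read, and padding is applied only
once read() has returned 0. A call such as encode_file("input.bin",
"output.txt") would tie the pieces together, where both file names are
placeholders. The names passed in must already be null-terminated with any
trailing '\n' removed.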
Turning in your program
Use the UNIX 'submit' command on the GL system to turn in your
project. The class name for submit is 'cs313' and the project name
is 'project'. Sample runs and a typescript file are not needed for
this project. The grader will simply test your program using
mimencode and some binary files. Include a README file if your
submission needs any special attention.
References
- Borenstein, N. and Freed, N. MIME (Multipurpose
Internet Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies. RFC 1521,
September 1993. Available at
ftp://ftp.isi.edu/in-notes/
Last Modified: 22 Jul 2024 11:29:37 EDT by Richard Chang