Using version control for class submissions
There are a few great reasons to use a version control system for student submissions in a programming class. First, students typically do not get enough exposure to version control systems in school, but are likely to see heavy use of them after they graduate. Second, checking in intermediate versions makes it easier for students to work in multiple locations, and for the instructor or TA to get a copy of their recent code for questions during office hours. Third, enforcing incremental checkins provides a backup in case of accidental loss of work, which is especially likely toward the end of a project. Finally, those same incremental checkins allow the instructor or TA to follow the development process during grading, which can both make it more difficult to cheat by copying from classmates, and help in determining when partial credit might be merited.
CVS on AFS
A clone of a CVS repository has just the most recent version of each file. The version history is entirely contained in the files of the central repository. The directory structure of that central repository exactly matches the directories of the checked-out copy, making it possible to have fine-grained read and write access control to subsets of the main repository. This makes it possible to have a single class-wide repository with controlled access to per-student subdirectories.
Creating the source code repositories for each student on the university's unix file system gives more control and does not require students to create an account with an external service. UMBC's unix servers use AFS, which allows very fine-grained per-directory access control, so it is easy to give permissions to just the instructor, TA, and individual student. The biggest problem I've encountered is that AFS requires a token that is not granted through the normal passwordless public/private key SSH login setup. Accessing a repository remotely therefore requires entering a password, sometimes per version control operation! This is a major pain for accessing the university-hosted server from anywhere else. If all development is happening on the university servers though, this problem does not come up.
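As a rough sketch of the per-directory permissions described above, the script below prints the AFS "fs setacl" commands that would give the instructor, the TA, and one student access to that student's directory. The directory layout, the user names "instructor" and "ta", and the choice of rights are assumptions for illustration only. It echoes the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/bash
# Assumed layout: $CLASS_DIR/<student> is that student's submission directory.
CLASS_DIR=${CLASS_DIR:-/afs/example.edu/class/cs101}   # hypothetical path
STAFF=${STAFF:-"instructor ta"}                        # hypothetical user names

# Print (rather than run) the fs setacl commands for one student's directory.
student_acl_commands() {
    local student=$1 dir="$CLASS_DIR/$1" user
    for user in $STAFF; do
        echo "fs setacl -dir $dir -acl $user all"      # staff: full rights
    done
    echo "fs setacl -dir $dir -acl $student write"     # student: read and write
}

student_acl_commands student1
```

Without the "-clear" flag, "fs setacl" only adds or changes the listed entries, so any other entries already on the directory would still need to be checked with "fs listacl".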
CVS is an old version control system without many of the features you'd now expect. Checkins are not atomic and each file has its own independent version history. The only way to identify a single checkin is by date or to have the student tag it. File moves and deletions are not tracked or versioned well.
Common problems:
- Since the repository internal directory structure mimics the cloned tree, some students will start working directly in those repository directories. If they do, their files will be invisible to version control operations.
- Some students will invariably check in a zip file of their entire project, or generated binary files. Neither of these plays well with version control. Binary files (zip files or intermediate build files) bloat the repository, and there's no useful per-line version history for grading. Also, each revision will be a whole new copy of the binary file. Generated files also introduce spurious changes when nothing has changed in the source (from embedded compile time markers, etc.).
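One way to catch the zip and build-file problem before a checkin is to scan the working copy for file types that should not be committed. The function below is a minimal sketch of that idea. The function name and the extension list are assumptions, and the list would need adjusting for the language and build tools used in the class.

```shell
#!/bin/bash
# List files under a directory that look like archives or build output.
# The extension list is an assumption; adjust it for the class toolchain.
find_unwanted_files() {
    local dir=${1:-.}
    find "$dir" -type f \( -name '*.zip' -o -name '*.o' -o -name '*.class' \
        -o -name '*.exe' -o -name '*.a' \) -print
}
```

Running "find_unwanted_files ." from the top of a working copy prints any matching files. No output means nothing suspicious was found.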
Notes and scripts on using CVS with AFS:
GIT on AFS
A clone of a GIT repository contains the entire history. This makes it possible to work and check in changes locally, then push them to the central repository all at once. GIT does not allow any access controls within a repository, so it is necessary to create a separate repository for each student. This is a bigger pain for grading, but some shell scripting can make it a bit easier. The same problems with AFS and passwords over SSH make remote access to an AFS-hosted GIT repository a bit of a pain, but using it locally on the campus unix servers is still fine.
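The kind of shell scripting that helps with per-student repositories might look like the sketch below, which clones every student's repository into a grading directory, or updates the copy if it is already there. The function name, the "<student>.git" naming scheme, and the one-id-per-line ids file are assumptions, not a description of any particular setup.

```shell
#!/bin/bash
# Clone (or update) every student's repository into a grading directory.
# Assumed layout: $repo_root/<student>.git is that student's central repository,
# and the ids file lists one student id per line.
fetch_student_repos() {
    local repo_root=$1 ids_file=$2 dest=$3 id
    mkdir -p "$dest"
    # Read the ids on file descriptor 3 so git cannot consume the list via stdin.
    while read -r id <&3; do
        [ -n "$id" ] || continue                 # skip blank lines
        if [ -d "$dest/$id/.git" ]; then
            git -C "$dest/$id" pull --quiet      # already cloned: just update
        else
            git clone --quiet "$repo_root/$id.git" "$dest/$id"
        fi
    done 3< "$ids_file"
}
```

A call such as "fetch_student_repos /path/to/repos ids.txt grading" would leave one working copy per student under "grading".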
Common problems:
- Forgetting to commit and push or, worse, forgetting the full commit, pull, merge, and push sequence that is needed when the TA or instructor has changed the central repository.
- GIT is the only version control system I know that will remove commits from the history with basic user-level commands. It is too easy for a "git reset --hard" to lose data.
- Since a clone includes the full history, commits with binary files in particular can end up taking up quite a bit of unnecessary space on the student's drive.
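For the "git reset --hard" problem, commits that were dropped from a branch can usually still be found through the reflog until GIT garbage-collects them. The demonstration below is a sketch in a throwaway repository, and every name in it is made up. Uncommitted changes that a hard reset overwrites are not recorded in the reflog, so this does not bring those back.

```shell
#!/bin/bash
# Demonstration in a throwaway repository (all names here are made up).
demo=$(mktemp -d)
cd "$demo" || exit 1
git init --quiet
git config user.name "Demo Student"
git config user.email "student@example.edu"

echo "version 1" > notes.txt
git add notes.txt && git commit --quiet -m "first commit"
echo "version 2" > notes.txt
git commit --quiet -am "second commit"

git reset --quiet --hard HEAD~1       # "loses" the second commit
git reflog                            # the second commit is still listed here
git reset --quiet --hard 'HEAD@{1}'   # move back to where HEAD was before the reset
cat notes.txt                         # prints "version 2"
```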
Notes and scripts on using GIT with AFS
GIT online
Services like github are widely used with lots of documentation. They can be easily accessed from anywhere. Scripting repository creation is a problem, though github's classroom lets you create a template repository that the students clone. This does make pushing updates a bit trickier though, since you have to explicitly pull them from the template and push back to the student's directory.
I've used a set of bash scripts to manage student/github use. I do it within a github organization, which ensures the instructor and TA have non-revocable admin access to the student repos. Sign up with github education to make it free to create the organization.
Notes on my scripts:
- Create a template repository within the organization and share the URL for students to fork. They will not have access until they join the organization.
- Collect gitids, and invite them to the organization. They'll need to accept the invitation email.
- Get a copy of the scripts, and edit the config.sh and gitids.txt files (see the included README.txt).
- Make a local clone of the template repository. "cd" into that directory and run the "setup.sh" script. This will create a "remote" for each student named StudentGitID, and a branch named github/StudentGitID. It is OK to run this more than once as students add the class.
- Get students to consistently tag the commit they want you to grade for each assignment. They can use the github interface to do this with the "create a new release" link. I typically use a name like proj1, assn1, or hw1.
- From the local clone directory, run "getassn.sh AssignmentTag". This will create a grade/AssignmentTag/StudentGitID branch for each student. It will also report if they did not tag, and the date of the tagged commit. It is fine to run this more than once if students submit late.
- To grade, do a "git checkout grade/AssignmentTag/StudentGitID". Grade what's there. I will usually create a grading notes text file and add/commit it in each branch.
- If you do add any grading notes or commit other content, run "mergeassn.sh AssignmentTag" after you are done grading. This will merge and push your changes back to each student's github repository.
- To deliver content, commit and push it to the base repository, then have the students use the github "Sync Fork" button to get it.
- The basic method for other actions is to loop over the git ids. For example: 'for gid in $(<../gitscripts/gitids.txt); do git checkout github/$gid && git pull; done'
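The loop-over-ids pattern from the last note can be wrapped in a small helper so that a failure for one student is reported instead of passing silently. This is only a sketch and is not part of the scripts described above. The function names "for_each_student" and "update_branch" are hypothetical, and the ids file is assumed to hold whitespace-separated git ids.

```shell
#!/bin/bash
# Run a command once per student id, passing the id as the last argument.
# Hypothetical helper; assumes the ids file holds whitespace-separated ids.
for_each_student() {
    local ids_file=$1 gid
    shift
    for gid in $(<"$ids_file"); do
        "$@" "$gid" || echo "failed for $gid" >&2   # report and keep going
    done
}

# Example action: the checkout-and-pull step from the loop above.
update_branch() {
    git checkout --quiet "github/$1" && git pull --quiet
}
```

From the local clone directory, "for_each_student ../gitscripts/gitids.txt update_branch" would then do the same work as the one-line loop, with a message on standard error for any id that fails.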
Common problems:
- Students can upload files through the web interface. This never works well.
- Duplicate directories uploaded in the wrong place
- "Incremental checkins" that are just uploading the working project one file at a time without any development history
- The web interface doesn't do line-ending translation correctly. This can result in "differences" with every source file when checked out by the instructor or TA.
- Mistakes checking in binary files (especially build files) can bloat the local repository size to larger than the github 1 GB repository size limit. If this happens, the student will not be able to push their files back to github for submission.
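When a repository has grown too large, it helps to see which files in the history are responsible. The function below is a sketch that lists the largest blobs across all commits, using "git rev-list" and "git cat-file". The function name and the default count of 10 are assumptions.

```shell
#!/bin/bash
# List the largest blobs anywhere in the history of the current repository.
# Output lines have the form: <size in bytes> blob <path>
largest_blobs() {
    local count=${1:-10}
    git rev-list --objects --all |
        git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
        awk '$2 == "blob"' |
        sort -rn |
        head -n "$count"
}
```

Run from inside a student's clone, "largest_blobs 5" shows the five biggest files ever committed, and "git count-objects -vH" reports the overall size of the local repository.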