Courant Institute, New York University

Subversion keywords and unicode

Monday, August 4, 2008 - 12:30pm

I'm editing some text files, and I like to use subversion to do this. It helps with managing my own different revisions of the files, and can mark them automatically with metadata.

(Subversion is a version control system--you create a repository, work on a local copy, then commit changes to the repository when you're satisfied with them. Programmers use version control systems to keep track of giant code bases, so they can prepare new versions of software while at the same time releasing bug fixes for previous versions. But there's no reason why the files managed need to be an executable piece of software--they can be a website or a single text file. And there's no reason why you need more than one developer to have a version control system in place.)

Anyway, subversion allows you to mark keywords in your file to keep track of some metadata. For instance, I like to note the date when a file was last committed (had changes saved to the repository). To do this, you put the string "$Date$" in the file. Then you declare that you want this keyword substituted. On the command line, assuming you're in a checked-out working copy of the version-controlled directory, you type

$ svn propset svn:keywords Date foo.txt

When you next check the status, you will find that the file's metadata has been changed:

$ svn status
 M      foo.txt

The "M" appearing in the second column rather than the first indicates a modification to the file's properties rather than its actual content. Commit the change; from then on, each time you commit or update the file, the string "$Date$" is replaced with "$Date: <actual date committed>$".
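For anyone who wants to try the whole cycle without touching a real project, here is a minimal sketch in a throwaway repository. The temporary directory, the "repo" and "wc" names, and the contents of foo.txt are made up for illustration. It assumes the svn and svnadmin command-line tools are installed, and does nothing otherwise.

```shell
# Throwaway location for a demo repository and working copy (hypothetical names).
demo_dir=$(mktemp -d)

if command -v svn >/dev/null 2>&1 && command -v svnadmin >/dev/null 2>&1; then
    # Create an empty repository and check out a working copy of it.
    svnadmin create "$demo_dir/repo"
    svn checkout -q "file://$demo_dir/repo" "$demo_dir/wc"
    (
        cd "$demo_dir/wc" || exit 1
        # A file containing the unexpanded keyword.
        printf 'Last committed: $Date$\n' > foo.txt
        svn add -q foo.txt
        # Ask subversion to substitute the Date keyword in this file.
        svn propset svn:keywords Date foo.txt >/dev/null
        svn commit -q -m "Add foo.txt with the Date keyword"
        # After the commit, the working file holds the expanded keyword.
        cat foo.txt
    )
else
    echo "svn is not installed; nothing to demonstrate"
fi
```

If svn is available, the last command should print the line with "$Date$" expanded to the commit date.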

Pretty cool, except I'm having trouble getting this to work on unicode files. I don't always need unicode but I feel cutting-edge when I use it. Even after setting the property and committing changes the keyword stayed unsubstituted. I finally got it to work by writing the property value "in unicode." My workaround is extremely inelegant, the epitome of muddling through, but it does do the job.

  1. Create a file with the property value in it. In this case, the property value is just the word "Date". I'm using Aquamacs, an emacs build for Mac OS X. There is a set of keystrokes in emacs to change the text encoding for a file, but I keep forgetting it. Instead, I use a local variable and save the file. One way to do this is to declare at the end of the file:
    Local variables:
    coding: utf-8
    End:

    (UTF-8 is all the unicode I need, which is lucky because that's the encoding subversion uses for its own properties.) Save the file under some descriptive name like "svn-keywords-utf8.txt", and a little "u" appears in the emacs status line, letting you know the encoding has changed.

  2. Set the property value. Subversion allows you to set property values from a file, not just on the command line.
    $ svn propset svn:keywords -F svn-keywords-utf8.txt foo.txt
  3. Commit as before.

Voilà. Now, there must be smarter ways to do this. Some occur to me as I write. One could use iconv to change the property-value file's encoding without having to invoke a text editor. But the "real" way would be to make the terminal encode in unicode. According to the preferences for my Terminal application, I am doing this, but perhaps because I'm logged in via ssh to another server, it's not carrying over. I don't know. But like I said, this works.
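A rough sketch of the iconv idea follows. It has not been checked against the original problem: it assumes the shell emits plain ASCII for the word "Date" and reuses the file name from step 1.

```shell
# Write the property value and convert it to UTF-8 without opening an editor.
# ASCII is an assumed source encoding; change -f if your shell emits something else.
printf 'Date\n' | iconv -f ASCII -t UTF-8 > svn-keywords-utf8.txt

# Show the exact bytes that would become the property value.
od -c svn-keywords-utf8.txt
```

The resulting file can then be passed to "svn propset svn:keywords -F svn-keywords-utf8.txt foo.txt" as in step 2. For a pure-ASCII word like "Date" the bytes are the same in ASCII and UTF-8, so the od output is a quick way to confirm what actually ends up in the property.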