Textmate and UTF-8 and BOM, oh my! 04.30.2007

By default, Textmate on the Mac creates UTF-8 files without a BOM (byte order mark). This is typically recommended, since it avoids having useless bytes at the beginning of files such as shell scripts. However, Homesite on Windows doesn't read a UTF-8 file correctly unless it has a BOM. It reads the file as "ANSI" instead.

(The same thing happens with web pages where the body and any file-encoding meta tags are missing. Firefox reads a Textmate-created UTF-8 file with no BOM as ISO-8859-1. With a BOM, Firefox correctly reads the file as UTF-8.)
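
If you want to check whether a given file already starts with a BOM, a rough sketch like the one below should do it. The function name has_utf8_bom is made up for this example, and it assumes a head that accepts -c, which the Mac and most Linux systems have.

```shell
# has_utf8_bom FILE
# Succeeds (exit status 0) if FILE starts with the UTF-8 BOM bytes EF BB BF.
# Hypothetical helper name, not part of Textmate or any standard tool.
has_utf8_bom() {
  # Take the first three bytes, dump them as hex without offsets,
  # then strip the whitespace od puts between the byte values.
  first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')
  [ "$first3" = "efbbbf" ]
}
```

Calling has_utf8_bom on a file and testing the exit status (for example in an if statement) tells you whether the BOM is there without opening the file in an editor.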

After I chatted with Allan Odgaard about the issue on IRC, he came up with a Textmate command that prepends the correct UTF-8 BOM to the currently open file.

Create a new Textmate command and set the save to "current file", the input to "entire document", and the output to "discard". The text for the command should be:

  printf >"$TM_FILEPATH" '\xEF\xBB\xBF'
  cat >>"$TM_FILEPATH"

The first line overwrites the saved file with the three BOM bytes, and the second line appends the document text that Textmate passes to the command as input. Textmate then rescans the file, so it and any open projects will be aware of the change.
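
Note that the command above prepends the bytes unconditionally, so running it twice on the same file would leave two BOMs. For files outside Textmate, a guarded variant might look like the sketch below. The function name add_utf8_bom is invented for this example, it assumes head -c and mktemp are available, and it uses octal escapes because \x escapes are not guaranteed by every printf.

```shell
# add_utf8_bom FILE
# Prepend the UTF-8 BOM (EF BB BF) to FILE unless it already starts with one.
# Hypothetical helper name, a sketch rather than Allan's command.
add_utf8_bom() {
  file=$1

  # Look at the first three bytes as hex, with od's whitespace removed.
  first3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \t\n')
  if [ "$first3" = "efbbbf" ]; then
    return 0  # already has a BOM, nothing to do
  fi

  # Build the new contents in a temporary file first.
  tmp=$(mktemp "${TMPDIR:-/tmp}/bom.XXXXXX") || return 1

  # \357\273\277 is EF BB BF in octal.
  # Writing back with cat keeps the original file's permissions.
  printf '\357\273\277' > "$tmp" &&
    cat "$file" >> "$tmp" &&
    cat "$tmp" > "$file"
  status=$?

  rm -f "$tmp"
  return $status
}
```

Because of the check at the top, calling add_utf8_bom on the same file more than once leaves it with a single BOM.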

Textmate will respect and preserve an existing BOM in a file. It just won't create one on its own. This command should help with that.