08 Feb Regular Expressions Save the Day
Comic by xkcd.com
This week, we received a message from the past: a captioning file from the early ’90s, delivered in an unsupported format. It came with an executable file that was designed to drive a simple encoder from a 5¼” floppy disk (no joke!). Fortunately, we also received a text version of the file, which we were able to reformat into a captioning file we could use.
Before we show you how we converted the file, have a look back at this post, another reminder of the value of saving data in archival formats, like text files. Doing so ensures they can be opened and read by future applications, whereas the compiled CC files of the past are totally orphaned.
And now, on to this week’s challenge. Here’s how we reformatted the text version (source file):
The first steps to converting this file: basic search and replace.
- To remove the F in the time codes, replace F1, F2, and F0 with :1, :2, and :0.
- Convert the position codes to a supportable format. We remember how to read these old files, and know that what used to be called ,C 14 (centre position, bottom row) is now referred to as *Cf16 in Cheetah .ASC import/export format.
- There are also left and right position codes; search and replace those as well.
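For readers who would rather script these steps than click through a find dialog, here is a minimal Python sketch of the same progressive search and replace. The sample text and its layout are our own illustration, not the actual source file, and the function name is hypothetical. The left and right position codes are left out because their old and new values are not given above.

```python
# Ordered (search, replace) pairs, applied one after another.
# Only the substitutions described in the list above are included.
REPLACEMENTS = [
    ("F0", ":0"),       # frame separator: F becomes a colon
    ("F1", ":1"),
    ("F2", ":2"),
    (",C 14", "*Cf16"), # centre position, bottom row
]

def apply_replacements(text, pairs=REPLACEMENTS):
    """Run each plain-text search and replace in sequence."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

# Hypothetical sample block; the real file layout may differ.
sample = ",C 14\nHELLO THERE.\n01:00:10F12"
converted = apply_replacements(sample)
print(converted)
```

One caution: a plain replace of F1 also changes any caption text that happens to contain "F1", so check the results before moving on.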
Search and replace is an amazing tool, and by doing a progressive series of searches and replaces you can reformat files extensively. But when you need to reorder elements, like we had to do here to pop the timecode line up above the text, and then to separate it, you need GREP and a text editor that can GREP*.
Basically, you use GREP to define groups of content, refer to each group by number, and then put them back in a different order. In our example here, the position code was Content Group 1, the first line of the caption was Content Group 2, the second line of the caption was Content Group 3, and the timecode line was Content Group 4. We put the content groups back in the order 4, 1, 2, 3. Bob’s your uncle!
Systematic thinking can save us from unnecessary work (in this case, redoing a caption file made by someone else a long time ago). So the final word: Get out there and GREP.
*We used BBEdit for the Mac, a well-known and well-trusted text editor (TextWrangler is the free version). On Windows, you can use Notepad++ or UltraEdit. These tools all support regular expressions with numbered groups, so the technique carries over from one tool to another, although the exact syntax can vary slightly between them.
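The group reordering described above can also be sketched in Python, whose re module uses the same numbered-group idea. This is only an illustration under assumptions: we assume each block is four lines (position code, two caption lines, timecode), that the earlier replacements have already been applied, and that position codes start with an asterisk. The pattern and function name are ours, not the exact GREP pattern we typed into BBEdit.

```python
import re

# One caption block, assumed to be four consecutive lines.
BLOCK = re.compile(
    r"^(\*\S+)\n"                    # group 1: position code, e.g. *Cf16
    r"(.+)\n"                        # group 2: first caption line
    r"(.+)\n"                        # group 3: second caption line
    r"(\d{2}:\d{2}:\d{2}:\d{2})$",   # group 4: timecode line
    re.MULTILINE,
)

def reorder(text):
    """Put the groups back in the order 4, 1, 2, 3."""
    return BLOCK.sub(r"\4\n\1\n\2\n\3", text)

# Hypothetical sample block; the real file layout may differ.
sample = "*Cf16\nHELLO THERE.\nHOW ARE YOU?\n01:00:10:12"
reordered = reorder(sample)
print(reordered)
```

Captions with a different number of text lines would need their own pattern, or a pattern with an optional group.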
Sound-Alike Words with Very Different Meanings
This week while proofing a transcript, we found the following error:
PROTEGE stood in for POTAGER.
These two words sound very similar but in fact are two distinct things. PROTEGE, of course, meaning a talented student, POTAGER being a kitchen garden! This type of mistake is common, especially if, as a transcriber, you’re not necessarily familiar with the vocabulary, leading you to understand the word as something else.
Another recent example: LINDISFARNE (an island off the coast of England), transcribed as LINDA’S FARM. Oops.
Always make sure you’re listening carefully to avoid these kinds of mistakes in your captioning. And if you think you’re hearing a word you might not know, be sure to do your due diligence and determine exactly what that word might be. We’ve said it before and we’ll say it again: accuracy is key in closed captioning!
May’s Recipe: Nightcap: Single Malt Scotch—our easiest recipe yet!
Photo by Coffee Geek, licensed under CC BY-NC-ND 2.0
Select a high-quality Scotch whisky, then spell it like the Scots do: without an “e.” Pour one ounce in a glass and drink it neat (i.e., no ice), breathing in the peaty terroir! And for a one-of-a-kind whisky amusement ride, we highly recommend a visit to Edinburgh’s Scotch Whisky Experience. Cheers!