Speaking in code: hands-free input with Talon

May 23, 2021

This is an article version of my MagnoliaJS 2021 talk. You can download the slides here.

I often use the phrase “coding by voice” to quickly describe Talon and what it’s for, but that phrase isn’t fully accurate. “Coding by voice” is just a stand-in for the broader idea of “hands-free input.” The technology I’m talking about isn’t limited to coding, even though that’s a common use for it. If you are coding a website, for example, you are likely switching between writing code in an editor, testing your site and reading documentation in a web browser, and using any number of supplemental tools and applications. Hands-free input aims to help you with all of those tasks and more.

That said, Talon is often used as a tool for programmers to code by voice. At the time of this writing, you all but need to be a programmer to use Talon. It has a setup and configuration process that’s a bit technical, so I would guess its current user base is mostly programmers or technically proficient users. I imagine that will change over time as Talon becomes easier to set up and configure.

Why would anyone want to code by voice?

For me, personally, it’s accessibility. I have a condition called spinal muscular atrophy which, among other things, causes me to have extremely weak muscles, to the point where typing on a physical keyboard is impossible. I require alternate forms of input, and speech recognition is one of them.

Another big reason people might want to code by voice is repetitive strain injury (RSI). RSI tends to cause pain in the hands, arms, neck, and more. If you do a web search for “RSI in programmers” you will find countless stories of RSI affecting people’s work and personal lives. If ignored, RSI symptoms can become severe. Using voice input can help here by giving your hands and arms a rest and breaking up some of the repetitiveness of typing^[1].

For years I’ve mostly adapted to the problem of not being able to type by using an onscreen keyboard, which is software that allows you to use a mouse to type by clicking on a virtual keyboard. I’ve optimized my onscreen keyboard setup but typing with a mouse is still tedious and, at times, frustratingly slow.

A couple of years ago I came across Talon. I’ve since started using Talon for about a third of my daily computer usage, sometimes more. Others use Talon as their primary means of input. It’s an effective tool, but it takes a shift in the way you think about computer input.

Misconceptions and the current state of voice-controlled productivity

The most well-known type of speech recognition software at this point is probably the digital assistants found in our smart devices. These are normally used in short bursts for quick actions or used to request bite-sized pieces of information.

For long-form usage a popular option is Dragon, which is available for Windows but was discontinued on the Mac in 2018. Both macOS and iOS have built-in dictation available as well as Voice Control, Apple’s voice-controlled accessibility software.

That said, none of these options are great for programming, as seen in this hilarious video of a guy trying to write code using Windows Vista’s speech recognition feature.

I’d wager that this is what most people imagine when they hear “code by voice.”

There are some alternatives specifically for programming. Serenade is one option, though it only supports certain editors and languages (and so isn’t a system-wide alternative). Windows users have Dragonfly/Caster as an option, although I don’t know a lot about it.

But we’ll be looking at Talon.

Cross-platform, hands-free input with Talon

Talon is powerful, cross-platform software created by developer Ryan Hileman. It is available for free (though the developer has a Patreon). It enables hands-free input via speech recognition, noise recognition, and eye tracking. I’ll be limiting this presentation to speech recognition because I don’t have experience with the others yet. But, in short, eye tracking lets you move the cursor around using the movement of your eyes and head, and noise recognition enables you to script certain actions in response to particular noises (a hissing sound, for example).

Talon’s speech recognition capability is powerful and getting better all the time. Talon’s superpower is its scriptability. It has its own declarative syntax for defining commands and behavior and beyond that it can be extended with Python, which we’ll see some examples of shortly.

Setup and configuration

Talon does require a bit of setup to use. It needs a speech engine (which you can get from its website)^[2]. Out of the box, Talon doesn’t do anything in response to the spoken word. It needs configuration files and/or scripts to tell it what to do. There is a community-maintained repo, knausj_talon, that you can use to be productive immediately. This is a comprehensive command set that is essentially considered the standard or default configuration. It must be downloaded separately but the Talon website has instructions for setting it up.

In time, you will likely want to modify your configuration or write your own command set. More about this shortly.

Code-by-voice concepts

Coding by voice introduces a new way of thinking about how you type your code. Just as you can optimize code input from a keyboard—learning keyboard shortcuts, configuring your editor, etc—you can do the same for inputting code by voice. We’ll look at some concepts and techniques for dictating code and managing the workflow around it.

Nothing by default

Conventional speech recognition software works by listening for your speech and then transcribing whatever you say. Talon, on the other hand, does no transcribing by default. You need scripts to tell it what it should listen for and what it should do when it hears a command.

That might not seem like a big difference on the surface, but it is. When you’re writing natural language it makes sense to speak it out loud and let the computer transcribe it. But you can’t speak code out loud in the way you speak natural language. So you need a way to express code out loud that a computer can understand. Talon’s command set acts as the interpreter between what you’re saying and what you want the computer to do. Operating in this sort of “command mode” makes it easier to use your system, navigate your code, and enter precise syntax^[3]. And since it’s customizable, the sky’s the limit on what you can accomplish.

Scripts

Sometimes I call them scripts, sometimes I call them a command set but I’m referring to the same thing—it’s the collection of files you put in your Talon user folder, usually a mix of .talon and .py files.

If you’re just starting out you’ll probably want to use the knausj_talon scripts that I mentioned before. It’s fully featured and has help menus and such that can give you an idea of what you can say in different situations.

I chose to write my own command set, although I used knausj_talon as a reference. I wanted to learn how Talon scripts work and I preferred to start with a smaller set of commands and grow from there. I also have a little bit of difficulty enunciating certain words so I continually tweak my commands to choose words that are easier for me to say.

Scripting Talon could be a whole article unto itself, but I’ll show you two quick examples.

Consider this file, whatever.talon:

enter: key(enter)

next: key(space)

arrow function:
  insert('() => {}')
  key(left)

This tells Talon about three commands:

When I say “enter,” press the enter/return key.
When I say “next,” press the spacebar.
When I say “arrow function,” insert a JavaScript arrow function template, then press the left arrow key so that my cursor is inside the body of the function.
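
To make the effect of those three commands concrete, here is a minimal sketch in plain Python that models them against a toy text buffer. It does not use Talon’s API. The `Buffer` class, the `COMMANDS` table, and `run_command` are hypothetical names invented for this illustration, and the toy buffer only understands the three keys used above.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """A toy text buffer with a cursor, standing in for the focused editor."""
    text: str = ""
    cursor: int = 0

    def insert(self, s: str) -> None:
        # Insert text at the cursor and move the cursor past it.
        self.text = self.text[:self.cursor] + s + self.text[self.cursor:]
        self.cursor += len(s)

    def key(self, name: str) -> None:
        # Only the keys used by the three example commands are modelled.
        if name == "left":
            self.cursor = max(0, self.cursor - 1)
        elif name == "enter":
            self.insert("\n")
        elif name == "space":
            self.insert(" ")
        else:
            raise ValueError(f"unsupported key: {name}")


# Spoken phrase -> ordered list of (action, argument) steps.
COMMANDS = {
    "enter": [("key", "enter")],
    "next": [("key", "space")],
    "arrow function": [("insert", "() => {}"), ("key", "left")],
}


def run_command(buffer: Buffer, phrase: str) -> None:
    """Run every step of the command registered for the spoken phrase."""
    for action, argument in COMMANDS[phrase]:
        getattr(buffer, action)(argument)


buf = Buffer()
run_command(buf, "arrow function")
# Show the cursor position with a "|" marker.
print(buf.text[:buf.cursor] + "|" + buf.text[buf.cursor:])  # () => {|}
```

The final line shows why the command ends with a left-arrow press: the cursor lands between the braces, ready for the function body.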

Here’s a more complex example using Python. Consider this code.py file:

from talon import Module

mod = Module()

@mod.capture(rule='runner <word>')
def npm_script(m) -> str:
  'Run an npm script.'
  return 'npm run ' + m.word.lower()

This is defining what Talon calls a capture. In this example I want to capture the pattern runner <word> and return the form npm run <word>.

This is just the definition of the behavior. We need to enable it in a .talon file. Let’s make a file, code.talon:

<user.npm_script>: insert(npm_script)

This tells Talon to listen for the capture we defined and, when it occurs, insert the value it returns.
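
If you want to see the text transformation on its own, outside of Talon, the sketch below mimics it with a regular expression. This is only an approximation for illustration: the `RULE` pattern and the string-in, string-out `npm_script` function are assumptions of this sketch, and Talon’s real matching happens in its speech engine, not with a regex.

```python
import re
from typing import Optional

# Approximates the rule 'runner <word>': the literal word "runner"
# followed by exactly one more word.
RULE = re.compile(r"^runner (\w+)$")


def npm_script(utterance: str) -> Optional[str]:
    """Return the npm command for a matching utterance, or None if it doesn't match."""
    match = RULE.match(utterance.strip().lower())
    if match is None:
        return None
    return "npm run " + match.group(1)


print(npm_script("runner build"))  # npm run build
print(npm_script("hello world"))   # None
```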

The phonetic alphabet

Computers aren’t as good at understanding the spoken word as humans are, so we need to avoid ambiguity when dictating to them. When it comes to dictating individual letters, simply saying the name of the letter won’t work. Too many characters sound the same. Consider the “e” sound. “b,” “c,” “d,” “t,” and more sound very similar. This problem isn’t limited to speech recognition software. Military and other organizations need clarity when spelling over radio and telephone where sound quality is degraded. The most common solution is the NATO phonetic alphabet. It substitutes words for alphabet characters (e.g., “Charlie” for “C” and “November” for “N”).

We can use this same technique for code dictation. But we need speed too—not just clarity—and the NATO alphabet becomes cumbersome when you’re trying to dictate quickly. Most Talon users will have a phonetic alphabet with shorter words and fewer syllables so as to speed up dictation and make it easier. Below is a table that shows the words used for the NATO phonetic alphabet, knausj_talon’s alphabet, and the one I use.

Character	NATO	`knausj_talon`	Blake
A	Alfa	Air	Air
B	Bravo	Bat	Bill
C	Charlie	Cap	Cap
D	Delta	Drum	Drum
E	Echo	Each	Each
F	Foxtrot	Fine	Faint
G	Golf	Gust	Gust
H	Hotel	Harp	Ham
I	India	Sit	Sit
J	Juliett	Jury	Jury
K	Kilo	Crunch	Crunch
L	Lima	Look	Little
M	Mike	Made	Made
N	November	Near	Near
O	Oscar	Odd	Orange
P	Papa	Pit	Pink
Q	Quebec	Quench	Queen
R	Romeo	Red	Red
S	Sierra	Sun	Sun
T	Tango	Trap	Trap
U	Uniform	Urge	Urge
V	Victor	Vest	Vest
W	Whiskey	Whale	Wet
X	X-ray	Plex	Plex
Y	Yankee	Yank	Yank
Z	Zulu	Zip	Zoo
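
As a rough illustration of how such an alphabet turns speech into letters, here is a small plain-Python sketch built from the knausj_talon column of the table. It is not how Talon itself is configured. `WORD_TO_LETTER` and `spell` are hypothetical names for this example.

```python
import string

# The knausj_talon words from the table above, in order from A to Z.
ALPHABET_WORDS = (
    "air bat cap drum each fine gust harp sit jury crunch look made "
    "near odd pit quench red sun trap urge vest whale plex yank zip"
).split()

# Pair each spoken word with its letter: "air" -> "a", "bat" -> "b", ...
WORD_TO_LETTER = dict(zip(ALPHABET_WORDS, string.ascii_lowercase))


def spell(utterance: str) -> str:
    """Convert a run of alphabet words into the letters they stand for."""
    letters = []
    for word in utterance.lower().split():
        if word not in WORD_TO_LETTER:
            raise ValueError(f"not an alphabet word: {word}")
        letters.append(WORD_TO_LETTER[word])
    return "".join(letters)


print(spell("cap air trap"))        # cat
print(spell("near odd drum each"))  # node
```

Swapping in your own words is just a matter of editing the word list, which is essentially what I did to arrive at the last column of the table.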

This technique of using short, distinguishable commands is used throughout most users’ Talon config files and scripts. I’ve modified my commands over time as I come across words that the speech engine struggles to recognize in my voice. Getting good accuracy with Talon depends on a mix of factors: the specific speech engine, your configuration, your microphone, your environment’s background noise, and your dictation style. Generally speaking, you should notice Talon’s accuracy improve as better speech models become available, you weed out problematic words, and you improve your dictation technique. Your microphone is a big piece of it as well and we’ll cover that shortly.

Precise key input

Mainstream speech recognition software doesn’t work well for coding because it’s designed for natural language input. Dragon’s features like automatic word spacing and auto-capitalizing sentences and proper nouns are great for dictating prose. But programming languages follow a much different syntax than natural language, and programming languages can be significantly different from each other as well. Punctuation and symbols that aren’t found as often in prose are everywhere in code, so we need a quick and precise way of dictating them.

Keyboards have limited space so they make efficient use of it by reusing keys to input punctuation and symbols. We don’t have the same limitation when using speech recognition so we’ll avoid using modifiers for punctuation. You can use whatever commands work for you, but you’ll want to include all the symbols and punctuation normally found in the programming languages you use. Here’s a demo of that in action:

Text formatting

Code often has special formatting that’s strange looking compared to natural language so we’ll need a way to specify the format of certain pieces of text. Formatting commands often work by starting with a trigger word and then following that up with the text to be formatted. So a command like this:

camel hello world

…would produce output like this:

helloWorld

camel in this case is just an arbitrary word we’re choosing to signal that we want some text to be camel cased. Here’s a demo of the various formatters I use:

You may have noticed that I also used the formatter technique—a command word followed by some other words—to do things like dictate a sentence. This is because, as I said before, Talon doesn’t automatically assume you’re dictating the way Dragon and similar software does. You need to let Talon know that you want to dictate something.

If you think about it, a sentence is just another type of formatting: first word capitalized, followed by words separated with spaces. Same for titles where each word needs to be capitalized. I also use a formatter, more, which just adds a space in front of otherwise normal prose formatting. That means if I dictate a few words and then stop to think, I can easily append more text by saying more, followed by the rest of my sentence.
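
The formatter idea can be sketched in a few lines of plain Python. This is an illustrative model only, not the implementation in my scripts or in knausj_talon. The function names, the `FORMATTERS` table, and `format_utterance` are assumptions made for this example, and it covers only the four formatters described above.

```python
from typing import Callable, Dict, List


def camel(words: List[str]) -> str:
    # First word lowercase, every later word capitalized, no separators.
    first, *rest = words
    return first.lower() + "".join(w.capitalize() for w in rest)


def sentence(words: List[str]) -> str:
    # Words separated by spaces, with the first letter capitalized.
    text = " ".join(words)
    return text[:1].upper() + text[1:]


def title(words: List[str]) -> str:
    # Every word capitalized.
    return " ".join(w.capitalize() for w in words)


def more(words: List[str]) -> str:
    # Plain prose with a leading space, for appending to earlier text.
    return " " + " ".join(words)


# Trigger word -> formatter function.
FORMATTERS: Dict[str, Callable[[List[str]], str]] = {
    "camel": camel,
    "sentence": sentence,
    "title": title,
    "more": more,
}


def format_utterance(utterance: str) -> str:
    """Split off the trigger word and format the remaining words with it."""
    trigger, *words = utterance.split()
    if trigger not in FORMATTERS or not words:
        raise ValueError(f"cannot format: {utterance!r}")
    return FORMATTERS[trigger](words)


print(format_utterance("camel hello world"))       # helloWorld
print(format_utterance("sentence this is a test"))  # This is a test
```

Keeping the formatters in a lookup table means that adding a new one only takes a new function and a new trigger word.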

Cursor movement and editing

When I first started using speech recognition for writing and coding, I felt helpless because when I made a mistake I didn’t know how to fix it. Although the end goal of speech recognition technology is to be mistake-free, for now mistakes do happen. They happen when you are typing on a keyboard, too, but you aren’t usually bothered by that because you have the confidence to fix your mistakes quickly. I believe that’s important for using speech recognition, too. In order to feel confident coding by voice, you need to feel confident that you can fix mistakes as they come up. Cursor movement and text editing play a huge role in that.

The knausj_talon command set has a comprehensive list of commands for text navigation. My commands may differ somewhat since I use my own custom configuration but the idea is the same.

The way I navigate and select text is based on the arrow keys and the keyboard shortcuts that move the cursor word by word.

Those are the cursor movement and text selection commands that I use. Again, this is all scriptable so you can have even more powerful behavior if you want. knausj_talon comes with even more ways to move the cursor and select text.

App-specific and language-specific commands

Talon scripts have the ability to become active based on the application you are currently using and even the programming language you are currently writing in. I don’t have as many of these set up in my own personal scripts yet but they can be very powerful. For example, you can make commands for writing functions, parameters, types, etc. They can be language-specific and only activate when needed.
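
Conceptually, this scoping works like a filter over command sets. The sketch below models that idea in plain Python and is not Talon’s actual context mechanism. `CommandSet`, `active_commands`, the app name “Code,” and the “jump” command are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CommandSet:
    """A group of commands plus the conditions under which it is active."""
    commands: Dict[str, str]
    app: Optional[str] = None       # None means "any application"
    language: Optional[str] = None  # None means "any language"

    def matches(self, app: Optional[str], language: Optional[str]) -> bool:
        return self.app in (None, app) and self.language in (None, language)


def active_commands(
    command_sets: List[CommandSet],
    app: Optional[str],
    language: Optional[str],
) -> Dict[str, str]:
    """Merge the commands of every set that matches the current context."""
    active: Dict[str, str] = {}
    for command_set in command_sets:
        if command_set.matches(app, language):
            active.update(command_set.commands)
    return active


COMMAND_SETS = [
    # Always available.
    CommandSet({"enter": "key(enter)"}),
    # Only while writing JavaScript.
    CommandSet({"arrow function": "insert('() => {}')"}, language="javascript"),
    # Only while the game is focused (hypothetical command).
    CommandSet({"jump": "key(space)"}, app="Minecraft"),
]

print(sorted(active_commands(COMMAND_SETS, "Code", "javascript")))
# ['arrow function', 'enter']
print(sorted(active_commands(COMMAND_SETS, "Minecraft", None)))
# ['enter', 'jump']
```

The benefit of scoping is that short, easy-to-say words can be reused in different contexts without colliding.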

One example of app-specific commands is for playing games. A while back I experimented with using some custom commands to play Minecraft:

Considerations for coding by voice

As I described before, it does require some technical setup to get started. After downloading Talon, you’ll need to download the speech engine, a speech recognition model, and some scripts.

You may find it overwhelming at first to have so many commands to learn. Fortunately, knausj_talon comes with a nice contextually-aware help menu system that you can call up to get a list of available commands.

Talon can be configured with .talon files, which use a syntax that is simpler than a programming language, and these can take you a long way. If you want to dive deeper with customizing your scripts, you’ll need to know some Python and learn about how Talon scripts work. I only dabble in Python and that was enough to be able to write what I think is a fairly good starter command set. So you definitely don’t need to be an expert.

Community and support

While the learning curve for Talon may seem steep, it is manageable with the help of the Talon community, which is pretty active online. There’s a Talon Slack with some cool, helpful folks. The developer is very active there and will help you out when you need it.

If you end up using Talon and want to support the project, you can support Ryan on Patreon. And if you support the beta tier you can get access to pre-release speech recognition improvements. There is a speech recognition model called Conformer that’s in beta at the time of this writing that is very accurate and has received good feedback thus far.

Further resources

Talon’s documentation is a work in progress. Talon’s official docs will get you through the initial installation and setup process, and they also include some developer documentation. At the time of this writing, the best place for learning about Talon’s scripting API is the Talon community wiki. The Talon community is nice and welcoming, and you can get help on Slack too.

Talon website: https://talonvoice.com/
Talon community Slack: https://talonvoice.slack.com/
Talon community wiki: https://talon.wiki/

I recommend you follow Talon’s installation instructions found on its website. But here’s a quick link to the knausj_talon scripts as well as my personal scripts (which you are welcome to use, fork, etc).

knausj_talon: https://github.com/knausj85/knausj_talon
Blake’s Talon scripts: https://github.com/blakewatson/talon-scripts

I was deeply inspired by Emily Shea’s “Perl Out Loud” talk, which you should definitely check out (note that specific info about Talon is obsolete but the talk shows the concepts really well).

[1] I’m not a doctor, and I’m only describing RSI briefly because I know some people use Talon to prevent RSI symptoms. You should seek proper medical advice if you are experiencing symptoms of RSI.
[2] Talon supports two speech engines at this time: wav2letter, which you can get from Talon’s website, and Dragon, which you can purchase from Nuance. Dragon for Mac was discontinued in 2018 but at the time of this writing, Talon supports the latest Dragon version for Mac. You will likely need to find it on eBay as you can no longer get the Mac version from Nuance. The wav2letter speech engine requires a speech recognition model. There’s a model available from Talon’s website. If you join the Patreon beta tier, you can get the Conformer speech model which has improved accuracy over the publicly available model. Conformer will be available publicly sometime later this year.
[3] Talon has the concept of “modes,” which means that it can support dictation mode, a mode that continually listens and transcribes what you say, similar to the way other dictation software works.