My computer treats me like a computer
I’m coming up to a curve in the road. If I’m honest, I’ve already started the curve. My ability to use a hardware keyboard left me a long time ago, but fortunately I’ve retained my ability to use a mouse, albeit with more and more difficulty. I’ve taken to using SteerMouse to push the sensitivity, acceleration, and speed of my mouse past what most people would be comfortable with. I use an external switch that takes a minute amount of force to activate as my clicking mechanism. And I’ve lost the ability to scroll via touch or a wheel or a gesture. I now rely on scrollbars heavily.
But even with just a mouse, I’m able to type and use the keyboard thanks to the built-in macOS accessibility keyboard. It’s super customizable and I’ve decked mine out with all sorts of shortcuts and macros. But even then, typing is a chore. So several years ago, I started learning how to use Talon, a hands-free input tool, to perform tasks on my computer (depending on the situation; sometimes I prefer voice, sometimes I prefer the onscreen keyboard). I’ve gotten pretty good with it at this point, though Talon struggles to understand me at times.
But I’m getting to the point where even using the mouse is difficult. I fear that I will have to move on to using different mechanisms for moving the mouse cursor—probably eye gazing. And this is the curve in the road I mentioned—my post-mouse era. When you have a progressive condition like mine, you have many of these curves. And I’ve managed to overcome each one as I’ve come to it. But this one really freaks me out.
Voice control software can do a lot, but it’s only as good as its ability to understand you. It does okay for me but doesn’t understand me well enough to take away the frustration. I get by, but that’s it. The great thing about Talon, however, is that it is customizable with voice commands, a phonetic alphabet, and the ability to do all sorts of things that are more difficult to do with mainstream voice tools like Dragon.[1] This is what allows me to use Talon for programming without pulling my hair out.
Having said that, Talon needs me to utter specific sequences of commands. If those commands are misheard, then Talon will either reject them outright or perform a wrong action instead. It makes me speak like I’m a computer. If I commit the spoken equivalent of missing a semicolon, then I get an error. That’s annoying, but that’s the state of things. (This is in no way a criticism of Talon. Talon is great. I’m a big fan and I support the project financially. I’m just describing where technology is at now versus where I’m imagining it could be in the future.)
But I’m a person, and I don’t enjoy being a computer. It suddenly occurred to me that I’m accommodating my computer rather than my computer accommodating me. It took me a long time to come to this idea, and the way it happened is that I used Whisper and LLMs for the first time. I saw that it was possible for a computer to hear my utterances and transcribe them with extremely high accuracy. I’m dictating this blog post right now. By the time you’re reading it, I will have gone back over it and edited and reworded things, yes. But I’m able to vomit this article onto the page just by speaking. And it feels incredible.
But dictating is one thing. As I’ve said, Talon and Dragon do that, even if they don’t do it well for my purposes. I want to do more. I want to speak to my computer as if I had a human assistant right next to me who understood my context, my life, and my intentions. That’s where the LLMs come in. I realize that they don’t understand me per se, but this feels like the first time in computer history that software has come so close to understanding the intention of language. Yes, I know it’s merely predicting the next token or whatever. I get it. Nevertheless, I can see it now. I can visualize this new accessible world and it feels like we’re close to it. I’ve gotten a taste of this vision, and now I can’t put it out of my mind.
I want to start up my computer and not worry about moving my mouse or typing things. I want to tell my computer to open my code editor and go to a particular project. I want to tell it to create a new file and open a function block and type the code I tell it to.[2] I want to tell it to flip over to the terminal and run that code, check it in the browser, etc. I want it to use the browser for me and I want to tell it where to navigate and what to do. I don’t want to mess with a keyboard or a mouse. And I don’t want to speak like a computer. I want to talk normally and have my intentions carried out.
I want to play games that I used to play before I lost the strength to play them. I want to write faster and work better without the struggle of peripheral inputs. I want the computer to help me make the computer more accessible to me. Why should I need to wait for another developer, or even myself, to implement assistive technologies if the computer can understand how it works and make affordances for me?[2]
And don’t tell me this already exists. I’m not talking about accessibility tools that require me to speak or input a certain way. I’m talking honest-to-goodness natural language with a human-like capacity of understanding. Maybe that’s AGI. I don’t know.
I digress. I realize that is a lot to ask for. And if you feel the need to, you can roast me over on Mastodon.[3] But while meaningful human interaction with computers has been the stuff of science fiction, it now feels like, in the not-too-distant future, it could be reality.
[1] To be fair, it’s been several years since I’ve used Dragon because it is no longer available on Macs. It’s possible that Dragon is much better nowadays on Windows than I’m giving it credit for. But I still imagine I would have some frustrations using it.
[2] I have some cognitive dissonance here because I want my computer to know how to code but not replace me as a programmer. Lol, rip.
[3] I think AI training and power consumption present legal and ethical dilemmas, which I hope can be resolved. I’d love to use responsible AI that doesn’t trample people’s rights or use excessive resources.