How terminal works. Part 1: Xterm, user input
Motivation
This blog series explains how modern terminals and command-line tools work. The primary goal here is to learn by experimenting. I’ll provide Linux tools to debug every component mentioned in the discussion. Our focus is to discover how things work. For the explanation of why things work in a certain way, I encourage the reader to read excellent articles:
and to visit a computer history museum:
Please note that I talk solely about Linux (because that is what I use), but many discussed concepts should apply to other Unix-like systems.
I’ve chosen the “learn by experimenting” approach because that’s how I’ve learned about command-line tools. In my case, there was no single “click” moment after which I’ve understood all the things. Instead, I’ve learned through a never-ending process of building mental models, proving them to be wrong, and then adjusting those models to reflect new knowledge.
The target audience is people who want to start working on command-line tools.
The series consists of 4 parts. The first two parts discuss how xterm works. Parts 3 and 4 talk about different features of tty:
- Part 1: Xterm, user input;
- Part 2: Xterm, CLI tool output;
- Part 3: pty, stty;
- Part 4: pty, sessions.
Introduction
Let’s start the discussion with an inaccurate diagram that shows a general use case for working with a command-line shell:
(1) (2) (3)
user <---> xterm <---> bash
The user interacts with bash using the terminal emulator xterm. xterm is a GUI app that receives “key pressed” events and writes the corresponding characters into a bidirectional filehandle (2). Bash reads those characters from (2), does something, and sends the output back to xterm using the same filehandle (2). Xterm reads bash’s output from (2) and renders it on the screen. (2) is “just a file”, and this communication scheme looks pretty simple.
If the user asks bash to execute a command, let’s say cat log.txt, then bash spawns cat, which uses the same filehandle to send its output to xterm:
bash
(1) (2) (4)
user <---> xterm <---> cat
Again, pretty simple. In this unrealistic model, (2) is “just a file”, and xterm and cat exchange plain text.
In reality, things are slightly more complicated. Evolution extended the simple scheme of “using a bidirectional filehandle to exchange plain text” to implement additional features:
- TUIs. The terminal can draw characters at an arbitrary position on the screen; command-line tools can query the capabilities of the terminal and can handle window resizes;
- job control. The shell organizes processes into logical groups which can be paused, resumed, or stopped together;
- access control for the filehandle (2). Bash has a feature to spawn background processes. This might lead to a situation where two processes write their output into the same filehandle (2) at the same time, so there should be some access control mechanism;
- “fixing” stupid tools which believe that the terminal is just a file with plain text, so that those tools look and feel better.
User input
The requirements above and 50 years of history led us to this scheme:
(1) (2) (3)
user <---> xterm <---> tty <---> bash
The first thing to notice is a “middle man” tty between xterm and bash. We will discuss tty in parts 3 and 4. For now, we will just say that:
- tty sits between xterm and bash and passes data from one to the other in both directions;
- depending on its configuration, tty changes data it receives from one side before passing to the other;
- there is a command, stty raw -echo -isig, which configures tty to pass data “as is”, without modification.
Using stty raw -echo -isig to disable most effects of tty is our primary strategy for exploring how xterm works. Until part 3, we will ignore the existence of tty and concentrate on exploring xterm’s behavior.
Let’s start by discussing the bidirectional link between a user and xterm. Converting scancodes that come from a keyboard into GUI events happens in two steps. First, Linux handles hardware events and turns them into keycodes that can be read by userland (using device files like /dev/input/by-id/usb-2.4G_2.4G_Wireless_Device-event-kbd). Second, the windowing system (X or Wayland) reads Linux keycodes and converts them into its own keycodes, and also assigns a keysym (i.e. a Unicode character). To check how it works, one can use:
- sudo showkey to explore Linux keycodes (visit this page for more info);
- xev (or wev for Wayland users) to explore GUI events.
For example, when I press the q button on my keyboard, depending on my keyboard layout I see:
- showkey: keycode 16
- xev: keycode 24 (keysym 0x71, q)
- xev: keycode 24 (keysym 0x6ca, Cyrillic_shorti)
xterm receives keypress events and writes data into tty(2):
- it encodes printable characters using configured encoding (most probably UTF-8);
- on receiving some key combinations, it executes actions such as copy-paste from clipboard;
- it encodes other key combinations and non-printable characters (such as arrow keys) using ANSI escape sequences (see post #2 for more details about ANSI sequences).
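The first and third rules can be sketched as a toy model in Python. This is not xterm’s actual code: the function name encode_key and the key names Left and Right are made up for illustration, and the table contains only the two arrow-key sequences that show up in the strace output later in this post.

```python
# Toy model of how a terminal emulator turns a key press into bytes for the tty.
# The key names and the table below are illustrative assumptions, not xterm's API.
ESC = b"\x1b"

SPECIAL_KEYS = {
    "Left": ESC + b"[D",   # left arrow
    "Right": ESC + b"[C",  # right arrow
}


def encode_key(key: str, encoding: str = "utf-8") -> bytes:
    """Return the bytes written to the tty for a single key press."""
    if key in SPECIAL_KEYS:
        # Non-printable keys become escape sequences.
        return SPECIAL_KEYS[key]
    # Printable characters are encoded using the configured encoding.
    return key.encode(encoding)


print(encode_key("q"))
print(encode_key("й"))
print(encode_key("Left"))
```

A real terminal has a much larger table, and the sequences it sends depend on its configuration and current mode.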
So converting key presses into data written into tty(2) happens in 3 steps: one in the kernel, one in the windowing system, and one in xterm. Now let’s figure out what xterm sends into tty after all those 3 steps. There are two strategies we can use to accomplish this task:
- strace: trace system calls (we will trace write and read calls, but be aware that there is also the aio API);
- run a command-line tool that will
  - disable tty’s input/output processing using stty raw -echo -isig
  - log its inputs.
strace
Let’s start with strace because it’s quite a practical approach. In your daily life, if you get stuck with misbehaving command-line tools, you can attach to a running process and observe what your terminal is writing into filehandles and what your shell reads. You don’t need to restart running programs to figure out what is going on.
First, here is a little helper to find out the PID of a terminal by clicking on it with the mouse (for users of the X Window System):
xprop | grep '_NET_WM_PID(CARDINAL)' | awk '{print $3}'
Then let’s observe what xterm writes into and reads from filehandles (please replace -p 22853 with an appropriate PID):
sudo strace -f -e 'trace=write,read' -e write=all -e read=all -p 22853 2>&1 | grep -v EAGAIN
For testing, I’ve entered the sequence qwe and strace gave me:
write(4, "q", 1) = 1
| 00000 71 q |
read(4, "q", 4096) = 1
| 00000 71 q |
write(4, "w", 1) = 1
| 00000 77 w |
read(4, "w", 4096) = 1
| 00000 77 w |
write(4, "e", 1) = 1
| 00000 65 e |
read(4, "e", 4096) = 1
| 00000 65 e |
That makes sense. xterm sends (writes) q. tty+bash echoes q back so that the user can see what they entered. Then the sequence we follows the same pattern. Now, I’ll try arrow keys: the left arrow and then the right arrow:
write(4, "\33[D", 3) = 3
| 00000 1b 5b 44 .[D |
read(4, "\10", 4096) = 1
| 00000 08 . |
write(4, "\33[C", 3) = 3
| 00000 1b 5b 43 .[C |
read(4, "\33[C", 4096) = 3
| 00000 1b 5b 43 .[C |
For the left arrow key, xterm sends \33[D and receives back \10. man ascii tells us that 33 Oct is the same as 1b Hex and it’s the \ESC (escape) ASCII control character. 10 Oct is 08 Hex and it’s the BS (backspace) control character (commonly abbreviated as \b thanks to the C programming language). We will discuss ANSI escape sequences and ASCII control characters soon; for now, we can confirm that strace helps to observe what xterm is actually doing: it sends qwe\ESC[D\ESC[C and receives qwe\b\ESC[C.
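The octal/hex equivalence is easy to double-check in Python, whose string literals accept the same octal and hex escapes. This snippet is just a sanity check of the arithmetic, not part of the tracing workflow.

```python
# All of these literals denote the same single character, ESC (decimal 27).
esc_variants = ["\33", "\033", "\x1b", "\u001b", chr(27), chr(0o33), chr(0x1B)]
assert all(v == esc_variants[0] for v in esc_variants)

# Backspace: octal 10, hex 08, or the C-style shorthand \b.
bs_variants = ["\10", "\x08", "\b", chr(8)]
assert all(v == bs_variants[0] for v in bs_variants)

# The same numbers in the notations used by `man ascii`.
print(oct(27), hex(27), 27)
print(oct(8), hex(8), 8)
```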
Let’s use strace to observe what bash is doing.
sh-4.4$ echo $$
5944
Entering the sequence qwe<left><right> gives me a symmetrical result from the bash side: it receives qwe\ESC[D\ESC[C and sends qwe\b\ESC[C back. The excerpt below shows the left arrow part:
read(0, "\33", 1) = 1
| 00000 1b . |
read(0, "[", 1) = 1
| 00000 5b [ |
read(0, "D", 1) = 1
| 00000 44 D |
write(2, "\10", 1) = 1
| 00000 08 . |
I’ve promised to ignore tty for a while, but just to show why it might be useful to strace both a terminal and bash, let’s experiment. Let’s execute the cat - command and observe in real-time what xterm is sending to tty and what cat receives.
First, let’s get the PID of a shell and then execute cat:
sh-4.4$ echo $$
10519
sh-4.4$ cat -
Then, in another terminal window, let’s find out the PID of cat using the “parent PID” option of ps:
ps --ppid 10519
PID TTY TIME CMD
10560 pts/5 00:00:00 cat
On my system, the experiment shows that xterm writes characters one by one immediately after I’ve pressed a keyboard button. Yet cat receives the entire line only after I’ve pressed Enter. I can use the Backspace key to erase previously entered characters, which is relatively complicated logic. This logic is part of what tty is capable of.
read(0, "qwe\33[D\33[C\n", 131072) = 10
| 00000 71 77 65 1b 5b 44 1b 5b 43 0a qwe.[D.[C. |
We will discuss tty in parts 3 and 4. For now, let’s just enjoy the success of our debugging approach: we’ve just observed what exactly xterm and bash send to each other and how tty (which sits in the middle) can alter data before sending it to a consumer. The big limitation of this approach is that reading sequences like \33[D\33[C\n requires a certain patience and might be quite hard if applications output a lot of data ¯_(ツ)_/¯.
Printing non-printable
While playing with strace we’ve encountered sequences like \33[D\33[C, which I’ve later written as \ESC[D\ESC[C. In my daily life, I sometimes encounter different notations, for example \u001b[D\u001b[C, \x1b[D\x1b[C, or something else. Different software uses different conventions for visualizing non-printable characters. Also, many programming languages have a way to embed non-printable characters into string literals using a sequence of printable characters. But again, conventions for representing non-printable characters using printable ones differ between programming languages.
Let’s discover how different software visualizes the ESC (escape) ASCII character:
printf "\x1b" > /tmp/data.txt
- vi, emacs: ^[
- less: ESC
- code, gedit: on my systems render some nonsense
- hexdump: 1b (hexdump supports many output formats)
- od -a: esc (od supports many output formats)
- strace: \33 and 1b
- python: \x1b
    open("/tmp/data.txt", "r").read()
- Haskell: \ESC
    import qualified Data.ByteString as BS
    BS.readFile "/tmp/data.txt" >>= print
- nodejs: \u001b
    const fs = require('fs')
    console.dir( fs.readFileSync('/tmp/data.txt', 'utf8') )
To make things more confusing, some popular programming languages support syntax for embedding non-printable characters into string literals, but don’t provide an easily accessible function to convert a string back into the same notation. For example, using the C programming language, I can easily make a string containing the ESC character:
char* str = "\x1b";
But the easiest way I know to visualize it using printable characters is to write code like this:
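#include <stdio.h>
#include <ctype.h>

int main(void) {
    FILE *f = fopen("/tmp/data.txt", "r");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    int c;
    /* fgetc returns EOF at the end of the file, so test its result directly */
    while ((c = fgetc(f)) != EOF) {
        if (isprint(c))
            printf("%c", c);       /* printable: emit as is */
        else
            printf("\\x%02x", c);  /* non-printable: emit as \xNN */
    }
    fclose(f);
    return 0;
}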
The moral here is that different tools visualize non-printable characters differently. To make things less confusing, it’s helpful to train your eye to recognize the magic strings ^[, \ESC, ESC, esc, 1b, \x1b, 0x1b, \u001b, 33, 27. Also, it’s helpful to choose tools you can understand even under stress.
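To get used to the notations side by side, here is a small Python sketch that renders the same bytes in several of them. The function name visualize and the style names are my own invention for this example; no tool mentioned above exposes such an interface.

```python
def visualize(data: bytes, style: str = "hex") -> str:
    """Render bytes as text, replacing non-printable bytes with an escape notation."""
    out = []
    for b in data:
        if 0x20 <= b < 0x7F:
            out.append(chr(b))                 # printable ASCII: keep as is
        elif style == "caret" and b < 0x20:
            out.append("^" + chr(b + 0x40))    # vi/emacs style: ESC (0x1b) -> ^[
        elif style == "octal":
            out.append(f"\\{b:o}")             # strace style: \33
        elif style == "unicode":
            out.append(f"\\u{b:04x}")          # nodejs style: \u001b
        else:
            out.append(f"\\x{b:02x}")          # python style: \x1b
    return "".join(out)


left_arrow = b"\x1b[D"
for style in ("caret", "octal", "hex", "unicode"):
    print(style, visualize(left_arrow, style))
```

Note that the sketch leaves a literal backslash untouched, so its output is meant for reading, not for parsing back into bytes.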
stty raw -echo -isig
We’ve traced xterm using strace to check what it sends to bash. We can accomplish a similar task without using a tracing tool. The most fool-proof way to do so is to disable the effects of tty and to dump the binary data which comes from tty into a file. Then we can explore the content of the file using our tool of choice:
stty raw -echo -isig; dd bs=1 of=/tmp/data.txt
I prefer to use vi or od:
vi /tmp/data.txt
od -ac /tmp/data.txt
It might be cool to visualize the same data in real-time. One can use this bash one-liner:
stty sane -isig -echo -icanon; while true; do od -N 1 -ax -; done
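The same idea can be sketched in Python with the standard termios and tty modules. This is an illustration rather than a polished tool: the names describe and main, the short table of control-character names, and the choice of q as the quit key are all assumptions made for this example. tty.setraw does roughly what stty raw -echo -isig does, and the original settings are restored on exit.

```python
import os
import sys
import termios
import tty

# Short names for a few control bytes, as listed in `man ascii`.
NAMES = {0x04: "EOT", 0x08: "BS", 0x0A: "LF", 0x0C: "FF",
         0x0D: "CR", 0x1B: "ESC", 0x7F: "DEL"}


def describe(byte: int) -> str:
    """Return a printable description of a single input byte."""
    if byte in NAMES:
        return NAMES[byte]
    if 0x20 <= byte < 0x7F:
        return chr(byte)
    return f"0x{byte:02x}"


def main() -> None:
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)       # remember the current tty settings
    try:
        tty.setraw(fd)                  # no echo, no line editing, no signals
        while True:
            data = os.read(fd, 1)       # one byte at a time
            if not data or data == b"q":
                break                   # Ctrl+C no longer works, so q quits
            # Raw mode also disables output post-processing, hence "\r\n".
            sys.stdout.write(f"{data[0]:02x} {describe(data[0])}\r\n")
            sys.stdout.flush()
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)   # restore the tty
```

Call main() from an interactive terminal to try it; the function is not invoked automatically so that the snippet can also be imported.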
Or convert man ascii into a small C program. It executes stty raw -echo on startup, so that tty doesn’t change the terminal’s output and hence the tool shows what the terminal sends into tty.
Pressing the sequence a, 1, Ctrl+d, Ctrl+l gives:
a
1
EOT (end of transmission)
FF '\f' (form feed)
Alt+d gives 2 characters:
ESC (escape)
d
Ctrl+Alt+Shift+d gives:
ESC (escape)
EOT (end of transmission)
which is the same as Ctrl+Alt+d; Shift is just ignored.
That behavior of xterm is not set in stone, and it is configurable in a terminal-dependent way. Depending on its configuration, xterm might send different things in response to Ctrl+Alt+Shift combinations. Here is the discussion about xterm modified keys.
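The output observed above is consistent with two simple rules: Ctrl keeps only the low five bits of the letter’s ASCII code, and Alt prefixes the result with ESC. Here is a minimal Python sketch of those two rules; the helper names ctrl and alt are hypothetical, and it models only the observed default behavior, not xterm’s configurable logic.

```python
ESC = b"\x1b"


def ctrl(ch: str) -> bytes:
    """Ctrl+<letter>: keep only the low five bits of the ASCII code."""
    return bytes([ord(ch) & 0x1F])


def alt(data: bytes) -> bytes:
    """Alt+<key>: prefix the key's bytes with ESC."""
    return ESC + data


print(ctrl("d"))        # Ctrl+d, EOT
print(ctrl("l"))        # Ctrl+l, FF
print(alt(b"d"))        # Alt+d
print(alt(ctrl("d")))   # Ctrl+Alt+d
```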
UTF-8
UTF-8 has a few nice features which I didn’t appreciate enough until recently:
- UTF-8 is a self-synchronizing code. If you take any UTF-8-encoded string and randomly chop off the beginning so that you end up in the middle of a multi-byte character, then:
  - you’ll be able to detect an error: an attempt to decode an invalid character;
  - you’ll be able to recover from the error by discarding the bytes of the broken character and finding the beginning of the next valid character.
- ASCII characters (including control characters) are valid one-byte UTF-8-encoded characters.
Combined, these two features give us a nice property: control characters will never appear as part of multi-byte characters. I.e. the Python check below holds for any byte string consisting only of multi-byte UTF-8 characters:
"编程很有趣".encode("utf-8").find(b"\n") == -1
Let’s understand why this is the case. 编程很有趣 is encoded using 15 bytes; below I’ve represented the bytes using decimal numbers:
编 程 很 有 趣
231 188 150 | 231 168 139 | 229 190 136 | 230 156 137 | 232 182 163
All these numbers are greater than 127 Dec. But all ASCII characters (including control characters) are less than or equal to 127 Dec. So it’s safe to search for single-byte ASCII characters in UTF-8 strings without decoding them, because all bytes of every valid multi-byte character are guaranteed to be greater than 127 Dec.
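The byte values are easy to reproduce in Python. The snippet below is only a check of the numbers; the variable names are arbitrary.

```python
text = "编程很有趣"
encoded = text.encode("utf-8")

print(len(encoded))    # number of bytes
print(list(encoded))   # the bytes as decimal numbers

# Every byte of a multi-byte character is above the ASCII range.
assert all(byte > 127 for byte in encoded)

# So a byte-level search for an ASCII character cannot hit a false match.
assert encoded.find(b"\n") == -1

# With a real newline in the middle, the search finds exactly that byte.
mixed = "编程\n很有趣".encode("utf-8")
print(mixed.find(b"\n"))
```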
Error recovery is possible, and it is surprisingly simple. In binary notation, all ASCII characters start with a leading 0, and all bytes of multi-byte characters start with 1. In addition, for multi-byte characters:
- only the first byte of a character can start with 11;
- all continuation bytes (bytes 2, 3, 4) start with 10.
You can easily observe these properties in action:
编 11100111 10111100 10010110
程 11100111 10101000 10001011
很 11100101 10111110 10001000
有 11100110 10011100 10001001
趣 11101000 10110110 10100011
Each byte starts with 1, indicating that it’s a part of a multi-byte character. Also, each byte contains an indication of whether it’s the first byte or a continuation byte.
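Here is a minimal Python sketch of the recovery step, assuming the input was chopped at the beginning only. The helper names is_continuation and resync are made up for this example.

```python
def is_continuation(byte: int) -> bool:
    """Continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


def resync(data: bytes) -> bytes:
    """Drop leading continuation bytes left over from a broken character."""
    start = 0
    while start < len(data) and is_continuation(data[start]):
        start += 1
    return data[start:]


encoded = "编程很有趣".encode("utf-8")
chopped = encoded[1:]   # cut in the middle of the first character

# Decoding the chopped bytes directly fails...
try:
    chopped.decode("utf-8")
except UnicodeDecodeError as err:
    print("error detected:", err.reason)

# ...but after skipping the orphaned continuation bytes it succeeds.
print(resync(chopped).decode("utf-8"))
```

A byte stream damaged in the middle needs the same check applied wherever decoding fails, but the principle is identical.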
Conclusion
Using strace and by disabling tty features, we’ve explored how keyboard input from users reaches command-line tools. We also saw that xterm might send non-printable characters and that different tools visualize non-printable characters differently. Also, we’ve improved our mental resilience by getting accustomed to different notations and by trying different tools for visualizing control characters. Finally, we said a few words about the UTF-8 encoding, which is the most widely used Unicode encoding nowadays.
In this blog post, we’ve discussed how xterm handles user input. The next post will discuss how xterm visualizes the output of CLI tools.
Stay tuned :)