
Category codes in TeX

Character category codes, token expansion, and the 'hash' ("#") problem in TeX

I recently finally understood what "category codes" are in TeX. Combined with token expansion, they are delicate things to manipulate. To introduce the problem, here is an example from my own experience. Suppose you are using plain TeX and would like to insert hyperlinks in your document. One way to do this is to use the eplain macro package, which defines, in particular, \href and \xrdef.
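
As a minimal sketch of direct use (outside any macro), assuming only the eplain commands that appear in this article, a link and its destination could look like this. The label name "sol" is hypothetical and chosen only for the illustration.

```tex
\input eplain
\enablehyperlinks

% A link to the destination named "sol" (hypothetical label).
See \href{#sol}{the solution}.

\eject % page break

% \xrdef defines the label "sol", which the link above points to.
\xrdef{sol}Here is the solution.

\bye
```

Written directly in the text like this, the hash sign causes no error; the rest of this article is about what happens once the same \href call is placed inside a macro definition.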

Now, here is the problem. Suppose you want to write a macro that creates hyperlinks in your document, for instance a macro \exercise that increments a counter and prints "Exercise XX" as a hyperlink to the answer. The first solution that comes to mind is the following:
\newcount\exno \exno=0
\newcount\answ \answ=0

\def\exercise{
  \global\advance\exno by 1
  \href{#ex\the\exno}{Exercise \the\exno}
}

\def\answer{
  \global\advance\answ by 1
  {\xrdef{ex\the\answ} Answer of exercise \the\answ\quad}
}
  
Suppose we put this code in a file named exercises.tex. We can then use it in a document:
\input eplain
\enablehyperlinks

\input exercises

\exercise
This is the first exercise.

\exercise
This is the second exercise.

\eject %page break

\answer
Answer to the first exercise.

\answer
Answer to the second exercise.
  
However, this solution does not work and the following error occurs:
! Illegal parameter number in definition of \exercise.
 
                   e
l.10   \global\advance\exno by 1 \href{#e
                                         x\the\exno}{Exercise \the\exno}
? 
  
The problem is the following: the character "#" is normally used for macro parameters, like #1, #2 and so forth. So, inside a definition, TeX expects a digit to follow "#" and not the letter "e". However, it is possible to use \href{#label}{link to label} directly (outside a macro definition) without any problem. Why is that?
In fact, \href contains some TeX trickery that allows "#" to be used as a "normal" character. So, when TeX encounters \href in normal text, it does a bit of magic before it continues reading. However, when reading the definition of the macro \exercise, TeX does not expand (i.e., "execute") \href, since it only wants to build the list of tokens that will be associated with \exercise. While converting the body of the macro into tokens, it finds the hash "#" and treats it as a parameter character, which is the problem. To understand how to work around it, we need to know what "category codes" are in TeX.

Category codes in TeX

In fact, TeX does not "know" that "#" is the character for arguments, nor that "{" and "}" are used for grouping. All it knows is that a character of category code 1 (like "{") opens a group, a character of category code 2 (like "}") closes a group, and a character of category code 6 (like "#") is used for macro arguments. Category codes are a kind of label attached to characters, but they are not fixed. When TeX reads a character, its actions are determined by the category of that character. There are 16 categories, which I will not describe here (they are listed in The TeXbook). What is important to know is that the category of any character can be changed with the command \catcode. For instance, the plain TeX format sets the category codes of "{", "}", and "#" with commands equivalent to the following:
\catcode`{=1
\catcode`}=2
\catcode`#=6
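
To check the current values rather than set them, \the\catcode can be used. The following is a small sketch, assuming an unmodified plain TeX format; the wording of the messages is arbitrary.

```tex
% \the\catcode`\X expands to the current category code of character X.
% The backslash form (`\{ and so on) avoids trouble with special characters.
\message{left brace=\the\catcode`\{, right brace=\the\catcode`\}}
\message{hash=\the\catcode`\#, letter a=\the\catcode`a, tilde=\the\catcode`\~}
\bye
```

Under plain TeX the terminal and the log file should show the values 1, 2, 6, 11 and 13 respectively.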
  
Suppose you prefer angle brackets ("<" and ">") for grouping. You can just add the following at the beginning of your file:
\catcode`<=1
\catcode`>=2
  
You can then write TeX code using angle brackets instead of braces, and it is even possible to mix them:
\def\a< this is a >
\def\b{ this is b >
  
Two other category codes matter for us here: category 11, "letter" (a-z and A-Z), and category 12, "other", which contains for instance "@" and "!". TeX does not do anything special when it encounters a character of category 12 (it just prints it). So if, for instance, you want to type a lot of "#" without using "\#", you can do the following:
\catcode`#=12
This is # just some normal \TeX with a lot # of # hash characters # scattered around.
However, this will prevent you from using arguments in later macro definitions, so it is advisable to restore the category of "#" afterwards:
\catcode`#=12
Text with lots of #.
\catcode`#=6
Normal code should now use \#.
A better solution is to use braces since category code definitions obey grouping (i.e., the category code is restored to its value before the beginning of the group whenever leaving the group):
{ \catcode`#=12
  Text with lots of #.
}
% category code of # is restored
Normal code should now use \#.

Back to the problem of # in \href

It now seems possible to solve our problem by changing the category code of "#" in the \exercise macro:
\def\exercise{
  \catcode`\#=12
  \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno}
}
But this gives exactly the same error as above:
! Illegal parameter number in definition of \exercise.
 
                   e
l.10   \global\advance\exno by 1 \href{#e
                                         x\the\exno}{Exercise \the\exno}
?         
Now, to understand the problem, it is really important to know what \catcode does and does not do. It changes the category code of characters that will be read afterwards, but it does not change the category code of characters that have already been read and converted to tokens. So the problem is again that, when TeX reads the definition of the macro \exercise, the whole body is converted to tokens without being executed. The \catcode command is therefore not executed at that point, and when TeX reads "#" it is still of category code 6. The solution is to change the category code of "#" before defining \exercise, as follows:
{
  \catcode`#=12
  \gdef\exercise{
    \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno}
  }
}
Notice the grouping, which makes the category of "#" revert to 6 after the definition. Notice also that the definition of \exercise must now be global (using \gdef), since \def also obeys grouping. With a plain \def, \exercise would be defined only until the current group ends, and hence would be undefined after the last "}".
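
The rule that a character's category code is fixed at the moment the character is read can be checked in isolation. The following plain TeX sketch is only an illustration: the macro name \saved and the choice of "!" are assumptions made for the example and have nothing to do with eplain.

```tex
% Make the exclamation mark active and give it a visible meaning.
\catcode`\!=\active
\def!{[active]}

% The character in this body is read now, so it is stored as an active token.
\def\saved{!}

% From here on, newly read exclamation marks are ordinary (category 12).
\catcode`\!=12

% \saved still expands to the stored active token; the literal one is printed as is.
\saved\ versus !
\bye
```

This should typeset "[active] versus !": the token stored inside \saved keeps the category it had when the definition was read, whatever \catcode says later.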
To conclude, suppose we want the \exercise macro to take an argument, so that it prints "Exercise XX (argument)". How can we do that, since "#" is no longer available for arguments?
Answer: as explained before, we are not restricted to the hash sign for macro arguments. Any character with category code 6 will do the trick, for instance "a" after \catcode`a=6. However, this would prevent us from using the letter "a", which would be a problem since we need it (e.g., for \advance). It is better to use, for instance, "!" or "@":
{
  \catcode`#=12
  \catcode`!=6
  \gdef\exercisearg!1{
    \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno} ({\it !1})
  }
}
\exercisearg{category codes\dots}
You can download the files I've used: the TeX file, and the .dvi file obtained after running TeX on the .tex file.

Category code 13: active characters

One special category code deserves a name of its own: category 13, denoted by \active. Active characters behave like control sequences, but are not prefixed with the escape character (\). For instance, ~ is an active character that has been defined as a non-breakable space (a space preceded by an infinite penalty, which forbids a line break there). Any character can be made active and then used as a control sequence. For instance,
\catcode`?=\active
\catcode`a=\active
\def?{coucou}
\defa{hello tout le monde}

?a?a?a?a?a?a?
will print coucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucou, though it is usually not advisable to change the category codes of letters or digits, for obvious reasons... (Except maybe the letter 'e', but only if your name is Georges Perec and you want to typeset La disparition with TeX. But remember to redefine all sectioning commands first.)
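
Since category code changes obey grouping, an active character can also be confined to a group, in the same way as "#" was earlier. This is a minimal sketch for plain TeX that reuses the "?" example; the surrounding text is made up for the illustration.

```tex
{ \catcode`\?=\active
  % A local definition is enough here, because the active character
  % is only used inside the group.
  \def?{coucou}
  ?-?-?
}
% Outside the group the question mark has its usual category again.
Is this an ordinary question mark?
\bye
```

Inside the group each "?" expands to coucou; after the closing brace the question mark is printed as usual.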
Last modified: Thursday, 10 December 2009