m4 is a powerful macro processor that's been around for more than thirty years. Here's the first of two parts, introducing its many magnificent talents.
A macro processor scans input text for defined symbols (the macros) and replaces that text by other text, or possibly by other symbols. For instance, a macro processor can convert one language into another.
If you're a C programmer, you know cpp, the C preprocessor, a simple macro processor. m4 is a powerful macro processor that's been part of Unix for some 30 years, but it's almost unknown, except for special purposes such as generating the sendmail.cf file. It's worth knowing because you can do things with m4 that are hard to do any other way.
The GNU version of m4 has some extensions from the original V7 version. (You'll see some of them.) As of this writing, the latest GNU version was 1.4.2, released in August 2004. Version 2.0 is under development.
You won't become an m4 wizard in three pages (or in six, as the discussion of m4 continues next month), but you can master the basics. So, let's dig in.
A simple way to do macro substitution is with tools like sed and cpp. For instance, the command sed 's/XPRESIDENTX/President Bush/' reads lines of text, changing the first occurrence of XPRESIDENTX on each line to President Bush (add the g flag, as in s/XPRESIDENTX/President Bush/g, to change every occurrence). sed can also test and branch, for some rudimentary decision-making.
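To make the sed substitution concrete, here is a minimal sketch. The sample sentence is made up for illustration; only the XPRESIDENTX placeholder and its replacement come from the text. It runs the same substitution with and without the g flag so the difference is visible.

```shell
# Hypothetical input line; XPRESIDENTX is the placeholder used in the text.
input='XPRESIDENTX met the press. XPRESIDENTX spoke.'

# Without the g flag, sed replaces only the first match on each line.
first_only=$(printf '%s\n' "$input" | sed 's/XPRESIDENTX/President Bush/')

# With the g flag, sed replaces every match on the line.
all_matches=$(printf '%s\n' "$input" | sed 's/XPRESIDENTX/President Bush/g')

printf '%s\n' "$first_only" "$all_matches"
```

The first output line still contains one XPRESIDENTX; the second contains none.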
As another example, here's a C program with a cpp macro named ABSDIFF() that accepts two arguments, a and b.
#define ABSDIFF(a, b) \
    ((a)>(b) ? (a)-(b) : (b)-(a))
Given that definition, cpp will replace the following code:
diff = ABSDIFF(v1, v2);
with:
diff = ((v1)>(v2) ? (v1)-(v2) : (v2)-(v1));
v1 replaces a everywhere, and v2 replaces b. ABSDIFF() saves typing and reduces the chance for error.
Unlike sed and other languages, m4 is designed specifically for macro processing. m4 manipulates files, performs arithmetic, has functions for handling strings, and can do much more.
m4 copies its input (from files or standard input) to standard output. It checks each token (a name, a quoted string, or any single character that's not a part of either a name or a string) to see if it's the name of a macro. If so, the token is replaced by the macro's value, and then that text is pushed back onto the input to be rescanned. (If you're new to m4, this repeated scanning may surprise you, but it's one key to m4's power.) Quoting text, like `text', prevents expansion. (See the section on "Quoting.")
m4 comes with a number of predefined macros, or you can write your own macros by calling the define() function. A macro can have multiple arguments: up to 9 in original m4, and an unlimited number in GNU m4. Macro arguments are substituted before the resulting text is rescanned.
Here's a simple example (saved in a file named foo.m4):
one
define(`one', `ONE')dnl
one
define(`ONE', `two')dnl
one ONE oneONE
`one'
The file defines two macros named one and ONE. It also has four
lines of text. If you feed the file to m4 using m4 foo.m4, m4 produces:
one
ONE
two two oneONE
one
Here's what's happening:
* Line 1 of the input, which is simply the characters one and a newline, doesn't match any macro (so far), so it's copied to the output as-is.
* Line 2 defines a macro named one(). (The opening parenthesis before the arguments must come just after define with no whitespace between.) From this point on, any input token one will be replaced with ONE. (The dnl is explained below.)
* Line 3, which is again the characters one and a newline, is affected by the just-defined macro one(). So, the text one is converted to the text ONE and a newline.
* Line 4 defines a new macro named ONE(). Macro names are case-sensitive.
* Line 5 has three space-separated tokens. The first two are one and ONE. The first is converted to ONE by the macro named one(), then both are converted to two by the macro named ONE(). Rescanning doesn't find any additional matches (there's no macro named two()), so the first two words are output as two two. The rest of line 5 (a space, oneONE, and a newline) doesn't match a macro so it's output as-is. In other words, a macro name is only recognized when it's surrounded by non-alphanumerics.
* Line 6 contains the text one inside a pair of quotes, then a newline. (As you've seen, the opening quote is a backquote or grave accent; the closing quote is a single quote or acute accent.) Quoted text doesn't match any macros, so it's output as-is: one. Next comes the final newline.
Input text is copied to the output as-is and that includes newlines. The built-in dnl function, which stands for "delete to new line," reads and discards all characters up to and including the next newline. (One of its uses is to put comments into an m4 file.) Without dnl, the newline after each of our calls to define would be output as-is. We could demonstrate that by editing foo.m4 to remove the two dnls. But, to stretch things a bit, let's use sed to remove those two calls from the file and pipe the result to m4:
$ sed 's/dnl//' foo.m4 | m4
one

ONE

two two oneONE
one
If you compare this example to the previous one, you'll see that there
are two extra newlines at the places where dnl used to be.
Let's summarize. You've seen that input is read from the first character to the last. Macros affect input text only after they're defined. Input tokens are compared to macro names and, if they match, replaced by the macro's value. Any input modified by a macro is pushed back onto the input and is rescanned for possible modification. Other text (that isn't modified by a macro) is passed to the output as-is.
Any text surrounded by `' (a grave accent and an acute accent) isn't expanded immediately. Whenever m4 evaluates something, it strips off one level of quotes. When you define a macro, you'll often want to quote the arguments, but not always. Listing One has a demo. It uses m4 interactively, typing text to its standard input.