Macro Magic: m4, Part One

m4 is a powerful macro processor that’s been around for more than thirty years. Here’s the first of two parts, introducing its many magnificent talents.

A macro processor scans input text for defined symbols — the macros — and replaces that text by other text, or possibly by other symbols. For instance, a macro processor can convert one language into another.

If you’re a C programmer, you know cpp, the C preprocessor, a simple macro processor. m4 is a powerful macro processor that’s been part of Unix for some 30 years, but it’s almost unknown — except for special purposes, such as generating the sendmail.cf file. It’s worth knowing because you can do things with m4 that are hard to do any other way.

The GNU version of m4 has some extensions from the original V7 version. (You’ll see some of them.) As of this writing, the latest GNU version was 1.4.2, released in August 2004. Version 2.0 is under development.

You won’t become an m4 wizard in three pages (or in six, as the discussion of m4 continues next month), but you can master the basics. So, let’s dig in.

Simple Macro Processing

A simple way to do macro substitution is with tools like sed and cpp. For instance, the command sed 's/XPRESIDENTX/President Bush/' reads lines of text, changing the first occurrence of XPRESIDENTX on each line to President Bush (add the g flag to change every occurrence). sed can also test and branch, for some rudimentary decision-making.

As another example, here’s a C program with a cpp macro named ABSDIFF() that accepts two arguments, a and b.

#define ABSDIFF(a, b) \
   ((a)>(b) ? (a)-(b) : (b)-(a))

Given that definition, cpp will replace the following code:

diff = ABSDIFF(v1, v2);

with:

diff = ((v1)>(v2) ? (v1)-(v2) : (v2)-(v1));

v1 replaces a everywhere, and v2 replaces b. ABSDIFF() saves typing — and the chance for error.

Introducing m4

Unlike sed and other languages, m4 is designed specifically for macro processing. m4 manipulates files, performs arithmetic, has functions for handling strings, and can do much more.

m4 copies its input (from files or standard input) to standard output. It checks each token (a name, a quoted string, or any single character that’s not a part of either a name or a string) to see if it’s the name of a macro. If so, the token is replaced by the macro’s value, and then that text is pushed back onto the input to be rescanned. (If you’re new to m4, this repeated scanning may surprise you, but it’s one key to m4’s power.) Quoting text, like `text', prevents expansion. (See the section on “Quoting.”)

m4 comes with a number of predefined macros, or you can write your own macros by calling the define() function. A macro can have multiple arguments — up to 9 in original m4, and an unlimited number in GNU m4. Macro arguments are substituted before the resulting text is rescanned.

Here’s a simple example (saved in a file named foo.m4):

one
define(`one', `ONE')dnl
one
define(`ONE', `two')dnl
one ONE oneONE
`one'

The file defines two macros named one and ONE. It also has four lines of text. If you feed the file to m4 using m4 foo.m4, m4 produces:

one
ONE
two two oneONE
one

Here’s what’s happening:

* Line 1 of the input, which is simply the characters one and a newline, doesn’t match any macro (so far), so it’s copied to the output as-is.

* Line 2 defines a macro named one(). (The opening parenthesis before the arguments must come just after define with no whitespace between.) From this point on, any input string one will be replaced with ONE. (The dnl is explained below.)

* Line 3, which is again the characters one and a newline, is affected by the just-defined macro one(). So, the text one is converted to the text ONE and a newline.

* Line 4 defines a new macro named ONE(). Macro names are case-sensitive.

* Line 5 has three space-separated tokens. The first two are one and ONE. The first is converted to ONE by the macro named one(), then both are converted to two by the macro named ONE(). Rescanning doesn’t find any additional matches (there’s no macro named two()), so the first two words are output as two two. The rest of line 5 (a space, oneONE, and a newline) doesn’t match a macro so it’s output as-is. In other words, a macro name is only recognized when it’s surrounded by non-alphanumerics.

* Line 6 contains the text one inside a pair of quotes, then a newline. (As you’ve seen, the opening quote is a backquote or grave accent; the closing quote is a single quote or acute accent.) Quoted text doesn’t match any macros, so it’s output as-is: one. Next comes the final newline.

Input text is copied to the output as-is, and that includes newlines. The built-in dnl function, which stands for “delete to new line,” reads and discards all characters up to and including the next newline. (One of its uses is to put comments into an m4 file.) Without dnl, the newline after each of our calls to define would be output as-is. We could demonstrate that by editing foo.m4 to remove the two dnls. But, to stretch things a bit, let’s use sed to remove those two calls from the file and pipe the result to m4:

$ sed 's/dnl//' foo.m4 | m4
one

ONE

two two oneONE
one

If you compare this example to the previous one, you’ll see that there are two extra newlines, one at each place where dnl used to be.

Let’s summarize. You’ve seen that input is read from the first character to the last. Macros affect input text only after they’re defined. Input tokens are compared to macro names and, if they match, replaced by the macro’s value. Any input modified by a macro is pushed back onto the input and is rescanned for possible modification. Other text (that isn’t modified by a macro) is passed to the output as-is.

Quoting

Any text surrounded by `' (a grave accent and an acute accent) isn’t expanded immediately. Whenever m4 evaluates something, it strips off one level of quotes. When you define a macro, you’ll often want to quote the arguments — but not always. Listing One has a demo. It uses m4 interactively, typing text to its standard input.

Listing One: Quoting demonstration

$ m4
define(A, 100)dnl
define(B, A)dnl
define(C, `A')dnl
dumpdef(`A', `B', `C')dnl
A:      100
B:      100
C:      A
dumpdef(A, B, C)dnl
stdin:5: m4: Undefined name 100
stdin:5: m4: Undefined name 100
stdin:5: m4: Undefined name 100
A B C
100 100 100
CTRL-D
$

The listing starts by defining three macros A, B, and C. A has the value 100. So does B: because its argument A isn’t quoted, m4 replaces A with 100 before assigning that value to B. While defining C, though, quoting the argument means that its value becomes the literal A.

You can see the values of macros by calling the built-in function dumpdef with the names of the macros. As expected, A and B have the value 100, but C has A.

In the second call to dumpdef, the names are not quoted, so each name is expanded to 100 before dumpdef sees them. That explains the error messages, because there’s no macro named 100. In the same way, if we simply enter the macro names, the three tokens are scanned repeatedly, and they all end up as 100.

You can change the quoting characters at any time by calling changequote. For instance, in text containing lots of quote marks, you could call changequote({,})dnl to change the quoting characters to curly braces. To restore the defaults, simply call changequote with no arguments.

In general, for safety, it’s a good idea to quote all input text that isn’t a macro call. This avoids m4 interpreting a literal word as a call to a macro. Another way to avoid this problem is by using the GNU m4 option --prefix-builtins or -P. It changes all built-in macro names to be prefixed by m4_. (The option doesn’t affect user-defined macros.) So, under this option, you’d write m4_dnl and m4_define instead of dnl and define, respectively.

Keep quoting and rescanning in mind as you use m4. Not to be tedious, but remember that m4 does rescan its input. For some in-depth tips, see “Web Paging: Tips and Hints on m4 Quoting” by R.K. Owen, Ph.D., at http://owen.sj.ca.us/rkowen/howto/webpaging/m4tipsquote.html.

Decisions and Math

m4 can do arithmetic with its built-in functions eval, incr, and decr. m4 doesn’t support loops directly, but you can combine recursion and the decision macro ifelse to write loops.

Let’s start with an example adapted from the file /usr/share/doc/m4/examples/debug.m4 (on a Debian system). It defines the macro countdown(). Evaluating the macro with an argument of 5, as in countdown(5), outputs the text 5, 4, 3, 2, 1, 0, Liftoff!.

$ cat countdown.m4
define(`countdown', `$1, ifelse(eval($1 > 0),
   1, `countdown(decr($1))', `Liftoff!')')dnl
countdown(5)
$ m4 countdown.m4
5, 4, 3, 2, 1, 0, Liftoff!

The countdown() macro takes a single argument. Its definition is broken across two lines. That’s fine in m4 because macro arguments are delimited by parentheses, which don’t have to be on the same line. Here’s the definition (the second argument to define) without its surrounding quotes:

$1, ifelse(eval($1 > 0), 1,
   `countdown(decr($1))', `Liftoff!')

$1 expands to the macro’s first argument. When m4 evaluates that countdown macro with an argument of 5, the result is:

5, ifelse(eval(5 > 0), 1,
   `countdown(decr(5))', `Liftoff!')

The leading “5,” is plain text that’s output as-is as the first number in the countdown. The rest of the argument is a call to ifelse. The ifelse macro compares its first two arguments. If they’re equal, the third argument is evaluated; otherwise, the (optional) fourth argument is evaluated.

Here, the first argument to ifelse, eval(5 > 0), evaluates to 1 (logical “true”) because 5 is greater than 0. So the first two arguments are equal, and m4 evaluates countdown(decr(5)). This starts the recursion by calling countdown(4).

Once we reach the base condition of countdown(0), the test eval(0 > 0) fails and the ifelse call evaluates `Liftoff!'. (If recursion is new to you, you can read about it in books on computer science and programming techniques.)

Note that, with more than four arguments, ifelse can work like a case or switch in other languages. For instance, in ifelse(a,b,c,d,e,f,g), if a matches b, then c; else if d matches e then f; else g.

The m4 info file shows more looping and decision techniques, including a macro named forloop() that implements a nestable for-loop.

This section showed some basic math operations. (The info file shows more.) You’ve seen that you can quote a single macro argument that contains a completely separate string (in this case, a string that prints a number, then runs ifelse to do some more work). This one-line example (broken onto two lines here) is a good hint of m4’s power. It’s a minimalist language, for sure, and you’d be right to complain about its tricky evaluation in a global environment, leaving lots of room for trouble if you aren’t careful. But you might find this expressive little language to be challenging enough that it’s addictive.

Building Web Pages

Let’s wrap up this m4 introduction with a typical use: feeding an input file to a set of macros to generate an output file. Here, the macro file html.m4 defines three macros: _startpage(), _ul(), and _endpage(). (The names start with underscore characters to help prevent false matches with non-macro text. For instance, _ul() won’t match the HTML tag <ul>.) The _startpage() macro accepts one argument: the page title, which is also copied into a level-1 heading that appears at the start of the page. The _ul() macro makes an HTML unordered list. Its arguments (an unlimited number) become the list items. And _endpage() makes the closing HTML text, including a “last change” date taken from the Linux date utility.

Listing Two shows the input file, and Listing Three is the HTML output. The m4 macros that do all the work are shown in Listing Four. (Both the input file and the macros are available online.)

Listing Two: webpage.m4h, an “unexpanded” web page

_startpage(`Sample List')
_ul(`First item', `Second item',
   `Third item, longer than the first two')
_endpage


Listing Three: An m4-generated web page

$ m4 html.m4 webpage.m4h > list.html
$ cat list.html
<html>
<head>
<title>Sample List</title>
</head>
<body>
<h1>Sample List</h1>
<ul>
<li>First item</li>
<li>Second item</li>
<li>Third item, longer than the first two</li>
</ul>

<p>Last change: Fri Jan 14 15:32:06 MST 2005
</p>
</body>
</html>

In Listing Four, both _startpage() and _endpage() are straightforward. The esyscmd macro is one of the many m4 macros we haven’t covered; it runs a Linux command line, then uses the command’s output as input to m4. The _ul() macro outputs opening and closing HTML <ul> tags, passing its arguments to the _listitems() macro via $@, which expands into the quoted list of arguments.

_listitems() is similar to the countdown() macro shown earlier: _listitems() makes a recursive loop. At the base condition (the end of recursion), when $# (the number of arguments) is 0, the empty third argument means that ifelse does nothing. Or, if there’s one argument ($# is 1), ifelse simply outputs the last list item inside a pair of <li> tags. Otherwise, there’s more than one argument, so the macro starts by outputting the first argument inside <li> tags, then calls _listitems() recursively to output the other list items. The argument to the recursive call is shift($@). The m4 shift macro returns its list of arguments without its first argument, which here is all of the arguments we haven’t processed yet.

Notice the nested quoting: some of the arguments inside the (quoted) definition of _listitems() are quoted themselves. This delays interpretation until the macro is called. (m4 tracing, which we’ll cover next month, can help you see what’s happening.)

Listing Four: html.m4, macros to generate an HTML page from Listing Two

define(`_startpage', `<html>
<head>
<title>$1</title>
</head>
<body>
<h1>$1</h1>')dnl
dnl
define(`_endpage', `
<p>Last change: esyscmd(date)</p>
</body>
</html>')dnl
dnl
define(`_listitems', `ifelse($#, 0, ,
   $#, 1, `<li>$1</li>',
   `<li>$1</li>
_listitems(shift($@))')')dnl
define(`_ul', `<ul>
_listitems($@)
</ul>')dnl

To be continued...

This month, you’ve seen some basics of m4: scanning input text, replacing any tokens that match macro names with the macro values.

Next month, we’ll dig deeper into m4: diversions, included files, frozen files, debugging and tracing, and other built-in macros.

If you’d like to do more in the meantime, the GNU m4 info file (type info m4) has a lot of information and examples.

Jerry Peek is a freelance writer and instructor who has used Unix and Linux for over 20 years. He’s happy to hear from readers; see https://www.jpeek.com/contact.html. Sample files from this column are available online.