Building text files with m4 macros

ArticleCategory:

Webdesign

AuthorImage:

[Photo of the author]

TranslationInfo:

Original in fr John Perr

fr to en:John Perr

AboutTheAuthor:

Linux user since 1994; he is one of the French editors of LinuxFocus.


Abstract:

This tutorial describes how to ease the maintenance of text or HTML files using the m4 macro processor.


ArticleIllustration:

[Illustration]

ArticleBody:

Introduction

A macro language is often useful when working with text. Most text editors already include one among their features. Even the C compiler provides such a facility for programmers through the C preprocessor, CPP. When it is used to maintain configuration files or a small web site, the GNU/m4 macro processor can significantly reduce the workload. The GNU/m4 macro processor is included in Linux distributions, and m4 is a standard tool among Unix users.

In the following, we show how to use the GNU/m4 macro processor to maintain a set of HTML pages for a small web site. This system will help to keep the whole site coherent. Of course, there are dozens of ways to obtain the same result with Unix tools; that's the beauty of Unix.

The same technique is used to build the well-known sendmail.cf. An m4 macro kit for that purpose, designed by Eric Allman, is available from the University of California, Berkeley.

The GNU/m4 macro processor is not limited to text and HTML editing. It can also prove very useful to programmers who want to extend the features of CPP, or who want CPP-like features in other languages.

Definition

A macro processor is a program which interprets commands (named macros) defined by the user. Macros are often embedded into the text to process. For instance, the following definition:

define(AUTHOR,`Agatha Christie<a.christie@scotland-yard.gov>')

allows the word "AUTHOR" to be used anywhere in the text. m4 replaces it with "Agatha Christie<a.christie@scotland-yard.gov>" when it processes the file. There are, of course, more useful functions, as will be shown next.

An example

Let's suppose we have to maintain a web site whose pages exist in several languages. Moreover, each page has the same header and footer in order to give the site a coherent look. To keep things simple, and to avoid needing a graphical browser to see the result, our example will only deal with text. This will also allow people using lynx to easily browse our site. Here is the HTML code for one page:

HTML Version

  
<!-- Start of header -->
<HTML>
<HEAD>
<TITLE>Lynx homepage</TITLE>
<META name="description" content="Site lynx et m4"> 
</HEAD>
<BODY BGCOLOR="#FFFFFF" LINK="#008000" VLINK="#808080" ALINK="#8080FF">
<TABLE>
  <TBODY>
  <TR><TD align=middle colspan="2">
      <H1>Lynx a fully-featured World Wide Web client for character-cell
displays</H1>
  <TR><TD align="left" valign="top" width="15%">
      <a href="./index-en.html">English</A><BR>
      <a href="./index-fr.html">French</A><BR>
      <a href="./index-es.html">Spanish</A><BR>
      <a href="./index-it.html">Italian</A><BR>
      <a href="./index-de.html">German</A><BR>
      <TD align=left>
<!-- End of header -->

      <P>Links to the current sources and support materials for Lynx are
   maintained at <A HREF="http://www.crl.com/~subir/lynx.html">Lynx
   links</A></P>
      <P> and at the Lynx homepage
      <A HREF="http://lynx.browser.org/">Lynx
      Information.</A></P>
      <P>View these pages for information about Lynx, including new
      updates.</P>

      <P>Lynx is distributed under the GNU General Public License (GPL) without
   restrictions on usage or redistribution.  The Lynx copyright statement,
   "COPYHEADER", and GNU GPL, "COPYING", are included in the top-level
   directory of the distribution.  Lynx is supported by the Lynx user
   community, an entirely volunteer (and unofficial) organization.</P>

<!-- Start of footer -->
</TBODY>
</TABLE>
<HR size="0" noshadow>
<FONT SIZE=-2>
<EM>Page maintained by John Perr.<BR>
Page updated on 25/07/99
- © <A HREF="mailto:webmaster@lynx.browser.org">lynx.browser.org</A>1999
</EM></FONT>
</BODY>
</HTML>
<!-- End of footer -->

Here is the result:

[Screenshots: the page as rendered with lynx and with netscape]

All pages will have the same header and footer style; only the language and the body of the page will differ. We are now going to design m4 macros to insert into the HTML text of our pages in place of all the repetitive data.
Before going into the details of the macros, let us look at the above example rewritten with them:

Macro Version

  
LYNX_TITRE(Lynx a fully-featured World Wide Web
            client for character-cell displays)
LYNX_ENTETE(Lynx homepage)

      <P>Links to the current sources and support materials
      for Lynx are maintained at
      <A HREF="http://www.crl.com/~subir/lynx.html">
      Lynx links</A></P>
      <P> and at the Lynx homepage
      <A HREF="http://lynx.browser.org/">
      Lynx Information.</A></P>
      <P>View these pages for information about Lynx,
      including new updates.</P>

      <P>Lynx is distributed under the
      GNU General Public License (GPL) without
   restrictions on usage or redistribution.
   The Lynx copyright statement, "COPYHEADER",
   and GNU GPL, "COPYING", are included in the top-level
   directory of the distribution.
   Lynx is supported by the Lynx user community,
   an entirely volunteer (and unofficial) organization.</P>
LYNX_PIED

Written this way, HTML pages are simpler and the text is not lost among HTML tags. To offer the pages in other languages, translations of this file have to be written. The French version would be:

  
LYNX_TITRE(Lynx un navigateur en mode console)
LYNX_ENTETE(Un site pour les utilisateurs de lynx)

   <P>Visitez le
   <A HREF="http://lynx.browser.org/">
   site officiel de lynx</A>
   pour plus d'informations sur Lynx,
   y compris les nouvelles mises à jour.</P>
       
   <P>Les liens vers les sources de la version
   courante et divers supports pour Lynx sont
   tenus à jour sur le site
   <A HREF="http://www.crl.com/~subir/lynx.html">
   liens Lynx</A>.</P>

   <P>Lynx est distribué dans le cadre de la licence GNU
   (General Public License - GPL)
   sans restriction sur son utilisation ni sa distribution.
   Les mentions des droits de reproduction de Lynx, "COPYHEADER",
   et GNU GPL, "COPYING", sont inclus dans la racine de
   l'arborescence de la distribution. Lynx est supporté par
   la communauté des utilisateurs de Lynx, une communauté
   entièrement bénévole (et non-officielle).</P>
LYNX_PIED

For each language, the same macros LYNX_TITRE, LYNX_ENTETE and LYNX_PIED are used, but with different arguments. These three macros efficiently replace the HTML code of the header and footer. This is the main advantage of the system: the header and footer are defined once and stay consistent across the whole site. If their style has to be changed, only the macro definition file needs to be modified, instead of each page being edited by hand.

Macro Definitions

The three macros used above achieve most of the formatting. Here is the file that defines them; explanations follow:

  
divert(-1)
# File mac.css
# Version 1.0 M4 macros for Lynx
#
# A file trans-LANG.m4 is defined for each
# language, based on the French one.
# If LANG is not defined,
# French is the default.
#
divert(0)
changequote({,})dnl # change quotes to curly braces
ifdef({LANG},,{define({LANG},{fr})})dnl # Default= french
include({trans-}LANG{.m4})dnl	  # call translation file
undefine({format})dnl		  # Suppress the format definition
define({_ANNEE_},esyscmd(date +%Y))dnl 	  #Current year
define({LYNX_TITRE},{define(_TITLE_,$1)})dnl # First macro
dnl # Second macro
define({LYNX_ENTETE},{<!-- Header start -->
<HTML>
<HEAD>
<TITLE>$1</TITLE>
<META name="description" content="Site lynx and m4"> 
<META name="keywords" content="m4, lynx, GPL">
</HEAD>
<BODY BGCOLOR="#FFFFFF" LINK="#008000" VLINK="#808080" ALINK="#8080FF">
<TABLE>
  <TBODY>
  <TR><TD align=middle colspan="2">
      <H1>_TITLE_</H1>
  <TR><TD align="left" valign="top" width="15%">
      <a href="./index-en.html">_ANGLAIS_</A><BR>
      <a href="./index-fr.html">_FRANCAIS_</A><BR>
      <a href="./index-es.html">_ESPAGNOL_</A><BR>
      <a href="./index-it.html">_ITALIEN_</A><BR>
      <a href="./index-de.html">_ALLEMAND_</A><BR>
      <TD align=left>
<!-- end of header -->})dnl
dnl # Third macro
define({LYNX_PIED},{<!-- Start of footer -->
</TBODY>
</TABLE>
<HR size="0" noshadow>
<FONT SIZE=-2>
<EM>_MAINTENEUR_.<BR>
_MAJ_
esyscmd(date +%d/%m/%y)
- &copy; <A HREF="mailto:webmaster@lynx.browser.org">
lynx.browser.org</A>
_ANNEE_</EM></FONT>
</BODY>
</HTML>
<!-- End of footer -->})dnl

Description

Lines between "divert(-1)" and "divert(0)" are comments. "divert" is one of the built-in macros of the m4 processor; it redirects the processor's output. With the argument -1, it tells the processor to discard the lines that follow instead of writing them to the final HTML file, which is what we want here. "divert(0)" restores normal output.

The "changequote" macro redefines the characters used to quote macro arguments, which by default are the backquote (`) and the apostrophe ('). They are replaced here with curly braces because apostrophes are heavily used in text files, especially French ones, and would cause macros to be misinterpreted. Curly braces are rarely used in text or HTML, which is why they have been chosen here.

The "ifdef" macro is used to test whether the macro LANG is defined and to default it to "fr" if not. The LANG macro is used to set the language. In the lines below, we shall see how to define it when calling m4, in order to choose the language of the HTML page.

The "include" line has the same meaning as in C: it includes an external file. We use it to load the language-specific macro definitions used in the header and footer. Here are the contents of that file for French and for English:

  
divert(-1)
# File trans-fr.m4
# Definitions for French
divert(0)
define({_ANGLAIS_},{Anglais})dnl
define({_FRANCAIS_},{Français})dnl
define({_ITALIEN_},{Italien})dnl
define({_ESPAGNOL_},{Espagnol})dnl
define({_ALLEMAND_},{Allemand})dnl
define({_WEBMASTER_},{John Perr})dnl
define({_MAINTENEUR_},{Page maintenue par _WEBMASTER_})dnl 
define({_MAJ_},{Date de mise à jour:})dnl
  
divert(-1)
# File trans-en.m4
# Definitions for English
divert(0)
define({_ANGLAIS_},{English})dnl
define({_FRANCAIS_},{French})dnl
define({_ITALIEN_},{Italian})dnl
define({_ESPAGNOL_},{Spanish})dnl
define({_ALLEMAND_},{German})dnl
define({_WEBMASTER_},{John Perr})dnl
define({_MAINTENEUR_},{Page maintained by _WEBMASTER_})dnl
define({_MAJ_},{Page updated on })dnl

If you speak Spanish, Italian or German, you should be able to write similar files for these languages.

The "undefine" line removes the default definition of the built-in macro named "format", because it is not used here. If this line is omitted, occurrences of the word "format" in the text may be interpreted as a macro call and disappear unless they are quoted, i.e., surrounded with curly braces. Having to quote ordinary words is not convenient when editing a simple web page.

Next comes the definition of the current year. It is obtained with the built-in macro "esyscmd", which runs the Unix command "date". The same macro is also used within the definition of the footer in order to print the date on which the page was updated.

The following line defines the first of our three main macros: LYNX_TITRE. This macro defines another macro called _TITLE_. This double definition makes it possible to set the title once and use it several times within the header and footer of the page. Note the use of $1 to refer to the first argument of the macro.

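The remaining lines define the two other main macros, LYNX_ENTETE and LYNX_PIED, which hold the contents of the header and footer of our HTML page except for its variable elements. As the listing shows, these are the page title ($1 and _TITLE_), the language names in the menu (_ANGLAIS_, _FRANCAIS_, etc.), the maintainer and update labels (_MAINTENEUR_ and _MAJ_), the update date and the current year (_ANNEE_).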

The "dnl" that appears at the end of each line is a built-in macro of m4 meaning "Delete to New Line": it discards everything up to and including the end of the line. Thanks to "dnl", m4 does not leave an empty line in the output for each macro definition.

Creating pages

Now that our system is set up, a web page is generated from these files by running m4 on them with the -D option, which sets LANG to "XX", the code of the desired language. As with gcc, -D defines a macro from the command line.

Summary

The table below presents the files used to generate the HTML pages and their role in this application.

File            Use
index-XX.html   The body of the page, that is, the text written by the author or the translator. It is different for each page and each language (XX=en for English, es for Spanish, etc.).
mac.css         Standard definitions. This file is common to all pages and all languages. It can be seen as a sort of style sheet.
trans-XX.m4     Standard definitions for one language. This file is common to all pages in that language (XX=en for English, es for Spanish, etc.).

Conclusion

Despite its power, the m4 macro processor cannot be compared to a scripting language like Perl or Tcl. Once its few peculiarities have been mastered, however, it is a quick and handy tool for processing text files. To learn more, consult the documentation bundled with your distribution. You should find an m4 tutorial, about 30 pages long, that covers all aspects of the GNU/m4 macro processor. You can also have a look at the site of the Linux User Group of Bordeaux (ABUL), which is maintained with a kit of m4 macros similar to those presented here.

Links

GNU/m4 is available from ftp://prep.ai.mit.edu/pub/gnu/m4-1.4.tar.gz
Download the files presented here: The Lynx m4 macro kit

Thank you to Paul Kienzle for reviewing this article.