GMLP Markup Language Processor

Gmlp

Post text is "processed" using a small array of regular expressions and a few replacement strings, called a "translation" table. This table lives in a separate, user-defined file that is editable through Admin (see file ADMIN).

The GMLP code is in the module MOD/CONVERT.PHP and the translation table is in TRANSLATE.INI. It is recommended that this translation table be customized.

This code is based on the actual GMLP API.

Conversions

There are four types of conversions that can be defined.

Character
Word
Line
Block

All conversions are defined as key/value associative arrays. For most conversions the text matched by the key is replaced by the value; the exceptions are the "character pair" and the "block" conversions.

Character Conversions

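These are defined differently from the other conversions: the keys are single characters rather than regular expressions, and the values are HTML tag names. They look like this: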

        $translate['chars'] = array(
            '*' => 'b',
            '_' => 'u',
            '\'' => 'code'
        );

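The keys are used in pairs: text enclosed between two occurrences of a key is wrapped in the corresponding tag, so *bold* becomes <b>bold</b>, _underlined_ becomes <u>underlined</u>, and so on.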
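As an illustration only, a character pair conversion could be sketched as below. The function name convert_char_pairs() is hypothetical and this is not the actual code in MOD/CONVERT.PHP; it assumes that a pair never nests and never spans another occurrence of the same character.

```php
<?php
// Hypothetical helper, not the real GMLP implementation.
// Wraps text found between two identical marker characters in the
// tag that the table assigns to that character.
function convert_char_pairs(string $text, array $chars): string {
    foreach ($chars as $char => $tag) {
        $q = preg_quote($char, '/');
        // marker, one or more non-marker characters, marker
        $pattern = '/' . $q . '([^' . $q . ']+)' . $q . '/';
        $text = preg_replace($pattern, "<$tag>\$1</$tag>", $text);
    }
    return $text;
}

$chars = array('*' => 'b', '_' => 'u', '\'' => 'code');
echo convert_char_pairs("This is *bold* and _underlined_.", $chars), "\n";
// prints: This is <b>bold</b> and <u>underlined</u>.
```

A single unpaired character (for example an apostrophe in "don't") is left alone by this sketch because the pattern requires a closing partner.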

Another character conversion (although that is a bit of a misnomer) is an array of string substitutions:

        $translate['entities'] = array(
            '--' => '&mdash;',
        );

These substitutions need to be enabled by a configuration setting of entitytranslate = 1.
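A minimal sketch of how such a gated substitution might work follows. The $config array and the function name convert_entities() are assumptions made for the example; how GMLP actually reads the entitytranslate setting is not shown in this document.

```php
<?php
// Assumption: $config mirrors the entitytranslate setting.
function convert_entities(string $text, array $entities, array $config): string {
    if (empty($config['entitytranslate'])) {
        return $text; // substitutions are disabled
    }
    // strtr() with an array does plain string replacement,
    // trying the longest keys first and never re-replacing output.
    return strtr($text, $entities);
}

$entities = array('--' => '&mdash;');
echo convert_entities('GMLP -- a markup processor', $entities, array('entitytranslate' => 1)), "\n";
// prints: GMLP &mdash; a markup processor
```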

Word Conversions

There are two types of these. The first calls a function on every match. This one passes text enclosed in doubled single quotes ('') through htmlentities():

        $translate['codes'] = array(
            "/''(.*)''/U" => 'htmlentities'
        );

The second simply does a replacement. These convert inline URLs (written as URL:link text), sentences enclosed in doubled double quotes (""), and function names followed by ():

        $translate['inlines'] = array(
            '/(http[s]*:\/\S*)(:)([a-zA-Z0-9 _]+)/' => '<a href="$1">$3</a>',
            '/""(.*)""/U' => '<em>"$1"</em>',
            '/([a-zA-Z_]+\(\))/' => '<code>$1</code>',
        );
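The two kinds of word conversion can be sketched roughly as follows. The function convert_words() is hypothetical, and two details are assumptions rather than documented behaviour: that the function receives only the captured text (so the quotes are dropped), and that the codes table is applied before the inlines table.

```php
<?php
$translate['codes'] = array(
    "/''(.*)''/U" => 'htmlentities',
);
$translate['inlines'] = array(
    '/(http[s]*:\/\S*)(:)([a-zA-Z0-9 _]+)/' => '<a href="$1">$3</a>',
    '/""(.*)""/U' => '<em>"$1"</em>',
    '/([a-zA-Z_]+\(\))/' => '<code>$1</code>',
);

// Hypothetical helper, not the real GMLP implementation.
function convert_words(string $text, array $translate): string {
    // First kind: call the named function on every match.
    foreach ($translate['codes'] as $pattern => $function) {
        $text = preg_replace_callback($pattern, function ($m) use ($function) {
            return $function($m[1]); // assumption: only the captured text is passed
        }, $text);
    }
    // Second kind: plain regular expression replacement.
    foreach ($translate['inlines'] as $pattern => $replacement) {
        $text = preg_replace($pattern, $replacement, $text);
    }
    return $text;
}

echo convert_words("Use ''<br>'' sparingly", $translate), "\n";
// prints: Use &lt;br&gt; sparingly
echo convert_words('Call strlen() here', $translate), "\n";
// prints: Call <code>strlen()</code> here
```

Note that the link text class in the URL expression includes the space character, so in this sketch the link text runs on until the first character that is not a letter, digit, space or underscore.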

Line Conversions

These convert lines to paragraphs. There are three ways to do this. Here is an example of each:

        $translate['lines'] = array(
            '/^<pre>.*<\/pre>/' => '',
            '/^[A-Z \/&\?\'!]+$/' => 'convertcase',
            '/^h([1-6]):\s*(.*)/' => '<h$1>$2</h$1>',
        );

If the value is an empty string, the line is used as is; if the value is a function, the line (the regular expression result) is converted by that function; otherwise the line is replaced by the value.

These are slightly complicated in that character and word conversions are bypassed in the first two cases, but not in the last. For example:

        h2:Header *Five*

will end up as:

        <h2>Header <b>Five</b></h2>
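The three cases, and the rule about which of them bypass the later conversions, might be sketched like this. The function convert_line() is hypothetical, and the convertcase() shown here is only a stand-in, since the real function is not included in this document. What happens to a line that matches no rule is also an assumption: the sketch returns it unchanged.

```php
<?php
// Stand-in for convertcase(); the real function is not shown in this document.
function convertcase(string $s): string {
    return ucwords(strtolower($s));
}

$translate['lines'] = array(
    '/^<pre>.*<\/pre>/' => '',
    '/^[A-Z \/&\?\'!]+$/' => 'convertcase',
    '/^h([1-6]):\s*(.*)/' => '<h$1>$2</h$1>',
);

// Hypothetical helper. Returns array(converted line, flag), where the flag
// says whether word and character conversions should still be applied.
function convert_line(string $line, array $lines): array {
    foreach ($lines as $pattern => $value) {
        if (!preg_match($pattern, $line, $m)) {
            continue;
        }
        if ($value === '') {
            return array($line, false);          // used as is
        }
        if (is_callable($value)) {
            return array($value($m[0]), false);  // converted by the function
        }
        return array(preg_replace($pattern, $value, $line), true); // replaced
    }
    return array($line, true); // assumption: unmatched lines pass through
}

list($html, $more) = convert_line('h2:Header *Five*', $translate['lines']);
echo $html, "\n";
// prints: <h2>Header *Five*</h2>
```

In this sketch $more is true for the header line, so a later character conversion would still turn *Five* into <b>Five</b>, matching the result shown above.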

There are two others that need special mention:

        '/^<(message|notice)>/' => '<div class="$1">',
        '/^<\/(message|notice)>/' => '</div>',

These are for enclosing text to be styled by a CSS class (see HTM/COMMON.CSS). Each of these must, of course, be on its own line.

Block Conversions

These convert one or more paragraphs in a very odd way. Here is an example of a block conversion definition:

        $translate['blocks']['php'] = array(
            'begin' => '/^\s*<\?php/',
            'end' => '/^\s*\?>/',
            'post' => 'highlightstr',
            'first' => 1,
            'last' => 1,
        );

A block starts with the begin tag and ends with the end tag. In that example post means to post-process the block by the function highlightstr(). The keys first and last tell the code to keep the begin and end tags in the block (otherwise they will be discarded).

Here is the function:

        <?php
        function highlightstr($s) {
            return highlight_string($s, TRUE);
        }
        ?>
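To make the roles of begin, end, post, first and last concrete, here is a rough sketch of a block pass over an array of lines. The function convert_blocks() is hypothetical and only covers the keys used in the php block above, with post as a function name; it is not the actual code in MOD/CONVERT.PHP.

```php
<?php
function highlightstr($s) {
    return highlight_string($s, true);
}

// Hypothetical helper, not the real GMLP implementation.
function convert_blocks(array $lines, array $block): array {
    $out = array();
    $buffer = null; // null while outside a block
    foreach ($lines as $line) {
        if ($buffer === null) {
            if (preg_match($block['begin'], $line)) {
                // 'first' keeps the begin line in the block
                $buffer = empty($block['first']) ? array() : array($line);
            } else {
                $out[] = $line;
            }
        } elseif (preg_match($block['end'], $line)) {
            if (!empty($block['last'])) {
                $buffer[] = $line; // 'last' keeps the end line in the block
            }
            // post-process the whole block with the named function
            $out[] = $block['post'](implode("\n", $buffer));
            $buffer = null;
        } else {
            $buffer[] = $line;
        }
    }
    if ($buffer !== null) {
        // assumption: an unterminated block is emitted unprocessed
        $out = array_merge($out, $buffer);
    }
    return $out;
}

$php = array(
    'begin' => '/^\s*<\?php/',
    'end' => '/^\s*\?>/',
    'post' => 'highlightstr',
    'first' => 1,
    'last' => 1,
);
$lines = array('Some text.', '<?php', 'echo "hi";', '?>', 'More text.');
echo implode("\n", convert_blocks($lines, $php)), "\n";
```

The exact HTML returned by highlight_string() varies between PHP versions, so no expected output is given here.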

Here is a more complex example:

        $translate['blocks']['space'] = array(
            'begin' => '/^\s{1,8}/',
            'end' => '/(^\S+|^$)/',
            'pre' => '<pre>',
            'replace' => 'htmlentities',
            'post' => '</pre>',
            'first' => 1,
            'last' => 1,
            'continue' => 1,
        );

Here, pre and post are strings to prepend and postpend [1] to the block; replace replaces each line with the result of the function htmlentities(); and continue means to put the end line back in the text [2]. (post and replace have dual roles.)

The funny begin expression is there so as not to conflict with this Lines expression:

        '/^\s{9,}(.*)/' => '<span style="display:block;text-align:center;">$1</span>',

This matters because these translations are performed in order: Lines, Words, Characters. There are two special cases: sometimes (as explained above) a Line will also have Word and Character conversions applied, and Blocks can have Word and Character conversions applied if the key convert is set to true.

There is one final translation to note:

        $translate['line'] = array(
            '/^\t(.*)/' => '        $1',
        );

This one is always applied [3], except for lines matched by a Lines conversion whose value is an empty string or a function.
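As a small illustration, the TAB translation amounts to the following. The function name expand_leading_tab() is hypothetical.

```php
<?php
$translate['line'] = array(
    '/^\t(.*)/' => '        $1',
);

// Hypothetical helper: replaces one leading TAB with eight spaces.
function expand_leading_tab(string $line, array $rules): string {
    foreach ($rules as $pattern => $replacement) {
        $line = preg_replace($pattern, $replacement, $line);
    }
    return $line;
}

echo '[', expand_leading_tab("\tindented", $translate['line']), "]\n";
// prints: [        indented]
```

Only the first TAB on the line is replaced; any further TABs are kept as they are.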

Notes

1. "Postpend" is not a word but obviously it should be.
2. The continue key makes sure that a space-block does not "eat" the ending line, which might be part of regular text.
3. The TAB translation is really just there so that text created offline (in a text editor), which may have lines beginning with a TAB, can be pasted into the Admin post form.