GMLP Markup Language Processor
Post text is "processed" using a small array of regular expressions and a few replacement strings, which together form a "translation" table. This table lives in a separate, user-defined file, editable by Admin (see file ADMIN).
The GMLP code is in the module CONVERT.PHP and the translation table is in TRANSLATE.INI. It is recommended that this translation table be customized.
This code is based on the actual GMLP API.
Conversions
There are four types of conversions that can be defined.
Character
Word
Line
Block
All conversions are processed as key/value associative arrays, with the key being replaced by the value for most conversions, the exceptions being the "character pair" and the "block" conversions.
Character Conversions
These are defined differently from the other conversions and look like this:
$translate['chars'] = array(
'*' => 'b',
'_' => 'u',
'\'' => 'code'
);
The keys are used in pairs, so that, for example, *bold* becomes <b>bold</b>.
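The pairing can be sketched as follows. This is only an illustration, not the actual GMLP code: the function name convert_char_pairs() is hypothetical, and it assumes a pair never spans a line and never contains its own marker character.

```php
<?php
// Hypothetical helper, not taken from CONVERT.PHP: wraps text found between
// two identical marker characters in the HTML tag named by the table.
function convert_char_pairs($text, array $chars) {
    foreach ($chars as $marker => $tag) {
        $m = preg_quote($marker, '/');
        // Match one or more non-marker characters between two markers.
        $pattern = '/' . $m . '([^' . $m . ']+)' . $m . '/';
        $text = preg_replace($pattern, "<$tag>\$1</$tag>", $text);
    }
    return $text;
}

$chars = array(
    '*'  => 'b',
    '_'  => 'u',
    '\'' => 'code',
);

echo convert_char_pairs("This is *bold*, _underlined_ and 'code'.", $chars), "\n";
// prints: This is <b>bold</b>, <u>underlined</u> and <code>code</code>.
```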
Another character conversion (although that is a bit of a misnomer) is an array of string substitutions:
$translate['entities'] = array(
'--' => '&mdash;',
);
These substitutions need to be enabled by a configuration setting of entitytranslate = 1.
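A minimal sketch of how such a gated substitution might work. The $config array and the function name convert_entities() are assumptions made for illustration, and the replacement value is written here as the HTML entity &mdash;.

```php
<?php
// Assumption: $config stands in for however GMLP exposes its settings.
$config = array('entitytranslate' => 1);

$entities = array(
    '--' => '&mdash;',
);

// Hypothetical helper: plain string substitution, no regular expressions.
function convert_entities($text, array $entities, array $config) {
    if (empty($config['entitytranslate'])) {
        return $text; // substitutions are disabled
    }
    return strtr($text, $entities);
}

echo convert_entities('Wait -- what?', $entities, $config), "\n";
// prints: Wait &mdash; what?
```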
Word Conversions
There are two types of these. The first calls a function on every match. This one converts text between double single quotes ('') via htmlentities():
$translate['codes'] = array(
"/''(.*)''/U" => 'htmlentities'
);
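One way to apply such a table is with preg_replace_callback(). The sketch below is not the actual GMLP code: the name convert_codes() is hypothetical, and it assumes the enclosing quotes are dropped and only the captured text is passed to the function.

```php
<?php
$codes = array(
    "/''(.*)''/U" => 'htmlentities',
);

// Hypothetical helper: calls the named function on the first captured group
// of every match and substitutes the result for the whole match.
function convert_codes($text, array $codes) {
    foreach ($codes as $pattern => $function) {
        $text = preg_replace_callback($pattern, function ($m) use ($function) {
            return $function($m[1]);
        }, $text);
    }
    return $text;
}

echo convert_codes("Use ''<br>'' for a line break.", $codes), "\n";
// prints: Use &lt;br&gt; for a line break.
```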
The second simply does a replacement. These convert inline URLs (written as URL:label), double-quoted sentences, and PHP function names:
$translate['inlines'] = array(
'/(http[s]*:\/\S*)(:)([a-zA-Z0-9 _]+)/' => '<a href="$1">$3</a>',
'/""(.*)""/U' => '<em>"$1"</em>',
'/([a-zA-Z_]+\(\))/' => '<code>$1</code>',
);
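Because both keys and values are ordinary preg_replace() arguments, the whole table can be applied in one call. A sketch, with the hypothetical helper name convert_inlines():

```php
<?php
$inlines = array(
    '/(http[s]*:\/\S*)(:)([a-zA-Z0-9 _]+)/' => '<a href="$1">$3</a>',
    '/""(.*)""/U'                            => '<em>"$1"</em>',
    '/([a-zA-Z_]+\(\))/'                     => '<code>$1</code>',
);

// Hypothetical helper: preg_replace() applies the patterns in array order.
function convert_inlines($text, array $inlines) {
    return preg_replace(array_keys($inlines), array_values($inlines), $text);
}

echo convert_inlines('Call strlen() first.', $inlines), "\n";
// prints: Call <code>strlen()</code> first.
echo convert_inlines('http://example.com/docs:Project docs', $inlines), "\n";
// prints: <a href="http://example.com/docs">Project docs</a>
```

Note that the link label pattern also matches spaces, so the label runs until the first character that is not a letter, digit, space or underscore.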
Line Conversions
These convert lines to paragraphs. There are three ways to do this. Here is an example of each:
$translate['lines'] = array(
'/^<pre>.*<\/pre>/' => '',
'/^[A-Z \/&\?\'!]+$/' => 'convertcase',
'/^h([1-6]):\s*(.*)/' => '<h$1>$2</h$1>',
);
If the value is an empty string, the line is used as is; if the value is a function name, the line (the regular expression match) is converted by that function; otherwise the line is replaced by the value.
These are slightly complicated in that character and word conversions are bypassed in the first two cases, but not in the last. For example:
h2:Header *Five*
will end up as:
<h2>Header <b>Five</b></h2>
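The three cases can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the actual GMLP code: convert_line() is a hypothetical name, the convertcase() shown is only a stand-in (the real function is not shown in this document), and a line that matches no rule is returned unchanged.

```php
<?php
// Stand-in only; what the real convertcase() does is an assumption.
function convertcase($line) {
    return ucwords(strtolower($line));
}

$lines = array(
    '/^<pre>.*<\/pre>/'   => '',
    '/^[A-Z \/&\?\'!]+$/' => 'convertcase',
    '/^h([1-6]):\s*(.*)/' => '<h$1>$2</h$1>',
);

// Hypothetical helper. Returns array(converted line, flag), where the flag
// says whether word and character conversions should still be applied.
function convert_line($line, array $lines) {
    foreach ($lines as $pattern => $value) {
        if (!preg_match($pattern, $line, $m)) {
            continue;
        }
        if ($value === '') {
            return array($line, false);         // used as is
        }
        if (function_exists($value)) {
            return array($value($m[0]), false); // converted by the function
        }
        return array(preg_replace($pattern, $value, $line), true); // replaced
    }
    return array($line, true); // no rule matched
}

list($html, $more) = convert_line('h2:Header *Five*', $lines);
echo $html, "\n";
// prints: <h2>Header *Five*</h2>
```

Here $more is true, so the character conversions would go on to turn *Five* into <b>Five</b>.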
There are two others that need special mention:
'/^<(message|notice)>/' => '<div class="$1">',
'/^<\/(message|notice)>/' => '</div>',
These are for enclosing text to be styled by a CSS class (see HTM/COMMON.CSS). Each of these must, of course, be on its own line.
Block Conversions
These convert one or more paragraphs in a very odd way. Here is an example of a block conversion definition:
$translate['blocks']['php'] = array(
'begin' => '/^\s*<\?php/',
'end' => '/^\s*\?>/',
'post' => 'highlightstr',
'first' => 1,
'last' => 1,
);
A block starts with the begin tag and ends with the end tag. In that example post means to post-process the block by the function highlightstr(). The keys first and last tell the code to keep the begin and end tags in the block (otherwise they will be discarded).
Here is the function:
<?php
function highlightstr($s) {
return highlight_string($s,TRUE);
}
?>
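How the begin, end, first, last and post keys might work together can be sketched like this. The helper extract_block() is hypothetical and handles only the first block in the text; the real code in CONVERT.PHP may differ.

```php
<?php
function highlightstr($s) {
    return highlight_string($s, TRUE);
}

// Hypothetical helper: collects the first block found in an array of lines.
// Returns the processed block as a string, or null if no block was found.
function extract_block(array $textLines, array $block) {
    $collected = array();
    $inside = false;
    foreach ($textLines as $line) {
        if (!$inside) {
            if (preg_match($block['begin'], $line)) {
                $inside = true;
                if (!empty($block['first'])) {
                    $collected[] = $line; // keep the begin tag
                }
            }
            continue;
        }
        if (preg_match($block['end'], $line)) {
            if (!empty($block['last'])) {
                $collected[] = $line; // keep the end tag
            }
            $text = implode("\n", $collected);
            // Assumption: a 'post' value naming a function post-processes the block.
            if (isset($block['post']) && function_exists($block['post'])) {
                return $block['post']($text);
            }
            return $text;
        }
        $collected[] = $line;
    }
    return null; // no complete block
}

$php = array(
    'begin' => '/^\s*<\?php/',
    'end'   => '/^\s*\?>/',
    'post'  => 'highlightstr',
    'first' => 1,
    'last'  => 1,
);

echo extract_block(array('Some text', '<?php', 'echo "hi";', '?>'), $php), "\n";
```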
Here is a more complex example:
$translate['blocks']['space'] = array(
'begin' => '/^\s{1,8}/',
'end' => '/(^\S+|^$)/',
'pre' => '<pre>',
'replace' => 'htmlentities',
'post' => '</pre>',
'first' => 1,
'last' => 1,
'continue' => 1,
);
Here, pre and post are strings to prepend and postpend to the block; replace replaces each line with the result of the function htmlentities(); and continue means to put the end line back into the text. (post and replace have dual roles: each can be either a string or a function.)
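The wrapping step for an already collected block might look like the sketch below. The name wrap_block() is hypothetical, and the sketch covers only the case where replace is a function and pre and post are strings.

```php
<?php
// Hypothetical helper: applies 'replace' to each collected line, then wraps
// the joined block in the 'pre' and 'post' strings.
function wrap_block(array $blockLines, array $block) {
    if (isset($block['replace']) && function_exists($block['replace'])) {
        $blockLines = array_map($block['replace'], $blockLines);
    }
    $pre  = isset($block['pre']) ? $block['pre'] : '';
    $post = isset($block['post']) ? $block['post'] : '';
    return $pre . implode("\n", $blockLines) . $post;
}

$space = array(
    'pre'     => '<pre>',
    'replace' => 'htmlentities',
    'post'    => '</pre>',
);

echo wrap_block(array('    if ($a < $b) {', '    }'), $space), "\n";
```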
The odd-looking begin expression (one to eight whitespace characters) is there so that it does not conflict with this lines expression:
'/^\s{9,}(.*)/' => '<span style="display:block;text-align:center;">$1</span>',
This matters because translations are performed in order: Lines, Words, Characters. There are two special cases: sometimes (as explained above) a Line will have Word and Character conversions applied, and Blocks can have Word and Character conversions applied if the key convert is set to true.
There is one final translation to note:
$translate['line'] = array(
'/^\t(.*)/' => ' $1',
);
This one is always applied, except for lines entries that skip further processing (those with a value of an empty string or a function).
Notes
1. "Postpend" is not a word but obviously it should be.
2. The continue key makes sure that a space block does not "eat" the ending line, which might be part of regular text.
3. The tab translation is really just to allow text created offline (in a text editor), which may have lines beginning with a TAB, to be pasted into the Admin post form.