NL = "
"
FROMLINE = 'From someone@somewhere.com Fri Jun 19 18:51:56 1998'
LOG = "debug: FROMLINE is: '$FROMLINE'$NL"

REGEXP1 = '[^ ]+ +.*\/..:'
REGEXP2 = '[^ ]+ .*\/..:'
REGEXP3 = '()\/..:'

:0
* FROMLINE ?? $ $REGEXP1
{ CASE1=$MATCH }

:0
* FROMLINE ?? $ $REGEXP2
{ CASE2=$MATCH }

:0
* FROMLINE ?? $ $REGEXP3
{ CASE3=$MATCH }

LOG = "${NL}REGEXP1 = '$REGEXP1'${NL}MATCH = '$CASE1'${NL}"
LOG = "${NL}REGEXP2 = '$REGEXP2'${NL}MATCH = '$CASE2'${NL}"
LOG = "${NL}REGEXP3 = '$REGEXP3'${NL}MATCH = '$CASE3'${NL}"

HOST
The first and second regexps should always extract the same text. However, the redundant '+' operator in the first regexp gives the procmail regexp engine extra ways to match, and it loses track of where the text to be extracted starts. The output procmail produces when fed the above rcfile, shown below, makes this visible. For the particular application here (extracting the hour from the timestamp in the "From " pseudo-header), the third regexp happens to always extract the same thing as the second, correctly working, regexp.
debug: FROMLINE is: 'From someone@somewhere.com Fri Jun 19 18:51:56 1998'

REGEXP1 = '[^ ]+ +.*\/..:'
MATCH = ' Fri Jun 19 18:51:'

REGEXP2 = '[^ ]+ .*\/..:'
MATCH = '18:'

REGEXP3 = '()\/..:'
MATCH = '18:'
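The extraction that all three regexps are meant to give can be cross-checked outside procmail. What follows is only a rough sketch in Python, not a model of procmail's engine. It assumes that the \/ operator can be approximated by a capture group for the right-hand side, with a lazy quantifier standing in for procmail's preference for the shortest possible match to the left of \/. The names PATTERNS and extract are made up for this illustration.

```python
import re

FROMLINE = "From someone@somewhere.com Fri Jun 19 18:51:56 1998"

# Python approximations of the three procmail regexps (an assumption):
# the text after \/ becomes a capture group, and ".*" to the left of \/
# becomes the lazy ".*?" so that the left-hand side matches as little
# as possible.
PATTERNS = {
    "REGEXP1": r"[^ ]+ +.*?(..:)",
    "REGEXP2": r"[^ ]+ .*?(..:)",
    "REGEXP3": r"(..:)",
}


def extract(pattern, line):
    """Return the captured text (the stand-in for $MATCH), or None."""
    m = re.search(pattern, line)
    return m.group(1) if m else None


for name, pattern in PATTERNS.items():
    print(f"{name}: {extract(pattern, FROMLINE)!r}")
```

Under these assumptions all three patterns capture '18:', which is the result the first procmail regexp fails to produce.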
Thanks to Stan Ryckman