The FXPDFSCAN process scans a nominated folder for PDF files deposited there by a customer-supplied program, extracts the fax number and other variables from the text, and generates an FS file to fax or e-mail the PDF document or another nominated document.  To allow customization of the text extraction, the fields to extract can be specified using 'regular expressions' to match patterns in the text extracted from the PDF.  Some useful information about regular expressions can be found at http://www.regular-expressions.info.

The setup screen for this process selects the folder to be scanned and the USR file which will be used to send the document. The file extension scanned for defaults to .PDF.

The USR file should include FS template items on $fs_template commands.  These commands, and USR $var_def commands, are scanned for a substring REGEX: followed by a regular expression, all in double-quotes.  The regular expression must create a named subexpression (backreference) with one of the following names: FAXNUM or EMAILTO, which specify the destination and also cause the scanned file itself to be included in the message; ROUTETO, which specifies the receiver name; or any other name, which must match the name of the variable with which the regular expression is associated.

The commands scanned are:

The regular expression on a $fax_phone command is assumed to produce a named subexpression (backreference) with the name FAXNUM.  In this case the file is converted to a TIF file in the CALLBACK\TEMP folder and a $fax_filename command is added for it.  An example of a regular expression which extracts the number from an angle-bracketed tagged expression <TOFAXNUM:16417416000> would be:

    $fs_template $fax_phone "REGEX:<TOFAXNUM:(?<FAXNUM>.*(?=>))"
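
As a rough illustration of what this pattern captures, the following Python sketch applies it to the sample tag.  Python's re module spells named groups as (?P<NAME>...), so the sketch first rewrites the (?<NAME>...) form used here; the helper name to_python_regex is hypothetical and is not part of FXPDFSCAN, and the regular expression engine used by the product may differ in other details.

```python
import re

def to_python_regex(pattern: str) -> str:
    """Rewrite (?<NAME>...) groups as Python's (?P<NAME>...).

    The lookbehind forms (?<= and (?<! are left untouched.
    """
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)

# Pattern from the $fax_phone example above, without the REGEX: prefix.
pattern = to_python_regex(r"<TOFAXNUM:(?<FAXNUM>.*(?=>))")

match = re.search(pattern, "<TOFAXNUM:16417416000>")
fax_number = match.group("FAXNUM") if match else None
print(fax_number)  # 16417416000
```

Because .* is greedy, the capture runs up to the last > in the text item being scanned, so a text item holding two tags would capture more than the number.  A pattern such as [^>]* in place of .*(?=>) is one possible way to stop at the first >, if that matters for your documents.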

The regular expression on a $email_address command is assumed to produce a named subexpression (backreference) with the name EMAILTO.  In this case the file is saved into the CALLBACK\TEMP folder and an $email_attach command is added for it.  An example of a regular expression which extracts an e-mail address from an angle-bracketed tagged expression <TOEMAIL:tim@copia.com> follows (note that the e-mail address should not contain angle-brackets):

    $fs_template $email_address "REGEX:<TOEMAIL:(?<EMAILTO>.*(?=>))"

The regular expression on a $fax_receiver command is assumed to produce a named subexpression (backreference) with the name ROUTETO.  For example:

    $fs_template $fax_receiver "REGEX:<TONAME:(?<ROUTETO>.*(?=>))"

The regular expression on a $var_def command is assumed to produce a named subexpression (backreference) whose name is the same as the variable name.  For example:

    $var_def WHO "REGEX:<WHO:(?<WHO>.*(?=>))"
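
The naming rules above can be checked before a USR file is put into service.  The sketch below is only an illustration under stated assumptions: the function check_usr_line and its parsing of the command line are hypothetical, it handles only the command forms shown in this topic, and it does not reproduce how FXPDFSCAN itself reads the USR file.

```python
import re

# Special group name expected for each $fs_template command (from the text above).
EXPECTED_GROUP = {
    "$fax_phone": "FAXNUM",
    "$email_address": "EMAILTO",
    "$fax_receiver": "ROUTETO",
}

# A double-quoted item that starts with REGEX:
REGEX_ARG = re.compile(r'"REGEX:(?P<regex>.*)"')
# A named subexpression written as (?<NAME>...)
GROUP_NAME = re.compile(r"\(\?<(?P<name>[A-Za-z_]\w*)>")

def check_usr_line(line: str):
    """Return (expected_name, names_found), or None if the line has no REGEX: item."""
    arg = REGEX_ARG.search(line)
    if arg is None:
        return None
    words = line.split()
    if words[0] == "$fs_template":
        expected = EXPECTED_GROUP.get(words[1])
    elif words[0] == "$var_def":
        expected = words[1]  # group name must equal the variable name
    else:
        return None
    found = GROUP_NAME.findall(arg.group("regex"))
    return expected, found

for line in [
    '$fs_template $fax_receiver "REGEX:<TONAME:(?<ROUTETO>.*(?=>))"',
    '$var_def WHO "REGEX:<WHO:(?<WHO>.*(?=>))"',
]:
    expected, found = check_usr_line(line)
    print(expected, "ok" if expected in found else "name mismatch")
```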

The regular expression is used to scan each separate text item extracted from the PDF.  Note that if the angle-brackets enclose a string containing multiple spaces, the PDF extraction may treat the parts as separate fields, and the regular expression will then fail to match because the < and > fall in separate fields.
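
The effect of per-item matching can be seen in the following Python sketch.  The lists of text items are invented for illustration and do not come from the actual PDF extractor, and the pattern is written with Python's (?P<NAME>...) spelling of a named group.

```python
import re

# Python spelling of the $var_def WHO pattern shown above.
pattern = re.compile(r"<WHO:(?P<WHO>.*(?=>))")

def first_match(items):
    """Apply the pattern to each text item separately and return the first capture."""
    for item in items:
        m = pattern.search(item)
        if m:
            return m.group("WHO")
    return None

# Tag extracted as a single text item: the pattern matches.
one_item = ["Invoice 1001", "<WHO:Accounts Payable>"]

# Hypothetical extraction in which a run of spaces split the tag in two:
# no single item contains both the < and the >, so nothing matches.
split_items = ["Invoice 1001", "<WHO:Accounts", "Payable>"]

print(first_match(one_item))     # Accounts Payable
print(first_match(split_items))  # None
```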

The matched text is deleted from the PDF unless the USR file contains a $var_def variable named PDFSCAN_NOSTRIP which is non-empty.  By default only the first page is scanned, unless the USR file contains a $var_def variable named PDFSCAN_PAGELIMIT which is defined as a larger maximum number of pages.  To limit the scan to the top section of the page, the USR file can contain a $var_def variable named PDFSCAN_TOPLIMIT which is defined as the number of points (1/72 inch) from the top to scan for matched variables.
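
Since PDFSCAN_TOPLIMIT is given in points, a limit measured in inches has to be multiplied by 72.  The small Python sketch below shows the arithmetic and prints a $var_def line in the quoted form used elsewhere in this topic; the function names are hypothetical, and treating an item exactly on the limit as inside the scanned band is an assumption.

```python
POINTS_PER_INCH = 72  # one point is 1/72 inch

def inches_to_points(inches: float) -> int:
    """Convert a distance in inches to whole points."""
    return round(inches * POINTS_PER_INCH)

def within_top_limit(distance_from_top_pts: float, top_limit_pts: int) -> bool:
    """True if a text item lies inside the scanned band at the top of the page."""
    return distance_from_top_pts <= top_limit_pts

# Scan only the top 2.5 inches of the page.
top_limit = inches_to_points(2.5)
print(f'$var_def PDFSCAN_TOPLIMIT "{top_limit}"')  # $var_def PDFSCAN_TOPLIMIT "180"
```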

The PDF file is converted to TIF using the internal PDF converter.  You can add any of the conversion options mentioned in that topic to a QPDF_OPTIONS variable in the USR file.  To suppress the conversion and use an alternate converter such as Adobe Reader, define a PDFSCAN_NOCONVERT variable with a non-empty value in the USR file.

As the FS file is created the PDF file is moved to the CALLBACK\TEMP folder (with the converted TIF file if applicable), from which it will be referenced in the FS file.  You should arrange to delete these files at intervals, for example using the CFHK utility.

A variable named SCAN_PATH is always added to the FS file, containing the name of the folder which is scanned for PDF files.  This can be used to locate an attachment file which is placed in the scanned folder alongside the PDF file.  If this variable is used on a $fax_filename or $email_attach command, the file is moved to CALLBACK\TEMP and the command is adjusted to reference the new pathname.  Only CFG variables and variables extracted from regular expressions are expanded on this command.  For example:

    $var_def SECONDFILE "REGEX: .... (?<SECONDFILE> ....)"
    $fs_template $fax_filename "@SCAN_PATH\@SECONDFILE"

might become in the resulting FS file:

    $var_def SECONDFILE "123456.PDF"
    $fax_filename "@FFREQ\TEMP\123456.PDF"
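
The substitution in this example can be sketched in Python as follows.  This is an illustration only: the functions expand and rewrite_attachment and the folder C:\SCANIN are hypothetical, the sketch does not move any file, and the way FXPDFSCAN actually expands variables may differ.

```python
import re
from ntpath import basename  # Windows-style path handling on any platform

def expand(template: str, variables: dict) -> str:
    """Replace each @NAME with its value; unknown names are left unexpanded."""
    return re.sub(
        r"@([A-Za-z_][A-Za-z0-9_]*)",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

def rewrite_attachment(template: str, variables: dict) -> str:
    """Expand the template, then point it at the TEMP folder as in the FS example."""
    source_path = expand(template, variables)
    return "@FFREQ\\TEMP\\" + basename(source_path)

variables = {
    "SCAN_PATH": r"C:\SCANIN",   # hypothetical scanned folder
    "SECONDFILE": "123456.PDF",  # value captured by the regular expression
}
print(rewrite_attachment(r"@SCAN_PATH\@SECONDFILE", variables))
# @FFREQ\TEMP\123456.PDF
```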

The FS file is written to the specified TOSEND folder, unless the USR file contains an $fs_template $fax_pre-process command or PDFSCAN_NOCONVERT is specified (see above), in which case the FS file is written to PREPROC.