Jan 10, 2008

Reading and extracting data from a word file attached to an email

A customer receives intervention requests via email and wants to have them pushed as tickets in ITSM v7. These requests come a attached word files with a standard format.

ITSM v7 runs on version 7.0.1 of ARS on Solaris.
The customer won't change the format for a regular ARS-template-shaped email.


1/ Download antiword sources here and compile

Antiword allows us to display the text contained in a word file.
It is maintained as a package in various linux distributions like Debian or Fedora, and even available in Cygwin, but not on Solaris.
It compiled like a charm on Solaris once gcc was installed.

2/ Write a perl/bash+awk/foobar/... script

I did this with perl, overusing conditions and regular expressions to generate a string that looks like this :

The field identifier must be unique, and your (separator) must not be found in a value. I used ";" but something more complex like "@X@Y@Z@" or "Wh@t3v3R" for example would be a better choice.

3/ Create an ARS form

This is were attachments will be pushed and analyzed.
You need :
- a long field for the post-treatment string,
- an attachment pool with one attachment field,
- a temporary display only integer field,
- a temporary display only character field [0],
- an AttachmentID character field [40],
- one field per word field

4/ Push attachments to your form

Add a filter to your "AR System Email Attachments" that pushes the document to your form if its name matches whatever its name should match.

5/ Add filters to your ARS form

- the first filter saves the .doc file into your temporary folder and executes your script to get the big string out of it and set it into the long field.
- one filter per field :
* set the location of the first occurence of the researched field name into the temporary integer field (STRSTR)
Tmp_Int = STRSTR($String$, "headerdate=")
* set the substring of the string starting from the fieldname to the end into the temporary character field (SUBSTR)
Tmp_Char = SUBSTR($String$, $Tmp_Int$)
* set the location of the first occurence of your separator into the temporary integer field (STRSTR)
Tmp_Int = STRSTR($Tmp_Char$, "")
* you have the beginning and the end of the value you are searching for, set the substring into the corresponding field of your form (SUBSTR)
headerdate = SUBSTR($Tmp_Char$, + 1 , $Tmp_Int$ - 1)

6/ Push the fields to a new ticket

A last filter must now be able to push the data into the HPD:Interface_Create form.

You can see my perl script here.

No comments: