Integration via email is a feature of last resort
Email is probably the one thing that you can pretty much guarantee every company is capable of. That doesn’t mean that it’s a good thing to do integration with; the ways in which various mail clients can (and will) munge the data are quite extraordinary. At some point you will have received an email with an attachment called winmail.dat, or an email that inexplicably has an empty body and an attachment called ATT00001.htm that actually contains the message. That last situation seems to occur quite a lot when a message generated by Outlook (with graphics in the signature) is forwarded by someone using a Mac.
The Adapter can of course attach to either an IMAP or POP3 mailbox and collect messages; I personally regard it as a feature of last resort [1], but sometimes you have to do what you have to do. There are two ways in which we can process incoming emails. The first is DefaultMailConsumer, which by default treats each and every MIME part of a message as a new unique message that needs processing. That often generates a lot of spurious messages which need further classification, given the propensity for people to include pictures and other silliness in their signatures. The second is RawMailConsumer, which doesn’t do any parsing of the MIME message and just gives it to you verbatim; it’s a lot like trying to read your email via telnet to port 110.
An example of an email message is shown below (server names and recipients have been changed to protect the guilty). In this case, the MIME message contains 3 parts: a nested MIME part that is the actual body of the message (containing both plain text and HTML) and 2 attachments (image003.jpg and image004.jpg). This is in fact a classic Outlook message, where the sender has chosen to include a nonsense graphic (or two) as part of their signature.
Delivery-date: Thu, 10 Oct 2013 10:36:53 +0100
Received: from mta.nobody.com ([10.9.8.7]:61409 helo=mta.somebody.com)
    by mta.anybody.com with esmtps (TLSv1:AES128-SHA:128)
    (Exim 4.80.1)
    (envelope-from <nobody@nobody.com>)
    id 1VUCft-00181b-9y
    for somebody@somebody.com; Thu, 10 Oct 2013 10:36:53 +0100
Received: from mta.nobody.com by
    mta.somebody.com with mapi id
    14.01.0218.012; Thu, 10 Oct 2013 10:39:26 +0100
From: Nobody <nobody@nobody.com>
To: "Somebody" <somebody@somebody.com>
CC: "noreply@anybody.com" <noreply@anybody.com>
Subject: Test Multipart Message
Thread-Topic: Test Multipart Message
Thread-Index: Ac7FnKF9z5bxAPmzRDSQ45ZBY6YW3A==
Date: Thu, 10 Oct 2013 09:39:26 +0000
Message-ID: <AA217B98DF7BEB479FEA1BAF75D31ADBA57C11@mta.nobody.com>
Accept-Language: en-GB, en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator:
Content-Type: multipart/related;
    boundary="_005_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_";
    type="multipart/alternative"
MIME-Version: 1.0

--_005_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_
Content-Type: multipart/alternative;
    boundary="_000_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_"

--_000_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hello World

--_000_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html>
<body>
<h1>Hello World</h1>
<p class=3D"MsoNormal"><span style=3D"color:black"><img border=3D"0" width=
=3D"500" height=3D"10" id=3D"Picture_x0020_2" src=3D"cid:image003.jpg@01CEC=
5A3.B2FCDBE0" alt=3D"bar"></span><span lang=3D"EN-US" style=3D"color:black"=
></span></p>
<p class=3D"MsoNormal"><span style=3D"font-size:9.0pt; color:black"><img wi=
dth=3D"128" height=3D"50" id=3D"Picture_x0020_1" src=3D"cid:image004.jpg@01=
CEC5A3.B312ADD0" alt=3D"my_company___logo_cmyk-small"></span><span lang=3D"=
EN" style=3D"font-size:9.0pt; font-family:&quot;Arial&quot;,&quot;sans-seri=
f&quot;; color:#1F497D"></span></p>
</body>
</html>

--_000_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_--

--_005_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_
Content-Type: image/jpeg; name="image003.jpg"
Content-Description: image003.jpg
Content-Disposition: inline; filename="image003.jpg"; size=4919;
    creation-date="Thu, 10 Oct 2013 09:30:30 GMT";
    modification-date="Thu, 10 Oct 2013 09:30:30 GMT"
Content-ID: <image003.jpg@01CEC5A3.B2FCDBE0>
Content-Transfer-Encoding: base64

... content skipped ...

--_005_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_
Content-Type: image/jpeg; name="image004.jpg"
Content-Description: image004.jpg
Content-Disposition: inline; filename="image004.jpg"; size=2632;
    creation-date="Thu, 10 Oct 2013 09:30:30 GMT";
    modification-date="Thu, 10 Oct 2013 09:30:30 GMT"
Content-ID: <image004.jpg@01CEC5A3.B312ADD0>
Content-Transfer-Encoding: base64

... content skipped ...

--_005_AA217B98DF7BEB479FEA1BAF75D31ADBA57C11mtanobodycom_--
We don’t actually care about the images, or even the HTML part of the message; we only care about the text/plain part of the MIME message: Hello World. If we were to process that mail message using DefaultMailConsumer then it would generate at least 2 messages, which isn’t useful. In situations like this I end up using RawMailConsumer and processing the raw data within the Adapter directly.
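To make the nesting concrete, here is a minimal sketch that parses a cut-down stand-in for the message above and prints its MIME tree. It uses Python’s standard email package rather than JavaMail or anything from the Adapter; the boundaries (outer, inner), the dummy image payloads and the outline helper are all made up for illustration.

```python
from email import message_from_string

# Cut-down stand-in for the message above: same nesting, invented
# boundaries, dummy base64 data in place of the real images.
RAW = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/related; boundary="outer"; type="multipart/alternative"',
    "",
    "--outer",
    'Content-Type: multipart/alternative; boundary="inner"',
    "",
    "--inner",
    'Content-Type: text/plain; charset="us-ascii"',
    "",
    "Hello World",
    "--inner",
    'Content-Type: text/html; charset="us-ascii"',
    "",
    "<html><body><h1>Hello World</h1></body></html>",
    "--inner--",
    "",
    "--outer",
    'Content-Type: image/jpeg; name="image003.jpg"',
    "Content-Transfer-Encoding: base64",
    "",
    "AAAA",
    "--outer",
    'Content-Type: image/jpeg; name="image004.jpg"',
    "Content-Transfer-Encoding: base64",
    "",
    "AAAA",
    "--outer--",
    "",
])


def outline(raw):
    """Return (depth, content type) for every MIME part, in document order."""
    rows = []

    def visit(part, depth):
        rows.append((depth, part.get_content_type()))
        if part.is_multipart():
            for child in part.get_payload():
                visit(child, depth + 1)

    visit(message_from_string(raw), 0)
    return rows


for depth, content_type in outline(RAW):
    print("  " * depth + content_type)
```

The only part we want, text/plain, sits two levels down, inside the multipart/alternative part that is itself the first of the three top-level parts.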
Extracting “Hello World” without the surrounding MIME nonsense can be done by using the MimePartSelector service multiple times, as in the following example.
<!-- 1. Select the first MIME part; keep its headers as metadata -->
<service xsi:type="java:com.adaptris.core.services.mime.MimePartSelector">
  <selector xsi:type="java:com.adaptris.mail.SelectByPosition">
    <position>0</position>
  </selector>
  <preserve-part-headers-as-metadata>true</preserve-part-headers-as-metadata>
</service>
<!-- 2. Copy the whole payload into metadata; (?s) enables DOTALL -->
<service xsi:type="java:com.adaptris.core.services.metadata.RegexpMetadataService">
  <regexp-metadata-query>
    <query-expression>(?s)(.*)</query-expression>
    <metadata-key>originalpayload</metadata-key>
  </regexp-metadata-query>
</service>
<!-- 3. Rebuild a minimal MIME message; a blank line separates headers from body -->
<service xsi:type="java:com.adaptris.core.services.metadata.PayloadFromMetadataService">
  <metadata-tokens>
    <key-value-pair>
      <key>Content-Type</key>
      <value>__CONTENT_TYPE__</value>
    </key-value-pair>
    <key-value-pair>
      <key>originalpayload</key>
      <value>__PAYLOAD__</value>
    </key-value-pair>
  </metadata-tokens>
  <template><![CDATA[Mime-Version: 1.0
Content-Type: __CONTENT_TYPE__

__PAYLOAD__
]]>
</template>
</service>
<!-- 4. Drop the temporary copy of the payload -->
<service xsi:type="java:com.adaptris.core.services.metadata.RemoveMetadataService">
  <reg-exp>originalpayload</reg-exp>
</service>
<!-- 5. Select the part whose Content-Type contains text/plain -->
<service xsi:type="java:com.adaptris.core.services.mime.MimePartSelector">
  <selector xsi:type="java:com.adaptris.mail.SelectByHeader">
    <header-value-reg-exp>.*text/plain.*</header-value-reg-exp>
    <header-name>Content-Type</header-name>
  </selector>
</service>
The main problem with handling nested MIME parts is that the JavaMail API doesn’t play nicely if the message isn’t considered complete (previous versions of JavaMail would throw an exception; the current version doesn’t appear to, it just behaves somewhat erratically). The headers aren’t considered part of the data, so MimePartSelector either discards them or preserves them as metadata. What we need to do is recreate the minimum set of headers required to make the message RFC 2045 compliant each time we finish using MimePartSelector. So, to describe the above adapter configuration in sequence:
- Select the first MIME part and make that the message (this discards the parts containing image003.jpg and image004.jpg). We preserve the part headers as metadata without a prefix, so after this service has executed there is a new item of metadata whose key is Content-Type and whose value is multipart/alternative; boundary="xxx".
- Perform a regular expression in DOTALL mode, selecting all the data and storing it against the metadata key originalpayload; at this point there are two items of metadata, Content-Type and originalpayload. Enabling DOTALL mode is important; otherwise you’re only going to match the first line.
- Re-create the payload with the appropriate MIME headers using PayloadFromMetadataService; hopefully it should be pretty obvious what is happening here: we are inserting the values associated with Content-Type and originalpayload wherever we see the tokens __CONTENT_TYPE__ and __PAYLOAD__ respectively.
- Remove the metadata item originalpayload; you don’t have to, but we aren’t going to use it again.
- Do the final MimePartSelector which selects by the Content-Type header, where it contains text/plain; you could do another select by position here, but I’ve mixed it up a little.
- At the end of this service chain, you end up with just Hello World as your payload. If any of the MIME parts had a Content-Transfer-Encoding such as base64, then it would have been decoded automatically at the same time.
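The same sequence can be sketched outside the Adapter. The following uses Python’s standard email and re modules as stand-ins for MimePartSelector, RegexpMetadataService and PayloadFromMetadataService; it is not how the Adapter implements them, and the sample message, the helper names and the metadata dictionary are assumptions made for illustration. The text/plain part is base64 encoded here (unlike the message above) so that the automatic decoding is visible.

```python
import re
from email import message_from_string

# Invented sample with the same nesting as the message discussed above.
RAW = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/related; boundary="outer"',
    "",
    "--outer",
    'Content-Type: multipart/alternative; boundary="inner"',
    "",
    "--inner",
    'Content-Type: text/plain; charset="us-ascii"',
    "Content-Transfer-Encoding: base64",
    "",
    "SGVsbG8gV29ybGQ=",
    "--inner",
    'Content-Type: text/html; charset="us-ascii"',
    "",
    "<html><body><h1>Hello World</h1></body></html>",
    "--inner--",
    "",
    "--outer",
    'Content-Type: image/jpeg; name="image003.jpg"',
    "Content-Transfer-Encoding: base64",
    "",
    "AAAA",
    "--outer--",
    "",
])


def select_part(raw, matches):
    """Return the first top-level part for which matches(index, part) is true."""
    parts = message_from_string(raw).get_payload()
    return next(p for i, p in enumerate(parts) if matches(i, p))


def extract_plain_text(raw):
    # Step 1: select by position 0 and keep the part's Content-Type as metadata.
    first = select_part(raw, lambda i, p: i == 0)
    metadata = {"Content-Type": first["Content-Type"]}
    # The payload is now the part body only; its headers are gone.
    payload = first.as_string().partition("\n\n")[2]

    # Step 2: copy the whole payload into metadata using a DOTALL regex.
    metadata["originalpayload"] = re.match(r"(?s)(.*)", payload).group(1)

    # Step 3: rebuild a minimal, parseable MIME message from the metadata.
    payload = "Mime-Version: 1.0\nContent-Type: {}\n\n{}\n".format(
        metadata["Content-Type"], metadata["originalpayload"])

    # Step 4: the temporary copy is no longer needed.
    del metadata["originalpayload"]

    # Step 5: select the part whose Content-Type matches .*text/plain.*
    plain = select_part(
        payload,
        lambda i, p: re.fullmatch(r".*text/plain.*", p["Content-Type"]))
    # decode=True undoes the Content-Transfer-Encoding (base64 in this sample).
    return plain.get_payload(decode=True).decode("us-ascii")


print(extract_plain_text(RAW))
```

Dropping (?s) from the pattern in step 2 would capture only the first line of the payload, which is the opening boundary rather than the content.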
[1] These features are at odds with our core design, and will therefore always present some difficulties in terms of performance, ease of use, and integration. As such, when using them, you won’t get the best possible experience. Due to our compatibility rules, these features will continue to exist, but we want to make you aware that there are often better alternatives. You should give careful consideration to alternatives…