A few days ago I implemented web harvesting functionality for a customer. It works like this:
A business analyst places a request (trigger) file with some metadata in a network folder. BizTalk picks it up and performs an HTTP POST (HTTP adapter) containing the metadata from the request file.
This is easily done with Microsoft's handy RawString class.
The response to the POST is an HTML page, which I receive in a multi-part message with a single body part of type XmlDocument.
Using RawString as the type of the response body part as well only works if a pipeline sets the message type to RawString in the message context.
Otherwise you get an error such as “Type “” of message is unknown…”
But XmlDocument isn’t a good fit either. When you access the XmlDocument in the orchestration, it is parsed “just in time”, and because the response is XHTML, the XmlReader processes its DTD and tries to download it from www.w3.org (the html root element lives in the http://www.w3.org/1999/xhtml namespace). That download may fail, for example because a firewall blocks it.
A web request timeout error then appears in the log.
So you can either strip off the html root element in a custom pipeline (which I didn’t want to do), or extend the RawString class with an additional constructor that accepts an XLANGMessage as a parameter. The code looks like this:
```csharp
// Additional RawString constructor (needs: using System; using System.IO; using Microsoft.XLANGs.BaseTypes;)
public RawString(XLANGMessage message, int bodyPartIndex)
{
    if (message == null)
        throw new ArgumentNullException("message");
    // Read the part as a raw stream so BizTalk never parses it as XML;
    // StreamReader detects the encoding from a BOM and falls back to UTF-8.
    using (var stream = (Stream)message[bodyPartIndex].RetrieveAs(typeof(Stream)))
    using (var sr = new StreamReader(stream, true))
        internalRepresentation = sr.ReadToEnd();
}
```
You can then use this constructor to get the HTML from the response message. Afterwards you can easily access the HTML directly as a string or, for example, send it to a file.
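A minimal sketch of how the constructor might be called from an Expression shape in the orchestration. The names `ResponseMsg`, `htmlRaw` and `htmlText` are hypothetical: it assumes a multi-part response message `ResponseMsg`, an orchestration variable `htmlRaw` of your extended RawString type and a `System.String` variable `htmlText`. It also assumes your RawString overrides `ToString()` to return its internal string, as the Microsoft sample does.

```csharp
// Expression shape (XLANG/s). ResponseMsg, htmlRaw and htmlText are hypothetical names;
// use the fully qualified type name of your extended RawString class.
// Index 0 selects the first part of the multi-part response message.
htmlRaw = new RawString(ResponseMsg, 0);

// Assumes ToString() returns the raw HTML held by the RawString instance.
htmlText = htmlRaw.ToString();
```

Because the part is retrieved as a stream, the XmlDocument is never loaded, so no DTD download is attempted.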
If this helps, feel free to leave a comment below…