Inner/Outer XHTML – or how to get truly original source of element in JavaScript? « MOleYArd (MOYA) blog


Inner/Outer XHTML – or how to get truly original source of element in JavaScript?

There are several ways to get the source of an HTML element in JavaScript, and they work fine in most situations. In some situations, however, they are not sufficient; in particular, they do not behave consistently across browsers.

The innerHTML property is probably the most frequently used method, but it has some well-known drawbacks. First of all, innerHTML does not contain the (X)HTML written by the page’s original author. Instead, it contains a browser-specific string representation of the Document Object Model. In IE especially, this representation is quite ugly (missing quotes, capitalized tag names, …). This leads to another problem: the content of innerHTML differs between browsers!

But why does it matter that it differs? Should we care? Maybe we should. There may be situations where we need the same innerHTML value in all browsers. There are always tricks to work around this need, or we can simply write different solutions for different browsers. But wouldn’t it be nice to have a function that not only returns the same content in all modern major browsers, but also returns exactly the content written by the author of the web page? Getting the truly original source code can also be useful in its own right.

So let’s show how to do this. I have written such a function for you.

The most important building block for this task is XMLHttpRequest: we first need to get the original source code of the whole page. Here is the function:

function originalPageSource()
{
	// Prefer the native XMLHttpRequest; fall back to ActiveX only in old IE.
	var httpRequest;
	if (window.XMLHttpRequest)
		httpRequest = new XMLHttpRequest();
	else if (window.ActiveXObject)
		httpRequest = new ActiveXObject ("Microsoft.XMLHTTP");
	else return false;

	// Synchronous request (third argument false) for the current page's URL.
	httpRequest.open('GET',document.location.href,false);
	httpRequest.send(null);
	if (httpRequest.status === 200)
	{
		return httpRequest.responseText;
	}
	else return false;
}
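
The function above makes a synchronous request, which blocks the page until the response arrives. As a rough sketch only, an asynchronous variant might look like the following. The name originalPageSourceAsync, the callback parameter and the optional XhrCtor argument are all invented for this example and are not part of the original script; XhrCtor exists only so the function can be tried with a stand-in object outside a browser.

```javascript
// Asynchronous sketch of originalPageSource (hypothetical, not the article's code).
// url      - address of the page whose source should be fetched
// callback - receives the response text, or false if the status is not 200
// XhrCtor  - optional constructor used instead of the global XMLHttpRequest
function originalPageSourceAsync(url, callback, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var request = new Ctor();
  request.open('GET', url, true); // true = asynchronous
  request.onreadystatechange = function () {
    if (request.readyState !== 4) return; // 4 means the request is done
    callback(request.status === 200 ? request.responseText : false);
  };
  request.send(null);
}
```

In a page this could be called as originalPageSourceAsync(document.location.href, function (source) { /* use source here */ }); the rest of the work would then have to move into that callback.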

This first function is nested inside the main function, called originalInnerXHTML, which takes two parameters. The first parameter, element, is the (X)HTML element whose source we want to see. The second parameter, includeElement, is a boolean that determines whether to include the element itself in the returned (X)HTML string. Including the element corresponds to what is known as outerHTML, so I also wrote originalOuterXHTML to avoid having to call the function with two parameters for this task. So finally, here is the code of originalInnerXHTML:

function originalInnerXHTML (element, includeElement)
{
var elementXHTML = originalPageSource();
// The request failed, so there is no source to search.
if (elementXHTML === false) return false;
var elementIndex = 0;
var elementNodeName = element.nodeName;
var elements=document.getElementsByTagName(elementNodeName);

for (var i = 0; i < elements.length; i++)
{
	if (element === elements[i])
	{
		 elementIndex=i;
		 break;
	}
}
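// Declared with var so they do not leak into the global scope. The lookahead
// makes sure the whole tag name matches, so that "p" does not also match "pre".
var openingElementRegExp = new RegExp("<"+elementNodeName+"(?=[\\s/>])", "i");
var closingElementRegExp = new RegExp("</"+elementNodeName+"(?=[\\s>])", "i");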
for(var i=0; i <= elementIndex; i++)
{
	elementXHTML=elementXHTML.substring
	(
		elementXHTML.search(openingElementRegExp)+1
	);
}

elementXHTML="<"+elementXHTML;
if
(
	element.hasChildNodes() == false &&
	(elementXHTML.indexOf("/") < elementXHTML.indexOf(">") ||
	elementXHTML.search(closingElementRegExp)==-1)
)
{
	if (includeElement)
	{
		elementXHTML=elementXHTML.substring(0, elementXHTML.indexOf(">")+1);
		return elementXHTML;
	}
	else return "";
}
// Scan the opening and closing tags with this name and track the nesting
// depth, so that we stop at the closing tag that matches the element's own
// opening tag rather than at a later closing tag of the same name.
var tagRegExp = new RegExp("<(/?)"+elementNodeName+"(?=[\\s/>])", "gi");
var depth = 0;
var closingIndex = -1;
var tagMatch;
while ((tagMatch = tagRegExp.exec(elementXHTML)) !== null)
{
	if (tagMatch[1] === "/")
	{
		depth--;
		if (depth === 0)
		{
			closingIndex = tagMatch.index;
			break;
		}
	}
	else depth++;
}
// No matching closing tag in the source: fall back to the end of the source.
if (closingIndex == -1) closingIndex = elementXHTML.length;

elementXHTML = elementXHTML.substring(0,closingIndex);
elementXHTML = elementXHTML+"</"+elementNodeName.toLowerCase()+">";

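if (includeElement) return elementXHTML;
else
{
	// Keep what lies between the opening tag's ">" and the closing tag's "<".
	// substring() excludes its end index, so no extra -1 is needed.
	elementXHTML=elementXHTML.substring
	(
		elementXHTML.indexOf(">")+1,elementXHTML.lastIndexOf("<")
	);
	return elementXHTML;
}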

function originalPageSource ()
{
	// Prefer the native XMLHttpRequest; fall back to ActiveX only in old IE.
	var httpRequest;
	if (window.XMLHttpRequest)
		httpRequest = new XMLHttpRequest();
	else if (window.ActiveXObject)
		httpRequest = new ActiveXObject("Microsoft.XMLHTTP");
	else return false;
	// Synchronous request (third argument false) for the current page's URL.
	httpRequest.open('GET',document.location.href,false);
	httpRequest.send(null);
	if (httpRequest.status === 200)
	{
		return httpRequest.responseText;
	}
	else return false;
}
}
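
To experiment with the string matching on its own, without a browser or a network request, the core idea can be sketched as a function that works on plain strings. This is only an illustration under some assumptions: extractElementSource and its parameters are names made up for this example, the index argument stands for the element’s position among the elements returned by getElementsByTagName, and tags that appear inside comments or scripts, self-closing tags of the same name, and omitted closing tags are not handled.

```javascript
// Sketch: return the source of the index-th <tagName> element found in a source string.
// extractElementSource is a hypothetical helper, not part of the original script.
function extractElementSource(source, tagName, index, includeElement) {
  // Match opening or closing tags with exactly this name (so "p" skips "pre").
  var tagRegExp = new RegExp('<(/?)' + tagName + '(?=[\\s/>])', 'gi');
  var seen = -1;   // number of opening tags passed before the target is found
  var depth = 0;   // nesting depth once the target has been found
  var start = -1;  // index of the target's opening "<"
  var match;

  while ((match = tagRegExp.exec(source)) !== null) {
    var isClosing = match[1] === '/';
    if (start === -1) {
      // Still looking for the index-th opening tag.
      if (isClosing) continue;
      seen++;
      if (seen !== index) continue;
      start = match.index;
      depth = 1;
      continue;
    }
    // Inside the target: nested opening tags go deeper, closing tags come back up.
    depth += isClosing ? -1 : 1;
    if (depth === 0) {
      var closeEnd = source.indexOf('>', match.index) + 1;
      if (closeEnd === 0) return null; // closing tag is never terminated
      var outer = source.substring(start, closeEnd);
      if (includeElement) return outer;
      return outer.substring(outer.indexOf('>') + 1, outer.lastIndexOf('<'));
    }
  }
  return null; // no such element, or its closing tag is missing
}
```

For example, extractElementSource(pageSource, 'div', 1, false) would return the inner source of the second div in pageSource, where pageSource is whatever string was fetched for the page.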

And finally, the source of originalOuterXHTML:

function originalOuterXHTML(element)
{
	return originalInnerXHTML(element, true);
}

Limitations

The main limitation is that this function returns only the original source code, so it cannot show any modifications made by JavaScript. Once the original source has been changed by a script, we often simply have to rely on the browser’s internal object model representation… But after all, the programmer controls exactly how his or her code changes the page, so this can mostly be resolved. I believe that getting the exact original source code can be useful in quite a few situations.

You can download the JS file with these functions here.
