How to convert relative URLs to absolute URLs in websites using PHP « MOleYArd (MOYA) blog


How to convert relative URLs to absolute URLs in websites using PHP

You will probably agree that relative URLs (Uniform Resource Locators) are a very useful feature of websites, especially during development. A webpage usually has to move at some point (at least from the developer’s localhost to the production environment), and rewriting every address by hand would be tedious, so relative addresses are very convenient.

But there are situations where absolute URLs fit our needs better, or are even essential. A typical example is copying the source of a website or a part of it. The source contains relative URLs for images and included styles, and because you want to use it on another domain, those relative URLs will not work unless you also copy all the included content. Another example is downloading that included content: again, you need its absolute URL.

So the task is to convert all relative URLs to their absolute counterparts. But before I show you how to do this, let’s discuss where relative URLs occur and all the forms they can take. We will deal with HTTP(S) scheme URLs only.

The most common occurrences of URLs in websites have these forms:

<img src="someurl.jpg" alt="someurl" />

<a href="subsite/index.html"></a>

<script type="text/javascript" src="scriptinurl.js"></script>

<link rel="stylesheet" type="text/css" href="styles.css" />

<div style="background: url('image.jpg')"></div>

<style type="text/css">
/*<![CDATA[*/
 @import url('url.css');
 p {background: url("image.jpg")}
/*]]>*/
</style>

So a URL may appear as the href or src attribute of an element, or inside a CSS style within url(). These cover the vast majority of included URLs. There is one more type: a URL defined inside an internal script (for example, dynamically replaced or inserted images). We will not deal with this situation, because it is simply not always possible to cover it in a universal way. Such a URL is often built from several parts (different variables, substrings, values returned from functions, …). There is no unified form in that case, but after reading this article you will be able to adjust the presented solution for those special cases.

URLs in attributes may be surrounded by single or double quotes, and white space may appear between the quotes and the URL (the browser still parses a URL with leading or trailing spaces):

<element src="url"></element>
<element src=" url"></element>
<element src="url "></element>
<element src="   url   "></element>
<element src='url'></element>
<element src=' url'></element>
<element src='  url  '></element>
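
To make the quoting and white-space variants above concrete, here is a minimal sketch that pulls src and href values out of an HTML fragment. The helper name extractAttributeURLs and the sample markup are assumptions made for illustration; they are not part of the functions presented later, and a regexp like this is only a rough approximation of real HTML parsing.

```php
<?php
// Hypothetical helper (not from the article): collect src/href values from
// an HTML fragment, accepting single or double quotes and stray white space.
function extractAttributeURLs(string $html): array
{
    // Group 1 captures the opening quote; \1 requires the same closing quote.
    preg_match_all('~\b(?:src|href)\s*=\s*(["\'])(.*?)\1~is', $html, $matches);

    // Trim the white space that a browser would ignore around the URL.
    return array_map('trim', $matches[2]);
}

$html = '<img src="  a.jpg  " /><a href=\'sub/index.html \'></a>';
print_r(extractAttributeURLs($html)); // the two values are a.jpg and sub/index.html
```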

The same rule about white space applies to URLs inside CSS url(). Moreover, CSS URLs may be written with single quotes, with double quotes, or with no quotes at all. So we have these options:

 url("someurl");
 url('someurl');
 url(someurl);
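
The three quoting options can be matched with a single pattern. The following is a rough sketch under the assumption that the URL itself contains no parentheses; the helper name extractCssURLs is made up for this example.

```php
<?php
// Hypothetical helper (not from the article): list the URLs found in url(...)
// for the double-quoted, single-quoted and unquoted forms.
function extractCssURLs(string $css): array
{
    // Group 1 is the optional opening quote; \1 demands the same closing quote
    // (or nothing at all when the URL is unquoted).
    preg_match_all('~url\(\s*(["\']?)\s*(.*?)\s*\1\s*\)~i', $css, $matches);

    return $matches[2];
}

$css = 'a{background:url( "x.png" )} b{background:url(\' y.png \')} c{background:url(z.png)}';
print_r(extractCssURLs($css)); // x.png, y.png and z.png
```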

That covers the possible occurrences; let’s move on to the possible forms of relative URLs.

1. Relative URL that does not begin with slashes, e.g. page.html, subsite/page.html, ~subsite/page.html, images/image.gif

This is the most common case. The path of such a URL is resolved from the directory of the current page (the current page’s directory is the “hidden” root of the URL).

2. Relative URL that begins with ./, e.g. ./page.html, ./subsite/page.html, ./~subsite/page.html, ./images/image.gif

This is the same as the first case with . denoting the current directory.

3. Relative URL that begins with ../, ../../, ../../../ and so on, e.g. ../page.html, ../../subsite/page.html, ../../image.gif

In this case the ‘hidden’ root of the URL is the current page’s parent directory (in the case of ../), or the parent directory of the current page’s parent directory (in the case of ../../), and so on.

4. Relative URL that begins with /, e.g. /page.html, /subsite/page.html, /images/image.gif

In this case the ‘hidden’ root of the URL is the real root directory of the server, and it is irrelevant which subdirectory the current page is in.

5. Relative URL that begins with //, e.g. //www.webpage.com, //examplepage.net

This is almost an absolute URL; the only thing missing is the URI scheme (such as http or https). It is practical when a site on a different domain is available over both the standard non-secure and the secure protocol and we want to retain the current site’s URI scheme. To say it explicitly: the missing URI scheme is filled in with the current site’s URI scheme.

6. Relative URLs of all previous types that contain ../ and ./ scattered inside URL, e.g. subpage/../page.html, /subsite/././page.html, /images/../images/../images/./image.gif, //examplepage.net/page/../page/

This ‘type’ is not at all practical and is kind of crazy, so you should not encounter it often. It is nevertheless valid to use the current-directory and parent-directory symbols in parts of a URL other than its beginning, so you have to keep in mind the possibility of encountering it.
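
The six forms listed above can be told apart with a few prefix checks. The sketch below is only an illustration: the function names and the labels they return are invented here and are not used by the conversion functions that follow.

```php
<?php
// Hypothetical helper (not from the article): name the form of a URL
// according to how it begins.
function classifyURL(string $url): string
{
    $url = trim($url);
    if (preg_match('~^https?://~i', $url)) return 'absolute';
    if (strpos($url, '//') === 0)  return 'scheme-relative';   // form 5
    if (strpos($url, '/') === 0)   return 'root-relative';     // form 4
    if (strpos($url, '../') === 0) return 'parent-relative';   // form 3
    if (strpos($url, './') === 0)  return 'current-relative';  // form 2
    return 'document-relative';                                // form 1
}

// Form 6: "./" or "../" segments somewhere after the beginning of the URL.
function hasInnerDotSegments(string $url): bool
{
    // Strip a leading run of "./" or "../" first, then look further in.
    $rest = preg_replace('~^(\.\.?/)+~', '', trim($url));

    return preg_match('~(^|/)\.\.?/~', $rest) === 1;
}

echo classifyURL('//www.webpage.com'), "\n";                // scheme-relative
echo classifyURL('../../image.gif'), "\n";                  // parent-relative
var_dump(hasInnerDotSegments('subpage/../page.html'));      // bool(true)
var_dump(hasInnerDotSegments('/test/test/images./red.png')); // bool(false)
```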


Now that we know the theory behind relative URLs, let’s finally move on to the conversion functions themselves.

The first function converts a relative URL to an absolute URL, taking the base absolute URL and the relative URL as parameters. We assume that we have already obtained the relative URL from somewhere and now just want to convert it. Here is the function; the explanation follows:

function convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL)
{
   $relativeURL = trim($relativeURL);
   if (substr($relativeURL, 0, 7) !== 'http://' && substr($relativeURL, 0, 8) !== 'https://')
   {
    while (strpos($relativeURL, '/./') !== false)
    {
       $relativeURL=str_replace('/./','/',$relativeURL);
    }
    if (substr($relativeURL, 0, 2) === './') $relativeURL = substr($relativeURL, 2);
    $urlInfo = parse_url($baseAbsoluteURL);
    if ($urlInfo == false) return false;
    $urlBasePath = substr($urlInfo['path'], 0, strrpos($urlInfo['path'],"/"));
    $dirDepth = substr_count($urlInfo['path'], '/')-1;

    $dirDepthRel = substr_count((string) preg_filter('\'^((\\.\\./)+)(.*)\'', '$1', $relativeURL), '../');
    $relativeURL = preg_replace('\'^((\\.\\./)+)(.*)\'', '$3', $relativeURL);    

    for ($i=0; $i<$dirDepthRel; $i++)
    {
       $urlBasePath = substr($urlBasePath, 0, strrpos($urlBasePath,"/"));
    }
    $urlBase = $urlInfo['scheme'].'://'.$urlInfo['host'].$urlBasePath;

    do
    {
       $tempContent = $relativeURL;
       $relativeURL = preg_replace('\'^(.*?)(([^/]*)/\\.\\./)(.*?)$\'', '$1$4', $relativeURL);
    }
    while ($tempContent != $relativeURL);

    if(substr($relativeURL, 0, 2) === "//")
    {
       $relativeURL=$urlInfo['scheme'].':'.$relativeURL;
    }
    else if(substr($relativeURL, 0, 1) === "/")
    {
       $relativeURL=$urlInfo['scheme'].'://'.$urlInfo['host'].$relativeURL;
    }
    else
    {
       $relativeURL=$urlBase.'/'.$relativeURL;
    }
   }
   return $relativeURL;
}

I will not explain the individual built-in PHP functions in detail; you can find complete descriptions of them in the PHP manual, a site you definitely know :) . However, I will cover the regular expression (regexp) functions a bit, because the presented functions use them a lot.

The most convenient way to use regexps in PHP is through Perl-Compatible Regular Expressions (PCRE). There are some differences from POSIX regexps, but I will not go into them. We will use three functions: preg_filter, preg_replace and preg_replace_callback. They are all quite similar. The difference between preg_filter and preg_replace is that preg_replace always returns the subject string (with the replacements applied if there were any, unchanged otherwise), while preg_filter returns it only if a match was found and otherwise returns NULL. I’ll explain the difference between preg_replace and preg_replace_callback later. In all three cases the first parameter is the pattern to search for and the third is the string to search in. The second parameter is the replacement string (or, in the last case, a callback function that returns the replacement string).
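
A short sketch of that difference, using a pattern for leading ../ segments similar to the one in the function above. The variable names are illustrative, and ~ is used as the regexp delimiter here only to keep the pattern short.

```php
<?php
// Pattern for a run of leading "../" segments followed by the rest of the URL.
$pattern = '~^((\.\./)+)(.*)$~';

// No leading "../": preg_replace hands the subject back unchanged...
$replaced = preg_replace($pattern, '$1', 'page.html');
// ...while preg_filter returns NULL to signal that nothing matched.
$filtered = preg_filter($pattern, '$1', 'page.html');

// When the pattern matches, both functions return the replaced string.
$prefix = preg_filter($pattern, '$1', '../../image.gif');

var_dump($replaced); // string(9) "page.html"
var_dump($filtered); // NULL
var_dump($prefix);   // string(6) "../../"
```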

Every regexp passed to one of these functions must be enclosed within delimiters. I do not mean the standard PHP string quotes; those are of course necessary too, but the regexp inside them needs its own pair of delimiters. I use single quotes. Because the single quote then has that special meaning, a literal single quote inside the regexp needs a backslash in front of it. The same goes for every character that has a special meaning in a regexp (such as . or ?) when we want to suppress that meaning. But backslashes and single quotes have a special meaning in single-quoted PHP strings, too, and because a regexp passed as a PHP function argument is a string, we need to put another backslash in front of every backslash and every single quote in that string. That’s the reason why you can see so many backslashes :) .
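
To see what the regexp engine actually receives, it helps to print the pattern string. The sketch below compares the single-quote delimiter used in this article with ~ as an alternative delimiter; the variable names and the choice of ~ are assumptions made for the comparison.

```php
<?php
// The article's style: single quotes as regexp delimiters, so the quotes and
// every backslash are escaped once more for the PHP string literal.
$quoteDelimited = '\'^((\\.\\./)+)(.*)\'';

// The same regexp with ~ as the delimiter needs far fewer backslashes.
$tildeDelimited = '~^((\.\./)+)(.*)~';

echo $quoteDelimited, "\n"; // prints '^((\.\./)+)(.*)'
echo $tildeDelimited, "\n"; // prints ~^((\.\./)+)(.*)~

// Both patterns behave identically: they drop the leading "../" segments.
$a = preg_replace($quoteDelimited, '$3', '../../img/a.png');
$b = preg_replace($tildeDelimited, '$3', '../../img/a.png');
var_dump($a === $b); // bool(true)
var_dump($a);        // string(9) "img/a.png"
```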

Here is the meaning of some regexp elements that we will use (each description is followed by its example):

$0 denotes the whole matched expression.

$0

If we enclose some parts of regexp within (), we create subexpressions. They are also called groups.

(subregexp1)(subregexp2)restofregexp(endofregexp)

We can then access these subexpressions using a dollar sign and the particular number.

$1$2$3

A special case of groups are the so-called “lookarounds”; you can learn more about them, for example, from this tutorial: http://www.regular-expressions.info/lookaround.html.

One special case of them is so-called “negative lookahead”, for example:

(?!http)

This regexp fragment asserts that the text at the current position does not start with http. It consumes no characters itself; if http follows at that position, the match fails there.
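
A small sketch of a negative lookahead at work, checking whether a URL lacks an http:// or https:// prefix. The function name isNotAbsolute is made up for this example.

```php
<?php
// Hypothetical helper (not from the article): true when the URL does not
// start with http:// or https://.
function isNotAbsolute(string $url): bool
{
    // (?!https?://) fails the match when the scheme follows at the start;
    // .+ then requires at least one character of actual URL.
    return preg_match('~^(?!https?://).+~', $url) === 1;
}

var_dump(isNotAbsolute('images/a.png'));              // bool(true)
var_dump(isNotAbsolute('https://example.com/a.png')); // bool(false)
var_dump(isNotAbsolute('httpdocs/a.png'));            // bool(true), "http" alone is not a scheme prefix
```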

And just for the sake of completeness (definitions mostly from wikipedia :) ):

. // matches any single character
[ ] // matches a single character that is contained within the brackets
[^ ] // matches a single character that is not contained within the brackets
^ // matches the starting position within the string
$ // matches the ending position of the string or the position just before a string-ending newline
* // matches the preceding element zero or more times
? // matches the preceding element zero or one time
+ // matches the preceding element one or more times

Let’s return to the description of the presented function. At the beginning we trim the relative URL in case it has leading or trailing white space (for example, if we got it from an element’s href attribute with a leading space). You can remove this step if you want to preserve the white space, but then the trimming has to be moved into every later condition that compares a string with the beginning of the relative URL. Then we check whether the passed URL is already absolute (if it is, we skip all the following steps and return it). Next, the single-dot segments are removed. Then we use the parse_url function to get the parts of the base URL (host, scheme, etc.).

Lines 16-22 of the function handle URLs that begin with a series of ../ segments (../, ../../, …). First we count these segments, and then we adjust the base absolute URL by removing that many trailing directories from it. Lines 25-30 resolve ../ segments inside the URL. The rest of the function deals with the remaining cases.
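
The two steps described above can also be looked at in isolation. The following is a simplified sketch, not a copy of the function: the helper names are invented, ~ is used as the delimiter, and the second pattern adds a guard so that a ../ segment is never treated as a directory name.

```php
<?php
// Step 1 (hypothetical helper): consume the leading "../" segments and drop
// one trailing directory from the base path for each of them.
function applyLeadingParents(string $basePath, string $relativeURL): string
{
    preg_match('~^(\.\./)*~', $relativeURL, $m);
    $levelsUp = substr_count($m[0], '../');
    $rest = substr($relativeURL, strlen($m[0]));

    for ($i = 0; $i < $levelsUp; $i++) {
        $pos = strrpos($basePath, '/');
        // Once the root is reached, extra "../" segments are ignored.
        $basePath = ($pos === false) ? '' : substr($basePath, 0, $pos);
    }

    return $basePath . '/' . $rest;
}

// Step 2 (hypothetical helper): repeatedly remove one "directory/../" pair
// until the path stops changing.
function collapseInnerParents(string $path): string
{
    do {
        $before = $path;
        $path = preg_replace('~(^|/)(?!\.\./)[^/]+/\.\./~', '$1', $path, 1);
    } while ($path !== $before);

    return $path;
}

// Base directory of http://www.moya.sk/images/ble/ is /images/ble
echo applyLeadingParents('/images/ble', '../test/green.png'), "\n"; // /images/test/green.png
echo collapseInnerParents('images/test/../green.png'), "\n";        // images/green.png
```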


The first function dealt with the most frequent use: converting a single URL. But we might need a more complex task: replacing all relative URLs inside some text. There is of course no universal solution for this, because there is no way to determine what actually is a relative URL and what is not, unless it is somehow marked. One such case is URLs inside CSS styles, which are enclosed within url() (or url('') or url("")). For this task I wrote a function that uses regular expressions more extensively. It also uses the preg_replace_callback function, so I will explain that function now.

It’s essentially the same as preg_replace, but instead of a replacement string it takes a callback function that returns the replacement string. The matches are passed to the callback as a parameter, $matches: $matches[0] corresponds to $0, $matches[1] to subexpression $1, $matches[2] to subexpression $2, and so on. This function is useful when we need to modify the whole matched expression or its subexpressions, or when we want to take some information from them and build the replacement string accordingly.
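
As a small illustration of building a replacement from the matched groups, the sketch below turns the leading ../ run into a count. The output wording is invented for the example, and an anonymous function is used as the callback for brevity.

```php
<?php
$result = preg_replace_callback(
    '~^((\.\./)+)(.*)$~',
    function (array $matches): string {
        // $matches[0] is the whole match, $matches[1] the run of "../"
        // segments and $matches[3] the remainder of the URL.
        $levels = substr_count($matches[1], '../');

        return $levels . ' level(s) up, then ' . $matches[3];
    },
    '../../test/green.png'
);

echo $result, "\n"; // 2 level(s) up, then test/green.png
```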

The callbacks are created with the create_function function, which takes the parameter list and the body of the callback as strings. Note that to access a value from outside the callback I used the global variable $callBackAbsoluteURL. Code created this way cannot see the local variables of the enclosing function; only variables declared as global are available to it.
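
On PHP versions where create_function is unavailable (it was deprecated in PHP 7.2 and removed in PHP 8.0), the same kind of callback can be written as an anonymous function that imports outside values with use, so no global variable is needed. The sketch below is not a port of the function that follows: the name prefixCssURLs is hypothetical, it assumes the base URL points to a directory, it only handles URLs that begin with neither a scheme nor a slash, and it leaves ./ and ../ segments unresolved.

```php
<?php
// Hypothetical helper (not from the article): prefix document-relative
// url() values with the base directory.
function prefixCssURLs(string $baseAbsoluteURL, string $cssContent): string
{
    $baseDir = rtrim($baseAbsoluteURL, '/');

    // "use" copies $baseDir into the closure, so no global is required.
    $callback = function (array $matches) use ($baseDir): string {
        return $matches[1] . $baseDir . '/' . $matches[2];
    };

    // URLs starting with http(s):// or with a slash are left untouched.
    return preg_replace_callback(
        '~(url\(\s*["\']?\s*)(?!https?://|/)([^"\')\s]+)~',
        $callback,
        $cssContent
    );
}

echo prefixCssURLs('http://www.moya.sk/images/ble/', "p {background: url('images/green.png')}"), "\n";
// p {background: url('http://www.moya.sk/images/ble/images/green.png')}
```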

So here is the function that converts all relative URLs inside CSS content to their absolute URLs counterparts:

function convertCSSRelativeURLsToCSSAbsoluteURLs($baseAbsoluteURL, $cssContent)
{
   $urlInfo = parse_url($baseAbsoluteURL);
   if ($urlInfo == false) return false;
   $urlBasePath = substr($urlInfo['path'], 0, strrpos($urlInfo['path'],"/"));
   $urlBase = $urlInfo['scheme'].'://'.$urlInfo['host'].$urlBasePath;
   $dirDepth = substr_count($urlInfo['path'], '/')-1;
   global $callBackAbsoluteURL;
   $callBackAbsoluteURL = $baseAbsoluteURL;         

   $singleDotsCallback = create_function('$matches',
   '
     while (strpos($matches[0], "/./") !== false)
     {
        $matches[0]=str_replace("/./","/",$matches[0]);
     }
     if (substr($matches[2], 0, 2) === "./")
     {
        $matches[2] = substr($matches[2], 2);
        return $matches[1].$matches[2];
     }
     else return $matches[0];
   ');

   $doubleDotsCallback = create_function('$matches',
   '
     global $callBackAbsoluteURL;
     $baseAbsoluteURL=$callBackAbsoluteURL;
     $urlInfo = parse_url($baseAbsoluteURL);
     if ($urlInfo==false) return false;
     $urlBasePath = substr($urlInfo["path"], 0, strrpos($urlInfo["path"],"/"));
     $dirDepth=substr_count($urlInfo["path"], "/")-1;
     $dirDepthRel=substr_count($matches[2], "../");

     for ($i=0; $i<$dirDepthRel; $i++)
     {
        $urlBasePath = substr($urlBasePath, 0, strrpos($urlBasePath,"/"));
     }

     $urlBase = $urlInfo["scheme"]."://".$urlInfo["host"].$urlBasePath;
     $relativeURL=$urlBase."/".$matches[4];
     return $matches[1].$relativeURL;
   ');

   $cssContentAfterSingleDotCheck = preg_replace_callback
   (
    '\'(url\\(\s*[\\\'"]?\\s*)(.*?\\))\'',
    $singleDotsCallback, $cssContent
   );
   $cssContentAfterDoubleDotCheck = preg_replace_callback
   (
    '\'(url\\(\\s*[\\\'"]?\\s*)((\\.\\./)+)(?!\.\\./)(.*?\\))\'',
    $doubleDotsCallback, $cssContentAfterSingleDotCheck
   );
   do
   {
      $tempContent=$cssContentAfterDoubleDotCheck;
      $filteredCssContentAfterDoubleDotCheck = preg_filter
      (
       '\'(url\\(.*?)(([^/]*)/\\.\\./)(.*?\\))\'',
       '$1$4', $cssContentAfterDoubleDotCheck
      );
      if ($filteredCssContentAfterDoubleDotCheck != NULL) $cssContentAfterDoubleDotCheck = $filteredCssContentAfterDoubleDotCheck;
   }
   while ($tempContent != $cssContentAfterDoubleDotCheck);
   $cssContentAfterMissingURISchemeCheck = preg_replace
   (
    '\'(url\\(\\s*[\\\'"]?\\s*)(//)(.*?\\))\'',
    '$1'.$urlInfo['scheme'].':$2$3', $cssContentAfterDoubleDotCheck
   );
   $cssContentAfterRootDirCheck = preg_replace
   (
    '\'(url\\(\\s*[\\\'"]?\\s*)(/)(.*?\\))\'',
    '$1'.$urlInfo['scheme'].'://'.$urlInfo['host'].'$2$3', $cssContentAfterMissingURISchemeCheck
   );
   $cssContentAfterNoSlashesRelativeURLCheck = preg_replace
   (
    '\'(url\\(\\s*[\\\'"]?\\s*)(((?!https?://).)*?\\))\'',
    '$1'.$urlBase.'/'.'$2', $cssContentAfterRootDirCheck
   );

   return $cssContentAfterNoSlashesRelativeURLCheck;
}

And finally some examples:

  $baseAbsoluteURL = "http://www.moya.sk/images/ble/";

  $relativeURL = "images/green.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/ble/images/green.png
  $relativeURL = "images/test/../green.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/ble/images/green.png
  $relativeURL = "/test/test/images/red.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/test/test/images/red.png
  $relativeURL = "/test/test/images./red.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/test/test/images./red.png
  $relativeURL = "//www.webpage.com/test/test/images/red.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.webpage.com/test/test/images/red.png
  $relativeURL = "//www.webpage.com/lll../..ddd/s../test/test/./images/red.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.webpage.com/lll../..ddd/s../test/test/images/red.png
  $relativeURL = "//www.webpage.com/././test/test/./../images/red.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.webpage.com/test/images/red.png
  $relativeURL = "./images/red.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/ble/images/red.png
  $relativeURL = "../../../../../../../test/green.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/test/green.png
  $relativeURL = "../test/green.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/test/green.png
  $relativeURL = "../test/hh/../green.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/test/green.png
  $relativeURL = ".././test/gsreen.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/test/gsreen.png
  $relativeURL = "   images/green.png   ";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/ble/images/green.png
  $relativeURL = " /images/green.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/green.png
  $relativeURL="http://www.moya.sk/images/ble/images/red.png";
  echo convertRelativeToAbsoluteURL($baseAbsoluteURL, $relativeURL);
  //Output: http://www.moya.sk/images/ble/images/red.png

 $testCSS=
  "
  #image1 {background: url(images/green.png);}
  #image2 {background: url(images/bla/../bla/green.png);}
  #image3 {background: url(images/bla/././././././././bla/../green.png);}
  #image4 {background: url(images/bla/bla/bla/../green.png);}
  #image5 {background: url(images/bla/lop../bla/bla/.s./green.png);}
  #image6 {background: url(images/green.png);}
  #image7 {background: url(/test/test/images/red.png);}
  #image8 {background: url('images/green.png');}
  #image9 {background: url(' /test/test/images/red.png');}
  #image10 {background: url('./images/red.png');}
  #image11 {background: url(../test/ggreen.png);}
  #image12 {background: url(../test/hh/../ggreen.png);}
  #image13 {background: url('  ./images/red.png');}
  #image14 {background: url(  ../.././test/green.png   );}
  #image15 {background: url( '../../../test/test/images/green.png');}
  #image16 {background: url( /test/test/images/red.png   );}
  #image17 {background: url(' /test/test/images/red.png '  );}
  #image18 {background: url(        ' /test/test/images/red.png '   );}"
  .
  '
  #image19 {background: url("//www.webpage.com/././test/test/./../images/red.png");}
  #image20 {background: url("//www.webpage.com/././test/test/../images/red.png");}
  #image21 {background: url("//www.webpage.com/lll../..ddd/test/test/./images/red.png");}
  #image22 {background: url("images/green.png");}
  #image23 {background: url("../test/test/images/red.png");}
  #image24 {background: url(   "  ./images/green.png " );}
  #image25 {background: url(        " /test/test/images/red.png "   );}
  ';

  echo nl2br(convertCSSRelativeURLsToCSSAbsoluteURLs($baseAbsoluteURL, $testCSS));
  /*
  Output:
 #image1 {background: url(http://www.moya.sk/images/ble/images/green.png);}
 #image2 {background: url(http://www.moya.sk/images/ble/images/bla/green.png);}
 #image3 {background: url(http://www.moya.sk/images/ble/images/bla/green.png);}
 #image4 {background: url(http://www.moya.sk/images/ble/images/bla/bla/green.png);}
 #image5 {background: url(http://www.moya.sk/images/ble/images/bla/lop../bla/bla/.s./green.png);}
 #image6 {background: url(http://www.moya.sk/images/ble/images/green.png);}
 #image7 {background: url(http://www.moya.sk/test/test/images/red.png);}
 #image8 {background: url('http://www.moya.sk/images/ble/images/green.png');}
 #image9 {background: url(' http://www.moya.sk/test/test/images/red.png');}
 #image10 {background: url('http://www.moya.sk/images/ble/images/red.png');}
 #image11 {background: url(http://www.moya.sk/images/test/ggreen.png);}
 #image12 {background: url(http://www.moya.sk/images/test/ggreen.png);}
 #image13 {background: url(' http://www.moya.sk/images/ble/images/red.png');}
 #image14 {background: url( http://www.moya.sk/test/green.png );}
 #image15 {background: url( 'http://www.moya.sk/test/test/images/green.png');}
 #image16 {background: url( http://www.moya.sk/test/test/images/red.png );}
 #image17 {background: url(' http://www.moya.sk/test/test/images/red.png ' );}
 #image18 {background: url( ' http://www.moya.sk/test/test/images/red.png ' );}
 #image19 {background: url("http://www.webpage.com/test/images/red.png");}
 #image20 {background: url("http://www.webpage.com/test/images/red.png");}
 #image21 {background: url("http://www.webpage.com/lll../..ddd/test/test/images/red.png");}
 #image22 {background: url("http://www.moya.sk/images/ble/images/green.png");}
 #image23 {background: url("http://www.moya.sk/images/test/test/images/red.png");}
 #image24 {background: url( " http://www.moya.sk/images/ble/images/green.png " );}
 #image25 {background: url( " http://www.moya.sk/test/test/images/red.png " );}
  */

You can download these functions here.
