Tuesday, September 11, 2007

How do you construct an absolute URL for links in a web page?

Suppose you crawl a page at http://cs.uga.edu/~krishna/ and one of its HREFs links to contact.html. How do you get the absolute URL for contact.html? You might think you can simply concatenate the page URL and the HREF, but it is not that simple. The HREF could just as well be "../../../../contact.html", and we need to handle those cases too. The simplest way I found is to use the URL class in the java.net package, which resolves a relative link against a base URL. The following code does the trick for us.

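import java.net.URL;

URL start = new URL("http://cs.uga.edu/~krishna/"); // trailing slash: without it, ~krishna is treated as a file name
URL newUrl = new URL(start, "contact.html"); // absolute URL: http://cs.uga.edu/~krishna/contact.html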

That's the way you construct an absolute URL. Very simple, isn't it?
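To see how the same two-argument URL constructor copes with the "../" case and with a base URL that has no trailing slash, here is a minimal, self-contained sketch. The class name AbsoluteUrlSketch, the helper resolve, and the deeper page path /~krishna/pubs/2007/index.html are made up for illustration and are not from any real site layout.

```java
import java.net.MalformedURLException;
import java.net.URL;

class AbsoluteUrlSketch {

    // Hypothetical helper: resolves an href found on a page against that page's URL.
    static String resolve(String pageUrl, String href) {
        try {
            URL base = new URL(pageUrl);
            return new URL(base, href).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + pageUrl + " + " + href, e);
        }
    }

    public static void main(String[] args) {
        // Base ends in a slash, so the link is resolved inside ~krishna/.
        System.out.println(resolve("http://cs.uga.edu/~krishna/", "contact.html"));

        // No trailing slash: ~krishna is treated as a file and gets replaced.
        System.out.println(resolve("http://cs.uga.edu/~krishna", "contact.html"));

        // "../" segments climb up from the (illustrative) page's directory.
        System.out.println(resolve("http://cs.uga.edu/~krishna/pubs/2007/index.html",
                                   "../../contact.html"));

        // An href that is already absolute is returned unchanged.
        System.out.println(resolve("http://cs.uga.edu/~krishna/", "http://www.uga.edu/"));
    }
}
```

The second call is the one to watch for in a crawler: if you store the page URL without its trailing slash, relative links resolve one level too high.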
