Suppose you crawl a page at http://cs.uga.edu/~krishna and one of its links has the HREF contact.html. How do you get the absolute URL for contact.html? You might say: simply concatenate the page's URL and the HREF. But it is not that simple. The href might be "../../../../contact.html", and we need to handle such cases too. The simplest way I found is to use the URL class in the java.net package. The following code does the trick for us.
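import java.net.URL;
URL start = new URL("http://cs.uga.edu/~krishna"); // URL of the crawled page; throws MalformedURLException if invalid
URL newUrl = new URL(start, "contact.html"); // construct the absolute URL by resolving the href against start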
That's the way you construct an absolute URL. Very simple, isn't it?
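One detail worth checking when you use this in a crawler is how the base URL ends, because the last path segment is replaced unless it ends with a slash. Here is a minimal sketch of that and of the "../" case. The class name AbsoluteUrlDemo, the helper resolve, and the a/b/page.html path are made up for illustration; only the java.net.URL constructor comes from the code above.

```java
import java.net.MalformedURLException;
import java.net.URL;

class AbsoluteUrlDemo {

    // Hypothetical helper: resolve an href found on the page at baseUrl.
    static String resolve(String baseUrl, String href) {
        try {
            URL base = new URL(baseUrl);
            return new URL(base, href).toString();
        } catch (MalformedURLException e) {
            // Wrap the checked exception so callers do not have to declare it.
            throw new IllegalArgumentException("Bad URL: " + baseUrl + " + " + href, e);
        }
    }

    public static void main(String[] args) {
        // No trailing slash: "~krishna" is treated like a file name and gets replaced.
        // prints http://cs.uga.edu/contact.html
        System.out.println(resolve("http://cs.uga.edu/~krishna", "contact.html"));

        // Trailing slash: "~krishna/" is treated as a directory.
        // prints http://cs.uga.edu/~krishna/contact.html
        System.out.println(resolve("http://cs.uga.edu/~krishna/", "contact.html"));

        // ".." segments are collapsed against the base path (assumed example path).
        // prints http://cs.uga.edu/~krishna/contact.html
        System.out.println(resolve("http://cs.uga.edu/~krishna/a/b/page.html", "../../contact.html"));
    }
}
```

So if the crawler ends up at a directory-style page, it is probably safest to use the URL you land on after redirects (usually with the trailing slash) as the base.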