Get HTML from page

This piece of code has followed me since 2003, and I have used it many times. It simply retrieves the text from a URL and returns it as a string. The code is useful for reading RSS feeds or getting the HTML from pages. I have even used it to stress test a page on my website.

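// Requires: using System; using System.Collections.Specialized; using System.IO;
public string GetPage(string url, NameValueCollection headers)
{
  try
  {
    System.Net.WebRequest myRequest = System.Net.WebRequest.Create(url);
    myRequest.PreAuthenticate = true;
    myRequest.Method = "GET";
    // Copy any caller-supplied headers onto the request
    if (headers != null)
    {
      foreach (string key in headers.AllKeys)
        myRequest.Headers.Add(key, headers[key]);
    }
    // Dispose the response, stream and reader when done
    using (System.Net.WebResponse myResponse = myRequest.GetResponse())
    using (Stream stream = myResponse.GetResponseStream())
    using (StreamReader streamreader = new System.IO.StreamReader(stream))
    {
      // Read the whole body and strip any NUL characters
      string ret = streamreader.ReadToEnd();
      return ret.Replace("\x00", "");
    }
  }
  catch (Exception ex)
  {
    throw new Exception("Could not get HTML from " + url + ": " + ex.Message, ex);
  }
}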

Calling the code is very easy:

GetPage("", null);
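
The second parameter is where custom headers go. As a minimal sketch of how that might look, the example below builds a NameValueCollection and copies it onto a request without actually sending it. The helper name BuildRequest, the example.com URL and the header values are my own assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

public static class HeaderExample
{
    // Hypothetical helper: creates a GET request and copies the given
    // headers onto it. Nothing is sent over the network here.
    public static WebRequest BuildRequest(string url, NameValueCollection headers)
    {
        WebRequest request = WebRequest.Create(url);
        request.Method = "GET";
        if (headers != null)
        {
            foreach (string key in headers.AllKeys)
                request.Headers.Add(key, headers[key]);
        }
        return request;
    }

    public static void Main()
    {
        // Assumed header names and values, for illustration only
        var headers = new NameValueCollection();
        headers.Add("Accept-Language", "en-US");
        headers.Add("X-Requested-With", "GetPageExample");

        WebRequest request = BuildRequest("http://example.com/", headers);
        Console.WriteLine(request.Headers["Accept-Language"]);
    }
}
```

The same collection can be passed straight to GetPage as its second argument. Note that some headers (User-Agent and Content-Type, for example) are restricted on HTTP requests in the .NET Framework and have to be set through their dedicated properties instead.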

About briancaos

Developer at Pentia A/S since 2003. I have developed web applications with Sitecore since Sitecore 4.1.

One Response to Get HTML from page

  1. Small comment:
    Why not use string.Empty and throw a WebException?

    Besides that, very useful. Especially because Sitecore won’t let you pass headers in the WebUtil :).
