
Image Crawler - Source Code

 


The setup is the same as for the Web Crawler, so we begin at the NewPage event.

The WebPage object exposes the page's XML through its XML property.
We go over all IMG nodes, read the 'src' attribute of each one, and convert it to an absolute URL.

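// Requires: using System.Xml;

// Collect every IMG element on the page
XmlNodeList list = page.XML.GetElementsByTagName("img");

foreach ( XmlNode node in list )
{
     // Skip IMG tags that have no 'src' attribute
     XmlAttribute attr = node.Attributes["src"];

     if ( attr == null )
          continue;

     // Get absolute url
     string src = Noviway.WebCrawler.Crawler.GetAbsoluteUrl( page.Url, attr.Value );

     // 'src' now holds the full address of one image, ready to be downloaded
}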
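If you want to see what resolving a relative 'src' involves, the following is a minimal sketch that uses the standard System.Uri class. It is not the Noviway implementation of GetAbsoluteUrl; the UrlHelper class and the ToAbsoluteUrl method are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper, not part of the Noviway library.
static class UrlHelper
{
    // Resolves 'src' against the address of the page it was found on.
    // Returns null when the two values cannot be combined into a valid URL.
    public static string ToAbsoluteUrl(string pageUrl, string src)
    {
        Uri baseUri = new Uri(pageUrl, UriKind.Absolute);
        Uri result;

        // Uri.TryCreate(Uri, string, out Uri) handles relative paths,
        // root-relative paths and values that are already absolute.
        if (Uri.TryCreate(baseUri, src, out result))
            return result.AbsoluteUri;

        return null;
    }
}
```

For a page at http://example.com/gallery/index.html, the value "images/cat.jpg" resolves to http://example.com/gallery/images/cat.jpg, and "/logo.png" resolves to http://example.com/logo.png.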

After we've got the sources of the images, we need to download each one. In the code below, 'source' is the absolute URL of a single image.

// Requires: using System.IO;

System.Net.HttpWebResponse response = null;

MemoryStream memStream = new MemoryStream();

Noviway.HTTPBrowser.Browser browser = new Noviway.HTTPBrowser.Browser();

// Download the image into memory
if ( browser.Navigate( new Noviway.HTTPBrowser.Browser.Stage("GET", source, string.Empty ),
ref memStream, out response ) )
{
     // Make sure the target folder exists before creating the file
     Directory.CreateDirectory( "Your_Crawler_Name" );

     string path = Path.Combine( "Your_Crawler_Name", Path.GetFileName( source ) );

     // 'using' closes the file even if the write fails
     using ( FileStream fs = File.Create( path ) )
     {
          memStream.WriteTo( fs );
     }
}
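Note that Path.GetFileName is applied to the whole URL, so an address such as http://example.com/img/a.jpg?size=2 yields "a.jpg?size=2", which is not a valid file name on Windows. The following is a minimal sketch of one way to derive a safer name. FileNameHelper, GetLocalFileName and the "image" fallback are hypothetical and are not part of the Noviway library.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Noviway library.
static class FileNameHelper
{
    // Builds a local file name from an absolute image URL.
    public static string GetLocalFileName(string imageUrl)
    {
        Uri uri = new Uri(imageUrl, UriKind.Absolute);

        // AbsolutePath excludes the query string and the fragment.
        string urlPath = uri.AbsolutePath;

        // Keep only the text after the last '/' and decode escapes such as %20.
        string name = Uri.UnescapeDataString(urlPath.Substring(urlPath.LastIndexOf('/') + 1));

        // Replace characters that the file system does not accept.
        foreach (char c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');

        // URLs that end in '/' have no file name; fall back to a placeholder.
        return name.Length == 0 ? "image" : name;
    }
}
```

With this helper, the path could be built as Path.Combine( "Your_Crawler_Name", FileNameHelper.GetLocalFileName( source ) ). Two images with the same file name would still overwrite each other, so a real crawler may want to add a counter or a hash to the name.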

 
 
  Webmaster: Eran Aharonovich © All rights reserved to Eran Aharonovich 2007