
Email Crawler - Source Code

 
 

Demo

The setup is the same as for the Web Crawler, so we begin at the NewPage event.

The WebPage object holds the page's HTML. Read it from the HTML property:

string html = page.HTML;


Next, call the function ExtractEmailAddresses to do the job for us.
ExtractEmailAddresses fills an ArrayList with the email addresses it finds in the HTML:

ArrayList emails = new ArrayList();
Noviway.WebCrawler.Crawler.ExtractEmailAddresses(html, ref emails);

A look inside

ExtractEmailAddresses searches the HTML for the '@' character.
When it finds one, it scans backward and forward from that position until it reaches a
character that is not allowed in an email address, such as the space character.

Accepted characters are:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-.@
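
The scan described above can be sketched in a few lines of C#. This is an illustration only, not the library's actual source: the class name EmailScanSketch and the method Extract are hypothetical, the sketch assumes the pivot character is '@', and it leaves '@' out of the accepted set while expanding so that each result contains exactly one '@'. It also returns a List<string> rather than filling an ArrayList.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the scan; not the Noviway implementation.
public static class EmailScanSketch
{
    // Accepted characters from the list above, without '@' (assumption:
    // '@' is only the pivot, so each candidate has exactly one '@').
    private const string Accepted =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890_-.";

    public static List<string> Extract(string html)
    {
        var emails = new List<string>();
        int at = html.IndexOf('@');
        while (at >= 0)
        {
            // Walk backward while the previous character is accepted.
            int start = at;
            while (start > 0 && Accepted.IndexOf(html[start - 1]) >= 0)
                start--;

            // Walk forward while the next character is accepted.
            int end = at;
            while (end < html.Length - 1 && Accepted.IndexOf(html[end + 1]) >= 0)
                end++;

            // Keep the candidate only if there is text on both sides of '@'.
            if (start < at && end > at)
            {
                string candidate = html.Substring(start, end - start + 1);
                if (!emails.Contains(candidate))
                    emails.Add(candidate);
            }

            // Continue with the next '@' after the current candidate.
            at = html.IndexOf('@', end + 1);
        }
        return emails;
    }

    public static void Main()
    {
        string html = "<p>Write to john.doe@example.com or " +
                      "<a href=\"mailto:info@example.org\">us</a></p>";
        foreach (string email in Extract(html))
            Console.WriteLine(email);
    }
}
```

Because '.' is an accepted character, a sentence-ending period directly after an address would be captured as part of it in this sketch, so real input may need an extra trimming or validation step.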

That's it. Try it.




 
 
  © All rights reserved to Eran Aharonovich 2009