Simple and extensible Java crawler.



All visitors are run in a multi-threaded environment, so they MUST be thread safe. Don't know if your visitor is thread safe? Send us an email on the user list.
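As a rough illustration of what "thread safe" means here, the sketch below keeps all of a visitor's mutable state in concurrent structures from java.util.concurrent. The SimpleVisitor interface, the CountingVisitor class and the runDemo helper are hypothetical names invented for this example; they are not part of the crawler's API, and the real visitor interfaces have different methods.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadSafeVisitorSketch {

    // Hypothetical stand-in for a visitor callback; NOT the crawler's real interface.
    interface SimpleVisitor {
        void visit(String url);
    }

    // All mutable state lives in thread-safe structures, so concurrent
    // calls to visit() cannot lose updates or corrupt the set.
    static final class CountingVisitor implements SimpleVisitor {
        private final AtomicInteger visits = new AtomicInteger();
        private final Set<String> seen = ConcurrentHashMap.newKeySet();

        @Override
        public void visit(String url) {
            visits.incrementAndGet(); // atomic increment, no lock needed
            seen.add(url);            // concurrent set handles simultaneous adds
        }

        int visitCount() {
            return visits.get();
        }

        int distinctUrls() {
            return seen.size();
        }
    }

    // Simulates several crawler threads sharing a single visitor instance.
    static CountingVisitor runDemo(int threads, int callsPerThread) throws InterruptedException {
        CountingVisitor visitor = new CountingVisitor();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    visitor.visit("http://example.com/page" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return visitor;
    }

    public static void main(String[] args) throws InterruptedException {
        CountingVisitor v = runDemo(4, 1000);
        System.out.println(v.visitCount() + " visits, " + v.distinctUrls() + " distinct URLs");
    }
}
```

If the counter were a plain int and the set a plain HashSet, the same demo could lose increments or corrupt the set, which is the kind of bug a non-thread-safe visitor tends to produce.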


The only code you'll usually need to write is an implementation of net.vidageek.crawler.ContentVisitor. This interface provides two methods:


PageVisitor is a subinterface of ContentVisitor. Usually, you won't need to implement it, since you can use an already implemented PageVisitor. Did I mention that you can compose these PageVisitors?
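To give an idea of what composing visitors can look like, here is a minimal decorator-style sketch. Everything in it is an assumption made for illustration: SketchVisitor, RecordingVisitor, PrefixVisitor, SkipSuffixVisitor and runDemo are hypothetical names, and the real PageVisitor interface and the visitors shipped with the crawler may look different.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ComposedVisitorSketch {

    // Hypothetical stand-in; NOT the real PageVisitor interface.
    interface SketchVisitor {
        boolean followUrl(String url);
        void visit(String url);
    }

    // Innermost visitor: follows everything and records what it visits.
    static final class RecordingVisitor implements SketchVisitor {
        final List<String> visited = new CopyOnWriteArrayList<>(); // thread-safe list

        @Override
        public boolean followUrl(String url) {
            return true;
        }

        @Override
        public void visit(String url) {
            visited.add(url);
        }
    }

    // Decorator: only follows URLs starting with a given prefix.
    static final class PrefixVisitor implements SketchVisitor {
        private final String prefix;
        private final SketchVisitor delegate;

        PrefixVisitor(String prefix, SketchVisitor delegate) {
            this.prefix = prefix;
            this.delegate = delegate;
        }

        @Override
        public boolean followUrl(String url) {
            return url.startsWith(prefix) && delegate.followUrl(url);
        }

        @Override
        public void visit(String url) {
            delegate.visit(url);
        }
    }

    // Decorator: refuses URLs ending with a given suffix.
    static final class SkipSuffixVisitor implements SketchVisitor {
        private final String suffix;
        private final SketchVisitor delegate;

        SkipSuffixVisitor(String suffix, SketchVisitor delegate) {
            this.suffix = suffix;
            this.delegate = delegate;
        }

        @Override
        public boolean followUrl(String url) {
            return !url.endsWith(suffix) && delegate.followUrl(url);
        }

        @Override
        public void visit(String url) {
            delegate.visit(url);
        }
    }

    // Simulates a tiny crawl: each candidate URL is visited only if the
    // composed visitor agrees to follow it.
    static List<String> runDemo() {
        RecordingVisitor recorder = new RecordingVisitor();
        SketchVisitor composed =
                new PrefixVisitor("http://example.com/", new SkipSuffixVisitor(".pdf", recorder));

        List<String> candidates = Arrays.asList(
                "http://example.com/",
                "http://example.com/manual.pdf",
                "http://other.org/",
                "http://example.com/about");

        for (String url : candidates) {
            if (composed.followUrl(url)) {
                composed.visit(url);
            }
        }
        return recorder.visited;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Each wrapper adds one rule and hands everything else to the visitor it wraps, so rules can be stacked in any order without touching the innermost visitor.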