PageVisitor
PageVisitor is a subinterface of ContentVisitor. Usually, you won't need to implement it yourself, since you
can use one of the PageVisitors that are already implemented:
- DoesNotFollowVisitedUrlVisitor: With this visitor, each URL is visited only once.
- DomainVisitor: This visitor keeps the crawler from leaving the site's domain.
- RejectAtDepthVisitor: Consider the start page as depth=0, all pages linked from the start page
as depth=1, and so on (yes, BFS). This visitor lets you configure how deep into the site the crawler will go.
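To make the depth numbering concrete, here is a minimal sketch of a breadth-first walk over a tiny in-memory link graph. It does not use the crawler's API at all: the class `DepthSketch`, the method `depths`, and the sample paths are hypothetical names made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration only; not part of the crawler library.
public class DepthSketch {

    // Breadth-first walk: the start page gets depth 0, pages it links to get
    // depth 1, and so on. Pages deeper than maxDepth are never recorded.
    public static Map<String, Integer> depths(
            Map<String, List<String>> links, String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int next = depth.get(page) + 1;
            if (next > maxDepth) {
                continue; // links from this page would be too deep
            }
            for (String target : links.getOrDefault(page, List.of())) {
                if (!depth.containsKey(target)) { // first sighting is the shallowest one
                    depth.put(target, next);
                    queue.add(target);
                }
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/c"),
                "/c", List.of("/d"));
        // With a limit of 2, "/d" (depth 3) is never reached.
        System.out.println(depths(links, "/", 2)); // prints {/=0, /a=1, /b=1, /c=2}
    }
}
```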
Did I mention that you can compose these PageVisitors?
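The idea behind composing can be sketched as chaining "should I follow this link?" decisions. The code below is only an illustration under assumptions: `UrlRule`, `visitOnce`, `sameHost`, and `maxDepth` are hypothetical stand-ins written for this example, not the library's PageVisitor classes or their actual signatures.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the crawler library's API.
public class ComposeSketch {

    // Stand-in for a visitor's decision about following a link.
    public interface UrlRule {
        boolean follow(String url, int depth);

        // Compose two rules: a link is followed only if both accept it.
        default UrlRule and(UrlRule other) {
            return (url, depth) -> follow(url, depth) && other.follow(url, depth);
        }
    }

    // Accepts each URL the first time it is seen, rejects it afterwards.
    public static UrlRule visitOnce() {
        Set<String> seen = new HashSet<>();
        return (url, depth) -> seen.add(url);
    }

    // Accepts only URLs whose host matches the given one.
    public static UrlRule sameHost(String host) {
        return (url, depth) -> host.equals(URI.create(url).getHost());
    }

    // Accepts only links at or above the given depth limit.
    public static UrlRule maxDepth(int limit) {
        return (url, depth) -> depth <= limit;
    }

    public static void main(String[] args) {
        // The stateful rule goes last so that a URL rejected by an earlier
        // rule is not recorded as already visited.
        UrlRule rule = sameHost("example.com").and(maxDepth(2)).and(visitOnce());
        System.out.println(rule.follow("http://example.com/a", 1)); // prints true
        System.out.println(rule.follow("http://example.com/a", 1)); // prints false
        System.out.println(rule.follow("http://other.org/", 1));    // prints false
    }
}
```

In this sketch the order of composition matters only for the stateful rule: `and` short-circuits, so `visitOnce` sees a URL only after the stateless checks have accepted it.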