Modifier and Type | Field and Description |
---|---|
static java.lang.String[] | contentTypes |
private java.lang.String[] | linkTypes |
private java.lang.String | regexAll |
private java.lang.String | regexOneOf |
private java.lang.String | startURL |
Constructor and Description |
---|
HTMLSurfer(java.lang.String[] linkTypes, java.lang.String regexAll, java.lang.String regexOneOf, java.lang.String startURL) Creates a new HTMLSurfer. |
Modifier and Type | Method and Description |
---|---|
java.lang.String[] | getContentTypes() Returns an array of String containing the content-types this ContentHandler can process. |
void | handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL page) Processes the given page. |
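To illustrate how the array returned by getContentTypes() might be consumed, the following is a minimal sketch of choosing a handler for a response by its content type. It does not use the real crawler classes: `TypedHandler`, `normalize` and `findHandler` are hypothetical names introduced only for this example, and the choice to ignore case and parameters such as `charset` is an assumption, not documented behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class ContentTypeDispatchSketch {

    /** Hypothetical stand-in for a handler; only getContentTypes() is modeled. */
    interface TypedHandler {
        String[] getContentTypes();
    }

    /** Strips parameters such as "; charset=UTF-8" and lower-cases the media type. */
    static String normalize(String contentTypeHeader) {
        int semicolon = contentTypeHeader.indexOf(';');
        String bare = semicolon >= 0
                ? contentTypeHeader.substring(0, semicolon)
                : contentTypeHeader;
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the first handler that declares the given content type, if any. */
    static Optional<TypedHandler> findHandler(List<TypedHandler> handlers,
                                              String contentTypeHeader) {
        String wanted = normalize(contentTypeHeader);
        for (TypedHandler handler : handlers) {
            for (String type : handler.getContentTypes()) {
                if (normalize(type).equals(wanted)) {
                    return Optional.of(handler);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two illustrative handlers; the content types listed are assumptions.
        TypedHandler html = () -> new String[] {"text/html"};
        TypedHandler images = () -> new String[] {"image/png", "image/jpeg"};
        List<TypedHandler> handlers = Arrays.asList(html, images);

        // prints "true": the charset parameter is ignored when matching
        System.out.println(findHandler(handlers, "text/html; charset=UTF-8").isPresent());
        // prints "false": no handler declares this type
        System.out.println(findHandler(handlers, "application/pdf").isPresent());
    }
}
```

A linear scan is enough for a handful of handlers; a crawler with many handlers could build a map from content type to handler once instead.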
private java.lang.String[] linkTypes
private java.lang.String regexAll
private java.lang.String regexOneOf
private java.lang.String startURL
public static final java.lang.String[] contentTypes
public HTMLSurfer(java.lang.String[] linkTypes, java.lang.String regexAll, java.lang.String regexOneOf, java.lang.String startURL)
public void handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL page)
Description copied from interface: Handler
Processes the given page. The InputStream contains the data of an object that is of the content-type this content handler accepts. May use the given CrawlerAccess object to add further pages to the queue of the WebCrawler that asked to process the page.
Specified by: handle in interface Handler
Parameters:
crawler - The callback object for the handler. Any objects built from the content of the page can be sent to this.
in - The InputStream of the page.
depth - The remaining depth. When reporting another page to the queue, the depth of that page should be set to this depth-1.
page - The URL of the page.
public java.lang.String[] getContentTypes()
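Description copied from interface: Handler
Returns an array of String containing the content-types this ContentHandler can process.
Specified by: getContentTypes in interface Handler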
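The depth rule described for handle (pages reported to the queue get the current depth minus one) can be sketched as follows. This is an illustration under stated assumptions, not the HTMLSurfer implementation: `QueueCallback` and its `enqueue` method are hypothetical stand-ins for CrawlerAccess, whose methods are not listed on this page; the regular expression is a deliberately simple link matcher; and stopping at depth 0 is an assumed convention.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DepthAwareHandlerSketch {

    /** Hypothetical callback standing in for CrawlerAccess. */
    interface QueueCallback {
        void enqueue(URL target, int depth);
    }

    // Matches double-quoted href attributes only; not a full HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Reads the page, resolves each link against the page URL, and reports it with depth - 1. */
    static void handle(QueueCallback crawler, InputStream in, int depth, URL page)
            throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        if (depth <= 0) {
            return; // Assumption: with no depth remaining, nothing further is queued.
        }
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            try {
                // Relative links are resolved against the URL of the current page.
                URL target = new URL(page, matcher.group(1));
                crawler.enqueue(target, depth - 1);
            } catch (MalformedURLException e) {
                // Skip links that cannot be resolved to a URL.
            }
        }
    }

    /** Runs the sketch on a small in-memory page and returns "url@depth" entries. */
    static List<String> demo() throws IOException {
        List<String> queued = new ArrayList<>();
        String html = "<a href=\"/about.html\">About</a> <a href=\"http://example.org/x\">X</a>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        handle((target, d) -> queued.add(target + "@" + d),
                in, 2, new URL("http://example.com/index.html"));
        return queued;
    }

    public static void main(String[] args) throws IOException {
        // prints [http://example.com/about.html@1, http://example.org/x@1]
        System.out.println(demo());
    }
}
```

Passing depth - 1 with every reported page means the crawl stops on its own once the remaining depth is used up, without the queue having to track how far each page is from the start URL.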
Copyright 2004-2015 Wandora Team