Modifier and Type | Field and Description |
---|---|
static java.lang.String[] | contentTypes |
private OutputMapper | fileNameMapper |
private java.lang.String | inencoding |
private OutputMapper[] | mappers |
private gnu.regexp.RE[] | matchers |
private java.lang.String | outencoding |
private java.lang.String[] | substitutes |

Constructor and Description |
---|
HTMLSaveHandler() |
HTMLSaveHandler(java.lang.Object[] o) |
HTMLSaveHandler(gnu.regexp.RE[] matchers, OutputMapper[] mappers, OutputMapper fileNameMapper) |
HTMLSaveHandler(gnu.regexp.RE[] matchers, java.lang.String[] substitutes, OutputMapper[] mappers, OutputMapper fileNameMapper, java.lang.String inencoding, java.lang.String outencoding) |

Modifier and Type | Method and Description |
---|---|
java.lang.String[] | getContentTypes() - Returns an array of String containing the content-types this ContentHandler can process. |
void | handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL page) - Processes the given page. |

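getContentTypes() is what lets a crawler decide which handler should receive a page. The sketch below is a hypothetical dispatch check, not Wandora's code: TypedHandler is an assumed stand-in for Handler, and "text/html" is an assumed value, because the actual contents of contentTypes are not listed on this page. The check strips media-type parameters such as "; charset=UTF-8" and compares case-insensitively.

```java
import java.util.Locale;

/** Hypothetical stand-in for a handler that advertises the content types it accepts. */
interface TypedHandler {
    String[] getContentTypes();
}

class ContentTypeDispatch {
    /**
     * Returns true if the handler lists the media type of the given
     * Content-Type header value. Parameters such as "; charset=..." are
     * removed and the comparison ignores case.
     */
    static boolean accepts(TypedHandler handler, String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false;
        }
        int semi = contentTypeHeader.indexOf(';');
        String mediaType = (semi >= 0 ? contentTypeHeader.substring(0, semi) : contentTypeHeader)
                .trim().toLowerCase(Locale.ROOT);
        for (String type : handler.getContentTypes()) {
            if (type.trim().toLowerCase(Locale.ROOT).equals(mediaType)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed value: the real contents of HTMLSaveHandler.contentTypes are not documented here.
        TypedHandler html = () -> new String[] {"text/html"};
        System.out.println(accepts(html, "text/html; charset=UTF-8")); // true
        System.out.println(accepts(html, "image/png"));                // false
    }
}
```

A crawler would typically run this check for each registered handler and pass the page to the first one that accepts it.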
private gnu.regexp.RE[] matchers
private OutputMapper[] mappers
private java.lang.String[] substitutes
private OutputMapper fileNameMapper
private java.lang.String inencoding
private java.lang.String outencoding
public static final java.lang.String[] contentTypes
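The inencoding and outencoding fields suggest that the handler decodes the page with one character encoding and writes it out with another. That reading is an assumption; this page does not document it. A minimal sketch of such a transcoding step, using only java.io and java.nio.charset (Transcoder and transcode are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

class Transcoder {
    /**
     * Decodes all bytes of in using inEncoding and returns the same text
     * encoded with outEncoding. The input stream is left open; the caller owns it.
     */
    static byte[] transcode(InputStream in, String inEncoding, String outEncoding) {
        Charset inCharset = Charset.forName(inEncoding);
        Charset outCharset = Charset.forName(outEncoding);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Reader reader = new InputStreamReader(in, inCharset);
        // Closing the writer flushes any buffered characters into bytes.
        try (Writer writer = new OutputStreamWriter(bytes, outCharset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (0xE9), encoded as ISO-8859-1
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = transcode(new ByteArrayInputStream(latin1), "ISO-8859-1", "UTF-8");
        System.out.println(utf8.length); // 5: e-acute takes two bytes in UTF-8
    }
}
```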
public HTMLSaveHandler()
public HTMLSaveHandler(java.lang.Object[] o)
public HTMLSaveHandler(gnu.regexp.RE[] matchers, OutputMapper[] mappers, OutputMapper fileNameMapper)
public HTMLSaveHandler(gnu.regexp.RE[] matchers, java.lang.String[] substitutes, OutputMapper[] mappers, OutputMapper fileNameMapper, java.lang.String inencoding, java.lang.String outencoding)
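The six-argument constructor takes parallel matchers and substitutes arrays. Assuming that the matches of each pattern are replaced by the substitute at the same index, applied in array order (this page does not say so explicitly), the step can be sketched as below. The sketch swaps java.util.regex.Pattern in for gnu.regexp.RE; SubstitutionChain and applyAll are hypothetical names, and the roles of mappers and fileNameMapper are not modeled.

```java
import java.util.regex.Pattern;

class SubstitutionChain {
    /**
     * Applies matchers[i] -> substitutes[i] to the text, one pair at a time,
     * in array order. Substitutes may refer to capture groups as $1, $2, ...
     */
    static String applyAll(String text, Pattern[] matchers, String[] substitutes) {
        if (matchers.length != substitutes.length) {
            throw new IllegalArgumentException("matchers and substitutes must have the same length");
        }
        String result = text;
        for (int i = 0; i < matchers.length; i++) {
            result = matchers[i].matcher(result).replaceAll(substitutes[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rule: rewrite absolute links on example.com to a local directory.
        Pattern[] matchers = {Pattern.compile("href=\"http://example\\.com/([^\"]*)\"")};
        String[] substitutes = {"href=\"local/$1\""};
        System.out.println(applyAll("<a href=\"http://example.com/a.html\">a</a>", matchers, substitutes));
        // prints <a href="local/a.html">a</a>
    }
}
```

Because the pairs run in sequence, a later pattern sees the output of earlier ones, so the order of the arrays matters.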
public void handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL page)
Description copied from interface: Handler
Processes the given page. The InputStream contains the data of an object whose content-type this content handler accepts. The handler may use the given CrawlerAccess object to add further pages to the queue of the WebCrawler that asked to process the page.
Specified by: handle in interface Handler
Parameters:
crawler - The callback object for the handler. Any objects built from the content of the page can be sent to it.
in - The InputStream of the page.
depth - The remaining depth. When reporting another page to the queue, set the depth of that page to depth-1.
page - The URL of the page.
public java.lang.String[] getContentTypes()
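Description copied from interface: Handler
Returns an array of String containing the content-types this ContentHandler can process.
Specified by: getContentTypes in interface Handler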
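The depth contract of handle (report further pages with depth-1) can be sketched as follows. PageQueue is a hypothetical stand-in for CrawlerAccess, whose methods are not documented on this page; the regex-based link extraction is deliberately crude, and stopping once the remaining depth reaches zero is an assumption.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical callback standing in for whatever CrawlerAccess offers for queueing pages. */
interface PageQueue {
    void addPage(URL url, int depth);
}

class DepthAwareLinkReporter {
    // Crude href extractor for illustration only; a real handler should use an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Reports every link found in html to queue with depth - 1; does nothing when depth <= 0. */
    static void reportLinks(String html, URL page, int depth, PageQueue queue) {
        if (depth <= 0) {
            return; // assumed: no remaining depth means no further pages are queued
        }
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links against the URL of the current page.
                queue.addPage(new URL(page, m.group(1)), depth - 1);
            } catch (MalformedURLException e) {
                // Skip links that cannot be turned into a URL (e.g. unknown protocols).
            }
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        List<String> queued = new ArrayList<>();
        PageQueue queue = (url, d) -> queued.add(url + " @" + d);
        reportLinks("<a href=\"b.html\">b</a>", new URL("http://example.com/dir/a.html"), 2, queue);
        System.out.println(queued); // [http://example.com/dir/b.html @1]
    }
}
```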
Copyright 2004-2015 Wandora Team