| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String[] | contentTypes |
| private OutputMapper | fileNameMapper |
| private java.lang.String | inencoding |
| private OutputMapper[] | mappers |
| private gnu.regexp.RE[] | matchers |
| private java.lang.String | outencoding |
| private java.lang.String[] | substitutes |

| Constructor and Description |
|---|
| HTMLSaveHandler() |
| HTMLSaveHandler(java.lang.Object[] o) |
| HTMLSaveHandler(gnu.regexp.RE[] matchers, OutputMapper[] mappers, OutputMapper fileNameMapper) |
| HTMLSaveHandler(gnu.regexp.RE[] matchers, java.lang.String[] substitutes, OutputMapper[] mappers, OutputMapper fileNameMapper, java.lang.String inencoding, java.lang.String outencoding) |

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String[] | getContentTypes() - Returns an array of String containing the content-types this ContentHandler can process. |
| void | handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL page) - Processes the given page. |

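As a rough illustration of what the array returned by `getContentTypes()` is for, the sketch below picks a handler by content type. Everything in it is an assumption made for illustration: `TypedHandler` is a hypothetical stand-in for the `Handler` interface, `findHandler` is not part of this API, and `text/html` is only a guessed value for `contentTypes`. How the crawler actually selects a handler is not documented here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are the library's real classes.
class ContentTypeDispatchSketch {

    /** Stand-in for the Handler interface; only getContentTypes() is modelled. */
    interface TypedHandler {
        String[] getContentTypes();
    }

    /** Returns the first handler that lists the given content type, or null if none does. */
    static TypedHandler findHandler(List<TypedHandler> handlers, String contentType) {
        // Assumption: parameters such as "; charset=UTF-8" are dropped before comparing.
        String bare = contentType.split(";", 2)[0].trim();
        for (TypedHandler handler : handlers) {
            for (String accepted : handler.getContentTypes()) {
                if (accepted.equalsIgnoreCase(bare)) {
                    return handler;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "text/html" and "text/plain" are illustrative values only.
        TypedHandler html = () -> new String[] { "text/html" };
        TypedHandler plain = () -> new String[] { "text/plain" };
        List<TypedHandler> handlers = Arrays.asList(html, plain);

        System.out.println(findHandler(handlers, "text/html; charset=UTF-8") == html);
        System.out.println(findHandler(handlers, "image/png") == null);
    }
}
```
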
private gnu.regexp.RE[] matchers
private OutputMapper[] mappers
private java.lang.String[] substitutes
private OutputMapper fileNameMapper
private java.lang.String inencoding
private java.lang.String outencoding
public static final java.lang.String[] contentTypes
public HTMLSaveHandler()
public HTMLSaveHandler(java.lang.Object[] o)
public HTMLSaveHandler(gnu.regexp.RE[] matchers,
OutputMapper[] mappers,
OutputMapper fileNameMapper)
public HTMLSaveHandler(gnu.regexp.RE[] matchers,
java.lang.String[] substitutes,
OutputMapper[] mappers,
OutputMapper fileNameMapper,
java.lang.String inencoding,
java.lang.String outencoding)
public void handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL page)

Description copied from interface: Handler

Processes the given page. The InputStream contains the data of an object that is of the content-type this content handler accepts. The handler may use the given CrawlerAccess object to add further pages to the queue of the WebCrawler that asked to process the page.

Specified by: handle in interface Handler

Parameters:
crawler - The callback object for the handler. Any objects built from the content of the page can be sent to it.
in - The InputStream of the page.
depth - The remaining depth. When reporting another page to the queue, the depth of that page should be set to depth - 1.
page - The URL of the page.

public java.lang.String[] getContentTypes()

Description copied from interface: Handler

Returns an array of String containing the content-types this ContentHandler can process.

Specified by: getContentTypes in interface Handler

Copyright 2004-2015 Wandora Team
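
The depth rule documented for `handle` (a page reported back to the queue gets the current depth minus one) can be sketched as follows. This is a minimal sketch under stated assumptions: `PageQueue` and `reportLinks` are hypothetical names, not the real `CrawlerAccess` API, and stopping at depth 0 is an assumed convention rather than documented behaviour.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: PageQueue stands in for the crawler callback object.
class DepthContractSketch {

    /** Hypothetical callback that accepts further pages with their remaining depth. */
    interface PageQueue {
        void enqueue(URL page, int depth);
    }

    /** Reports each link found on a page with depth - 1. */
    static void reportLinks(PageQueue queue, List<URL> links, int depth) {
        if (depth <= 0) {
            return; // assumption: no further pages are queued once the depth is used up
        }
        for (URL link : links) {
            queue.enqueue(link, depth - 1);
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        List<String> log = new ArrayList<>();
        PageQueue queue = (page, d) -> log.add(page + " @ depth " + d);

        List<URL> links = new ArrayList<>();
        links.add(URI.create("http://example.org/a").toURL());
        links.add(URI.create("http://example.org/b").toURL());

        // The page being handled has remaining depth 2, so its links get depth 1.
        reportLinks(queue, links, 2);
        log.forEach(System.out::println);
    }
}
```

Passing the decremented value on every hop is what bounds the crawl: each page hands its children a smaller budget until nothing more is queued.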