public class FindSubjectLocator extends AbstractWandoraTool implements WandoraTool, Handler, InterruptHandler
FindSubjectLocator crawls URL resources and tries to match each
found URL against the search pattern. If a URL matches the search pattern, the topic is
given that URL as its subject locator.
FindSubjectLocator can be used, for example, to fix missing subject locators.
This tool has NOT been tested.
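
The matching step described above can be illustrated with a minimal sketch. It is not the tool's implementation: the class `SubjectLocatorMatchSketch`, the method `assignLocators`, the use of regular expressions as the pattern syntax, and the "first match wins" rule are all assumptions made for illustration, and plain strings stand in for Wandora topics.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in; not part of the Wandora API.
final class SubjectLocatorMatchSketch {

    /**
     * Maps each topic name to the first crawled URL that matches the
     * topic's pattern. The patterns are treated as regular expressions,
     * which is an assumption of this sketch.
     */
    static Map<String, String> assignLocators(Map<String, String> topicPatterns,
                                              List<String> crawledUrls) {
        Map<String, String> locators = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : topicPatterns.entrySet()) {
            Pattern pattern = Pattern.compile(entry.getValue());
            for (String url : crawledUrls) {
                if (pattern.matcher(url).find()) {
                    locators.put(entry.getKey(), url);
                    break; // first match wins in this sketch
                }
            }
        }
        return locators;
    }

    public static void main(String[] args) {
        Map<String, String> patterns = new LinkedHashMap<>();
        patterns.put("Helsinki", "/images/helsinki\\.(jpg|png)$");
        patterns.put("Tampere", "/images/tampere\\.(jpg|png)$");
        List<String> urls = Arrays.asList(
            "http://example.org/index.html",
            "http://example.org/images/helsinki.jpg");
        // Only topics whose pattern matched a crawled URL receive a locator.
        System.out.println(assignLocators(patterns, urls));
    }
}
```

Topics whose pattern matches no crawled URL are simply left out of the result, which mirrors the idea that only matching URLs become subject locators.
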
| Modifier and Type | Field and Description |
|---|---|
| `private Wandora` | `admin` |
| `protected int` | `browseCounter` |
| `java.lang.String[]` | `contentTypes` |
| `private int` | `crawlCounter` |
| `private WebCrawler` | `crawler` |
| `protected int` | `extractionCounter` |
| `protected int` | `foundCounter` |
| `private java.lang.String` | `startUrl` |
| `private java.lang.String` | `subjectLocatorString` |
| `private java.util.HashMap<java.lang.String,java.lang.String>` | `topicPatterns` |
| `private java.lang.String` | `urlPattern` |
| Constructor and Description |
|---|
| `FindSubjectLocator()` Creates a new instance of FindSubjectLocator. |
| `FindSubjectLocator(Context preferredContext)` |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `execute(Wandora admin, Context context)` Runs the tool. |
| `java.lang.String[]` | `getContentTypes()` Returns an array of String containing the content-types this ContentHandler can process. |
| `java.lang.String` | `getDescription()` AdminToolManager shows tool descriptions while the user browses available tools and builds user-customizable GUI elements such as the Tools menu. |
| `int[]` | `getInterruptsHandled()` |
| `java.lang.String` | `getName()` The tool's name represents the tool in the UI unless the tool has been explicitly given another GUI name. |
| `void` | `handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL url)` Processes the given page. |
| `void` | `handleInterrupt(CrawlerAccess crawler, int interrupt, java.net.URL url)` |
| `Topic` | `isMyURL(java.net.URL url)` |
| `void` | `setupCrawler(java.lang.String startUrl)` |
| `java.lang.String` | `solveStartURL()` solveStartURL returns the URL where crawling is started. |
| `java.lang.String` | `solveURLPattern(Topic topic)` solveURLPattern returns the pattern that is compared to each URL. |
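
The documentation of `handle` on this page states that a page reported back to the crawler queue should carry the current depth minus one. A minimal sketch of that depth rule follows. It does not use the Wandora crawler classes: `DepthLimitedCrawlSketch`, `Page`, and the in-memory link map are hypothetical stand-ins, and the breadth-first order is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; none of these types belong to the Wandora crawler API.
final class DepthLimitedCrawlSketch {

    /** A queued page together with its remaining crawl depth. */
    static final class Page {
        final String url;
        final int depth;
        Page(String url, int depth) { this.url = url; this.depth = depth; }
    }

    /**
     * Visits pages breadth-first over an in-memory link graph.
     * Every link found on a page is queued with depth - 1; a page with
     * remaining depth 0 is still handled, but its links are not followed.
     */
    static List<String> crawl(Map<String, List<String>> links, String startUrl, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<Page> queue = new ArrayDeque<>();
        queue.add(new Page(startUrl, maxDepth));
        seen.add(startUrl);
        while (!queue.isEmpty()) {
            Page page = queue.poll();
            visited.add(page.url); // stands in for handling the page
            if (page.depth <= 0) {
                continue; // depth exhausted: do not report further pages
            }
            List<String> outgoing = links.getOrDefault(page.url, Collections.<String>emptyList());
            for (String next : outgoing) {
                if (seen.add(next)) { // skip URLs that were already queued
                    queue.add(new Page(next, page.depth - 1));
                }
            }
        }
        return visited;
    }
}
```

Carrying the remaining depth with each queued page keeps the stopping condition local to the handler: no global level counter is needed, and a handler only has to check whether its own depth has reached zero.
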
Methods inherited from class AbstractWandoraTool: addUndoMarker, addUndoMarker, allowMultipleInvocations, clearAllThreads, clearThreads, clearThreads, clearToolLock, clearToolLock, clearToolLocks, configure, execute, execute, forceStop, forceStop, getContext, getCurrentLogger, getDefaultLogger, getHistory, getIcon, getLastLogger, getState, getThreads, getThreads, getToolMenuItem, getToolMenuItem, getTopicName, getType, hlog, initialize, interruptAllThreads, interruptThreads, interruptThreads, isConfigurable, isRunning, isRunning, lockLog, log, log, log, log, requiresRefresh, run, runInOwnThread, setContext, setDefaultLogger, setLogTitle, setProgress, setProgressMax, setState, setToolLogger, singleLog, singleLog, singleLog, solveContextTopicMap, solveNameForTopicMap, writeOptions
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface WandoraTool: configure, execute, execute, getContext, getIcon, getToolMenuItem, getType, hlog, initialize, isConfigurable, isRunning, log, log, log, log, requiresRefresh, setContext, setToolLogger, writeOptions
Methods inherited from a further interface: forceStop, getHistory, getState, lockLog, setLogTitle, setProgress, setProgressMax, setState
private Wandora admin
private WebCrawler crawler
private int crawlCounter
protected int extractionCounter
protected int foundCounter
protected int browseCounter
private java.lang.String urlPattern
private java.lang.String startUrl
private java.lang.String subjectLocatorString
private java.util.HashMap<java.lang.String,java.lang.String> topicPatterns
public final java.lang.String[] contentTypes
public FindSubjectLocator()
public FindSubjectLocator(Context preferredContext)
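public java.lang.String getName()
Description copied from class: AbstractWandoraTool
The tool's name represents the tool in the UI unless the tool has been explicitly given another GUI name.
Specified by: getName in interface WandoraTool
Overrides: getName in class AbstractWandoraTool
public java.lang.String getDescription()
Description copied from class: AbstractWandoraTool
AdminToolManager shows tool descriptions while the user browses available tools and builds user-customizable GUI elements such as the Tools menu.
Specified by: getDescription in interface WandoraTool
Overrides: getDescription in class AbstractWandoraTool
public java.lang.String solveStartURL()
solveStartURL returns the URL where crawling is started.
public java.lang.String solveURLPattern(Topic topic)
solveURLPattern returns the pattern that is compared to each URL.
public void execute(Wandora admin, Context context)
Description copied from interface: WandoraTool
Runs the tool.
Specified by: execute in interface WandoraTool
public void setupCrawler(java.lang.String startUrl)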
public Topic isMyURL(java.net.URL url)
public void handle(CrawlerAccess crawler, java.io.InputStream in, int depth, java.net.URL url)
Description copied from interface: Handler
Processes the given page. The InputStream contains the data of an object that is of the content-type this content handler accepts. May use the given CrawlerAccess object to add further pages to the queue of the WebCrawler that asked to process the page.
Specified by: handle in interface Handler
Parameters:
crawler - The callback object for the handler. Any objects built from the content of the page can be sent to this.
in - The InputStream of the page.
depth - The remaining depth. When reporting another page to the queue, the depth of that page should be set to this depth-1.
url - The URL of the page.
public java.lang.String[] getContentTypes()
Description copied from interface: Handler
Returns an array of String containing the content-types this ContentHandler can process.
Specified by: getContentTypes in interface Handler
public void handleInterrupt(CrawlerAccess crawler, int interrupt, java.net.URL url)
Specified by: handleInterrupt in interface InterruptHandler
public int[] getInterruptsHandled()
Specified by: getInterruptsHandled in interface InterruptHandler
Copyright 2004-2015 Wandora Team