Cheshire3 Objects: DocumentFactory
DocumentFactories are the main means by which Documents are ingested into the system. Once the 'load' function has been called, a DocumentFactory should be able to return, on request, one or more Documents. How it does this depends on how it has been configured and on how 'load' was called. For example, it may locate all documents and cache them internally (e.g. for multiple XML documents within a single file), or it may crawl, locating and returning the documents one at a time (e.g. for many large files in a directory structure).
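
The two loading strategies described above can be illustrated with a minimal Python sketch. This is not Cheshire3 code: the function names `split_records` and `crawl_records` and the one-record-per-line format are assumptions made purely for illustration. The first function finds everything up front and caches it; the second yields records one at a time.

```python
from typing import Iterator, List


def split_records(text: str) -> List[str]:
    """Eager strategy: locate every record in one blob and cache them all."""
    # Hypothetical record format: one record per non-empty line.
    return [line for line in text.splitlines() if line.strip()]


def crawl_records(blobs: List[str]) -> Iterator[str]:
    """Lazy strategy: walk the blobs and yield records one at a time."""
    for blob in blobs:
        for line in blob.splitlines():
            if line.strip():
                yield line


# All records are held in memory at once.
cached = split_records("a\nb\n\nc")

# Nothing is read until a record is requested.
lazy = crawl_records(["a\nb", "c"])
```

The eager form allows random access to any record, while the lazy form keeps memory use low when the input is large.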
At the present time, the following DocumentStreams are available for use by DocumentFactories.
Note well: these are intended only for use by DocumentFactories, and are unlikely to behave correctly if called directly from user scripts.
| Function | Parameters | Returns | Description |
|---|---|---|---|
| __init__ | session, config, parent | | The constructor takes the config node for the object, and its parent (usually a database). |
| load | session, ?data, ?cache, ?format, ?tagName, ?codec | | Load the data provided (or use the configured default if not provided). The way the data is loaded depends on the other parameters (or their configured defaults if absent). |
| get_document | session, ?index | Document | Return the index'th document in the factory if index is provided, otherwise return the next document. |
| register_stream | session, format, class | | Register the supplied class of DocumentStream with the document factory for the given format. This class will be used the next time 'load' is called with this format. |
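
To show how the functions in the table fit together, here is a self-contained toy model in Python. It is a sketch under stated assumptions, not the Cheshire3 implementation: `ToyDocumentFactory`, `ListStream` and its `find_documents` method are hypothetical names, the `session` argument is an unused placeholder, and documents are plain strings.

```python
class ListStream:
    """Hypothetical stream: treats the loaded data as a list of documents."""

    def __init__(self, data):
        self.data = data

    def find_documents(self):
        # Yield each document in the order it was supplied.
        yield from self.data


class ToyDocumentFactory:
    """Toy stand-in mirroring load, get_document and register_stream."""

    def __init__(self):
        self.streams = {}   # format name -> stream class
        self.docs = []      # documents cached by the last load
        self.position = 0   # cursor used when no index is given

    def register_stream(self, session, format, cls):
        # The registered class is used the next time load sees this format.
        self.streams[format] = cls

    def load(self, session, data, format):
        stream = self.streams[format](data)
        self.docs = list(stream.find_documents())
        self.position = 0

    def get_document(self, session, index=None):
        if index is not None:
            # Indexed access; in this toy it does not move the cursor.
            return self.docs[index]
        doc = self.docs[self.position]
        self.position += 1
        return doc


session = None  # placeholder; this toy never inspects the session
factory = ToyDocumentFactory()
factory.register_stream(session, "list", ListStream)
factory.load(session, ["<a/>", "<b/>", "<c/>"], format="list")

first = factory.get_document(session)      # next document
third = factory.get_document(session, 2)   # document at index 2
second = factory.get_document(session)     # next document again
```

Leaving the cursor unchanged on indexed access is a choice made for this sketch; the source does not say how a real DocumentFactory behaves in that case.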