Cheshire3 Objects: DocumentFactory

Description

DocumentFactories are the main means by which Documents are ingested into the system. Once the 'load' function has been called, a DocumentFactory should be able to return, on request, one or more Documents. How it does this depends on how it has been configured and on how 'load' was called. For example, it may locate all documents and cache them internally (e.g. for multiple XML documents within a single file), or it may crawl, locating and returning the documents one at a time (e.g. for many large files in a directory structure).
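The two strategies described above can be sketched in plain Python. This is only an illustration of the idea, not Cheshire3 code: the function names `split_documents` and `crawl_documents` are hypothetical, and the regular-expression split is a simplification that assumes the document tag is not nested inside itself.

```python
import os
import re


def split_documents(text, tag_name):
    """Eager strategy: locate every <tag_name>...</tag_name> span in one
    string and keep them all in memory as a list.

    Simplified: assumes tag_name elements are not nested in each other.
    """
    tag = re.escape(tag_name)
    pattern = re.compile(rf"<{tag}[\s>].*?</{tag}>", re.DOTALL)
    return pattern.findall(text)


def crawl_documents(root):
    """Lazy strategy: walk a directory tree and yield the contents of one
    file at a time, so only a single document is held in memory."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                yield handle.read()
```

The eager version suits a single file holding many small records, since every record is found in one pass. The generator version suits large directory trees, because nothing is read until the caller asks for the next document.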

Implementations
Capabilities

At the present time, the following DocumentStreams are available for use by DocumentFactories.
Note Well: these are intended for use only by DocumentFactories, and are unlikely to behave correctly if called directly from user scripts.

API
Function: __init__
Parameters: session, config, parent
Returns: (none given)
Description: The constructor takes the config node for the object, and its parent (usually a database).

Function: load
Parameters: session, ?data, ?cache, ?format, ?tagName, ?codec
Returns: (none given)
Description: Load the data provided (or use the configured default if not provided). How the data is loaded depends on the other parameters (or their configured defaults if absent):
  • cache - should documents be cached in memory: 0 = No, 1 = Locations cached but not documents, 2 = Yes
  • format - specifies how to treat the data parameter
  • tagName - the XML tag to treat as the document root
  • codec - specifies the codec to use to read the documents in

Function: get_document
Parameters: session, ?index
Returns: Document
Description: Return the index'th document in the factory if index is provided, otherwise return the next document.

Function: register_stream
Parameters: session, format, class
Returns: (none given)
Description: Register the supplied class of DocumentStream with the document factory for the given format. This class will be used the next time 'load' is called with this format.
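
To make the interplay between 'register_stream', 'load' and 'get_document' concrete, here is a minimal stand-in written in plain Python. It is a sketch under stated assumptions, not the Cheshire3 implementation: `ToyDocumentFactory`, `LineStream` and `find_documents` are hypothetical names, documents are plain strings, the session argument is an unused placeholder, and only cache levels 0 and 2 are modelled.

```python
class LineStream:
    """Hypothetical stream: each non-empty line of `data` is one document."""

    def __init__(self, data):
        self.data = data

    def find_documents(self):
        for line in self.data.splitlines():
            if line.strip():
                yield line


class ToyDocumentFactory:
    """Toy model of the load / get_document / register_stream interface."""

    def __init__(self):
        self.streams = {}      # format name -> stream class
        self.documents = None  # list of documents when cache == 2
        self.iterator = None   # source of the "next" document

    def register_stream(self, session, format, cls):
        # The class is looked up the next time load() is called with
        # this format.
        self.streams[format] = cls

    def load(self, session, data, cache=0, format="lines"):
        stream = self.streams[format](data)
        if cache == 2:
            # Cache every document in memory up front.
            self.documents = list(stream.find_documents())
            self.iterator = iter(self.documents)
        else:
            # No caching: documents are produced one at a time on demand.
            self.documents = None
            self.iterator = stream.find_documents()

    def get_document(self, session, index=None):
        if index is None:
            return next(self.iterator)
        if self.documents is None:
            raise ValueError("indexed access needs cache=2 in this sketch")
        return self.documents[index]
```

A typical call sequence would be `register_stream(None, "lines", LineStream)`, then `load(None, text, cache=2, format="lines")`, then repeated `get_document(None)` calls. In this sketch, indexed access does not advance the "next document" position; whether the real factory behaves the same way is not stated in the table above.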