Textractor Enterprise 3e
Our expertise is embodied in the Textractor Information Extraction toolkit, an environment for the easy and rapid development of text classification and information extraction systems. Textractor learns shallow text interpretation from examples. Each Textractor implementation is a custom-tailored blend of text classification, string extraction, and section labelling.
Text classification
Text classification means labeling text with categories (codes) from a predefined set. These codes provide a meta-representation of the meaning of the text: they map the large variety of natural language onto a consistent, controlled vocabulary that makes the text easier to access and manage later on. Texts are classified on the basis of a large, automatically derived set of features. Unlike many keyword-based approaches, this prevents the system from erroneously triggering on out-of-context occurrences of meaningful words. Moreover, because the relevant features are derived automatically from the training examples, the amount of manual knowledge acquisition is kept to a minimum.
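The difference between feature-based and keyword-based classification can be illustrated with a small sketch. It does not show Textractor's own method: it assumes a standard multinomial naive Bayes classifier over unigram and bigram features, both of which are swapped-in choices for illustration. The training texts, category names, and the `features` and `NaiveBayesClassifier` names are hypothetical. The point is that the word "java" alone does not force the IT category; the surrounding context learned from the examples decides.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Derive unigram and bigram features automatically from a text (assumed feature set)."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


class NaiveBayesClassifier:
    """Multinomial naive Bayes over automatically derived features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.doc_counts = Counter()              # number of training texts per category
        self.feat_counts = defaultdict(Counter)  # feature counts per category
        self.vocab = set()                       # every feature seen in training

    def train(self, examples):
        """Learn feature statistics from (text, category) pairs."""
        for text, label in examples:
            feats = features(text)
            self.doc_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)

    def classify(self, text):
        """Return the category with the highest log-probability for the text."""
        feats = features(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior of the category
            denom = sum(self.feat_counts[label].values()) + self.alpha * len(self.vocab)
            for feat, count in feats.items():
                if feat not in self.vocab:
                    continue  # features never seen in training carry no evidence
                score += count * math.log((self.feat_counts[label][feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples: "java" appears in both categories.
TRAINING_EXAMPLES = [
    ("java developer with five years experience", "IT"),
    ("senior software engineer writing java code", "IT"),
    ("barista serving java coffee to customers", "Hospitality"),
    ("waiter serving coffee and meals to guests", "Hospitality"),
]

clf = NaiveBayesClassifier()
clf.train(TRAINING_EXAMPLES)
print(clf.classify("experienced java software developer"))
print(clf.classify("barista serving java coffee"))
```

With these examples, the first text is labelled IT and the second Hospitality, even though both contain "java": the other learned features outweigh the single shared word.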
Textkernel Textractor Enterprise Components
- TextractorServer converts texts from a flat, unstructured format into structured XML. Driven by an easily manageable configuration, it directs a set of specialized classification and extraction components.
- The Classifier is the component responsible for classifying texts (or parts of texts) into a pre-defined set of codes.
- The Extractor is the component responsible for string extraction and section segmentation.
- The Normalizer is the component responsible for mapping extracted strings onto a pre-defined data vocabulary (Code Tables) and for formatting the system's output to match your data format.
- The Training Toolkit is a suite of tools for annotating example data and for training and tuning the system on it.
- System Management Tools.
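The way TextractorServer coordinates the other components can be pictured as a configurable pipeline. The sketch below is a minimal illustration under stated assumptions, not Textractor's actual interface: the configuration format, the component functions (`classifier`, `extractor`, `normalizer`, `process`), the `CITY_CODES` code table, and the XML element names are all hypothetical. The classifier here is a trivial placeholder standing in for a trained model, and the extractor only pulls e-mail addresses and city names with regular expressions.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical code table used by the normalizer: lowercase surface form -> code.
CITY_CODES = {"amsterdam": "NL-AMS", "utrecht": "NL-UTC"}

# Hypothetical configuration: which components the server should run.
PIPELINE_CONFIG = ["classifier", "extractor", "normalizer"]


def classifier(text: str) -> str:
    """Placeholder for a trained classifier; returns one document-level code."""
    return "CV" if "experience" in text.lower() else "OTHER"


def extractor(text: str) -> Dict[str, List[str]]:
    """Placeholder string extraction: pull e-mail addresses and city names."""
    emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
    cities = re.findall(r"\b(?:Amsterdam|Utrecht)\b", text, flags=re.IGNORECASE)
    return {"email": emails, "city": cities}


def normalizer(fields: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map extracted city strings onto codes; unknown strings pass through unchanged."""
    normalized = dict(fields)
    normalized["city"] = [CITY_CODES.get(c.lower(), c) for c in fields.get("city", [])]
    return normalized


def process(text: str, config: List[str]) -> str:
    """Run the configured components and serialize their combined output as XML."""
    root = ET.Element("document")
    if "classifier" in config:
        ET.SubElement(root, "category").text = classifier(text)
    if "extractor" in config:
        fields = extractor(text)
        if "normalizer" in config:
            fields = normalizer(fields)
        for name, values in fields.items():
            for value in values:
                ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


sample = ("Jane Doe, five years of experience as a developer in Amsterdam. "
          "Contact: jane.doe@example.com")
print(process(sample, PIPELINE_CONFIG))
```

Keeping the component list in configuration rather than in code mirrors the idea of a server that is reconfigured, not reprogrammed, for each application; dropping `"normalizer"` from the list, for example, leaves the raw string "Amsterdam" in the output instead of its code.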

