T2K
"Knowledge discovery is the process of uncovering relationships in data previously unknown and extracting this knowledge from the data. Even using current data mining methods, understanding these data relationships can be a difficult task. Data stores in any given problem area are often huge, forcing decision-makers to construct complex queries to reflect the multiple dimensions of their problem domain. These decision-makers would benefit from tools that help highlight potential 'information nuggets' and that help in the formation of the complex queries.
Often, a large percentage of these data stores is in the form of text. The T2K (Text to Knowledge) tool provides text mining and analysis capabilities that have been specially designed to operate in and capitalize upon the complexity of rich natural language domains of very large stores of text and multimedia documents.
T2K is a library of D2K modules that implements sophisticated algorithms for text analysis. Some of the types of functionality that are available include:
- Automated Document Clustering
- Automated Document Classification
- Integration with GATE
- Part-of-Speech Tagging
- Information Extraction
- Building Models for Very Large Document Stores
- Realtime Clustering of Very Large Document Stores
- Cluster Visualizations
- Automated Document Cleaning and Preparation
- Term Stemming, Phrase Extraction, Tokenization, and Parsing"
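
To make a few of the listed capabilities concrete (tokenization, term stemming, and document clustering), here is a minimal Python sketch. It is not T2K or D2K code and does not use their APIs. Every function name is hypothetical, the stemmer is a crude suffix stripper rather than a real stemming algorithm, and the clustering is a simple single-pass scheme chosen for brevity, with an arbitrary similarity threshold.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip a few common English suffixes (illustrative only, not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def vectorize(text):
    """Build a term-frequency vector over stemmed tokens."""
    return Counter(stem(t) for t in tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering: a document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster.
    The threshold value is an assumption made for this example."""
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    docs = [
        "Clustering documents groups similar texts",
        "Document clusters group similar text",
        "Stock prices fell sharply today",
    ]
    print(cluster(docs))
```

After stemming, the first two sample documents reduce to the same set of terms, so they fall into one cluster, while the third shares no terms with them and forms its own. A production system for very large document stores would need a proper stemmer, term weighting, and a scalable clustering algorithm.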