
TextMin - Text Analysis

== TextMin ==

for Windows

(C) 2025 Luca Pavan

pavan@panservice.it

-------------------------
Description

TextMin is a comprehensive tool for text analysis in Italian and English.
It can be used by students, researchers, marketing professionals, and linguists, and offers a wide range of features for in-depth text analysis.
TextMin can analyze very large texts (it has been successfully tested with text files of approximately 350 MB).
TextMin is a suite of Command Prompt programs, run through a batch procedure that the program itself generates.
To analyze a text, write or paste it into the program window, then press the "Analyze" button.
The results are saved as a series of files in the \analysis folder.

Corpora of various sizes can also be analyzed. A corpus must consist of a series of text files, which must first be combined into a single file; the text of that file is then pasted into the program window.
To concatenate the text files of a corpus, run the following command in the Command Prompt from the corpus folder:

copy /b *.txt corpus.txt

Then, using an editor that handles large files, such as Notepad++, copy and paste the text from corpus.txt into TextMin for analysis.
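Note that copy /b joins the files byte for byte: if a file does not end with a line break, its last line runs straight into the first line of the next file, and UTF-8 files saved with a byte-order mark (BOM) carry that mark into the middle of corpus.txt. A minimal Python sketch of an alternative, assuming every corpus file is UTF-8 encoded; the function concatenate_corpus is hypothetical and not part of TextMin:

```python
from pathlib import Path


# Hypothetical helper, not part of TextMin: an alternative to "copy /b *.txt corpus.txt".
def concatenate_corpus(folder: str, output_name: str = "corpus.txt") -> Path:
    """Join every .txt file in `folder` into one UTF-8 file, separated by blank lines."""
    base = Path(folder)
    output = base / output_name
    parts = []
    # Sort for a reproducible order, and skip the output file if it already exists.
    for path in sorted(base.glob("*.txt")):
        if path.name == output_name:
            continue
        # "utf-8-sig" removes a leading byte-order mark if the file has one.
        text = path.read_text(encoding="utf-8-sig")
        parts.append(text.rstrip("\n"))
    # A blank line between files keeps the last sentence of one file
    # from merging with the first sentence of the next.
    output.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return output
```

For example, concatenate_corpus(r"C:\corpus") would write C:\corpus\corpus.txt (the path is only an illustration).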

-------------------------
Installation

It is recommended to install TextMin in a folder other than C:\Program Files: standard user accounts cannot write to that folder without administrator privileges, and the program needs to write its files.

-------------------------
Main Features

- Text Statistics
Provides detailed metrics such as:
number of words, characters, and sentences;
average word and sentence length;
number of unique words.
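The tokenization rules TextMin uses are not documented here. As an illustration only, a minimal Python sketch of these metrics, assuming that a word is a run of letters (so "l'acqua" counts as two words) and that a sentence ends at ., ! or ? followed by whitespace; abbreviations such as "Dr." will be miscounted. The function name text_statistics is hypothetical.

```python
import re

# Hypothetical sketch; TextMin's own tokenization may differ.
# Letters only (accented letters included); digits and "_" are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")
# A sentence ends at one or more of . ! ? followed by whitespace or end of text.
SENTENCE_END_RE = re.compile(r"[.!?]+(?=\s|$)")


def text_statistics(text: str) -> dict:
    words = WORD_RE.findall(text)
    sentences = [s for s in SENTENCE_END_RE.split(text) if s.strip()]
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "words": n_words,
        "characters": len(text),
        "characters_no_spaces": len(re.sub(r"\s", "", text)),
        "sentences": n_sentences,
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        # Case-insensitive count of distinct words.
        "unique_words": len({w.lower() for w in words}),
    }
```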

- Readability Indexes
Returns the values of the main readability indexes for Italian and English (for Italian, it also uses the Nuovo Vocabolario di Base [De Mauro, 2016]).
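The specific indexes are not listed here. Two widely used formulas illustrate what such indexes compute: the Gulpease index, designed for Italian, and the Flesch Reading Ease score for English. This is a sketch, not TextMin's implementation; the counts are passed in as arguments because syllable counting is language-specific.

```python
# Illustrative formulas only; not necessarily the indexes TextMin reports.


def gulpease(letters: int, words: int, sentences: int) -> float:
    """Gulpease index for Italian: higher values mean easier text."""
    return 89 + (300 * sentences - 10 * letters) / words


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease for English: higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

The Gulpease index counts letters rather than syllables, which makes it easy to compute automatically; it is usually read on a scale from 0 (very difficult) to 100 (very easy).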

- Duplicate Detection
Identifies repeated words within individual sentences.
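A minimal Python sketch of this kind of check, assuming naive sentence splitting and case-insensitive comparison; the function name and the min_length parameter (for skipping short function words) are hypothetical:

```python
import re
from collections import Counter

# Hypothetical sketch; TextMin's own rules may differ.
WORD_RE = re.compile(r"[^\W\d_]+")               # letters only
SENTENCE_END_RE = re.compile(r"[.!?]+(?=\s|$)")  # naive sentence boundary


def repeated_words_per_sentence(text: str, min_length: int = 1) -> list[tuple[str, dict[str, int]]]:
    """Return (sentence, {word: count}) for each sentence that repeats a word."""
    results = []
    for sentence in SENTENCE_END_RE.split(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        counts = Counter(
            w.lower() for w in WORD_RE.findall(sentence) if len(w) >= min_length
        )
        repeated = {w: c for w, c in counts.items() if c > 1}
        if repeated:
            results.append((sentence, repeated))
    return results
```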

- Concordances
Finds words or sentences in the text.
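Concordances are commonly displayed in keyword-in-context (KWIC) format, with each match placed between its left and right context. A minimal Python sketch, assuming whole-word, case-insensitive matching; the function name and the width parameter are hypothetical, and TextMin's output format may differ:

```python
import re


# Hypothetical sketch; TextMin's concordance output may look different.
def concordance(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context line per whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start, end = match.span()
        # Take up to `width` characters on each side and flatten line breaks.
        left = re.sub(r"\s+", " ", text[max(0, start - width):start]).strip()
        right = re.sub(r"\s+", " ", text[end:end + width]).strip()
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines
```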

- Lexical Analysis
Explores the vocabulary of the text, excluding a list of stopwords.
Divides the text into parts and shows several dispersion measures.
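The dispersion measures are not named here. One common choice is Juilland's D, computed from a word's frequencies in n equal parts of the text as D = 1 - V / sqrt(n - 1), where V is the coefficient of variation (population standard deviation divided by the mean). A minimal Python sketch, with a deliberately tiny, hypothetical stopword list; it is not necessarily one of the measures TextMin reports:

```python
import math
import re
import statistics
from collections import Counter

WORD_RE = re.compile(r"[^\W\d_]+")
# Tiny hypothetical stopword list (English and Italian); real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "il", "la", "e", "di"}


def content_word_frequencies(text: str) -> Counter:
    """Frequencies of lowercase words, excluding stopwords."""
    tokens = (w.lower() for w in WORD_RE.findall(text))
    return Counter(t for t in tokens if t not in STOPWORDS)


def juilland_d(tokens: list[str], word: str, parts: int = 5) -> float:
    """Juilland's D over `parts` near-equal slices: 1.0 = even spread, 0.0 = one slice only."""
    if parts < 2:
        raise ValueError("parts must be at least 2")
    # Slice boundaries that split the token list into `parts` near-equal pieces.
    bounds = [len(tokens) * i // parts for i in range(parts + 1)]
    subfreqs = [tokens[bounds[i]:bounds[i + 1]].count(word) for i in range(parts)]
    mean = sum(subfreqs) / parts
    if mean == 0:
        raise ValueError(f"{word!r} does not occur in the text")
    variation = statistics.pstdev(subfreqs) / mean  # coefficient of variation
    return 1 - variation / math.sqrt(parts - 1)
```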

- Named Entity Recognition (NER)
Automatically extracts names of places, people, and organizations, as well as dates and other relevant entities. Based on a dictionary.
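Dictionary-based recognition amounts to looking up known entity names in the text, preferring the longest match so that "New York" is not reported as "York". A minimal Python sketch, assuming a hypothetical gazetteer (TextMin's dictionary is not reproduced here); dates are not covered:

```python
import re

# Hypothetical gazetteer (entity text -> category); not TextMin's dictionary.
GAZETTEER = {
    "rome": "PLACE",
    "new york": "PLACE",
    "mario rossi": "PERSON",
    "european union": "ORGANIZATION",
}


def find_entities(text: str, gazetteer: dict[str, str]) -> list[tuple[str, str]]:
    """Greedy longest-match lookup of (possibly multi-word) dictionary entries."""
    tokens = re.findall(r"[^\W\d_]+", text)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key.split()) for key in gazetteer)
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(lowered[i:i + n])
            if candidate in gazetteer:
                entities.append((" ".join(tokens[i:i + n]), gazetteer[candidate]))
                i += n
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return entities
```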

- Sentence Analysis
Relates sentences to their punctuation and produces statistics.
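The exact statistics are not listed. As one plausible example, the sketch below groups sentences by their closing punctuation (., ?, !, ...) and counts commas per sentence; both the choice of statistics and the function name are assumptions:

```python
import re
from collections import Counter

# Hypothetical sketch; TextMin's punctuation statistics may differ.
# A sentence is a run of non-terminal characters followed by . ! ? (or runs like "...").
# Trailing text with no closing punctuation is ignored.
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]+")


def punctuation_profile(text: str) -> dict:
    sentences = [s.strip() for s in SENTENCE_RE.findall(text)]
    endings = Counter(re.search(r"[.!?]+$", s).group(0) for s in sentences)
    commas = [s.count(",") for s in sentences]
    return {
        "sentences": len(sentences),
        "endings": dict(endings),
        "avg_commas_per_sentence": sum(commas) / len(sentences) if sentences else 0.0,
    }
```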

- N-grams
Extracts n-grams, from bigrams up to 10-grams.
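Word n-grams are sequences of n consecutive words. A minimal Python sketch that counts them for every n from 2 to 10, assuming lowercase, letters-only tokens; the function name is hypothetical:

```python
import re
from collections import Counter


# Hypothetical sketch; TextMin's tokenization may differ.
def ngram_counts(text: str, n_min: int = 2, n_max: int = 10) -> dict[int, Counter]:
    """Count word n-grams for every n from n_min to n_max (inclusive)."""
    tokens = [w.lower() for w in re.findall(r"[^\W\d_]+", text)]
    result = {}
    for n in range(n_min, n_max + 1):
        # zip over n shifted copies of the token list yields consecutive n-tuples.
        grams = zip(*(tokens[i:] for i in range(n)))
        result[n] = Counter(" ".join(g) for g in grams)
    return result
```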

- Automatic Summarization
Generates a summary of variable length using word frequency.
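A common frequency-based approach scores each sentence by the normalized frequencies of its content words and keeps the highest-scoring sentences in their original order. A minimal Python sketch of that approach, not necessarily TextMin's exact method; the stopword list and the n_sentences parameter are hypothetical:

```python
import re
from collections import Counter

# Hypothetical sketch of frequency-based extractive summarization.
WORD_RE = re.compile(r"[^\W\d_]+")
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]*")
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "il", "la", "e", "di"}


def summarize(text: str, n_sentences: int = 3) -> str:
    sentences = [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]
    words = [w.lower() for w in WORD_RE.findall(text)]
    freq = Counter(w for w in words if w not in STOPWORDS)
    if not freq:
        return " ".join(sentences[:n_sentences])
    top = max(freq.values())

    def score(sentence: str) -> float:
        tokens = [w.lower() for w in WORD_RE.findall(sentence)]
        # Sum of normalized frequencies, divided by length so long sentences are not favored.
        return sum(freq[t] / top for t in tokens if t in freq) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Output the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
```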

- Sentiment Analysis
Determines the sentiment of the text (positive, negative, neutral). Based on dictionaries.
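Dictionary-based sentiment analysis typically counts positive and negative words from a lexicon. A minimal Python sketch, assuming tiny hypothetical lexicons, a simple negation rule (a preceding "not" or "non" flips polarity), and a hypothetical threshold for the neutral band:

```python
import re

# Hypothetical mini-lexicons; TextMin's own dictionaries are not reproduced here.
POSITIVE = {"good", "great", "excellent", "happy", "buono", "ottimo", "felice"}
NEGATIVE = {"bad", "poor", "terrible", "sad", "cattivo", "pessimo", "triste"}
NEGATORS = {"not", "no", "never", "non", "mai"}


def sentiment(text: str, threshold: float = 0.05) -> tuple[str, float]:
    tokens = [w.lower() for w in re.findall(r"[^\W\d_]+", text)]
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Assumed rule: flip polarity when the previous word is a negator ("not good").
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    # Normalize by text length so long texts are comparable with short ones.
    normalized = score / len(tokens) if tokens else 0.0
    if normalized > threshold:
        return "positive", normalized
    if normalized < -threshold:
        return "negative", normalized
    return "neutral", normalized
```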

- Topic Analysis
Reveals the main themes of the text through topic categories. Based on dictionaries.
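With topic dictionaries, each category is a list of keywords and the text is scored by how often each list's words occur. A minimal Python sketch with hypothetical categories and keywords:

```python
import re
from collections import Counter

# Hypothetical topic dictionaries (category -> keywords); not TextMin's own.
TOPICS = {
    "sport": {"football", "match", "team", "goal", "calcio", "squadra"},
    "economy": {"market", "price", "inflation", "bank", "mercato", "banca"},
    "technology": {"software", "computer", "internet", "data", "dati"},
}


def topic_profile(text: str, topics: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank categories by how many keyword occurrences they have in the text."""
    counts = Counter(w.lower() for w in re.findall(r"[^\W\d_]+", text))
    scores = {name: sum(counts[k] for k in keywords) for name, keywords in topics.items()}
    # Drop categories with no hits and sort the rest from strongest to weakest.
    return sorted(
        ((name, s) for name, s in scores.items() if s > 0),
        key=lambda item: item[1],
        reverse=True,
    )
```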

- Word Cloud
Draws a cloud of the text's words, based on their frequency.
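Word clouds usually map each word's frequency onto a font size; arranging the words without overlaps is a separate layout step not shown here. A minimal Python sketch of linear sizing, assuming an arbitrary size range; the function name is hypothetical:

```python
# Hypothetical sketch; TextMin's sizing rule is not documented.
def font_sizes(frequencies: dict[str, int], min_size: int = 10, max_size: int = 60) -> dict[str, int]:
    """Map each word's frequency linearly onto a font-size range."""
    if not frequencies:
        return {}
    low, high = min(frequencies.values()), max(frequencies.values())
    span = high - low
    sizes = {}
    for word, freq in frequencies.items():
        # When all frequencies are equal, every word gets the largest size.
        ratio = (freq - low) / span if span else 1.0
        sizes[word] = round(min_size + ratio * (max_size - min_size))
    return sizes
```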
    

-------------------------
Download

Download TextMin_Setup.exe