Cmdline LFP
Computing Lexical Frequency Profiles on the Command Line
Cmdline LFP is a tool for computing Lexical Frequency Profiles (Laufer & Nation 1995) on the command line. This allows batch processing controlled by scripts on all major Unix platforms and on Windows. Cmdline LFP is a shameless copy of part of Paul Nation's Range utility, which is a GUI program for Windows only.
Cmdline LFP uses a special XML format for word lists. You can download the original Range word lists in this format below. The tool outputs tab-separated values (similar to CSV) that can be imported into any spreadsheet software. Input is either plain text or vertical text that has already been tokenized. The tool is also designed so that Java programmers can use it as a library. Source code and JavaDoc are included in the package.
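Because the output is tab-separated, results saved from several runs can be combined into one file before importing them into a spreadsheet. The following is a minimal sketch, not part of the package: the function name merge_profiles is hypothetical, and since the column layout of the output is not described here, every row is passed through unchanged with the name of its source file added as a first column.

```shell
# Sketch: combine several saved tab-separated result files into one stream.
# merge_profiles is a hypothetical helper, not shipped with Cmdline LFP.
# Each input row is kept as-is and prefixed with the file it came from.
merge_profiles() {
    # Refuse to run without files, otherwise awk would wait on standard input.
    [ "$#" -gt 0 ] || return 1
    awk 'BEGIN { OFS = "\t" } { print FILENAME, $0 }' "$@"
}
```

For example, merge_profiles first.tsv second.tsv > all.tsv writes one combined file, where first.tsv and second.tsv stand for output you saved from earlier runs.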
Preview
Take a look at this screenshot.
Usage
- Unpack the downloaded files
- Unpack the word lists into the directory where the other files are
- Open a terminal window and cd to the directory where you put the files.
- On Unix, type
./lfp -t some-input-text.txt gsl-asl/*.xml
- On Windows, type
lfp -t some-input-text.txt gsl-asl\*.xml
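For batch processing on Unix, the command above can be wrapped in a loop. This is only a sketch under stated assumptions: it assumes that lfp writes its tab-separated result to standard output (not confirmed here), and the function name lfp_batch, the LFP variable, and the directory names in the usage note are hypothetical.

```shell
# Sketch: run lfp on every .txt file in a directory, one result file per text.
# Assumption: lfp prints its tab-separated result to standard output.
# lfp_batch and the LFP variable are hypothetical names for this example.
# Usage: lfp_batch INPUT_DIR OUTPUT_DIR WORDLIST.xml...
lfp_batch() {
    input_dir=$1
    output_dir=$2
    shift 2                          # remaining arguments are the word lists

    mkdir -p "$output_dir" || return 1

    for text in "$input_dir"/*.txt; do
        [ -e "$text" ] || continue   # the glob matched nothing
        name=$(basename "$text" .txt)
        # Same invocation as the single-file command, output redirected to a file.
        "${LFP:-./lfp}" -t "$text" "$@" > "$output_dir/$name.tsv" || return 1
    done
}
```

Called from the directory that contains lfp, for instance as lfp_batch essays profiles gsl-asl/*.xml, this would leave one .tsv file per input text in profiles.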
Requirements
This program requires Java 6 to be installed on your computer. On most OS X machines, it is already installed. On Windows, you will probably have to install it manually. Linux distributions allow the installation of Java via the package manager.
Download
The original word list files were created by Paul Nation and converted into XML by me and myself. They are under no specific license.
The Cmdline LFP tool is available under the terms of the GNU General Public License.
- Cmdline LFP package for Windows (ZIP file with runnable programs, source code, JavaDoc)
- Cmdline LFP package for Unix (tar.gz file with runnable programs, source code, JavaDoc)
References
Laufer, B. and Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16(3):307–322.