The UCI Phonotactic Calculator is a research tool that allows users to calculate a variety of phonotactic metrics. These metrics are intended to capture how probable a word is based on the sounds it contains and the order in which those sounds are sequenced. For example, a nonce word like [stik] 'steek' might have a relatively high phonotactic score in English even though it is not a real word, because there are many English words that begin with [st], end with [ik], and so on. In Spanish, however, this word would have a low score because no Spanish words begin with the sequence [st]. A sensitivity to the phonotactic constraints of one's language(s) is an important component of linguistic competence, and the various metrics computed by this tool instantiate different models of how this sensitivity is operationalized.
The general use case for this tool is as follows:
The calculator computes a suite of metrics based on unigram/bigram frequencies (that is, the frequencies of individual sounds and the frequencies of adjacent pairs of sounds). These include type- and token-weighted variants of the positional unigram/bigram method from Jusczyk et al. (1994) and Vitevitch and Luce (2004), as well as type- and token-weighted variants of standard unigram/bigram probabilities. See the About page for a detailed description of how these models differ and how to interpret the scores.
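To make the type/token distinction concrete, here is a minimal Python sketch of a standard (non-positional) bigram score. It is an illustration under stated assumptions, not the calculator's own code: the names `bigram_counts` and `bigram_log_prob`, the `#` boundary symbol, the toy lexicon, and the use of raw word frequencies as token weights are all hypothetical choices, and the calculator's actual weighting and smoothing may differ (see the About page).

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary symbol, not taken from the tool


def bigram_counts(lexicon, token_weighted=False):
    """Count adjacent sound pairs and their left-hand contexts.

    lexicon: iterable of (segments, frequency) pairs, where segments is a
    list of sound symbols. With token_weighted=False every word counts once
    (type weighting); with token_weighted=True each word contributes its
    frequency (an assumed form of token weighting).
    """
    pair_counts = Counter()
    context_counts = Counter()
    for segments, freq in lexicon:
        weight = freq if token_weighted else 1
        padded = [BOUNDARY] + list(segments) + [BOUNDARY]
        for left, right in zip(padded, padded[1:]):
            pair_counts[(left, right)] += weight
            context_counts[left] += weight
    return pair_counts, context_counts


def bigram_log_prob(segments, pair_counts, context_counts):
    """Sum of log P(right | left) over adjacent pairs, boundaries included.

    Unsmoothed: any pair unseen in training gives negative infinity.
    """
    padded = [BOUNDARY] + list(segments) + [BOUNDARY]
    total = 0.0
    for left, right in zip(padded, padded[1:]):
        count = pair_counts[(left, right)]
        if count == 0:
            return float("-inf")
        total += math.log(count / context_counts[left])
    return total


# Toy training data: (segments, made-up frequency).
toy_lexicon = [
    (["s", "t", "i", "k"], 10),
    (["s", "t", "a", "p"], 5),
    (["t", "i", "k"], 1),
]

type_model = bigram_counts(toy_lexicon, token_weighted=False)
token_model = bigram_counts(toy_lexicon, token_weighted=True)

type_score = bigram_log_prob(["s", "t", "i", "k"], *type_model)
token_score = bigram_log_prob(["s", "t", "i", "k"], *token_model)
unseen_score = bigram_log_prob(["t", "s"], *type_model)
```

In this sketch the same word receives different scores under the two weightings because frequent words dominate the token-weighted counts, and a form containing an unattested sequence such as [ts] receives negative infinity because no smoothing is applied.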
The UCI Phonotactic Calculator was developed by Connor Mayer (UCI), Arya Kondur (UCI), and Megha Sundara (UCLA). Please direct all inquiries to Connor Mayer (cjmayer@uci.edu).
If you publish work that uses the UCI Phonotactic Calculator, please cite the GitHub repository:
Mayer, C., Kondur, A., & Sundara, M. (2022). UCI Phonotactic Calculator (Version 0.1.0) [Computer software]. https://doi.org/10.5281/zenodo.7443706