The LIWC2007 Application

Development and Psychometric Properties of LIWC2007

The ways that individuals talk and write provide windows into their emotional and cognitive worlds. Over the last three decades, researchers have provided evidence to suggest that people's physical and mental health can be predicted by the words they use (Gottschalk & Gleser, 1969; Rosenberg & Tucker, 1978; Stiles, 1992). More recently, a large number of studies have found that having individuals write or talk about deeply emotional experiences is associated with improvements in mental and physical health (e.g., Pennebaker, 1997; Smyth, 1997). Text analyses based on these studies indicate that those individuals who benefit the most from writing tend to use relatively high rates of positive emotion words, a moderate number of negative emotion words, and, most importantly, an increasing number of cognitive or thinking words from the first to last days of writing (e.g., Pennebaker & Francis, 1996; Pennebaker, Mayne, & Francis, 1997).

In order to provide an efficient and effective method for studying the various emotional, cognitive, structural, and process components present in individuals' spoken and written language samples, we developed a text analysis application called Linguistic Inquiry and Word Count, or LIWC. The first LIWC application was developed as part of an exploratory study of language and disclosure (Francis, 1993; Pennebaker, 1993). As described below, the second version, LIWC2007, is an updated revision of the original application. It runs on Windows-based and Macintosh platforms. LIWC2007 applications are designed to analyze written text on a word-by-word basis, calculate the percentage of words in the text that match each of up to 82 language dimensions, and generate output as a tab-delimited text file that can be directly read into application programs, such as SPSS for Windows, Excel, etc.


The LIWC2007 Framework

The LIWC2007 application contains within it a default set of word categories and a default dictionary that defines which words should be counted in the target text files. Note that LIWC2007.EXE is an executable file and cannot be opened or read as a text file. To avoid confusion in the subsequent discussion, text words that are read and analyzed by LIWC2007 are referred to as target words. Words in the LIWC2007 dictionary file will be referred to as dictionary words. Groups of dictionary words that tap a particular domain (e.g., negative emotion words) are variously referred to as subdictionaries or word categories.


The LIWC2007 Main Text Processing Module

LIWC2007 is designed to accept written or transcribed verbal text which has been stored as a text or ASCII file using any of the popular word processing software packages (e.g., WordPerfect or Word). LIWC2007 accesses a single file or group of files and analyzes each sequentially, writing the output to a single file. Processing time for a page of single-spaced text is typically a fraction of a second on Pentium or PowerMacintosh computers. LIWC2007 reads each designated text file, one target word at a time. As each target word is processed, the dictionary file is searched, looking for a dictionary match with the current target word. If the target word matches a dictionary word, the appropriate word category scale (or scales) for that word is incremented. As the target text file is being processed, counts for various structural composition elements (e.g., word count and sentence punctuation) are also incremented.
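The word-by-word counting loop described above can be sketched in a few lines of Python. This is a simplified illustration, not LIWC2007's actual code; the tiny dictionary, the category names, and the tokenizer are hypothetical stand-ins:

```python
# Sketch of LIWC-style word counting: each dictionary word maps to one or
# more category scales, and matches increment every associated scale.
import re
from collections import Counter

# Hypothetical miniature dictionary (the real one has ~4,500 entries).
DICTIONARY = {
    "cried": ["sadness", "negemo", "affect", "verb", "past"],
    "happy": ["posemo", "affect"],
}

def analyze(text):
    """Count target words and return each category as a percentage of total words."""
    words = re.findall(r"[a-z']+", text.lower())   # crude tokenizer
    total = len(words)
    if total == 0:
        return {}, 0
    counts = Counter()
    for word in words:
        for category in DICTIONARY.get(word, []):
            counts[category] += 1
    return {cat: 100.0 * n / total for cat, n in counts.items()}, total

percentages, word_count = analyze("She cried, but later she was happy.")
```

In this toy run, "cried" and "happy" each increment the affect scale, so affect scores twice the rate of sadness.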

With each text file, approximately 80 output variables are written as one line of data to a designated output file. This data record includes the file name, 4 general descriptor categories (total word count, words per sentence, percentage of words captured by the dictionary, and percentage of words longer than six letters), 22 standard linguistic dimensions (e.g., percentage of words in the text that are pronouns, articles, auxiliary verbs, etc.), 32 word categories tapping psychological constructs (e.g., affect, cognition, biological processes), 7 personal concern categories (e.g., work, home, leisure activities), 3 paralinguistic dimensions (assents, fillers, nonfluencies), and 12 punctuation categories (periods, commas, etc.). A complete list of the standard LIWC2007 scales is included in Table 1.
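The one-record-per-file, tab-delimited output format described above can be sketched as follows. This is an illustration of the format, not LIWC2007's own output routine, and the column names are illustrative:

```python
# Sketch of writing one tab-delimited data record per analyzed text file,
# with a header row, so the result opens directly in Excel or SPSS.
import csv
import io

def write_records(records, columns):
    """records: list of (filename, {column: value}); returns tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Filename"] + columns)
    for filename, values in records:
        writer.writerow([filename] + [values.get(col, 0) for col in columns])
    return buf.getvalue()

output = write_records([("essay1.txt", {"WC": 312, "WPS": 14.2})], ["WC", "WPS"])
```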


The Default LIWC2007 Dictionary

The LIWC2007 Dictionary is the heart of the text analysis strategy. The default LIWC2007 Dictionary is composed of almost 4,500 words and word stems. Each word or word stem defines one or more word categories or subdictionaries. For example, the word cried is part of five word categories: sadness, negative emotion, overall affect, verb, and past tense verb. Hence, if it is found in the target text, each of these five subdictionary scale scores will be incremented. As in this example, many of the LIWC2007 categories are arranged hierarchically. All anger words, by definition, will be categorized as negative emotion and overall affect words. Note too that word stems can be captured by the LIWC2007 system. For example, the LIWC2007 Dictionary includes the stem hungr*, which allows any target word that matches the first five letters to be counted as an ingestion word (including hungry, hungrier, hungriest). The asterisk, then, denotes the acceptance of all letters, hyphens, or numbers following its appearance.

Each of the default LIWC2007 categories is composed of a list of dictionary words that define that scale. Table 1 provides a comprehensive list of the default LIWC2007 dictionary categories, scales, sample scale words, and relevant scale word counts.


LIWC2007 Dictionary Development

The selection of words defining the LIWC2007 categories involved multiple steps over several years. The initial idea was to identify a group of words that tapped basic emotional and cognitive dimensions often studied in social, health, and personality psychology. With time, the domain of word categories expanded considerably.

Step 1. Word Collection. In the design and development of the LIWC2007 category scales, sets of words were first generated for each category scale. Within the Psychological Processes category, for example, the emotion or affective subdictionaries were based on words from several sources. We drew on common emotion rating scales, such as the PANAS (Watson, Clark, & Tellegen, 1988), Roget's Thesaurus, and standard English dictionaries. Following the creation of preliminary category word lists, brainstorming sessions among 3-6 judges were held in which words relevant to the various scales were generated and added to the initial scale lists. Similar schemes were used for the other subjective dictionary categories.

Step 2. Judges' Rating Phases. Once the broad word lists were amassed, those words in the Psychological Processes and Personal Concerns and most in the Relativity (excluding verb tense) categories were then rated by three independent judges. In this phase of development, the judges were instructed to focus on both the inclusion and exclusion of words in each LIWC2007 Dictionary scale list. First, the judges indicated whether each word in the scale list should or should not be included on the particular scale in question. Second, they were instructed to suggest additional words they felt should be included in the scale. After the completion of the first judging phase, all category scale word lists were updated by the following set of rules: 1) a word remained on the scale list if two out of three judges agreed, 2) a word was deleted from the scale list if at least two of the three judges agreed it should be excluded, and 3) a word was added to the scale list if two out of three judges agreed. Due to the objective nature of elements in the Standard Language Dimensions category (e.g., articles, pronouns, prepositions), judges' ratings were not collected for the various scale lists in that category.
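The two-out-of-three update rule described above is simple enough to state as code. The function below is a hypothetical sketch, assuming each judge's vote is recorded as True (include) or False (exclude):

```python
# The majority rule used to retain, add, or delete a candidate word:
# a word stays on (or joins) a scale list when at least two of the
# three judges vote to include it.
def keep_word(votes):
    """votes: three booleans, one per judge; True means 'include'."""
    return sum(votes) >= 2
```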

The second rating phase involved the discrimination of LIWC2007 category word elements. Judges were given category level alphabetized word lists (e.g., all Cognitive Process words) and asked first to indicate whether each word in the list should or should not be included in the high-level category in question. Second, judges were instructed to indicate in which, if any, of the mid-level scale lists the word should be included (e.g., Insight, Causation). Percentages of agreement for judges' ratings were acceptable for all LIWC2007 Category and scale lists (ranging from a low of 86% agreement for Optimism to 100% agreement for Relatives).

After completion of the second judging phase, all category scale word lists were updated by the following rules: 1) a word remained on the scale list if two out of three judges agreed and 2) a word was deleted from the scale list if at least two of the three judges agreed. The final percentages of judges' agreement for this second pass ranged from 93% agreement for Insight to 100% agreement for Eating, Metaphysical, Friends, Relatives, and Humans.

Step 3. Psychometric Evaluation. The initial LIWC judging took place in 1992-1994. A significant LIWC revision was undertaken in 1997 to streamline the original program and dictionaries. Text files from several dozen studies, totaling over 8 million words, were analyzed using the 1997 version of LIWC as well as WordSmith, a powerful word count program used in discourse analysis. Original LIWC categories that were used at very low rates (less than 0.3 percent of words made up the category) or that suffered from consistently poor reliability or validity were omitted. Several new categories, including social processes, several personal concern categories, and the relativity dimensions, were added following the same stringent judge-based procedures described above (including both passes). Finally, once the entire new LIWC dictionary was assembled, any words that were not used at least 0.005 percent of the time in our previous text files or were not listed in Francis and Kucera's (1982) Frequency Analysis of English Usage were excluded.
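The final frequency-based exclusion step can be sketched as a corpus filter. The threshold below comes from the text (0.005 percent of tokens); the function name and the toy corpus are hypothetical:

```python
# Sketch of frequency-based pruning: drop any candidate dictionary word
# used less than 0.005% of the time in a reference corpus of tokens.
from collections import Counter

def prune(dictionary_words, corpus_tokens, min_rate=0.00005):
    """Keep only words whose usage rate in the corpus meets the threshold."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    return [w for w in dictionary_words if counts[w] / total >= min_rate]

corpus = ["the"] * 9999 + ["rare"]          # "rare" appears at a 0.01% rate
prune(["the", "rare", "absent"], corpus)    # "absent" falls below threshold
```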

Step 4. Updates and Expansions. The most recent version, LIWC2007, involved substantial updating of the dictionaries and modification in the dictionary structure. Drawing on several hundred thousand text files made up of several hundred million words from both written and spoken language samples, we sought to identify common words and word categories not captured in the earlier LIWC versions. Examining the 2000 most frequently used words, a group of four judges individually and collectively agreed which new words and new word categories were appropriate for inclusion. Based on recent studies suggesting that function words are particularly relevant to psychological processes, we added the categories of Conjunctions, Adverbs, Quantifiers, Auxiliary Verbs, Commonly-used Verbs, Impersonal Pronouns, Total Function Words, and Total Relativity Words. In addition, third person pronouns were divided into 3rd person singular and 3rd person plural. Finally, a large group of punctuation marks has been added as separate categories.

For those who are familiar with LIWC2001, it will be clear that some of the original categories have been removed – primarily because these categories had consistently low base rates and were rarely used: Optimism, Positive Feelings, Communication Verbs, Other References, Metaphysical, Sleeping, Grooming, School, Sports, Television, Up, and Down. The category of Unique Words (also known as Type/Token ratio) has also been removed; it typically correlates with word count at -.80. For users who need the original categories, the LIWC2001 dictionary remains available as an alternative default.


LIWC2007's External Validity

Assessing the reliability and validity of text analysis programs is a tricky business. On the surface, one would think that you could determine the internal reliability of a LIWC scale the same way it is done with a questionnaire. With a questionnaire that taps anger or aggression, for example, participants complete a self-report asking a number of questions about their feelings or behaviors related to anger. Reliability coefficients are computed by correlating people's answers to the various questions. The more highly they correlate, the reasoning goes, the more the questions all measure the same thing. Voila! The scale is deemed internally consistent.

A similar strategy can be used with words. The LIWC Anger scale, for example, is made up of 184 anger-related words. In theory, the more people use one type of anger word in a given text, the more likely they should be to use other anger words in the same text. To test this idea, we can determine the degree to which people use each of the 184 anger words across a select group of text files and then calculate the intercorrelations of the word use. Indeed, in Table 1, we include these internal reliability statistics, including those of Anger, where the alpha reliability ranges between .92 (binary method) and .55 (uncorrected) depending on how it is computed. The internal reliability statistics are based on the correlation between the occurrence of each word in a category with the sum of the other words in the same category. The binary method converts the usage of each of the single words within a given text into either a 0 (not used) or a 1 (used one or more times). The uncorrected method is based on the percentage of total words accounted for by each of the category words. The binary method has the potential to overestimate reliability based on the length of texts; the uncorrected method tends to underestimate reliability based on the highly variable base rates of word usage within any given category.
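The item-to-rest correlations underlying both methods can be sketched with a toy word-by-text matrix. This is a hypothetical illustration, not LIWC2007's actual routine: under the binary method each cell is a 0/1 occurrence indicator, while under the uncorrected method each cell would instead hold the word's percentage of total words in that text:

```python
# Sketch of the internal reliability computation described above: correlate
# each word's usage across texts with the summed usage of the other words
# in the same category.
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_rest_correlations(matrix):
    """matrix: rows are texts, columns are category words.
    Returns each word's correlation with the sum of the remaining words."""
    n_items = len(matrix[0])
    return [
        pearson([row[j] for row in matrix],
                [sum(row) - row[j] for row in matrix])
        for j in range(n_items)
    ]

# Binary method on hypothetical data: 1 = word appeared in the text, 0 = not.
binary = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
correlations = item_rest_correlations(binary)
```

These per-word correlations can then be aggregated (e.g., via Cronbach's alpha) into the scale-level reliabilities reported in Table 1.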

But be warned: the psychometrics of natural language use are not as pretty as with questionnaires. The reason is obvious once you think about it. Once you say something, you generally don't need to say it again in the same paragraph or essay. The nature of discourse, then, is we usually say something and then move on to the next topic. Saying the same thing over and over again is generally bad form.

Issues of validity are also a bit tricky. We can have people complete a questionnaire that assesses their general moods and then have them write an essay which we then subject to the LIWC program. We can also have judges evaluate the essay for its emotional content. In other words, we can get self-reported, judged, and LIWC numbers that all reflect a participant's anger. One of the first tests of the validity of the LIWC scales was undertaken by Pennebaker and Francis (1996) as part of an experiment in which first year college students wrote about the experience of coming to college. During the writing phase of the study, 72 Introductory Psychology students met as a group on three consecutive days to write on their assigned topics. Participants in the experimental condition (n = 35) were instructed to write about their deepest thoughts and feelings concerning the experience of coming to college. Those in the control condition (n = 37) were asked to describe any particular object or event of their choosing in an unemotional way. After the writing phase of the study was completed, four judges rated the participants' essays on various emotional, cognitive, content, and composition dimensions designed to correspond to selected LIWC Dictionary scales.

Using LIWC output and judges' ratings, Pearson correlational analyses were performed to test LIWC's external validity. Results, presented in Table 1, reveal that the LIWC scales and judges' ratings are highly correlated. These findings suggest that LIWC successfully measures positive and negative emotions, a number of cognitive strategies, several types of thematic content, and various language composition elements. As can be seen in Table 1, two LIWC-judge correlations are presented. The first, Judge 1, is based on overall ratings of the entire essay set (210 total essays across conditions). The second correlation, Judge 2, refers to the mean within-condition correlation - a much more stringent test of reliability. The level of agreement between judges' ratings and LIWC's objective word count strategy provides support for LIWC's external validity.


Base Rates of Word Usage

In evaluating any text analysis program, it is helpful to get a sense of the degree to which language varies across settings. Since 1986, we have been collecting text samples from a variety of studies – both from our own lab as well as from 28 others in the United States, Canada, and New Zealand. For purposes of comparison, six classes of text from 72 separate studies were analyzed and compared. As can be seen in Table 2, these analyses reflect the utterances of over 24,000 writers or speakers totaling over 168 million words. Overall, 29 samples are based on experiments where people were randomly assigned to write either about deeply emotional topics (emotional writing) or about relatively trivial topics such as plans for the day (control writing). Individuals from all walks of life – ranging from college students to psychiatric prisoners to elderly and even elementary-aged individuals – are represented in these studies. A third class of text was based on 113 highly technical articles in the journal Science published in 1997 or 2007. A fourth sample included 714,000 internet web logs, or blogs, from approximately 20,000 individuals who posted either in 2004 or in the summer and fall of 2001. The fifth sample was based on 209 novels published in English between 1700 and 2004. The American and British novels included best-selling popular books as well as more classic novels. Finally, we analyzed data from seven observational studies in which participants were tape-recorded while engaging in conversations with others. The speech samples ranged from transcripts of people wearing audio recorders over days or weeks, to strangers interacting in a waiting room, to couples talking about problems, to open-air tape recordings of people in public spaces.

As can be seen in Table 3, the LIWC2007 version captures, on average, 86 percent of the words people used in writing and speech. Note that except for total word count and words per sentence, all means in Table 3 are expressed as percentage of total word use in any given speech/text sample. Simple one-way ANOVAs indicated that word usage was significantly different across the six settings for all of the word categories.