Automating Linguistic-Based Cues for Deception Detection



Automating Linguistic-Based Cues for Detecting Deception in Text-based Asynchronous Computer-Mediated Communication


The detection of deception is a promising but challenging task. Automated linguistics-based cues (LBC) to deception have rarely been discussed systematically. This experiment studied the effectiveness of automated LBC in the context of text-based asynchronous computer-mediated communication (TA-CMC). Twenty-seven cues, either drawn from prior research or created for this study, were clustered into nine linguistic constructs: quantity, diversity, complexity, specificity, expressivity, informality, affect, uncertainty, and nonimmediacy. A test of the selected LBC in a simulated TA-CMC experiment showed that: (1) a systematic analysis of linguistic information could be useful in detecting deception; (2) some existing LBC were effective as expected, while others ran in the opposite direction to that predicted by prior research; and (3) some newly discovered linguistic constructs and their component LBC helped differentiate deception from truth.
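As an illustration of how some of these constructs can be computed automatically, the Python sketch below derives quantity, diversity, and complexity cues from a single message. This page does not give the paper's exact cue formulas, so the operationalizations used here (word and sentence counts for quantity, the share of distinct words for lexical diversity, average sentence and word length for complexity) and the function names `basic_cues`, `tokenize_words`, and `split_sentences` are assumptions for illustration. The regex sentence splitter is deliberately naive.

```python
import re


def tokenize_words(text: str) -> list[str]:
    """Lowercased alphabetic tokens; an internal apostrophe is kept ("don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def split_sentences(text: str) -> list[str]:
    """Naive split on runs of . ! ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return [p for p in parts if p.strip()]


def basic_cues(text: str) -> dict[str, float]:
    """Assumed operationalizations of a few quantity, diversity, and complexity cues."""
    words = tokenize_words(text)
    sentences = split_sentences(text)
    n_words, n_sent = len(words), len(sentences)
    return {
        # quantity
        "word_count": n_words,
        "sentence_count": n_sent,
        # diversity: distinct words / total words (type-token ratio)
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
        # complexity
        "avg_sentence_length": n_words / n_sent if n_sent else 0.0,
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
    }
```

For example, `basic_cues("We should take the mirror. The mirror is useful!")` counts 9 words in 2 sentences, giving a lexical diversity of 7/9 and an average sentence length of 4.5 words.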


deception - deception detection - linguistics based cue - computer-mediated communication - natural language processing

Authors' Bio (name, school)

Lina Zhou, University of Maryland Baltimore County (UMBC)
Judee K. Burgoon, University of Arizona
Jay F. Nunamaker, University of Arizona
Doug Twitchell, Illinois State University

Problem Statements/Phenomena

The growing number of emails sent over the Internet brings a growing number of deceptive messages. The volume of email is far too large to screen manually for deception, and the human tendency toward a truth bias lowers detection accuracy. Tools that augment human deception detection would therefore be valuable. In this project, the authors compare truthful messages with deceptive messages to identify reliable indicators that can later be built into software to automate detection. Natural Language Processing (NLP) is a research area that enables computers to analyze and generate the languages that humans use naturally, and some mature NLP techniques can automatically identify linguistic-based cues in text.

Research Questions

Which linguistic-based cues discriminate truthful asynchronous text messages from deceptive messages?

Theory Used or Developed

  • Interpersonal Deception Theory (IDT) - This paper summarizes IDT as six strategies a deceiver may employ: (1) quality manipulations, (2) quantity manipulations, (3) clarity manipulations, (4) relevance manipulations, (5) depersonalism manipulations, and (6) image- and relationship-protecting behavior.
  • Criteria-Based Content Analysis (CBCA) - Developed as one of the major elements of Statement Validity Assessment (SVA), a technique for determining the validity of child witnesses' testimony in sexual offense trials. CBCA has 19 criteria, divided into four groups: general characteristics, specific contents, motivation-related contents, and offense-specific elements.
  • Reality Monitoring (RM) - Memories of actually experienced events differ in quality from memories of events that were made up. Vrij (2000) surveyed 11 deception studies that employed RM; the evidence for using RM as a deception detection technique is not strong.
  • Scientific Content Analysis (SCAN) - Given a written statement from an adult, SCAN can discriminate between criminal investigation statements of questionable validity and those that are probably accurate (Driscoll 1994). Its indicators include lack of memory, missing links, connections, spontaneous corrections, and pronouns.
  • Verbal Immediacy (VI) - Refers to verbal and nonverbal behaviors that create a sense of psychological closeness or distance.
  • Linguistic-Based Cues (LBC)

Hypothesis, Independent Variables, Dependent Variables

H1 - Deceptive senders display higher (a) quantity, (b) expressivity, (c) positive affect, (d) informality, (e) uncertainty, and (f) nonimmediacy, and lower (g) complexity, (h) diversity, and (i) specificity of language in their messages than do truthful senders.

H2 - Deceptive senders display higher (a) quantity, (b) expressivity, (c) positive affect, (d) informality, (e) uncertainty, and (f) nonimmediacy, and lower (g) complexity, (h) diversity, and (i) specificity of language in their messages than do their respective receivers.
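H2 is a within-dyad comparison, so one way to picture it is as a paired contrast on a single cue: for each dyad, subtract the receiver's value from the sender's and ask whether the mean difference is reliably positive. The sketch below computes that mean difference and a paired t statistic using only the Python standard library. It is a hypothetical illustration, not the paper's analysis (this page describes the design only as 2 × 2 repeated measures); the function name `paired_difference` and the sample numbers are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_difference(sender: list[float], receiver: list[float]) -> tuple[float, float]:
    """Return (mean sender-minus-receiver difference, paired t statistic).

    Each index is one dyad: sender[k] and receiver[k] are the same cue
    (for example, word count) measured for the two members of dyad k.
    """
    if len(sender) != len(receiver) or len(sender) < 2:
        raise ValueError("need equal-length lists covering at least two dyads")
    diffs = [s - r for s, r in zip(sender, receiver)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = d_bar / (sd / sqrt(len(diffs)))  # compare against t with n - 1 df
    return d_bar, t
```

With made-up word counts `sender = [120, 95, 140, 110]` and `receiver = [100, 90, 115, 105]`, the mean difference is 13.75 words and t is about 2.67 with 3 degrees of freedom. A one-sided test fits H2 because the hypothesis predicts a direction.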




Method Type

Experiment; 2 × 2 repeated-measures design varying experimental condition (deceptive, truthful) and dyad role (sender, receiver)


Sixty participants were randomly assigned to an experimental condition. The task was a modified version of the Desert Survival Problem (Lafferty and Eady 1974), which pairs of participants solved by email; in the deceptive condition, one participant deceived the other. Emails could be sent from any web-enabled computer.

Subject and Selection Criteria

Freshmen, sophomores, juniors, and seniors recruited from an MIS class at the University of Arizona

Sample Size

Sixty participants

Measuring Instrument

Twenty-seven LBC were used. An NLP tool called iSkim performed named entity extraction, and a second tool, CueCal, derived the value of each individual cue from iSkim's output.
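The two-stage arrangement described above, one tool annotating the text and a second deriving cue values from the annotations, can be sketched as follows. iSkim and CueCal are not reproduced here: the `annotate` and `cue_values` functions and the small word lists in `LEXICON` are hypothetical stand-ins, and a word-list lookup is a much simpler technique than the named entity extraction iSkim performs. The three cues (self-reference, group reference, and modal verbs) are ones named in the nonimmediacy and uncertainty findings; modifiers would require a part-of-speech tagger and are left out.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical word lists standing in for a real annotation tool such as iSkim.
LEXICON = {
    "self_reference": {"i", "me", "my", "mine", "myself"},
    "group_reference": {"we", "us", "our", "ours", "ourselves"},
    "modal_verb": {"can", "could", "may", "might", "must",
                   "shall", "should", "will", "would"},
}


def annotate(text: str) -> list[tuple[str, Optional[str]]]:
    """Stage 1: tag each lowercase token with its LEXICON category, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [
        (tok, next((cat for cat, words in LEXICON.items() if tok in words), None))
        for tok in tokens
    ]


def cue_values(tagged: list[tuple[str, Optional[str]]]) -> dict[str, float]:
    """Stage 2: convert tag counts into per-token rates, one value per cue."""
    total = len(tagged)
    counts = Counter(tag for _, tag in tagged if tag is not None)
    return {cat: counts[cat] / total if total else 0.0 for cat in LEXICON}
```

For the message "We think I should take the knife. We might need it." the sketch finds 11 tokens, giving a self-reference rate of 1/11 and rates of 2/11 each for group reference and modal verbs. Keeping annotation and cue calculation separate means the word lists could be swapped for a real tagger without changing the second stage.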

Major Findings

Deceivers displayed a higher quantity of words, verbs, noun phrases, and sentences. Their messages were more expressive than their partners' messages, and they appeared more informal. Deceptive subjects displayed less diversity at the lexical and content levels. They used nonimmediate and uncertain language in the form of less self-reference, more group references, more modal verbs, and more modifiers. Their messages were also less complex.

Discussion Summary & Author Recommendations

The authors conclude that a computational approach to deception detection is viable. Given a list of computerized cues, deception detection could become available to laypeople. Suggested future research includes running experiments like this one with different tasks.

Why paper is important? Why paper is cited?

This paper explains many important theories related to deception detection. Some of its findings differ from those of other studies, for example the finding that deceivers used more words, verbs, and sentences.

APA Reference

Persistent Link to Library

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.