Thursday, April 23, 2015

Investigating Research: “An Empirical Research Study of the Efficacy of Two Plagiarism-Detection Applications”

About the Research Study…

In the last couple of posts I examined both philosophical theories behind personal decision making and flaws in the educational system that could serve as outside influences. The webinar I reviewed clearly shifted the blame for unintentional academic honesty violations to higher education institutions. I touched on the importance of taking an introspective look at library practices, but to know where to improve I felt that I first needed a foundational knowledge of the current efforts taking place. Thus, for this post I decided to examine a popular method of discouraging dishonesty—plagiarism detection software.

Every college student has probably encountered “anti-cheating” software at some point, if only as a warning of its use in the classroom. Technology has changed both the way that students cheat (buying papers off the internet, copying and pasting sources from online articles, etc.) and the way that instructors choose to catch dishonest acts. I felt that such a widely used method needed further investigation. Do these programs actually discourage dishonesty among intentional cheaters? Do the programs themselves actually work, or do they flag a lot of false positives? How much should educators or librarians rely on this software? Is one program better than another at detecting honesty violations? Is the use of technology to discourage or detect cheating an effective practice, or should some other method be considered? These were the questions I intended to answer when I started this research journey.

Key Highlights:

For the purposes of this posting, I chose a study by Hill and Page (2009) that compared two of the most widely used plagiarism detection programs on the market: SafeAssign and Turnitin. The researchers asked for student volunteers who submitted papers they claimed were 100% plagiarism-free. Using these works as a baseline, the researchers purposely added in material from online sources. Their additions were copied word-for-word to give the software the best chance at detecting a violation. The new and “improved” documents were then run through both programs to see what would be detected and what would be falsely flagged.


As it turns out, there was a significant difference between the two programs. Turnitin was the far superior software, detecting 82.4% of the plagiarized material compared to SafeAssign’s 61.9%. Additionally, Turnitin was much less likely to identify false positives, triggering 9.9% of the time compared to more than double that rate (23.6%) from SafeAssign. Interestingly, neither program was able to detect 100% of the plagiarized content, despite the authors using fairly well-known sources that were very likely to be in each program’s data bank.

One of the reasons the study authors pointed to for SafeAssign’s poor efficacy rate was that the program did not look for quotation marks; it was therefore likely to flag material that the original author had legitimately cited. Still, this does not explain the missed word-for-word items, nor does normalizing for this factor make the two programs even. It seems that there is still a major discrepancy among the most commonly used detection programs.

[Two charts comparing the detection rates and false-positive rates of Turnitin and SafeAssign; graphs created by the blog author.]

Brief Reflections:

I suppose what bothered me most about the findings of this study was the fact that neither program was able to detect all of the instances of plagiarized work, even when it was copied directly from the source material. Some of what the software did end up detecting was legitimately cited. It surprised me that one in every ten students in a classroom could be flagged for plagiarism where none exists! This software seems too error-prone to rely on, but perhaps that is not the point. Could it be that the actual intention is to act as a deterrent by the mere mention of it? I remember as a child my mother would often threaten to “turn the car around” if I continued to misbehave in the back seat. It was likely a threat without teeth, but it got me to behave myself because the possibility of being punished was there. Is a similar tactic intended when librarians and instructors employ cheating detection software in the classroom? If so, does it actually deter dishonest behavior if the student knows that the program is prone to error?

Hill and Page (2009) mention that perhaps these programs should be viewed as tools instead (at one point the software is even compared to a doctor’s scalpel). While anti-cheating software has the potential to save time by pointing out egregious examples, human insight is still needed to determine whether plagiarism has actually occurred. It seems that computers do not (yet) have the intelligence to determine this definitively.

The main takeaway from this article was the importance of recognizing that detection software (or other cheating deterrents, for that matter) is not meant to replace information literacy instruction. Librarians and educators must still put their efforts into explaining why responsible use of information is important. This includes lessons about how to properly give others their due credit. One of my favorite parts of the article features a teacher who was using detection software in a different way. Instead of using it for enforcement, he encouraged his students to submit several drafts and see where they had failed to cite so that they could correct their behavior and learn from it. Personally, I feel that this is the best “ethical” use of these types of programs. It encourages honesty because it allows the student to try and fail without punishment.

Discussion Questions:

Do you feel that the use of cheating detection programs is ethical, given that they are so prone to error? Is it a necessary deterrent? What do you think the best uses of these types of programs are? Do you use them (or would you use them) in your own library?

See the full report about the study behind the cut!


Introduction

Advancements in technology have provided new ways for students to plagiarize works and cheat on class assignments. In an effort to reduce the likelihood of academic dishonesty in an increasingly digital age, many professors and academic librarians have turned to plagiarism detection software. However, the nature of these programs means that they can be prone to error and trigger false positives, implicating a student when there was no clear intention of academic dishonesty.  

This manuscript attempts to carefully analyze a research study entitled “An Empirical Research Study of the Efficacy of Two Plagiarism-Detection Applications” by Hill and Page (2009) in which the authors examined the accuracy of two of the most popular detection programs. The paper will also discuss the ethical implications of educators relying on detection software to deter cheating in academic environments.

Underlying Research Questions

According to Hill and Page (2009), the impetus behind their research study was a group of university faculty members who wished to learn more about their options for purchasing detection software. Because the campus library was often seen as the focal point for promoting information literacy and academic honesty, librarians were asked to provide consultation. The two library researchers began by polling nearby institutions to determine which programs were most popular and how successful those institutions felt the software they had chosen had been. Hill and Page found from speaking to these surrounding colleges that “there was no way of knowing how much plagiarized material was not detected or erroneously detected. For example, results indicating a paper contained 20 percent plagiarized content would not show whether the original student paper actually contained 20 percent plagiarized material” (p. 170). Additionally, they discovered from examining other research that there had not been any studies that established control groups when attempting to determine the efficacy of detection programs.

The underlying problem the authors were attempting to solve was determining which detection program should be purchased for the university. Their first research question was whether one popular platform outperformed another in terms of accuracy. For their research purposes they limited their study to two popular programs, Turnitin and SafeAssign. Additionally, from their discussion with colleagues at surrounding institutions, their second research question focused on the nature of the errors the programs were making. In other words, “Was some material simply missed [by the detection software], or was it falsely judged as plagiarized?” (p. 171).

Research Theory/Hypothesis

Judging from the article, Hill and Page (2009) did not adopt a central, guiding theory when developing their research study. Likewise, because the researchers’ institution had never had any kind of cheating detection software, they did not wish to adopt a hypothesis regarding the efficacy of one program over another. The study was exploratory in nature and was designed to use empirical data as the determinant of each program’s efficacy. The literature review section within the article examined several previous research studies. The authors found that there were very few studies that measured and compared the rates of false positives and correct detections in a numerical fashion. Most of the studies the authors cited concluded that plagiarism detection software, while a useful deterrent, was prone to making errors.

Research Sample

Hill and Page (2009) selected a sample set of twenty papers that had been turned in to graduate and undergraduate courses and were verified by each author as being free of plagiarized content. The papers contained between 714 and 4,570 words and spanned a number of years and different topics. The researchers chose a relatively small sample size rather than a large set of randomly chosen papers because they wanted a benchmark that contained verifiably original materials. By selecting these types of papers, they could introduce their own elements of plagiarism and test how well the programs detected them without confounding elements present. They separated the documents into four groups, with five documents assigned to each group. The first group in the series was the control group, wherein the verifiably original papers went unaltered. The second group had additional content copied directly from webpages found in a Google search. The third group had additional word-for-word quotations from web resources like Google Scholar, Wikipedia, and PubMed. Finally, the fourth group contained content taken from articles found in academically owned databases like ProQuest.

Study Methodology

After separating the sample set into different groups, Hill and Page (2009) pasted content from topically related websites and articles into each document word-for-word, without attempting to paraphrase or change the order of the words in the selected quote. The authors recorded word counts for each document before and after manipulation. They calculated their own plagiarism percentage by subtracting the original word count from the post-manipulation count and dividing this by the post-manipulation count, multiplied by 100 (in other words: (after - before)/after * 100). Overall, the researchers found that after alteration about 15% of each paper’s content was plagiarized. Hill and Page then submitted these documents to both Turnitin and SafeAssign to compare the programs’ percentage rates with those that had been calculated by the researchers. The rate that came back from each software program was further compared by dividing the suspected rate by the actual calculated rate and multiplying this by 100 to establish an accuracy rate (suspected/actual * 100). The accuracy rate was averaged over all twenty tests to gain a greater understanding of each program’s overall efficacy level.
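
To make these two calculations concrete, here is a minimal sketch in Python (my own illustration, not code from the study); the word counts and the detection rate it uses are hypothetical examples, not figures reported by Hill and Page.

# Sketch of the two calculations described by Hill and Page (2009).
# All numbers below are hypothetical examples, not data from the study.

def plagiarism_percentage(words_before: int, words_after: int) -> float:
    # Share of the manipulated paper made up of inserted (plagiarized) words:
    # (after - before) / after * 100
    return (words_after - words_before) / words_after * 100

def accuracy_rate(suspected_pct: float, actual_pct: float) -> float:
    # How closely the software's reported plagiarism rate matches the
    # researchers' calculated rate: suspected / actual * 100
    return suspected_pct / actual_pct * 100

# Hypothetical paper: 2,000 words originally, 2,350 after pasting in copied text.
actual = plagiarism_percentage(2000, 2350)   # roughly 14.9% plagiarized content
reported = 12.3                              # hypothetical rate returned by the software
print(f"Actual plagiarized share: {actual:.1f}%")
print(f"Detection accuracy: {accuracy_rate(reported, actual):.1f}%")

Note that under this formula an accuracy rate above 100% is possible whenever a program flags more material than was actually inserted, which becomes relevant in the findings below.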

The authors stated that they were careful to account for outside variables when selecting the quotes used in each sample. So as not to give either software program an unfair advantage, they purposely did not select material from online “paper mills,” as neither program explicitly listed which websites it had indexed. Additionally, the researchers did not copy content from printed text, as detection programs often have difficulty finding elements of plagiarism from sources that are not also available in an electronic format.

Findings

Through analyzing their percentage data, Hill and Page (2009) discovered that Turnitin was far superior to SafeAssign at detecting elements of plagiarism, with an average detection rate of 82.4% compared to 61.9%. Additionally, Turnitin produced far fewer false positives, triggering only 9.9% of the time compared to SafeAssign’s 23.6%. Neither program was able to detect 100% of the inserted plagiarized content during any of the tests the authors conducted.

The authors addressed the possible discrepancy in false positive rates by explaining that Turnitin had a built-in option for ignoring quoted materials and automatically assuming these elements were properly cited from a source, whereas SafeAssign lacked this feature. Interestingly, some of the essays submitted to SafeAssign were judged to be over 100% plagiarized, perhaps because the program was picking up on the original paper authors’ legitimately cited quotations; in other words, it flagged more material than the researchers had actually inserted. These outliers were normalized, but the issue still detracted greatly from SafeAssign’s efficacy rate.

Drawn Conclusions and Ethical Implications

When examining the article, it was apparent that there were a few glaring issues with the research study’s design. First, as the authors conceded, the sample size was very small, using only twenty papers and two platforms. A sample of this size makes it very difficult to generalize the findings to detection software as a whole. Second, the researchers copied the passages they chose word-for-word and did not attempt to study the software under realistic conditions where a student might be likely to paraphrase or rearrange the word order of copied materials. Finally, the authors did not use the shared institutional database feature, which would have given them access to previous student papers and helped them measure the platforms’ sensitivity to self-plagiarism (from items turned in to previous courses) and the stealing of materials from other students.

Assuming the findings discovered by the researchers are correct, at best, cheating detection programs falsely flag research papers about 10% of the time. This figure means that potentially one in every ten students in a class (if not more, depending on the platform) would have their assignments flagged for plagiarism where none exists. That said, is it ethically responsible for instructors and information literacy librarians to rely on cheating detection programs when they are prone to such rates of error? According to Hill and Page (2009), some of the studies cited in their literature review concluded that “the whole situation [is] bad for higher education… Not only are [software detection programs] ineffective, but some of the products/services promote a real lack of trust and resentment between professor and student…” (p. 171). Corey Turner (2014), an author at NPR, agreed with this position, adding that the use of these programs alone for detection purposes was ethically irresponsible because they were not only prone to error but also did not help an instructor determine whether the plagiarism was intentional or simply the result of a lack of information literacy skills. By focusing on the “ends” rather than the “means,” a teacher or librarian could potentially miss an opportunity to address gaps in knowledge.

Instead, both Hill and Page and Turner advised using detection software as yet another tool in one’s “educator tool belt,” rather than relying on it to help determine a student’s grade. One teacher in Turner’s (2014) article suggested, “The software is a scalpel… students [can] use Turnitin on rough drafts, so they can learn from their mistakes. No penalty. No trip to the dean's office.” The instructors in Turner’s article also noted that these programs are meant to be time savers, merely indicating what may be similar to content that exists elsewhere rather than providing a percentage of the paper that was outright plagiarized.

Turner asserted that the onus should still be on the instructor to look at the links the software provides to determine whether the content was used in a malicious manner. “The assumption that the detection of plagiarism serves as ‘proof’ is incorrect, as the results still require interpretation… Faculty and staff training is essential to avoid potential negative assumptions arising from documents flagged for plagiarized content,” Hill and Page (2009, p. 178) added. Additionally, Hill and Page noted that while programs like SafeAssign and Turnitin could serve as deterrents by their mere mention in class, “The use of these tools does not negate the need for instructors to educate their students about the rules regarding proper citation and usage of others’ materials within their scholarly work” (p. 179). Thus, though plagiarism detection software is prone to high rates of error, professors and librarians can be ethical about their use of it in the classroom if they recognize these programs for what they are (time-saving tools, not a replacement for instruction) and are willing to put in the time to investigate claims and teach more responsible ways of using information.


Post References:

Hill, Jacob D., and Elaine Fetyko Page. 2009. “An Empirical Research Study of the Efficacy of Two Plagiarism-Detection Applications.” Journal of Web Librarianship 3: 169-181.
