I have been exploring what you can learn from an objective, mathematical analysis of documents like the gospels to get an idea of their core subject matter, and how that compares to other documents. For the next step, I would like to compare two Biblical gospels to a Gnostic gospel. (Given time, I'd like to compare each of the Biblical gospels to each of the alternative gospels, but I have to start somewhere.)
Summary of Results
Gospel of Mark and Gospel of Truth: 10% shared emphasis match
Gospel of John and Gospel of Truth: 22% shared emphasis match if "sin" and "deficiency" are treated as different things; 23% if they are treated as a match.
The less-precise estimates, the "Shared Word Estimates", were 7/48 for the Gospel of Mark and the Gospel of Truth, and 12/44 for the Gospel of John and the Gospel of Truth (or 13/44, if we consider "sin" and "deficiency" a match). I'm curious whether the gap between the two kinds of estimates is bigger in some circumstances, though the methods I'm developing are still too new for me to have a real feel for the differences there.
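To make the Shared Word Estimates easier to set beside the percentages, here is a minimal sketch that converts the fractions quoted above. It is only arithmetic on the numbers already given; the labels and the function name are my own, chosen for illustration.

```python
# Shared Word Estimates as quoted above: (shared words, total).
estimates = {
    "Mark and Truth": (7, 48),
    "John and Truth": (12, 44),
    "John and Truth, sin = deficiency": (13, 44),
}

def as_percent(shared, total):
    """Express shared/total as a percentage, rounded to one decimal place."""
    return round(100 * shared / total, 1)

for label, (shared, total) in estimates.items():
    print(f"{label}: {shared}/{total} = {as_percent(shared, total)}%")
```

That comes to about 14.6%, 27.3%, and 29.5%, each a few points above the corresponding shared emphasis match (10%, 22%, and 23%).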
I chose two different Biblical gospels for the comparison to a Gnostic gospel because I was fairly sure that would highlight some features of the documents. The Gospel of Mark sticks more closely to narrating events, while the Gospel of John reflects more on the perceived meanings, though it still narrates some events. The Gospel of Truth does not narrate Jesus' life; in fact "Jesus" does not show up in its key words list, while it is first in all four of the Biblical gospels. But the Gospel of Truth is reflective in nature, pondering the perceived meaning of things. These differences show up in the results: the Gospel of Truth is more similar to the Gospel of John than to the Gospel of Mark. (It is still less similar than, say, the Torah is to the combined gospels.) I'd remind the reader that the mathematical tools are fairly new and are still being calibrated; we'd have to look at a good number of documents to see what normal ranges might be and get a clearer idea of how to interpret the numbers.
For the Gospel of Mark, only 7 words contribute to the match to the Gospel of Truth, words that are high-frequency in both documents: son, spirit, gave, things, called, father, truth. These are fairly generic words (for example, "things" could be anything), and none of the matches is as much as 3% of the high-frequency words in Mark's gospel.
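One way to picture a per-word contribution like this is as an overlap of shares: for each word, take its share of the high-frequency words in each document, and count the smaller of the two shares toward the match. The sketch below is only an illustration under that assumption, and may not be the exact calculation behind the figures above. The function names are hypothetical, and the word counts are invented for the example; they are not taken from either gospel.

```python
def shares(counts):
    """Turn raw word counts into each word's share of the total."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def shared_emphasis(counts_a, counts_b):
    """Per-word contribution: the smaller of the two shares,
    for words that appear in both lists."""
    shares_a, shares_b = shares(counts_a), shares(counts_b)
    common = shares_a.keys() & shares_b.keys()
    return {word: min(shares_a[word], shares_b[word]) for word in common}

# Hypothetical counts, made up for illustration only.
doc_a = {"jesus": 50, "father": 20, "things": 10, "disciples": 20}
doc_b = {"father": 40, "truth": 30, "things": 10, "knowledge": 20}

contributions = shared_emphasis(doc_a, doc_b)
match = sum(contributions.values())
print(contributions, round(match, 2))
```

With these made-up counts, "father" contributes 20% and "things" 10%, for a 30% match. A word that is prominent in only one document, like "jesus" here, adds nothing.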
For the Gospel of John, there are 12 words contributing to the match to the Gospel of Truth: father, son, truth, comes, things, light, spirit, whom, gave, himself, speak, called. While the Gospel of John's top word, "Jesus", is not important in the Gospel of Truth, the second-place word from the Gospel of John, "father", is the first-place word in the Gospel of Truth. The three words "father", "son", and "truth" account for roughly half of the "matched emphasis" of these documents. There is a question whether "sin" and "deficiency" should be considered a match; it makes roughly a 1% difference in the match rate.
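The two lists of matching words can also be compared to each other directly. This small sketch uses only the words listed above; the variable names are mine.

```python
# Words each Biblical gospel shares with the Gospel of Truth, as listed above.
mark_truth = {"son", "spirit", "gave", "things", "called", "father", "truth"}
john_truth = {
    "father", "son", "truth", "comes", "things", "light",
    "spirit", "whom", "gave", "himself", "speak", "called",
}

# Is every Mark match also a John match?
mark_within_john = mark_truth <= john_truth

# Words that match for John but not for Mark.
only_john = sorted(john_truth - mark_truth)

print(len(mark_truth), len(john_truth))
print(mark_within_john)
print(only_john)
```

All seven of the words Mark shares with the Gospel of Truth are also on John's list. The five words John adds are comes, himself, light, speak, and whom.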
I suspect this kind of study -- where I select certain documents and run comparisons -- is more interesting to me than to the reader, but I have a few more comparisons in mind before I'll be able to make the point clearer. If you'll bear with me, I think the end result will be more generally useful. I'm hoping that, before long, this post will read like any long and complex math problem where you "show your work". That is, the work is shown mostly so that, when you get to the final conclusions and people ask themselves, "Is that right?", everybody interested can see exactly how you got those results and check them for themselves. It is necessary to show the work, and to have a completely objective and transparent method -- especially if the objective results are not always in line with conventional wisdom.