Thursday, January 31, 2013

Picture this: what the Biblical gospels have in common

As I begin to show the point of the recent series with all the data analysis, I'd like to translate some of those endless statistics into pictures. That should make the point more visible and easier to understand without having to refer back to charts full of statistics.

Because my Venn diagram skills are basic, I kept this chart basic as well: it shows the ten most-used keywords in each of the Biblical gospels. (It's tempting to try to pack in more information. The circles could change size to reflect the size of the original document, or the words might change size or color. Tempting, but for the moment it's overkill.)
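For readers who want to try something similar, here is a minimal sketch of how a "ten most-used keywords" list and the overlapping regions of a Venn diagram might be computed. It is not the tool used for this series: the stopword list, the function names `top_keywords` and `venn_regions`, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real analysis would need a fuller one.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "he", "his", "that", "was", "said"}

def top_keywords(text, n=10):
    """Return the n most frequent non-stopword words in text, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

def venn_regions(keyword_lists):
    """Map each keyword to the set of documents whose keyword list contains it."""
    regions = {}
    for doc, keywords in keyword_lists.items():
        for word in keywords:
            regions.setdefault(word, set()).add(doc)
    return regions

# Hypothetical sample sentences, not text from any gospel translation.
samples = {
    "doc_a": "Jesus said to the crowd, and Jesus taught the disciples.",
    "doc_b": "The disciples asked Jesus, and Jesus answered the disciples.",
}
keyword_lists = {name: top_keywords(text, n=3) for name, text in samples.items()}
regions = venn_regions(keyword_lists)

# Keywords in the center of the diagram: those listed for every document.
shared_by_all = sorted(w for w, docs in regions.items() if len(docs) == len(samples))
print(shared_by_all)
```

A keyword that maps to only one document belongs in that document's own area of the diagram; a keyword that maps to all of them belongs in the center.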

This picture gives you a high-level overview of the four Biblical gospels. They have a lot in common, and all four of them share the same most-used keyword: Jesus. They also each have areas that are uniquely their own.

As I go forward, I need to add a few more word clouds to the ones available here on the blog, and then I intend to put together some additional pictures to illustrate what you really see when you compare certain things objectively.

Sunday, January 27, 2013

The Gospels, the Tao, and the Analects: Comparison

There are many reasons we might want to compare two documents to see how much they cover the same material. We have looked at the Biblical gospels in comparison to each other, and to one of Paul's letters. We have compared the combined gospels to the Torah. We have looked at how the Biblical gospels compare to a Gnostic Gospel. Here we take it to the next step: What do we see when we compare the gospels to the texts of other religions?

While I eventually want to analyze far more texts than these, I started by comparing two Biblical gospels (Mark and John) to two eastern texts (the Tao Te Ching and the Analects of Confucius). Full disclosure: I'm fond of both the Tao and the Analects, and am starting here because I am glad for a chance to re-read and review them. I considered writing up the comparisons separately for the Analects and the Tao, but there is more that comes to light when the comparisons are reviewed side-by-side.

Summary of Results 

First, comparisons of two Gospels and the Tao
Gospel of Mark and the Tao: 7% shared emphasis (or 10% if "teachers" and "sages" are matched)
Gospel of John and the Tao: 9%

Next, comparisons of two Gospels and the Analects
Gospel of Mark and the Analects: 22% shared emphasis
Gospel of John and the Analects: 18% shared emphasis

For those interested, comparison of the Tao with the Analects:
16% shared emphasis, or 20% if "Master" and "sages" are matched.


The Gospels and the Tao have so low a match that it barely registers. The match between Mark and the Tao is the result of only 5/48 words from Mark's keywords list: people, teachers, things, called, and heaven. Again, the match between John and the Tao is the result of only 5/44 words from John's keywords list: world, life, things, people, called. We may know that both are on the general topic of teaching people about life, the world, and heaven -- a very high-level, summarized type of common ground.

The Analects, on the other hand, have a noticeably higher match. For Mark and the Analects, there are 9/48 words matched: man, asked, people, replied, things, heard, called, heaven, others. For John and the Analects, there are 9/44 words matched: man, asked, love, replied, heard, things, people, speak, called.
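The fractions quoted above (5/48, 5/44, 9/48, 9/44) can be reproduced directly from the matched-word lists. The sketch below does only that simple count-over-list-size calculation. It does not reproduce the "shared emphasis" percentages in the summary, which come from a weighted measure whose formula is not given in this post. The function name `shared_word_fraction` is an assumption for illustration.

```python
# Matched keywords exactly as listed in the post.
matches = {
    ("Mark", "Tao"): ["people", "teachers", "things", "called", "heaven"],
    ("John", "Tao"): ["world", "life", "things", "people", "called"],
    ("Mark", "Analects"): ["man", "asked", "people", "replied", "things",
                           "heard", "called", "heaven", "others"],
    ("John", "Analects"): ["man", "asked", "love", "replied", "heard",
                           "things", "people", "speak", "called"],
}

# Keyword-list sizes reported in the post (the denominators).
keyword_list_size = {"Mark": 48, "John": 44}

def shared_word_fraction(matched_words, list_size):
    """Fraction of a gospel's keyword list that also appears in the other text's list."""
    return len(matched_words) / list_size

estimates = {
    pair: shared_word_fraction(words, keyword_list_size[pair[0]])
    for pair, words in matches.items()
}

for (gospel, other), value in estimates.items():
    count = len(matches[(gospel, other)])
    print(f"{gospel} vs {other}: {count}/{keyword_list_size[gospel]} = {value:.1%}")
```

Note that the Mark and Tao list above includes "teachers", so its 5/48 corresponds to the case where "teachers" and "sages" are treated as a match.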

So the Analects is more similar to the Biblical gospels mainly because of the basic framework of the documents: the Analects, like the gospels, narrate someone's teachings through their conversations with others. I wonder whether a similar pattern would be found for any writings that record dialogue-style conversations, especially teachings.

For "called", it should be mentioned that a word may have more than one meaning, and a next-generation version of this tool would eventually need to take that into account. A disciple may be "called" by Jesus, and an act may be "called" virtuous, without "called" really meaning the same thing. That is to say, this version of the tool may slightly miss its estimate since it does not have that kind of precision yet.

For a little more perspective, when we compare the Tao to the Analects, we find 8/50 keywords matched: people, virtue, called, things, heaven, wish, state, words. If we consider "sages" and "Master" as a match -- which is debatable -- that would be 9/50 keywords matched.
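The choice of whether to count a debatable pair such as "sages" and "Master" can be made explicit by passing an optional table of equivalent words to the matching step, so that the result can be reported both ways. This is only a sketch under assumptions: the function name `shared_keywords` and the short keyword lists are hypothetical stand-ins, not the actual 50-word lists.

```python
def shared_keywords(keywords_a, keywords_b, equivalents=None):
    """Return the words in keywords_a that match keywords_b.

    equivalents optionally maps a word from list A to a word in list B
    that should be treated as the same concept.
    """
    equivalents = equivalents or {}
    set_b = set(keywords_b)
    return [
        word for word in keywords_a
        if word in set_b or equivalents.get(word) in set_b
    ]

# Hypothetical short excerpts standing in for the real keyword lists.
tao_sample = ["people", "virtue", "sages", "heaven", "water"]
analects_sample = ["master", "people", "virtue", "heaven", "learning"]

strict = shared_keywords(tao_sample, analects_sample)
loose = shared_keywords(tao_sample, analects_sample, equivalents={"sages": "master"})

print(len(strict), "matches without the equivalence")
print(len(loose), "matches with 'sages' treated as 'master'")
```

Keeping the equivalence table separate from the keyword lists means the strict result is always available, and each debatable pairing stays visible to anyone checking the work.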

The Tao and Analects share some things with each other that they do not share with Mark or John. To take one example, they share an emphasis on "virtue". Anyone who has read the gospels knows that human "virtue" is found under different words; it is not necessarily easy to say which is the closest match. Do we compare the call to be "righteous" or "perfect" or "holy"? Or do we note that the gospels take a different approach from discussing the hypothetical man of virtue? These are not questions I will pretend to answer in a mathematical analysis of word frequencies. There are some kinds of questions that the mathematical analysis may answer; for others, it simply brings to our attention other areas that deserve a look.

The Tao and the Analects were written in different languages than the Biblical gospels; the comparisons have all been done from English translations. (Beyond that, they also came from different cultures and were speaking to different contexts.) I don't apologize for comparing them in English since eventually we have to find a common platform on which to compare them. Part of the job will be to keep that common platform from distorting the picture, no matter which common platform is chosen. The original languages and cultures will need to remain part of the picture.

Tuesday, January 22, 2013

Word Cloud: Analects of Confucius


Sunday, January 20, 2013

The Biblical Gospels and a Gnostic Gospel: Comparison

I have been exploring what you can learn from an objective, mathematical analysis of documents like the gospels to get an idea of their core subject matter, and how that compares to other documents. For the next step, I would like to compare two Biblical gospels to a Gnostic gospel. (Given time, I'd like to compare each of the Biblical gospels to each of the alternative gospels, but I have to start somewhere.)

Summary of Results

Gospel of Mark and Gospel of Truth: 10% shared emphasis match
Gospel of John and Gospel of Truth: 22% shared emphasis match if "sin" and "deficiency" are considered different things; that would be a 23% match if "sin" and "deficiency" were considered matching.

The less-precise estimates, the "Shared Word Estimates", were 7/48 for the Gospel of Mark and Gospel of Truth, and 12/44 for the Gospel of John and the Gospel of Truth (or 13/44, if we consider "sin" and "deficiency" a match). I'm curious whether there is a bigger gap between the two kinds of estimates in some circumstances, though the methods I'm developing are still a little bit new for me to have a real feel for the differences there.


I chose two different Biblical gospels for the comparison to a Gnostic gospel because I was fairly sure that would highlight some features of the documents. The Gospel of Mark sticks more closely to telling events, while the Gospel of John reflects more on the perceived meanings, though it still narrates some events. The Gospel of Truth does not narrate Jesus' life; in fact "Jesus" does not show up in the key words list, while it is first in all four of the Biblical gospels. But the Gospel of Truth is reflective in nature, pondering over the perceived meaning of things. These differences show up in that the Gospel of Truth is more similar to the Gospel of John than to the Gospel of Mark. (It is still less similar than, say, the Torah is to the combined gospels.) I'd remind the reader that the mathematical tools are fairly new and are still being calibrated; we'd have to look at a good number of documents to see what normal ranges might be and get a clearer idea how to interpret the numbers.

For the Gospel of Mark, there are only 7 words contributing to the match to the Gospel of Truth, words that are high frequency in both documents: son, spirit, gave, things, called, father, truth. These are fairly generic words (for example, "thing" could be anything), and none of the matches is as much as 3% of the high-frequency words in Mark's gospel.

For the Gospel of John, there are 12 words contributing to the match to the Gospel of Truth: father, son, truth, comes, things, light, spirit, whom, gave, himself, speak, called. While the Gospel of John's top word, "Jesus", is not important in the Gospel of Truth, the second-place word from the Gospel of John, "father", is the first-place word in the Gospel of Truth. The three words "father", "son", and "truth" account for roughly half of the "matched emphasis" of these documents. There is a question whether "sin" and "deficiency" should be considered a match; it makes roughly a 1% difference in the match rate.

Moving Forward

I suspect this kind of study -- where I select certain documents and run comparisons -- is more interesting to me than the reader, but I have a few more comparisons in mind before I'll be able to make the point clearer. If you'll bear with me patiently, I think the end result will be more generally useful. I'm hoping that, shortly, this post will be like any long and complex math problem where you "show your work". That is, the work itself is mostly shown so that, when you get to the ending conclusions where people ask themselves, "Is that right?", everybody interested can see exactly how you got those results, and check it for themselves. It is necessary to show the work, and have a completely objective and transparent method -- especially if the objective results are not always in line with conventional wisdom.

Friday, January 11, 2013

Beyond the New Testament: Comparing the Biblical Gospels to the Torah

For our next step, I compared the gospels to the Torah. That is, I compared the combined texts of Matthew, Mark, Luke, and John to the combined texts of Genesis, Exodus, Leviticus, Numbers, and Deuteronomy.

The Short Version of the Results

Shared Word Estimate (12/46) = 26% or (11/46) = 24%, depending on whether or not we recognize a match between the word "Israelites" in the Torah and the word "Jews" in the New Testament. We're working with differences in language, history, and culture; you could argue that decision either way. So the results are presented both ways.

Shared Emphasis Estimate: 25% or 24%, again depending on whether "Israelites" and "Jews" are considered a match.

Different than Comparisons within the New Testament

Between the Torah and the Biblical Gospels, we have a match on emphasis that is roughly 24-25%. It is only slightly less than the match rate between Paul's letter to the Romans and the Gospel of Luke, though again far less than matches among the gospels.

Many of the differences are so expected that I will mention them without much comment. While the gospels discuss Jesus and his disciples, the Torah discusses the patriarchs and Moses. The context is different: the gospels have "Jerusalem" or the "house", while the Torah has "Egypt" and "tent". The language of ancient ritual sacrifice has a set of words that are common in the Torah but not in the gospels: offering, altar, sin, blood, fire, holy, gold, burnt, grain, animal. ("Sin" is at the low end of common words in the individual gospels of Matthew and John, but does not make the common words list of the combined gospels. The words specific to the ancient sacrifices are not common in any of the gospels.)

When we take a closer look at the areas that are in common, we see most of the shared emphasis coming from the words "God" and "Lord", "father" and "son", "man" and "people" -- and "priests". Most of those matches call for a closer look and further thought.

If "Lord" generally means "God" in the Torah, but sometimes "God" and sometimes "Jesus" in the gospels, then do those words mean the same thing and should those words really match? A Christian might say yes, a Jew might say no, and both might agree that this difference is critical. It is important not to take the mathematical studies in a way that depends on someone's personal viewpoint; it defeats the purpose of a mathematical review. Someone who is an analyst first might note that both documents have an emphasis on "Lord" as an important figure, while allowing that the different authors may have different ideas about who exactly the Lord is. And that should be a fair observation from anyone's point of view.

If "father" and "son" are used in numerous accounts of family and genealogies in the Torah, does it really count that they are also common words in the gospels, where the words might mean God and Jesus? Then again, the gospels also contain two Jewish-style genealogies. Never mind for the moment whether they match each other or whether you believe either of them; the same could be said of other genealogies. The point for a computerized word comparison is not whether you can post the family tree on a genealogy site; the point is that some of the structure of the gospels of Matthew and Luke -- the structure that includes a genealogy -- is part of a convention that goes back to the Torah. So the commonness of "father" and "son" language in the gospels is not entirely foreign to the Torah, and has some of its roots in the Torah. God is also referred to as "father" in the Torah (see Deuteronomy 32:6, though I have certainly not yet made a complete check to find all the references.) So the references to God as "father" in the gospels are not completely unheard of in the Torah, and may in some part trace back to a tradition found in the Torah. For matches like this, as they say, the truth is complicated.

Moving Forward

This comparison also brings to light some more features of the tools being used. The more different two documents are, the more questions arise about even the matches that are found. Here the gospels are written in a culture that was deliberately trying to live according to the Torah and pattern their religious thoughts after the Torah, so the gospels and the Torah were bound to have some similarities. The similarity of the gospels to the Torah was roughly on the same scale as the similarity of Luke to the letter to the Romans, though for different reasons.

As this series goes on with a few more examples, we will likely see a few comparisons that show only a slight relationship if any. When documents are largely different, there is another question that comes to light: are the documents similar enough to warrant a comparison? I won't presume to answer that question so soon, with only a few comparisons completed so far. It may be useful to look at two completely unrelated documents at some point to see if the analysis can detect that lack of relationship.

Wednesday, January 09, 2013

Beyond the Biblical gospels: comparing Luke and Romans using mathematical models

Thanks to all for your patience while I enjoyed the holidays with lots of family time and remarkably little blogging or research. :)

In this post we take our next step with the mathematical models, and it begins to show different kinds of results. To this point we have been looking at the Bible's four gospels: Matthew, Mark, Luke, and John. To expand our horizons a little, the next document I'd like to consider is Paul's letter to the Romans. It is an early letter within the Christian church, and it has been vital in the formation of Protestant Christianity. In modern times the question has become more pointed: Did Paul stay with the direction laid out by Jesus, or was Paul responsible for a change of course? I will not presume to answer that question here, but I will point out some promising pieces of objective information that come to light with this kind of mathematical review.

To compare this letter to a gospel, then, I chose the Gospel of Luke. Since Luke was a companion of Paul's, I thought it could be a productive place to begin.

The Short Version of the Results

Shared Word Estimate (13/52) = 25%
Shared Emphasis Estimate: 27%

Much Different than Gospel-to-Gospel Comparisons

For the first time, all of the matching methods show less than a 50% match -- and here the match is significantly less than 50%. While the gospels consistently had a shared emphasis estimate higher than 50%, Paul's letter to the Romans matches Luke at roughly half that level.

There are several kinds of differences that are immediately seen. We will start at the top of the list with the most common word. The gospels all had the same word as the most common word: Jesus. The letter to the Romans has a different most-common word: God. In fact, "Jesus" doesn't appear until #8 on the list in Romans. However, "Christ" appears higher on the list than "Jesus".

What do we make of the fact that "Jesus" is the common way to speak of Jesus in the gospels, but "Christ" is more common in Paul's letter to the Romans? The word "Christ" does not appear on the common-words list of any of the four Biblical gospels. To be sure, even if the word "Christ" is not prominent in the gospels, still the idea that Jesus is the Christ is well-known from the gospels. They all make a point to explain that Jesus is the Christ, and to demonstrate it. In the gospels, the time when Peter identifies Jesus as the Christ is portrayed as a key teaching, and so is the moment at Jesus' trial where the political leaders ask whether Jesus is the Christ. The Gospel of John even explains that the reason the book was written is "that you may believe that Jesus is the Christ". So the concept of "Christ" is an important idea in the gospels, even though the word is not used often. We could say that calling Jesus by the title "Christ" shows the next stage of logical development after those gospel accounts. That is, calling Jesus "Christ" shows a prior acceptance of those teachings about Jesus. It is, in a way, summary-level talk, to call Jesus by the title of Christ. The gospels are interested in explaining and demonstrating that Jesus is the Christ; for the epistle to the Romans, this has already been explained to the readers' satisfaction and is now part of the foundation on which they build. So here we have a new kind of difference: a difference about the level of detail being used or the logical progression of ideas, whether something is demonstrated or already given. It is a difference in the level of the conversation, and in the starting point of the discussion.

But to what extent is it discussing the same subject matter? The action from the gospels, the physical settings and the people who first heard Jesus are not a large part of the picture in Paul's letter. The book of Romans does not commonly speak of "crowds" and "disciples", or "Peter" and "Mary", or "Jerusalem" and the "house", or "asked" and "answered" in the way that the Gospel of Luke commonly does. The actions from Jesus' life are not being narrated in Paul's letter; the letter is a different type of material. Paul does have some interaction with people in his letter, but he interacts with the people that he expects to read his letter. So while there is no "crowd" in Paul, instead we have Paul's trademark where "greet" is on the common word list in Romans, and there is a small crowd reading the letter. (Anyone who reads a few of Paul's letters will notice that he spends a certain amount of time on personal greetings. We know many early Christians by name because Paul greeted them by name in his letters.)

Still, the differences go deeper. The gospels are all biographies, or we might say the fourth gospel is a memoir and reflection on Jesus' life. As records of Jesus' life, all four gospels share the same most common word: "Jesus". The letter to the Romans, on the other hand, has "God" as the most common word, then "sin" and "law". To be sure, "sin" and "law" are discussed in the gospels -- but not always enough to make the most-common-words list. For "sin" we may remember conversations about sins being forgiven. For "law", there are records of discussions between Jesus and other people over the interpretation of the law. Questions come up about matters of divorce, or tax, or ritual hand-washing, or which are the most important commandments, or a case of capital punishment, or whether certain religious leaders could claim that Jesus was morally in the wrong for performing miracles to heal people on the Sabbath, as it was a kind of work. So "sin" and "law" both have a presence in the gospels, either directly or by example. Paul discusses these ideas at a summary level, where "sin" and "law" are often abstractions. The same might be said of "faith" and "grace". These words are commonly used in Romans where Paul discusses them in a relatively abstract way. In the gospels these same words "faith" and "grace" are not often used directly, but are instead shown in living action.

But the major differences are not limited to the fact that Paul is more abstract, while the gospels show Jesus in action. By Paul's leading words in Romans ("God", "law", "sin"), we see Paul also trying to put Jesus in a context that his readers might know. He explains Jesus against a background familiar to his fellow Jews, back in his day when the Temple still stood in Jerusalem and sacrifices were still offered daily, where people made pilgrimages for the Torah's decreed feasts, where Torah-based Jewish legal courts had some degree of legal authority and might have jurisdiction over some cases, where someone might comment publicly about a lack of morals if someone failed to perform a ritual washing before a meal, where breaking the Sabbath might lead to a formal legal inquiry. We see Paul struggling with the question: For a Jew like him or many of his readers -- learning that Jesus is the Messiah and that the Messiah is about God's love, about grace and mercy, about good news and life -- what does that mean for their old understanding of law and sin? What does that mean for their ideas about righteousness before God?

We also see Paul spending some effort discussing "Jews" and "Gentiles", "Israel" and being "circumcised". What does it mean that even Gentiles are now included in a new covenant with God? What does it mean that Gentiles have a righteousness before God that did not come from the Law of Moses? What does that mean for whether the Law of Moses should apply to Gentiles? On the one hand, if the Gentiles do not need to be circumcised to be in the New Covenant, then is circumcision still relevant? On the other hand, if Gentiles are now numbered among God's chosen people -- which previously had meant Israel -- then is there still any advantage in being a Jew? Paul considers the implication that God has made a covenant for all people through Jesus; and Paul seems to have something of an identity crisis on what it means to be Jewish now, in light of God opening the gate wide to all nations. For him and his concept of his beloved Jewish nation's role in the world, it is not an easy transition to go from being an only child to being firstborn among many. This emphasis raises the question: to what extent was the letter to the Romans about universal themes for all people of all times, and to what extent was that letter meant to speak to the existential crisis of Judaism that Paul saw in God's new covenant for all nations? (In a few places Paul seems defensive of the special role of his people, and mentions several times that things are "first for the Jew" and then for the Gentile. I should mention that Paul's letter to the Romans is not the only philo-Semitic writing in the New Testament. I have become curious whether anyone has actually studied the philo-Semitism of the New Testament. In my readings through the materials, philo-Semitism seems far more prominent than any supposed "anti-Semitism", which is not surprising since most of the authors were themselves Jewish.)

A few advantages of the mathematical comparisons
  • We may be able to determine whether something is rightly classified as a "gospel" by whether it is mainly focused on Jesus. It may also matter whether the action/narration words, setting, and character names are still in a prominent place.
  • We may be able to tell that a document is "next generation" material (from a logical point of view) if it starts by assuming that Jesus is the Christ, as shown by a high usage of the word "Christ" compared to "Jesus".
  • We may need to look for relationships between key words -- like between "Jesus" and "Christ" -- where the difference shows that a historically earlier viewpoint is now taken as "given".
  • To interpret the findings correctly, we may need to look for detail vs. summary types of differences, or specifics compared to abstractions, like Jesus' kind encounters with various people as opposed to Paul's mention of "grace" or "mercy".
  • The details of the differences between two documents can show, objectively, where the focus of an author lies and bring out themes that might be missed otherwise.
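
As one illustration of the second and third points above, the relative usage of "Christ" and "Jesus" in a text could be measured with a simple count ratio. This is a rough sketch under assumptions: the function name `term_ratio` is hypothetical, the sample strings are invented, and no threshold for what counts as "high usage" is established in this series.

```python
import re
from collections import Counter

def term_ratio(text, numerator="christ", denominator="jesus"):
    """Return count(numerator) / count(denominator), or None if the denominator never appears."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    if counts[denominator] == 0:
        return None  # Ratio is undefined when the denominator word is absent.
    return counts[numerator] / counts[denominator]

# Hypothetical sample strings, not quotations from any translation.
narrative_style = "Jesus went out. Jesus spoke to the crowd."
epistle_style = "Christ Jesus our Lord; in Christ we have peace."

print(term_ratio(narrative_style))
print(term_ratio(epistle_style))
```

A ratio above 1 would mean "Christ" appears more often than "Jesus" in that text; how to interpret any particular value would still need calibration against a good number of documents.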