Tuesday, December 18, 2012

Comparing Mark and Luke with Mathematical Models

I promise there is a point to these document comparisons. I haven't yet calculated all of the comparisons that I intend, but I have read the documents in question, and I have no doubt that an objective, computer-based comparison like this will turn up interesting results. In the meantime, I did notice a few things when comparing Mark with Luke that might interest the general reader.

The short version of the results

Shared Word Estimate: 65%
Shared Emphasis Estimate: 64%*
* The originally listed figure of 53% came from a calculation that had problems with word pairs whose frequencies differ widely: the shared value for such a pair could come out lower than the smaller of the two frequencies, or even negative. The revised figure should be a more solid reflection of what the two documents share.

There is less similarity between Mark and Luke than we previously saw between Mark and Matthew. The notes on the Shared Emphasis Estimate point out where the differences are found.

Notes on the Shared Word Estimate

Mark is a shorter document and has 48 words in its high-frequency word list, which is limited to words that would make at least a 1% difference in the total, as discussed previously. Of those 48 words, 31 are also in Luke's high-frequency word list, calculated in the same way. So 31/48 ≈ 65%, rounded to the nearest whole number. Again, since the percentages involved are already effectively rounded when we leave out low-frequency words, it does not seem warranted to use a lot of decimals in the percentage.
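For readers who want to try this at home, here is a minimal sketch of the shared-word arithmetic in Python. The function name and the toy word lists are my own placeholders, not the real high-frequency lists, and the sketch assumes the lists have already been cut down to the high-frequency words by whatever threshold you are using.

```python
def shared_word_estimate(base_words, other_words):
    """Percent of the base list's words that also appear in the other list.

    Both arguments are assumed to be high-frequency word lists that were
    already filtered by the same threshold. Hypothetical helper, for
    illustration only.
    """
    base = set(base_words)
    if not base:
        raise ValueError("base word list is empty")
    shared = base & set(other_words)
    # Round to a whole percent, since the inputs are already coarse.
    return round(100 * len(shared) / len(base))


# Toy lists (placeholders, not real data): 2 of the 4 base words are shared.
print(shared_word_estimate(["a", "b", "c", "d"], ["a", "b", "x"]))  # 50

# The counts reported above: 31 shared words out of Mark's 48.
print(round(100 * 31 / 48))  # 65
```

Note that the estimate is measured against the first list, so swapping the two documents would generally give a different percentage.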

Notes on the Shared Emphasis Estimate

Again, the two highest-frequency words are the same in both documents: "Jesus" and "man". And again Luke's list is broader than Mark's: it contains 52 high-frequency words. When we look at where the differences occur, there are some points of interest.

When comparing Mark's top words to Luke's, there are 17 that are not on Luke's top words list: son, around, anyone, mother, Peter, boat, hands, eat, days, others, sitting, truth, twelve, chief, evil, James, and looked. Then there are the words emphasized noticeably less in Luke than in Mark: Jesus (though still by far the top word) and disciples. Some of the less-used words are related: Peter, twelve, James, and disciples. There seems to be noticeably less emphasis on the disciples in Luke than in Mark. That is consistent with early accounts that Luke was a companion of Paul's: his account shows less interaction with Jesus' disciples than is found in Mark.

I have noticed one problem with the calculations up to this point: the original calculation can give a shared word a negative net effect if the difference between its two frequencies is larger than the original frequency itself. It may give more accurate results to simply use the smaller of the two frequency scores for the word in question, which may be 0 if the word is not found in the second document. At any rate, I will finish up a few more sample comparisons before trying any updates to the calculation.
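To make the problem concrete, here is a rough sketch in Python. It is one reading of the description above, not the exact calculation used for these posts: the function names are my own, the frequencies are made-up toy numbers, and dividing by the first document's total is an assumption on my part.

```python
def net_effect_by_difference(f_base, f_other):
    """One reading of the original approach (an assumption): start from the
    base frequency and subtract the gap between the two frequencies."""
    return f_base - abs(f_base - f_other)


def shared_emphasis_min(base_freq, other_freq):
    """Proposed alternative: credit each word with the smaller of its two
    frequencies, using 0 when the word is missing from the other document.
    Normalizing by the base document's total is an assumption."""
    total = sum(base_freq.values())
    if total == 0:
        raise ValueError("base frequencies sum to zero")
    shared = sum(min(f, other_freq.get(word, 0.0))
                 for word, f in base_freq.items())
    return shared / total


# Toy frequencies in percent (made up for illustration, not real data).
base = {"alpha": 4.0, "beta": 2.0, "gamma": 1.0}
other = {"alpha": 3.0, "beta": 5.0}

# "beta" differs by 3.0, which is more than its base frequency of 2.0,
# so the difference-based value goes negative.
print(net_effect_by_difference(2.0, 5.0))   # -1.0
# The smaller-of-the-two rule credits "beta" with 2.0 instead.
print(min(2.0, 5.0))                        # 2.0
# Overall: (3.0 + 2.0 + 0.0) / 7.0
print(shared_emphasis_min(base, other))
```

With the smaller-of-the-two rule, no word can contribute less than 0 or more than its own frequency, so the total stays between 0% and 100%.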
