Thursday, February 07, 2013

Word Cloud: Qur'an

Allah(2803) Allah's(127) Moses(173) Pharaoh(85) account(79) angels(87) apostle(76) apostles(107) behold(194) believers(112) book(198) bring(115) brought(91) call(148) children(81) clear(145) command(89) companions(83) created(151) day(510) death(76) deeds(161) earth(406) evil(265) exalted(96) faith(232) fear(272) find(92) fire(187) follow(137) forgiveness(75) full(152) garden(75) gardens(78) gave(92) god(82) good(264) grace(78) great(76) grievous(87) guidance(122) hands(83) hearts(157) heavens(199) judgment(133) knowledge(175) land(87) leave(115) life(213) light(77) lord(935) man(207) men(364) merciful(108) mercy(162) message(114) messenger(171) nay(115) night(103) order(107) path(94) penalty(217) people(508) place(84) power(190) prayer(92) punishment(131) put(96) qur'an(80) receive(85) reject(177) rejected(98) remember(77) reward(156) righteous(108) send(86) servants(87) set(75) show(85) sign(88) signs(324) soul(95) souls(78) time(101) true(109) truth(310) turn(208) turned(78) unbelievers(172) understand(83) verily(310) wisdom(79) witness(95) women(95) word(85) work(75) world(80) worship(136) wrong(114)

Technical notes:

There were some unexpected obstacles in producing this word cloud.
  1. Unfortunately, the Qur'an seems to be too large for the free word cloud generator I used previously. This could matter, because different generators may use different rules for counting word frequency, for example whether singular and plural forms are counted together. I may need to re-run some of my previous texts through the new generator used here to make sure it gives comparable results. 
  2. I had to remove two-word phrases from this generator's results, since phrases are not currently among the things we analyze. 
  3. The translation I used, though made in modern times, is written in "King James" English. This kept the word cloud generator from behaving normally: a generator typically filters out words like "you", "your", and "will" as not among the most important words in a text, but has no such automatic filtering for older equivalents like "ye", "thy", and "wilt". I did some careful searching and replacing to modernize the language, at least for forms common enough to affect the word cloud results. This made the results more comparable to all the other texts, which used modern English translations.
  4. The translation also had a large number of words in parentheses, the convention that usually marks words not in the original text but added by the translator. There were surprisingly many such parenthetical words, enough that it would be worthwhile to repeat the exercise with a different translation. 
  5. In some places the text contains abbreviations (such as "a.l.m" for "Alif la mem"). These could skew the word counts slightly, though they do not occur often enough to change the results by much. 
  6. The text used in this analysis does not include the invocation that the website printed at the top of each surah ("In the name of Allah, Most Gracious, Most Merciful"), because that phrase was presented as distinct from the regular text of the surah: it was set in a separate typeface and a different color, above the first numbered verse. I think the text might best be analyzed both with and without the invocation as part of the official text, to see the difference; when given two options, I generally like to see both. The invocation contains only a few words that the analysis filters do not routinely screen out (name, Allah, most, gracious, merciful), and "Allah" is already the most common keyword. So while I favor more analysis, this decision would not change the most common keyword.
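
The post does not say exactly which replacements or filters were applied, so what follows is only a minimal Python sketch, under stated assumptions, of the cleanup described in notes 3, 4 and 6: modernizing archaic forms, optionally dropping parenthetical words, filtering common words, and counting keywords with and without the invocation. The archaic-word mapping, the short stop-word list, the function names, and the sample sentence are all hypothetical illustrations, not the generator or the text actually used here.

```python
import re
from collections import Counter

# Hypothetical archaic-to-modern mapping; the replacements actually used are not listed.
ARCHAIC = {
    "ye": "you", "thee": "you", "thou": "you",
    "thy": "your", "thine": "your",
    "wilt": "will", "hath": "has", "doth": "does",
}

# Illustrative stop-word list; real word cloud generators use much longer ones.
# "most" is deliberately left out, matching note 6.
STOP_WORDS = {
    "the", "and", "of", "to", "a", "in", "is", "you", "your", "will", "who",
    "that", "for", "they", "them", "he", "it", "not", "on", "has", "does",
    "be", "shall", "we",
}

INVOCATION = "In the name of Allah, Most Gracious, Most Merciful"

# Made-up sample sentence, not a quotation from any translation.
SAMPLE = "Ye shall (surely) find thy Lord merciful. Thy Lord is merciful."

_ARCHAIC_RE = re.compile(r"\b(" + "|".join(ARCHAIC) + r")\b", re.IGNORECASE)


def modernize(text):
    """Replace whole-word archaic forms (any case) with modern equivalents."""
    return _ARCHAIC_RE.sub(lambda m: ARCHAIC[m.group(0).lower()], text)


def strip_parentheticals(text):
    """Remove parenthesized words, treating them as translator additions."""
    return re.sub(r"\([^)]*\)", " ", text)


def count_keywords(text, drop_parentheticals=False):
    """Modernize, optionally drop parentheticals, filter stop words, count."""
    text = modernize(text)
    if drop_parentheticals:
        text = strip_parentheticals(text)
    # Keep inner apostrophes so "allah's" and "qur'an" stay single tokens.
    tokens = (w.strip("'") for w in re.findall(r"[a-z']+", text.lower()))
    return Counter(t for t in tokens if t and t not in STOP_WORDS)


def compare_invocation(surahs):
    """Return keyword counts without and with the invocation before each surah."""
    without, with_inv = Counter(), Counter()
    invocation_counts = count_keywords(INVOCATION)
    for surah in surahs:
        body = count_keywords(surah)
        without += body
        with_inv += body + invocation_counts
    return without, with_inv


if __name__ == "__main__":
    without, with_inv = compare_invocation([SAMPLE])
    print("without invocation:", without.most_common(3))
    print("with invocation:   ", with_inv.most_common(3))
```

Running both counts side by side is one way to follow the "see them both" approach from note 6: if the top keyword is the same in both, the decision to leave the invocation out does not affect the headline result.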
To sum up, getting reasonably clean results took more hands-on work than was ideal, and some specific items may still need a second look. These results should be considered a rough estimate until those issues are researched more thoroughly.

