No words for my reality
I’ve been running a 5,066-entry wordlist through an algorithm designed to rank the words in descending order of fitness according to a gauntlet of tests. Each word is assigned a net score between 0 and 1 that indicates how well it is able to filter distractors (unrelated terms) from targets (related terms).
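To make the scoring idea concrete, here is a minimal sketch of how one such test might score a word. This is not the actual algorithm: the `discrimination_score` function, the `relatedness` table, and every number in it are invented for illustration, and the real gauntlet combines several tests into the net score.

```python
def discrimination_score(word, targets, distractors, relatedness):
    """Fraction of (target, distractor) pairs in which the target is
    more related to `word` than the distractor is. Always in [0, 1]."""
    wins = 0
    total = 0
    for target in targets:
        for distractor in distractors:
            total += 1
            # Missing pairs are treated as completely unrelated (0.0).
            if relatedness.get((word, target), 0.0) > relatedness.get((word, distractor), 0.0):
                wins += 1
    return wins / total if total else 0.0


# Hypothetical relatedness values, made up for this example.
relatedness = {
    ("bee", "sting"): 0.8,
    ("bee", "honey"): 0.7,
    ("bee", "parliament"): 0.1,
    ("bee", "petrol"): 0.75,
}

score = discrimination_score(
    "bee",
    targets=["sting", "honey"],
    distractors=["parliament", "petrol"],
    relatedness=relatedness,
)
print(score)  # 0.75: "honey" loses to "petrol" in one of the four pairs
```

A word that rates every target above every distractor would score 1, and a word that can't tell them apart at all would land near 0.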
I first ran my wordlist using a table of relatedness values obtained from Factiva. I also ran it using VGEM, an algorithm developed at CogWorks. VGEM still uses the Factiva data, but the relatedness of two terms is based on each term’s relatedness to a handful of dimension words.
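Roughly, the dimension-word idea looks like the sketch below. I’m assuming cosine similarity as the way to compare two terms, which may not be what VGEM actually does, and the dimension words and relatedness numbers are invented rather than taken from Factiva.

```python
import math

# Hypothetical dimension words; the real set isn't listed in this post.
DIMENSIONS = ["money", "family", "war"]

# Made-up relatedness of each term to each dimension word.
base_relatedness = {
    "governor": {"money": 0.4, "family": 0.1, "war": 0.5},
    "regime":   {"money": 0.3, "family": 0.1, "war": 0.6},
    "lamb":     {"money": 0.1, "family": 0.6, "war": 0.0},
}


def to_vector(term):
    """Represent a term by its relatedness to each dimension word."""
    return [base_relatedness[term][d] for d in DIMENSIONS]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_relatedness(term_a, term_b):
    """Relatedness of two terms, computed only from their dimension vectors."""
    return cosine(to_vector(term_a), to_vector(term_b))


print(vector_relatedness("governor", "regime"))
print(vector_relatedness("governor", "lamb"))
```

With these toy numbers, “governor” comes out much closer to “regime” than to “lamb,” even though the two terms are never compared directly.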
| | Factiva word | score | VGEM word | score |
|---|---|---|---|---|
| 1. | portrait | 0.1 | democratic | 0.331102 |
| 2. | flu | 0.1 | meanwhile | 0.329729 |
| 3. | chorus | 0.009 | governor | 0.327759 |
| 4. | purple | 0.008896 | saying | 0.326760 |
| 5. | bless | 0.008888 | opposition | 0.323261 |
| 6. | sheep | 0.008860 | tomorrow | 0.318894 |
| 7. | thriller | 0.008571 | crisis | 0.318799 |
| 8. | knee | 0.008571 | victory | 0.317104 |
| 9. | mask | 0.008333 | democracy | 0.313899 |
| 10. | lottery | 0.008182 | fell | 0.309087 |
Patterns?
Perhaps I’m imagining things, but the top words in each result set have some similar themes. Here are some I’ve picked from the straight Factiva set, in order (though I have skipped several words):
flu, bless, thriller, twilight, superior, ending, bald, sting, scream, unhappy, predator, amuse, scare, tired, worry, laugh, horror, hurt, wet, cry, scary, sick, illness, defeat, dawn, bee, monster, quit, sudden, hurry, hell, hairy, flame
Doesn’t seem like the most cheerful list of words. Compare to the VGEM words:
democratic, governor, opposition, tomorrow, crisis, victory, democracy, fell, rising, regime, parliament, boss, constitution, corruption, deputy, winning, advertiser, poverty, ruling, petrol, presidential, commissioner, disaster, spokesman
These are extremely government-oriented, though also with a fairly negative slant. One explanation is that the Factiva data is based on an aggregation of news sources, and those terms probably show up in news articles frequently.
Perhaps these lists together suggest something about American news: we focus on despair, plight, corruption, and bees.
And now some trivia
The highest-ranked word that appears in both lists is “sugar.”
According to Factiva, “conditioner” is four times as important as “shampoo.” VGEM says it’s only twice as important.
Factiva seems to have a lot to do with bees (probably thanks to their recent panic-inducing disappearance), ranking “sting” and “bee” highly. It also suggests a snack recipe of coke, cereal, and onion dip with an ounce of sugar, grapes, and milk. (Though it continues to suggest that the same recipe led a prince to become violently sick, followed by the words “bathroom, sudden, smooth, hurry, hell”.)
Even-toed ungulates get more representation than they deserve: sheep (and separately, lambs), goats, and cows show up as the top animals (along with lions, worms, and butterflies).