Elemental Words

Last night, my colleague Matthew Hall tweeted

With the recent news that the 7th row of the periodic table has been filled, I figured this would be a good time to follow up on Matthew’s request and identify such elemental words.

There are a lot of word lists available online. Being an ex-Scrabble addict, the OSPD came to mind. So, using the SOWPODS word list of 267,751 words, I put together a quick Python program to identify words that can be constructed from 1- and 2-letter element symbols. (The symbols of the newly confirmed elements, Uut, Uuo, Uup and Uus, don’t occur in any English words.) Importantly, the two letters of a 2-letter symbol must be contiguous in the word. This means that a word like ABRI (a shelter) is not an elemental word: it contains Boron and Iodine, but the A and R are not contiguous and so wouldn’t correspond to Argon. (It could also be read as containing Bromine and Iodine, but then the remaining A doesn’t match any element.)
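To make the contiguity rule concrete, here is a minimal sketch of a single-word check. It is not the program from this post: the inline `SYMBOLS` list and the helper name `is_elemental` are assumptions made for illustration, and `re.fullmatch` requires Python 3.4 or later.

```python
import re

# Symbols for elements 1-112 plus Fl and Lv. The three-letter placeholder
# symbols (Uut, Uup, Uus, Uuo) are left out, as described in the post.
# This inline list is an assumption; the post reads symbols from a file.
SYMBOLS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Fl Lv
""".split()

# One or more symbols placed back to back, ignoring case.
ELEMENTAL = re.compile('(?:' + '|'.join(SYMBOLS) + ')+', re.I)


def is_elemental(word):
    """Return True if the whole word can be tiled with element symbols."""
    return ELEMENTAL.fullmatch(word) is not None


for w in ('ABRI', 'acoustical', 'bacon'):
    print(w, is_elemental(w))
```

ABRI is rejected because neither A nor AB is a symbol, so no tiling can even start, while ACOUSTICAL can be read as Ac O U S Ti C Al.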

The code below takes 2.0s (originally 4.1s) to process SOWPODS and identifies 40,989 “elemental words” (the original version found only 19,698). Thanks to Noel O’Boyle for suggesting the use of a regex and directly extracting matches (so avoiding looping over individual words) and Rich Lewis for generating output in element-case.

```python
from __future__ import print_function
import sys, re

if len(sys.argv) != 2:
    print('Usage: code.py WORD_LIST_FILE_NAME')
    sys.exit(0)

# Read the whole word list (one word per line) into a single string
wordlist = sys.argv[1]
with open(wordlist, 'r') as f:
    words = f.read()
print('Dictionary has %d words' % words.count('\n'))

# Map lowercase symbol -> properly cased symbol, e.g. 'ac' -> 'Ac'
with open('elements.txt', 'r') as eles:
    elems = {e.lower(): e for e in eles.read().split() if e != ''}

# Match every line that consists solely of element symbols
valid_w = re.findall('(^(?:' + '|'.join(elems.keys()) + ')+?$)', words, re.I | re.M)
print('Found %d elemental words' % (len(valid_w)))

# Rewrite a word in element-case. This pattern is case-sensitive,
# so it expects lowercase input words.
pattern = re.compile('|'.join(elems.keys()))
elementify = lambda s: pattern.sub(lambda x: elems[x.group()], s)

with open('elemental-%s' % (wordlist), 'w') as o:
    for w in valid_w:
        o.write(elementify(w) + "\n")
```

Just for fun, I also extracted all the titles from Wiktionary, irrespective of language. That gives me a list of 2,726,436 words to examine. After 20s (originally 35s) I got 370,724 “elemental words” (the original version found only 148,211).

You can find the code, along with the element symbol list and input files, in this repository.

Update: Thanks to Noel’s suggestion of a regex, I realized my initial implementation had a bug and did not identify all elemental words in a dictionary. The updated code now does, and does it 50% faster.

Update: Thanks to Rich Lewis for providing a patch to output matching words in element-case (e.g., AcOUSTiCAl).
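For readers who want the explicit sequence of symbols behind an element-case word, the following is a rough sketch of a different technique from the regex substitution used in the code above: a memoized backtracking search over the word. The inline `SYMBOLS` list and the names `BY_LOWER` and `element_case` are assumptions for illustration and are not part of the repository.

```python
from functools import lru_cache

# Symbols for elements 1-112 plus Fl and Lv (an inline list assumed here;
# the post's code reads the symbols from elements.txt instead).
SYMBOLS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Fl Lv
""".split()

# Lowercase symbol -> properly cased symbol, e.g. 'ti' -> 'Ti'
BY_LOWER = {s.lower(): s for s in SYMBOLS}


def element_case(word):
    """Return the word in element-case, or None if it cannot be tiled."""
    w = word.lower()

    @lru_cache(maxsize=None)
    def tile(i):
        # Tuple of symbols covering w[i:], or None if no tiling exists.
        if i == len(w):
            return ()
        for size in (2, 1):  # try a two-letter symbol first, then one letter
            chunk = w[i:i + size]
            if len(chunk) == size and chunk in BY_LOWER:
                rest = tile(i + size)
                if rest is not None:
                    return (BY_LOWER[chunk],) + rest
        return None

    parts = tile(0)
    return None if parts is None else ''.join(parts)


print(element_case('acoustical'))  # AcOUSTiCAl
print(element_case('ABRI'))        # None
```

Because the search backtracks, a choice such as reading CA as Ca in ACOUSTICAL is undone when the trailing L cannot be matched, and the tiling C + Al is used instead. A word may have several valid tilings; this sketch returns only the first one it finds.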