Text Mining in Diplomatic History: Nuclear War, Disarmament, and Detente

Here is a Google N-Grams search including the terms ‘nuclear war,’ ‘disarmament,’ and ‘detente.’

Google N-Grams searches for instances of the search terms in the Google library database, which includes millions of scanned volumes, journals, and newspapers from libraries and universities across the world. I set the search parameters to begin in 1945: the birth of the nuclear age.

The visualisation of this search is particularly interesting, as within its three lines a number of historical events and themes are represented. The rising Cold War tensions that culminated in the 1962 Cuban Missile Crisis, for example, are shown by the increasing frequency of ‘disarmament’ and ‘nuclear war’ in the Google library database over the late 1950s and early 1960s.

This N-Gram search seems to graphically represent the correlation between rising Cold War tensions and support for more conciliatory policies, including disarmament. (It should be noted that instances of ‘disarmament’ in raw numbers exceed those of ‘nuclear war’ because there are a number of terms that describe atomic warfare, whereas the policy of disarmament is a more fixed term.)
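
One way to make the comparison with ‘disarmament’ fairer would be to add together the yearly frequencies of the different terms for atomic warfare. The following is a minimal Python sketch of that idea, not a description of how Google N-Grams works internally. The synonym list (‘atomic war’, ‘thermonuclear war’) is my own assumption, the function name is hypothetical, and the numbers are placeholders rather than real N-Gram values.

```python
from collections import defaultdict


def combine_variants(series_by_term, variants):
    """Sum per-year frequencies across several variants of one concept.

    series_by_term maps a term to a {year: frequency} dictionary.
    Variants missing from series_by_term are simply skipped.
    """
    combined = defaultdict(float)
    for term in variants:
        for year, freq in series_by_term.get(term, {}).items():
            combined[year] += freq
    # Return a plain dict ordered by year for easy plotting or printing.
    return dict(sorted(combined.items()))


# Placeholder values only: these are NOT real N-Gram frequencies.
series_by_term = {
    "nuclear war": {1960: 2.0, 1961: 3.0},
    "atomic war": {1960: 1.0, 1961: 0.5},
    "thermonuclear war": {1961: 0.25},
}

# Assumed list of synonyms for atomic warfare; adjust to suit the research question.
nuclear_family = combine_variants(
    series_by_term, ["nuclear war", "atomic war", "thermonuclear war"]
)
print(nuclear_family)
```

The combined series could then be plotted against ‘disarmament’ in place of the single ‘nuclear war’ line.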

The term ‘detente’ alongside ‘nuclear war’ and ‘disarmament’ demonstrates the lowering of Cold War tensions from the early to late 1970s. A sharp increase in ‘nuclear war’ instances in 1980 clearly pinpoints the decisive end point of the period of detente, at least as far as language in text is concerned. (The lag in the ‘detente’ curve is explained by the fact that the policy was still discussed in written text, and these discussions are reflected in the ‘detente’ N-Gram frequency.)

The focus of my own research is the 1975 to 1981 period, during which the Committee on the Present Danger sought to educate Americans about the supposed “dangers of detente.” The work of the group and its pro-defense allies is represented in the increasing frequency of ‘nuclear war’ after 1975, when the CPD formed. I would suggest that as the Committee on the Present Danger and its allies conducted their activities, discussions of nuclear war – here reflected in Google’s scanned texts – increased.

N-Grams cannot prove that the CPD itself was successful in its aims, but it does illustrate that the conversation about Cold War policies altered in the late 1970s. Alternative archival evidence suggests the Committee on the Present Danger did prompt discussion of nuclear war within the Carter Administration, culminating in the release of PD-59 in July 1980.

The beginning of Ronald Reagan’s and Mikhail Gorbachev’s summit diplomacy is reflected in the significant drop-off in instances of ‘nuclear war’ and ‘disarmament’ after 1986. Could this N-Grams search add credence to the suggestion that the Cold War was all but over after the Reykjavik Summit in 1986? To be sure, significant diplomatic and political hurdles remained, but the language of the Cold War – two highly significant terms within it at least – was falling out of use from this date. There is clearly room for debate about causation, but it is a debate worth conducting.

Ultimately, N-Grams by itself proves little. It is a useful tool, however, for visualising the change over time in the use of the specialised language surrounding a historical issue. “Change over time” is what historians do, and I am excited about how I can use N-Grams in my own research.

  1. Mark says:

    I include three in the introduction to my thesis. I like them. They are good for showing change in language over time.

  2. Pascal Venier says:

    This is a really fascinating use of data mining for a diplomatic history research project. I have had great fun plotting some terms of relevance to my own research on the pre-1914 period, and could not help but note that the searches are case sensitive. The results are therefore slightly different if we use both disarmament and Disarmament, or the different spellings of detente (detente, Detente, détente and Détente), which can be combined in a single search as in “detente+Detente+détente+Détente”. I was also wondering about the difference between the corpora: there is an English corpus, a British English corpus and an American corpus. Do they overlap, or have the documents been indexed separately? To put it another way: does the English corpus also include the British corpus and the American corpus? Looking at trends over the longer term, I cannot help but note that a term such as rapprochement is also quite frequent.

