Thursday, December 19, 2013

Quick Hit: ASEEES Paper Topics

I'm hoping to have something a little more in depth about Soviet dissertations and/or quantifying the topics of plagiarized papers in Russia. In the meantime, I have a bit of data from the conferences of the Association for Slavic, East European, and Eurasian Studies (ASEEES, the former AAASS), the main scholarly association for Russian and Eurasian studies.

For this, I parsed the records of the national ASEEES/AAASS conventions from 2004 to 2012 and counted the words that appear, excluding very common words (stopwords). I don't think the results will shock anyone, but they are interesting to see. Russian, Soviet and Russia are by far the three most common words, which reflects the dominance of Russian studies in ASEEES. However, I was a little surprised that Poland and Polish were the 14th and 21st most common. One last observation before I post the hundred most common words in the data: Stalin > Dostoevsky > Tolstoy > Pushkin > Putin.

Rank Word Total Frequency
1 RUSSIAN 881 1.58%
2 SOVIET 761 1.37%
3 RUSSIA 639 1.15%
4 WAR 444 0.80%
5 POLITICS 240 0.43%
6 POLITICAL 228 0.41%
7 NEW 228 0.41%
8 IDENTITY 204 0.37%
9 WORLD 203 0.36%
10 STATE 195 0.35%
11 WOMEN 181 0.33%
12 CULTURE 179 0.32%
13 NATIONAL 172 0.31%
14 POLAND 161 0.29%
15 LIFE 159 0.29%
16 HISTORY 158 0.28%
17 LATE 158 0.28%
18 IMPERIAL 155 0.28%
19 EARLY 147 0.26%
20 JEWISH 146 0.26%
21 POLISH 144 0.26%
22 CASE 143 0.26%
23 EUROPE 142 0.25%
24 CENTURY 141 0.25%
25 SOCIAL 140 0.25%
26 UNION 133 0.24%
27 LITERATURE 132 0.24%
28 SOCIALIST 131 0.24%
29 UKRAINE 128 0.23%
30 REVOLUTION 125 0.22%
31 LANGUAGE 122 0.22%
32 EUROPEAN 122 0.22%
33 CENTRAL 121 0.22%
34 EAST 121 0.22%
35 POST-SOVIET 120 0.22%
36 POLICY 119 0.21%
37 MEMORY 118 0.21%
38 UKRAINIAN 117 0.21%
39 CONTEMPORARY 113 0.20%
40 CULTURAL 111 0.20%
41 ART 107 0.19%
42 POETRY 102 0.18%
43 GENDER 101 0.18%
44 EMPIRE 100 0.18%
45 PUBLIC 100 0.18%
46 MOSCOW 99 0.18%
47 ECONOMIC 97 0.17%
48 SOCIETY 97 0.17%
49 COMMUNIST 95 0.17%
50 CINEMA 95 0.17%
51 SPACE 94 0.17%
52 POWER 93 0.17%
53 HUNGARY 91 0.16%
54 VIOLENCE 90 0.16%
55 CZECH 89 0.16%
56 LITERARY 88 0.16%
57 WRITING 86 0.15%
58 ROLE 86 0.15%
59 CHILDREN 85 0.15%
60 RELATIONS 84 0.15%
61 FILM 84 0.15%
62 AMERICAN 80 0.14%
63 GREAT 79 0.14%
64 STALIN 79 0.14%
65 INTERNATIONAL 79 0.14%
66 DOSTOEVSKY 78 0.14%
67 EASTERN 78 0.14%
68 ERA 75 0.13%
69 RELIGIOUS 74 0.13%
70 TOLSTOY 74 0.13%
71 FOREIGN 74 0.13%
72 PUSHKIN 71 0.13%
73 LOCAL 71 0.13%
74 READING 70 0.13%
75 GERMAN 70 0.13%
76 NATION 69 0.12%
77 COLD 68 0.12%
78 WESTERN 68 0.12%
79 YUGOSLAVIA 68 0.12%
80 HUNGARIAN 66 0.12%
81 CIVIL 66 0.12%
82 1930S 66 0.12%
83 CITY 66 0.12%
84 FAMILY 66 0.12%
85 MAKING 65 0.12%
86 PUTIN 64 0.11%
87 REVOLUTIONARY 64 0.11%
88 II 64 0.11%
89 CHURCH 63 0.11%
90 MODERN 63 0.11%
91 ASIA 62 0.11%
92 MUSIC 62 0.11%
93 SLAVIC 62 0.11%
94 PERSPECTIVE 62 0.11%
95 STALINIST 62 0.11%
96 PARTY 62 0.11%
97 HISTORICAL 61 0.11%
98 DEVELOPMENT 61 0.11%
99 NATIONALISM 61 0.11%
100 CONSTRUCTION 60 0.11%
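
For anyone who wants to try a count like the one in the table above, here is a minimal sketch in Python. It is not the code behind the table: the stopword list is a tiny hypothetical stand-in, the function names are made up for illustration, and it assumes that the percentage is each word's share of the words left after stopwords are removed.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would need a fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "for", "on", "as", "from"}


def count_title_words(titles, stopwords=STOPWORDS):
    """Count non-stopword tokens across all titles."""
    counts = Counter()
    for title in titles:
        # Letters and digits, with internal hyphens kept so POST-SOVIET stays one token.
        for word in re.findall(r"[^\W_]+(?:-[^\W_]+)*", title.upper()):
            if word.lower() not in stopwords:
                counts[word] += 1
    return counts


def frequency_table(titles, top=100):
    """Return (rank, word, total, share-of-kept-words) rows, most common first."""
    counts = count_title_words(titles)
    kept = sum(counts.values())
    return [
        (rank, word, n, f"{n / kept:.2%}")
        for rank, (word, n) in enumerate(counts.most_common(top), start=1)
    ]


# Made-up titles, only to show the shape of the output.
sample = [
    "The Politics of Memory in Post-Soviet Russia",
    "Russia and the Soviet Union",
    "Memory and War in Russia",
]
for row in frequency_table(sample, top=3):
    print(*row)
```

Keeping hyphenated words together is a design choice; splitting them would fold POST-SOVIET into the SOVIET count.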

Friday, November 1, 2013

Soviet and Post-Soviet History Dissertation Database

For the last month or so I have been putting together a database of the titles, authors and other data on history dissertations written in the Soviet Union and post-Soviet countries. I am now ready to make it available as a web application here. When I have some time, I am planning to post a little about the topics in the database, the geography of the database, the sources used or possibly something like the great chronological analysis of history dissertations that Ben Schmidt posted last spring (part one here). (I hesitate just because I strongly suspect that the turning point years for dissertations written in the Soviet period will follow the Communist Party's official chronology.)

Before I post a little about what is included in the database, I should put out a little crowdsourcing call: I used a great Python module called Goslate, a Google Translate parser, to generate the English titles of the dissertations. On the whole I think the translations are understandable, if not elegant. But in some cases the translations are just wrong. For example, L'viv translated automatically as Lions. This is where you come in. For each entry you can click on the English title and suggest a new translation. I'm also happy to include other corrections if there are errors.

Here is some information about the sources and general contours of the information in the database:

Sources: I came across listings for Soviet history dissertations in Voprosy istorii a few years ago and thought it would be a cool project to play around with if I could turn the listings into a database. I had the digitizing help of a former student, Aya Bara, and of EastView's digital archive of the journal. The journal's listings amount to 6,589 candidate and doctoral dissertations from 1945 to 1966 and another 909 doctoral dissertations from 1976 to 1986. The listings include dissertations from all of the disciplines included under the broader Higher Qualifications Commission (VAK) code for history--Soviet history, "general history" (i.e., non-Soviet), archaeology, ethnography and so on.

The second source I used was Dissercat, a website that sells recent dissertations written in former-Soviet and nearby countries (e.g., Mongolia--thanks for the heads up, Kyle Marquardt). I do not endorse using that site to purchase dissertations. I would instead recommend contacting the scholar whose work you would like to read, if not taking a trip to the Russian State Library the next time you are in Moscow. However, the Dissercat entry is worth checking out, since it often includes the dissertation's introduction, conclusion and bibliography. All told, Dissercat gave 10,868 dissertations on Eurasian history and another 7,420 in other VAK disciplines. These entries begin in the early 1980s (with a handful before) and they include dissertations from as late as this year (2013).

I do not know how complete the listings are. My impression is that Voprosy istorii included all of the dissertations for the years it covered. The Dissercat listings are good for the 2000s but spottier for earlier years, since the site is mostly interested in posting dissertations it can sell.

Years: The database includes 2,973 dissertations from 1945 to 1952, 3,622 from 1953 to 1964, 619 (mostly doctoral and mostly from later in the period) from 1965 to 1982, 1,394 from 1983 to 1991, 4,266 from 1992 to 1999 and 12,865 from 2000 to the present. Another forty-seven had no date but are probably from the post-1991 period.

Degree: Most of the dissertations (21,436) are for the lower, candidate degree. The remainder (4,350) are doctoral degrees.

Cities: I should note that in the Voprosy istorii listings that include a city, the city is the author's place of work. In the Dissercat listings, the city is probably where the dissertation was defended. Overall, there are roughly 200 different cities associated with these dissertations. About a quarter are from Moscow (6,187) and the next largest number are from Petersburg/Leningrad (1,764). But the largest group of all is the dissertations that did not list a city (6,590). This is especially unfortunate because it includes most of the listings from 1945-1964.

Institutions: Instead of giving cities of defense/work, many of the earlier listings included the institution, whereas the later listings mostly do not. Of the 6,885 that do list an institution, Moscow State University (MGU) has the largest number--1,028. MGU is followed by the Academy of Social Sciences of the Communist Party (AON) with 810, Leningrad State University with 519 and the Institute of History of the Academy of Sciences of the USSR with 457.

And just for fun, here are the top sixty-five substantive words (prepositions, articles and words like "late" or "early" removed) that occur in the English translations:

Rank Word Occurrences % of Total
1 russian 2973 0.76%
2 development 2650 0.68%
3 russia 2242 0.57%
4 party 2015 0.52%
5 history 1955 0.50%
6 policy 1930 0.49%
7 soviet 1904 0.49%
8 war 1829 0.47%
9 struggle 1790 0.46%
10 political 1774 0.45%
11 first 1731 0.44%
12 during 1702 0.44%
13 region 1594 0.41%
14 formation 1462 0.37%
15 state 1348 0.35%
16 historical 1340 0.34%
17 social 1338 0.34%
18 activities 1295 0.33%
19 relations 1270 0.33%
20 1917 1245 0.32%
21 materials 1227 0.31%
22 movement 978 0.25%
23 xviii 944 0.24%
24 province 902 0.23%
25 great 866 0.22%
26 education 854 0.22%
27 socialist 839 0.21%
28 communist 820 0.21%
29 cultural 792 0.20%
30 national 765 0.20%
31 siberia 744 0.19%
32 revolution 715 0.18%
33 culture 712 0.18%
34 economic 699 0.18%
35 world 698 0.18%
36 experience 678 0.17%
37 military 677 0.17%
38 republic 667 0.17%
39 system 662 0.17%
40 role 647 0.17%
41 ussr 621 0.16%
42 patriotic 617 0.16%
43 western 612 0.16%
44 problems 612 0.16%
45 1918 608 0.16%
46 life 601 0.15%
47 organization 600 0.15%
48 industry 588 0.15%
49 east 587 0.15%
50 public 579 0.15%
51 volga 576 0.15%
52 foreign 572 0.15%
53 population 556 0.14%
54 central 555 0.14%
55 north 540 0.14%
56 bolshevik 535 0.14%
57 1920 527 0.14%
58 urals 527 0.14%
59 moscow 526 0.13%
60 government 524 0.13%
61 power 514 0.13%
62 union 499 0.13%
63 caucasus 495 0.13%
64 society 485 0.12%
65 historiography 474 0.12%

Sunday, October 13, 2013

Slavic-specific resources for digital scholarship

I've just had a lesson come out on automatic transliteration of Cyrillic sources in The Programming Historian so I thought that I would devote this post to shameless self promotion. Then I decided I should also write a little about some of the tools I use to build databases from web information and create visualizations. I'll pay particular attention to resources that I have found useful for Russian/Eurasian history.

I do the bulk of my programming in a language called Python. Compared to other languages like Java or C, Python syntax is closer to natural language, making it easy to understand. The open-source community has put together many modules for Python, some of which are quite powerful and useful. There is a great full-length Python course available free at Udacity (Computer Science 101) and Codecademy has good interactive lessons for mastering syntax (not only for Python, by the way). The Programming Historian has lessons for Python and other tools geared toward humanities scholars who would like to learn specific skills (e.g., counting the words in a set of documents or downloading a set of web pages) without learning an entire language.

What makes Python indispensable for me is its ability to extract data easily from web pages. Using a parsing module called Beautiful Soup, you can, for example, go through all three million names in Memorial's list of gulag victims and generate a table (and then a map and a blog post) of the sources of the entries in about a dozen lines of code. The Programming Historian has two lessons that deal with Beautiful Soup.
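
To give a sense of what "about a dozen lines" of parsing looks like, here is a rough sketch of tallying the source named in each entry of a page. Two loud assumptions: the markup (a span with class "source" around each source name) is hypothetical and not the real structure of Memorial's pages, and the sketch uses the standard library's html.parser rather than Beautiful Soup so that it runs with nothing installed. With Beautiful Soup the same tally would be shorter.

```python
from collections import Counter
from html.parser import HTMLParser


class SourceCounter(HTMLParser):
    """Tally the text of every <span class="source"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._in_source = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "source") in attrs:
            self._in_source = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_source:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_source:
            self.counts["".join(self._buffer).strip()] += 1
            self._in_source = False


def count_sources(html):
    """Return a Counter mapping each source name to its number of entries."""
    parser = SourceCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Made-up page fragment, only to show the idea.
page = (
    '<p>Ivanov <span class="source">Kniga pamiati Tomskoi obl.</span></p>'
    '<p>Petrov <span class="source">Kniga pamiati Tomskoi obl.</span></p>'
    '<p>Sidorov <span class="source">Kniga pamiati Respubliki Komi</span></p>'
)
print(count_sources(page).most_common())
```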

These are general tools for generating and manipulating data--so what is there that is field specific? There are a couple of tools I have found especially useful for my work with sources in Russian. The first is a transliteration module I wrote for Python and wrote up for The Programming Historian. One of the challenges of doing digital scholarship in Russian or other languages that use Cyrillic characters is that computers like the American Standard Code for Information Interchange (ASCII), the set of characters based on the English alphabet. For this reason, I developed code that takes a block of text and transliterates whatever characters are in Cyrillic into Latin characters using the modified Library of Congress standard that historians write with. It is also possible to use a program like this for working with other alphabets. I've found this module useful in my own work, especially for transliterating large numbers of names for a non-Russian reading audience or for a non-Russian reading computer.
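
The core idea can be shown in a few lines. This is a simplified sketch and not the module from the lesson: the table covers only the modern Russian alphabet, follows the modified Library of Congress system without diacritics, and the name `transliterate` is just for illustration.

```python
# Modern Russian letters only; modified Library of Congress, no diacritics.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": '"', "ы": "y", "ь": "'", "э": "e", "ю": "iu", "я": "ia",
}


def transliterate(text):
    """Replace Cyrillic letters with Latin ones; leave everything else untouched."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            latin = CYR_TO_LAT[lower]
            # Keep capitalisation: an upper-case letter gives "Shch", not "SHCH".
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)


print(transliterate("Вопросы истории"))
print(transliterate("Польские заключенные"))
```

A character-by-character dictionary lookup is the simplest approach. A fuller version would need rules for other Cyrillic alphabets and for words written entirely in capitals.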

A tool I use often is geocoding with both Google's and Yandex's geocoding application programming interfaces (APIs). Using JavaScript, you can create dynamic webpages with maps by coding the locations into the page itself, by having users input the location or by using a database (e.g., Google Fusion Tables). [Update 6/2021: Fusion Tables was discontinued. ArcGIS is what I am using now for easy mapping of lots of points.] Through either service, you can use a Python module to access the latitudes and longitudes of locations. In general, I have found that Yandex (which I use for Python with the Yandex-Maps module) is more reliable and provides the coordinates for more locations within the former Soviet Union. However, Google (which I use for Python with the PyGeocoder module) is better elsewhere. Once I have these locations, it is easy to upload them to a Fusion Table and place them on a webpage through the Google Maps API.
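
One way to combine two services is to try the one you trust more for a given place and fall back to the other. The sketch below shows only that pattern. The two "geocoders" are hypothetical stand-ins (plain dictionary lookups with approximate coordinates), not calls to the Yandex-Maps or PyGeocoder modules; in practice each would be a small wrapper around the real service that returns a (latitude, longitude) pair or None.

```python
def geocode_with_fallback(place, primary, secondary):
    """Try primary(place), then secondary(place); return (lat, lon) or None."""
    for geocoder in (primary, secondary):
        try:
            coords = geocoder(place)
        except Exception:
            # A real service call can fail (network, quota); treat that as "not found".
            coords = None
        if coords is not None:
            return coords
    return None


# Hypothetical stand-ins for the two services; coordinates are approximate.
yandex_stub = {"Tomsk": (56.5, 85.0)}.get
google_stub = {"Warsaw": (52.2, 21.0)}.get

for place in ("Tomsk", "Warsaw", "Atlantis"):
    print(place, geocode_with_fallback(place, yandex_stub, google_stub))
```

For places outside the former Soviet Union, the same function can be called with the two geocoders in the opposite order.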

The last tool--related to the previous--is Google's Geocharts, one of the many charts available through Google Charts. Again, accessing this involves JavaScript. It is mostly cutting and pasting code in Google's tutorials but you can do some more interesting things if you know how to read JavaScript. And JavaScript is a nice thing to know anyway and can be learned quickly at Codecademy. Geocharts generates a density map or a marker map, where the size of the data is reflected in the color of the region or the size of the marker, respectively. What makes it especially useful for Russian/Eurasian studies (and area studies in general) is that it can create a map of the world, of a region (e.g., Eastern Europe) or of a single country. The Russia map includes the former Soviet Union and at least parts of all the former Eastern European satellites, which makes it quite a useful tool for displaying regional data. Moreover, it can break down the map by province. The problem with Geocharts is that it is quite inflexible. Getting a chart that includes, for example, both Russia and Germany means losing the ability to display province-level data for either country. All the same, what makes Geocharts an amazing tool is that it requires just a little more than basic HTML. The GIS software I am familiar with requires a lot more effort to get something as useful as Geocharts.

In sum, there are some great tools out there for learning how to put your computer to better use. Some of these tools are built for English and it can be frustrating trying to do digital scholarship in another language. However, there are ways to get around Cyrillic difficulties, and the geography of Eurasia makes a tool like Geocharts more useful than it is for other regions. The tools I posted here are just the ones I have been using, so I'd be interested to hear if anyone has found anything else out there for dealing with Russia/Eurasia data.

Friday, September 27, 2013

Applause!

Lately I have been thinking about using topic modeling for a project on commemoration of WWII using a large database of documents.* But along the lines of working with large databases of text, I have been impressed by Google's Ngrams, a tool that searches through the Google Books corpus to find the frequency of a word's use for each year. The Russian language corpus is quite good and it can provide some broad insights for historians. In particular I looked up an Ngram for "applause" after a day working with stenographic records from the Stalin era (see chart here):


Graph using three-year smoothing, meaning that results are averaged over the surrounding three years on each side. For example, 1937 is actually the average frequency from 1934-1940, inclusive.
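
To make the smoothing concrete, here is a small sketch of a centered moving average of the kind the caption describes. The yearly values are made up, and how the window is handled at the edges of the series (here, averaging whatever years are available) is my assumption rather than a description of what Ngrams does.

```python
def smooth(series, window=3):
    """Average each year's value with up to `window` years on each side."""
    smoothed = {}
    for year in sorted(series):
        neighbours = [
            series[y]
            for y in range(year - window, year + window + 1)
            if y in series  # years missing at the edges are simply skipped
        ]
        smoothed[year] = sum(neighbours) / len(neighbours)
    return smoothed


# Made-up frequencies for 1934-1940, chosen so the arithmetic is easy to check.
raw = {year: float(i) for i, year in enumerate(range(1934, 1941), start=1)}
print(smooth(raw)[1937])  # mean of the seven values for 1934-1940
```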

Anyone who has worked with Soviet-era documents or who has seen Soviet propaganda films knows that whenever important figures--above all Stalin--walked into a room, they would invariably be greeted with "applause," "thunderous applause," or (for the biggest names) "thunderous, continuous applause." Sometimes it seemed to me that the life of a party leader in the 1930s must have been a nightmare, surrounded constantly by deafening cheers. And the graph for applause does show a peak in the use of the word in the second half of the 1930s. But it also shows a spike in applause under Khrushchev, which then falls off before the frequency picks up again in the late Brezhnev era.

So what does this mean? This graph is registering the percentage of the total corpus that "applause" makes up. At the apex of each Soviet leader's authority the frequency of applause peaks. To me, it speaks to the way that applause reflected the certainty of authority. At times when the political hierarchy in the Soviet Union was unclear, who should be applauded was also unclear. Thus, during the post-Stalin and post-Khrushchev collective leaderships, who exactly should get how much applause was an open question.  In transitional times you didn't want to applaud someone (or probably more accurately, to describe that person being applauded in an account of a meeting) who might fall out of power. What this graph reflects is not approval, but agreement about the state of the political hierarchy. If we think about it this way, the question that comes up is why there was so little applause in the late 1940s, when Stalin was firmly in charge. My theory is that as Stalin aged he made fewer official appearances at meetings and therefore there were fewer records of applause.

This example shows that Ngrams on its own isn't enough without knowledge of the period that might provide an explanatory framework.  It also is not especially useful for trying to register the comparative relevance of ideas over time. Applause tops out at a little more than .001 percent of the total corpus in the 1930s. This seems insignificant compared to words like "love" (ranging from about .012 percent before the revolution to .045 percent lows in the 1930s and 1970s) or "death" (~.009 pre-revolution to .003 in the 1970s, with a big spike during World War II). But there is no way to register the relative importance of concepts related to "love" or "death" or "applause" from these numbers. Even using just one word, we have to be wary of changes in discursive practices. Just because "applause" doesn't appear very frequently until the late nineteenth century, it does not mean that the tsars had no authority. It simply signifies that the words that symbolized authority in written language differed from the Soviet period. (Look perhaps at "solemnly"?) Nonetheless, when the usage is consistent over a period of time, Ngrams seems like a useful tool for teaching or even for conceptualizing research.


*Topic modeling uses computer algorithms to sort through large databases of documents and generate sets of words (topic models) that generally appear in the same documents. The allure of this method is that if there is coherence in a large enough database (hundreds or thousands of documents), it seems possible to find some broad connections between documents in that data pool and find avenues for further research. The most popular tool now for topic modeling is called MALLET. It works well out of the box and because there is a graphical user interface available, it has a lot of potential as an introduction to topic modeling for humanities students. There is an easy-to-follow tutorial by Shawn Graham, Scott Weingart, and Ian Milligan on installing and getting started with MALLET at The Programming Historian. Another tool is called Paper Machines, an extension for Zotero. I am trying to use Gensim, a module that has the advantage of being in the programming language that I like the best (Python) and that can handle Cyrillic in Unicode. For more on topic modeling, see the Journal of Digital Humanities. In particular see Ben Schmidt's article, which is an excellent examination of topic modeling's limitations.

Monday, August 26, 2013

Who cares about Stalinist repression? Commemorative databases and regional historical memory

Among the key movements that came out of the Gorbachev period were those dedicated to remembering the victims of Stalinist repression. Although organizations like Memorial or the Sakharov Center for Human Rights have since moved to advocate generally for human rights, one of their main functions remains commemoration of the victims of Stalinism, especially by collecting and publishing lists of victims. Memorial has been particularly active in this regard, putting together a database of almost three million names. But its list of victims (available here) was for the most part not originally compiled by the central Memorial organization. Instead, it was researched by regional affiliates and other local groups. What I am doing today is looking at which regions have been finding these names as a rough assessment of where interest in commemorating Stalinist repression continues in Russia today.

Memorial's database includes not only people who were repressed during the Great Terror (1937-38) but also those exiled during collectivization or deported as members of the so-called "punished peoples" during World War II.  It is an incredible resource.  And Memorial's data (from this database and others) has been used for mapping the sites of the Stalinist repression pretty extensively at gulagmaps.org.  For every entry, Memorial gives as much information as it has about the victim, including name, birthplace, date of arrest or exile and so on.  But for this post, I am really interested in the source of the information itself--who collected the data on these people.

Memorial compiled this list on a decentralized basis. Each regional affiliate of Memorial or local historical society collected the names of victims and published these names as a "kniga pamiati," or "commemorative book." And yet not every region contributes equal numbers of victims and some regions contribute none at all. The introduction to the project explains some of the motivations and limitations of the project in the regions:

"In Russia, the process of collecting and publishing regional commemorative books remains the affair of the regions themselves. The country has no state program to memorialize the victims of political repression. There are no normative acts governing the preparation and publication of commemorative books, nor a standardized methodology or criteria for the collection of this data. Therefore the preparation of the books varies. In some places the books are prepared and published by the local administration or various institutions involved in one way or another with [formal legal] rehabilitation..., elsewhere by academic and cultural associations, and elsewhere they are published through the efforts of society with minimal or no support from regional authorities."

Of course, the collection and publication of this data does not correspond perfectly to the interest in commemorating the victims of Stalinism, and there are other factors involved. However, I am working from the assumption that there is a general correlation between the number of victims a territory commemorates and the interest in commemoration in that territory. I think this would generally hold true even in territories where the administration is less amenable to the project. For example, in regions where official financing for commemorative projects has been minimal, it is likely that the administration is not especially interested in facilitating the release of victims' names but also that the population is not interested in lobbying for these projects.

Using Python, I counted the number of listings for each collection and added collections from the same province together.  Most were easy to assign to a province, although a couple stumped me.  (Does anyone know who put out the databases Pol'skie zakliuchennye vorkutinskikh lagerei or Pol'skie spetspereselentsy v Arkhangel'skoi obl.? They account for almost all of the 67,000 entries I couldn't tie to a region.)  In total, I counted 2,644,774 names from regional historical associations with the other 300,000 published in Belarus, Kazakhstan, Kyrgyzstan, Ukraine and Uzbekistan.  With the data geocoded, I put it on a density map:


It's awfully small here with the provincial breakdown so I would recommend going to the bigger map, especially because I included the data and key for the geocodes there. (A limitation of Google's Geochart when working with provinces is that it demands the International Organization for Standardization's code rather than the proper name of a province.) What I expected when I thought of running this test was that Moscow and Petersburg would be the two big hot spots whose commemorative organizations would publish the names of lots of victims. This expectation made sense to me because those cities have more resources than other areas and because my impression is that there is a larger presence of anti-Stalinist cultural elites (e.g., in Memorial or the Sakharov Center). And the population is simply bigger in Moscow and Petersburg than elsewhere, although I planned to account for that by normalizing the number of victims per capita as of the 2010 Russian census. Yet the regions putting out the largest number of names (and even more so on a per capita basis) were not Moscow and Petersburg but rather places like Tomsk, Komi and Chechnia. In general, they have one or both of two qualities:
  1. Autonomous ethnic republics and territories
  2. Territories with large (and maybe just as important, infamous) Gulag camp complexes (see the map from 1931-1941 from gulagmaps.org below)



I was surprised by this result at first but I think it makes sense in a lot of ways. The ethnic republics where the most names of victims of Stalinism have been published are naturally the republics from which large numbers of the titular nationality were deported. The two leaders are Chechnia and Kalmykia, whose titular nationalities were expelled during World War II. And of course the contemporary troubles in the Caucasus also contribute to its being of special interest. But even in places like Tatarstan and Bashkortostan, where the titular nationalities were not systematically targeted with repression as far as I know, the numbers listed in the commemorative books are relatively high. Of the eighty-one territorial units, they ranked fourth and sixth in absolute numbers and fourteenth and nineteenth in per capita numbers, respectively. I would suggest two explanations for the high numbers in the autonomous republics. The first is that Stalinist repression has become a touchstone for differentiating a republic's identity from the rest of Russia. The second is that most of the Caucasians exiled during WWII from these territories went back once freed.
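
For readers curious about the mechanics behind rankings like these, here is a minimal sketch of summing collections by province and ranking the result both in absolute terms and per capita. The provinces "A", "B" and "C" and all of the numbers are placeholders, not figures from Memorial's database or the census, and the function names are made up for illustration.

```python
from collections import defaultdict


def totals_by_province(collections):
    """Sum the name counts of all collections assigned to the same province."""
    totals = defaultdict(int)
    for province, names in collections:
        totals[province] += names
    return dict(totals)


def per_capita(totals, population):
    """Names per resident, for provinces that have a population figure."""
    return {p: totals[p] / population[p] for p in totals if p in population}


def rank(values):
    """Map each province to its position, 1 being the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {province: position for position, province in enumerate(ordered, start=1)}


# Placeholder data: (province, names in one collection) and census populations.
collections = [("A", 200), ("A", 50), ("B", 400), ("C", 90)]
population = {"A": 1000, "B": 8000, "C": 300}

totals = totals_by_province(collections)
print(rank(totals))
print(rank(per_capita(totals, population)))
```

In this toy data the province with the most names overall drops to last place once population is taken into account, which is the kind of shift the two rankings are meant to expose.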

In provinces with high numbers and per capita proportions, connections between Memorial and the police seem to be playing a crucial role in names being published.  In Tomsk the Administration of Internal Affairs contributed 216,926 (!) names of the 240,256 total names listed from that territory.  I tried tracking down information on the connection there but found nothing concrete about it.  My feeling is that in general and in this particular case, it probably reflects the intensity of the efforts in the region to gain legal restitution for the politically repressed.  These efforts are related to the activity of local Memorial affiliates and similar organizations.  (The Tomsk Memorial website is great, by the way, and shows the dedication of that chapter.) [Update 6/2021: This site is now, unfortunately, off the internet. My amateur internet sleuthing suggests that the ownership of the URL lapsed and it was bought by a company trying to sell it back to the original owners. Who knows? A version of the site from 2013 is available at Internet Archive here.]  But they also reflect that many people sent to those areas remained there after their release, whether by choice or compulsion.

What is most interesting to me about these results is that they seem to reflect the site of settlement after repression and not the site of arrest. In a sense, it makes a nice companion piece to the map of the Great Terror in Moscow that I posted earlier. That data set allowed me to ask what areas were hit hardest by the terror. The bigger list of victims lets me ask what areas still remember the terror and other Stalinist repressive campaigns. It makes sense that the areas with the strongest efforts to commemorate the victims of Stalinist repression are those where the largest number of affected people live today. It reminds us of the impact those policies had and continue to have on the families of people repressed under Stalin and the regions where they live.

Tuesday, August 20, 2013

Amateur Demography: Human Sex Ratio Edition

I have been toying around with a little Python module that I wrote to turn a spreadsheet with data over time into a density map by region using Google Geochart.  It also codes in the ability to change the map for different data sets with JavaScript.  This is a little dangerous since I found tons of demographic data at the website of the Institute of Demography at the Higher School of Economics. (Self promotion alert: I will be starting a postdoc in history at HSE next month.)  It has tons (thousands) of tables from every census conducted in Russia and the Soviet Union, which makes it a great candidate for mapping.

Ideally, a map will tell a little story and unfortunately some of the data sets don't tell very interesting ones.  The variation over time is minimal and expected.  But there are other data sets that turn into really nice visuals.  Here is one of them--the proportion of men to women in the USSR by republic from 1926 to 1989 (change the category with the dropdown menu):


In the map, fewer men means lighter coloring and a very high ratio of men to women would display as dark red.  I set the scale to start close to the lowest value (.782 men to women in 1959 in Estonia) and the high end near the highest value (1.2 men to women in Turkmenistan in 1926).  
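
As a rough illustration of the step from spreadsheet to map, the sketch below turns a table of ratios into the header-plus-rows array that Google Charts' arrayToDataTable accepts, along with a colorAxis option fixing the two ends of the scale. It is not the module described above. The function name, the exact end points (0.78 and 1.2) and the two colors are assumptions, and the only data points used are the two values quoted in the text.

```python
import json


def geochart_payload(ratios, year, low=0.78, high=1.2):
    """Build the data rows and color options for one census year.

    `ratios` maps republic -> {year: men per woman}. Republics with no
    figure for `year` are left out, so they stay uncolored on the map.
    """
    rows = [["Republic", "Men per woman"]]
    rows += [
        [republic, by_year[year]]
        for republic, by_year in sorted(ratios.items())
        if year in by_year
    ]
    options = {
        # Fixed end points keep the colors comparable from one census to the next.
        "colorAxis": {"minValue": low, "maxValue": high,
                      "colors": ["#ffffff", "#8b0000"]},
    }
    return rows, options


# Only the two extreme values mentioned in the text.
ratios = {"Estonia": {1959: 0.782}, "Turkmenistan": {1926: 1.2}}
rows, options = geochart_payload(ratios, 1959)
print(json.dumps(rows))
print(json.dumps(options))
```

The printed JSON can be pasted into the JavaScript of the page; switching years then only means swapping in the rows for another census.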

The obvious change I expected was in the first post-war census (1959). The war was hard on the entire population but it absolutely destroyed the cohort of men who were of ordinary service age, especially at the start of the war. Catherine Merridale in Ivan's War cited an almost unimaginable statistic that 90 percent of men born in 1921 died in the war. And you can really see this by comparing the maps from 1939 and 1959. The entire Soviet Union bleeds out in the 1959 census but the contrast is especially strong in the areas where the war was fought.

One other thing popped out at me - the contrast of Russia, Ukraine and Belorussia with the Central Asian republics. (Also, Kazakhstan and Kyrgyzstan were autonomous republics of the Russian Soviet Republic until 1936 and so they do not show up in the 1926 census. Same with the Baltics and Moldova until 1959.) I don't know why the ratio of men to women was comparatively (and sometimes actually) high in Central Asia. The possibility that came to mind was that there was under-reporting of the female population that was part of the general disenfranchisement of women in those republics. Over time, the difference between the republics began to even out, which could suggest that programs to include women in the social and political culture of Central Asia had some effect (or at least made them legible as citizens to state authorities). If there are any Central Asianists who have thoughts on this I would be interested in a more informed opinion.

More of these kinds of maps forthcoming in the next few weeks.