For the last post I created a map of the birthplaces of Soviet prisoners in Germany based on a German-Russian database. I did this using a Python module called Geocoder, which has three nice features compared to others I have used. First, other modules throw errors if Google (or Yandex, etc.) cannot find a location, whereas Geocoder simply returns an empty object. When processing thousands of locations, not having to restart the script after an error is a big plus. Second, it interfaces with all of the major GIS services (Google, Yandex, etc.) and is easy to code with. Third, it somehow gets around using an API key (i.e., registering with Google etc. and keeping track of a long alphanumeric key) in its queries. For novice programmers who need to geocode lots of places, this is a great module.
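Below is a minimal sketch of that kind of error-tolerant batch run. It assumes the `geocoder` package is installed and that its provider functions (here `geocoder.yandex`) return a result object with `.ok` and `.latlng` attributes. Depending on the service and package version, a key may now be required. The function name `geocode_all` is hypothetical. The provider is passed in as a parameter so it can be swapped for Google, or for a stand-in when testing without a network connection.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLng = Tuple[float, float]


def geocode_all(places: Iterable[str],
                provider: Optional[Callable[[str], object]] = None
                ) -> Dict[str, Optional[LatLng]]:
    """Geocode every place, recording None for misses instead of stopping."""
    if provider is None:
        # Assumption: the geocoder package is installed and geocoder.yandex
        # returns a result object exposing .ok and .latlng.
        import geocoder
        provider = geocoder.yandex

    results: Dict[str, Optional[LatLng]] = {}
    for place in places:
        if place in results:
            # Repeated birthplaces are only queried once.
            continue
        g = provider(place)
        # An unmatched query yields an "empty" result (ok is False) rather
        # than an exception, so the loop just records None and moves on.
        if getattr(g, "ok", False) and g.latlng:
            lat, lng = g.latlng
            results[place] = (float(lat), float(lng))
        else:
            results[place] = None
    return results
```

Misses come back as `None`, so they can be counted and rerun through another service later instead of halting a run of hundreds of thousands of queries.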
The danger of using an automated script to geocode, though, is that Google and Yandex don't know where everything is and may even return bad results. Companies have different strategies for providing results, and Yandex is more aggressive than Google about returning coordinates for a query. For example, for the Soviet prisoners in the last post, I had about 270,000 locations I wanted to find, mostly in the former USSR. I ran a sample through both Google and Yandex, and Yandex returned results for far more of them. Assuming that Yandex had better GIS data for the former Soviet Union, I had it process the entire list. It returned results for most of them, something like 250,000. The problem, though, is that Yandex aggressively autocorrects. For a village named Koromenskaia, Yandex assumed I meant Kolomenskaia, a metro station in Moscow. In other cases, Yandex read "Village X, Voronezh Province" as simply Voronezh Province and geocoded it to the center of that province.
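The province-center fallback leaves a detectable trace: many differently named villages end up at exactly the same coordinates. The following is a rough sketch of a check for that pattern. It assumes results are stored as a dict from query string to `(lat, lng)` or `None`. The function name `shared_coordinates` and the rounding to four decimal places are illustrative choices, not an established method.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

LatLng = Tuple[float, float]


def shared_coordinates(results: Dict[str, Optional[LatLng]],
                       min_queries: int = 2,
                       digits: int = 4) -> Dict[LatLng, List[str]]:
    """Group distinct queries that resolved to the same (rounded) point.

    Several different village names landing on one spot suggests the
    geocoder fell back to a province center or another larger feature.
    """
    groups: Dict[LatLng, List[str]] = defaultdict(list)
    for query, coords in results.items():
        if coords is None:
            continue  # misses are handled separately
        key = (round(coords[0], digits), round(coords[1], digits))
        groups[key].append(query)
    # Keep only the points shared by at least min_queries distinct queries.
    return {point: sorted(qs) for point, qs in groups.items()
            if len(qs) >= min_queries}
```

This will not catch a one-off autocorrection like Koromenskaia to Kolomenskaia unless other queries also land on that metro station. Alternate spellings of the same real village will also share coordinates legitimately, so the output is a list to review rather than a list of errors.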
For a map of monuments uploaded to the memorial cataloging website Pomnite-Nas, I used Google and was fortunate to have only a few hundred results that were clearly inaccurate when placed on a map. I corrected those few hundred by hand: painstaking work, but possible with that number. With the prisoner map, however, it was unclear how many listings were inaccurate, if not totally incorrect. With potentially tens of thousands of bad listings, it was impossible to know which ones they were, let alone correct them.
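Spotting points that are "clearly inaccurate when placed on a map" can be partly automated. One way is to measure how far each result lies from a reference point you already trust, such as the center of the province named in the record. This sketch uses the haversine great-circle distance. The reference points, the `max_km` threshold, and the function names are all hypothetical inputs you would have to supply yourself.

```python
import math
from typing import Dict, List, Optional, Tuple

LatLng = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance in kilometers between two (lat, lng) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def far_from_expected(points: Dict[str, Optional[LatLng]],
                      expected: Dict[str, LatLng],
                      max_km: float) -> List[str]:
    """Return queries whose geocoded point lies more than max_km from the
    reference point supplied for them (e.g. a hypothetical province center)."""
    flagged = []
    for query, coords in points.items():
        ref = expected.get(query)
        if coords is None or ref is None:
            continue  # nothing to compare
        if haversine_km(coords, ref) > max_km:
            flagged.append(query)
    return sorted(flagged)
```

A threshold loose enough to fit a large province will miss errors that stay inside it, such as a province-center fallback. This check is therefore better at catching results that jump to the wrong region entirely, like a village resolved to a Moscow metro station.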
I still think Yandex is worth using for people working on the post-Soviet republics. For example, I just used Yandex to find a place called Miagit in Magadan Province after Google failed. But anyone geocoding should exercise caution with any of these services, or they may end up with false results.
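Trying Google first and falling back to Yandex only when Google comes up empty can be written as a small loop over providers. This sketch also records which service produced each hit, so the Yandex results can be reviewed more carefully afterwards. It assumes the same result-object shape as the `geocoder` package (`.ok`, `.latlng`). The function name and the `(name, function)` provider pairs are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

LatLng = Tuple[float, float]
# A provider is a (label, function) pair, e.g. ("google", geocoder.google).
Provider = Tuple[str, Callable[[str], object]]


def geocode_with_fallback(place: str,
                          providers: Sequence[Provider]
                          ) -> Optional[Tuple[str, LatLng]]:
    """Return (provider label, (lat, lng)) from the first provider that
    finds the place, or None if every provider comes back empty."""
    for label, provider in providers:
        g = provider(place)
        if getattr(g, "ok", False) and g.latlng:
            lat, lng = g.latlng
            return label, (float(lat), float(lng))
    return None
```

With the `geocoder` package, the call might look like `geocode_with_fallback("Miagit", [("google", geocoder.google), ("yandex", geocoder.yandex)])`. Keeping the label alongside the coordinates makes it easy to pull out every result that came from the more aggressive service for a second look.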