Back in 2017, Matthew Cooper wrote a post discussing the problems with language maps and outlined some ways to make better maps of language distributions.
One major issue with most modern maps of languages is that they often consist of just a single point for each language – this is the approach that WALS and Glottolog take. This works pretty well for global-scale analyses, but simple points are quite uninformative for region-scale studies of languages. Points also have a hard time spatially describing languages with disjoint distributions, like English, or languages that overlap spatially. […]
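To make the disjoint-distribution problem concrete, here is a minimal sketch of what happens when a widely dispersed language like English is collapsed to a single point. The coordinates are rough illustrative values I've picked for this example, not actual WALS or Glottolog data:

```python
# Approximate (lat, lon) for a few widely separated English-speaking areas.
# These are rough illustrative values, not WALS/Glottolog coordinates.
english_points = {
    "United Kingdom": (54.0, -2.0),
    "United States": (39.0, -98.0),
    "Australia": (-25.0, 134.0),
}

def centroid(points):
    """Naive arithmetic mean of lat/lon pairs (ignores sphericity)."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)

lat, lon = centroid(english_points.values())
print(f"Single-point summary: ({lat:.1f}, {lon:.1f})")
```

The mean lands in the middle of the Sahara, nowhere near any actual English-speaking region – a lone point simply cannot represent a disjoint distribution.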
One reason that most language geographers go for the one-point-per-language approach is that a single point is easy to produce, while mapping languages across regions and areas is very difficult. An expert must decide where exactly one language ends and another begins. The problem with relying on experts, however, is that no expert has uniform experience across an entire region, and so must rely on other accounts of which language is prevalent where. […]
I believe that, thanks to the greater computational efficiency offered by modern computers and new datasets available from social media, it is increasingly possible to develop better maps of language distributions using geotagged text data rather than an expert’s opinion. In this post, I’ll cover two projects I’ve done to map languages—one using data from Twitter in the Philippines, and another using computationally intensive algorithms to classify toponyms in West Africa.
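The core of the geotagged-text idea can be sketched in a few lines: bin (lat, lon, language) records into grid cells and take the majority language per cell. The records below are invented toy data standing in for language-tagged tweets, and the one-degree grid is an arbitrary choice, not the resolution used in the actual projects:

```python
from collections import Counter, defaultdict

# Toy (lat, lon, language) records standing in for geotagged, language-
# identified posts. These are invented values, not real Twitter data.
records = [
    (14.6, 121.0, "tl"),   # Manila area, Tagalog
    (14.7, 121.1, "tl"),
    (14.6, 121.0, "en"),
    (10.3, 123.9, "ceb"),  # Cebu area, Cebuano
    (10.4, 123.9, "ceb"),
]

CELL = 1.0  # grid resolution in degrees (arbitrary for this sketch)

def cell_of(lat, lon):
    """Snap a coordinate to the lower-left corner of its grid cell."""
    return (int(lat // CELL), int(lon // CELL))

# Count language occurrences per grid cell.
counts = defaultdict(Counter)
for lat, lon, lang in records:
    counts[cell_of(lat, lon)][lang] += 1

# Keep the majority language in each occupied cell.
lang_map = {cell: c.most_common(1)[0][0] for cell, c in counts.items()}
print(lang_map)
```

A real pipeline would add a language-identification step for the raw text and a much finer grid, but the aggregation logic stays essentially this simple.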