Tuesday, September 19, 2017

Arguments from authority, and the Cladistic Ghost, in historical linguistics


Arguments from authority play an important role in our daily lives and our societies. In political discussions, we often point to the opinion of trusted authorities if we do not know enough about the matter at hand. In medicine, favorable opinions by respected authorities function as one of four levels of evidence (admittedly, the lowest) to judge the strength of a medicament. In advertising, the (at times doubtful) authority of celebrities is used to convince us that a certain product will change our lives.

Arguments from authority are useful, since they allow us to have an opinion without fully understanding it. Given the ever-increasing complexity of the world in which we live, we could not do without them. We need to build on the opinions and conclusions of others in order to construct our personal little realm of convictions and insights. This is specifically important for scientific research, since it is based on a huge network of trust in the correctness of previous studies which no single researcher could check in a lifetime.

Arguments from authority are, however, also dangerous if we blindly trust them without critical evaluation. To err is human, and there is no guarantee that the analysis of our favorite authorities is always error-proof. For example, famous linguists, such as Ferdinand de Saussure (1857-1913) or Antoine Meillet (1866-1936), revolutionized the field of historical linguistics, and their theories had a huge impact on the way we compare languages today. Nevertheless, this does not mean that they were right in all their theories and analyses, and we should never trust any theory or methodological principle only because it was proposed by Meillet or Saussure.

Since people tend to avoid asking why their authority came to a certain conclusion, arguments from authority can be easily abused. In the extreme, this may culminate in totalitarian societies, or societies ruled by religious fanaticism. To a smaller degree, we can also find this totalitarian attitude in science, where researchers may end up blindly trusting the theory of a certain authority without critically investigating it any further.

The comparative method

The authority in this context does not necessarily need to be a real person; it can also be a theory or a certain methodology. The financial crisis of 2008 can be taken as an example of a methodology, namely classical "economic forecasting", that turned out to be trusted much more than it deserved. In historical linguistics, we have a similar quasi-religious attitude towards our traditional comparative method (see Weiss 2014 for an overview), which we use in order to compare languages. This "method" is in fact no method at all, but rather a large collection of techniques by which linguists have been comparing and reconstructing languages during the past 200 years. These include the detection of cognate or "homologous" words across languages, and the inference of regular sound correspondence patterns (which I discussed in a blog post from October last year), but also the reconstruction of sounds and words of ancestral languages not attested in written records, and the inference of the phylogeny of a given language family.

In all of these matters, the comparative method enjoys a quasi-religious authority in historical linguistics. Saying that they do not follow the comparative method in their work is among the worst things you can say to historical linguists. It hurts. We are conditioned from when we were small to feel this pain. This is all the more surprising, given that scholars rarely agree on the specifics of the methodology, as one can see from the table below, where I compare the key tasks that different authors attribute to the "method" in the literature. I think one can easily see that there is not much of an overlap, nor a pattern.

Varying accounts on the "comparative methods" in the linguistic literature

It is difficult to tell how this attitude evolved. The foundations of the comparative method go back to the early work of scholars in the 19th century, who managed to demonstrate the genealogical relationship of the Indo-European languages. Already in these early times, we can find hints regarding the "methodology" of "comparative grammar" (see for example Atkinson 1875), but judging from the literature I have read, it seems that it was not before the early 20th century that people began to introduce the techniques for historical language comparison as a methodological framework.

How this framework became the framework for language comparison, although it was never really established as such, is even less clear to me. At some point the linguistic world (which was always characterized by aggressive battles among colleagues, fought in the open in numerous publications) decided that the numerous techniques for historical language comparison which had turned out to be the most successful ones up to that point constituted a specific method, and that this specific method was so extremely well established that no alternative approach could ever compete with it.

Biologists, who have experienced drastic methodological changes during the last decades, may wonder how scientists could believe that any practice, theory, or method is everlasting, untouchable and infallible. In fact, the comparative method in historical linguistics is always changing, since it is a label rather than a true framework with fixed rules. Our insights into various aspects of language change are constantly increasing, and as a result, the way we practice the comparative method is also improving. We thus keep using the same label, but the product we sell is different from the one we sold decades ago. Historical linguists are, however, very conservative regarding the authorities they trust, and our field was always very skeptical regarding any new methodologies which were proposed.

Morris Swadesh (1909-1967), for example, proposed a quantitative approach to infer divergence dates of language pairs (Swadesh 1950 and later), which was refuted almost immediately (Hoijer 1956, Bergsland and Vogt 1962). Swadesh's assumption of constant rates of lexical change was surely problematic, but his general idea of looking at lexical change from the perspective of a fixed set of meanings was very creative at the time, and it has given rise to many interesting investigations (see, among others, Haspelmath and Tadmor 2009). As a result of this early rejection, quantitative work was largely disregarded in the following decades. Not many people paid any attention to David Sankoff's (1969) PhD thesis, in which he tried to develop improved models of lexical change in order to infer language phylogenies, which is probably the reason why Sankoff later turned to biology, where his work received the appreciation it deserved.

Shared innovations

Since the beginning of the third millennium, quantitative studies have enjoyed a new popularity in historical linguistics, as can be seen in the numerous papers that have been devoted to automatically inferred phylogenies (see Gray and Atkinson 2003 and passim). The field has begun to accept these methods as additional tools to provide an understanding of how our languages evolved into their current shape. But scholars tend to contrast these new techniques sharply with the "classical approaches", namely the different modules of the comparative method. Many scholars also still assume that the only valid technique by which phylogenies (be they trees or networks) can be inferred is to identify shared innovations in the languages under investigation (Donohue et al. 2012, François 2014).

The idea of shared innovations was first proposed by Brugmann (1884), and has its direct counterpart in Hennig's (1950) framework of cladistics. In a later book by Brugmann, we find the following passage on shared innovations (or synapomorphies in Hennig's terminology):
The only thing that can shed light on the relation among the individual language branches [...] are the specific correspondences between two or more of them, the innovations, by which each time certain language branches have advanced in comparison with other branches in their development. (Brugmann 1967[1886]:24, my translation)
Unfortunately, not many people seem to have read Brugmann's original text in full. Brugmann says that subgrouping requires the identification of shared innovative traits (as opposed to shared retentions), but he remains skeptical about whether this can be done in a satisfying way, since we often do not know whether certain traits developed independently, were borrowed at later stages, or are simply being misidentified as being "shared". Brugmann's proposed solution to this is to claim that shared, potentially innovative traits, should be numerous enough to reduce the possibility of chance.

While biology has long since abandoned the cladistic idea, turning instead to quantitative (mostly stochastic) approaches in phylogenetic reconstruction, linguists are surprisingly stubborn in this regard. It is beyond question that those uniquely shared traits among languages that are unlikely to have evolved by chance or language contact are good proxies for subgrouping. But they are often very hard to identify, and this is probably also the reason why our understanding of the phylogeny of the Indo-European language family has not improved much during the past 100 years. In situations where we lack any striking evidence, quantitative approaches may as well be used to infer potentially innovated traits. If we do a better job of listing these cases (current software, which was designed by biologists, is not really helpful in logging all decisions and inferences made by the algorithms), we could profit a lot from turning to computer-assisted frameworks, in which experts thoroughly evaluate the inferences made by the automatic approaches in order to generate new hypotheses and improve our understanding of our languages' past.

A further problem with cladistics is that scholars often use the term shared innovation for inferences, whereas the cladistic toolkit, and the reason why Brugmann and Hennig thought that shared innovations are needed for subgrouping, rest on the assumption that one knows the true evolutionary history (De Laet 2005: 85). Since the true evolutionary history is a tree in the cladistic sense, an innovation can only be identified if one knows the tree. This means, however, that one cannot use the innovations to infer the tree, since the tree would have to be known in advance. What scholars thus mean when talking about shared innovations in linguistics are potentially shared innovations, that is, characters that are diagnostic of subgrouping.

Conclusions

Given how quickly science evolves and how non-permanent our knowledge and our methodologies are, I would never claim that the new quantitative approaches are the only way to deal with trees or networks in historical linguistics. The last word on this debate has not yet been spoken, and while I am critical of many points, there are also many points for concrete improvement (List 2016). But I see very clearly that our tendency as historical linguists to take the comparative method as the only authoritative way to arrive at a valid subgrouping is not leading us anywhere.

Do computational approaches really switch off the light which illuminates classical historical linguistics?

In a recent review, Stefan Georg, an expert on Altaic languages, writes that the recent computational approaches to phylogenetic reconstruction in historical linguistics "switch out the light which has illuminated Indo-European linguistics for generations (by switching on some computers)", and that they "reduce this discipline to the pre-modern guesswork stage [...] in the belief that all that processing power can replace the available knowledge about these languages [...] and will produce ‘results’ which are worth the paper they are printed on" (Georg 2017: 372, footnote). It seems to me that, if a discipline has been enlightened too much by its blind trust in authorities, it is not the worst idea to switch off the light once in a while.

References
  • Anttila, R. (1972): An introduction to historical and comparative linguistics. Macmillan: New York.
  • Atkinson, R. (1875): Comparative grammar of the Dravidian languages. Hermathena 2.3. 60-106.
  • Bergsland, K. and H. Vogt (1962): On the validity of glottochronology. Current Anthropology 3.2. 115-153.
  • Brugmann, K. (1884): Zur Frage nach den Verwandtschaftsverhältnissen der indogermanischen Sprachen [Questions regarding the closer relationship of the Indo-European languages]. Internationale Zeitschrift für allgemeine Sprachwissenschaft 1. 228-256.
  • Bußmann, H. (2002): Lexikon der Sprachwissenschaft. Kröner: Stuttgart.
  • De Laet, J. (2005): Parsimony and the problem of inapplicables in sequence data. In: Albert, V. (ed.): Parsimony, phylogeny, and genomics. Oxford University Press: Oxford. 81-116.
  • Donohue, M., T. Denham, and S. Oppenheimer (2012): New methodologies for historical linguistics? Calibrating a lexicon-based methodology for diffusion vs. subgrouping. Diachronica 29.4. 505–522.
  • Fleischhauer, J. (2009): A Phylogenetic Interpretation of the Comparative Method. Journal of Language Relationship 2. 115-138.
  • Fox, A. (1995): Linguistic reconstruction. An introduction to theory and method. Oxford University Press: Oxford.
  • François, A. (2014): Trees, waves and linkages: models of language diversification. In: Bowern, C. and B. Evans (eds.): The Routledge handbook of historical linguistics. Routledge: 161-189.
  • Georg, S. (2017): The Role of Paradigmatic Morphology in Historical, Areal and Genealogical Linguistics. Journal of Language Contact 10. 353-381.
  • Glück, H. (2000): Metzler-Lexikon Sprache. Metzler: Stuttgart.
  • Gray, R. and Q. Atkinson (2003): Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426.6965. 435-439.
  • Harrison, S. (2003): On the limits of the comparative method. In: Joseph, B. and R. Janda (eds.): The handbook of historical linguistics. Blackwell: Malden and Oxford and Melbourne and Berlin. 213-243.
  • Haspelmath, M. and U. Tadmor (2009): The Loanword Typology project and the World Loanword Database. In: Haspelmath, M. and U. Tadmor (eds.): Loanwords in the world’s languages. de Gruyter: Berlin and New York. 1-34.
  • Hennig, W. (1950): Grundzüge einer Theorie der phylogenetischen Systematik. Deutscher Zentralverlag: Berlin.
  • Hoenigswald, H. (1960): Phonetic similarity in internal reconstruction. Language 36.2. 191-192.
  • Hoijer, H. (1956): Lexicostatistics. A critique. Language 32.1. 49-60.
  • Jarceva, V. (1990): . Sovetskaja Enciklopedija: Moscow.
  • Klimov, G. (1990): Osnovy lingvističeskoj komparativistiki [Foundations of comparative linguistics]. Nauka: Moscow.
  • Lehmann, W. (1969): Einführung in die historische Linguistik. Carl Winter:
  • List, J.-M. (2016): Beyond cognacy: Historical relations between words and their implication for phylogenetic reconstruction. Journal of Language Evolution 1.2. 119-136.
  • Makaev, E. (1977): Obščaja teorija sravnitel’nogo jazykoznanija [Common theory of comparative linguistics]. Nauka: Moscow.
  • Matthews, P. (1997): Oxford concise dictionary of linguistics. Oxford University Press: Oxford.
  • Rankin, R. (2003): The comparative method. In: Joseph, B. and R. Janda (eds.): The handbook of historical linguistics. Blackwell: Malden and Oxford and Melbourne and Berlin.
  • Sankoff, D. (1969): Historical linguistics as stochastic process. McGill University: Montreal.
  • Weiss, M. (2014): The comparative method. In: Bowern, C. and B. Evans (eds.): The Routledge handbook of historical linguistics. Routledge: New York. 127-145.

Monday, September 11, 2017

A network of political parties competing for the 2017 Bundestag


Many elections now have some sort of online black box that allows you to see which political party or candidate has the highest overlap with your own personal political opinions. This is intended to help voters with their decisions. However, the black boxes usually lack any documentation regarding how different the viewpoints of the competing parties / candidates are. Exploratory data analysis via Neighbour-nets may be of some use in these cases.

As a European Union citizen (of German and Swedish nationality) I am entitled to live and work in any EU country. I currently live in France, but I cannot vote for the parliament (Assemblée nationale) and government (M. Le Président) that affects my daily life, and decides on the taxes, etc, that I have to pay. However, I’m still eligible to vote in Germany (in theory; in practice it is a bit more complex).


The next election (Bundestagswahl) is closing in for the national parliament of the Federal Republic of Germany, the Bundestag (equivalent to the lower house of other bicameral legislatures). To help the voters, a new Wahl-O-Mat (described below) has been launched by the Federal Institute of Political Education (Bundeszentrale für politische Bildung, BPB). This is a fun thing to participate in, even if you have already made up your mind about who to vote for.

Each election year, the BPB develops and sends out a questionnaire with theses (83 this year) to all of the political parties that will compete in the election. The parties can answer with ‘agree’, ‘no opinion / neutral’, or ‘don’t agree’ for each thesis. The 38 most controversially discussed political questions have been included in the Wahl-O-Mat, and you can also answer them for yourself. As a final step, you can choose eight of the political parties competing for the Bundestag, and the online black box will show you an agreement percentage between you and them in the form of a bar-chart diagram.

But as a phylogeneticist / data-analyst, I am naturally sceptical when it comes to mere percentages and bar charts. Furthermore, I would like to know how similar the parties’ opinions are to each other, to start with. An overview is provided, with all of the answers from the parties, but it is difficult to compare these across pages (each page of the PDF lists four parties, in the same order as on the selection page). The Wahl-O-Mat informs you that a high fit of your answers with more than one party does not necessarily indicate a closeness between the parties — you may, after all, be agreeing with them on different theses.

This means that the percentage of agreement between me and the political parties would provide a similarity measure, which I can use to compare the political parties with each other. But how discriminatory are my percentages of agreement (from the larger perspective)?

A network analysis

There are 33 parties competing for seats in the forthcoming Bundestag; one did not respond. Another one, the Party for Health Research (PfHR, a one-topic party), answered all 38 questions with 'neutral'. However, the makers of the Wahl-O-Mat still had to include it; and since that party provided no opinion on any of the questions, I scored 50% agreement with them (since I answered every question with 'yes' or 'no'). This is more than with the Liberal Party (because we actually disagree on half of the 38 questions). This is a flaw in the Wahl-O-Mat. If you say 'yes' (or 'no') to a thesis that the party has no opinion on, then it is counted as one point, while two points are awarded for a direct match. However, it does not work the other way around: having no opinion on any question brings up a window telling you that your preference cannot be properly evaluated.
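The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as I describe it here, not the BPB's actual code: the function name and the answer labels are made up for the example, and I assume that the maximum score is two points per thesis and that the voter answers every thesis with 'yes' or 'no' (how neutral answers from the voter are scored is not covered).

```python
def agreement_percent(user_answers, party_answers):
    """Percentage agreement between a voter and a party.

    Hypothetical helper illustrating the rule described in the text:
    2 points for a direct match, 1 point if the party is neutral,
    0 points for a direct disagreement.
    Answers are the strings 'yes', 'no' or 'neutral'.
    """
    if len(user_answers) != len(party_answers):
        raise ValueError("both answer lists must cover the same theses")
    points = 0
    for user, party in zip(user_answers, party_answers):
        if user not in ("yes", "no"):
            # Assumption: only decided voters are handled in this sketch.
            raise ValueError("this sketch only handles 'yes'/'no' voters")
        if user == party:
            points += 2   # direct match
        elif party == "neutral":
            points += 1   # party has no opinion on this thesis
    # Assumption: the maximum is 2 points per thesis.
    return 100.0 * points / (2 * len(user_answers))


# A party that is neutral on all 38 theses, as the PfHR was:
me = ["yes", "no"] * 19
neutral_party = ["neutral"] * 38
print(agreement_percent(me, neutral_party))
```

Under these assumptions, a party that answers everything with 'neutral' collects exactly half of the available points from any fully decided voter, which is where the 50% comes from.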

Because of this, I determined my position relative to the political parties using a neighbour-net. The primary character matrix is binary, where 0 = ‘no’, 1 = ‘yes’ and ‘?’ stands for no opinion (neutral), and the parties are compared using simple (Hamming) pairwise distances. So, if two parties disagree on all of the theses, their pairwise distance will be 1. If there is no disagreement, the pairwise distance will be 0. Since the PfHR has provided no opinion, I left it out (i.e. its pairwise distances are undefined).
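For readers who want to reproduce this kind of distance, here is a minimal Python sketch of a normalized Hamming distance on such 0/1/? strings. The function name is hypothetical, and the treatment of ‘?’ is an assumption on my side: theses where either party is neutral are simply skipped (pairwise deletion), which may differ from what a given network program does internally.

```python
def hamming_distance(a, b, missing="?"):
    """Normalized Hamming distance between two answer strings.

    Characters are '0' (no), '1' (yes) or '?' (neutral).
    Assumption: positions where either string has '?' are ignored.
    Returns None when no position can be compared (distance undefined).
    """
    if len(a) != len(b):
        raise ValueError("both strings must cover the same theses")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == missing or y == missing:
            continue  # skip theses without a clear answer from both
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        return None  # e.g. a party that answered everything with '?'
    return differing / compared


# Two made-up parties that disagree on every thesis:
print(hamming_distance("0101", "1010"))
# An all-neutral party has no defined distance to anyone:
print(hamming_distance("????", "0101"))
```

With this convention, complete disagreement gives 1, complete agreement gives 0, and an all-neutral row yields undefined distances, which is why such a party has to be left out of the matrix.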

Fig. 1 Neighbour-net of German political parties competing in the 2017 election (not including me). Parties of the far-left and far-right are bracketed, for political orientation. Parties with a high chance to get into the next Bundestag (passing the 5% threshold) are in bold. [See also this analysis by The Political Compass, for comparison].

The resulting network (Figure 1) is quite fitting: the traditional perception of parties (left-wing versus right-wing) is well captured. Parties, like the ÖDP (green and conservative), that do not fit into the classic left-right scheme are placed in an isolated position.

The graph reveals a (not very surprising) closeness between the two largest German political parties, the original Volksparteien (all-people parties): the CDU/CSU (centre-right, the party of the current Chancellor) and the SPD (centre-left). The SPD is the current (and potentially future) junior partner of the CDU/CSU, its main competitor. According to the graph, an alternative, more natural, junior partner of the CDU/CSU would be the (neo-)liberal party, the FDP.

The parties of the far-right are placed at the end of a pronounced network stem; that is, they are the ones that deviate most from the consensus shared by all of the other parties. They are (still) substantially closer to the centre-right parties than to those from the (extreme) left. However, the edge lengths show that, for example, a hypothetical CDU/CSU–AfD coalition (the AfD is the only right-wing party with a high chance to pass the 5% threshold) would have to join two parties with many conflicting viewpoints. That is, regarding their answers to the 38 questions, in general the CSU appears to be much closer to the AfD than to its sister party, the CDU.

Regarding the political left, the graph depicts its long-known political-structure problem: there are many parties, some with very unique viewpoints (producing longer terminal network edges); but overall there is little difference between them. The most distinct parties in this cluster are the Green Party (Die Grünen) and the Humanist Party (Die Humanisten), a microparty promoting humanism (see also Fig. 2).

Any formal inference is bound by its analysis rules, which may represent the primary signal suboptimally. The neighbour-net is a planar graph, but profiles of political parties may require more than two dimensions to do a good job. So let's take a look at the underlying distance matrix using a ‘heat map’ (Figure 2).

Fig. 2 Heat-map based on the same distance matrix as used for inferring the neighbour-net in Fig. 1. Note the general similarity of left-leaning parties and their distinctness to the right-leaning parties.

We can see that the Left Party (Die Linke) and the Bündnis Grundeinkommen (BGE), a single-topic party founded to promote a basic income without conditions, don’t disagree on any point, and that the declining Pirate Party (flagged as social-liberal on Wikipedia) has turned sharp left. The Party for Animal Protection (Tierschutzpartei) and the Party of Vegetarians and Vegans (V3) should discuss a merger; whereas the Alliance for Animal Protection (Tierschutzallianz) is their more conservative counterpart, being much closer to e.g. the CDU/CSU.

We can also see that the party with the highest agreement with the SPD is still the Greens (Die Grünen). Furthermore, although the FDP and the Pirate Party have little in common, the Humanist Party (Die Humanisten) may be a good alternative when you’re undecided between the other two. [Well, it would be, if in Germany each vote counted the same, but the 5% threshold invalidates all votes cast for parties not passing the threshold.] The most unique party, regarding its set of answers and the resulting pairwise distances, is a right-wing microparty (see the network above) supporting direct democracy (Volksabstimmung).

Applications such as the Wahl-O-Mat are put up for many elections, and when documented in the way done by the German Federal Institute of Political Education, provide a nice opportunity to assess, using networks, how close the competing parties (officially) are.

PS. For our German readers who are as yet undecided: the primary character matrix (NEXUS-formatted) and related files can be found here.

Tuesday, September 5, 2017

SPECTRE: a suite of phylogenetic tools for reticulate evolution


Recently, the Earlham Institute, in the UK, released a set of software tools that are of relevance to this blog — SPECTRE. These tools are described in a forthcoming paper:
Sarah Bastkowski, Daniel Mapleson, Andreas Spillner, Taoyang Wu, Monika Balvočiūte and Vincent Moulton (2017) SPECTRE: a Suite of PhylogEnetiC Tools for Reticulate Evolution.

This is a toolkit rather than a simple-to-use program, meaning that the various analyses exist as separate entities that can be combined in any way you like. More importantly, new analyses can be added easily, by those who want to write them, which is not the case for more commonly used programs like SplitsTree. This way, the analyses can also be incorporated into processing pipelines, rather than only being used interactively.

Apart from the usual access to data files (including Nexus, Phylip, Newick, Emboss and FastA formats), the following network analyses are currently available:
NeighborNet, NetMake, QNet, SuperQ, FlatNJ, NetME
The program also outputs the networks, of course. Here is an example of the SPECTRE equivalent of a NeighborNet analysis from a recent blog post (where the network was produced by SplitsTree, and then colored by me).


Running the program(s) is relatively straightforward, once you get things installed. Installation packages are available for OSX, Windows and Linux.

Sadly, for me installation was tricky, because SPECTRE requires Java v.8, which is unfortunately not available for OSX 10.6 (which runs on most of my computers). Even getting Java v.8 installed on the one computer I have with a later version of OSX was not easy, because installing a Java Runtime Environment (the JRE download file) from Oracle does not update the Java -version symlinks or add Java to the software path — for this I had to install the full Java Development Kit (the JDK download file). Sometimes, I hate computers!