Archive-name: sci-lang-faq
Version: 2.1
Last-modified: 7 Jun 1993

Written by Michael Covington (mcovingt@athena.cs.uga.edu)
Maintained by Mark Rosenfelder (markrose@spss.com)

Changes this month: Added a warning to section 8 that the list of
languages isn't exhaustive; and expanded section 12, to underline the
equality of dialects and add some caveats about mutual intelligibility.

These changes were prompted by a discussion about whether Scots is a
different language than English. I've removed some wording that implied
it was not, but otherwise I've sidestepped the issue. However, I am
still open to suggestions. A bit more discussion of the political
implications (see the last paragraph of sec. 12) would be welcome; any
volunteers?

NOTE: This FAQ file is short. Many good books and many important ideas
are left unmentioned. All readers should be aware that linguistics is a
young science and that linguists rarely agree 100% on anything.

PHONETIC SYMBOLS IN ASCII: The scheme formerly described in this FAQ is
superseded by Evan Kirshenbaum's ASCII/IPA system, which is explained in
separate postings.

===============================================================================
CONTENTS

 1. What is sci.lang for?
 2. What is linguistics?
 3. Does linguistics tell people how to speak or write properly?
 4. What are some good books about linguistics?
 5. How did language originate?
 6. What is known about prehistoric language?
 7. What do those asterisks mean?
 8. How are present-day languages related?
 9. Why do Hebrew and Yiddish [etc.] look alike if they aren't related?
10. How do linguists decide that languages are related?
11. What is Noam Chomsky's transformational grammar all about?
12. What is a dialect? (Relation between dialects and languages.)
13. Are all languages equally complex, or are some more primitive than
    others?
14. What about artificial languages, such as Esperanto?

===============================================================================
1. What is sci.lang for?

Discussion of the scientific or historical study of human language(s).
Note the "sci." prefix. The main concern here is with _facts_ and
theories accounting for them.

For advice on English usage, see alt.usage.english or misc.writing.
For casual chatter about other languages see soc.culture..

Like all "sci." newsgroups, sci.lang is not meant to substitute for a
dictionary or even a college library. If the answer to your question can
be looked up easily, then do so rather than using the net. If you don't
have a library, then ask away, but explain your situation.

===============================================================================
2. What is linguistics?

The scientific study of human language, including:

  Phonetics (physical nature of speech)
  Phonology (use of sounds in language)
  Morphology (word formation)
  Syntax (sentence structure)
  Semantics (meaning of words & how they combine into sentences)
  Pragmatics (effect of situation on language use)

Or, carving it up another way:

  Theoretical linguistics (pure and simple: how languages work)
  Historical linguistics (how languages got to be the way they are)
  Sociolinguistics (language and the structure of society)
  Psycholinguistics (how language is implemented in the brain)
  Applied linguistics (teaching, translation, etc.)
  Computational linguistics (computer processing of human language)

Some linguists also study sign language, non-verbal communication,
animal communication, and other topics peripheral to ordinary language.

===============================================================================
3. Does linguistics tell people how to speak or write properly?

No. Linguistics is descriptive, not prescriptive.

Linguistics can often supply facts which help people arrive at a
recommendation or value judgement, but the recommendation or value
judgement is not part of linguistic science itself.

===============================================================================
4. What are some good books about linguistics?

(These are cited by title and author only. Full ordering information can
be obtained from BOOKS IN PRINT, available at most bookstores and at
even the smallest public libraries.)

CAMBRIDGE ENCYCLOPEDIA OF LANGUAGE, by David Crystal (1987) is a good
place to start if you are new to this field.

LANGUAGE, by Edward Sapir (1921), is a readable survey of linguistics
that is still worthwhile despite its age.

AN INTRODUCTION TO LANGUAGE, by Fromkin and Rodman (1974), is one of the
best intro linguistics survey texts. (Read it!) There are many others.

CAMBRIDGE TEXTBOOKS IN LINGUISTICS (a series) consists of good, modestly
priced introductions to all the areas of linguistics.

Any encyclopedia will give you basic information about widely studied
languages, alphabets, etc.

===============================================================================
5. How did language originate?

Nobody knows. Very little evidence is available.
See however D. Bickerton, ROOTS OF LANGUAGE (1981).

===============================================================================
6. What is known about prehistoric language?

Quite a lot, if by "prehistoric" you'll settle for maybe 2000 years
before the development of writing. (Language is many thousands of years
older than that.)

Languages of the past can be recovered by comparative reconstruction
from their descendants. The comparative method relies mainly on
pronunciation, which changes very slowly and in highly systematic ways.
If you apply it to French, Spanish, and Italian, you reconstruct late
colloquial Latin with a high degree of accuracy; this and similar tests
show us that the method works.

Also, if you use the comparative method on unrelated languages, you get
nothing. So comparative reconstruction is a test of whether languages
are related (to a discernible degree).

The ancient languages Latin, Greek, Sanskrit, and several others form a
group known as "Indo-European."
Comparative reconstruction from them gives a language called
Proto-Indo-European which was spoken around 2500 B.C. Many Indo-European
words can be reconstructed with considerable confidence (e.g., *ekwos
'horse'). The grammar was similar to Homeric Greek or Vedic Sanskrit.

Similar reconstructions are available for some other language families,
though none has been as thoroughly reconstructed as Indo-European.

===============================================================================
7. What do those asterisks mean?

Either of two things: an unattested, reconstructed word (such as
Indo-European *ekwos), or an ungrammatical sentence (such as *Himself
saw me).

===============================================================================
8. How are present-day languages related?

[--Scott DeLancey]

This is an INCOMPLETE list of some of the world's language families.
More detailed classifications can be found in Voegelin and Voegelin,
CLASSIFICATION AND INDEX OF THE WORLD'S LANGUAGES (1977), and M. Ruhlen,
A GUIDE TO THE WORLD'S LANGUAGES (1987). (Note: Ruhlen's classification
recognizes a number of higher-order groups which most linguists regard
as speculative).

A language family is a group of languages that have been proven to have
descended from a common ancestral language. Branches of families
likewise represent groups of languages with a more recent common
ancestor. For example, English, Dutch, and German have a common ancestor
which we label Proto-West-Germanic, and thus belong to the West Germanic
branch of Germanic. Icelandic and Norwegian are descended from
Proto-North-Germanic, a separate branch of Germanic. All the Germanic
languages have a common ancestor, Proto-Germanic; farther back, this
ancestor was descended from Proto-Indo-European, as were the ancestors
of the Italic, Slavic, and other branches.

Not all languages are known to be related to each other.
It is possible that they are related but the evidence of relationship
has been lost; it's also possible they arose separately. It is likely
that some of the families listed here will eventually turn out to be
related to one another. While low-level close relationships are easy to
demonstrate, higher-order classification proposals must rely on more
problematic evidence and tend to be controversial.

Recently linguists such as Joseph Greenberg and Vitalij Shevoroshkin
have attracted attention both in linguistic circles and in the popular
press with claims of larger genetic units, such as Nostratic (comprising
Indo-European, Uralic, Altaic, Dravidian, and Afroasiatic) or Amerind
(to include all the languages of the New World except Na-Dene and
Eskimo-Aleut). Most linguists regard these hypotheses as having a
grossly insufficient empirical foundation, and argue that comparisons at
that depth are not possible using available methods of historical
linguistics.

This list isn't intended to be exhaustive, even for families like
Germanic and Italic. Nor is it the last word on what's a "language"; see
question 12.

Note: English is not descended from Latin. English is a Germanic
language with a lot of Latin vocabulary, borrowed from French in the
Middle Ages.

INDO-EUROPEAN
  GERMANIC
    North Germanic: Icelandic, Norwegian, Danish, Swedish
    East Germanic: Gothic (extinct)
    West Germanic: English, Dutch, German, Yiddish
  ITALIC
    Osco-Umbrian: Oscan, Umbrian (extinct languages of Italy)
    Latin and its modern descendants (Italian, Spanish, Portuguese,
      Catalan, Rumanian, French, etc.)
  CELTIC
    P-Celtic: Welsh, Breton, Cornish
    Q-Celtic: Irish, Scots Gaelic, Manx
    The extinct languages of Gaul and parts of central Europe were
      also Celtic
  HELLENIC: Greek (ancient and modern)
  SLAVIC: Russian, Bulgarian, Polish, Czech, Serbo-Croatian, etc.
    (not Rumanian or Albanian)
  BALTIC: Lithuanian and Latvian
  INDO-IRANIAN
    Indic: Sanskrit and its modern descendants (Hindi-Urdu, Gypsy
      (Romany), Bengali, etc.)
    Iranian: Persian (ancient and modern), Pashto (Afghanistan), others
  ALBANIAN: Albanian
  ARMENIAN: Armenian
  TOKHARIAN (an extinct language of NW China)
  HITTITE (extinct language of Turkey)

AFRO-ASIATIC
  SEMITIC: Arabic, Hebrew (not Yiddish; see above), Aramaic, Amharic
    and other languages of Ethiopia
  CHADIC: languages of northern Africa, e.g. Hausa
  CUSHITIC: Somali, other languages of eastern Africa
  EGYPTIAN: Ancient Egyptian
  BERBER: languages of North Africa

NIGER-KORDOFANIAN: includes most of the languages of sub-Saharan Africa.
  Most of the languages are in the NIGER-CONGO branch; the most widely
  known subgroup of NIGER-CONGO is BANTU (Swahili, Zulu, Xhosa, etc.)

URALIC (=FINNO-UGRIAN)
  Finnish, Estonian, Saami (Lapp), Hungarian, and several languages of
  central Russia

MONGOL: Mongolian, Buryat, Kalmuck, etc.

TURKIC: Turkish, Azerbaijani, Kazakh, and other languages of Central
  Asia

Some linguists group the Mongol and Turkic families together as ALTAIC.
Rather more controversially, some add Korean and Japanese to this group.
It has been claimed that URALIC and ALTAIC are related (as URAL-ALTAIC),
but this idea is not widely accepted.

DRAVIDIAN: languages of southern India, including Tamil, Telugu, etc.

SINO-TIBETAN
  SINITIC: Chinese (several "dialects", actually distinct languages:
    Mandarin, Wu (Shanghai), Min (Hokkien [Fujian], Taiwanese), Yue
    (Cantonese), Hakka, Gan, Xiang)
  TIBETO-BURMAN: Tibetan, Burmese, various languages of Burma, China,
    India, and Nepal

AUSTROASIATIC
  MON-KHMER: Vietnamese, Khmer (Cambodian), and various minority and
    tribal languages of Southeast Asia
  MUNDA: tribal languages of eastern India

AUSTRONESIAN
  Malay-Indonesian, other languages of Indonesia (Javanese, etc.)
  Philippine languages: Tagalog, Ilocano, Bontoc, etc.
  Aboriginal languages of Taiwan (Tsou, etc.)
  Polynesian languages: Hawaiian, Maori, Samoan, Tahitian, etc.
  Micronesian: Chamorro (spoken in Guam), Yap, Truk, etc.
  Malagasy (spoken in Madagascar)
  Most of these languages fall in a branch called MALAYO-POLYNESIAN

JAPANESE: A number of linguists argue that Japanese is ALTAIC; others,
  that it is most closely related to AUSTRONESIAN, or that it represents
  a mixture of AUSTRONESIAN and ALTAIC elements.

TAI-KADAI: Thai, Lao, and other languages of southern China and northern
  Burma. Possibly related to AUSTRONESIAN. An outdated hypothesis that
  TAI is part of SINO-TIBETAN is still often found in reference works
  and introductory texts.

AUSTRALIA: the Aboriginal languages of Australia are conservatively
  classified into 26 families, the largest being PAMA-NYUNGAN,
  consisting of about 200 languages originally spoken over 80-90% of
  Australia.

A large number of language families are found in North and South
America. There are numerous proposals which group these into larger
units, some of which will probably be demonstrated in time. To date no
New World language has been proven to be related to any Old World
family. The larger North American families include:

  ESKIMO-ALEUT: two Eskimo languages and Aleut.
  ATHAPASKAN: most of the languages of Alaska and northwestern Canada,
    also includes Navajo and Apache. Eyak (in Alaska) is related to
    Athapaskan; some linguists put these together with Tlingit and
    Haida in a NA-DENE family.
  ALGONQUIAN: most of Canada and the Northeastern U.S., includes Cree,
    Ojibwa, Cheyenne, Blackfoot
  IROQUOIAN: the languages of NY state (Mohawk, Onondaga, etc.) and
    Cherokee
  SIOUAN: includes Dakota/Lakhota and other languages of the Plains and
    Southeast U.S.
  MUSKOGEAN: Choctaw, Alabama, Creek, Mikasuki (Seminole) and other
    languages of the southeast U.S.
  UTO-AZTECAN: a large family in Mexico and the Southwestern U.S.,
    includes Nahuatl (Aztec), Hopi, Comanche, Paiute, etc.
  SALISH: languages of Washington and British Columbia
  HOKAN: languages of California and Mexico; a controversial grouping
  PENUTIAN: languages of California and Oregon; also controversial

Work on documentation and classification of South American languages
still has a long way to go. Generally recognized families include:

  ARAWAKAN, TUCANOAN, TUPI-GUARANI (including Guarani, a national
  language of Paraguay), CARIBAN, ANDEAN (including Quechua and Aymara)

LANGUAGE ISOLATES: A number of languages around the world have never
been successfully shown to be related to any others-- in at least some
cases because any related languages have long been extinct. The most
famous isolate is Basque, spoken in northern Spain and southern France;
it is apparently a survival from before the Indo-Europeanization of
Europe.

===============================================================================
9. Why do Hebrew and Yiddish, Japanese and Chinese, or Persian and
   Arabic look so much alike if they aren't related?

Distinguish LANGUAGE from WRITING SYSTEM.

In each of these cases one language has adopted part or all of the
writing system of an unrelated language.

(To a Chinese reader, English and Finnish look alike, because they're
written in the same alphabet. Yet they are not historically related.)

===============================================================================
10. How do linguists decide that languages are related?

[--markrose]

When linguists say that languages are related, they're not just
remarking on their surface similarity; they're making a technical
statement or claim about their history-- namely, that they can be
regularly derived from a common parent language.

Proto-languages are reconstructed using the comparative method. The
first stage is to inspect and compare large amounts of vocabulary from
the languages in question.
Where possible we compare entire _paradigms_ (sets of related forms,
such as those of the present active indicative in Latin), rather than
individual words.

The inspection should yield a set of regular sound correspondences
between the languages. By regular, we mean that the same correspondences
are consistently observed in identical phonetic environments.

Finally, _sound changes_ are formulated: language-specific rules which
specify how the original common form changed in order to produce those
observed in each descendant language.

Applying the comparative method to the Romance languages, we might find

               Sard       French     Italian    Spanish
  'I sense'    /sento/    /sa~/      /sento/    /sjEnto/
  'sleep'      /sonnu/    /som/      /sonno/    /suEn^o/
  'hundred'    /kentu/    /sa~/      /tSento/   /sjEnto/
  'five'       /kimbe/    /sE~k/     /tSinko/   /sinko/
  'I run'      /kurro/    /kur/      /korro/    /korro/
  'story'      /kontu/    /ko~t@/    /konto/    /kuEnto/

and hundreds of similar examples. We see some correspondences--

               Sard       French     Italian    Spanish
  (1)          /s/        /s/        /s/        /s/
  (2)          /k/        /s/        /tS/       /s/
  (3)          /k/        /k/        /k/        /k/

but they seem to conflict: does Sard /k/ correspond to Spanish /s/ or
/k/? Does French /s/ correspond to Italian /s/ or /tS/?

In fact we will find that the correspondences are regular, once we
observe that (2) is seen before a front vowel (i or e), while (3) is
seen in other environments. Alternations within paradigms, such as
It. /diko/ 'I say' vs. /ditSe/ 'says', will help us make and confirm
such generalizations.

We may interpret these now-regular correspondences as indicating that an
initial /s/ in the proto-language has been retained in all four
languages, and likewise initial /k/ in Sard; but that /k/ changed to /s/
or /tS/ in the other languages in the environment of a front vowel.

Actually, this process is iterative. For instance, at first glance we
might think that German _haben_ and Latin _habere_ 'have' are obvious
cognates.
However, after noting the regular correspondence of German h to Latin c,
we are forced to change our minds, and look to _capere_ 'seize' as a
better cognate for _haben_.

Thus, similarity of words is only a clue, and perhaps a misleading one.
Linguists conclude languages are related, and thus derive from a common
ancestor, only if they find *regular* sound correspondences between
them.

To complicate things, derivations may be obscured by irregular changes,
such as dissimilation, borrowing, or analogical change. For instance,
the normal development of Middle English _kyn_ is 'kine', but this word
has been largely replaced by 'cows', formed from 'cow' (ME _cou_) on the
analogy of word-pairs like stone : stones. Analogy often serves to
reduce irregularities in a language (here, an unusual plural).

_Borrowing_ refers to taking words from other languages, as English has
taken 'search' and 'garage' from French, 'paternal' from Latin, 'anger'
from Old Norse, and 'tomato' from Nahuatl.

How do we know that English doesn't derive from French or Nahuatl? The
latter case is easy to eliminate: regular sound correspondences can't be
set up between English and Nahuatl. But English has borrowed so heavily
from French that regular correspondences do occur. Here, however, we
find that the French borrowings are thickest in government, legal, and
military domains; while the basic vocabulary (which languages borrow
less frequently) is more akin to German. Paradigmatic correspondences
like sing/sang/sung vs. singen/sang/gesungen also help show that the
Germanic words are inherited, the French ones borrowed.

===============================================================================
11. What is Noam Chomsky's transformational grammar all about?
Several things; it really comprises several layers of theory:

(1) The hypothesis that much of the structure of human language is
inborn ("built-in") in the human brain, so that a baby learning to talk
only has to learn the vocabulary and the structural "parameters" of his
native language -- he doesn't have to learn how language works from
scratch.

This is well supported and widely believed; main evidence consists of:

  - The fact that babies learn to talk remarkably well from what seems
    to be inadequate exposure to language; it can be shown in detail
    that babies acquire some rules of grammar that they could never have
    "learned" from what is available to them, if the structure of
    language were not partly built-in.

  - The fact that the structure of language on different levels
    (vocabulary, ability to connect words, etc.) can be lost by injury
    to specific areas of the brain.

  - The fact that there are unexpected structural similarities between
    all known languages.

For detailed exposition see Cook, CHOMSKY'S UNIVERSAL GRAMMAR (1988),
and Newmeyer, GRAMMATICAL THEORY: ITS LIMITS AND POSSIBILITIES.

(2) The hypothesis that to adequately describe the grammar of a human
language, you have to give each sentence at least two different
structures, called "deep structure" and "surface structure", together
with rules called "transformations" that relate them.

This is hotly debated. Some theories of grammar use two levels and some
don't. Chomsky's original monograph, SYNTACTIC STRUCTURES (1957), is
still well worth reading; this is what it deals with.

(3) Chomsky's name is associated with specific flavors of
transformational grammar. The model elaborated over the last few years
is called GB (government and binding) theory; however, Chomsky's 1992
paper on Minimalism contains significant departures from earlier work in
GB. Bill Turkel (bill@hivnet.ubc.ca) runs a mailing list on Minimalism;
e-mail him for more information.

(4) Some people think Chomsky is the source of the idea that grammar
ought to be viewed with mathematical precision. (Thus there are
occasional vehement anti-Chomsky polemics such as THE NEW GRAMMARIAN'S
FUNERAL, which are really polemics against grammar per se.)

Although Chomsky contributed some valuable techniques, grammarians have
_always_ believed that grammar was a precise, mechanical thing.

===============================================================================
12. What is a dialect?

[--M.C. + M.R.]

A dialect is any variety of a language spoken by a specific community of
people. Most languages have many dialects.

Everyone speaks a dialect. In fact everyone speaks an _idiolect_, i.e.,
a personal language. (Your English language is not quite the same as my
English language, though they are probably very, very close.)

A group of people with very similar idiolects are considered to be
speaking the same dialect. Some dialects, such as Standard American
English, are taught in schools and used widely around the world. Others
are very localized.

Localized or uneducated dialects are _not_ merely failed attempts to
speak the standard language. William Labov and others have demonstrated,
for example, that the speech of inner-city blacks has its own intricate
grammar, quite different in some ways from that of Standard English.

It should be emphasized that linguists do not consider some dialects
superior to others-- though speakers of the language may do so.

Varieties of language are called "dialects" if the speakers can
understand each other and "languages" if they can't. For example, Irish
English and Southern American English are dialects of English, but
English and German are different languages (though related).

This criterion is not always as easy to apply as it sounds.
Intelligibility may vary with familiarity and interest, or may depend on
the subject.
A more serious problem is the _dialect continuum_: a chain of dialects
such that any two adjoining dialects are mutually intelligible, but the
dialects at each end are not. Speakers of Belgian Dutch, for instance,
can't understand Swiss German, but between them there lies a continuum
of mutually intelligible dialects.

Sometimes the use of the terms "language" or "dialect" is politically
motivated. Norwegian and Danish are dialects of the same language, but
are considered separate languages because of their political
independence. The Chinese "dialects" on the other hand are mutually
unintelligible languages (but they share a common _written_ language).

===============================================================================
13. Are all languages equally complex, or are some more primitive than
    others?

Obviously, the size of vocabulary and the variety and sophistication of
literary forms will depend on the culture. But the _grammar_ of all
languages is about equally complex. Even people with a very "primitive"
material culture, such as the Australian Aborigines, speak complex
languages.

Different languages put their complexity in different places. English
has complex, intricate sentence structure, but simple morphology (each
word has only a few forms). Finnish has freer syntax but much more
complex morphology.

The only really simple languages are _pidgins_ and _creoles_, which
result when speakers of different languages are suddenly forced to live
and work together. They quickly arrive at a very simple language with
vocabulary from both languages, and a simple grammar of a specific kind
(e.g., they are likely to use repetition to express plurals). Such a
language is called a _pidgin_ initially, then becomes a _creole_ when
babies are born who acquire it as a native language.

===============================================================================
14. What about artificial languages, such as Esperanto?
[--markrose]

Hundreds of constructed languages have been devised in the last few
centuries. Early proposals, such as those of Lodwick (1647), Wilkins, or
Leibniz, were attempts to devise an ideal language based on
philosophical classification of concepts, and used wholly invented
words. Most were too complex to learn, but one, Jean Francois Sudre's
Solresol, achieved some popularity in the nineteenth century; its entire
vocabulary was built from the names of the notes of the musical scale,
and could be sung as well as spoken.

Later the focus shifted to languages based on existing languages, with a
polyglot (usually European) vocabulary and a simplified grammar, whose
purpose was to facilitate international communication. Johann Schleyer's
Volapu"k (1880) was the first to achieve success; its name is based on
English ("world-speech"), and reflects Schleyer's notions of phonetic
simplicity.

It was soon eclipsed by Ludwig Zamenhof's Esperanto (1887), whose
grammar was simpler and whose vocabulary was more recognizable.
Esperanto has remained the most successful and best-known artificial
language, with a million or more speakers and a voluminous literature;
children of Esperantists have even learned it as a native language.

Its relative success hasn't prevented the appearance of new proposals,
such as Ido, Interlingua, Occidental, and Novial. There have also been
attempts to simplify Latin (Latino Sine Flexione, 1903) and English
(Basic English, 1930) for international use. The recent Loglan and
Lojban, based on predicate logic, may represent a revival of a priori
language construction.

See also Mario Pei, ONE LANGUAGE FOR THE WORLD; Detlev Blanke,
INTERNATIONALE PLANSPRACHEN (in German).

There is a newsgroup, soc.culture.esperanto, dedicated to Esperanto. The
FAQ for this group contains pointers to mailing lists for other
constructed languages.