Languages spoken in India belong to several language families, the major ones being the Indo-Aryan languages spoken by 78.05% of Indians and the Dravidian languages spoken by 19.64% of Indians. Languages spoken by the remaining 2.31% of the population belong to the Austroasiatic, Sino-Tibetan, Tai-Kadai, and a few other minor language families and isolates. India (780) has the world’s second highest number of languages, after Papua New Guinea (839).

Article 343 of the Indian constitution stated that the official language of the Union should become Hindi in Devanagari script instead of the extant English. But this was thought to be a violation of the constitution’s guarantee of federalism. Later, an act of Parliament, the Official Languages Act, 1963, allowed for the continuation of English in the Indian government indefinitely until legislation decides to change it.[2] The form of numerals to be used for the official purposes of the Union was supposed to be the international form of Indian numerals, which are referred to as Arabic numerals in most English-speaking countries.[1] Despite a common misconception, Hindi is not the national language of India. The Constitution of India does not give any language the status of national language.

The Eighth Schedule of the Indian Constitution lists 22 languages, which have been referred to as scheduled languages and given recognition, status and official encouragement. In addition, the Government of India has awarded the distinction of classical language to Kannada, Malayalam, Odia, Sanskrit, Tamil and Telugu. Classical language status is given to languages which have a rich heritage and independent nature.

According to the Census of India of 2001, India has 122 major languages and 1599 other languages. However, figures from other sources vary, primarily due to differences in the definition of the terms “language” and “dialect”. The 2001 Census recorded 30 languages which were spoken by more than a million native speakers and 122 which were spoken by more than 10,000 people.

Two contact languages have played an important role in the history of India: Persian[14] and English. Persian was the court language during the Mughal period in India. It reigned as an administrative language for several centuries until the era of British colonisation. English continues to be an important language in India. It is used in higher education and in some areas of the Indian government. Hindi, the most commonly spoken language in India today, serves as the lingua franca across much of North and Central India. However, there have been anti-Hindi agitations in South India, most notably in the states of Tamil Nadu and Karnataka. Maharashtra, West Bengal, Assam, Punjab and other non-Hindi regions have also started to voice concerns about Hindi.


The southern Indian languages are from the Dravidian family. The Dravidian languages are indigenous to the Indian subcontinent. Proto-Dravidian languages were spoken in India in the 4th millennium BCE and started disintegrating into various branches around the 3rd millennium BCE.[22] The Dravidian languages are classified in four groups: North, Central (Kolami–Parji), South-Central (Telugu–Kui), and South Dravidian (Tamil–Kannada).

The northern Indian languages from the Indo-Aryan branch of the Indo-European family evolved from Old Indic by way of the Middle Indic Prakrit languages and Apabhraṃśa of the Middle Ages. The Indo-Aryan languages developed and emerged in three stages: Old Indo-Aryan (1500 BCE to 600 BCE), Middle Indo-Aryan (600 BCE to 1000 CE) and New Indo-Aryan (1000 CE to 1300 CE). The modern north Indian Indo-Aryan languages all evolved into distinct, recognisable languages in the New Indo-Aryan Age.

Persian, or Farsi, was brought into India by the Ghaznavids and other Turco-Afghan dynasties as the court language. Culturally Persianised, they, in combination with the later Mughal dynasty (of Turco-Mongol origin), influenced the art, history and literature of the region for more than 500 years, resulting in the Persianisation of many Indian tongues, mainly lexically. In 1837, the British replaced Persian with English and Hindustani in Perso-Arabic script for administrative purposes, and the Hindi movement of the 19th century replaced Persianised vocabulary with Sanskrit derivations and replaced or supplemented the use of Perso-Arabic script for administrative purposes with Devanagari.

Each of the northern Indian languages had different influences. For example, Hindustani was strongly influenced by Sanskrit, Persian, and Arabic, leading to the emergence of Modern Standard Hindi and Modern Standard Urdu as registers of the Hindustani language.


The first official survey of language diversity in the Indian subcontinent was carried out by Sir George Abraham Grierson from 1898 to 1928. Titled the Linguistic Survey of India, it reported a total of 179 languages and 544 dialects.[28] However, the results were skewed due to ambiguities in distinguishing between “dialect” and “language”,[28] the use of untrained personnel, and under-reporting of data from South India, as the former provinces of Burma and Madras, as well as the princely states of Cochin, Hyderabad, Mysore and Travancore, were not included in the survey.

Different sources give widely differing figures, primarily based on how the terms “language” and “dialect” are defined and grouped. Ethnologue, produced by the Christian evangelist organisation SIL International, lists 461 tongues for India (out of 6,912 worldwide), 447 of which are living, while 14 are extinct. The 447 living languages are further subclassified in Ethnologue as follows:

Institutional – 63
Developing – 130
Vigorous – 187
In trouble – 54
Dying – 13

The People’s Linguistic Survey of India, a privately owned research institution in India, has recorded over 66 different scripts and more than 780 languages in India during its nationwide survey, which the organisation claims to be the biggest linguistic survey in India.

The People of India (POI) project of the Anthropological Survey of India reported 325 languages which are used for in-group communication by 5,633 Indian communities.

Indo-Aryan language family

The largest of the language families represented in India, in terms of speakers, is the Indo-Aryan language family, a branch of the Indo-Iranian family, itself the easternmost extant subfamily of the Indo-European language family. This language family predominates, accounting for some 1035 million speakers, or over 76.5% of the population, as per a 2018 estimate. The most widely spoken languages of this group are Hindi (or more correctly, Hindustani, which includes Hindi and Urdu), Bengali, Konkani, Marathi, Gujarati, Punjabi, Kashmiri, Rajasthani, Sindhi, Assamese (Asamiya), Maithili and Odia. Aside from the Indo-Aryan languages, other Indo-European languages are also spoken in India, the most prominent of which is English, as a lingua franca.

Dravidian language family

The second largest language family is the Dravidian language family, accounting for some 277 million speakers, or approximately 20.5% of the population, as per a 2018 estimate. The Dravidian languages are spoken mainly in southern India and parts of eastern and central India, as well as in parts of northeastern Sri Lanka, Pakistan, Nepal and Bangladesh. The Dravidian languages with the most speakers are Telugu, Tamil, Kannada and Malayalam. Besides the mainstream population, Dravidian languages are also spoken by small scheduled tribe communities, such as the Oraon and Gond tribes. Only two Dravidian languages are exclusively spoken outside India: Brahui in Pakistan and Dhangar, a dialect of Kurukh, in Nepal.

Austroasiatic language family

Families with smaller numbers of speakers are Austroasiatic and numerous small Sino-Tibetan languages, with some 10 and 6 million speakers, respectively, together 3% of the population.

The Austroasiatic language family (austro meaning South) is the autochthonous language family of South Asia and Southeast Asia, other language families having arrived by migration. Austroasiatic languages of mainland India are the Khasi and Munda languages, including Santhali. The languages of the Nicobar Islands also form part of this language family. With the exceptions of Khasi and Santhali, all Austroasiatic languages on Indian territory are endangered.

Sino-Tibetan language family

The Sino-Tibetan language family is well represented in India. However, the interrelationships among its languages are not discernible, and the family has been described as “a patch of leaves on the forest floor” rather than with the conventional metaphor of a “family tree”.

Sino-Tibetan languages are spoken across the Himalayas in the regions of Ladakh, Himachal Pradesh, Nepal, Sikkim, Bhutan, Arunachal Pradesh, and also in the Indian states of West Bengal, Assam (hills and autonomous councils),[50][51][52] Meghalaya, Nagaland, Manipur, Tripura and Mizoram. Sino-Tibetan languages spoken in India include the scheduled languages Meitei and Bodo, the non-scheduled languages of Karbi, Lepcha, and many varieties of several related Tibetic, West Himalayish, Tani, Brahmaputran, Angami–Pochuri, Tangkhul, Zeme, Kukish language groups, amongst many others.

Tai-Kadai language family

The Ahom language, a Southwestern Tai language, was once the dominant language of the Ahom Kingdom in modern-day Assam, but was later replaced by the Assamese language (known in ancient times as Kamrupi, the precursor of today’s Kamrupi dialect). Nowadays, small Tai communities and their languages remain in Assam and Arunachal Pradesh together with Sino-Tibetans, e.g. Tai Phake, Tai Aiton and Tai Khamti, which are similar to the Shan language of Shan State, Myanmar; the Dai language of Yunnan, China; the Lao language of Laos; the Thai language of Thailand; and the Zhuang language in Guangxi, China.

Great Andamanese language family

The extinct and endangered languages of the Andaman Islands form a fifth family, the Andamanese, comprising two families, namely:

the Great Andamanese, comprising a number of extinct languages apart from one highly endangered language with a dwindling number of speakers.
the Ongan family of the southern Andaman Islands, comprising two extant languages, Önge and Jarawa, and one extinct language, Jangil.

In addition, Sentinelese, an unattested language of the Andaman Islands, is generally considered to be related and part of the areal group.


The language families in India are not necessarily related to the various ethnic groups in India, specifically the Indo-Aryan and Dravidian people. The languages within each family have been influenced to a large extent by both families.

Urdu has also had a significant influence on many of today’s Indian languages. Many North Indian languages have lost much of their Sanskritised base (50% of current vocabulary) to a more Urdu-based form. In terms of writing, most Indian scripts, except the Tamil script, nearly perfectly accommodate the Sanskrit language. South Indian languages have adopted new letters to write various Indo-Aryan based words as well, and have added new letters to their native alphabets as the languages began to mix and influence each other.

Though various Indo-Aryan and Dravidian languages may seem mutually exclusive when first heard, the two language families have had a much deeper underlying influence on each other. Evidence of the intermixing of Dravidian and Indo-Aryan languages comes from the pockets of Dravidian-based languages in remote areas of Pakistan and in interspersed areas of North India. In addition, tonal and cultural forms of expression are quite standard across the languages of India. Languages may have different vocabulary, but various hand and tonal gestures within two unrelated languages can still be common due to cultural amalgamations between invading people and the natives over time; in this case, the Indo-Aryan peoples and the native Dravidian people.


Hindi, written in Devanagari script, is the most prominent language spoken in the country. In the 2001 census, 422 million (422,048,642) people in India reported Hindi to be their native language. This figure included not only Hindi speakers of Hindustani, but also people across the Hindi belt who identify as native speakers of related languages and consider their speech to be a dialect of Hindi. Hindi (or Hindustani) is the native language of most people living in Delhi, Uttar Pradesh, Uttarakhand, Chhattisgarh, Himachal Pradesh, Chandigarh, Bihar, Jharkhand, Madhya Pradesh, Haryana, and Rajasthan.

“Modern Standard Hindi”, a standardised language, is one of the official languages of the Union of India. In addition, it is one of only two languages used for business in Parliament; however, the Rajya Sabha now allows all 22 languages listed in the Eighth Schedule to be spoken.

Hindustani evolved from khari boli (खड़ी बोली), a prominent tongue of Mughal times, which itself evolved from Apabhraṃśa, an intermediary transition stage from Prakrit, from which the major North Indian Indo-Aryan languages have evolved.

Varieties of Hindi spoken in India include Rajasthani, Braj Bhasha, Haryanvi, Bundeli, Kannauji, Hindustani, Awadhi, Bagheli, Bhojpuri, Magahi, Nagpuri and Chhattisgarhi. By virtue of its being a lingua franca, Hindi has also developed regional dialects such as Bambaiya Hindi in Mumbai. In addition, a trade language, Andaman Creole Hindi, has developed in the Andaman Islands.

In addition, through its use in popular culture such as songs and films, Hindi also serves as a lingua franca across both North and Central India.

Hindi is widely taught both as a primary language and language of instruction, and as a second tongue in most states.


Bengali (also spelt as Bangla: বাংলা) is native to the Bengal region, comprising the nation of Bangladesh and the states of West Bengal and Tripura and the Barak Valley region of Assam. It is the fifth most spoken language in the world. After the partition of India (1947), refugees from East Pakistan were settled in Tripura, Jharkhand and the union territory of Andaman and Nicobar Islands. There is also a large number of Bengali-speaking people in Maharashtra and Gujarat, where they work as artisans in jewellery industries. Bengali developed from Abahatta, a derivative of Apabhramsha, itself derived from Magadhi Prakrit. The modern Bengali vocabulary contains a base from Magadhi Prakrit and Pali, borrowings and reborrowings from Sanskrit, and other major borrowings from Persian, Arabic, Austroasiatic languages and other languages with which it has been in contact. Like most Indian languages, Bengali has a number of dialects. It exhibits diglossia, with the literary and standard form differing greatly from the colloquial speech of the regions that identify with the language. The Bengali language has developed a rich cultural base spanning art, music, literature and religion. There have been many movements in defence of this language, and in 1999 UNESCO declared 21 February as International Mother Language Day in commemoration of the Bengali Language Movement of 1952.


Marathi is an Indo-Aryan language. It is the official language of Maharashtra and a co-official language of Goa, both states of Western India, and is one of the official languages of India. There were 83 million speakers in 2011 and 72 million speakers in 2001.[80] Marathi has the third largest number of native speakers in India. Marathi has some of the oldest literature of all modern Indo-Aryan languages, dating from about 1200 AD (Mukundraj’s Vivek Sindhu from the close of the 12th century). The major dialects of Marathi are Standard Marathi and the Varhadi dialect. There are other related languages such as Khandeshi, Dangi, Vadvali and Samavedi. Malvani Konkani has been heavily influenced by Marathi varieties. Marathi is one of several languages that descend from Maharashtri Prakrit. Further change led to the Apabhraṃśa languages like Old Marathi.

Marathi is the official language of Maharashtra and co-official language in the union territories of Daman and Diu and Dadra and Nagar Haveli. In Goa, Konkani is the sole official language; however, Marathi may also be used for all official purposes.

Over a period of many centuries the Marathi language and people came into contact with many other languages and dialects. The primary influence of Prakrit, Maharashtri, Dravidian languages, Apabhraṃśa and Sanskrit is understandable. At least 50% of the words in Marathi are either taken or derived from Sanskrit. Many scholars claim that Sanskrit has derived many words from Marathi. Marathi has also shared directions, vocabulary and grammar with Indian Dravidian languages, and with foreign languages such as Persian, Arabic, English and, to a lesser extent, Portuguese.


Telugu is the most widely spoken Dravidian language in India and around the world. Telugu is an official language in Andhra Pradesh, Telangana and Yanam, making it one of the few languages (along with Hindi, Bengali, and Urdu) with official status in more than one state. It is also spoken by a significant number of people in the Andaman and Nicobar Islands, Chhattisgarh, Karnataka, Maharashtra, Odisha, Tamil Nadu, Gujarat and by the Sri Lankan Gypsy people. It is one of six languages with classical status in India. Telugu ranks fourth by the number of native speakers in India (81 million in the 2011 Census) and fifteenth in the Ethnologue list of most-spoken languages worldwide.


Tamil (also spelt as Thamizh: தமிழ்) is a Dravidian language predominantly spoken in Tamil Nadu, Puducherry and many parts of Sri Lanka. It is also spoken by large minorities in the Andaman and Nicobar Islands, Kerala, Karnataka, Andhra Pradesh, Malaysia, Singapore, Mauritius and throughout the world. Tamil ranks fifth by the number of native speakers in India (61 million in the 2001 Census[81]) and ranks 20th in the list of most spoken languages. It is one of the 22 scheduled languages of India and was the first Indian language to be declared a classical language by the Government of India in 2004. Tamil is one of the longest surviving classical languages in the world. It has been described as “the only language of contemporary India which is recognisably continuous with a classical past.” The two earliest manuscripts from India, acknowledged and registered by the UNESCO Memory of the World register in 1997 and 2005, are in Tamil. Tamil is an official language of Tamil Nadu, Puducherry, Andaman and Nicobar Islands, Sri Lanka and Singapore. It is also recognised as a minority language in Canada, Malaysia, Mauritius and South Africa.


After independence, Modern Standard Urdu, the Persianised register of Hindustani, became the national language of Pakistan. During British colonial times, a knowledge of Hindustani or Urdu was a must for officials. Hindustani was made the second language of the British Indian Empire after English and was considered the language of administration. The British introduced the use of Roman script for Hindustani as well as other languages. Urdu had 70 million speakers in India (as per the Census of 2001) and, along with Hindi, is one of the 22 officially recognised regional languages of India. It is also an official language in the Indian states of Jammu and Kashmir, Delhi, Uttar Pradesh, Bihar and Telangana, which have significant Muslim populations.


Gujarati is an Indo-Aryan language. It is native to the west Indian region of Gujarat. Gujarati is part of the greater Indo-European language family. Gujarati is descended from Old Gujarati (c. 1100–1500 CE), the same source as that of Rajasthani. Gujarati is the chief language in the Indian state of Gujarat. It is also an official language in the union territories of Daman and Diu and Dadra and Nagar Haveli. According to the Central Intelligence Agency (CIA), 4.5% of the population of India (1.21 billion according to the 2011 census) speaks Gujarati. This amounts to 54.6 million speakers in India.


Kannada is a Dravidian language which branched off from the Kannada–Tamil subgroup around 500 BCE, according to the Dravidian scholar Zvelebil.[89] According to the Dravidian scholars Steever and Krishnamurthy, the study of the Kannada language is usually divided into three linguistic phases: Old (450–1200 CE), Middle (1200–1700 CE) and Modern (1700–present).[90][91] The earliest written records are from the 5th century, and the earliest available literature in rich manuscript (Kavirajamarga) is from c. 850. Kannada has the second oldest written tradition of all vernacular languages of India. Current estimates of the total number of epigraphs present in Karnataka range from 25,000 by the scholar Sheldon Pollock to over 30,000 by the Sahitya Akademi, making Karnataka state “one of the most densely inscribed pieces of real estate in the world”.[98] According to Garg and Shipely, more than a thousand notable writers have contributed to the wealth of the language.


Malayalam (/mæləˈjɑːləm/;[101] മലയാളം, Malayāḷam [maləjaːɭəm]) has official language status in the state of Kerala and in the union territories of Lakshadweep and Puducherry. It belongs to the Dravidian family of languages and is spoken by some 38 million people. Malayalam is also spoken in the neighbouring states of Tamil Nadu and Karnataka, with some speakers in the Nilgiris, Kanyakumari and Coimbatore districts of Tamil Nadu, and the Dakshina Kannada and Kodagu districts of Karnataka. Malayalam originated from Middle Tamil (Sen-Tamil) in the 7th century. As Malayalam began to freely borrow words as well as the rules of grammar from Sanskrit, the Grantha alphabet was adopted for writing and came to be known as Arya Eluttu. This developed into the modern Malayalam script.

Writing systems

Most languages in India are written in Brahmi-derived scripts, such as Devanagari, Tamil, Telugu, Kannada, Meitei Mayek, Odia, Eastern Nagari (Assamese/Bengali), etc., though Urdu is written in a script derived from Arabic, and a few minor languages such as Santali use independent scripts.

Various Indian languages have their own scripts. Hindi, Marathi, Maithili and Angika are written using the Devanagari script. Most major languages are written using a script specific to them, such as Assamese (Asamiya) with Asamiya, Bengali with Bengali, Punjabi with Gurmukhi, Meitei with Meitei Mayek, Odia with Odia script, Gujarati with Gujarati, etc. Urdu and sometimes Kashmiri, Saraiki and Sindhi are written in modified versions of the Perso-Arabic script. With this one exception, the scripts of Indian languages are native to India. Kodava, which did not have a script of its own, and Tulu, which did, adopted the Kannada script because of its readily available printing settings; these languages have taken up the script of the local official language as their own.


