POC#: Analytics on India census using Spark
In this article, I explore census data for India to understand changes in the country's demographics: population growth, religion distribution, gender distribution, sex ratio and more. Even with a small dataset, I could gain a lot of valuable insights about the country. I used Spark SQL and the built-in graphs provided by Databricks.
India is the second most populous country in the world, with over 1.271 billion people, more than a sixth of the world's population. Already home to 17.5% of the world's population, India is projected to become the world's most populous country by 2025, surpassing China, and its population is projected to reach 1.6 billion by 2050. Its population growth rate is 1.2%.
I loaded the census data into tables and charted the following:
India’s States with Number of Districts
India’s Population Density in terms of Districts
Scheduled Castes (SC) Population per State
Literacy Rate per State in India
States having Literacy Rate less than 50%
Gender-wise Literacy Rate per State
Education Type-wise Literacy Rate per State
Gender Ratio per State
Population by Religion per state
Drinking Water Facility for Every State in India
Status of Electricity Facility per State
Education Facility per State
Medical Facility per State
Bus Transportation per State
Road Status per State
Residence Status in India by State
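The article does not show its queries or table schema, so here is only a minimal sketch of the kind of query that could sit behind a chart such as "States having Literacy Rate less than 50%". It uses Python's built-in sqlite3 as a stand-in for Spark SQL so that it runs without a cluster. The table name `census`, its columns, and the rows are all hypothetical placeholders, not real census figures.

```python
import sqlite3

# Hypothetical schema and placeholder rows (NOT real census figures).
# sqlite3 stands in for Spark SQL here; the article's real tables are unknown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE census ("
    "state TEXT, district TEXT, population INTEGER, literates INTEGER)"
)
rows = [
    ("State A", "District 1", 1000, 400),
    ("State A", "District 2", 1000, 500),
    ("State B", "District 1", 2000, 1500),
]
conn.executemany("INSERT INTO census VALUES (?, ?, ?, ?)", rows)

# Aggregate districts up to state level, then keep only the states
# whose overall literacy rate is below 50%.
query = """
SELECT state,
       COUNT(DISTINCT district) AS districts,
       ROUND(100.0 * SUM(literates) / SUM(population), 1) AS literacy_rate
FROM census
GROUP BY state
HAVING 100.0 * SUM(literates) / SUM(population) < 50
ORDER BY state
"""
low_literacy = conn.execute(query).fetchall()
print(low_literacy)
```

The query is plain SQL (GROUP BY, HAVING, ROUND), so it should carry over to Spark SQL with little change once the data is registered as a table. In a Databricks notebook, the result of such a query could then be shown with the built-in chart options.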
Outstanding work!
Thank you very much
This is a brilliant piece of analysis and can be applied to multiple programmes run by the Indian government.
Thanks a lot, Waseem
Hi Bhavesh,
Where is the input dataset used for this analysis?
Please post your Spark SQL code to get this visualization.
Thanks.
A lot of nice graphs. Is the data set available somewhere for download?
I am asking because our product Querona allows building a logical data warehouse and it emulates the SQL Server protocol, translating Transact-SQL to Spark SQL. I would like to migrate your charts and show them from Power BI.
very good
Hi Bhavesh,
It's outstanding work. If possible, could you please upload the dataset and the code to Git so that others can learn from it? Really appreciate it.