π Project Overview
What You'll Learn: Create professional, interactive dashboards from ClydeEnergy's world database that you can share with anyone, anywhere in the world.
π― What We'll Build Together
π Final Dashboard Features
- Interactive Maps: Click and explore countries worldwide
- Population Analytics: Filter by population size and density
- Economic Insights: GDP comparisons and economic indicators
- City Explorer: Discover major cities and urban centers
- Language Diversity: Explore global language patterns
- Share Anywhere: Send a link to share your dashboard
π‘ Beginner-Friendly: This guide assumes you're new to Python and data analysis. Every step is explained in simple terms with plenty of examples.
π§ Initial Setup
ποΈ Get ClydeEnergy's World Database
π Download the Database
First, let's get the world database from ClydeEnergy's official repository:
π Get Database from GitHub1
Option 1: Download as ZIP (Easiest for Beginners)
- Click the GitHub link above
- Click the green "Code" button
- Select "Download ZIP"
- Extract the ZIP file to your computer
- Look for the world.sql file inside
π₯οΈ Environment Setup
Required Software (Choose One):
π Recommended for Beginners: Use Google Colab - it requires no setup and runs in your web browser. Just go to colab.research.google.com and create a new notebook.
π¦ Stage 1: Install Required Libraries
What are Libraries? Libraries are pre-written code that help us perform specific tasks. Think of them as tools in a toolbox - each one has a specific purpose.
1
Install pandas (Data Handling)
Cell 1 - Install pandas
# pandas helps us work with data in tables (like Excel) !pip install pandas
Expected Output:
Successfully installed pandas-2.0.3
2
Install SQLAlchemy (Database Connection)
Cell 2 - Install SQLAlchemy
# SQLAlchemy helps us connect to databases !pip install SQLAlchemy
3
Install matplotlib (Basic Charts)
Cell 3 - Install matplotlib
# matplotlib creates charts and graphs !pip install matplotlib
4
Install seaborn (Beautiful Charts)
Cell 4 - Install seaborn
# seaborn makes our charts look professional !pip install seaborn
5
Install numpy (Math Operations)
Cell 5 - Install numpy
# numpy helps with mathematical calculations !pip install numpy
6
Install Interactive Dashboard Libraries
Cell 6 - Install plotly and dash for interactive dashboards
# plotly creates interactive charts you can click and explore !pip install plotly # dash creates web-based dashboards you can share !pip install dash # Additional libraries for enhanced functionality !pip install pymysql
7
Test All Installations
Cell 7 - Test if everything installed correctly
# Test all our installations import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns import plotly.express as px import plotly.graph_objects as go from urllib.parse import quote_plus from sqlalchemy import create_engine print("β All libraries installed successfully!") print(f"π pandas version: {pd.__version__}") print(f"π’ numpy version: {np.__version__}") print("π― Ready to start analyzing data!")
π Stage 2: Database Connection
What We're Doing: We'll connect to ClydeEnergy's world database to access information about countries, cities, and languages worldwide.
1
Import Required Libraries
Cell 1 - Import libraries for database work
# Import all the tools we need import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns import plotly.express as px import plotly.graph_objects as go from sqlalchemy import create_engine from urllib.parse import quote_plus import warnings warnings.filterwarnings('ignore') # Hide warning messages print("π All libraries imported successfully!") print("π Ready to connect to the database!")
2
Setup Database Connection
Cell 2 - Database connection setup
# Database configuration DATABASE_CONFIG = { 'host': 'localhost', 'port': 3306, 'username': 'root', 'password': 'your_password', # Replace with your actual password 'database': 'world' } print("βοΈ Database configuration ready!") print(f"π Connecting to: {DATABASE_CONFIG['database']} database") # URL-encode the password to handle special characters encoded_password = quote_plus(DATABASE_CONFIG['password']) # Create connection string with encoded password connection_string = f"mysql+pymysql://{DATABASE_CONFIG['username']}:{encoded_password}@{DATABASE_CONFIG['host']}:{DATABASE_CONFIG['port']}/{DATABASE_CONFIG['database']}" print("π Connection string created successfully!")
3
Create Database Engine (Updated)
Cell 3 - Create database connection (Updated as requested)
# Create the database engine (this is like opening a door to the database) try: engine = create_engine(connection_string) print("β Database engine created successfully!") # Test the connection test_query = "SELECT COUNT(*) as country_count FROM country" result = pd.read_sql(test_query, engine) print(f"π Connected to ClydeEnergy's world database!") print(f"π Found {result['country_count'][0]} countries in the database") print("π Connection test successful!") except Exception as e: print(f"β Connection failed: {str(e)}") print("π‘ Please check your database credentials and try again")
π Stage 3: Load Data from Database
1
Load Countries Data
Cell 1 - Load all country information
# Load country data with all important information countries_query = """ SELECT Code, Name, Continent, Region, SurfaceArea, IndepYear, Population, LifeExpectancy, GNP, GNPOld, LocalName, GovernmentForm, HeadOfState, Capital, Code2 FROM country ORDER BY Population DESC """ try: countries_df = pd.read_sql(countries_query, engine) print("π COUNTRIES DATA LOADED SUCCESSFULLY!") print(f"π Total countries: {len(countries_df):,}") print("β Countries data ready for analysis!") except Exception as e: print(f"β Error loading countries data: {str(e)}")
2
Load Cities Data
Cell 2 - Load all city information
# Load city data with country information cities_query = """ SELECT c.ID, c.Name as CityName, c.CountryCode, c.District, c.Population as CityPopulation, co.Name as CountryName, co.Continent FROM city c LEFT JOIN country co ON c.CountryCode = co.Code ORDER BY c.Population DESC """ try: cities_df = pd.read_sql(cities_query, engine) print("ποΈ CITIES DATA LOADED SUCCESSFULLY!") print(f"π Total cities: {len(cities_df):,}") print("β Cities data ready for analysis!") except Exception as e: print(f"β Error loading cities data: {str(e)}")
3
Load Languages Data
Cell 3 - Load language information
# Load language data with country information languages_query = """ SELECT cl.CountryCode, cl.Language, cl.IsOfficial, cl.Percentage, co.Name as CountryName, co.Continent, co.Population as CountryPopulation FROM countrylanguage cl LEFT JOIN country co ON cl.CountryCode = co.Code ORDER BY cl.Percentage DESC """ try: languages_df = pd.read_sql(languages_query, engine) print("π£οΈ LANGUAGES DATA LOADED SUCCESSFULLY!") print(f"π Total language records: {len(languages_df):,}") print("β Languages data ready for analysis!") except Exception as e: print(f"β Error loading languages data: {str(e)}")
π§Ή Stage 4: Clean and Prepare Data
1
Basic Data Cleaning
Cell 1 - Clean and enhance data
# Clean and enhance countries data countries_clean = countries_df.copy() # Fill missing values countries_clean['LifeExpectancy'] = countries_clean['LifeExpectancy'].fillna( countries_clean.groupby('Continent')['LifeExpectancy'].transform('mean') ) countries_clean['GNP'] = countries_clean['GNP'].fillna(0) # Create useful calculated fields countries_clean['PopulationDensity'] = countries_clean['Population'] / countries_clean['SurfaceArea'] countries_clean['GNPPerCapita'] = np.where( countries_clean['Population'] > 0, (countries_clean['GNP'] * 1000000) / countries_clean['Population'], 0 ) print("β Countries data cleaned and enhanced!") print(f"π Shape: {countries_clean.shape}") # Clean cities data cities_clean = cities_df.copy() cities_clean['District'] = cities_clean['District'].fillna('Unknown') print("β Cities data cleaned!") # Clean languages data languages_clean = languages_df.copy() languages_clean['IsOfficial'] = languages_clean['IsOfficial'].map({'T': True, 'F': False}) print("β Languages data cleaned!") print("π All data cleaning completed!")
π Stage 5: Basic Analysis
1
Explore Country Statistics
Cell 1 - Basic country analysis
# Basic statistics about countries print("π WORLD COUNTRIES ANALYSIS") print("=" * 40) print(f"π Total countries: {len(countries_clean)}") print(f"π Continents: {countries_clean['Continent'].nunique()}") print(f"π₯ Total population: {countries_clean['Population'].sum():,}") print(f"π° Total GNP: ${countries_clean['GNP'].sum():,.0f} million") # Top 10 most populous countries print("\nπ TOP 10 MOST POPULOUS COUNTRIES:") top10_countries = countries_clean.nlargest(10, 'Population')[['Name', 'Population', 'Continent']] for i, (idx, row) in enumerate(top10_countries.iterrows()): print(f" {i + 1}. {row['Name']}: {row['Population']:,} people ({row['Continent']})") # Continent breakdown print("\nπ POPULATION BY CONTINENT:") continent_pop = countries_clean.groupby('Continent')['Population'].sum().sort_values(ascending=False) for continent, population in continent_pop.items(): print(f" {continent}: {population:,} people") print("\nβ Basic analysis completed!")
2
Analyze Cities and Languages
Cell 2 - Cities and languages analysis
# Analyze cities print("ποΈ WORLD CITIES ANALYSIS") print("=" * 40) print(f"π Total cities: {len(cities_clean)}") print(f"π Countries with cities: {cities_clean['CountryCode'].nunique()}") # Top 10 largest cities print("\nπ TOP 10 LARGEST CITIES:") top10_cities = cities_clean.nlargest(10, 'CityPopulation')[['CityName', 'CountryName', 'CityPopulation']] for i, (idx, row) in enumerate(top10_cities.iterrows()): print(f" {i + 1}. {row['CityName']}, {row['CountryName']}: {row['CityPopulation']:,} people") # Analyze languages print("\nπ£οΈ WORLD LANGUAGES ANALYSIS") print("=" * 40) print(f"π Total language records: {len(languages_clean)}") print(f"π£οΈ Unique languages: {languages_clean['Language'].nunique()}") # Most common languages print("\nπ MOST COMMON LANGUAGES:") common_languages = languages_clean['Language'].value_counts().head(10) for language, count in common_languages.items(): print(f" {language}: spoken in {count} countries") print("\nβ Cities and languages analysis completed!")
π¨ Stage 6: Create Beautiful Visualizations
1
Population Visualization
Cell 1 - Create population charts
# Set up visualization style plt.style.use('seaborn-v0_8') plt.rcParams['figure.figsize'] = (12, 8) sns.set_palette("husl") # 1. Population by Continent (Bar Chart) plt.figure(figsize=(12, 6)) continent_pop = countries_clean.groupby('Continent')['Population'].sum().sort_values(ascending=False) bars = plt.bar(continent_pop.index, continent_pop.values, color='skyblue', edgecolor='navy', alpha=0.7) plt.title('World Population by Continent', fontsize=16, fontweight='bold') plt.xlabel('Continent', fontsize=12) plt.ylabel('Population', fontsize=12) plt.xticks(rotation=45) # Add value labels on bars for bar in bars: height = bar.get_height() plt.text(bar.get_x() + bar.get_width()/2., height, f'{height/1000000:.1f}M', ha='center', va='bottom', fontweight='bold') plt.tight_layout() plt.show() print("π Population by continent chart created!") # 2. Top 15 Countries by Population (Horizontal Bar Chart) plt.figure(figsize=(12, 8)) top15_countries = countries_clean.nlargest(15, 'Population') plt.barh(top15_countries['Name'], top15_countries['Population'], color='lightcoral', alpha=0.8) plt.title('Top 15 Most Populous Countries', fontsize=16, fontweight='bold') plt.xlabel('Population', fontsize=12) plt.ylabel('Country', fontsize=12) plt.gca().invert_yaxis() # Largest on top # Format x-axis to show millions plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x/1000000:.0f}M')) plt.tight_layout() plt.show() print("π Top countries chart created!")
2
Economic Analysis Charts
Cell 2 - Create economic visualization
# Economic Analysis Visualizations # 1. GNP vs Population Scatter Plot plt.figure(figsize=(12, 8)) # Filter out countries with 0 GNP for better visualization countries_with_gnp = countries_clean[countries_clean['GNP'] > 0] scatter = plt.scatter(countries_with_gnp['Population'], countries_with_gnp['GNP'], c=countries_with_gnp['LifeExpectancy'], cmap='viridis', alpha=0.6, s=60, edgecolors='black', linewidth=0.5) plt.xlabel('Population', fontsize=12) plt.ylabel('GNP (Million USD)', fontsize=12) plt.title('Population vs GNP by Country (Color = Life Expectancy)', fontsize=14, fontweight='bold') # Add colorbar cbar = plt.colorbar(scatter) cbar.set_label('Life Expectancy (years)', fontsize=12) # Format axes plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x/1000000:.0f}M')) plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}B')) plt.tight_layout() plt.show() print("π° Economic analysis scatter plot created!") # 2. Life Expectancy by Continent (Box Plot) plt.figure(figsize=(12, 6)) countries_with_life = countries_clean[countries_clean['LifeExpectancy'] > 0] sns.boxplot(data=countries_with_life, x='Continent', y='LifeExpectancy', palette='Set2') plt.title('Life Expectancy Distribution by Continent', fontsize=16, fontweight='bold') plt.xlabel('Continent', fontsize=12) plt.ylabel('Life Expectancy (years)', fontsize=12) plt.xticks(rotation=45) plt.tight_layout() plt.show() print("π Life expectancy box plot created!") # 3. Population Density Analysis plt.figure(figsize=(12, 6)) countries_with_density = countries_clean[countries_clean['PopulationDensity'] > 0] # Use log scale for better visualization plt.hist(countries_with_density['PopulationDensity'], bins=30, color='gold', alpha=0.7, edgecolor='black') plt.xlabel('Population Density (people per sq km)', fontsize=12) plt.ylabel('Number of Countries', fontsize=12) plt.title('Distribution of Population Density Worldwide', fontsize=16, fontweight='bold') plt.yscale('log') # Log scale for better visualization plt.tight_layout() plt.show() print("π Population density histogram created!") print("β All economic visualizations completed!")
3
City and Language Analysis
Cell 3 - Cities and languages charts
# Cities and Languages Analysis # 1. Top 20 Largest Cities plt.figure(figsize=(14, 8)) top20_cities = cities_clean.nlargest(20, 'CityPopulation') colors = plt.cm.Set3(np.linspace(0, 1, len(top20_cities))) bars = plt.barh(top20_cities['CityName'], top20_cities['CityPopulation'], color=colors) plt.title('Top 20 Largest Cities in the World', fontsize=16, fontweight='bold') plt.xlabel('Population', fontsize=12) plt.ylabel('City', fontsize=12) plt.gca().invert_yaxis() # Format x-axis plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x/1000000:.1f}M')) plt.tight_layout() plt.show() print("ποΈ Top cities chart created!") # 2. Cities by Continent plt.figure(figsize=(10, 6)) cities_by_continent = cities_clean.groupby('Continent').size().sort_values(ascending=False) plt.pie(cities_by_continent.values, labels=cities_by_continent.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel')) plt.title('Distribution of Cities by Continent', fontsize=16, fontweight='bold') plt.axis('equal') plt.show() print("π Cities by continent pie chart created!") # 3. Most Common Languages plt.figure(figsize=(12, 8)) top_languages = languages_clean['Language'].value_counts().head(15) plt.barh(range(len(top_languages)), top_languages.values, color='lightgreen', alpha=0.8) plt.yticks(range(len(top_languages)), top_languages.index) plt.xlabel('Number of Countries', fontsize=12) plt.ylabel('Language', fontsize=12) plt.title('Most Widely Spoken Languages (by number of countries)', fontsize=14, fontweight='bold') plt.gca().invert_yaxis() # Add value labels for i, v in enumerate(top_languages.values): plt.text(v + 0.1, i, str(v), va='center', fontweight='bold') plt.tight_layout() plt.show() print("π£οΈ Languages chart created!") print("β All basic visualizations completed!")
β‘ Stage 7: Create Interactive Dashboard
1
Interactive World Map
Cell 1 - Create interactive world map
# Create Interactive World Map with Plotly import plotly.express as px import plotly.graph_objects as go # Prepare data for world map countries_map = countries_clean.copy() # Create interactive choropleth map fig = px.choropleth( countries_map, locations='Code', color='Population', hover_name='Name', hover_data={ 'Population': ':,', 'LifeExpectancy': ':.1f', 'GNPPerCapita': ':,.0f', 'Continent': True }, color_continuous_scale='Viridis', title='π Interactive World Population Map', labels={'Population': 'Population', 'Code': 'Country Code'} ) # Update layout fig.update_layout( title_x=0.5, title_font_size=20, geo=dict( showframe=False, showcoastlines=True, projection_type='equirectangular' ), width=1000, height=600 ) # Show the map fig.show() print("πΊοΈ Interactive world map created!") print("π‘ Click and hover over countries to explore data!")
2
Interactive Charts Dashboard
Cell 2 - Create interactive charts
# Create Multiple Interactive Charts # 1. Interactive Population vs GNP Scatter Plot fig1 = px.scatter( countries_clean[countries_clean['GNP'] > 0], x='Population', y='GNP', size='SurfaceArea', color='Continent', hover_name='Name', hover_data={ 'Population': ':,', 'GNP': ':,.0f', 'LifeExpectancy': ':.1f', 'SurfaceArea': ':,.0f' }, title='π Population vs GNP by Country (Interactive)', labels={ 'Population': 'Population', 'GNP': 'GNP (Million USD)', 'SurfaceArea': 'Surface Area (sq km)' } ) fig1.update_layout( title_x=0.5, title_font_size=16, width=800, height=500 ) fig1.show() print("π Interactive scatter plot created!") # 2. Interactive Bar Chart - Top Countries by Population top15 = countries_clean.nlargest(15, 'Population') fig2 = px.bar( top15, x='Name', y='Population', color='Continent', hover_data={ 'Population': ':,', 'LifeExpectancy': ':.1f', 'GNPPerCapita': ':,.0f' }, title='π Top 15 Most Populous Countries (Interactive)', labels={'Population': 'Population', 'Name': 'Country'} ) fig2.update_layout( title_x=0.5, title_font_size=16, xaxis_tickangle=-45, width=1000, height=500 ) fig2.show() print("π Interactive bar chart created!") print("β All interactive charts created!")
3
Create Full Dashboard App
Cell 3 - Build complete dashboard with Dash
# Create a Complete Interactive Dashboard import dash from dash import dcc, html, Input, Output import plotly.express as px # Initialize the Dash app app = dash.Dash(__name__) # Define the layout app.layout = html.Div([ html.H1("π ClydeEnergy's World Database Dashboard", style={'textAlign': 'center', 'color': '#2c3e50', 'marginBottom': 30}), # Controls section html.Div([ html.Div([ html.Label("Select Continent:", style={'fontWeight': 'bold'}), dcc.Dropdown( id='continent-dropdown', options=[{'label': 'All', 'value': 'All'}] + [{'label': cont, 'value': cont} for cont in countries_clean['Continent'].unique()], value='All', style={'marginBottom': 20} ) ], style={'width': '30%', 'display': 'inline-block'}), html.Div([ html.Label("Select Metric:", style={'fontWeight': 'bold'}), dcc.Dropdown( id='metric-dropdown', options=[ {'label': 'Population', 'value': 'Population'}, {'label': 'GNP', 'value': 'GNP'}, {'label': 'Life Expectancy', 'value': 'LifeExpectancy'}, {'label': 'Population Density', 'value': 'PopulationDensity'} ], value='Population', style={'marginBottom': 20} ) ], style={'width': '30%', 'display': 'inline-block', 'marginLeft': '5%'}) ], style={'marginBottom': 30}), # Charts section html.Div([ # World map html.Div([ dcc.Graph(id='world-map') ], style={'width': '100%', 'marginBottom': 30}), # Bar chart and scatter plot html.Div([ html.Div([ dcc.Graph(id='bar-chart') ], style={'width': '48%', 'display': 'inline-block'}), html.Div([ dcc.Graph(id='scatter-plot') ], style={'width': '48%', 'display': 'inline-block', 'marginLeft': '4%'}) ]) ]) ]) # Callbacks for interactivity @app.callback( [Output('world-map', 'figure'), Output('bar-chart', 'figure'), Output('scatter-plot', 'figure')], [Input('continent-dropdown', 'value'), Input('metric-dropdown', 'value')] ) def update_charts(selected_continent, selected_metric): # Filter data based on continent selection if selected_continent == 'All': filtered_df = countries_clean else: filtered_df = countries_clean[countries_clean['Continent'] == selected_continent] # Create world map map_fig = px.choropleth( filtered_df, locations='Code', color=selected_metric, hover_name='Name', color_continuous_scale='Viridis', title=f'World {selected_metric} Map' ) map_fig.update_layout(height=400) # Create bar chart top10 = filtered_df.nlargest(10, selected_metric) bar_fig = px.bar( top10, x='Name', y=selected_metric, color='Continent', title=f'Top 10 Countries by {selected_metric}' ) bar_fig.update_layout(xaxis_tickangle=-45, height=400) # Create scatter plot scatter_fig = px.scatter( filtered_df[filtered_df['GNP'] > 0], x='Population', y='GNP', size=selected_metric, color='Continent', hover_name='Name', title=f'Population vs GNP (Size = {selected_metric})' ) scatter_fig.update_layout(height=400) return map_fig, bar_fig, scatter_fig print("π― Dashboard app created!") print("π± To run the dashboard, execute the next cell") print("π‘ The dashboard will be available at http://127.0.0.1:8050/")
π Stage 8: Share Your Dashboard (Windows Fixed)
β οΈ Windows Connection Issue Fixed: The error occurs because `host='0.0.0.0'` doesn't work on Windows. Here are the working solutions:
1
Solution 1: Run Dashboard on Localhost (FIXED)
Cell 1 - Fixed dashboard runner for Windows
# FIXED VERSION FOR WINDOWS - Run the dashboard locally if __name__ == '__main__': # Use localhost instead of 0.0.0.0 for Windows compatibility try: print("π Starting ClydeEnergy's World Dashboard...") print("β³ Please wait while the dashboard loads...") app.run_server(debug=True, host='127.0.0.1', port=8050) except Exception as e: print(f"β Port 8050 busy. Trying alternative port...") app.run_server(debug=True, host='127.0.0.1', port=8051) print("π Dashboard is running!") print("π Access your dashboard at: http://127.0.0.1:8050/") print("π± Open your web browser and go to the URL above") print("π The dashboard will auto-refresh when you make changes") print("βΉοΈ To stop the dashboard, press Ctrl+C in the terminal")
2