🤔 What is Data Visualization?
🎯 What You'll Learn to Create
Bar Charts
What it shows: How quantities compare across different groups (like the population of different continents)
When to use: When you want to compare amounts
Pie Charts
What it shows: Parts of a whole (like what percentage of world population lives in each continent)
When to use: When you want to show proportions
Line Charts
What it shows: How something changes over time (like economic growth over years)
When to use: When you want to show trends
Scatter Plots
What it shows: Relationship between two numbers (like country size vs population)
When to use: When you want to see whether two measurements are related
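The stages below build bar, pie, and scatter charts from the world database, but none of them draws a line chart (the world database does not contain a year-by-year time series). Here is a minimal line-chart sketch. The yearly numbers are made up purely for illustration, and `make_line_chart` is a hypothetical helper name:

```python
import matplotlib.pyplot as plt

# Made-up yearly values, for illustration only (not from the world database)
years = [2000, 2005, 2010, 2015, 2020]
values = [1.0, 1.4, 1.9, 2.3, 2.8]

def make_line_chart(x, y, title):
    """Draw a simple line chart and return the figure and axes."""
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(x, y, marker='o', linewidth=2)  # markers show the actual data points
    ax.set_title(title)
    ax.set_xlabel('Year')
    ax.set_ylabel('Value')
    ax.grid(True, alpha=0.3)
    return fig, ax

fig, ax = make_line_chart(years, values, 'A Trend Over Time (example data)')
# In Jupyter the chart appears automatically; in a plain script, call plt.show()
```

The line connecting the points is what makes a trend easy to read: your eye follows the slope from left to right.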
🗄️ Your Data (World Database)
🏙️ CITY Table - Information about cities:
• Name: City name (like "New York", "London")
• Population: How many people live there
• CountryCode: Three-letter code of the country it's in
• District: State or region within the country
🌍 COUNTRY Table - Information about countries:
• Code: Three-letter country code (the link to the other two tables)
• Name: Country name (like "United States", "Japan")
• Population: Total people in the country
• Continent: Which continent (Asia, Europe, etc.)
• SurfaceArea: Size of the country in km²
• LifeExpectancy: How many years people live on average
• GNP: Gross National Product, the size of the economy (in millions of US dollars)
🗣️ COUNTRYLANGUAGE Table - Information about languages:
• Language: Language name (like "English", "Spanish")
• CountryCode: Which country speaks it (three-letter code)
• IsOfficial: Is it an official language? ('T' = yes, 'F' = no)
• Percentage: What % of the country's people speak it
⚙️ Stage 1: Set Up Your Computer
Install Python Libraries (The Tools)
🤔 What are libraries?
Libraries are like toolboxes with ready-made tools. Instead of building everything from scratch, you use these pre-made tools to create charts quickly!
```python
# Run this in a Jupyter notebook cell (the "!" prefix runs a shell command).
# In a regular terminal, type the same pip commands without the "!".
# This installs all the tools you need
!pip install mysql-connector-python  # Connects to your database
!pip install pandas                  # Handles data like Excel
!pip install matplotlib              # Creates charts
!pip install seaborn                 # Makes charts prettier
!pip install numpy                   # Does math calculations

print("✅ All tools installed! Ready to create charts!")
```
• mysql-connector-python: Talks to your database
• pandas: Like Excel but in Python - organizes your data
• matplotlib: The main chart-making tool
• seaborn: Makes charts look professional
• numpy: Does math calculations quickly
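To confirm the installation worked, you can ask Python which version of each package it can find. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); `check_installed` is a hypothetical helper name:

```python
from importlib.metadata import version, PackageNotFoundError

# Package names as pip knows them (the same names used in the install cell)
packages = ['mysql-connector-python', 'pandas', 'matplotlib', 'seaborn', 'numpy']

def check_installed(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in check_installed(packages).items():
    status = f"✅ {ver}" if ver else "❌ not installed"
    print(f"{name}: {status}")
```

If a package shows "not installed", rerun its `pip install` line and then restart the notebook kernel.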
Import Your Tools
🤔 What does "import" mean?
It's like taking tools out of your toolbox so you can use them. You need to do this every time you start working.
```python
# Import all the tools we just installed
# Run this code every time you start a new session
import mysql.connector           # Database connector
import pandas as pd              # Data organizer (we call it 'pd' for short)
import matplotlib.pyplot as plt  # Chart maker (we call it 'plt')
import seaborn as sns            # Pretty chart maker (we call it 'sns')
import numpy as np               # Math helper (we call it 'np')
from datetime import datetime    # Date and time helper

# Start from matplotlib's default style, then set a few chart defaults
plt.style.use('default')
plt.rcParams['figure.figsize'] = (10, 6)  # Default chart size in inches (width, height)
plt.rcParams['font.size'] = 12            # Default text size

print("✅ All tools ready to use!")
print(f"📅 Started at: {datetime.now()}")
print("👤 User: ClydeEnergy")
```
Instead of typing "matplotlib.pyplot" every time, we use "plt"
Instead of typing "pandas" every time, we use "pd"
This saves time and makes code easier to read!
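A nickname created with `as` is just a second name for the very same module, so `pd.DataFrame` and `pandas.DataFrame` are the same thing. A tiny check:

```python
import pandas as pd
import pandas

# "pd" and "pandas" point to the very same module object
print(pd is pandas)
print(pd.__name__)
```

Both lines confirm it: the first prints `True`, the second prints `pandas`.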
🔌 Stage 2: Connect to Your Database
Set Up Your Database Connection
🤔 What is a database connection?
Your data is stored in a database (like a digital filing cabinet). To get the data, you need to "connect" to it by providing the right credentials (like a key to open the cabinet).
```python
# Your database connection settings
# ⚠️ IMPORTANT: Change "YOUR_PASSWORD_HERE" to your actual MySQL password!
config = {
    'host': 'localhost',               # Your computer
    'user': 'root',                    # Your username (usually 'root')
    'password': 'YOUR_PASSWORD_HERE',  # ⚠️ CHANGE THIS!
    'database': 'world',               # Name of your database
    'port': 3306                       # Door number to connect through
}

print("📋 Database settings configured!")
print("⚠️ Remember to change your password!")
```
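Typing your real password into the notebook means it ends up in every copy you share. A safer variant is sketched below. It assumes you store the password in an environment variable called `MYSQL_PASSWORD` (a name chosen for this example, not something MySQL requires), and `build_config` is a hypothetical helper:

```python
import os
from getpass import getpass

def build_config(database='world'):
    """Build the connection settings without writing the password in the code.

    Looks for a MYSQL_PASSWORD environment variable first (a name we picked
    for this example); if it is not set, asks for the password interactively.
    """
    password = os.environ.get('MYSQL_PASSWORD')
    if password is None:
        password = getpass('MySQL password: ')  # what you type stays hidden
    return {
        'host': 'localhost',
        'user': 'root',
        'password': password,
        'database': database,
        'port': 3306,
    }

# Usage: config = build_config()
```

The rest of the tutorial only needs a dictionary named `config`, so `config = build_config()` can replace the hard-coded settings.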
Create Connection Functions
🤔 What is a function?
A function is like a recipe - you write it once, then use it many times. These functions will help you connect to the database and get data easily.
```python
# Helper functions - copy this exactly!
# These are like recipes you can use over and over
def create_connection():
    """This function connects to your database"""
    try:
        # Try to connect using your settings
        connection = mysql.connector.connect(**config)
        print("✅ Connected to database successfully!")
        return connection
    except mysql.connector.Error as e:
        # If it fails, tell us what went wrong
        print(f"❌ Connection failed: {e}")
        print("💡 Check your password and make sure MySQL is running")
        return None

def get_data(query):
    """This function gets data from your database"""
    # Connect to database
    connection = create_connection()
    if connection is None:
        print("❌ No database connection")
        return None
    try:
        # Create a cursor (think of it as a pointer into the results)
        cursor = connection.cursor(buffered=True)
        # Run your query (question to the database)
        cursor.execute(query)
        # Get the column names
        columns = [desc[0] for desc in cursor.description]
        # Get all the data
        rows = cursor.fetchall()
        cursor.close()
        # Turn it into a pandas DataFrame (like an Excel spreadsheet)
        return pd.DataFrame(rows, columns=columns)
    except mysql.connector.Error as e:
        print(f"❌ Query failed: {e}")
        return None
    finally:
        # Always close the connection, even if the query failed
        connection.close()

print("✅ Helper functions created!")
print("🔧 Now you can easily get data from your database!")
```
• create_connection(): Connects to your database
• get_data(query): Gets data using a SQL question
Both functions include error handling - if something goes wrong, they'll tell you what happened!
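If MySQL isn't working yet, you can practice the same steps with Python's built-in `sqlite3` module. This sketch uses a hypothetical helper, `get_data_from`, that takes an already-open connection instead of creating one; the table rows are made up. Because it only uses standard DB-API calls (`cursor`, `execute`, `description`, `fetchall`), it should also work with a `mysql.connector` connection:

```python
import sqlite3
import pandas as pd

def get_data_from(connection, query):
    """Run a query on an open DB-API connection and return a DataFrame.

    Same steps as get_data(): run the query, read the column names from
    cursor.description, fetch all rows, and build a DataFrame.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return pd.DataFrame(rows, columns=columns)

# A throwaway in-memory SQLite database with made-up rows, for practice only
practice = sqlite3.connect(':memory:')
practice.execute("CREATE TABLE country (Name TEXT, Continent TEXT, Population INTEGER)")
practice.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [('Country A', 'Asia', 300), ('Country B', 'Asia', 200), ('Country C', 'Europe', 100)],
)
df = get_data_from(
    practice,
    "SELECT Continent, SUM(Population) AS Total_Pop "
    "FROM country GROUP BY Continent ORDER BY Total_Pop DESC",
)
practice.close()
print(df)
```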
Test Your Connection
🤔 Why test the connection?
Before creating charts, we need to make sure we can actually get data from the database. This is like checking if your internet connection works before trying to watch a video!
```python
# Test if your connection works
print("🔍 Testing database connection...")
print("=" * 40)

# Try to connect
connection = create_connection()

if connection:
    print("🎉 SUCCESS! Your database connection works!")

    # Let's see what tables you have
    tables_df = get_data("SHOW TABLES")
    if tables_df is not None:
        print("\n📊 Tables in your database:")
        for i, table in enumerate(tables_df.iloc[:, 0], 1):
            print(f"   {i}. {table}")

    # Count records in each table
    print("\n📈 Record counts:")
    for table in ['city', 'country', 'countrylanguage']:
        count_df = get_data(f"SELECT COUNT(*) FROM {table}")
        if count_df is not None:
            count = count_df.iloc[0, 0]
            print(f"   • {table}: {count:,} records")

    connection.close()
    print("\n✅ Connection test complete! Ready to make charts!")
else:
    print("❌ Connection failed. Please check:")
    print("   1. Your MySQL password in the config")
    print("   2. MySQL server is running")
    print("   3. 'world' database exists")
```
📊 Stage 3: Create Your First Chart
Get Data for Your Chart
🤔 What is SQL?
SQL is a language for asking questions to databases. Think of it like asking "Hey database, can you give me the population of each continent?" The database then gives you the answer!
```python
# Let's get population data by continent
print("📊 Getting population data...")

# This SQL query asks: "Give me the total population for each continent"
query = """
SELECT Continent,                    -- The continent name
       SUM(Population) AS Total_Pop  -- Add up all population in that continent
FROM country                         -- From the country table
WHERE Population > 0                 -- Only countries with population data
GROUP BY Continent                   -- Group the results by continent
ORDER BY Total_Pop DESC              -- Sort from highest to lowest population
"""

# Get the data using our helper function
population_data = get_data(query)

# Let's see what we got!
if population_data is not None:
    print("✅ Data retrieved successfully!")
    print("\n📋 Here's your data:")
    print(population_data)

    # MySQL returns SUM() results as Decimal values; convert them to regular
    # floats so math and matplotlib work smoothly
    population_data['Total_Pop'] = population_data['Total_Pop'].astype(float)

    # Make the numbers easier to read (in billions)
    population_data['Population_Billions'] = population_data['Total_Pop'] / 1_000_000_000

    print("\n🌍 Population by continent (in billions):")
    for _, row in population_data.iterrows():
        continent = row['Continent']
        pop_billions = row['Population_Billions']
        print(f"   {continent}: {pop_billions:.1f} billion people")
else:
    print("❌ Couldn't get data. Check your database connection.")
```
• SELECT: "I want these columns"
• FROM: "From this table"
• WHERE: "Only include rows that match this condition"
• GROUP BY: "Group the data by this column"
• ORDER BY: "Sort the results this way"
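Each SQL clause has a close pandas counterpart, which helps when you want to reshape data you have already loaded. The sketch below repeats the continent query on a small made-up DataFrame (not real statistics), with each step labeled by the clause it mirrors:

```python
import pandas as pd

# Made-up rows standing in for the country table (not real statistics)
country = pd.DataFrame({
    'Name': ['Country A', 'Country B', 'Country C', 'Country D'],
    'Continent': ['Asia', 'Asia', 'Europe', 'Antarctica'],
    'Population': [300, 200, 100, 0],
})

result = (
    country[country['Population'] > 0]         # WHERE Population > 0
    .groupby('Continent')['Population']        # GROUP BY Continent
    .sum()                                     # SUM(Population)
    .reset_index(name='Total_Pop')             # ... AS Total_Pop
    .sort_values('Total_Pop', ascending=False) # ORDER BY Total_Pop DESC
)
print(result)
```

As a rule of thumb, let the database do the heavy filtering and grouping on large tables, and use pandas for quick follow-up reshaping.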
Create Your First Bar Chart
🤔 What makes a good bar chart?
A bar chart compares quantities. The height of each bar shows the amount. We'll use different colors to make it pretty and add labels so people know what they're looking at!
```python
# Create your first chart!
print("🎨 Creating your first chart...")

if population_data is not None:
    # Create a new figure (like a blank canvas)
    plt.figure(figsize=(12, 8))  # Make it 12 inches wide, 8 inches tall

    # Create the bar chart
    bars = plt.bar(
        population_data['Continent'],            # X-axis: continent names
        population_data['Population_Billions'],  # Y-axis: population in billions
        color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7', '#DDA0DD'],  # Pretty colors
        alpha=0.8                                # Make bars slightly transparent
    )

    # Add labels and title
    plt.title('World Population by Continent', fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Continent', fontsize=12, fontweight='bold')
    plt.ylabel('Population (Billions)', fontsize=12, fontweight='bold')

    # Add value labels on top of each bar
    for bar, value in zip(bars, population_data['Population_Billions']):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, height + 0.05, f'{value:.1f}B',
                 ha='center', va='bottom', fontweight='bold')

    # Rotate continent names so they don't overlap
    plt.xticks(rotation=45, ha='right')

    # Add a grid to make it easier to read
    plt.grid(True, alpha=0.3, axis='y')

    # Make sure everything fits nicely
    plt.tight_layout()

    # Show your chart!
    plt.show()

    print("🎉 Congratulations! You created your first chart!")
    print("📊 This chart shows which continents have the most people")
else:
    print("❌ Can't create chart without data")
```
• plt.figure(): Creates a blank canvas
• plt.bar(): Creates the bars
• plt.title(): Adds a title
• plt.xlabel/ylabel(): Labels the axes
• plt.text(): Adds value labels on bars
• plt.grid(): Adds grid lines
• plt.show(): Displays the chart
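The `plt.` functions above always act on "the current chart". matplotlib also offers an object-oriented style, where you hold the figure and axes in variables and call methods on them; it becomes handy once you draw several charts side by side. A sketch with made-up values; note that `ax.bar_label` (which replaces the manual `plt.text` loop) needs matplotlib 3.4 or newer:

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only
labels = ['Group A', 'Group B', 'Group C']
values = [3.2, 1.5, 0.4]

fig, ax = plt.subplots(figsize=(10, 6))  # figure (the canvas) + axes (the plot area)
bars = ax.bar(labels, values, color='#4ECDC4')
ax.set_title('Example Bar Chart')
ax.set_xlabel('Group')
ax.set_ylabel('Value')
value_labels = ax.bar_label(bars, fmt='%.1f', padding=3)  # one label on top of each bar
ax.grid(True, alpha=0.3, axis='y')
fig.tight_layout()
# In Jupyter the chart appears automatically; in a plain script, call plt.show()
```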
Understand Your Chart
🤔 What does your chart tell you?
Your bar chart shows that Asia has the most people, followed by Africa and Europe. This is much easier to see than looking at a table of numbers!
```python
# Let's analyze what your chart shows
print("🔍 ANALYZING YOUR FIRST CHART")
print("=" * 40)

if population_data is not None:
    # Find the most and least populous continents (the data is already sorted)
    most_populous = population_data.iloc[0]    # First row (highest population)
    least_populous = population_data.iloc[-1]  # Last row (lowest population)

    print(f"🥇 Most populous continent: {most_populous['Continent']}")
    print(f"   Population: {most_populous['Population_Billions']:.1f} billion people")
    print(f"\n🥉 Least populous continent: {least_populous['Continent']}")
    print(f"   Population: {least_populous['Population_Billions']:.1f} billion people")

    # Calculate the difference
    difference = most_populous['Population_Billions'] - least_populous['Population_Billions']
    print(f"\n📊 Difference: {difference:.1f} billion people")

    # Calculate total world population
    total_world_pop = population_data['Population_Billions'].sum()
    print(f"\n🌍 Total world population: {total_world_pop:.1f} billion people")

    # Show percentage for each continent
    print("\n📈 Percentage breakdown:")
    for _, row in population_data.iterrows():
        percentage = (row['Population_Billions'] / total_world_pop) * 100
        print(f"   {row['Continent']}: {percentage:.1f}% of world population")

    print("\n🎉 You've successfully analyzed your first data visualization!")
    print("💡 Charts make it much easier to understand data than looking at numbers!")
```
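The percentage loop above works one row at a time. pandas can also compute a whole column in one step, which is shorter and faster on big tables. A sketch on a small made-up DataFrame (named `shares` so it won't overwrite your real `population_data`):

```python
import pandas as pd

# Made-up totals, for illustration only (not real figures)
shares = pd.DataFrame({
    'Group': ['Group A', 'Group B', 'Group C'],
    'Population_Billions': [3.0, 0.75, 0.25],
})

total = shares['Population_Billions'].sum()
# One line computes the percentage for every row at once
shares['Percent'] = shares['Population_Billions'] / total * 100

# idxmax() finds the row label of the largest value, whatever the sort order
top = shares.loc[shares['Population_Billions'].idxmax(), 'Group']

print(shares.round(1))
print(f"Largest: {top}")
```

Using `idxmax()` also means the answer stays correct even if the rows were not sorted by the query.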
📈 Stage 4: Create Different Types of Charts
Create a Pie Chart
🤔 When to use pie charts?
Pie charts show parts of a whole - like slices of a pizza! They're perfect for showing what percentage each continent represents of the total world population.
```python
# Create a pie chart showing population percentages
print("🥧 Creating a pie chart...")

if population_data is not None:
    # Create a new figure
    plt.figure(figsize=(10, 8))

    # Define colors for each slice
    colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7', '#DDA0DD']

    # Create the pie chart
    wedges, texts, autotexts = plt.pie(
        population_data['Population_Billions'],  # Size of each slice
        labels=population_data['Continent'],     # Label for each slice
        colors=colors,                           # Colors for each slice
        autopct='%1.1f%%',                       # Show percentages
        startangle=90,                           # Start from the top
        # Pop out the first (largest) slice; one entry per slice
        explode=[0.1] + [0] * (len(population_data) - 1)
    )

    # Make the percentage text bold and easier to read
    for autotext in autotexts:
        autotext.set_color('white')
        autotext.set_fontweight('bold')
        autotext.set_fontsize(11)

    # Add title
    plt.title('World Population Distribution by Continent', fontsize=16, fontweight='bold', pad=20)

    # Make sure the pie is circular
    plt.axis('equal')

    # Show the chart
    plt.show()

    print("🎉 Pie chart created!")
    print("🥧 This shows what percentage of world population lives in each continent")
else:
    print("❌ Can't create pie chart without data")
```
• plt.pie(): Creates the pie chart
• autopct: Shows percentages on slices
• explode: Makes slices pop out
• startangle: Where to start drawing
• plt.axis('equal'): Makes it perfectly round
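`autopct` also accepts a function instead of a format string. matplotlib calls it with each slice's percentage (0 to 100), and whatever text it returns is drawn on the slice. This sketch uses made-up sizes and a hypothetical `pct_and_value` callback to show both the percentage and the original amount:

```python
import matplotlib.pyplot as plt

# Made-up slice sizes (in billions), for illustration only
sizes = [3.0, 0.75, 0.25]
names = ['Slice A', 'Slice B', 'Slice C']
total = sum(sizes)

def pct_and_value(pct):
    """autopct callback: matplotlib passes the slice's percentage (0-100)."""
    value = pct * total / 100  # turn the percentage back into the original unit
    return f'{pct:.1f}%\n({value:.2f}B)'

fig, ax = plt.subplots(figsize=(8, 8))
wedges, texts, autotexts = ax.pie(sizes, labels=names, autopct=pct_and_value, startangle=90)
ax.axis('equal')  # keep the pie circular
```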
Create a Horizontal Bar Chart
🤔 Why horizontal bars?
Sometimes horizontal bars look better, especially when you have long names or want to show rankings clearly. Let's show the top 10 most populous countries!
```python
# Get data for top 10 most populous countries
print("📊 Creating horizontal bar chart for top countries...")

countries_query = """
SELECT Name AS Country, Population, Continent
FROM country
WHERE Population > 0
ORDER BY Population DESC
LIMIT 10
"""
countries_data = get_data(countries_query)

if countries_data is not None:
    # Convert population to millions for easier reading
    countries_data['Pop_Millions'] = countries_data['Population'] / 1_000_000

    # Create the chart
    plt.figure(figsize=(12, 8))

    # Create horizontal bars
    bars = plt.barh(
        countries_data['Country'],       # Y-axis: country names
        countries_data['Pop_Millions'],  # X-axis: population in millions
        color='#4ECDC4',                 # Nice teal color
        alpha=0.8
    )

    # Add labels and title
    plt.title('Top 10 Most Populous Countries', fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Population (Millions)', fontsize=12, fontweight='bold')
    plt.ylabel('Country', fontsize=12, fontweight='bold')

    # Add value labels just past the end of each bar (20 million to the right)
    for bar, value in zip(bars, countries_data['Pop_Millions']):
        plt.text(value + 20, bar.get_y() + bar.get_height() / 2, f'{value:.0f}M',
                 va='center', fontweight='bold')

    # Reverse the order so the highest is at the top
    plt.gca().invert_yaxis()

    # Add grid for easier reading
    plt.grid(True, alpha=0.3, axis='x')

    # Make sure everything fits
    plt.tight_layout()

    # Show the chart
    plt.show()

    print("🎉 Horizontal bar chart created!")
    print("📊 This shows the 10 countries with the most people")

    # Show the ranking
    print("\n🏆 TOP 10 RANKING:")
    for i, (_, row) in enumerate(countries_data.iterrows(), 1):
        print(f"   {i:2d}. {row['Country']}: {row['Pop_Millions']:.0f} million people")
else:
    print("❌ Can't create chart without data")
```
• plt.barh(): Creates horizontal bars
• plt.gca().invert_yaxis(): Puts highest at top
• bar.get_y(): Gets position for labels
• Perfect for rankings and long labels
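pandas can draw simple charts straight from a DataFrame. It uses matplotlib under the hood and returns the matplotlib axes, so you can keep customizing with the same methods. A sketch with made-up ranking data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up ranking data, for illustration only
ranking = pd.DataFrame({
    'Country': ['Country A', 'Country B', 'Country C'],
    'Pop_Millions': [120.0, 80.0, 45.0],
})

# DataFrame.plot.barh draws a horizontal bar chart and returns the Axes
ax = ranking.plot.barh(x='Country', y='Pop_Millions', color='#4ECDC4',
                       legend=False, figsize=(10, 5))
ax.invert_yaxis()  # largest value at the top
ax.set_xlabel('Population (Millions)')
ax.set_title('Example Ranking')
```

This is handy for a quick look at the data; the explicit `plt.barh` version gives you finer control over labels and styling.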
Create a Scatter Plot
🤔 What do scatter plots show?
Scatter plots show relationships between two numbers. We'll look at country size vs population to see if bigger countries always have more people (spoiler: they don't!).
```python
# Get data comparing country size vs population
print("📍 Creating scatter plot...")

scatter_query = """
SELECT Name AS Country, Population, SurfaceArea, Continent
FROM country
WHERE Population > 0 AND SurfaceArea > 0
ORDER BY Population DESC
LIMIT 50
"""
scatter_data = get_data(scatter_query)

if scatter_data is not None:
    # Create the chart
    plt.figure(figsize=(12, 8))

    # Define colors for each continent
    continent_colors = {
        'Asia': '#FF6B6B',
        'Europe': '#4ECDC4',
        'Africa': '#45B7D1',
        'North America': '#96CEB4',
        'South America': '#FFEAA7',
        'Oceania': '#DDA0DD'
    }

    # Create scatter plot with different colors for each continent
    for continent in scatter_data['Continent'].unique():
        continent_data = scatter_data[scatter_data['Continent'] == continent]
        plt.scatter(
            continent_data['SurfaceArea'] / 1000,      # X: Size in thousands of km²
            continent_data['Population'] / 1_000_000,  # Y: Population in millions
            label=continent,                           # Legend label
            color=continent_colors.get(continent, '#333333'),
            alpha=0.7,                                 # Transparency
            s=80                                       # Size of dots
        )

    # Add labels and title
    plt.title('Country Size vs Population (50 Most Populous Countries)',
              fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Country Size (Thousands of km²)', fontsize=12, fontweight='bold')
    plt.ylabel('Population (Millions)', fontsize=12, fontweight='bold')

    # Add legend outside the plot area
    plt.legend(title='Continent', bbox_to_anchor=(1.05, 1), loc='upper left')

    # Add grid
    plt.grid(True, alpha=0.3)

    # Make sure everything fits
    plt.tight_layout()

    # Show the chart
    plt.show()

    print("🎉 Scatter plot created!")
    print("📍 This shows the relationship between country size and population")
    print("💡 Notice: Bigger countries don't always have more people!")

    # Find some interesting examples
    print("\n🔍 INTERESTING FINDINGS:")

    # Find the largest of these 50 countries
    largest = scatter_data.loc[scatter_data['SurfaceArea'].idxmax()]
    print(f"🌍 Largest of these 50 countries: {largest['Country']} ({largest['SurfaceArea']:,.0f} km²)")

    # Find the most populous
    most_pop = scatter_data.loc[scatter_data['Population'].idxmax()]
    print(f"👥 Most populous: {most_pop['Country']} ({most_pop['Population']:,.0f} people)")
else:
    print("❌ Can't create scatter plot without data")
```
• Each dot represents one country
• X-axis shows size, Y-axis shows population
• Different colors show different continents
• You can see if there's a pattern (correlation)
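Two tools help with this kind of chart. A correlation coefficient puts a number on the pattern, and log-scaled axes spread out values that range over many orders of magnitude, so a few very large or very populous countries don't squash everything else into one corner. A sketch with made-up points (here population is exactly twice the area, so the correlation comes out as 1):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up points, for illustration only
points = pd.DataFrame({
    'SurfaceArea': [1_000, 10_000, 100_000, 1_000_000],
    'Population': [2_000, 20_000, 200_000, 2_000_000],
})

# Pearson correlation: +1 = perfect upward line, 0 = no linear pattern, -1 = perfect downward line
r = points['SurfaceArea'].corr(points['Population'])
print(f"Correlation: {r:.2f}")

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(points['SurfaceArea'], points['Population'], s=80, alpha=0.7)
ax.set_xscale('log')  # each step along the axis is 10x bigger
ax.set_yscale('log')
ax.set_xlabel('Country Size (km², log scale)')
ax.set_ylabel('Population (log scale)')
```

With `scatter_data` you could compute `scatter_data['SurfaceArea'].corr(scatter_data['Population'])` the same way; a value near 0 would support the "bigger isn't always more populous" observation.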
🎨 Stage 5: Make Your Charts Beautiful
Choose Professional Colors
🤔 Why do colors matter?
Colors make your charts more appealing and help people understand the data better. Professional color schemes make your work look more credible and easier to read.
```python
# Define professional color palettes
print("🎨 Setting up professional color schemes...")

# Color palette 1: Modern blues and greens
modern_colors = ['#3498db', '#2ecc71', '#e74c3c', '#f39c12', '#9b59b6', '#1abc9c']
# Color palette 2: Warm and friendly
warm_colors = ['#ff7675', '#74b9ff', '#00b894', '#fdcb6e', '#6c5ce7', '#fd79a8']
# Color palette 3: Professional business
business_colors = ['#2d3436', '#636e72', '#74b9ff', '#0984e3', '#00b894', '#00cec9']

# Let's create a beautiful chart with our population data
if population_data is not None:
    # Create a more professional version of our first chart
    plt.figure(figsize=(14, 8))

    # Use modern colors
    bars = plt.bar(
        population_data['Continent'],
        population_data['Population_Billions'],
        color=modern_colors[:len(population_data)],  # Use as many colors as needed
        alpha=0.85,         # Slight transparency
        edgecolor='white',  # White borders around bars
        linewidth=2         # Thick borders
    )

    # Professional styling
    plt.title('Global Population Distribution by Continent',
              fontsize=18, fontweight='bold', pad=25, color='#2d3436')  # Dark gray title
    plt.xlabel('Continent', fontsize=14, fontweight='bold', color='#636e72')
    plt.ylabel('Population (Billions)', fontsize=14, fontweight='bold', color='#636e72')

    # Add value labels with better styling
    for bar, value in zip(bars, population_data['Population_Billions']):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, height + 0.05, f'{value:.1f}B',
                 ha='center', va='bottom', fontweight='bold', fontsize=12, color='#2d3436')

    # Professional grid
    plt.grid(True, alpha=0.2, linestyle='--', color='#636e72')

    # Remove top and right spines for a cleaner look
    ax = plt.gca()
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_color('#636e72')
    ax.spines['bottom'].set_color('#636e72')

    # Rotate labels nicely
    plt.xticks(rotation=45, ha='right', color='#636e72')
    plt.yticks(color='#636e72')

    # Add a subtle background color
    ax.set_facecolor('#f8f9fa')

    plt.tight_layout()
    plt.show()

    print("✨ Professional chart created!")
    print("🎨 Notice the improved colors, spacing, and overall appearance")
else:
    print("❌ Need population data first")
```
• edgecolor: Adds borders to bars
• ax.spines: Controls chart borders
• set_facecolor: Changes background color
• alpha: Controls transparency
• Consistent color scheme throughout
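If you like this clean look, you can switch it on once per session through matplotlib's `rcParams` instead of repeating the spine calls in every chart. A minimal sketch; the particular settings are our choice:

```python
import matplotlib.pyplot as plt

# Apply the "clean look" to every chart created after this point
plt.rcParams.update({
    'axes.spines.top': False,     # hide the top border
    'axes.spines.right': False,   # hide the right border
    'axes.facecolor': '#f8f9fa',  # subtle background color
    'grid.alpha': 0.2,            # faint grid lines (when a grid is turned on)
    'grid.linestyle': '--',       # dashed grid lines
})

# Any new chart now picks up these settings automatically
fig, ax = plt.subplots()
ax.bar(['A', 'B'], [2, 1])
```

These settings affect every later chart in the session. `plt.rcdefaults()` restores matplotlib's built-in defaults (including the figure and font sizes set earlier), and `with plt.rc_context({...}):` applies settings only to charts drawn inside the `with` block.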
Save Your Charts
🤔 Why save charts?
You'll want to use your charts in presentations, reports, or share them with others. Saving them as high-quality images makes them ready for any use!
```python
# Create a function to save charts professionally
import os

def save_chart(filename, dpi=300, bbox_inches='tight', facecolor='white'):
    """Save the current chart with professional quality settings.

    The file type is taken from the extension (.png, .pdf or .svg).
    """
    # Create the 'charts' folder if it doesn't exist yet
    os.makedirs('charts', exist_ok=True)

    # Save the chart
    full_path = os.path.join('charts', filename)
    plt.savefig(
        full_path,
        dpi=dpi,                  # High resolution (300 DPI = print quality)
        bbox_inches=bbox_inches,  # Tight cropping
        facecolor=facecolor,      # Background color
        edgecolor='none'          # No border around the image
    )
    print(f"💾 Chart saved as: {full_path}")
    return full_path

# Example: Create and save a chart
print("💾 Creating and saving a professional chart...")

if population_data is not None:
    # Create the chart
    plt.figure(figsize=(12, 8))
    bars = plt.bar(
        population_data['Continent'],
        population_data['Population_Billions'],
        color=warm_colors[:len(population_data)],
        alpha=0.8
    )
    plt.title('World Population by Continent\n(ClydeEnergy Analysis)',
              fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Continent', fontsize=12, fontweight='bold')
    plt.ylabel('Population (Billions)', fontsize=12, fontweight='bold')

    # Add value labels
    for bar, value in zip(bars, population_data['Population_Billions']):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, height + 0.05, f'{value:.1f}B',
                 ha='center', va='bottom', fontweight='bold')

    plt.xticks(rotation=45, ha='right')
    plt.grid(True, alpha=0.3, axis='y')
    plt.tight_layout()

    # Save BEFORE plt.show(): once the chart is shown, the figure may be cleared
    chart_file = save_chart('population_by_continent_professional.png')

    # Show the chart
    plt.show()

    print("✅ Chart saved successfully!")
    print(f"📍 Location: {os.path.abspath(chart_file)}")
    print("🎯 Perfect for presentations and reports!")
else:
    print("❌ Need data to create and save chart")
```
• PNG: Best for presentations and web
• PDF: Best for printing and documents
• SVG: Best for websites (scalable)
• 300 DPI: Print quality resolution
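Since each format suits a different use, it can help to save the same figure in all three at once. matplotlib picks the format from the file extension. This sketch uses a hypothetical helper, `save_in_formats`, and writes into the same `charts` folder:

```python
import os
import matplotlib.pyplot as plt

def save_in_formats(fig, basename, folder='charts', formats=('png', 'pdf', 'svg'), dpi=300):
    """Save one figure in several formats; matplotlib infers each format from the extension."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for ext in formats:
        path = os.path.join(folder, f'{basename}.{ext}')
        # dpi mainly matters for PNG; PDF and SVG keep lines and text as vectors
        fig.savefig(path, dpi=dpi, bbox_inches='tight')
        paths.append(path)
    return paths

fig, ax = plt.subplots()
ax.bar(['A', 'B'], [2, 1])
saved = save_in_formats(fig, 'example_chart')
print(saved)
```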
💪 Practice Exercises
Exercise 1: Language Analysis
🎯 Your Challenge:
Create a chart showing the top 10 most spoken languages in the world. Use the countrylanguage table!
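Hint (skip this if you want to try on your own first): `countrylanguage` only stores percentages, so one possible approach estimates speakers as the country's population times the percentage, then adds those estimates up per language. The query below is a sketch of that idea, checked here against a tiny in-memory SQLite database with made-up rows; against MySQL you would pass `SPEAKERS_QUERY` to `get_data()` instead:

```python
import sqlite3

# Estimated speakers = country population × percentage / 100, summed per language
SPEAKERS_QUERY = """
SELECT cl.Language,
       SUM(c.Population * cl.Percentage / 100) AS Estimated_Speakers
FROM countrylanguage AS cl
JOIN country AS c ON c.Code = cl.CountryCode
GROUP BY cl.Language
ORDER BY Estimated_Speakers DESC
LIMIT 10
"""

# Made-up rows, only to check that the query runs and adds up correctly
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE country (Code TEXT, Population INTEGER)")
db.execute("CREATE TABLE countrylanguage (CountryCode TEXT, Language TEXT, Percentage REAL)")
db.executemany("INSERT INTO country VALUES (?, ?)", [('AAA', 1000), ('BBB', 500)])
db.executemany(
    "INSERT INTO countrylanguage VALUES (?, ?, ?)",
    [('AAA', 'Language X', 60.0), ('AAA', 'Language Y', 40.0), ('BBB', 'Language Y', 100.0)],
)
rows = db.execute(SPEAKERS_QUERY).fetchall()
db.close()
print(rows)
```

With MySQL, convert the result column with `.astype(float)` before plotting, as in Stage 3. These are estimates, since the percentages in the table are themselves rounded.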