Melbourne Energy Analytics

Production-ready SQL analytics platform analyzing 250K+ installations and 4.5M energy readings with dynamic time filtering and performance optimization

Completed: February 2026
SQL, PostgreSQL, Python, Streamlit, Power BI

Project Overview

I built a production-ready SQL analytics platform analyzing 250,000+ installations and 4.5 million energy readings across Melbourne and regional Victoria. The platform features a star schema database design, dynamic time filtering, and comprehensive performance optimization achieving 7-9x query speedup through strategic indexing.

Role

Database Architect & Developer

Duration

3 weeks

Tools Used

PostgreSQL 18, Python, Streamlit, Plotly, Power BI

Data Volume

4.7M+ records (250K installations, 4.5M energy readings)

Business Problem

Energy installation companies in Victoria need to track multiple operational dimensions to improve efficiency and profitability. Key questions include: Which contractors consistently deliver quality work on time? Are installations actually reducing energy consumption? Which regions have the highest demand? What are the payment collection patterns?

Dataset Description

I designed and generated a realistic dataset simulating Victoria's energy efficiency program (VEU) operations:

  • 250,000 installations across 10 regions
  • 4.5 million energy readings (pre/post installation)
  • 25 contractors with varying performance levels
  • 24 months of historical data
  • Realistic patterns: seasonal demand, contractor learning curves, realistic delays

The challenge was not just generating data, but ensuring it had realistic statistical patterns that would support meaningful analysis in interviews and demonstrations.

Technical Approach

Database Design & Architecture

  • Designed a star schema optimized for analytical queries
  • 3 fact tables (installations, payments, energy readings) and 4 dimension tables
  • Implemented referential integrity with foreign key constraints
  • Created 10 strategic indexes achieving 7-9x performance improvement
  • Built 3 materialized views for frequently accessed metrics

Data Generation & Validation

  • Developed Python scripts using pandas and Faker to generate 4.7M+ realistic records
  • Implemented seasonal patterns (HVAC installations peak in summer/winter)
  • Modeled contractor learning curves (quality improves up to 15% over time)
  • Created 16 automated quality checks across Python and SQL layers
  • Ensured zero NULL timestamp issues through careful data generation logic
  • Generated energy savings showing 15-30% post-installation reduction
SQL Query Example 1

Advanced SQL with CTEs and window functions for contractor performance ranking

SQL Query Example 2

Advanced SQL with CTEs and window functions for contractor performance ranking. cont.

Results & Insights

Streamlit Analytics Dashboard

The interactive Streamlit dashboard provides six visualizations with dynamic time filtering (6/12/24 months) to analyze operational performance:

Key Features

  • Executive Summary: 5 KPI cards showing total installations, completion rate, delay rate, satisfaction, and revenue
  • Installation Trends: Monthly volume trends with completion status breakdown and revenue correlation
  • Contractor Performance: Top 10 contractors by satisfaction with scatter plot showing volume vs quality
  • Regional Analysis: Geographic distribution of installations and revenue by region
  • Energy Savings: Pre/post consumption comparison validating installation effectiveness
  • Payment Analysis: Accounts receivable aging and payment collection patterns
Dashboard Executive Summary

Executive dashboard with 5 real-time KPI cards and dynamic time filtering

Installation Trends

Monthly trends showing volume, status, revenue, and satisfaction patterns

Contractor Performance

Top 10 contractors with bar chart and scatter plot analysis

Regional Analysis

Pie chart and bar chart showing regional distribution and revenue

Power BI Dashboards

Developed professional Power BI dashboards with advanced DAX measures for executive-level business intelligence:

Dashboard Features

  • Executive Summary: KPI cards with time intelligence measures (YTD, MoM growth)
  • Contractor Performance: Top 10 rankings, volume vs quality scatter plots, performance tiers
  • Regional Analysis: Geographic distribution, revenue by region, regional metrics matrix
  • Time Series Analysis: Monthly trends, seasonal patterns, revenue collection over time
  • Interactive Slicers: Date range, region, completion status filters synchronized across all pages
  • Drill-Through Pages: Detailed contractor and region analysis with contextual filtering

Advanced DAX Measures

  • Time intelligence: YTD calculations, previous month comparisons, MoM growth percentages
  • Energy savings analysis: Pre/post consumption comparison with SUMX iteration
  • Performance tiers: Dynamic categorization based on satisfaction scores
  • Running totals: Cumulative installations with ALLSELECTED filter context
  • Completion and delay rate calculations with DIVIDE for safe division
Power BI Executive Summary

Power BI Executive Dashboard with KPI cards and monthly trends

Power BI Contractor Performance

Contractor performance analysis with top 10 rankings and scatter plot

Power BI Regional Analysis

Regional distribution and revenue analysis with interactive visualizations

Power BI Time Trends

Time series analysis showing seasonal patterns and revenue trends

Performance Optimization Results

Strategic indexing delivered significant performance improvements:

  • Contractor Performance Query: 847ms → 92ms (9.2x faster)
  • Energy Savings Analysis: 3,241ms → 412ms (7.9x faster)
  • Payment Aging Report: 521ms → 68ms (7.7x faster)

Optimization techniques included composite indexes, covering indexes with INCLUDE clause, and partial indexes for recent data.

Technical Achievements

  • 4.7 million records generated with realistic statistical patterns
  • 7-9x query performance improvement through strategic indexing
  • 16 quality checks across Python and SQL validation layers
  • Dynamic SQL generation with parameterized time ranges
  • 3-tier authentication (Streamlit secrets → env → manual input)
  • Zero NULL timestamp issues through careful data design
  • Professional Power BI dashboards with 15+ advanced DAX measures
  • Interactive drill-through pages and synchronized slicers across multiple dashboards

Recommendations & Conclusion

Key Learnings

This project taught me the difference between "working code" and "production-ready code":

  • Data Serialization Matters Early

    Initially used NULL timestamps for delayed installations, causing CSV serialization issues. Refactored to always generate valid timestamps—delayed jobs simply start later. Lesson: think about data serialization from day one.

  • Multi-Layer Validation is Essential

    Python validation catches logic errors during generation; SQL validation catches referential integrity issues after loading. Both layers are necessary for production confidence.

  • Performance Optimization is Iterative

    Started with slow queries (3+ seconds), added indexes (7-9x improvement), then created materialized views. Each layer builds on the previous. Key is measuring before and after each change.

  • Realistic Data Makes Better Analysis

    Adding seasonal patterns and contractor learning curves made the analysis more interesting. Interview questions about "what patterns do you see?" have real answers when data has realistic structure.

Future Improvements

If I were to continue developing this project:

  1. Integrate materialized views: Currently created but not used by dashboard queries
  2. Add pytest unit tests: Test data generation functions, SQL logic, validation checks
  3. Implement table partitioning: Partition fact_energy_readings by date for better scale
  4. Add CI/CD pipeline: GitHub Actions for automated testing and deployment
  5. Build REST API: FastAPI endpoints for programmatic data access

Skills Demonstrated

This project showcases my abilities in:

  • Advanced SQL: CTEs, window functions, complex joins, FILTER aggregates, performance tuning
  • Database Design: Star schema architecture, normalization, indexing strategy
  • Performance Optimization: Query analysis, index selection, materialized views
  • Data Engineering: ETL pipelines, data quality validation, error handling
  • Python Development: pandas operations, statistical modeling, data generation
  • Power BI & DAX: Advanced measures, time intelligence, interactive dashboards, drill-through pages
  • Dashboard Development: Streamlit, Plotly, Power BI, interactive visualizations
  • Production Thinking: Authentication patterns, validation layers, comprehensive documentation