Project Overview
I built a production-ready SQL analytics platform analyzing 250,000+ installations and 4.5 million energy readings across Melbourne and regional Victoria. The platform features a star schema database design, dynamic time filtering, and comprehensive performance optimization achieving 7-9x query speedup through strategic indexing.
Role
Database Architect & Developer
Tools Used
PostgreSQL 18, Python, Streamlit, Plotly, Power BI
Data Volume
4.7M+ records (250K installations, 4.5M energy readings)
Business Problem
Energy installation companies in Victoria need to track multiple operational dimensions to improve efficiency and profitability. Key questions include: Which contractors consistently deliver quality work on time? Are installations actually reducing energy consumption? Which regions have the highest demand? What are the payment collection patterns?
Dataset Description
I designed and generated a realistic dataset simulating Victoria's energy efficiency program (VEU) operations:
- 250,000 installations across 10 regions
- 4.5 million energy readings (pre/post installation)
- 25 contractors with varying performance levels
- 24 months of historical data
- Realistic patterns: seasonal demand, contractor learning curves, realistic delays
The challenge was not just generating data, but ensuring it had realistic statistical patterns that would support meaningful analysis in interviews and demonstrations.
Technical Approach
Database Design & Architecture
- Designed a star schema optimized for analytical queries
- 3 fact tables (installations, payments, energy readings) and 4 dimension tables
- Implemented referential integrity with foreign key constraints
- Created 10 strategic indexes achieving 7-9x performance improvement
- Built 3 materialized views for frequently accessed metrics
Data Generation & Validation
- Developed Python scripts using pandas and Faker to generate 4.7M+ realistic records
- Implemented seasonal patterns (HVAC installations peak in summer/winter)
- Modeled contractor learning curves (quality improves up to 15% over time)
- Created 16 automated quality checks across Python and SQL layers
- Ensured zero NULL timestamp issues through careful data generation logic
- Generated energy savings showing 15-30% post-installation reduction
Advanced SQL with CTEs and window functions for contractor performance ranking
Advanced SQL with CTEs and window functions for contractor performance ranking. cont.
Results & Insights
Streamlit Analytics Dashboard
The interactive Streamlit dashboard provides six visualizations with dynamic time filtering (6/12/24 months) to analyze operational performance:
Key Features
- Executive Summary: 5 KPI cards showing total installations, completion rate, delay rate, satisfaction, and revenue
- Installation Trends: Monthly volume trends with completion status breakdown and revenue correlation
- Contractor Performance: Top 10 contractors by satisfaction with scatter plot showing volume vs quality
- Regional Analysis: Geographic distribution of installations and revenue by region
- Energy Savings: Pre/post consumption comparison validating installation effectiveness
- Payment Analysis: Accounts receivable aging and payment collection patterns
Executive dashboard with 5 real-time KPI cards and dynamic time filtering
Monthly trends showing volume, status, revenue, and satisfaction patterns
Top 10 contractors with bar chart and scatter plot analysis
Pie chart and bar chart showing regional distribution and revenue
Power BI Dashboards
Developed professional Power BI dashboards with advanced DAX measures for executive-level business intelligence:
Dashboard Features
- Executive Summary: KPI cards with time intelligence measures (YTD, MoM growth)
- Contractor Performance: Top 10 rankings, volume vs quality scatter plots, performance tiers
- Regional Analysis: Geographic distribution, revenue by region, regional metrics matrix
- Time Series Analysis: Monthly trends, seasonal patterns, revenue collection over time
- Interactive Slicers: Date range, region, completion status filters synchronized across all pages
- Drill-Through Pages: Detailed contractor and region analysis with contextual filtering
Advanced DAX Measures
- Time intelligence: YTD calculations, previous month comparisons, MoM growth percentages
- Energy savings analysis: Pre/post consumption comparison with SUMX iteration
- Performance tiers: Dynamic categorization based on satisfaction scores
- Running totals: Cumulative installations with ALLSELECTED filter context
- Completion and delay rate calculations with DIVIDE for safe division
Power BI Executive Dashboard with KPI cards and monthly trends
Contractor performance analysis with top 10 rankings and scatter plot
Regional distribution and revenue analysis with interactive visualizations
Time series analysis showing seasonal patterns and revenue trends
Performance Optimization Results
Strategic indexing delivered significant performance improvements:
- Contractor Performance Query: 847ms → 92ms (9.2x faster)
- Energy Savings Analysis: 3,241ms → 412ms (7.9x faster)
- Payment Aging Report: 521ms → 68ms (7.7x faster)
Optimization techniques included composite indexes, covering indexes with INCLUDE clause, and partial indexes for recent data.
Technical Achievements
- 4.7 million records generated with realistic statistical patterns
- 7-9x query performance improvement through strategic indexing
- 16 quality checks across Python and SQL validation layers
- Dynamic SQL generation with parameterized time ranges
- 3-tier authentication (Streamlit secrets → env → manual input)
- Zero NULL timestamp issues through careful data design
- Professional Power BI dashboards with 15+ advanced DAX measures
- Interactive drill-through pages and synchronized slicers across multiple dashboards
Recommendations & Conclusion
Key Learnings
This project taught me the difference between "working code" and "production-ready code":
-
Data Serialization Matters Early
Initially used NULL timestamps for delayed installations, causing CSV serialization issues. Refactored to always generate valid timestamps—delayed jobs simply start later. Lesson: think about data serialization from day one.
-
Multi-Layer Validation is Essential
Python validation catches logic errors during generation; SQL validation catches referential integrity issues after loading. Both layers are necessary for production confidence.
-
Performance Optimization is Iterative
Started with slow queries (3+ seconds), added indexes (7-9x improvement), then created materialized views. Each layer builds on the previous. Key is measuring before and after each change.
-
Realistic Data Makes Better Analysis
Adding seasonal patterns and contractor learning curves made the analysis more interesting. Interview questions about "what patterns do you see?" have real answers when data has realistic structure.
Future Improvements
If I were to continue developing this project:
- Integrate materialized views: Currently created but not used by dashboard queries
- Add pytest unit tests: Test data generation functions, SQL logic, validation checks
- Implement table partitioning: Partition fact_energy_readings by date for better scale
- Add CI/CD pipeline: GitHub Actions for automated testing and deployment
- Build REST API: FastAPI endpoints for programmatic data access
Skills Demonstrated
This project showcases my abilities in:
- Advanced SQL: CTEs, window functions, complex joins, FILTER aggregates, performance tuning
- Database Design: Star schema architecture, normalization, indexing strategy
- Performance Optimization: Query analysis, index selection, materialized views
- Data Engineering: ETL pipelines, data quality validation, error handling
- Python Development: pandas operations, statistical modeling, data generation
- Power BI & DAX: Advanced measures, time intelligence, interactive dashboards, drill-through pages
- Dashboard Development: Streamlit, Plotly, Power BI, interactive visualizations
- Production Thinking: Authentication patterns, validation layers, comprehensive documentation