
data-analyst

by zenobi-us

A data analyst skill that provides structured workflows for SQL querying, dashboard creation, and statistical analysis. It includes specific checklists for data quality, visualization design, and stakeholder communication. The skill integrates with tools like Tableau, Power BI, and dbt to deliver actionable business insights.


Output Preview

output_preview.md

Customer Retention Analysis Dashboard

Executive Summary

Date: 2024-01-15 | Analysis Period: Q4 2023 | Business Impact: $2.3M identified savings

Key Findings

  1. 30-Day Retention Rate: 68.4% (+12.7% YoY improvement)
  2. High-Value Segment: Customers with ≥3 support interactions show 89% retention
  3. At-Risk Indicator: 45% drop in feature usage predicts 80% churn probability
  4. Opportunity: Personalized onboarding could increase retention by 22%

Interactive Dashboard Components

1. Retention Cohort Analysis

-- Materialized view: monthly_cohort_retention
WITH first_purchases AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month,
        SUM(order_amount) AS initial_lifetime_value
    FROM orders
    GROUP BY 1
),
monthly_activity AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', order_date) AS activity_month,
        COUNT(DISTINCT order_id) AS monthly_orders
    FROM orders
    GROUP BY 1, 2
)
SELECT
    fc.cohort_month,
    ma.activity_month,
    EXTRACT(MONTH FROM AGE(ma.activity_month, fc.cohort_month)) AS months_since_cohort,
    COUNT(DISTINCT fc.customer_id) AS cohort_size,
    COUNT(DISTINCT ma.customer_id) AS retained_customers,
    ROUND(COUNT(DISTINCT ma.customer_id) * 100.0 / COUNT(DISTINCT fc.customer_id), 1) AS retention_rate
FROM first_purchases fc
LEFT JOIN monthly_activity ma ON fc.customer_id = ma.customer_id
GROUP BY 1, 2, 3
ORDER BY 1, 3;

2. Churn Prediction Model

# Python script for churn prediction
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Feature engineering
features = [
    'days_since_last_login',
    'support_tickets_30d',
    'feature_usage_decline_rate',
    'payment_method_age',
    'avg_session_duration_change'
]

# Model training
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df['churned_next_month'], test_size=0.2
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Feature importance
importance_df = pd.DataFrame({
    'feature': features,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
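
To produce the at-risk list used in the recommendations, the trained model can score current customers; a minimal sketch, assuming a df_current frame with the same feature columns (the name and 0.75 cutoff mirror the alerting threshold noted under Technical Implementation):

# Hedged sketch: df_current is assumed to exist with the same feature columns
churn_prob = model.predict_proba(df_current[features])[:, 1]  # P(churn)
at_risk = df_current.assign(churn_probability=churn_prob)
at_risk = at_risk[at_risk['churn_probability'] > 0.75]  # alert threshold
at_risk.sort_values('churn_probability', ascending=False).to_csv('at_risk_customers.csv', index=False)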

3. Tableau Dashboard Metrics

| Metric | Current Value | Target | Status |
|--------|---------------|--------|--------|
| Monthly Active Users | 245,892 | 250,000 | 🟡 98.4% |
| Customer Lifetime Value | $1,245 | $1,300 | 🟡 95.8% |
| Churn Rate | 4.2% | 3.5% | 🔴 Needs Attention |
| NPS Score | 42 | 45 | 🟡 93.3% |

Recommendations

  1. Immediate Action: Implement automated win-back campaign for 8,452 at-risk customers
  2. Q1 Initiative: Develop personalized onboarding flow (estimated 22% retention lift)
  3. Monitoring: Set up real-time alert for feature usage decline >40%
  4. ROI: $650K estimated annual savings from reduced churn

Technical Implementation

  • Data Pipeline: dbt transformations run hourly
  • Dashboard Refresh: 6 AM daily via Tableau Server
  • Alerting: Slack notifications for churn probability >75%
  • Storage: Snowflake with 30-day data retention

Last Updated: 2024-01-15 14:30 UTC | Next Review: 2024-01-22
Contact: analytics-team@company.com | Dashboard ID: RET-2024-Q1-001

Target Audience

Data analysts, business intelligence professionals, and data teams needing structured workflows for data analysis and dashboard development

Category: Data
Tags: data-analysis, business-intelligence, sql, dashboard, data-visualization
Compatible Agents

| Agent | Skills Path |
|-------|-------------|
| Claude Code | ~/.claude/skills/ |
| Codex CLI | ~/.codex/skills/ |
| Gemini CLI | ~/.gemini/skills/ |
| OpenCode | ~/.opencode/skills/ |
| OpenClaw | ~/.openclaw/skills/ |
| GitHub Copilot | ~/.copilot/skills/ |
| Cursor | ~/.cursor/skills/ |
| Windsurf | ~/.codeium/windsurf/skills/ |
| Cline | ~/.cline/skills/ |
| Roo Code | ~/.roo/skills/ |
| Kiro | ~/.kiro/skills/ |
| Junie | ~/.junie/skills/ |
| Augment Code | ~/.augment/skills/ |
| Warp | ~/.warp/skills/ |
| Goose | ~/.config/goose/skills/ |
SKILL.md

Data Analyst Skill 📊

Turn your AI agent into a data analysis powerhouse.

Query databases, analyze spreadsheets, create visualizations, and generate insights that drive decisions.


What This Skill Does

✅ SQL Queries — Write and execute queries against databases
✅ Spreadsheet Analysis — Process CSV, Excel, Google Sheets data
✅ Data Visualization — Create charts, graphs, and dashboards
✅ Report Generation — Automated reports with insights
✅ Data Cleaning — Handle missing data, outliers, formatting
✅ Statistical Analysis — Descriptive stats, trends, correlations


Quick Start

  1. Configure your data sources in TOOLS.md:

### Data Sources
- Primary DB: [Connection string or description]
- Spreadsheets: [Google Sheets URL / local path]
- Data warehouse: [BigQuery/Snowflake/etc.]

  2. Set up your workspace:

./scripts/data-init.sh

  3. Start analyzing!

SQL Query Patterns

Common Query Templates

Basic Data Exploration

-- Row count
SELECT COUNT(*) FROM table_name;

-- Sample data
SELECT * FROM table_name LIMIT 10;

-- Column statistics
SELECT 
    COUNT(*) as count,
    COUNT(DISTINCT column_name) as unique_values,
    MIN(column_name) as min_val,
    MAX(column_name) as max_val
FROM table_name;

Time-Based Analysis

-- Daily aggregation
SELECT 
    DATE(created_at) as date,
    COUNT(*) as daily_count,
    SUM(amount) as daily_total
FROM transactions
GROUP BY DATE(created_at)
ORDER BY date DESC;

-- Month-over-month comparison
SELECT 
    DATE_TRUNC('month', created_at) as month,
    COUNT(*) as count,
    LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)) as prev_month,
    (COUNT(*) - LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at))) * 100.0 /
        NULLIF(LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)), 0) as growth_pct
FROM transactions
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

Cohort Analysis

-- User cohort by signup month
SELECT 
    DATE_TRUNC('month', u.created_at) as cohort_month,
    DATE_TRUNC('month', o.created_at) as activity_month,
    COUNT(DISTINCT u.id) as users
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY cohort_month, activity_month
ORDER BY cohort_month, activity_month;
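
To read this query's output as a retention matrix, pivot it so each row is a cohort and each column is months since signup. A minimal pandas sketch, assuming the result set is loaded into a DataFrame named cohorts with datetime columns:

import pandas as pd

# Hedged sketch: `cohorts` holds the query result (cohort_month, activity_month, users)
cohorts['months_since'] = (
    (cohorts['activity_month'].dt.year - cohorts['cohort_month'].dt.year) * 12
    + (cohorts['activity_month'].dt.month - cohorts['cohort_month'].dt.month)
)
matrix = cohorts.pivot_table(index='cohort_month', columns='months_since', values='users')
retention = matrix.div(matrix[0], axis=0).round(3)  # divide each row by its month-0 cohort size
print(retention)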

Funnel Analysis

-- Conversion funnel
WITH funnel AS (
    SELECT
        COUNT(DISTINCT CASE WHEN event = 'page_view' THEN user_id END) as views,
        COUNT(DISTINCT CASE WHEN event = 'signup' THEN user_id END) as signups,
        COUNT(DISTINCT CASE WHEN event = 'purchase' THEN user_id END) as purchases
    FROM events
    WHERE date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT 
    views,
    signups,
    ROUND(signups * 100.0 / NULLIF(views, 0), 2) as signup_rate,
    purchases,
    ROUND(purchases * 100.0 / NULLIF(signups, 0), 2) as purchase_rate
FROM funnel;

Data Cleaning

Common Data Quality Issues

| Issue | Detection | Solution |
|-------|-----------|----------|
| Missing values | IS NULL or empty string | Impute, drop, or flag |
| Duplicates | GROUP BY with HAVING COUNT(*) > 1 | Deduplicate with rules |
| Outliers | Z-score > 3 or IQR method | Investigate, cap, or exclude |
| Inconsistent formats | Sample and pattern match | Standardize with transforms |
| Invalid values | Range checks, referential integrity | Validate and correct |

Data Cleaning SQL Patterns

-- Find duplicates
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Find nulls
SELECT 
    COUNT(*) as total,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) as null_emails,
    SUM(CASE WHEN name IS NULL THEN 1 ELSE 0 END) as null_names
FROM users;

-- Standardize text
UPDATE products
SET category = LOWER(TRIM(category));

-- Remove outliers (IQR method)
WITH stats AS (
    SELECT 
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY value) as q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value) as q3
    FROM data
)
SELECT * FROM data, stats
WHERE value BETWEEN q1 - 1.5*(q3-q1) AND q3 + 1.5*(q3-q1);
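
The issues table above also lists the z-score method for outliers; a minimal pandas equivalent of the IQR query, assuming a numeric column named value:

# Hedged sketch: flag rows more than 3 standard deviations from the mean
z = (df['value'] - df['value'].mean()) / df['value'].std()
outliers = df[z.abs() > 3]    # investigate these before removing
cleaned = df[z.abs() <= 3]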

Data Cleaning Checklist

# Data Quality Audit: [Dataset]

## Row-Level Checks
- [ ] Total row count: [X]
- [ ] Duplicate rows: [X]
- [ ] Rows with any null: [X]

## Column-Level Checks
| Column | Type | Nulls | Unique | Min | Max | Issues |
|--------|------|-------|--------|-----|-----|--------|
| [col] | [type] | [n] | [n] | [v] | [v] | [notes] |

## Data Lineage
- Source: [Where data came from]
- Last updated: [Date]
- Known issues: [List]

## Cleaning Actions Taken
1. [Action and reason]
2. [Action and reason]
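
The column-level table can be filled in mechanically; a minimal helper sketch (not one of the bundled scripts):

import pandas as pd

def column_audit(df: pd.DataFrame) -> pd.DataFrame:
    """Build the column-level check table from the audit template."""
    rows = []
    for col in df.columns:
        s = df[col]
        numeric = pd.api.types.is_numeric_dtype(s)
        rows.append({
            'Column': col,
            'Type': str(s.dtype),
            'Nulls': int(s.isna().sum()),
            'Unique': int(s.nunique()),
            'Min': s.min() if numeric else '',
            'Max': s.max() if numeric else '',
        })
    return pd.DataFrame(rows)

print(column_audit(pd.read_csv('data.csv')).to_string(index=False))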

Spreadsheet Analysis

CSV/Excel Processing with Python

import pandas as pd

# Load data
df = pd.read_csv('data.csv')  # or pd.read_excel('data.xlsx')

# Basic exploration
print(df.shape)  # (rows, columns)
print(df.info())  # Column types and nulls
print(df.describe())  # Numeric statistics

# Data cleaning
df = df.drop_duplicates()
df['date'] = pd.to_datetime(df['date'])
df['amount'] = df['amount'].fillna(0)

# Analysis
summary = df.groupby('category').agg({
    'amount': ['sum', 'mean', 'count'],
    'quantity': 'sum'
}).round(2)

# Export
summary.to_csv('analysis_output.csv')

Common Pandas Operations

# Filtering
filtered = df[df['status'] == 'active']
filtered = df[df['amount'] > 1000]
filtered = df[df['date'].between('2024-01-01', '2024-12-31')]

# Aggregation
by_category = df.groupby('category')['amount'].sum()
pivot = df.pivot_table(values='amount', index='month', columns='category', aggfunc='sum')

# Window functions
df['running_total'] = df['amount'].cumsum()
df['pct_change'] = df['amount'].pct_change()
df['rolling_avg'] = df['amount'].rolling(window=7).mean()

# Merging
merged = pd.merge(df1, df2, on='id', how='left')

Data Visualization

Chart Selection Guide

| Data Type | Best Chart | Use When |
|-----------|------------|----------|
| Trend over time | Line chart | Showing patterns/changes over time |
| Category comparison | Bar chart | Comparing discrete categories |
| Part of whole | Pie/Donut | Showing proportions (≤5 categories) |
| Distribution | Histogram | Understanding data spread |
| Correlation | Scatter plot | Relationship between two variables |
| Many categories | Horizontal bar | Ranking or comparing many items |
| Geographic | Map | Location-based data |

Python Visualization with Matplotlib/Seaborn

import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Line chart (trends)
plt.figure(figsize=(10, 6))
plt.plot(df['date'], df['value'], marker='o')
plt.title('Trend Over Time')
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('trend.png', dpi=150)

# Bar chart (comparisons)
plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='category', y='amount')
plt.title('Amount by Category')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('comparison.png', dpi=150)

# Heatmap (correlations)
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.tight_layout()
plt.savefig('correlation.png', dpi=150)

ASCII Charts (Quick Terminal Visualization)

When you can't generate images, use ASCII:

Revenue by Month (in $K)
========================
Jan: ████████████████ 160
Feb: ██████████████████ 180
Mar: ████████████████████████ 240
Apr: ██████████████████████ 220
May: ██████████████████████████ 260
Jun: ████████████████████████████ 280
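
Bars like these are easy to generate from any label-to-value mapping; a minimal sketch:

def ascii_bar_chart(data, width=30, unit=''):
    """Print horizontal bars scaled to the largest value."""
    peak = max(data.values())
    for label, value in data.items():
        bar = '█' * max(1, round(value / peak * width))
        print(f'{label}: {bar} {value}{unit}')

ascii_bar_chart({'Jan': 160, 'Feb': 180, 'Mar': 240, 'Apr': 220, 'May': 260, 'Jun': 280})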

Report Generation

Standard Report Template

# [Report Name]
**Period:** [Date range]
**Generated:** [Date]
**Author:** [Agent/Human]

## Executive Summary
[2-3 sentences with key findings]

## Key Metrics

| Metric | Current | Previous | Change |
|--------|---------|----------|--------|
| [Metric] | [Value] | [Value] | [+/-X%] |

## Detailed Analysis

### [Section 1]
[Analysis with supporting data]

### [Section 2]
[Analysis with supporting data]

## Visualizations
[Insert charts]

## Insights
1. **[Insight]**: [Supporting evidence]
2. **[Insight]**: [Supporting evidence]

## Recommendations
1. [Actionable recommendation]
2. [Actionable recommendation]

## Methodology
- Data source: [Source]
- Date range: [Range]
- Filters applied: [Filters]
- Known limitations: [Limitations]

## Appendix
[Supporting data tables]

Automated Report Script

#!/bin/bash
# generate-report.sh

# Pull latest data
python scripts/extract_data.py --output data/latest.csv

# Run analysis
python scripts/analyze.py --input data/latest.csv --output reports/

# Generate report
python scripts/format_report.py --template weekly --output reports/weekly-$(date +%Y-%m-%d).md

echo "Report generated: reports/weekly-$(date +%Y-%m-%d).md"

Statistical Analysis

Descriptive Statistics

| Statistic | What It Tells You | Use Case |
|-----------|-------------------|----------|
| Mean | Average value | Central tendency |
| Median | Middle value | Robust to outliers |
| Mode | Most common | Categorical data |
| Std Dev | Spread around mean | Variability |
| Min/Max | Range | Data boundaries |
| Percentiles | Distribution shape | Benchmarking |

Quick Stats with Python

# Full descriptive statistics
stats = df['amount'].describe()
print(stats)

# Additional stats
print(f"Median: {df['amount'].median()}")
print(f"Mode: {df['amount'].mode()[0]}")
print(f"Skewness: {df['amount'].skew()}")
print(f"Kurtosis: {df['amount'].kurtosis()}")

# Correlation
correlation = df['sales'].corr(df['marketing_spend'])
print(f"Correlation: {correlation:.3f}")

Statistical Tests Quick Reference

| Test | Use Case | Python |
|------|----------|--------|
| T-test | Compare two means | scipy.stats.ttest_ind(a, b) |
| Chi-square | Categorical independence | scipy.stats.chi2_contingency(table) |
| ANOVA | Compare 3+ means | scipy.stats.f_oneway(a, b, c) |
| Pearson | Linear correlation | scipy.stats.pearsonr(x, y) |
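
For example, comparing average order value between two customer segments with a t-test (the column and segment names here are illustrative):

from scipy import stats

a = df.loc[df['segment'] == 'A', 'amount']
b = df.loc[df['segment'] == 'B', 'amount']
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
print(f't = {t_stat:.2f}, p = {p_value:.4f}')
if p_value < 0.05:
    print('Difference is significant at the 5% level')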

Analysis Workflow

Standard Analysis Process

  1. Define the Question

    • What are we trying to answer?
    • What decisions will this inform?
  2. Understand the Data

    • What data is available?
    • What's the structure and quality?
  3. Clean and Prepare

    • Handle missing values
    • Fix data types
    • Remove duplicates
  4. Explore

    • Descriptive statistics
    • Initial visualizations
    • Identify patterns
  5. Analyze

    • Deep dive into findings
    • Statistical tests if needed
    • Validate hypotheses
  6. Communicate

    • Clear visualizations
    • Actionable insights
    • Recommendations

Analysis Request Template

# Analysis Request

## Question
[What are we trying to answer?]

## Context
[Why does this matter? What decision will it inform?]

## Data Available
- [Dataset 1]: [Description]
- [Dataset 2]: [Description]

## Expected Output
- [Deliverable 1]
- [Deliverable 2]

## Timeline
[When is this needed?]

## Notes
[Any constraints or considerations]

Scripts

data-init.sh

Initialize your data analysis workspace.

query.sh

Quick SQL query execution.

# Run query from file
./scripts/query.sh --file queries/daily-report.sql

# Run inline query
./scripts/query.sh "SELECT COUNT(*) FROM users"

# Save output to file
./scripts/query.sh --file queries/export.sql --output data/export.csv

analyze.py

Python analysis toolkit.

# Basic analysis
python scripts/analyze.py --input data/sales.csv

# With specific analysis type
python scripts/analyze.py --input data/sales.csv --type cohort

# Generate report
python scripts/analyze.py --input data/sales.csv --report weekly

Integration Tips

With Other Skills

| Skill | Integration |
|-------|-------------|
| Marketing | Analyze campaign performance, content metrics |
| Sales | Pipeline analytics, conversion analysis |
| Business Dev | Market research data, competitor analysis |

Common Data Sources

  • Databases: PostgreSQL, MySQL, SQLite
  • Warehouses: BigQuery, Snowflake, Redshift
  • Spreadsheets: Google Sheets, Excel, CSV
  • APIs: REST endpoints, GraphQL
  • Files: JSON, Parquet, XML
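
For the database and warehouse sources, SQLAlchemy plus pandas is a common entry point; a minimal sketch with a placeholder connection string:

import pandas as pd
from sqlalchemy import create_engine

# Hedged sketch: placeholder connection string; configure the real one in TOOLS.md
engine = create_engine('postgresql://user:password@localhost:5432/analytics')
df = pd.read_sql('SELECT * FROM transactions LIMIT 1000', engine)
print(df.head())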

Best Practices

  1. Start with the question — Know what you're trying to answer
  2. Validate your data — Garbage in = garbage out
  3. Document everything — Queries, assumptions, decisions
  4. Visualize appropriately — Right chart for right data
  5. Show your work — Methodology matters
  6. Lead with insights — Not just data dumps
  7. Make it actionable — "So what?" → "Now what?"
  8. Version your queries — Track changes over time

Common Mistakes

โŒ Confirmation bias โ€” Looking for data to support a conclusion โŒ Correlation โ‰  causation โ€” Be careful with claims โŒ Cherry-picking โ€” Using only favorable data โŒ Ignoring outliers โ€” Investigate before removing โŒ Over-complicating โ€” Simple analysis often wins โŒ No context โ€” Numbers without comparison are meaningless


License

License: MIT — use freely, modify, distribute.


"The goal is to turn data into information, and information into insight." โ€” Carly Fiorina

Source: https://github.com/zenobi-us/dotfiles#ai~files~skills~experts~data-ai~data-analyst

Content curated from original sources, copyright belongs to authors

