Source: docs/guides/README_DATA_EXPLORATION.md
# AWS Redshift Data Exploration Setup
This guide helps you set up comprehensive AWS Redshift data exploration using MCP (Model Context Protocol) tools, interactive chat interfaces, and Jupyter notebooks.
## Quick Start

### 1. Test Your Connection

```bash
python client.py --connect-redshift
```

### 2. Start Interactive Exploration

```bash
python client.py --data-chat
```

### 3. Open Jupyter Notebook

```bash
python client.py --notebook
```

## What's Included

### Core Components
| Component | File | Description |
|---|---|---|
| MCP Tools | `apps/backend/tools/aws_redshift_mcp_tool.py` | Core Redshift integration tools |
| Interactive Chat | `enhanced_mcp_chat_repl.py` | Command-line data exploration interface |
| Jupyter Notebook | `data_exploration_redshift.ipynb` | Advanced analysis and visualization |
| CLI Integration | `client.py` (enhanced) | Integrated command-line access |
| Test Suite | `apps/backend/test_aws_redshift_mcp.py` | Comprehensive testing framework |
### Available Tools
- Connection Management: Test and verify database connectivity
- Schema Discovery: Explore databases, schemas, and table structures
- Data Analysis: Analyze table statistics and column distributions
- Query Execution: Run SQL queries with performance monitoring
- Data Export: Export query results to pandas/CSV for analysis
- Performance Testing: Monitor and optimize query performance
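These tools can also be called directly from Python. Here is a minimal sketch, assuming the module is importable from the repository root and that the tools accept a plain SQL string (check the tool reference for the exact signatures):

```python
import asyncio

# Import path follows the file location listed above; adjust if the module
# is packaged differently in your environment.
from apps.backend.tools.aws_redshift_mcp_tool import (
    redshift_execute_query,
    redshift_test_connection,
)

async def main():
    # Verify connectivity before anything else
    print(await redshift_test_connection())
    # Run a simple query through the same tool layer the chat REPL uses
    print(await redshift_execute_query("SELECT CURRENT_DATE"))

asyncio.run(main())
```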
## Setup Instructions

### Prerequisites
- AWS Redshift Access: Cluster endpoint, database credentials
- Python Environment: Python 3.8+ with required packages
- Network Access: Firewall/security group access to Redshift
### Step 1: Configure Environment

Create or update your `.env` file:

```bash
# Required AWS Redshift settings
REDSHIFT_HOST=your-cluster.region.redshift.amazonaws.com
REDSHIFT_PORT=5439
REDSHIFT_DATABASE=your_database
REDSHIFT_USER=your_username
REDSHIFT_PASSWORD=your_password
REDSHIFT_SSL_MODE=require

# Optional settings
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=us-east-1
AWS_S3_BUCKET=your-s3-bucket-for-exports
```
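As a sanity check, these settings map onto a SQLAlchemy connection like so. A minimal sketch, assuming `python-dotenv` and SQLAlchemy are installed (both are typical for this stack; Redshift speaks the PostgreSQL wire protocol, hence the `postgresql+psycopg2` driver):

```python
import os

from dotenv import load_dotenv
from sqlalchemy import create_engine, text

load_dotenv()  # read the .env file from the current directory

url = (
    f"postgresql+psycopg2://{os.getenv('REDSHIFT_USER')}:{os.getenv('REDSHIFT_PASSWORD')}"
    f"@{os.getenv('REDSHIFT_HOST')}:{os.getenv('REDSHIFT_PORT', '5439')}"
    f"/{os.getenv('REDSHIFT_DATABASE')}"
)
engine = create_engine(url, connect_args={"sslmode": os.getenv("REDSHIFT_SSL_MODE", "require")})

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # prints 1 on success
```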
### Step 2: Install Dependencies

```bash
# Core backend dependencies
pip install -r apps/backend/requirements.txt

# Optional: Jupyter and visualization tools
pip install jupyter jupyterlab plotly seaborn matplotlib dash
```

### Step 3: Test Your Setup
```bash
# Test the MCP tools
python apps/backend/test_aws_redshift_mcp.py

# Test connection via the CLI
python client.py --connect-redshift
```
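If both checks fail, a bare `psycopg2` connection can isolate whether the problem is network or credentials rather than the tooling. A minimal sketch using the same `.env` variables:

```python
import os

import psycopg2

conn = psycopg2.connect(
    host=os.getenv("REDSHIFT_HOST"),
    port=int(os.getenv("REDSHIFT_PORT", "5439")),
    dbname=os.getenv("REDSHIFT_DATABASE"),
    user=os.getenv("REDSHIFT_USER"),
    password=os.getenv("REDSHIFT_PASSWORD"),
    sslmode=os.getenv("REDSHIFT_SSL_MODE", "require"),
    connect_timeout=10,  # fail fast on network/firewall issues
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
conn.close()
```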
## Usage Examples

### Interactive Data Chat

Start the enhanced chat interface:

```bash
python client.py --data-chat
```

Example session:
```text
[public] > connect
✓ Connection successful!

[public] > schemas
Found 3 schemas: public (12 tables), analytics (8 tables), staging (5 tables)

[public] > tables analytics
Tables in 'analytics' schema:
  - user_events (15 columns)
  - revenue_daily (6 columns)

[public] > query SELECT COUNT(*) FROM analytics.user_events WHERE date >= CURRENT_DATE - 7
✓ Query successful - 1 rows returned
Results:
  count
  2450000

[public] > export SELECT user_id, event_type, created_at FROM analytics.user_events WHERE date >= CURRENT_DATE - 1 LIMIT 10000
✓ Exported 10000 rows to export_1640995200.csv
```

### Guided Workflow
Run the guided exploration workflow:

```bash
python client.py --explore-data
```

This will:
- Test your database connection
- Discover available schemas and tables
- Provide next steps for exploration
### Jupyter Notebook Analysis

Open the comprehensive analysis notebook:

```bash
python client.py --notebook
```

The notebook includes:
- Connection testing and setup
- Schema and table discovery
- Data distribution analysis
- Query performance testing
- Pandas integration for advanced analysis
- Visualization examples with plotly and matplotlib
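For orientation, a typical notebook cell looks roughly like this. A sketch assuming `redshift_export_to_pandas` accepts a SQL string and returns a pandas DataFrame (check the tool reference for the exact return shape), and relying on Jupyter's support for top-level `await`:

```python
import plotly.express as px

from apps.backend.tools.aws_redshift_mcp_tool import redshift_export_to_pandas

# Pull a small aggregate into a DataFrame and plot it inline
df = await redshift_export_to_pandas(
    "SELECT DATE_TRUNC('day', created_at) AS day, COUNT(*) AS events "
    "FROM analytics.user_events GROUP BY 1 ORDER BY 1"
)
px.line(df, x="day", y="events", title="Daily event volume").show()
```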
## Use Cases

### 1. Data Quality Assessment

```text
# Start data chat
python client.py --data-chat

# Check data quality
> describe users
> stats users
> analyze users email
> query SELECT COUNT(DISTINCT email), COUNT(*) FROM users
```
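The same duplicate-email check can be finished offline after an `export`. A minimal pandas sketch; the file name is hypothetical (exports are timestamped):

```python
import pandas as pd

df = pd.read_csv("export_1640995200.csv")  # hypothetical export of the users table
total = len(df)
unique = df["email"].nunique(dropna=True)
print(f"{total - unique} duplicate emails out of {total} rows ({unique / total:.1%} unique)")
```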
### 2. Performance Monitoring

```text
# Test query performance
> performance SELECT COUNT(*) FROM large_table
> performance SELECT * FROM large_table WHERE indexed_column = 'value'

# Explain query execution
> query EXPLAIN SELECT * FROM table1 JOIN table2 ON table1.id = table2.user_id
```
### 3. Data Export and Analysis

```text
# Export data for analysis
> export SELECT * FROM users WHERE signup_date >= '2024-01-01' LIMIT 5000

# Export aggregated metrics
> export SELECT DATE_TRUNC('day', created_at) AS date, COUNT(*) AS signups FROM users GROUP BY date ORDER BY date
```
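Exported files drop straight into pandas for follow-up analysis. A sketch for the aggregated-metrics export above, substituting the timestamped file name from your own export output:

```python
import pandas as pd

# Column names (date, signups) come from the aliases in the export query
df = pd.read_csv("export_1640995200.csv", parse_dates=["date"])
df = df.set_index("date").sort_index()
df["signups_7d_avg"] = df["signups"].rolling(7).mean()  # smooth daily noise
print(df.tail())
```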
### 4. Schema Discovery

```text
# Explore database structure
> schemas
> tables public
> tables analytics
> describe important_table
> stats important_table
```

## Architecture
### MCP Tool Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│                    Client Applications                      │
├─────────────────────────────────────────────────────────────┤
│  Enhanced Chat REPL │ Jupyter Notebook │ CLI Commands       │
├─────────────────────────────────────────────────────────────┤
│                      MCP Tool Layer                         │
├─────────────────────────────────────────────────────────────┤
│             AWS Redshift MCP Tool Functions                 │
│   • redshift_test_connection                                │
│   • redshift_execute_query                                  │
│   • redshift_get_schema_info                                │
│   • redshift_get_table_stats                                │
│   • redshift_analyze_data_distribution                      │
│   • redshift_get_query_performance                          │
│   • redshift_export_to_pandas                               │
├─────────────────────────────────────────────────────────────┤
│                 Database Connection Layer                   │
├─────────────────────────────────────────────────────────────┤
│         SQLAlchemy + psycopg2 + Connection Pooling          │
├─────────────────────────────────────────────────────────────┤
│                       AWS Redshift                          │
└─────────────────────────────────────────────────────────────┘
```

### Data Flow
1. User Input: Commands via chat, CLI, or notebook
2. MCP Tool Processing: Route to the appropriate Redshift tool
3. SQL Execution: Connect to Redshift and execute queries
4. Result Processing: Format, analyze, and return results
5. User Output: Display results in the chat or notebook, or write export files
## Customization

### Adding Custom Analysis

Extend the MCP tools by adding functions to `aws_redshift_mcp_tool.py`:
```python
from typing import Any, Dict

async def custom_analysis_function(table_name: str, analysis_type: str) -> Dict[str, Any]:
    """Custom analysis function"""
    # Your custom logic here; only interpolate identifiers from trusted input,
    # since f-string SQL is not protected against injection
    query = f"SELECT custom_analysis({analysis_type}) FROM {table_name}"
    return await redshift_execute_query(query)
```

### Custom Chat Commands
Add commands to the enhanced chat REPL in `enhanced_mcp_chat_repl.py`:
```python
# Register the command in the __init__ method's commands dictionary:
self.commands['custom'] = self.custom_command

# Then define the handler on the same class:
async def custom_command(self, args: str = ""):
    """Your custom command"""
    # Implementation here
    pass
```

### Notebook Extensions
Customize the Jupyter notebook by:
- Adding new analysis cells
- Importing additional visualization libraries
- Creating custom data processing functions
## Troubleshooting

### Common Issues
| Issue | Symptoms | Solution |
|---|---|---|
| Connection failed | "Connection failed" error | Check `.env` credentials and security groups |
| Import errors | `ModuleNotFoundError` | Run `pip install -r apps/backend/requirements.txt` |
| SSL issues | SSL/TLS errors | Try `REDSHIFT_SSL_MODE=prefer` or `disable` |
| Timeout errors | Connection timeout | Check network connectivity and firewall rules |
| Permission denied | "Access denied" errors | Verify database user permissions |
### Debug Commands

```bash
# Test environment variables
python -c "import os; print('Host:', os.getenv('REDSHIFT_HOST'))"

# Run comprehensive tests
python apps/backend/test_aws_redshift_mcp.py

# Enable debug mode
python client.py --data-chat --verbose

# Test basic connectivity
python -c "import psycopg2; print('psycopg2 available')"
```

### Log Analysis
Enable debug logging by setting:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Performance Optimization
### Query Optimization
- Use LIMIT: Always limit large result sets
- Index Usage: Query on indexed columns when possible
- Column Selection: Use specific columns instead of SELECT *
- WHERE Clauses: Filter data early in queries
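To see these tips pay off, compare both query shapes with the performance tool. A sketch; the single-SQL-string call signature is an assumption, and the table and column names are illustrative:

```python
import asyncio

from apps.backend.tools.aws_redshift_mcp_tool import redshift_get_query_performance

SLOW = "SELECT * FROM analytics.user_events"
FAST = """
    SELECT user_id, event_type, created_at      -- specific columns, not *
    FROM analytics.user_events
    WHERE created_at >= CURRENT_DATE - 7        -- filter early
    LIMIT 10000                                 -- cap the result set
"""

async def compare():
    for label, sql in (("slow", SLOW), ("fast", FAST)):
        print(label, await redshift_get_query_performance(sql))

asyncio.run(compare())
```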
### Connection Optimization
- Connection Pooling: Automatic in MCP tools
- SSL Overhead: Adjust SSL mode based on security needs
- Query Batching: Group related queries together
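For reference, the kind of pooling the MCP tools rely on looks roughly like this in plain SQLAlchemy; the pool sizes here are illustrative starting points, not tuned values:

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:password@host:5439/dbname",  # placeholder DSN
    pool_size=5,         # persistent connections kept open
    max_overflow=5,      # extra connections allowed under load
    pool_pre_ping=True,  # validate connections before reuse
    pool_recycle=3600,   # recycle connections after an hour
)
```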
### Memory Management
- Result Streaming: Large results are processed efficiently
- Export Limits: Use reasonable limits for data exports
- Cache Usage: Leverage cached results in chat interface
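When a result is too big for one DataFrame, chunked reads keep memory flat. A sketch using pandas' `chunksize` over a SQLAlchemy engine (the DSN is a placeholder):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@host:5439/dbname")

total = 0
for chunk in pd.read_sql("SELECT user_id FROM analytics.user_events", engine, chunksize=50_000):
    total += len(chunk)  # process each chunk, then let it be garbage-collected
print(f"Processed {total} rows without holding them all in memory")
```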
## Advanced Features

### Integration with Other Systems
The MCP tools can be integrated with:
- BI Tools: Export data for Tableau, PowerBI, etc.
- Machine Learning: Export to pandas for ML workflows
- APIs: Build REST endpoints using the MCP tools (see the sketch after this list)
- Automation: Schedule data exports and analysis
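For the REST case, a minimal sketch with FastAPI (the framework choice and route shape are illustrative, not part of this repo); it simply forwards a SQL string to the query tool:

```python
from fastapi import FastAPI
from pydantic import BaseModel

from apps.backend.tools.aws_redshift_mcp_tool import redshift_execute_query

app = FastAPI()

class QueryRequest(BaseModel):
    sql: str

@app.post("/query")
async def run_query(req: QueryRequest):
    # Delegates to the same tool layer used by the chat REPL and notebook;
    # add auth and query validation before exposing this beyond localhost
    return await redshift_execute_query(req.sql)
```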
### Scaling for Teams
- MCP Server Deployment: Deploy for team access
- Shared Notebooks: Version control Jupyter notebooks
- Configuration Management: Centralize environment configs
- Access Control: Implement user-based access controls
## Resources

### Documentation

- Data Discovery Commands: `data_discovery_commands.md`
- MCP Tool Reference: `apps/backend/tools/aws_redshift_mcp_tool.py`
- Environment Config: `.env.sample`

### Examples

- Test Suite: `apps/backend/test_aws_redshift_mcp.py`
- Chat Interface: `enhanced_mcp_chat_repl.py`
- Jupyter Notebook: `data_exploration_redshift.ipynb`
### Support
For additional help:
- Review the troubleshooting section
- Check existing issues in the repository
- Run the test suite to validate your setup
- Enable debug mode for detailed logging
Ready to explore your data? Start with `python client.py --explore-data` and follow the guided workflow!