Pair Your Technical Library with Live Data Using MCP

When Your Books and Data Start Talking to One Another…

I’ve been collecting technical books for some time now. PDFs on Tableau, Python, SQL, data science, machine learning, you name it. They sit in organized folders on my hard drive, searchable only by filename or if I remember which book had that one section about LOD calculations. There’s a good chance that the information will be outdated by the time I can get through it all.

Meanwhile, I’m constantly working with live data in Tableau Cloud. Creating dashboards, writing calculations, analyzing datasets. Two separate worlds that never really connected.

What if they could?

The Spark

I came across a post by Brian Julius demonstrating NotebookLM for querying Power BI documentation with natural language. Brilliant concept—but I wanted something more specific. I needed answers directly applicable to Tableau development, not just general knowledge retrieval.

I considered building a custom GPT with my documentation uploaded. But that raised some concerns:

Platform dependency – What happens when OpenAI (or your LLM provider of choice) updates its model or changes its policies?
Limited control – Can’t customize the search algorithm or embedding approach
Black box – Inability to determine how content is indexed or retrieved
Cost – Potential future pricing changes

I wanted more control over the entire stack. If I’m going to invest time building a customized knowledge base, I want to own it, understand it, and be able to maintain it regardless of what any platform decides to do.

As I continued to explore Model Context Protocol (MCP), I realized that I might be able to build exactly what I needed: a dual-MCP system where one server searches across my entire technical library and another queries live Tableau data, both accessible through Claude Desktop in a single conversation.

The result? I can now ask questions like:

“What do my Tableau books say about LOD calculation best practices, then show me examples using fields from my Sample-Superstore data?”

Or:

“According to my Python books, what’s the best way to handle missing data, and do I have any NULL values in my Sales field?”

Theory meets practice… Books meet data… And it actually works!

🎯 The Use Case

Before diving into the build process, let’s consider why this is significant.

As data developers, we constantly switch between two modes:

Learning mode – Reading documentation, books, articles to understand concepts
Building mode – Applying those concepts to real data in Tableau

The problem? These modes are disconnected. You read about a technique in a book, then separately try to implement it. If you’re like me, you forget where you saw that perfect example. You can’t quickly reference multiple sources while actively working with your data.

This project bridges that gap. Instead of:

Opening PDFs manually to search for examples
Switching between books and Tableau
Trying to remember which book explained which concept

You get:

Natural language search across your entire library
Live data queries in the same conversation
Immediate application of theory to your actual datasets

For anyone building Tableau dashboards, creating content, or learning new techniques, this is transformative.

🏗️ The Architecture

The system uses two MCP (Model Context Protocol) servers working together:

Server 1: Technical Knowledge Base

Indexes your PDF library using ChromaDB
Creates searchable text chunks with embeddings
Returns relevant content based on semantic search
Built with Python, sentence-transformers, PyPDF2

Server 2: Tableau Cloud

Connects to your Tableau environment via REST API
Queries datasources, metadata, and fields
Provides live data analysis
Uses the official @tableau/mcp-server

Both connect to Claude Desktop, letting you query them together in natural language.
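To make the “searchable text chunks” step in Server 1 more concrete, here is a minimal sketch of overlapping chunking in plain Python. The function name chunk_text and the chunk_size and overlap values are assumptions for illustration, not values from the project; the actual indexing script may split text differently (for example by tokens or sentences rather than characters).

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults (assumptions),
    not the values used by the project's indexing script.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:  # skip windows that contain only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "LOD expressions let you control the level of detail. " * 20
    for i, chunk in enumerate(chunk_text(sample, chunk_size=200, overlap=40)):
        print(i, len(chunk), repr(chunk[:40]))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each chunk would then be embedded and stored in ChromaDB along with metadata such as the source filename.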

⚠️ A Reality Check: It Wasn’t Exactly “Smooth Sailing”

Setting this up from scratch in a fresh Windows environment revealed eight real issues, including missing C++ build tools, NumPy 2.0 breaking compatibility, virtual environment activation quirks in PowerShell, version mismatches in dependencies, and Claude Desktop configuration challenges. Each one stopped progress completely until solved. The full setup took about 45 minutes once I knew what I was doing, but getting there required troubleshooting each roadblock. Every issue and solution is documented in the 🔧 Issues & Solutions section below to help you if you’re planning to build this yourself.

📚 Building the Knowledge Base

I decided to build everything from scratch in a new production folder to properly document the setup process. This allowed me to start with a clean Windows environment, fresh virtual environment, and real-world testing conditions.

Setting Up Dependencies

After creating my virtual environment and installing the base requirements, I hit the first major blocker—ChromaDB wouldn’t compile (see Issue #2 below). Once I installed the Microsoft C++ Build Tools and restarted, the dependencies fell into place. Well, mostly. NumPy 2.0 caused compatibility issues (Issue #3), and I needed to upgrade sentence-transformers to work with the newer huggingface_hub (Issue #4).

With dependencies finally working:

pip install chromadb==0.4.18 "sentence-transformers>=2.7.0" PyPDF2==3.0.1 python-dotenv==1.0.0 "numpy<2.0" "mcp>=0.1.0"

Indexing 110 PDFs

I had organized my PDFs in topic subfolders.

The indexing script needed to search recursively through these subfolders (Issue #6), and handle problematic PDFs gracefully (Issue #7). With those fixes in place, I pointed it at my library:

python scripts\index_books.py --pdf-dir "C:\Users\username\tableau-knowledge-mcp\books"

The Results:

Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
Initializing ChromaDB at: ./chroma_db
Found 110 PDF files

[1/110] Processing: Learning_Tableau.pdf
  ✓ Indexed 856 chunks
[2/110] Processing: Python_for_Data_Analysis.pdf
  ✓ Indexed 1,243 chunks
...

✅ Indexing complete!
📊 Successfully indexed: 95/110 books
📊 Total chunks: 75,003
📊 Average chunks per book: 789

Time: ~23 minutes for 110 PDFs

Success rate: 86% (15 files failed—scanned PDFs, encrypted files, or corrupted downloads)

Believe it or not, anything over 75% is an excellent outcome. The 15 failures were expected, and the script skipped them without crashing.
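The summary figures follow directly from the counts in the indexing log. A quick check, assuming the average is truncated to a whole number (which is what the logged value of 789 suggests):

```python
total_pdfs = 110
indexed = 95
total_chunks = 75_003

failed = total_pdfs - indexed
success_rate = indexed / total_pdfs
# Integer division truncates; the exact average is about 789.5
avg_chunks = total_chunks // indexed

print(f"Failed: {failed}")
print(f"Success rate: {success_rate:.0%}")
print(f"Average chunks per indexed book: {avg_chunks}")
```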

🖥️ Configuring Claude Desktop

With the knowledge base built, I needed to connect both MCP servers to Claude Desktop.

C:\Users\{username}\AppData\Roaming\Claude\claude_desktop_config.json

Image: Windows Explorer showing the claude_desktop_config.json file in the AppData\Roaming\Claude directory

I set up both servers with the proper configuration (see the detailed config in Issue #8 below). The critical piece I discovered through troubleshooting: you need to use absolute paths with double backslashes on Windows, and include an env section with CHROMA_DB_PATH as an absolute path. Without that, the knowledge base server would disconnect immediately.
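Here is a minimal sketch of how a server script could read these settings so that it never depends on the working directory. The helper name load_settings and the fallback to a chroma_db folder next to the script are assumptions for illustration; only the variable names CHROMA_DB_PATH and MAX_SEARCH_RESULTS come from the config in Issue #8, and the project’s own server.py may handle this differently.

```python
import os
from pathlib import Path


def load_settings(script_file, env=None):
    """Resolve server settings without relying on the current working directory.

    load_settings is a hypothetical helper, not code from the project.
    In a real server it would be called as load_settings(__file__).
    """
    env = os.environ if env is None else env
    script_dir = Path(script_file).resolve().parent

    raw_path = env.get("CHROMA_DB_PATH")
    if raw_path:
        db_path = Path(raw_path)
        if not db_path.is_absolute():
            # Anchor relative values to the script's folder, not the process cwd
            db_path = script_dir / db_path
    else:
        # Assumed fallback: a chroma_db folder beside the script
        db_path = script_dir / "chroma_db"

    max_results = int(env.get("MAX_SEARCH_RESULTS", "5"))
    return {"chroma_db_path": db_path, "max_results": max_results}


if __name__ == "__main__":
    # Hypothetical script location and a relative path, as a demonstration
    demo = load_settings(
        "/opt/project/src/server.py",
        env={"CHROMA_DB_PATH": "chroma_db", "MAX_SEARCH_RESULTS": "3"},
    )
    print(f"ChromaDB path: {demo['chroma_db_path']}")
    print(f"Max results:   {demo['max_results']}")
```

Resolving against the script’s own location means the server behaves the same whether it is launched from a terminal inside the project folder or by Claude Desktop from somewhere else.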

After restarting Claude Desktop completely and waiting 10 seconds, both servers showed as connected.

Image: Both MCP servers showing as “running” in Claude Desktop

✅ Testing the Magic

With both servers running, I could finally test what this thing could actually do.

Test 1: Knowledge Base Only

List all books in my technical knowledge base

📚 Used technical-knowledge-base integration

Here’s your complete library organized by topic — 95 books indexed and ready to search!

Result: Complete list of 95 indexed books organized by topic. Fast, accurate, working perfectly.

Test 2: Tableau Only

What datasources are available in my Tableau environment?

📊 Used tableau integration

Found your Tableau Cloud datasources with full metadata. The MCP server connection is working perfectly!

Result: Full list of datasources with metadata. The Tableau MCP server integration was solid.

Test 3: Both Together – The Real Power

Search my Tableau books for LOD calculations, then show me the fields available in Sample-Superstore

🔄 Orchestrated dual tasks for Tableau analysis simultaneously

Sure thing! I’ll search your technical library for LOD calculations and pull up the Superstore datasource fields at the same time!

📚 Excavated book results and pivoted toward datasource exploration

Got the LOD results! Now let me grab the Superstore datasource.

What Your Books Say About LOD Calculations:

All results came from the Tableau Cookbook for Experienced Professionals, which covers all three LOD keywords in depth:

FIXED — Calculates at a specific level of detail regardless of the view’s dimensions. Classic use case: {FIXED [Customer ID] : AVG([Sales])} gives you average sales per customer no matter what else is on the view.
INCLUDE — Expands granularity beyond what’s in the view by pulling in additional dimensions. Useful when you need detail from dimensions not displayed but still want to control aggregation.
EXCLUDE — Removes specified dimensions from the calculation, effectively raising the level of detail. Great for computing totals at a higher grain (e.g., country-level sales while Ship Mode and Region are in the view).

One important caveat your book flags: LODs increase query complexity, especially with large datasets or nested LODs, and they’re static — they don’t respond to filters unless you explicitly account for that inside the expression.

📊 Used tableau integration

Sample-Superstore Fields (DataDevQuest Project)

Dimensions (String): Row ID, Product ID, Category, Sub-Category, Segment, Ship Mode, Customer Name, Customer ID, Order ID, Product Name, City, State/Province, Postal Code, Region, Country/Region

Dates: Order Date, Ship Date, Quarter (calc: DATETRUNC('quarter', [Order Date]))

Measures: Sales, Profit, Discount, Quantity

LOD Calculations already in the datasource — and this is the fun part, you’ve already got some pre-built:

Total Sales: { SUM([Sales]) }
Total Profit: { SUM([Profit]) }
Total Units Sold: { SUM([Quantity]) }
Total Avg Discount: {AVG([Discount])}

You’ve got a solid foundation here — those existing FIXED LODs in Superstore are great examples to reference when building new ones. Want to try querying some data using those LODs, or build out a new LOD expression for a specific use case?

Result:

Relevant excerpts from my Tableau books explaining LOD calculation concepts and best practices
Complete field list from my Sample-Superstore datasource with data types
Natural synthesis combining the theoretical knowledge with my live data context

This is where it clicked. I could learn about a technique and immediately see how to apply it to my actual data—all in one conversation. No switching between applications. No trying to remember which book had that example. Just natural conversation that pulls from both knowledge sources.

💡 Real Use Cases

Learning + Application:

What are best practices for handling NULL values according to my data books, and do I have any in my Orders table?

Content Creation:

Find examples of LOD calculations in my Tableau books, then create demo queries using my Sample-Superstore data for a blog post

Troubleshooting:

My RANK() calculation isn’t working. What are common mistakes from my Tableau books, and what’s the actual calculation in my workbook?

Data Exploration:

What customer segmentation techniques are explained in my analytics books, and what segments exist in my customer data?

The system maintains context across the conversation, so you can ask follow-up questions that reference both sources naturally.

📊 By the Numbers

Setup time: ~45 minutes (including C++ Build Tools installation)
Indexing time: ~23 minutes for 110 PDFs
Books indexed: 95 out of 110 (86% success rate)
Total chunks: 75,003 searchable text segments
Average chunks per book: 789
Storage: ~450MB for ChromaDB vector database
Embedding model: sentence-transformers/all-MiniLM-L6-v2 (80MB)

🚀 Want to Build Your Own?

The complete code, setup instructions, and troubleshooting guide are available on GitHub.

What you’ll need:

Windows 10/11 (for this guide; adaptable to Mac/Linux)
Python 3.10+
Node.js 18+ (for Tableau MCP server)
Tableau Cloud account with personal access token
Claude Desktop (free download from Anthropic)
Your collection of PDF technical books
~1 hour for setup (following the guide)

If you’re planning to build this yourself, every Windows-specific gotcha is documented below. The setup takes about 45 minutes, but only if you know what you’re doing. These issues cost me several hours to solve, so hopefully the appendix saves you time. Not planning to build? Feel free to skip to ‘Final Thoughts’ below.

🔧 Appendix: Issues & Solutions (For Builders)

Here’s every real issue I encountered in a fresh Windows environment, with the exact solutions that worked.

Issue #1: PowerShell ExecutionPolicy Blocks Virtual Environment

The Problem:

.\venv\Scripts\Activate.ps1 cannot be loaded because running scripts 
is disabled on this system

Windows PowerShell’s default security policy prevents running any scripts, including Python’s virtual environment activation script.

The Fix:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

This allows locally-created scripts (like Activate.ps1) to run without Administrator rights.

Issue #2: ChromaDB Requires C++ Build Tools

The Problem:

error: Microsoft Visual C++ 14.0 or greater is required
Building wheel for chroma-hnswlib (pyproject.toml) ... error

ChromaDB needs to compile C++ code on Windows. Fresh Windows installations don’t have the necessary build tools.

The Fix:

Download Microsoft C++ Build Tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/
Run installer and select “Desktop development with C++”
Install (~6GB download, 10-15 minutes)
Restart your computer (actually restart, not just close windows)
Then pip install chromadb works

Time investment: ~15 minutes, but this is non-negotiable for Windows users.

Issue #3: NumPy 2.0 Breaks Compatibility

The Problem:

AttributeError: `np.float_` was removed in the NumPy 2.0 release. 
Use `np.float64` instead.

NumPy 2.0 (released in 2024) removed old type aliases that sentence-transformers and chromadb still reference.

The Fix:

pip install "numpy<2.0"

This installs NumPy 1.26.x which maintains backward compatibility.

Why it matters: Without this constraint, you’ll get runtime errors during indexing even though everything installed successfully.

Issue #4: sentence-transformers Version Incompatibility

The Problem:

ImportError: cannot import name 'cached_download' from 'huggingface_hub'

The older sentence-transformers version (2.2.2) is incompatible with newer huggingface_hub releases.

The Fix:

pip install --upgrade sentence-transformers

This upgrades to 2.7.0+ which works with current huggingface_hub versions.

Issue #5: Missing MCP Package

The Problem:

Error importing dependencies: No module named 'mcp'

When installing packages individually to solve other issues, it’s easy to miss the core MCP server package.

The Fix:

pip install mcp

Simple but critical, as the entire server won’t start without it.
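Issues #3 through #5 only surface when the scripts actually run, so a quick pre-flight check can catch them earlier. The sketch below uses only the standard library. The helpers check_package and parse_version are hypothetical, and the version parsing is deliberately simplified (it ignores pre-release suffixes), so treat it as a sanity check rather than a full version resolver. The thresholds mirror the constraints in the install command shown earlier.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.26.4' into (1, 26, 4), padding short versions with zeros.

    Simplified on purpose: parsing stops at the first non-numeric part.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_package(name, minimum=None, below=None, installed=None):
    """Return (ok, message) for one package.

    Pass `installed` to check a known version string without a lookup.
    """
    if installed is None:
        try:
            installed = version(name)
        except PackageNotFoundError:
            return False, f"{name}: not installed"
    current = parse_version(installed)
    if minimum and current < parse_version(minimum):
        return False, f"{name} {installed}: need >= {minimum}"
    if below and current >= parse_version(below):
        return False, f"{name} {installed}: need < {below}"
    return True, f"{name} {installed}: OK"


if __name__ == "__main__":
    checks = [
        check_package("numpy", below="2.0"),
        check_package("sentence-transformers", minimum="2.7.0"),
        check_package("mcp"),
    ]
    for ok, message in checks:
        print("PASS" if ok else "FAIL", message)
```

Run it with the virtual environment’s Python so it inspects the same packages the server will use.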

Issue #6: Script Doesn’t Search Subfolders

The Problem:

Found 0 PDF files

The script only searched the immediate directory, not subfolders where books are organized by topic.

The Fix:

Update scripts/index_books.py:

# Change this:
pdf_files = list(self.pdf_dir.glob("*.pdf"))

# To this:
pdf_files = list(self.pdf_dir.glob("**/*.pdf"))

The ** makes it search recursively through all subfolders.
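The difference between the two patterns is easy to verify with a throwaway folder. A minimal sketch, where find_pdfs and the subfolder names are hypothetical stand-ins rather than code from the project’s script:

```python
import tempfile
from pathlib import Path


def find_pdfs(pdf_dir, recursive=True):
    """Return sorted PDF paths; '**/*.pdf' also matches files in pdf_dir itself."""
    pattern = "**/*.pdf" if recursive else "*.pdf"
    return sorted(Path(pdf_dir).glob(pattern))


if __name__ == "__main__":
    # Build a temporary library with hypothetical topic subfolders
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for relative in ("tableau/lod_guide.pdf", "python/pandas_notes.pdf"):
            target = root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()

        print("Top level only:", len(find_pdfs(root, recursive=False)))
        print("Recursive:     ", len(find_pdfs(root, recursive=True)))
```

With every book inside a topic subfolder, the non-recursive pattern finds nothing, which matches the “Found 0 PDF files” symptom.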

Issue #7: Script Crashes on Problematic PDFs

The Problem:

TypeError: TextInputSequence must be str

Some PDFs (scanned, encrypted, corrupted) return None instead of text, causing ChromaDB’s embedding function to crash.

The Fix:

Add robust error handling to the indexing script:

# Validate text exists and is a string
if not text or not isinstance(text, str) or len(text.strip()) == 0:
    logger.warning(f"No valid text extracted from {pdf_path.name}")
    return ""

# Validate chunks before sending to ChromaDB
valid_batch = []
for c in batch:
    if (c.get('text') and 
        isinstance(c['text'], str) and 
        len(c['text'].strip()) > 0):
        valid_batch.append(c)

Now the script skips problematic files and continues processing.
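The fragments above come from inside the indexing script. As a self-contained illustration of the same guards, here is a sketch that can be run on its own. The function names clean_text and filter_valid_chunks, the logger name, and the sample batch (including the id key) are assumptions; only the idea of checking for a non-empty string in each chunk’s text field comes from the fix itself.

```python
import logging

logger = logging.getLogger("index_books")


def clean_text(text, source_name):
    """Return stripped text, or '' when extraction produced nothing usable."""
    if not isinstance(text, str) or not text.strip():
        logger.warning("No valid text extracted from %s", source_name)
        return ""
    return text.strip()


def filter_valid_chunks(batch):
    """Keep only chunk dicts whose 'text' value is a non-empty string."""
    return [
        chunk for chunk in batch
        if isinstance(chunk.get("text"), str) and chunk["text"].strip()
    ]


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Hypothetical batch mixing a good chunk with the failure modes described above
    batch = [
        {"id": "a-1", "text": "FIXED computes at a declared level of detail."},
        {"id": "b-1", "text": None},   # extractor returned None
        {"id": "c-1", "text": "   "},  # whitespace only
        {"id": "d-1"},                 # text key missing entirely
    ]
    kept = filter_valid_chunks(batch)
    print(f"Kept {len(kept)} of {len(batch)} chunks")
    print(repr(clean_text(None, "scanned_book.pdf")))
```

Filtering at both stages (after extraction and again before each batch is stored) means a single bad page cannot take down the whole run.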

Issue #8: Knowledge Base Server Disconnects in Claude Desktop

The Problem:

Knowledge base server shows as disconnected in Claude Desktop while Tableau server works fine.

The Root Cause:

When Claude Desktop runs the Python script, the working directory isn’t the project folder, so relative paths like ./chroma_db don’t resolve correctly.

The Fix:

Add an env section with absolute paths to claude_desktop_config.json:

{
  "mcpServers": {
    "tableau": {
      "command": "npx",
      "args": ["-y", "@tableau/mcp-server@latest"],
      "env": {
        "SERVER": "https://your-site.online.tableau.com",
        "SITE_NAME": "your-site-name",
        "PAT_NAME": "your-token-name",
        "PAT_VALUE": "your-token-secret"
      }
    },
    "technical-knowledge-base": {
      "command": "C:\\Users\\{username}\\project\\venv\\Scripts\\python.exe",
      "args": [
        "C:\\Users\\{username}\\project\\src\\server.py"
      ],
      "env": {
        "CHROMA_DB_PATH": "C:\\Users\\{username}\\project\\chroma_db",
        "PDF_LIBRARY_PATH": "C:\\Users\\{username}\\books",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
        "MAX_SEARCH_RESULTS": "5"
      }
    }
  }
}

Critical details:

Use double backslashes \\ on Windows
Use absolute paths (not relative ./)
The env section with CHROMA_DB_PATH is essential
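One way to avoid hand-escaping backslashes is to generate the entry with Python’s json module, which escapes them automatically. This is a sketch under assumptions: knowledge_base_entry is a hypothetical helper, the folder layout (venv, src, chroma_db under one project folder) simply mirrors the config above, and the example paths are placeholders.

```python
import json
from pathlib import PureWindowsPath


def knowledge_base_entry(project_dir, books_dir):
    """Build the knowledge-base server entry from absolute Windows paths."""
    project = PureWindowsPath(project_dir)
    books = PureWindowsPath(books_dir)
    for path in (project, books):
        if not path.is_absolute():
            raise ValueError(f"Expected an absolute path, got: {path}")
    return {
        "command": str(project / "venv" / "Scripts" / "python.exe"),
        "args": [str(project / "src" / "server.py")],
        "env": {
            "CHROMA_DB_PATH": str(project / "chroma_db"),
            "PDF_LIBRARY_PATH": str(books),
            "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
            "MAX_SEARCH_RESULTS": "5",
        },
    }


if __name__ == "__main__":
    # Placeholder paths; substitute your own user and project folders
    entry = knowledge_base_entry(r"C:\Users\me\project", r"C:\Users\me\books")
    config = {"mcpServers": {"technical-knowledge-base": entry}}
    # json.dumps writes each backslash as a double backslash
    print(json.dumps(config, indent=2))
```

The output contains only the knowledge base entry, so merge it into the existing mcpServers section rather than overwriting the Tableau entry.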

After updating the config, completely quit Claude Desktop (File → Quit), wait 10 seconds, then reopen. Both servers should connect successfully.

💭 Final Thoughts

This project turned into way more than I expected. What started as “let’s index some PDFs” became a deep dive into Windows development, managing dependencies, and setting up production-ready processes.

Fortunately, I was able to document each issue along the way. Every error is resolved. Every weird Windows quirk is explained. That’s the value of building in a fresh environment and documenting the journey.

The result is a system that can combine theoretical knowledge with live data. No more switching between PDF readers and Tableau. No more trying to remember which book explained which concept. Just natural conversation with your entire technical library and your live data together.

For me, this has changed how I learn new Tableau techniques and create content. I rely far less on a sea of bookmarked websites. I can research a topic across multiple books, then immediately test examples with real data—all in one place.

If you’re someone who collects technical books and works with data daily, this might be worth the setup time. The first time you ask a question that pulls from both your library and your live Tableau data, you’ll start picturing the possibilities. Think of the time saved! Imagine the accelerated learning curve!

Ready to try it? Check out the GitHub repository and let me know how your build goes. I’d love to hear how others use this approach and take it even further.

If you run into any issues during setup, be sure to check out the troubleshooting guide for additional guidance.
