When Your Books and Data Start Talking to One Another…
I’ve been collecting technical books for some time now: PDFs on Tableau, Python, SQL, data science, machine learning, you name it. They sit in organized folders on my hard drive, searchable only by filename or by my memory of which book had that one section about LOD calculations. There’s a good chance the information will be outdated by the time I get through it all.
Meanwhile, I’m constantly working with live data in Tableau Cloud. Creating dashboards, writing calculations, analyzing datasets. Two separate worlds that never really connected.
What if they could?
The Spark
I came across a post by Brian Julius demonstrating NotebookLM for querying Power BI documentation with natural language. Brilliant concept—but I wanted something more specific. I needed answers directly applicable to Tableau development, not just general knowledge retrieval.
I considered building a custom GPT with my documentation uploaded. But that raised some concerns:
- Platform dependency – What happens when OpenAI (or the LLM provider of your choice) updates its models or changes its policies?
- Limited control – Can’t customize the search algorithm or embedding approach
- Black box – Inability to determine how content is indexed or retrieved
- Cost – Potential future pricing changes
I wanted more control over the entire stack. If I’m going to invest time building a customized knowledge base, I want to own it, understand it, and be able to maintain it regardless of what any platform decides to do.
As I continued to explore the Model Context Protocol (MCP), I realized that I might be able to build exactly what I needed: a dual-MCP system where one server searches across my entire technical library and another queries live Tableau data, both accessible through Claude Desktop in a single conversation.

The result? I can now ask questions like:
“What do my Tableau books say about LOD calculation best practices, then show me examples using fields from my Sample-Superstore data?”
Or:
“According to my Python books, what’s the best way to handle missing data, and do I have any NULL values in my Sales field?”
Theory meets practice… Books meet data… And it actually works!
🎯 The Use Case
Before diving into the build process, let’s consider why this is significant.
As data developers, we constantly switch between two modes:
- Learning mode – Reading documentation, books, articles to understand concepts
- Building mode – Applying those concepts to real data in Tableau
The problem? These modes are disconnected. You read about a technique in a book, then separately try to implement it. If you’re like me, you forget where you saw that perfect example. You can’t quickly reference multiple sources while actively working with your data.
This project bridges that gap. Instead of:
- Opening PDFs manually to search for examples
- Switching between books and Tableau
- Trying to remember which book explained which concept
You get:
- Natural language search across your entire library
- Live data queries in the same conversation
- Immediate application of theory to your actual datasets
For someone building Tableau dashboards, creating content, or learning new techniques—this is transformative.
🏗️ The Architecture
The system uses two MCP (Model Context Protocol) servers working together:
Server 1: Technical Knowledge Base
- Indexes your PDF library using ChromaDB
- Creates searchable text chunks with embeddings
- Returns relevant content based on semantic search
- Built with Python, sentence-transformers, PyPDF2
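To make "searchable text chunks" concrete, here is a minimal sketch of overlapping word-window chunking. The function name chunk_words and the 200-word window with a 50-word overlap are my own illustrative assumptions. The actual indexing script may split text differently, for example by characters, tokens, or pages. The overlap means a passage that straddles a chunk boundary can still be retrieved from either neighbouring chunk.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap`
    words with the previous window (hypothetical helper, assumed defaults)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


# Usage: 10 words, windows of 4 words with 1 word of overlap
print(chunk_words(" ".join(f"w{i}" for i in range(10)), chunk_size=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each chunk would then be embedded and stored alongside metadata such as the source book, so a search hit can point back to where it came from.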
Server 2: Tableau Cloud
- Connects to your Tableau environment via REST API
- Queries datasources, metadata, and fields
- Provides live data analysis
- Uses the official @tableau/mcp-server
Both connect to Claude Desktop, letting you query them together in natural language.
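The "semantic search" in Server 1 comes down to nearest-neighbour ranking. The question is embedded with the same model used for the chunks, and the server returns the chunks whose vectors point in the most similar direction. The brute-force sketch below uses cosine similarity on made-up three-dimensional vectors. Real all-MiniLM-L6-v2 embeddings have 384 dimensions, and ChromaDB uses an approximate HNSW index rather than scoring every chunk. The k=5 default simply mirrors the MAX_SEARCH_RESULTS value in the Claude Desktop config shown in the appendix.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda chunk: cosine(query, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors (hypothetical): dimension 0 ~ "LOD", dimension 2 ~ "missing data"
chunks = [
    ("Chunk about FIXED LOD calculations", [0.9, 0.1, 0.0]),
    ("Chunk about pandas fillna()", [0.0, 0.2, 0.9]),
    ("Chunk about INCLUDE LOD calculations", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], chunks, k=2))
# → ['Chunk about FIXED LOD calculations', 'Chunk about INCLUDE LOD calculations']
```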
⚠️ A Reality Check: It Wasn’t Exactly “Smooth Sailing”
Setting this up from scratch in a fresh Windows environment surfaced a series of real issues: virtual environment activation quirks in PowerShell, missing C++ build tools, NumPy 2.0 breaking compatibility, version mismatches in dependencies, a couple of bugs in my indexing script, and Claude Desktop configuration challenges. Each one stopped progress completely until solved. The full setup took about 45 minutes once I knew what I was doing, but getting there required troubleshooting each roadblock. Every issue and its solution is documented in the 🔧 Issues & Solutions appendix below, in case you’re planning to build this yourself.
📚 Building the Knowledge Base
I decided to build everything from scratch in a new production folder to properly document the setup process. This allowed me to start with a clean Windows environment, fresh virtual environment, and real-world testing conditions.
Setting Up Dependencies
After creating my virtual environment and installing the base requirements, I hit the first major blocker—ChromaDB wouldn’t compile (see Issue #2 below). Once I installed the Microsoft C++ Build Tools and restarted, the dependencies fell into place. Well, mostly. NumPy 2.0 caused compatibility issues (Issue #3), and I needed to upgrade sentence-transformers to work with the newer huggingface_hub (Issue #4).
With dependencies finally working:
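```powershell
pip install "chromadb==0.4.18" "sentence-transformers>=2.7.0" "PyPDF2==3.0.1" "python-dotenv==1.0.0" "numpy<2.0" "mcp>=0.1.0"
```
Quote every requirement in PowerShell. An unquoted > is treated as output redirection, so an unquoted sentence-transformers>=2.7.0 silently loses its version constraint.
Indexing 110 PDFs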
I had organized my PDFs into topic subfolders.

The indexing script needed to search recursively through these subfolders (Issue #6), and handle problematic PDFs gracefully (Issue #7). With those fixes in place, I pointed it at my library:
```powershell
python scripts\index_books.py --pdf-dir "C:\Users\username\tableau-knowledge-mcp\books"
```
The Results:
```
Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
Initializing ChromaDB at: ./chroma_db
Found 110 PDF files
[1/110] Processing: Learning_Tableau.pdf
✓ Indexed 856 chunks
[2/110] Processing: Python_for_Data_Analysis.pdf
✓ Indexed 1,243 chunks
...
✅ Indexing complete!
📊 Successfully indexed: 95/110 books
📊 Total chunks: 75,003
📊 Average chunks per book: 789
```
Time: ~23 minutes for 110 PDFs
Success rate: 86% (15 files failed: scanned PDFs, encrypted files, or corrupted downloads)
Believe it or not, an 86% success rate is actually an excellent outcome for a library like this. The 15 failures were expected, and the script skipped them without crashing.
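The summary lines in the log fit together arithmetically. The sketch below shows one way such a summary could be computed. It assumes that failed files are recorded as None and that the average is integer-divided over successfully indexed books only. Both are my assumptions, not something confirmed from the script. With 95 of 110 books and 75,003 chunks, it reproduces the logged 86% and 789. Ordinary rounding of 75,003 / 95 ≈ 789.5 would instead have given 790.

```python
from typing import Optional


def summarize(chunk_counts: dict[str, Optional[int]]) -> dict[str, int]:
    """Summarize an indexing run.

    chunk_counts maps each PDF name to its chunk count, or None if the file
    failed (a hypothetical structure, not necessarily the script's own).
    """
    indexed = {name: n for name, n in chunk_counts.items() if n}  # drop None and 0
    attempted = len(chunk_counts)
    total = sum(indexed.values())
    return {
        "indexed": len(indexed),
        "attempted": attempted,
        "success_rate_pct": round(100 * len(indexed) / attempted) if attempted else 0,
        "total_chunks": total,
        # Integer division over successfully indexed books only
        "avg_chunks": total // len(indexed) if indexed else 0,
    }


# Rebuild the article's totals: 95 books summing to 75,003 chunks, plus 15 failures
counts = {f"book{i}.pdf": 789 for i in range(94)}
counts["book94.pdf"] = 75_003 - 94 * 789  # 837 chunks
counts.update({f"failed{i}.pdf": None for i in range(15)})
print(summarize(counts))
# → {'indexed': 95, 'attempted': 110, 'success_rate_pct': 86, 'total_chunks': 75003, 'avg_chunks': 789}
```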
🖥️ Configuring Claude Desktop
With the knowledge base built, I needed to connect both MCP servers to Claude Desktop.
On Windows, Claude Desktop reads its configuration from:
C:\Users\{username}\AppData\Roaming\Claude\claude_desktop_config.json
I set up both servers with the proper configuration (see the detailed config in Issue #8 below). The critical piece I discovered through troubleshooting: you need to use absolute paths with double backslashes on Windows, and include an env section with CHROMA_DB_PATH as an absolute path. Without that, the knowledge base server would disconnect immediately.
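The double backslashes come from JSON itself. A backslash starts an escape sequence inside a JSON string, so each \ in a Windows path must be written as \\. One way to avoid hand-escaping mistakes is to generate the knowledge-base entry with Python's json module, which escapes backslashes automatically. This is a sketch only. The folder layout (venv, src\server.py, chroma_db) is taken from the example config in Issue #8, and C:\Users\me\project is a placeholder path.

```python
import json
from pathlib import PureWindowsPath


def kb_server_entry(project_dir: str) -> dict:
    """Build a 'technical-knowledge-base' entry with absolute Windows paths."""
    project = PureWindowsPath(project_dir)
    if not project.is_absolute():
        # Relative paths break because Claude Desktop doesn't launch the
        # server from the project folder
        raise ValueError(f"Expected an absolute path, got {project_dir!r}")
    return {
        "command": str(project / "venv" / "Scripts" / "python.exe"),
        "args": [str(project / "src" / "server.py")],
        "env": {"CHROMA_DB_PATH": str(project / "chroma_db")},
    }


config = {"mcpServers": {"technical-knowledge-base": kb_server_entry(r"C:\Users\me\project")}}
text = json.dumps(config, indent=2)  # json.dumps writes each backslash as a doubled one
print(text)
```

The printed entry can sit in mcpServers next to the tableau entry. Forward slashes (C:/Users/...) are another valid way to write Windows paths in JSON, if you prefer to avoid escaping altogether.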
After restarting Claude Desktop completely and waiting 10 seconds, both servers showed as connected.

Image: Both MCP servers showing as “running” in Claude Desktop
✅ Testing the Magic
With both servers running, I could finally test what this thing could actually do.
Test 1: Knowledge Base Only
Result: Complete list of 95 indexed books organized by topic. Fast, accurate, working perfectly.
Test 2: Tableau Only
Result: Full list of datasources with metadata. The Tableau MCP server integration was solid.
Test 3: Both Together – The Real Power
Result:
- Relevant excerpts from my Tableau books explaining LOD calculation concepts and best practices
- Complete field list from my Sample-Superstore datasource with data types
- Natural synthesis combining the theoretical knowledge with my live data context
This is where it clicked. I could learn about a technique and immediately see how to apply it to my actual data—all in one conversation. No switching between applications. No trying to remember which book had that example. Just natural conversation that pulls from both knowledge sources.
💡 Real Use Cases
The system maintains context across the conversation, so you can ask follow-up questions that reference both sources naturally.
📊 By the Numbers
- Setup time: ~45 minutes (including C++ Build Tools installation)
- Indexing time: ~23 minutes for 110 PDFs
- Books indexed: 95 out of 110 (86% success rate)
- Total chunks: 75,003 searchable text segments
- Average chunks per book: 789
- Storage: ~450MB for ChromaDB vector database
- Embedding model: sentence-transformers/all-MiniLM-L6-v2 (80MB)
🚀 Want to Build Your Own?
The complete code, setup instructions, and troubleshooting guide are available on GitHub.
What you’ll need:
- Windows 10/11 (for this guide; adaptable to Mac/Linux)
- Python 3.10+
- Node.js 18+ (for Tableau MCP server)
- Tableau Cloud account with personal access token
- Claude Desktop (free download from Anthropic)
- Your collection of PDF technical books
- ~1 hour for setup (following the guide)
If you’re planning to build this yourself, every Windows-specific gotcha I hit is documented below. The setup takes about 45 minutes, but only once you know what you’re doing. These issues cost me several hours to solve, so hopefully this section saves you time.
**Not planning to build? Feel free to skip to ‘Final Thoughts’ below.**
🔧 Appendix: Issues & Solutions (For Builders)
Here’s every real issue I encountered in a fresh Windows environment, with the exact solutions that worked.
Issue #1: PowerShell ExecutionPolicy Blocks Virtual Environment
The Problem:
```
.\venv\Scripts\Activate.ps1 cannot be loaded because running scripts
is disabled on this system
```
Windows PowerShell’s default security policy prevents running any scripts, including Python’s virtual environment activation script.
The Fix:
```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```
This allows locally created scripts (like Activate.ps1) to run without Administrator rights.
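Once activation works, you can confirm that python now resolves to the virtual environment’s interpreter rather than a system-wide install. Compare sys.prefix with sys.base_prefix: inside a venv they differ. A small sketch:

```python
import sys


def in_virtualenv() -> bool:
    """True when running under a venv's interpreter.

    In a venv, sys.prefix points at the venv folder, while sys.base_prefix
    still points at the base Python installation it was created from.
    """
    return sys.prefix != sys.base_prefix


print("Running inside a virtual environment:", in_virtualenv())
print("Interpreter:", sys.executable)
```

If it prints False after you run Activate.ps1, the activation didn’t take effect in that PowerShell session.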
Issue #2: ChromaDB Requires C++ Build Tools
The Problem:
```
error: Microsoft Visual C++ 14.0 or greater is required
Building wheel for chroma-hnswlib (pyproject.toml) ... error
```
ChromaDB’s chroma-hnswlib dependency needs to compile C++ code on Windows, and fresh Windows installations don’t have the necessary build tools.
The Fix:
- Download Microsoft C++ Build Tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/
- Run installer and select “Desktop development with C++”
- Install (~6GB download, 10-15 minutes)
- Restart your computer (actually restart, not just close windows)
- Then run pip install chromadb, which now works
Time investment: ~15 minutes, but this is non-negotiable for Windows users.
Issue #3: NumPy 2.0 Breaks Compatibility
The Problem:
```
AttributeError: `np.float_` was removed in the NumPy 2.0 release.
Use `np.float64` instead.
```
NumPy 2.0 (released in 2024) removed old type aliases that sentence-transformers and chromadb still reference.
The Fix:
```powershell
pip install "numpy<2.0"
```
This installs NumPy 1.26.x, which maintains backward compatibility.
Why it matters: Without this constraint, you’ll get runtime errors during indexing even though everything installed successfully.
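A small guard at startup can make the indexing script fail fast with a clear message, instead of raising an AttributeError deep inside a dependency. This sketch uses importlib.metadata from the standard library. The helper names are hypothetical, and the simple major-version parse assumes a standard version string like 1.26.4.

```python
from importlib import metadata


def numpy_is_compatible(version: str) -> bool:
    """True for NumPy 1.x, which still provides aliases such as np.float_."""
    return int(version.split(".")[0]) < 2


def check_numpy() -> str:
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return "numpy is not installed"
    if numpy_is_compatible(installed):
        return f"numpy {installed}: OK"
    return f'numpy {installed} is 2.x or later; run: pip install "numpy<2.0"'


print(check_numpy())
```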
Issue #4: sentence-transformers Version Incompatibility
The Problem:
```
ImportError: cannot import name 'cached_download' from 'huggingface_hub'
```
The older sentence-transformers version (2.2.2) is incompatible with newer huggingface_hub releases.
The Fix:
```powershell
pip install --upgrade sentence-transformers
```
This upgrades to 2.7.0 or later, which works with current huggingface_hub versions.
Issue #5: Missing MCP Package
The Problem:
```
Error importing dependencies: No module named 'mcp'
```
When installing packages individually to solve other issues, it’s easy to miss the core MCP server package.
The Fix:
```powershell
pip install mcp
```
Simple but critical: the entire server won’t start without it.
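Packages installed one at a time are easy to miss, so a preflight check can list anything missing before the server tries to start. The sketch below uses importlib.util.find_spec. Import names differ from pip names in a few cases: python-dotenv imports as dotenv, and sentence-transformers imports as sentence_transformers. The REQUIRED_MODULES list is my assumption, based on the pip install command in the dependencies section.

```python
import importlib.util

# Import names (not pip names) for the packages in the install command
REQUIRED_MODULES = ["mcp", "chromadb", "sentence_transformers", "PyPDF2", "dotenv", "numpy"]


def missing_modules(names: list[str] = REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("Missing modules:", ", ".join(missing) if missing else "none")
```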
Issue #6: Script Doesn’t Search Subfolders
The Problem:
```
Found 0 PDF files
```
The script only searched the immediate directory, not the subfolders where books are organized by topic.
The Fix:
Update scripts/index_books.py:
```python
# Change this:
pdf_files = list(self.pdf_dir.glob("*.pdf"))
# To this:
pdf_files = list(self.pdf_dir.glob("**/*.pdf"))
```
The ** pattern makes glob search recursively through all subfolders.
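A slightly more defensive variant is sketched below. Path.rglob walks every subfolder, and comparing the suffix in lower case also picks up files saved with an upper-case .PDF extension. find_pdfs is a hypothetical helper name, not a function from the repository.

```python
from pathlib import Path


def find_pdfs(root) -> list[Path]:
    """Recursively collect PDF files under root, matching the extension case-insensitively."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Example (path from the indexing command):
# pdf_files = find_pdfs(r"C:\Users\username\tableau-knowledge-mcp\books")
```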
Issue #7: Script Crashes on Problematic PDFs
The Problem:
```
TypeError: TextInputSequence must be str
```
Some PDFs (scanned, encrypted, corrupted) return None instead of text, causing ChromaDB’s embedding function to crash.
The Fix:
Add robust error handling to the indexing script:
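```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Helper names are illustrative; in the script these checks sit inline.
def validated_text(text, pdf_path: Path) -> str:
    # Validate text exists and is a string
    if not text or not isinstance(text, str) or len(text.strip()) == 0:
        logger.warning(f"No valid text extracted from {pdf_path.name}")
        return ""
    return text

def valid_chunks(batch):
    # Validate chunks before sending to ChromaDB
    valid_batch = []
    for c in batch:
        if (c.get('text') and
                isinstance(c['text'], str) and
                len(c['text'].strip()) > 0):
            valid_batch.append(c)
    return valid_batch
```
Now the script skips problematic files and continues processing.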
Issue #8: Knowledge Base Server Disconnects in Claude Desktop
The Problem:
Knowledge base server shows as disconnected in Claude Desktop while Tableau server works fine.
The Root Cause:
When Claude Desktop runs the Python script, the working directory isn’t the project folder, so relative paths like ./chroma_db don’t resolve correctly.
The Fix:
Add an env section with absolute paths to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "tableau": {
      "command": "npx",
      "args": ["-y", "@tableau/mcp-server@latest"],
      "env": {
        "SERVER": "https://your-site.online.tableau.com",
        "SITE_NAME": "your-site-name",
        "PAT_NAME": "your-token-name",
        "PAT_VALUE": "your-token-secret"
      }
    },
    "technical-knowledge-base": {
      "command": "C:\\Users\\{username}\\project\\venv\\Scripts\\python.exe",
      "args": [
        "C:\\Users\\{username}\\project\\src\\server.py"
      ],
      "env": {
        "CHROMA_DB_PATH": "C:\\Users\\{username}\\project\\chroma_db",
        "PDF_LIBRARY_PATH": "C:\\Users\\{username}\\books",
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
        "MAX_SEARCH_RESULTS": "5"
      }
    }
  }
}
```
Critical details:
- Use double backslashes (\\) on Windows
- Use absolute paths (not relative ./ paths)
- The env section with CHROMA_DB_PATH is essential
After updating the config, completely quit Claude Desktop (File → Quit), wait 10 seconds, then reopen. Both servers should connect successfully.
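Setting CHROMA_DB_PATH in the config fixes the problem from the outside. A complementary option is to make the server itself independent of the working directory by anchoring any relative path to the project folder instead of the current directory. The sketch assumes server.py sits in a src folder one level below the project root, as in the config above, and that the server reads CHROMA_DB_PATH from the environment. resolve_chroma_path is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_chroma_path(env: Optional[Mapping[str, str]] = None,
                        project_dir: Optional[str] = None) -> Path:
    """Return the ChromaDB path without depending on the current working directory."""
    env = os.environ if env is None else env
    if project_dir is None:
        # Assumed layout: <project>/src/server.py, so the project root is two levels up
        project_dir = str(Path(__file__).resolve().parent.parent)
    path = Path(env.get("CHROMA_DB_PATH") or "chroma_db")
    # Relative paths are anchored to the project folder, not os.getcwd()
    return path if path.is_absolute() else Path(project_dir) / path
```

The env section remains the simplest fix. This fallback just keeps the server working if the value is missing or relative.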
💭 Final Thoughts
This project turned into way more than I expected. What started as “let’s index some PDFs” became a deep dive into Windows development, managing dependencies, and setting up production-ready processes.
Fortunately, I was able to document each issue along the way. Every error is resolved. Every weird Windows quirk is explained. That’s the value of building in a fresh environment and documenting the journey.
The result is a system that can combine theoretical knowledge with live data. No more switching between PDF readers and Tableau. No more trying to remember which book explained which concept. Just natural conversation with your entire technical library and your live data together.
For me, this has changed how I learn new Tableau techniques and create content. I rely far less on a sea of bookmarked websites. I can research a topic across multiple books, then immediately test examples with real data—all in one place.
If you’re someone who collects technical books and works with data daily, this might be worth the setup time. The first time you ask a question that pulls from both your library and your live Tableau data, you’ll start picturing the possibilities. Think of the time saved! Imagine the accelerated learning curve!
Ready to try it? Check out the GitHub repository and let me know how your build goes. I’d love to hear how others use this approach and take it even further.
If you run into any issues during setup, be sure to check out the troubleshooting guide for additional guidance.