Graphify

Turn any folder of code, documentation, papers, or images into a queryable knowledge graph. Graphify uses tree-sitter AST parsing for code (20+ languages), Whisper for audio and video, and LLM agents for documents and images.

Think of it as building a semantic index of your codebase—one you can search by concept instead of grepping for filenames.

What This Skill Does

Graphify builds interactive knowledge graphs that reveal architecture, dependencies, and cross-file relationships. Instead of jumping between files, ask the graph “what calls this function?” or “which modules depend on X?” and get visual connection maps.

The skill produces three artifacts: an interactive HTML visualization, a structured markdown report highlighting key insights, and a persistent JSON graph for programmatic queries.
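
As a rough illustration of a programmatic query, the sketch below walks a small in-memory graph laid out as node-link JSON. The schema (`nodes` with `id`, `links` with `source`, `target`, and `relation`) and the `callers_of` helper are assumptions made for this example; check your own graphify-out/graph.json for the actual field names before adapting it.

```python
import json

# Hypothetical sample in a node-link layout; the real graph.json schema may differ.
SAMPLE_GRAPH = json.loads("""
{
  "nodes": [{"id": "AuthService"}, {"id": "UserRepository"}, {"id": "login"}],
  "links": [
    {"source": "login", "target": "AuthService", "relation": "calls"},
    {"source": "AuthService", "target": "UserRepository", "relation": "calls"}
  ]
}
""")


def callers_of(graph: dict, target: str) -> list[str]:
    """Return ids of nodes that have a 'calls' edge pointing at `target`."""
    return sorted(
        link["source"]
        for link in graph["links"]
        if link["target"] == target and link["relation"] == "calls"
    )


print(callers_of(SAMPLE_GRAPH, "AuthService"))  # ['login']
```

To try this on a real build, load the file with `json.load` in place of `SAMPLE_GRAPH` once you have confirmed the key names.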

Quick Start

# Build knowledge graph from current directory
graphify .

# Build from specific path
graphify /path/to/project

# Watch mode — auto-rebuild on file changes
graphify . --watch

# Expose graph as MCP server for Claude to query
python -m graphify.serve graphify-out/graph.json

When to Use Graphify

Before planning — Build the graph first to understand architecture, then plan with visual context.

Architecture analysis — Find god nodes (most-connected concepts), surprising dependencies, and architectural anti-patterns.

Codebase navigation — Prefer structure-based navigation over grepping when entering unfamiliar projects.

Token-efficient context — Querying the graph uses 71.5x fewer tokens than loading the raw files, which matters most on large codebases.

Cross-domain understanding — Combine code AST + documents + papers + images for complete context.

Output Artifacts

| File | Purpose |
| --- | --- |
| graphify-out/graph.html | Interactive visualization with search, filtering, community detection |
| graphify-out/GRAPH_REPORT.md | God nodes, surprising connections, architecture insights, suggested questions |
| graphify-out/graph.json | Persistent graph for queries across sessions |
| graphify-out/cache/ | SHA256-based incremental updates (only reprocesses changed files) |

Interactive Visualization

Open graphify-out/graph.html in your browser to:

  • Search for concepts — Type a class name, function, or topic
  • Filter by community — See related concepts grouped automatically
  • Explore neighbors — Click a node to see everything connected
  • Query shortest path — Find the connection route between two concepts

Supported Input Types

Code Files (Tree-Sitter AST)

20+ languages supported with deterministic parsing:

Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, Elixir, Objective-C, Julia

For each language, the graph extracts:

  • Function and class definitions
  • Imports and dependencies
  • Call chains and inheritance hierarchies
  • Variable assignments and mutations
  • Type annotations (where available)
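
To give a feel for what this kind of extraction yields, here is a minimal sketch that pulls definitions, base classes, imports, and calls out of Python source. It uses Python's built-in `ast` module as a stand-in for tree-sitter and is not Graphify's extractor; the `extract_symbols` helper and its output shape are hypothetical.

```python
import ast

SOURCE = '''
import os
from pathlib import Path

class Repo(Base):
    def load(self, name):
        return os.path.join(str(Path.home()), name)
'''


def extract_symbols(source: str) -> dict[str, list[str]]:
    """Collect definitions, base classes, imports, and called names."""
    found: dict[str, list[str]] = {"defs": [], "bases": [], "imports": [], "calls": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found["defs"].append(node.name)
        if isinstance(node, ast.ClassDef):
            # Base classes become inheritance edges in a graph.
            found["bases"].extend(ast.unparse(base) for base in node.bases)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            found["imports"].append(node.module or "")
        elif isinstance(node, ast.Call):
            # The dotted name of the callee becomes a call edge.
            found["calls"].append(ast.unparse(node.func))
    return {kind: sorted(names) for kind, names in found.items()}


for kind, names in extract_symbols(SOURCE).items():
    print(kind, names)
```

Each entry maps naturally onto a graph edge: a definition is a node, and an import, base class, or call is a relationship from the enclosing file or symbol to the named target.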

No API calls for code — Tree-sitter runs locally. File contents never leave your machine.

Audio & Video (Whisper)

Transcribe podcasts, talks, and videos:

# Builds knowledge graph from audio transcript + timestamps
graphify ./videos/*.mp4 --watch

Local transcription — Uses Whisper on-device. No files sent to external APIs.

Documents & Papers (LLM-Powered)

PDFs, Markdown, Office documents — LLM agents extract key concepts, relationships, and summaries:

# Extract from all PDFs and markdown
graphify ./docs ./papers/*.pdf

Semantic extraction — Processes via your configured model provider (Claude/OpenAI).

Images (Vision Models)

Screenshots, diagrams, photographs — extract text and describe visual elements:

graphify ./screenshots/*.png

Graph Report Insights

The auto-generated GRAPH_REPORT.md includes:

  • God nodes — Most-connected concepts in the graph (likely core abstractions)
  • Surprising connections — Unexpected dependencies or architectural patterns
  • Community detection — Automatically grouped concepts and modules
  • Suggested questions — Intelligent queries to explore based on graph structure
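
One simple way to think about god nodes is as the nodes with the highest degree. The sketch below ranks nodes that way over a made-up edge list; it is an assumption that degree is a reasonable proxy here, and Graphify's own ranking may use a different measure.

```python
from collections import Counter

# Hypothetical edge list; node names are illustrative only.
EDGES = [
    ("views", "models"),
    ("serializers", "models"),
    ("admin", "models"),
    ("views", "serializers"),
    ("middleware", "views"),
]


def god_nodes(edges: list[tuple[str, str]], top: int = 2) -> list[tuple[str, int]]:
    """Rank nodes by degree (number of incident edges), highest first."""
    degree: Counter[str] = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Sort by descending degree, then by name so ties are deterministic.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top]


print(god_nodes(EDGES))  # [('models', 3), ('views', 3)]
```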

Relationship Confidence Tags

Every relationship is tagged by how it was discovered:

| Tag | Meaning | Reliability |
| --- | --- | --- |
| EXTRACTED | Directly from AST (imports, calls, inheritance) | 100% |
| INFERRED | LLM-derived from semantic analysis | 85–95% |
| AMBIGUOUS | Uncertain; requires human verification | <85% |
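
A practical use of these tags is to pull out the edges that still need a human look. The following sketch filters a hypothetical edge list for AMBIGUOUS relationships; the tag names come from the table above, while the `confidence` key and the edge layout are assumptions for illustration.

```python
from enum import Enum


class Confidence(str, Enum):
    EXTRACTED = "EXTRACTED"
    INFERRED = "INFERRED"
    AMBIGUOUS = "AMBIGUOUS"


# Hypothetical edges; the "confidence" key name is an assumption.
EDGES = [
    {"source": "views", "target": "models", "confidence": "EXTRACTED"},
    {"source": "billing", "target": "auth", "confidence": "INFERRED"},
    {"source": "report", "target": "cache", "confidence": "AMBIGUOUS"},
]


def needs_review(edges: list[dict]) -> list[tuple[str, str]]:
    """Return (source, target) pairs for edges tagged AMBIGUOUS."""
    return [
        (edge["source"], edge["target"])
        for edge in edges
        if Confidence(edge["confidence"]) is Confidence.AMBIGUOUS
    ]


print(needs_review(EDGES))  # [('report', 'cache')]
```

Parsing the tag through the enum means an unexpected value raises a `ValueError` instead of being silently ignored.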

MCP Server Mode

Expose the graph as an MCP server so Claude can query it directly:

# Start server from graph file
python -m graphify.serve graphify-out/graph.json

Available MCP Tools

| Tool | Query Type |
| --- | --- |
| query_graph | Search for concepts by name or type |
| get_node | Details of a specific node (definition, references) |
| get_neighbors | Find all connected concepts |
| shortest_path | Find minimum hops between two concepts |
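
To make the last two tools concrete, here is a small local sketch of what a neighbor lookup and a minimum-hop search compute. The functions borrow the tool names for readability but are plain Python stand-ins over a hypothetical adjacency map, not the MCP server's implementation; the real tools' parameters are not described here.

```python
from collections import deque

# Hypothetical undirected adjacency map.
GRAPH = {
    "LoginView": {"AuthService"},
    "AuthService": {"LoginView", "UserRepository", "TokenStore"},
    "UserRepository": {"AuthService", "Database"},
    "TokenStore": {"AuthService"},
    "Database": {"UserRepository"},
}


def get_neighbors(graph: dict[str, set[str]], node: str) -> list[str]:
    """Return the nodes directly connected to `node`, sorted by name."""
    return sorted(graph.get(node, ()))


def shortest_path(graph: dict[str, set[str]], start: str, goal: str):
    """Breadth-first search; returns a minimum-hop node list, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # maps each visited node to its predecessor
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(graph[current]):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


print(get_neighbors(GRAPH, "AuthService"))
print(shortest_path(GRAPH, "LoginView", "Database"))
```

Breadth-first search is enough for hop counts because every edge has the same cost; a weighted graph would call for a different algorithm.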

Claude Code Integration

Add to .claude/.mcp.json:

{
  "mcpServers": {
    "graphify": {
      "command": "python",
      "args": ["-m", "graphify.serve", "graphify-out/graph.json"]
    }
  }
}

Then query within Claude Code:

Use the graphify MCP tool to find how AuthService connects to UserRepository

Installation

# Core install (code parsing only)
pip install graphifyy

# Download language grammars
graphify install

# With MCP server support
pip install 'graphifyy[mcp]'

# Full install (MCP + PDF + video + office + community detection)
pip install 'graphifyy[all]'

Note: Package name is graphifyy (double-y) on PyPI.

Requirements: Python 3.10+

Architecture: Three-Pass Processing

Pass 1: AST Extraction (Local, Deterministic)

Tree-sitter parses code in 20+ languages. Extraction is:

  • Deterministic — same input → same output every time
  • Local — no API calls, file contents stay private
  • Fast — parallel processing across files

Pass 2: Audio/Video (Local via Whisper)

Transcription runs on-device:

  • Private — audio never uploaded
  • Timestamped — preserves temporal markers
  • Fallback-safe — missing audio → gap in graph, not a blocker

Pass 3: Semantic Extraction (API-Based)

LLM agents process documents, papers, and images:

  • Parallel — multiple documents processed simultaneously
  • Configurable — choose model, context window, temperature
  • Privacy-aware — send only what’s necessary

Incremental Updates

Graph rebuilds are optimized:

  • SHA256 file hashing — unchanged files skipped
  • Caching — previous results reused
  • Partial updates — only changed files reprocessed

Perfect for watch mode during active development.
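
The idea behind hash-based skipping can be sketched in a few lines. This example hashes files with SHA-256 and reports only those whose digest differs from an in-memory cache; the `file_digest` and `changed_files` helpers and the cache layout are hypothetical and say nothing about how graphify-out/cache/ is actually organized.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths: list[Path], cache: dict[str, str]) -> list[Path]:
    """Return paths whose digest differs from the cache, updating it in place."""
    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            cache[str(path)] = digest
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a.py"), Path(tmp, "b.py")
    a.write_text("x = 1\n")
    b.write_text("y = 2\n")
    cache: dict[str, str] = {}
    first = changed_files([a, b], cache)   # both files are new
    second = changed_files([a, b], cache)  # nothing changed
    b.write_text("y = 3\n")
    third = changed_files([a, b], cache)   # only b.py changed
    print(len(first), len(second), [p.name for p in third])  # 2 0 ['b.py']
```

Hashing content instead of comparing modification times means a file that was touched but not edited is still skipped.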

Privacy & Data Handling

| Input Type | Processing | Privacy |
| --- | --- | --- |
| Code files | Tree-sitter (local) | Private |
| Audio/video | Whisper (local) | Private |
| Documents | LLM API (configurable) | Depends on provider |
| Images | Vision model (configurable) | Depends on provider |

No data leaves your machine unless you explicitly send it via LLM API calls.
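
Before running a build on sensitive material, it can help to preview which inputs would go to a model provider. The sketch below sorts paths into local and provider-bound groups by file extension; the extension lists are assumptions made for illustration and may not match how Graphify actually routes files.

```python
from pathlib import Path

# Assumed extension groups, for illustration only.
LOCAL_SUFFIXES = {".py", ".ts", ".go", ".rs", ".mp3", ".mp4"}       # tree-sitter, Whisper
PROVIDER_SUFFIXES = {".pdf", ".md", ".docx", ".png", ".jpg"}       # LLM or vision model


def classify(path: str) -> str:
    """Return 'local', 'provider', or 'unknown' for a single path."""
    suffix = Path(path).suffix.lower()
    if suffix in LOCAL_SUFFIXES:
        return "local"
    if suffix in PROVIDER_SUFFIXES:
        return "provider"
    return "unknown"


def leaves_machine(paths: list[str]) -> list[str]:
    """Return the paths that the assumed routing would send to a provider."""
    return [path for path in paths if classify(path) == "provider"]


inputs = ["src/app.py", "docs/spec.pdf", "talks/intro.mp4", "ui/Login.PNG"]
print(leaves_machine(inputs))  # ['docs/spec.pdf', 'ui/Login.PNG']
```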

Workflow Integration

Before Planning

graphify /path/to/target
# Read GRAPH_REPORT.md for architecture insights
/ck:plan "Implement feature X"  # now informed by graph

With Scout

Use graphify for high-level structure, scout for specific file details:

graphify .                      # build architecture graph
/ck:scout "auth module"          # find specific files

Active Development

graphify . --watch              # rebuild graph as you work
# Use graph.html as live architecture reference

Limitations

  • First build on large codebases can be slow (AST parsing + LLM calls)
  • Semantic extraction quality depends on the underlying model
  • Neo4j integration requires separate setup (pip install 'graphifyy[neo4j]')
  • Leiden community detection requires pip install 'graphifyy[leiden]'

For very large codebases (1000+ files), consider:

  • Scoping to a subdirectory (graphify ./src/core)
  • Running in watch mode to incrementally build the graph
  • Using the cache (graphify-out/cache/) for faster rebuilds

Related Skills

  • Scout — Quick file search for specific capabilities
  • Repomix — Full context dump for raw codebase data
  • GKG — Semantic symbol navigation with go-to-definition
  • Plan — Plan implementation after understanding architecture

Best Practices

Build early — Run graphify before planning complex features. Architecture understanding saves time.

Use watch mode — Enable --watch during exploratory coding to keep the graph fresh.

Combine with scout — Use graphify for overview, scout for drill-down into specific files.

Export for sharing — Copy graphify-out/graph.html and GRAPH_REPORT.md to share architecture with teammates.

Verify ambiguous edges — Relationships tagged AMBIGUOUS need human verification. Don’t rely on them without checking the code.

Examples

Understanding a Django Project

graphify . --watch
# Open graph.html → see model, view, and middleware connections
# GRAPH_REPORT.md → lists god nodes (core models, middleware)

Mapping a Microservices Architecture

graphify ./services
# Graph shows service boundaries and external dependencies
# Shortest path queries reveal call chains across services

Learning a Codebase Before Contributing

graphify /external/project
# Read GRAPH_REPORT.md for architecture
# Query graph for "where are authentication checks?"
# Browse visualizations to plan contribution