Graphify

Turn any folder of code, documentation, papers, or images into a queryable knowledge graph. Uses tree-sitter AST for code (20+ languages), Whisper for audio/video, and LLM agents for documents.

Think of it as building a semantic index of your codebase—one you can search by concept instead of grepping for filenames.

What This Skill Does

Graphify builds interactive knowledge graphs that reveal architecture, dependencies, and cross-file relationships. Instead of jumping between files, ask the graph “what calls this function?” or “which modules depend on X?” and get visual connection maps.

The skill produces three artifacts: an interactive HTML visualization, a structured markdown report highlighting key insights, and a persistent JSON graph for programmatic queries.

Quick Start

# Build knowledge graph from current directory
graphify .

# Build from specific path
graphify /path/to/project

# Watch mode — auto-rebuild on file changes
graphify . --watch

# Expose graph as MCP server for Claude to query
python -m graphify.serve graphify-out/graph.json

When to Use Graphify

Before planning — Build the graph first to understand architecture, then plan with visual context.

Architecture analysis — Find god nodes (most-connected concepts), surprising dependencies, and architectural anti-patterns.

Codebase navigation — Prefer structure-based navigation over grepping when entering unfamiliar projects.

Token-efficient context — 71.5x fewer tokens than raw files — crucial for large codebases.

Cross-domain understanding — Combine code AST + documents + papers + images for complete context.

Output Artifacts

File	Purpose
`graphify-out/graph.html`	Interactive visualization with search, filtering, community detection
`graphify-out/GRAPH_REPORT.md`	God nodes, surprising connections, architecture insights, suggested questions
`graphify-out/graph.json`	Persistent graph for queries across sessions
`graphify-out/cache/`	SHA256-based incremental updates (only reprocesses changed files)

Interactive Visualization

Open graphify-out/graph.html in your browser to:

Search for concepts — Type a class name, function, or topic
Filter by community — See related concepts grouped automatically
Explore neighbors — Click a node to see everything connected
Query shortest path — Find the connection route between two concepts

Supported Input Types

Code Files (Tree-Sitter AST)

20+ languages supported with deterministic parsing:

Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, Elixir, Objective-C, Julia

For each language, the graph extracts:

Function and class definitions
Imports and dependencies
Call chains and inheritance hierarchies
Variable assignments and mutations
Type annotations (where available)

No API calls for code — Tree-sitter runs locally. File contents never leave your machine.

Audio & Video (Whisper)

Transcribe podcasts, talks, and videos:

# Builds knowledge graph from audio transcript + timestamps
graphify ./videos/*.mp4 --watch

Local transcription — Uses Whisper on-device. No files sent to external APIs.

Documents & Papers (LLM-Powered)

PDFs, Markdown, Office documents — LLM agents extract key concepts, relationships, and summaries:

# Extract from all PDFs and markdown
graphify ./docs ./papers/*.pdf

Semantic extraction — Processes via your configured model provider (Claude/OpenAI).

Images (Vision Models)

Screenshots, diagrams, photographs — extract text and describe visual elements:

graphify ./screenshots/*.png

Graph Report Insights

The auto-generated GRAPH_REPORT.md includes:

God nodes — Most-connected concepts in the graph (likely core abstractions)
Surprising connections — Unexpected dependencies or architectural patterns
Community detection — Automatically grouped concepts and modules
Suggested questions — Intelligent queries to explore based on graph structure

Relationship Confidence Tags

Every relationship is tagged by how it was discovered:

Tag	Meaning	Reliability
`EXTRACTED`	Directly from AST (imports, calls, inheritance)	100%
`INFERRED`	LLM-derived from semantic analysis	85–95%
`AMBIGUOUS`	Uncertain — requires human verification	<85%

MCP Server Mode

Expose the graph as an MCP server so Claude can query it directly:

# Start server from graph file
python -m graphify.serve graphify-out/graph.json

Available MCP Tools

Tool	Query Type
`query_graph`	Search for concepts by name or type
`get_node`	Details of a specific node (definition, references)
`get_neighbors`	Find all connected concepts
`shortest_path`	Find minimum hops between two concepts

Claude Code Integration

Add to .claude/.mcp.json:

{
  "mcpServers": {
    "graphify": {
      "command": "python",
      "args": ["-m", "graphify.serve", "graphify-out/graph.json"]
    }
  }
}

Then query within Claude Code:

Use the graphify MCP tool to find how AuthService connects to UserRepository

Installation

ClaudeKit already ships ck:graphify as a bundled skill. Do not run graphify install for ClaudeKit setup: upstream uses that command to install its standalone Claude skill at ~/.claude/skills/graphify/SKILL.md.

# Core install (code parsing only)
pip install graphifyy

# With MCP server support
pip install 'graphifyy[mcp]'

# Full install (MCP + PDF + video + office + community detection)
pip install 'graphifyy[all]'

Note: Package name is graphifyy (double-y) on PyPI.

Requirements: Python 3.10+

Architecture: Three-Pass Processing

Pass 1: AST Extraction (Local, Deterministic)

Tree-sitter parses code in 20+ languages. Extraction is:

Deterministic — same input → same output every time
Local — no API calls, file contents stay private
Fast — parallel processing across files

Pass 2: Audio/Video (Local via Whisper)

Transcription runs on-device:

Private — audio never uploaded
Timestamped — preserves temporal markers
Fallback-safe — missing audio → gap in graph, not a blocker

Pass 3: Semantic Extraction (API-Based)

LLM agents process documents, papers, and images:

Parallel — multiple documents processed simultaneously
Configurable — choose model, context window, temperature
Privacy-aware — send only what’s necessary

Incremental Updates

Graph rebuilds are optimized:

SHA256 file hashing — unchanged files skipped
Caching — previous results reused
Partial updates — only changed files reprocessed

Perfect for watch mode during active development.

Privacy & Data Handling

Input Type	Processing	Privacy
Code files	Tree-sitter (local)	Private
Audio/video	Whisper (local)	Private
Documents	LLM API (configurable)	Depends on provider
Images	Vision model (configurable)	Depends on provider

No data leaves your machine unless you explicitly send it via LLM API calls.

Workflow Integration

Before Planning

graphify /path/to/target
# Read GRAPH_REPORT.md for architecture insights
/ck:plan "Implement feature X"  # now informed by graph

With Scout

Use graphify for high-level structure, scout for specific file details:

graphify .                      # build architecture graph
/ck:scout "auth module"          # find specific files

Active Development

graphify . --watch              # rebuild graph as you work
# Use graph.html as live architecture reference

Limitations

First build on large codebases can be slow (AST parsing + LLM calls)
Semantic extraction quality depends on the underlying model
Neo4j integration requires separate setup (pip install 'graphifyy[neo4j]')
Leiden community detection requires pip install 'graphifyy[leiden]'

For very large codebases (1000+ files), consider:

Scoping to a subdirectory (graphify ./src/core)
Running in watch mode to incrementally build the graph
Using the cache (graphify-out/cache/) for faster rebuilds

Scout — Quick file search for specific capabilities
Repomix — Full context dump for raw codebase data
GKG — Semantic symbol navigation with go-to-definition
Plan — Plan implementation after understanding architecture

Best Practices

Build early — Run graphify before planning complex features. Architecture understanding saves time.

Use watch mode — Enable --watch during exploratory coding to keep the graph fresh.

Combine with scout — Use graphify for overview, scout for drill-down into specific files.

Export for sharing — Copy graphify-out/graph.html and GRAPH_REPORT.md to share architecture with teammates.

Verify ambiguous edges — Relationships tagged AMBIGUOUS need human verification. Don’t assume them without checking the code.

Examples

Understanding a Django Project

graphify . --watch
# Opens graph.html → see models, views, middleware connections
# GRAPH_REPORT.md → lists god nodes (core models, middleware)

Mapping a Microservices Architecture

graphify ./services
# Graph shows service boundaries and external dependencies
# Shortest path queries reveal call chains across services

Learning a Codebase Before Contributing

graphify /external/project
# Read GRAPH_REPORT.md for architecture
# Query graph for "where are authentication checks?"
# Browse visualizations to plan contribution