Frequently Asked Questions

Caching (NEW in v2.7.0!)

How does caching work?

Search results are automatically cached locally for 1 hour (3600 seconds). When you make the same query again, you get instant results at $0 API cost. The cache key is based on: query text + provider + max_results.
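A minimal sketch of how a key and freshness check of this kind could be built. The function names `cache_key` and `is_fresh`, and the choice of SHA-256 over a JSON payload, are illustrative assumptions and not the skill's actual implementation; only the three key inputs and the 3600-second default come from this FAQ.

```python
import hashlib
import json
import time
from typing import Optional

DEFAULT_TTL = 3600  # seconds (1 hour), the documented default


def cache_key(query: str, provider: str, max_results: int) -> str:
    """Derive a stable key from the three documented inputs (hypothetical helper)."""
    payload = json.dumps(
        {"query": query, "provider": provider, "max_results": max_results},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_fresh(stored_at: float, ttl: int = DEFAULT_TTL, now: Optional[float] = None) -> bool:
    """Return True while the entry is younger than the TTL (hypothetical helper)."""
    current = time.time() if now is None else now
    return (current - stored_at) < ttl
```

Because the provider and `max_results` are part of the key, the same query text sent to a different provider, or with a different result count, is treated as a separate cache entry.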

Where are cached results stored?

In the `.cache/` directory inside the skill folder by default. Override it with the `WSP_CACHE_DIR` environment variable:

```bash
export WSP_CACHE_DIR="/path/to/custom/cache"
```

How do I see cache stats?

```bash
python3 scripts/search.py --cache-stats
```

This shows total entries, size, oldest/newest entries, and a breakdown by provider.

How do I clear the cache?

```bash
python3 scripts/search.py --clear-cache
```

Can I change the cache TTL?

Yes! The default is 3600 seconds (1 hour). Set a custom TTL per request:

```bash
python3 scripts/search.py -q "query" --cache-ttl 7200  # 2 hours
```

How do I skip the cache?

Use `--no-cache` to always fetch fresh results:

```bash
python3 scripts/search.py -q "query" --no-cache
```

How do I know if a result was cached?

The response includes:

`"cached": true/false` — whether result came from cache

`"cache_age_seconds": 1234` — how old the cached result is (when cached)
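As a rough illustration, these two fields could be read from the JSON output as follows. This sketch assumes the response has already been parsed into a dictionary; `describe_cache_status` is a hypothetical helper name, and the sample payload contains only the two documented fields.

```python
import json


def describe_cache_status(response: dict) -> str:
    """Summarize the documented cache fields of a parsed response (hypothetical helper)."""
    if response.get("cached"):
        # cache_age_seconds is documented as present only for cached results.
        age = response.get("cache_age_seconds", 0)
        return f"cache hit ({age}s old)"
    return "fresh result"


# Sample payload using only the two fields described in this FAQ.
sample = json.loads('{"cached": true, "cache_age_seconds": 1234}')
print(describe_cache_status(sample))
```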

---

General

How does auto-routing decide which provider to use?

Multi-signal analysis scores each provider based on: price patterns, explanation phrases, similarity keywords, URLs, product+brand combos, and query complexity. Highest score wins. Use `--explain-routing` to see the decision breakdown.
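The idea of scoring providers by query signals can be sketched as below. The patterns, weights, and the names `SIGNALS`, `score_providers`, and `pick_provider` are invented for illustration and cover only three of the listed signal types; the skill's real rules are not shown in this FAQ and are likely more elaborate.

```python
import re
from typing import Dict

# Hypothetical signal patterns and weights, for illustration only.
SIGNALS = {
    "serper": [(r"\b(buy|cheap|price|deal)\b|\$\d+", 1.0)],            # price patterns
    "tavily": [(r"\b(how does|explain|why)\b", 1.0)],                   # explanation phrases
    "exa": [(r"\b(similar to|companies like|alternatives to)\b", 1.0)],  # similarity keywords
}


def score_providers(query: str) -> Dict[str, float]:
    """Add up the weights of every signal pattern that matches the query."""
    text = query.lower()
    return {
        provider: sum(weight for pattern, weight in patterns if re.search(pattern, text))
        for provider, patterns in SIGNALS.items()
    }


def pick_provider(query: str) -> str:
    """Return the provider with the highest score (the first one listed wins ties)."""
    scores = score_providers(query)
    return max(scores, key=scores.get)
```

With these toy rules, `pick_provider("buy cheap laptop")` returns `"serper"` and `pick_provider("how does AI work?")` returns `"tavily"`.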

What if it picks the wrong provider?

Override with `-p` followed by a provider name (for example `-p serper`, `-p tavily`, or `-p exa`). Check `--explain-routing` to understand why it chose differently.

What does "low confidence" mean?

The query is ambiguous (e.g., "Tesla" could mean the cars, the stock, or the company), so no provider scores clearly above the others. The skill falls back to Serper, and results may vary.

Can I disable a provider?

Yes! In config.json: `"disabled_providers": ["exa"]`

---

API Keys

Which API keys do I need?

At minimum ONE key (or SearXNG instance). You can use just Serper, just Tavily, just Exa, just You.com, or just SearXNG. Missing keys = that provider is skipped.
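A minimal sketch of the "missing key means skipped" behavior. `SERPER_API_KEY` is the variable name used in this FAQ; every other variable name in the mapping, and the helper `available_providers`, are assumptions made for illustration and may not match what the skill actually reads.

```python
import os
from typing import List, Mapping

# Only SERPER_API_KEY appears in this FAQ; the other names are assumed.
PROVIDER_ENV_VARS = {
    "serper": "SERPER_API_KEY",
    "tavily": "TAVILY_API_KEY",        # assumed name
    "exa": "EXA_API_KEY",              # assumed name
    "you": "YOU_API_KEY",              # assumed name
    "searxng": "SEARXNG_INSTANCE_URL",  # assumed name (an instance URL, not a key)
}


def available_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose variable is set to a non-empty value."""
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]
```

Passing the environment as an argument keeps the helper easy to check without touching real credentials, for example `available_providers({"SERPER_API_KEY": "x"})`.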

Where do I get API keys?

Serper: https://serper.dev (2,500 free queries, no credit card)

Tavily: https://tavily.com (1,000 free searches/month)

Exa: https://exa.ai (1,000 free searches/month)

You.com: https://api.you.com (Limited free tier for testing)

SearXNG: Self-hosted, no key needed! https://docs.searxng.org/admin/installation.html

How do I set API keys?

Two options (both auto-load):

Option A: .env file

```bash
export SERPER_API_KEY="your-key"
```

Option B: config.json (v2.2.1+)

```json
{
  "serper": {
    "api_key": "your-key"
  }
}
```

---

Routing Details

How do I know which provider handled my search?

Check `routing.provider` in JSON output, or `[🔍 Searched with: Provider]` in chat responses.

Why does it sometimes choose Serper for research questions?

If the query has brand/product signals (e.g., "how does Tesla FSD work"), shopping intent may outweigh research intent. Override with `-p tavily`.

What's the confidence threshold?

Default: 0.3 (30%). Below this = low confidence, uses fallback. Adjustable in config.json.
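The threshold check could look roughly like this. How the skill computes confidence is not specified here, so this sketch assumes it is the winning provider's share of the total score; `route`, `CONFIDENCE_THRESHOLD`, and `FALLBACK_PROVIDER` are hypothetical names, and Serper is used as the fallback because this FAQ names it for low-confidence queries.

```python
from typing import Dict, Tuple

CONFIDENCE_THRESHOLD = 0.3    # documented default (30%)
FALLBACK_PROVIDER = "serper"  # documented fallback for low-confidence queries


def route(scores: Dict[str, float]) -> Tuple[str, float]:
    """Pick the top-scoring provider, or the fallback when confidence is low.

    Confidence is assumed to be the winner's share of the total score.
    """
    total = sum(scores.values())
    if total <= 0:
        return FALLBACK_PROVIDER, 0.0
    best = max(scores, key=scores.get)
    confidence = scores[best] / total
    if confidence < CONFIDENCE_THRESHOLD:
        return FALLBACK_PROVIDER, confidence
    return best, confidence
```

For example, `route({"serper": 1.0, "tavily": 3.0, "exa": 0.0})` yields `("tavily", 0.75)`, while four equally scored providers give a confidence of 0.25 and fall back to Serper.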

---

You.com Specific

When should I use You.com over other providers?

You.com excels at:

RAG applications: Pre-extracted snippets ready for LLM consumption

Real-time information: Current events, breaking news, status updates

Combined sources: Web + news results in a single API call

Summarization tasks: "What's the latest on...", "Key points about..."

What's the livecrawl feature?

You.com can fetch full page content on-demand. Use `--livecrawl web` for web results, `--livecrawl news` for news articles, or `--livecrawl all` for both. Content is returned in Markdown format.

Does You.com include news automatically?

Yes! You.com's intelligent classification automatically includes relevant news results when your query has news intent. You can also use `--include-news` to explicitly enable it.

---

SearXNG Specific

Do I need my own SearXNG instance?

Yes! SearXNG is self-hosted. Most public instances disable the JSON API to prevent bot abuse. You need to run your own instance with JSON format enabled. See: https://docs.searxng.org/admin/installation.html

How do I set up SearXNG?

Docker is the easiest way:

```bash
docker run -d -p 8080:8080 searxng/searxng
```

Then enable JSON in `settings.yml`:

```yaml
search:
  formats:
    - html
    - json
```

Why am I getting "403 Forbidden"?

The JSON API is disabled on your instance. Enable it in `settings.yml` under `search.formats`.
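For reference, a JSON request to SearXNG is an ordinary `/search` call with `format=json`, which is the request that gets rejected when the format is not enabled. The sketch below only builds the URL; the helper name `searxng_search_url` is hypothetical, and the base URL assumes the local Docker setup from the previous answer.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def searxng_search_url(base_url: str, query: str,
                       engines: Optional[Iterable[str]] = None) -> str:
    """Build a SearXNG search URL that asks for JSON output (hypothetical helper)."""
    params = {"q": query, "format": "json"}
    if engines:
        # SearXNG accepts a comma-separated list of engine names.
        params["engines"] = ",".join(engines)
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


# Assumes the instance from the Docker example is listening on port 8080.
print(searxng_search_url("http://localhost:8080", "tesla fsd"))
```

Opening the printed URL in a browser or with `curl` is a quick way to confirm whether the instance returns JSON or a 403.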

What's the API cost for SearXNG?

$0! SearXNG is free and open-source. You only pay for hosting (~$5/month VPS). Unlimited queries.

When should I use SearXNG?

Privacy-sensitive queries: No tracking, no profiling

Budget-conscious: $0 API cost

Diverse results: Aggregates 70+ search engines

Self-hosted requirements: Full control over your search infrastructure

Fallback provider: When paid APIs are rate-limited

Can I limit which search engines SearXNG uses?

Yes! Use `--engines google,bing,duckduckgo` to specify engines, or configure defaults in `config.json`.

---

Provider Selection

Which provider should I use?

| Query Type | Best Provider | Why |
|------------|---------------|-----|
| Shopping ("buy laptop", "cheap shoes") | Serper | Google Shopping, price comparisons, local stores |
| Research ("how does X work?", "explain Y") | Tavily | Deep research, academic quality, full-page content |
| Startups/Papers ("companies like X", "arxiv papers") | Exa | Semantic/neural search, startup discovery |
| RAG/Real-time ("summarize latest", "current events") | You.com | LLM-ready snippets, combined web+news |
| Privacy ("search without tracking") | SearXNG | No tracking, multi-source, self-hosted |

Tip: Enable auto-routing and let the skill choose automatically! 🎯

Do I need all 5 providers?

No! All providers are optional. You can use:

1 provider (e.g., just Serper for everything)

2-3 providers (e.g., Serper + You.com for most needs)

All 5 (maximum flexibility + fallback options)

How much do the APIs cost?

| Provider | Free Tier | Paid Plan |
|----------|-----------|-----------|
| Serper | 2,500 queries/mo | $50/mo (5,000 queries) |
| Tavily | 1,000 queries/mo | $150/mo (10,000 queries) |
| Exa | 1,000 queries/mo | $1,000/mo (100,000 queries) |
| You.com | Limited free | ~$10/mo (varies by usage) |
| SearXNG | FREE ✅ | Only VPS cost (~$5/mo if self-hosting) |
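To compare the paid plans on a per-query basis, the arithmetic can be sketched as follows. It uses only the Serper, Tavily, and Exa figures from the table; You.com and SearXNG are left out because the table gives no fixed query quota for them. `PAID_PLANS` and `cost_per_query` are illustrative names.

```python
# (monthly price in USD, included queries), taken from the table above.
PAID_PLANS = {
    "serper": (50, 5_000),
    "tavily": (150, 10_000),
    "exa": (1_000, 100_000),
}


def cost_per_query(provider: str) -> float:
    """Monthly price divided by included queries."""
    monthly_usd, queries = PAID_PLANS[provider]
    return monthly_usd / queries


for name in PAID_PLANS:
    print(f"{name}: ${cost_per_query(name):.3f} per query")
```

At full utilization this works out to about $0.010 per query for Serper and Exa and $0.015 for Tavily; the effective price is higher whenever the monthly quota is not used up.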

Budget tip: Use SearXNG as primary + others as fallback for specialized queries!

How private is SearXNG really?

| Setup | Privacy Level |
|-------|---------------|
| Self-hosted (your VPS) | ⭐⭐⭐⭐⭐ You control everything |
| Self-hosted (Docker local) | ⭐⭐⭐⭐⭐ Fully private |
| Public instance | ⭐⭐⭐ Depends on operator's logging policy |

Best practice: Self-host if privacy is critical.

Which provider has the best results?

| Metric | Winner |
|--------|--------|
| Most accurate for facts | Serper (Google) |
| Best for research depth | Tavily |
| Best for semantic queries | Exa |
| Best for RAG/AI context | You.com |
| Most diverse sources | SearXNG (70+ engines) |
| Most private | SearXNG (self-hosted) |

Recommendation: Enable multiple providers + auto-routing for best overall experience.

How does auto-routing work?

The skill analyzes your query for keywords and patterns:

```text
"buy cheap laptop"      → Serper (shopping signals)
"how does AI work?"     → Tavily (research/explanation)
"companies like X"      → Exa (semantic/similar)
"summarize latest news" → You.com (RAG/real-time)
"search privately"      → SearXNG (privacy signals)
```

Confidence threshold: The skill only routes automatically when confidence is above 30%. Otherwise it uses the default provider (Serper).

Override: Use `-p provider` to force a specific provider.

---

Production Use

Can I use this in production?

Yes! Web-search-plus is production-ready:

✅ Error handling with automatic fallback

✅ Rate limit protection

✅ Timeout handling (30s per provider)

✅ API key security (.env + config.json gitignored)

✅ 5 providers for redundancy

Tip: Monitor API usage to avoid exceeding free tiers!

What if I run out of API credits?

1. Fallback chain: Other enabled providers automatically take over
2. Use SearXNG: Switch to self-hosted (unlimited queries)
3. Upgrade plan: Paid tiers have higher limits
4. Rate limit: Use `disabled_providers` to skip exhausted APIs temporarily
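The fallback chain in the first option can be pictured with a short sketch. This is not the skill's code: `search_with_fallback` and its `providers` and `disabled` parameters are hypothetical, and each provider is modeled as a plain callable that raises an exception when it fails or runs out of credits.

```python
from typing import Callable, Dict, Iterable, List, Tuple

SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str,
    providers: Dict[str, SearchFn],
    disabled: Iterable[str] = (),
) -> Tuple[str, List[str]]:
    """Try providers in order and return the first successful result."""
    skipped = set(disabled)
    errors = []
    for name, search in providers.items():
        if name in skipped:
            continue  # mirrors the effect of `disabled_providers`
        try:
            return name, search(query)
        except Exception as exc:  # e.g. quota exhausted, timeout
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The order of the `providers` mapping sets the priority, so placing a self-hosted SearXNG entry last makes it the catch-all once paid APIs are exhausted.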

---

Updates

How do I update to the latest version?

Via ClawHub (recommended):

```bash
clawhub update web-search-plus --registry "https://www.clawhub.ai" --no-input
```

Manually:

```bash
cd /path/to/workspace/skills/web-search-plus/
git pull origin main
python3 scripts/setup.py  # Re-run to configure new features
```

Where can I report bugs or request features?

GitHub Issues: https://github.com/robbyczgw-cla/web-search-plus/issues

ClawHub: https://www.clawhub.ai/skills/web-search-plus