
Paperclip

Search, read, and analyze 11M+ papers, 225K+ regulatory documents, 1M+ clinical trials, and 150M+ abstracts via CLI, Python SDK, or MCP

Overview

Paperclip exposes the same corpus through a command-line interface, an optional Python SDK (gxl_paperclip), and MCP, so you can pick the tool that fits the task. Terminals and shell pipelines use the CLI; scripts, notebooks, and services often use the SDK; coding agents can use either, depending on the environment. Every paper is a directory with full text, sections, figures, and supplements on a virtual filesystem at /papers/.

  • Search with natural language or regex across 11M+ papers, 225K+ FDA documents, 1M+ clinical trials, and 150M+ abstracts
  • Run subagents to read papers in parallel and return answers to queries
  • Pipe results through standard Unix tools (grep, awk, sed, jq, etc.)
  • Ask questions over figures
  • Query the database directly with SQL
11M+ full-text papers
150M+ abstracts
8 sources: bioRxiv · medRxiv · arXiv · PMC · FDA · ClinicalTrials.gov · more

Installation

Choose the installation method that works best for your setup.

1. One-line installer (recommended)

curl -fsSL https://paperclip.gxl.ai/install.sh | bash
Handles everything: installs to ~/.paperclip/, signs you in, and installs agent skills.

Verify your setup

paperclip config
# Server:  https://paperclip.gxl.ai
# Auth:    ✓ you@example.com
# Config:  ~/.paperclip

Alternative: MCP server

Use Paperclip as an MCP server directly — no local install needed.

When using the MCP server, native CLI commands using paperclip are not available.

Claude Code

1. Add the MCP server:

claude mcp add --transport http paperclip https://paperclip.gxl.ai/mcp

2. Start claude, enter /mcp, and select Authenticate under the paperclip server.

Cursor

1. Add to ~/.cursor/mcp.json (or .cursor/mcp.json in your project):

{
  "mcpServers": {
    "paperclip": {
      "url": "https://paperclip.gxl.ai/mcp",
      "type": "http"
    }
  }
}

2. Open Cmd/Ctrl + Shift + P → Tools & MCPs, enable the paperclip server, and authenticate.

Quick Start

# Search for papers
paperclip search "CRISPR base editing efficiency"

# Read a paper's metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json

# Preview the first 50 lines
paperclip head -50 /papers/bio_4f78753a6feb/content.lines

# Grep within a single paper
paperclip grep -i "binding affinity" /papers/bio_4f78753a6feb/content.lines

# Grep across the entire corpus
paperclip grep -i "alphamissense" /papers/

# Map over search results with an AI reader
paperclip map --from s_abc123 "What methods were used?"

# Save results to a local file
paperclip search "CRISPR" -n 5 > results.txt

Any command works directly as a subcommand. Use paperclip bash '...' for pipes and chains:

paperclip bash 'search "protein folding" | grep "deep learning"'

Using with Agents

Once installed (via CLI skill or MCP), invoke Paperclip from Claude Code by mentioning /paperclip in your prompt. Claude loads the skill automatically and runs commands on your behalf.

❯ Using /paperclip, find the relative frequency of methods which use Boltz
vs. Chai models in papers

● Skill(paperclip)
  ⎿  Successfully loaded skill

● I'll search for papers using both Boltz and Chai as structure prediction
  models, then dig into the methods sections.

● Bash(paperclip search "Boltz structure prediction" -n 50 --all 2>&1 | head -60)
  ⎿  Found 29 papers  [s_1cbc742e]

● Bash(paperclip search "Chai-1 structure prediction" -n 50 --all 2>&1 | head -60)
  ⎿  Found 20 papers  [s_0ef54807]

You can also run Paperclip commands directly from the terminal alongside Claude:

# In your terminal (while Claude Code is running)
paperclip search "GLP-1 receptor agonist efficacy" -n 5
paperclip map --from s_abc123 "What were the primary endpoints?"
paperclip grep -i "hazard ratio" /papers/med_84637b0c77f5/content.lines
Claude Code will automatically use search → map → synthesize workflows when you ask research questions. The skill teaches it the full command set.

Data sources

Paperclip searches and reads across eight sources: the full bioRxiv and medRxiv preprint archives, PubMed Central (PMC) for open-access peer-reviewed literature, arXiv for CS/ML/physics/math preprints, OpenAlex abstracts for broad scientific coverage, FDA regulatory documents (including PMDA and EMA), ClinicalTrials.gov, and international clinical trial registries. Full-text sources support search, grep, cat, and sql. Abstract-only sources support search and sql.

| Source | Type | # Documents | Years | Updated | Notes |
|---|---|---|---|---|---|
| bioRxiv | Full text | ~400K | 2013–present | Monthly | Preprint server for the biological sciences, operated by Cold Spring Harbor Laboratory. |
| medRxiv | Full text | ~100K | 2019–present | Monthly | Preprint server for the health and clinical sciences. |
| PubMed Central (PMC) | Full text | ~7.5M | All years | Monthly | Open-access papers only. Includes top journals such as Nature, Science, Cell, NEJM, The Lancet, and more. |
| arXiv | Full text | ~3.0M | 1991–present | Monthly | All arXiv categories. PDFs parsed with state-of-the-art OCR. Use -s arxiv. |
| OpenAlex Abstracts | Abstracts only | ~50M | All years | | Title + abstract search only (no full text). Use -s abstracts_only. Useful for broad literature surveys. |
| FDA (US) | Full text | ~225K | All years | Monthly | Drug approval reviews, labeling, CBER, advisory committee meetings. Use -s fda. Access via /fda/us/. |
| ClinicalTrials.gov | Full text | ~580K | All years | Monthly | US clinical trial registry. Use -s trials/us. Access via /clinicaltrials/us/. |
| Intl Regulatory + Trials | Full text | ~430K | All years | Monthly | Japan PMDA (/fda/jp/), EU EPAR (/fda/eu/), ChiCTR, UMIN, JRCT, EudraCT, CTIS, ISRCTN, WHO ICTRP (13 registries). Use -s fda/jp, -s trials/intl, etc. |

Regulatory documents and clinical trials are accessed via virtual directory paths (/fda/ and /clinicaltrials/) and support the same filesystem commands as papers. If you have suggestions for what we should index next, reach out to learn@gxl.ai.

Search and discovery commands use dedicated backends — search indexes, parallel document processing, vision models, and SQL engines. They look and feel like regular shell commands, but each one is powered by specialized infrastructure.

paperclip searches

Run multiple search queries in parallel and merge results.
# Run three searches in parallel
paperclip searches "CRISPR delivery" "gene editing cancer" "viral vectors"

# Tag results for accumulation across calls
paperclip searches --quiet --tag crispr "CRISPR delivery" "gene editing cancer"

# All results accumulated into: s_abc123  [tag: crispr]  (58 unique papers)

| Parameter | Type | Description |
|---|---|---|
| QUERIES (required) | string[] | One or more search queries (quoted strings) |
| -n | int = 50 | Limit per search (max: 1000) |
| -m | string | Match mode: any, all, 50%, 75%, phrase |
| -e, --exact | flag | Exact phrase matching |
| --recent | flag | Limit to recent papers |
| --quiet | flag | Minimal output: only the accumulated count and result ID |
| --tag | string | Tag for result accumulation |

paperclip lookup

Look up papers by a specific metadata field.
paperclip lookup doi 10.1101/2024.01.15.575556
paperclip lookup author "David Baker" -n 10
paperclip lookup title "CRISPR base editing"
paperclip lookup pmc PMC7194329
paperclip lookup pmid 32943797
paperclip lookup arxiv 2403.03507
paperclip lookup journal "Nature Medicine"
paperclip lookup keywords "CRISPR"
paperclip lookup year 2024
paperclip lookup type review-article

| Parameter | Type | Description |
|---|---|---|
| FIELD (required) | string | Field to search (see below) |
| VALUE (required) | string | Value to look up |
| -n | int = 25 | Max results |

Available fields

| Parameter | Type | Description |
|---|---|---|
| doi | TEXT | Digital Object Identifier |
| title | TEXT | Paper title |
| author | TEXT | Author name |
| abstract | TEXT | Abstract text |
| source | TEXT | biorxiv, medrxiv, pmc, arxiv, openalex, fda, or trials |
| month_year, date | TEXT | Publication date (bioRxiv/medRxiv) |
| pmc | TEXT | PMC ID (e.g. PMC7194329). PMC only. |
| pmid | TEXT | PubMed ID. PMC only. |
| journal | TEXT | Journal name. PMC only. |
| publisher | TEXT | Publisher name. PMC only. |
| type | TEXT | Article type (e.g. review-article). PMC only. |
| keywords | TEXT | Keywords. PMC only. |
| category | TEXT | Subject categories. PMC only. |
| license | TEXT | License type. PMC only. |
| year | TEXT | Publication year. PMC only. |
| volume, issue, issn | TEXT | Journal volume, issue, ISSN. PMC only. |
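
When lookups are generated from a script, it can help to check the field name before the command is sent. The sketch below is one possible approach, not part of Paperclip: lookup_argv and LOOKUP_FIELDS are hypothetical names, and the field set is copied from the table and examples above.

```python
import shlex

# Field names copied from the "Available fields" table above.
# "arxiv" is included because it appears in the lookup examples,
# even though the table does not list it.
LOOKUP_FIELDS = {
    "doi", "title", "author", "abstract", "source", "month_year", "date",
    "pmc", "pmid", "journal", "publisher", "type", "keywords", "category",
    "license", "year", "volume", "issue", "issn", "arxiv",
}


def lookup_argv(field, value, limit=None):
    """Build the argv list for `paperclip lookup FIELD VALUE [-n N]`."""
    if field not in LOOKUP_FIELDS:
        raise ValueError(f"unknown lookup field: {field!r}")
    argv = ["paperclip", "lookup", field, str(value)]
    if limit is not None:
        argv += ["-n", str(limit)]
    return argv


# Print the command as it would be typed in a shell.
print(shlex.join(lookup_argv("author", "David Baker", limit=10)))
```

The returned list can be passed to subprocess.run as-is, which avoids manual shell quoting of values that contain spaces.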

paperclip grep

Regex search within a paper or across the entire corpus. Corpus-wide grep uses a trigram index — sub-second across 11M+ documents.
# Corpus-wide regex search
paperclip grep -i "crispr\|cas9" /papers/

# Within a single paper
paperclip grep -i "binding" /papers/bio_4f78753a6feb/content.lines

# Show context lines
paperclip grep -i -C 2 "p53 mutation" /papers/bio_4f78753a6feb/content.lines

# Within a search result set
paperclip grep --from s_abc123 "kinase"

# Multiple patterns (OR'd together)
paperclip grep -e "CRISPR" -e "Cas9" /papers/bio_4f78753a6feb/content.lines

| Parameter | Type | Description |
|---|---|---|
| PATTERN (required) | string | Regex pattern (or use -e for multiple) |
| PATH | string | /papers/ for corpus-wide, or /papers/<id>/file.lines for single paper |
| -i | flag | Case-insensitive matching |
| -n | flag | Show line numbers |
| -c | flag | Count matches only |
| -v | flag | Invert match (show non-matching lines) |
| -o | flag | Print only the matching part of lines |
| -w | flag | Match whole words only |
| -l | flag | List only filenames with matches |
| -h | flag | Suppress filename prefix |
| -F | flag | Fixed strings (literal match, no regex) |
| -e PATTERN | string | Explicit pattern (repeatable for multi-pattern OR) |
| -m NUM | int | Stop after NUM matches |
| -A NUM | int | Show NUM lines after each match |
| -B NUM | int | Show NUM lines before each match |
| -C NUM | int | Show NUM lines of context (before and after) |
| --from | string | Grep within a search result set (e.g. --from s_abc123) |

paperclip scan

Multi-pattern grep — search for several keywords in a single pass, results grouped by pattern.
paperclip scan /papers/bio_4f78753a6feb/content.lines "AAV" "efficiency" "in vivo"
paperclip scan -i -C 3 /papers/bio_4f78753a6feb/content.lines "p53" "MDM2"

| Parameter | Type | Description |
|---|---|---|
| FILE (required) | string | Path to the file to scan |
| PATTERNS (required) | string[] | One or more patterns (quoted strings) |
| -i | flag | Case-insensitive matching |
| -C NUM | int = 5 | Lines of context per match |

scan is faster than running multiple grep calls — it reads the file once and matches all patterns in a single pass.
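
To make the single-pass idea concrete, here is a small local sketch of the technique in Python. It is an illustration only and does not reflect how Paperclip implements scan; scan_lines is a hypothetical helper, and it uses Python regular expressions rather than Paperclip's pattern syntax.

```python
import re
from collections import defaultdict


def scan_lines(lines, patterns, ignore_case=False):
    """Check every pattern against each line in one pass; group hits by pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = [(p, re.compile(p, flags)) for p in patterns]
    hits = defaultdict(list)
    # The input is iterated exactly once, however many patterns are given.
    for lineno, line in enumerate(lines, start=1):
        for pattern, regex in compiled:
            if regex.search(line):
                hits[pattern].append((lineno, line))
    return dict(hits)


sample = [
    "AAV9 vectors were injected in vivo.",
    "Transduction efficiency was measured.",
    "No relevant terms here.",
]
grouped = scan_lines(sample, ["AAV", "efficiency", "in vivo"])
for pattern, matches in grouped.items():
    print(pattern, [lineno for lineno, _ in matches])
```

Running grep once per pattern would read the same file three times here; the loop above touches each line once and tests all patterns against it.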

paperclip sql

Run read-only SQL queries against the unified papers database.
# Count papers by source
paperclip sql "SELECT source, COUNT(*) FROM documents GROUP BY source"

# Find papers by author
paperclip sql "SELECT title, doi, source FROM documents WHERE authors ILIKE '%Doudna%' LIMIT 5"

# Top journals (PMC papers)
paperclip sql "SELECT journal_title, COUNT(*) c FROM documents WHERE source = 'pmc' GROUP BY 1 ORDER BY c DESC LIMIT 10"

# Recent papers about a topic
paperclip sql "SELECT title, doi, pub_date FROM documents WHERE abstract_text ILIKE '%CRISPR%' ORDER BY pub_date DESC LIMIT 10"

| Parameter | Type | Description |
|---|---|---|
| QUERY (required) | string | SQL SELECT statement |
| --source, -s | string = all | Filter by source: biorxiv, medrxiv, pmc, arxiv, abstracts_only, fda, trials |

Only SELECT on the documents table is allowed. 15-second timeout, 200-row limit.

Schema: documents

| Parameter | Type | Description |
|---|---|---|
| id | TEXT | Paper identifier |
| title | TEXT | Paper title |
| doi | TEXT | Digital Object Identifier |
| authors | TEXT | Comma-separated author list |
| source | TEXT | 'biorxiv', 'medrxiv', 'pmc', 'arxiv', 'openalex', 'fda', or 'trials' |
| abstract_text | TEXT | Paper abstract |
| pub_date | TEXT | Publication date (text) |
| journal_title | TEXT | Journal name. PMC only. |
| article_type | TEXT | e.g. research-article, review-article. PMC only. |
| pmid | TEXT | PubMed ID. PMC only. |
| keywords | JSONB | Array of keywords. PMC only. |
| categories | JSONB | Array of subject categories. PMC only. |
| pub_year | INT | Publication year. PMC only. |
| created_at | TIMESTAMP | When the record was indexed |

PMC-only columns are NULL for bioRxiv/medRxiv papers. Use WHERE source = 'pmc' when querying these fields.
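
Because the sql command accepts only SELECT statements and returns at most 200 rows, a script may want to check queries before sending them. The following is a rough client-side sketch under those two stated limits; prepare_query is a hypothetical helper, and it only handles a trailing LIMIT clause (no OFFSET, no subqueries).

```python
import re

MAX_ROWS = 200  # row limit stated above for `paperclip sql`


def prepare_query(query, limit=MAX_ROWS):
    """Reject non-SELECT statements and ensure a LIMIT within the cap is present."""
    stripped = query.strip().rstrip(";")
    if not re.match(r"(?i)select\b", stripped):
        raise ValueError("only SELECT statements are allowed")
    limit = min(limit, MAX_ROWS)
    # Look for a LIMIT clause at the very end of the statement.
    match = re.search(r"(?i)\blimit\s+(\d+)\s*$", stripped)
    if match:
        capped = min(int(match.group(1)), limit)
        return stripped[: match.start()] + f"LIMIT {capped}"
    return f"{stripped} LIMIT {limit}"


print(prepare_query("SELECT title, doi FROM documents WHERE source = 'pmc'"))
```

This is only a convenience check. The server remains the authority on what is allowed, so a query that passes here can still be rejected or time out.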

paperclip export

Export SQL query results to CSV and a table artifact (up to 1,000 rows).
paperclip export "SELECT title, doi, authors FROM documents WHERE source = 'biorxiv' ORDER BY created_at DESC LIMIT 100"

paperclip export --desc "CRISPR papers 2024" "SELECT DISTINCT d.title, d.doi FROM documents d JOIN content_blocks cb ON d.document_id = cb.document_id WHERE cb.content ILIKE '%CRISPR%' AND d.month_year >= '2024-01'"

| Parameter | Type | Description |
|---|---|---|
| QUERY (required) | string | SQL SELECT or WITH statement |
| --desc | string | Description for the export |

The export command has access to additional tables beyond documents: content_blocks (id, document_id, line_number, content, section, block_type) and figures (document_id, graphic, source_path). Up to 1,000 rows.

paperclip map

Run parallel AI reader tasks over multiple papers from a search result.

Each paper gets read in full by an LLM that extracts the information you ask for. Results are returned with per-paper answers.

Typical workflow: search → map → synthesize from results.

Step 1: Search
# Step 1: search for papers
$ paperclip search "AAV gene therapy delivery" -n 3

Found 3 papers  [s_1907a2d0]

  1. Covalently linked adenovirus-AAV complexes as a novel
     platform technology for gene therapy
     Logan Thrasher Collins, Wandy Beatty, et al.
     bio_f402e4cf6e4a · bioRxiv · 2024-08-21

  2. Myocardial infarction creates a critical time window
     for AAV gene therapy
     Gonglie Chen, Yueyang Zhang, et al.
     bio_f2997a136fe7 · bioRxiv · 2024-06-10

  3. A facile chemical strategy to synthesize precise
     AAV-protein conjugates for targeted gene delivery
     Quan Pham, Jake Glicksman, et al.
     bio_ea44a956784e · bioRxiv · 2024-07-20

[197ms, saved to s_1907a2d0]

Step 2: Map
# Step 2: map over those results
$ paperclip map --from s_1907a2d0 \
    "What delivery vector was used and what transduction efficiency was reported?"

Map complete: 3/3 papers

  ✓ Covalently linked adenovirus-AAV complexes as a novel platform...
    The study used covalently linked adenovirus-AAV (Ad-AAV) complexes.
    In vitro, approximately 80% transduction in HEK293 cells.

  ✓ Myocardial infarction creates a critical time window for AAV...
    The study utilized AAV9. Transduction efficiency: 32.4 ± 4.5% at
    1 day post-MI vs 16.2 ± 3.1% in sham controls.

  ✓ A facile chemical strategy to synthesize precise AAV-protein...
    AAV2-HER2 conjugates for targeted delivery to HER2+ cancer cells.
    Explicit transduction efficiency percentages were not provided.

| Parameter | Type | Description |
|---|---|---|
| QUERY (required) | string | Question applied to every paper |
| --from | string | Results ID from a previous search (s_xxx). If omitted, uses the last search result. |
| --output_schema | string | JSON schema for structured output per paper |
| --limit | int | Max number of papers to process |
| --offset | int | Skip first N papers in the result set |

Keep search results to 3–10 papers for fast map execution. Use -n 5 on your search to limit the set. The map command shows a live progress bar during execution.
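
When the search and map steps are driven from a script, the result ID has to be pulled out of the search output. The sketch below assumes the header format shown in the sample output above ("Found N papers  [s_...]"); parse_search_header and map_argv are hypothetical helpers, not part of Paperclip.

```python
import re

# Matches the header line shown in the sample search output above.
RESULT_HEADER = re.compile(r"Found (\d+) papers?\s+\[(s_\w+)\]")


def parse_search_header(output):
    """Return (paper_count, result_id) from search output, or None if absent."""
    match = RESULT_HEADER.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def map_argv(result_id, question, limit=None):
    """Build the argv list for `paperclip map --from ID [--limit N] QUESTION`."""
    argv = ["paperclip", "map", "--from", result_id]
    if limit is not None:
        argv += ["--limit", str(limit)]
    argv.append(question)
    return argv


count, result_id = parse_search_header("Found 3 papers  [s_1907a2d0]")
print(count, result_id)
print(map_argv(result_id, "What delivery vector was used?"))
```

The argv list can be handed to subprocess.run. If the output format changes, parse_search_header returns None rather than guessing.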

paperclip filter

Filter search results for relevance to a specific query, removing off-topic papers.
# Filter results — removes irrelevant papers
paperclip filter --from s_abc123 "AAV delivery to the lung"

# Require at least N papers to pass (exit code 1 if fewer survive)
paperclip filter --from s_abc123 --require 5 "AAV delivery to the lung"

| Parameter | Type | Description |
|---|---|---|
| --from (required) | string | Results ID to filter (s_xxx) |
| QUERY (required) | string | The user's original query (used to judge relevance) |
| --require N | int | Fail (exit code 1) if fewer than N papers survive. Filtered results are still saved. |

Previously-evaluated papers are cached — re-running filter after adding new search results only evaluates new papers.
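
The --require exit code makes filter usable as a gate in scripts. As a minimal sketch, assuming the CLI is on PATH: filter_results is a hypothetical wrapper, and its run parameter exists only so the example can execute with a stand-in instead of the real command. It treats any non-zero exit code as failure, not just the exit code 1 described above.

```python
import subprocess
from types import SimpleNamespace


def filter_results(result_id, query, require=None, run=subprocess.run):
    """Run `paperclip filter`; return True when the command exits with code 0."""
    argv = ["paperclip", "filter", "--from", result_id]
    if require is not None:
        argv += ["--require", str(require)]
    argv.append(query)
    completed = run(argv, capture_output=True, text=True)
    return completed.returncode == 0


# Stand-in runner so the sketch executes without the CLI installed.
# It simulates the "fewer than N papers survived" case (exit code 1).
def fake_run(argv, **kwargs):
    return SimpleNamespace(returncode=1, argv=argv)


passed = filter_results("s_abc123", "AAV delivery to the lung", require=5, run=fake_run)
print(passed)
```

In real use, omit the run argument so that subprocess.run executes the command. A caller could then broaden the search and retry when the function returns False.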

paperclip ask-image

Analyze figures from papers using vision AI. Requires a paper directory context.
# List available figures for a paper
paperclip bash 'cd /papers/bio_4f78753a6feb/ && ask_image --list'

# Describe a figure (default question)
paperclip bash 'cd /papers/bio_4f78753a6feb/ && ask_image fig1.tif'

# Ask a specific question about a figure
paperclip bash 'cd /papers/bio_4f78753a6feb/ && ask_image fig1.tif "What does this figure show?"'

# Analyze multiple figures
paperclip bash 'cd /papers/bio_4f78753a6feb/ && ask_image fig1.tif fig2.tif "Compare these"'

# First, list figures to see what's available
paperclip ls /papers/bio_4f78753a6feb/figures/

| Parameter | Type | Description |
|---|---|---|
| FIGURE_ID (required) | string | Figure filename (e.g. fig1.tif, 657517v1_fig1.tif) |
| QUESTION | string = "Describe this figure in detail." | Question about the image |
| --list | flag | List available figures for the current paper |

ask-image requires being inside a paper directory. From the CLI, wrap with paperclip bash 'cd /papers/ID/ && ask_image ...'. When used via MCP, the agent's session maintains the working directory automatically.
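
Building the cd-and-ask chain by hand gets error-prone once questions contain quotes or spaces. The helper below is a sketch of one way to compose the script string; ask_image_script is a hypothetical name, and it assumes the remote shell honors POSIX-style single quoting as produced by shlex.quote, which the documentation above does not confirm.

```python
import shlex


def ask_image_script(paper_id, figures, question=None):
    """Compose the `cd ... && ask_image ...` chain for `paperclip bash`."""
    parts = ["ask_image", *figures]
    if question is not None:
        parts.append(question)
    # Quote each argument so spaces and punctuation survive the shell.
    command = " ".join(shlex.quote(part) for part in parts)
    paper_dir = shlex.quote(f"/papers/{paper_id}/")
    return f"cd {paper_dir} && {command}"


script = ask_image_script(
    "bio_4f78753a6feb", ["fig1.tif"], "What does this figure show?"
)
print(script)
# The script would then be passed as a single argument:
print(["paperclip", "bash", script])
```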

paperclip cat

Read file contents from the paper filesystem. Large files are automatically truncated with a section index.
# Read metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json

# Read full text (first 100 lines by default)
paperclip cat /papers/bio_4f78753a6feb/content.lines

# Read with line numbers
paperclip cat -n /papers/bio_4f78753a6feb/content.lines

# Read first N lines
paperclip cat --lines 50 /papers/bio_4f78753a6feb/content.lines

# Read a specific line range
paperclip cat --lines 100-200 /papers/bio_4f78753a6feb/content.lines

# Read a specific section
paperclip cat /papers/bio_4f78753a6feb/sections/Methods.lines

| Parameter | Type | Description |
|---|---|---|
| FILE (required) | string | Path to the file to read |
| -n | flag | Show line numbers |
| --lines N | int | Show first N lines (overrides the 100-line default) |
| --lines N-M | range | Show lines N through M |

For large files, cat automatically shows the first 100 lines plus a section index. Use head, grep, or scan for more targeted reads.
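
To page through a long file with --lines N-M, the ranges can be generated from the line count reported by wc -l. This is a small sketch, assuming line ranges are 1-based and inclusive; line_ranges is a hypothetical helper.

```python
def line_ranges(total_lines, page_size=100):
    """Yield 'N-M' strings covering lines 1..total_lines in page_size chunks."""
    for start in range(1, total_lines + 1, page_size):
        end = min(start + page_size - 1, total_lines)
        yield f"{start}-{end}"


path = "/papers/bio_4f78753a6feb/content.lines"
# Print one `paperclip cat --lines N-M` invocation per page of a 250-line file.
for span in line_ranges(250):
    print(["paperclip", "cat", "--lines", span, path])
```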

paperclip head / tail

Display the first or last lines of a file.
paperclip head -50 /papers/bio_4f78753a6feb/content.lines
paperclip tail -20 /papers/bio_4f78753a6feb/content.lines

# Explicit -n form (equivalent)
paperclip head -n 50 /papers/bio_4f78753a6feb/content.lines

| Parameter | Type | Description |
|---|---|---|
| FILE (required) | string | Path to the file |
| -n N or -N | int = 10 | Number of lines to show |

paperclip ls

List directory contents in the paper filesystem.
paperclip ls /papers/bio_4f78753a6feb/
# meta.json  content.lines  sections/  figures/  supplements/

paperclip ls /papers/bio_4f78753a6feb/sections/
# Introduction.lines  Methods.lines  Results.lines  Discussion.lines

# Long format with token estimates
paperclip ls -l /papers/bio_4f78753a6feb/

| Parameter | Type | Description |
|---|---|---|
| PATH | string = current directory | Directory to list |
| -l | flag | Long format (shows permissions, token estimates, titles) |
| -a | flag | Show all entries (including hidden) |

paperclip tree

Display a directory tree.
paperclip tree /papers/bio_4f78753a6feb/
paperclip tree -L 2 /papers/bio_4f78753a6feb/

| Parameter | Type | Description |
|---|---|---|
| PATH | string = current directory | Directory to display |
| -L, --depth | int = 3 | Maximum depth |

paperclip wc

Count lines, words, or characters in paper files.
paperclip wc -l /papers/bio_4f78753a6feb/content.lines
#      847 /papers/bio_4f78753a6feb/content.lines

paperclip wc -w /papers/bio_4f78753a6feb/content.lines

| Parameter | Type | Description |
|---|---|---|
| FILE (required) | string | File to count |
| -l | flag | Count lines |
| -w | flag | Count words |
| -c, -m | flag | Count characters |

With no flags, all three counts are shown.

paperclip cd / pwd

Navigate the virtual filesystem within bash chains. Entering a paper directory shows a summary with title and token count.
# cd + command chain (cd only persists within the bash call)
paperclip bash 'cd /papers/bio_4f78753a6feb/ && ls'
paperclip bash 'cd /papers/bio_4f78753a6feb/ && ask_image --list'
paperclip bash 'cd /papers/bio_4f78753a6feb/ && grep -i "kinase" content.lines'

~ expands to /papers. From the CLI, cd is useful inside paperclip bash '...' chains to set context for commands that need a paper directory (like ask_image). When Paperclip is used as an MCP server, the agent's session maintains directory state across calls automatically.

paperclip sort / uniq / cut / tr

Standard text processing utilities, available as subcommands or in pipes.
# Sort and de-duplicate (-u)
paperclip bash 'grep "kinase" /papers/bio_abc123/content.lines | sort -u'

# Cut specific fields
paperclip bash 'grep "gene" /papers/bio_abc123/content.lines | cut -d: -f2'

# Translate characters
paperclip bash 'head -10 /papers/bio_abc123/content.lines | tr A-Z a-z'

| Parameter | Type | Description |
|---|---|---|
| sort | cmd | -r reverse, -n numeric, -u unique |
| uniq | cmd | -c count occurrences, -d only duplicates, -u only unique |
| cut | cmd | -d delimiter, -f field numbers |
| tr | cmd | -d delete chars, -s squeeze repeats, or two charsets for translate |

paperclip sed

Stream editor (limited subset). Supports address ranges, substitution, and line deletion.
# Extract lines 50-100
paperclip sed -n '50,100p' /papers/bio_4f78753a6feb/content.lines

# Substitution
paperclip bash 'head -20 /papers/bio_abc123/content.lines | sed "s/CRISPR/crispr/g"'

# Delete matching lines
paperclip bash 'cat /papers/bio_abc123/content.lines | sed "/^$/d"'

| Parameter | Type | Description |
|---|---|---|
| EXPRESSION (required) | string | sed expression (s///, N,Mp, /pattern/d, /pattern/!d) |
| FILE | string | Input file (or pipe via stdin) |
| -n | flag | Suppress default output (print only with p) |

paperclip awk

Pattern processing (limited subset).
# Print Nth field
paperclip bash 'grep "kinase" /papers/bio_abc123/content.lines | awk "{print \$2}"'

# Custom delimiter
paperclip bash 'grep "kinase" /papers/bio_abc123/content.lines | awk -F: "{print \$1}"'

# Print matching lines (like grep)
paperclip bash 'cat /papers/bio_abc123/content.lines | awk "/kinase/"'

# Line range
paperclip bash 'cat /papers/bio_abc123/content.lines | awk "NR>=5 && NR<=10"'

| Parameter | Type | Description |
|---|---|---|
| PROGRAM (required) | string | Supported: /pattern/, /start/,/end/, {print $N}, NR>=N && NR<=M, {print NR, $0} |
| -F | string = whitespace | Field delimiter |

paperclip jq

Minimal JSON query tool. Supports dot-path access, keys, and length.
paperclip bash 'cat /papers/bio_4f78753a6feb/meta.json | jq .title'
paperclip bash 'cat /papers/bio_4f78753a6feb/meta.json | jq keys'

| Parameter | Type | Description |
|---|---|---|
| FILTER (required) | string | jq filter: ., .key, .key.subkey, keys, length |

Pipes & bash

For compound expressions (pipes, &&, chains), wrap in paperclip bash '...'. Single commands don't need this wrapper.
# Count grep matches
paperclip bash 'grep -ic "p53" /papers/bio_4f78753a6feb/content.lines'

# Chain search → grep
paperclip bash 'search "CRISPR delivery" -n 10 | grep "AAV"'

# Write results to scratch space
paperclip bash 'grep -i "kinase" /papers/ > /.gxl/kinase_hits.txt'

# Multi-step pipeline
paperclip bash 'head -100 /papers/bio_abc123/content.lines | grep -i "method" | wc -l'

Inside bash, you can use any allowlisted command without the paperclip prefix. Pipes (|) and chains (&&) work as expected.

Python is also available via paperclip python "print(2+2)" or paperclip python3 script.py, executed in a sandboxed environment.

paperclip results

View, browse, and export saved search and map results.

Every search, lookup, and map command saves its output with a result ID. Use results to access them later.

# Interactive picker (arrow keys to navigate, enter to save)
paperclip results

# View a specific result
paperclip results s_4a2b61f6

# View a map result
paperclip results m_ec2c9cc9

# Non-interactive list
paperclip results --list

# Export to CSV
paperclip results s_4a2b61f6 --save results.csv

# Export to plain text
paperclip results s_4a2b61f6 --save results.txt

| Parameter | Type | Description |
|---|---|---|
| RESULT_ID | string | Specific result to view (s_xxx for search, m_xxx for map) |
| --list | flag | Non-interactive list of all saved results |
| --save PATH | string | Export to file. .csv exports structured data; other extensions export plain text. |

With no arguments, paperclip results opens an interactive TUI picker where you can navigate with arrow keys, paginate with n/p, and press enter to select and save.

Redirection

Append > to any Paperclip command to save its output to a local file. This is standard shell redirection, so >> appends to an existing file in the same way.

# Save search results locally
paperclip search "CRISPR delivery" -n 10 > crispr_results.txt

# Save a paper's metadata
paperclip cat /papers/bio_4f78753a6feb/meta.json > meta.json

# Save SQL query output
paperclip sql "SELECT title, doi FROM documents WHERE authors ILIKE '%Doudna%' LIMIT 20" > doudna_papers.txt

# Append to an existing file
paperclip search "protein folding" >> research_log.txt

# Download a figure
paperclip cat /papers/bio_abc123/figures/fig1.tif > fig1.tif

When cat encounters a binary file (image/figure), Paperclip generates a short-lived signed URL and streams the bytes directly — no intermediate text encoding.

Authentication

Paperclip can authenticate with OAuth (browser sign-in) or an API key for non-interactive use. OAuth credentials live at ~/.paperclip/credentials.json and refresh automatically. API keys are never written to disk by the CLI; pass them per run or via an environment variable (see API keys below).

paperclip login     # opens browser for sign-in (OAuth)
paperclip config    # check server URL + OAuth status (do not pass --api-key)
paperclip logout    # clear OAuth credentials and local config

For servers, CI, or any environment without a browser, set PAPERCLIP_API_KEY or use the global --api-key flag before the subcommand:

export PAPERCLIP_API_KEY='gxl_...'
paperclip search "CRISPR delivery" -n 5

# Equivalent one-off:
paperclip --api-key 'gxl_...' search "CRISPR delivery" -n 5
The login and config commands only apply to OAuth and local settings; do not pass --api-key to them.

API keys

Create, rotate, and revoke keys from the Paperclip web app: API keys. The CLI sends the key on every request as the X-API-Key header.

Ways to provide the key

  • Environment variable PAPERCLIP_API_KEY (recommended for scripts and CI).
  • Global flag paperclip --api-key KEY ... or paperclip --api-key=KEY ... before the subcommand.
  • For passthrough filesystem commands (e.g. grep, cat), --api-key may also appear after the subcommand; the CLI strips it before running the remote command.
# CI / production job
export PAPERCLIP_API_KEY="$SECRET"
paperclip search "gene therapy" -n 20

# Ad-hoc (avoid shell history: prefer env or a secret store)
paperclip --api-key "$PAPERCLIP_API_KEY" grep -i "AAV" /papers/
If neither OAuth credentials nor an API key is available and stdin is not a TTY, the CLI exits with an error — set PAPERCLIP_API_KEY or run paperclip login on a machine with a browser.

MCP server with API key

To authenticate the MCP server in Cursor (or any MCP client that supports headers), add an X-API-Key header to your mcp.json:

{
  "mcpServers": {
    "paperclip": {
      "url": "https://paperclip.gxl.ai/mcp",
      "type": "http",
      "headers": {
        "X-API-Key": "gxl_your_api_key_here"
      }
    }
  }
}
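
To avoid pasting a key into the file by hand, the same configuration can be generated from the environment. The sketch below only reproduces the JSON structure shown above; mcp_config is a hypothetical helper, and writing the output to ~/.cursor/mcp.json is left to the reader.

```python
import json
import os


def mcp_config(api_key):
    """Return the mcp.json structure shown above with the given API key."""
    return {
        "mcpServers": {
            "paperclip": {
                "url": "https://paperclip.gxl.ai/mcp",
                "type": "http",
                "headers": {"X-API-Key": api_key},
            }
        }
    }


# Falls back to the placeholder from the example when the variable is unset.
api_key = os.environ.get("PAPERCLIP_API_KEY", "gxl_your_api_key_here")
print(json.dumps(mcp_config(api_key), indent=2))
```

Note that the resulting file contains the key in plain text, so it should stay out of version control.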

Config

Configuration is stored in ~/.paperclip/config.json.

paperclip config                              # show current config
paperclip config --url http://localhost:8002   # set server URL (local dev)

| Parameter | Type | Description |
|---|---|---|
| --url | string | Set the server base URL |
| --show | flag | Show current configuration (default when no flags) |
| PAPERCLIP_BASE_URL | env | Env var to override the server base URL |
| PAPERCLIP_API_KEY | env | API key for non-interactive auth (same as global --api-key); not stored locally by the CLI |
| PAPERCLIP_CONFIG_DIR | env = ~/.paperclip | Env var to override the config directory |

Skill Install

Install the Paperclip skill so your coding agent (Claude Code, Codex, Cursor) knows how to use Paperclip automatically.

# Via the CLI (interactive agent picker)
paperclip install

# Install to a specific project directory
paperclip install --dir /path/to/project

# Via npm/npx (no Python required)
npx gxl-paperclip
npx gxl-paperclip --all      # skip interactive prompt
npx gxl-paperclip --cursor   # Cursor only
The skill file is fetched from the server and stays up to date. It also auto-refreshes on paperclip login.

Update

Update Paperclip to the latest version. The CLI also performs silent background update checks (every 4 hours).

paperclip update
# Current version: ...
# Checking for updates...
# ✓ Already up to date (v...)

If an update is available, paperclip (with no arguments) will show a hint in the dashboard.

Filesystem Layout

Each paper lives at /papers/<id>/ with the following structure:

meta.json        — title, authors, doi, date, abstract, journal (JSON)
content.lines    — full text, line-numbered: L<n>: <text>
sections/        — named section files (Introduction.lines, Methods.lines, …)
figures/         — figure files (PMC papers)
supplements/     — supplementary files (PMC papers)
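
Since content.lines prefixes each line with its number, output saved locally can be split back into a line number and its text. The sketch below assumes the "L<n>: <text>" format listed above holds for every line; parse_content_line is a hypothetical helper.

```python
import re

# "L<n>: <text>" as described in the layout above.
LINE_FORMAT = re.compile(r"^L(\d+): ?(.*)$")


def parse_content_line(raw):
    """Split one content.lines row into (line_number, text); None if it does not match."""
    match = LINE_FORMAT.match(raw.rstrip("\n"))
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(parse_content_line("L12: Binding affinity was measured by SPR."))
```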

Paper IDs use prefixes by source:

  • bio_ — bioRxiv papers
  • med_ — medRxiv papers
  • PMC — PubMed Central papers (e.g. PMC12345678)
  • arx_ — arXiv papers
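
The prefixes make it possible to tell a paper's source from its ID alone. A minimal sketch, using only the four prefixes listed above (source_of is a hypothetical helper):

```python
# Prefix-to-source mapping copied from the list above.
PREFIXES = {
    "bio_": "bioRxiv",
    "med_": "medRxiv",
    "PMC": "PubMed Central",
    "arx_": "arXiv",
}


def source_of(paper_id):
    """Return the source name for a paper ID, or None for an unknown prefix."""
    for prefix, source in PREFIXES.items():
        if paper_id.startswith(prefix):
            return source
    return None


print(source_of("bio_4f78753a6feb"))
print(source_of("PMC12345678"))
```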

Regulatory documents and clinical trials use their own virtual directory paths:

  • /fda/us/ — US FDA documents (NDA, BLA identifiers)
  • /fda/jp/ — Japan PMDA reviews
  • /fda/eu/ — EU EPAR reports
  • /clinicaltrials/us/ — ClinicalTrials.gov (NCT identifiers)
  • /clinicaltrials/cn/ — ChiCTR (Chinese registry)
  • /clinicaltrials/jp/ — UMIN + JRCT (Japanese registries)
  • /clinicaltrials/eu/ — EudraCT + CTIS + ISRCTN (European registries)
  • /clinicaltrials/intl/ — All registries + WHO ICTRP (India, Iran, Australia/NZ, Germany, Netherlands, Korea, Thailand, Brazil, Africa, Peru, Cuba, Sri Lanka, Lebanon)

Scratch space: /.gxl/ is a writable directory for session files, map outputs, and temporary data.

content.lines files can be very long. Always use head -N to paginate rather than cat for large papers.

Python SDK · Overview

The gxl-paperclip package ships a Python SDK alongside the CLI. Installing it gives you both the paperclip command and the gxl_paperclip module.

Use the SDK when you want typed Python APIs, structured return values, and programmatic control flow (scripts, tests, notebooks, backend jobs). Use the CLI when you want interactive exploration, shell composition, or the same invocations you would run in a terminal. Neither is “for humans” or “for agents” exclusively — choose by task.

The SDK is a thin, typed wrapper around the same endpoints the CLI uses. Typed methods take Pythonic keyword arguments and return structured result objects instead of printing raw strings.

Install & authenticate

The SDK authenticates with API keys. Mint one from the dashboard and expose it as an env var:

export PAPERCLIP_API_KEY="pk_..."
from gxl_paperclip import PaperclipClient, APIKeyAuth

client = PaperclipClient.from_env()              # reads PAPERCLIP_API_KEY
# — or pass an explicit strategy —
client = PaperclipClient(auth=APIKeyAuth("pk_..."))
PaperclipClient.from_env() falls back to the credentials saved by paperclip login (~/.paperclip/credentials.json) when no API key env var is set — handy on a workstation where you've already signed in. OAuth is CLI-focused; programmatic users should use an API key.

Python SDK · Quick start

from gxl_paperclip import PaperclipClient

client = PaperclipClient.from_env()

result = client.search("CRISPR lipid nanoparticle", limit=5, source="pmc")
print(result.output)        # same formatted text the CLI prints
print(result.result_id)     # e.g. "s_14bebc10" — pass to map_()

for event in client.map_("What delivery methods were used?", from_results=result.result_id):
    if event.type == "progress":
        print(f"{event.completed}/{event.total} papers done")
    else:
        print(event.output)
Every optional kwarg defaults to None (or False for boolean flags). Leaving it unset means the flag is omitted and the server-side default applies. The tables below list each server-side default.
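
The event loop from the quick start can be wrapped in a helper that reports progress and returns only the final output. This sketch relies on the event attributes used in the quick start (type, completed, total, output) and runs against stand-in objects rather than a live client; collect_map is a hypothetical helper, and the "result" type string on the final event is an assumption.

```python
from types import SimpleNamespace


def collect_map(events, on_progress=None):
    """Consume a map_() event stream and return the final output text."""
    final_output = None
    for event in events:
        if event.type == "progress":
            if on_progress is not None:
                on_progress(event.completed, event.total)
        else:
            # Any non-progress event is treated as the final result.
            final_output = event.output
    return final_output


# Stand-in events so the sketch runs without the SDK or network access.
fake_events = [
    SimpleNamespace(type="progress", completed=1, total=2),
    SimpleNamespace(type="progress", completed=2, total=2),
    SimpleNamespace(type="result", output="per-paper answers"),
]
seen = []
answer = collect_map(fake_events, on_progress=lambda done, total: seen.append((done, total)))
print(seen, answer)
```

With a real client, the first argument would be the iterator returned by client.map_(...).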

client.lookup()

client.lookup(
    field: str,
    value: str,
    *,
    limit: int | None = None,
    timeout: float | None = None,
) -> ExecuteResult

Look up papers by a metadata field — doi, pmc, pmid, author, title, journal, year, keywords, and more. Match is partial and case-insensitive.

| Parameter | Type | Description |
|---|---|---|
| field (required) | str | Metadata field to search. |
| value (required) | str | Value to match. |
| limit | int = 25 | Max results. |
| timeout | float = 120 s | Client timeout in seconds. |

client.sql()

client.sql(
    query: str,
    *,
    source: str | None = None,
    timeout: float | None = None,
) -> ExecuteResult

Run read-only SQL against the documents table. The server enforces a 15 s query timeout, a 200-row cap, and SELECT-only statements.

Parameter | Type | Description
query (required) | str | SELECT statement on the documents table.
source | str = "all" | Pass "pmc" or "biorxiv" to restrict the query to one source.
timeout | float = 120 s | Client timeout in seconds.
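
Since the server accepts only SELECT and caps output at 200 rows, it can be worth rejecting other statements locally before spending a round trip. The following is a minimal sketch: prepare_select is a hypothetical helper, and it assumes the server's SQL dialect accepts a trailing LIMIT clause, which this page does not confirm.

```python
import re

MAX_ROWS = 200  # row cap stated in the text above


def prepare_select(query, limit=MAX_ROWS):
    """Reject non-SELECT statements and append a LIMIT if none is present."""
    if not 1 <= limit <= MAX_ROWS:
        raise ValueError(f"limit must be between 1 and {MAX_ROWS}")
    stripped = query.strip().rstrip(";").strip()
    if not re.match(r"select\b", stripped, flags=re.IGNORECASE):
        raise ValueError("only SELECT statements are accepted")
    # Leave an explicit trailing LIMIT untouched.
    if re.search(r"\blimit\s+\d+$", stripped, flags=re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {limit}"


# Intended use (column names here are assumptions, not a documented schema):
#   client.sql(prepare_select("SELECT title FROM documents WHERE year = 2024"))
```

The check is deliberately shallow. It catches obvious mistakes early, while the server remains the authority on what is allowed.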

client.map_()

client.map_(
    question: str,
    *,
    from_results: str,
    timeout: float | None = None,
) -> Iterator[MapEvent]

Run an AI reader across every paper in a prior search/lookup result set. Yields MapProgressEvent updates during the run (on the OAuth streaming path), followed by a single MapResultEvent.

Parameter | Type | Description
question (required) | str | Prompt asked against each paper.
from_results (required) | str | Result ID from a prior search/lookup, e.g. "s_14bebc10".
timeout | float = 300 s | Client timeout in seconds; map uses the slow-command default.
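
Callers that only want the final answer can drain the iterator and keep the last non-progress event. The sketch below assumes, as the quick start does, that progress events report type == "progress" and that any other event is the result; final_map_event is a hypothetical helper, not an SDK function.

```python
def final_map_event(events, on_progress=None):
    """Drain a map_() iterator and return the final result event.

    on_progress, if given, is called as on_progress(completed, total)
    for each progress event. Progress events may never arrive.
    """
    final = None
    for event in events:
        if event.type == "progress":
            if on_progress is not None:
                on_progress(event.completed, event.total)
        else:
            final = event
    if final is None:
        raise RuntimeError("map_() ended without a result event")
    return final


# Intended use with a real client (not executed here):
#   events = client.map_("What delivery methods were used?", from_results="s_14bebc10")
#   print(final_map_event(events).output)
```

Treating progress as optional keeps the same code working whether or not the run takes the streaming path.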

client.pull()

client.pull(
    target: str,
    dest: str | None = None,
    *,
    timeout: float | None = None,
) -> ExecuteResult

Download a paper or a single file from the virtual filesystem. For binary payloads, the ExecuteResult carries download_url and download_filename.

Parameter | Type | Description
target (required) | str | Paper or file, e.g. "PMC10791696" or "PMC10791696/figures/fig1.jpg".
dest | str = current directory | Destination directory on the server side of the command.
timeout | float = 120 s | Client timeout in seconds.
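
For binary payloads the caller still has to decide where the file lands locally. The sketch below turns a result into a (url, local path) pair. It assumes download_url is empty or None when there is nothing to fetch, which this page does not state; local_download_path and the "download.bin" fallback name are hypothetical.

```python
from pathlib import Path


def local_download_path(result, dest_dir="."):
    """Return (url, local_path) for a binary pull, or None if there is no URL."""
    url = getattr(result, "download_url", None)
    if not url:
        return None
    filename = getattr(result, "download_filename", None) or "download.bin"
    # Keep only the final path component so the name cannot escape dest_dir.
    return url, Path(dest_dir) / Path(filename).name


# Intended use with a real client (not executed here):
#   target = local_download_path(client.pull("PMC10791696/figures/fig1.jpg"), "figures")
```

Fetching the URL is left to the caller, because this page does not say whether the link needs authentication headers.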

client.ask_image()

client.ask_image(
    path: str,
    question: str | None = None,
    *,
    fn: str | None = None,
    timeout: float | None = None,
) -> ExecuteResult

Ask a question about a figure on the virtual filesystem.

Parameter | Type | Description
path (required) | str | Figure path, e.g. "PMC11576387/figures/fx1.jpg".
question | str = "Describe this figure in detail." | Free-form question about the figure.
fn | str = free-form prompt | Canned flows: "describe" or "extract-data".
timeout | float = 300 s | Client timeout in seconds; ask-image uses the slow-command default.
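
Because fn accepts only the canned flow names, a typo is cheaper to catch locally than after a slow round trip. The sketch below validates fn and builds the call arguments; ask_image_call is a hypothetical helper, and whether the server itself rejects unknown fn values is not documented here.

```python
CANNED_FLOWS = ("describe", "extract-data")  # values listed for fn above


def ask_image_call(path, question=None, fn=None):
    """Validate fn and return (args, kwargs) suitable for client.ask_image()."""
    if fn is not None and fn not in CANNED_FLOWS:
        raise ValueError(f"fn must be one of {CANNED_FLOWS}, got {fn!r}")
    # Unset values are left out so the server-side defaults apply.
    args = (path,) if question is None else (path, question)
    kwargs = {} if fn is None else {"fn": fn}
    return args, kwargs


# Intended use with a real client (not executed here):
#   args, kwargs = ask_image_call("PMC11576387/figures/fx1.jpg", fn="extract-data")
#   result = client.ask_image(*args, **kwargs)
```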

client.bash()

client.bash(
    script: str,
    *,
    timeout: float | None = None,
) -> ExecuteResult

Run an arbitrary server-side pipeline, exactly like paperclip bash '…'.

result = client.bash('search "protein folding" | grep -i "deep learning"')

Parameter | Type | Description
script (required) | str | A single shell-style command string.
timeout | float = 120 s | Client timeout in seconds.
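
When the search terms come from user input, quoting them before building the script string avoids broken pipelines. The sketch below uses the standard-library shlex module. It assumes the server-side parser honors POSIX-style single quotes, which this page does not confirm (its own example uses double quotes); search_grep_script is a hypothetical helper.

```python
import shlex


def search_grep_script(query, pattern, ignore_case=True):
    """Build a 'search ... | grep ...' script with both terms shell-quoted."""
    grep = ["grep"]
    if ignore_case:
        grep.append("-i")
    grep.append(pattern)
    return f"search {shlex.quote(query)} | {shlex.join(grep)}"


# Intended use with a real client (not executed here):
#   result = client.bash(search_grep_script("protein folding", "deep learning"))
```

For a single command with no pipe, execute() with an argv list avoids hand-built strings entirely.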

client.results

client.results.list(*, limit: int | None = None) -> list[ResultRow]
client.results.get(result_id: str) -> ResultData

results.list() returns recently saved results for the authenticated user. results.get() fetches the raw saved output for a specific ID (e.g. "s_14bebc10", "m_ec2c9cc9").

Parameter | Type | Description
limit | int = 20 (server default) | Max rows returned by results.list().
result_id (required) | str | Result ID passed to results.get().
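
A common follow-up is to find the newest saved result for one command so it can be passed to results.get() or map_(). The sketch below filters rows client-side. It assumes the command field holds the bare command name (for example "search") and that created_at values are mutually comparable, such as ISO 8601 strings or datetimes; neither is specified on this page, and latest_result_id is a hypothetical helper.

```python
def latest_result_id(rows, command):
    """Return the result_id of the newest row for `command`, or None."""
    matches = [row for row in rows if row.command == command]
    if not matches:
        return None
    return max(matches, key=lambda row: row.created_at).result_id


# Intended use with a real client (not executed here):
#   rid = latest_result_id(client.results.list(limit=50), "search")
#   if rid is not None:
#       data = client.results.get(rid)
```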

client.papers.*

Typed wrappers over the virtual-filesystem commands. Each method returns an ExecuteResult.

client.papers.cat(path)
client.papers.head(path, *, lines=None)      # default 10 lines (server default)
client.papers.tail(path, *, lines=None)      # default 10 lines (server default)
client.papers.ls(path)
client.papers.grep(pattern, path, *, ignore_case=False, extended=False)
client.papers.scan(path, patterns)           # list of patterns OR'd together

When a flag like lines, ignore_case, or extended is left unset, the SDK omits it from the command and the server's shell-style default applies.
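
The idea that unset flags are omitted can be pictured as argv construction. The sketch below is an illustration only: how the SDK actually encodes flags is not documented here, and -i and -E are simply the standard grep spellings for case-insensitive and extended-regex matching. grep_argv is a hypothetical helper.

```python
def grep_argv(pattern, path, ignore_case=False, extended=False):
    """Build a grep argv in which False flags are left out entirely."""
    argv = ["grep"]
    if ignore_case:
        argv.append("-i")
    if extended:
        argv.append("-E")
    argv += [pattern, path]
    return argv


print(grep_argv("CRISPR", "/papers/PMC1/content.lines"))
print(grep_argv("cas(9|12)", "/papers/PMC1/content.lines", ignore_case=True, extended=True))
```

With both flags left at False the command carries no options at all, which is the case where the server's defaults decide the behavior.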

Escape hatches: execute() & stream()

client.execute(command: str, args: Sequence[str] | None = None, *, timeout=None) -> ExecuteResult
client.stream(command: str, args: Sequence[str] | None = None, *, timeout=None) -> Iterator[MapEvent]

For commands without a typed wrapper — sed, awk, sort, cut, tr, jq, or any future server command — pass the argv tokens as a list and the SDK shell-quotes them for you.

result = client.execute(
    "awk",
    ["-F", "\t", "{print $1}", "/papers/PMC1/content.lines"],
)

stream() is the streaming equivalent of execute(). Today only "map" streams; other commands raise ValueError.
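
Code that dispatches arbitrary commands can choose between the two escape hatches up front instead of catching ValueError. The sketch below assumes only the execute() and stream() signatures shown above; run_command and STREAMING_COMMANDS are hypothetical names, and the set would need updating if more commands gain streaming support.

```python
STREAMING_COMMANDS = {"map"}  # per the text, only "map" streams today


def run_command(client, command, args=None, timeout=None):
    """Run a command via stream() or execute() and always return a list.

    Streaming commands return every event; other commands return a
    one-element list holding the single result.
    """
    if command in STREAMING_COMMANDS:
        return list(client.stream(command, args, timeout=timeout))
    return [client.execute(command, args, timeout=timeout)]


# Intended use with a real client (not executed here):
#   outputs = run_command(client, "sort", ["/papers/PMC1/content.lines"])
```

Returning a list in both cases lets downstream code iterate without caring which path was taken.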

Errors & result types

All HTTP and network failures raise a subclass of PaperclipError:

from gxl_paperclip import (
    AuthError, RateLimitError, NotFoundError, ServerError,
    RequestTimeoutError, NetworkError,
)

try:
    client.search("AlphaFold")
except AuthError:
    ...  # invalid API key or expired credentials
except RateLimitError:
    ...  # HTTP 429
except RequestTimeoutError:
    ...  # client-side timeout
except ServerError as exc:
    print(exc.status_code, exc.body)
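
Rate limits and client-side timeouts are usually transient, so a retry with exponential backoff is a natural companion to the handlers above. The sketch below is generic rather than SDK-specific: call_with_retries is a hypothetical helper, and the exception classes to retry on are passed in by the caller, for example (RateLimitError, RequestTimeoutError).

```python
import time


def call_with_retries(call, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call call() and retry on the given exceptions with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Intended use with a real client (not executed here):
#   result = call_with_retries(
#       lambda: client.search("AlphaFold"),
#       retry_on=(RateLimitError, RequestTimeoutError),
#   )
```

AuthError is deliberately left out of the example tuple, since retrying with the same invalid key cannot succeed.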

Result dataclasses you'll encounter:

  • ExecuteResult — output, exit_code, elapsed_ms, result_id, download_url, download_filename, cwd, raw
  • MapProgressEvent — total, completed, failed, elapsed_s
  • MapResultEvent — output, result_id, elapsed_ms, exit_code
  • ResultRow — result_id, command, raw_input, latency_ms, created_at, raw
  • ResultData — result_id, output, command, raw_input, latency_ms, created_at, raw
  • HealthStatus — reachable, output, exit_code, elapsed_ms
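
Since ExecuteResult exposes exit_code alongside output, callers may want a command failure to surface as an exception rather than as text. The sketch below assumes the shell convention that exit code 0 means success, which this page does not state explicitly; require_success is a hypothetical helper.

```python
def require_success(result):
    """Return result.output, or raise if the command reported a nonzero exit code."""
    if result.exit_code != 0:
        raise RuntimeError(
            f"command exited with code {result.exit_code}: {result.output}"
        )
    return result.output


# Intended use with a real client (not executed here):
#   text = require_success(client.papers.head("/papers/PMC1/content.lines"))
```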