Vockal's Security Architecture

Vockal is designed to accept commands only from your paired phone. Without that phone and its authentication token, no one can control your desktop through Vockal. This follows from how the software is built, as the sections below explain.

Authentication

No Accounts, Just Devices

Vockal has no user accounts. Instead, authentication is tied directly to your physical devices through a QR pairing flow:

Your desktop app displays a QR code containing a pairing identifier.

Your phone scans the QR code and sends the pairing request to the server.

The server issues unique authentication tokens to both devices. Each token is stored in OS-encrypted storage.

Windows: DPAPI, Credential Manager
iOS: Keychain, Keychain Services
Android: Keystore, EncryptedSharedPreferences

These tokens authenticate all WebSocket and REST connections. There are no passwords, no OAuth flows, no session cookies - nothing to phish.
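As a rough illustration of the pairing flow above, the following Python sketch shows a server that hands out a pairing identifier, issues one random token per device, and later checks a presented token. The class name, method names, device-id format, and the choice to store only token hashes are all assumptions made for this example; they do not describe Vockal's actual server.

```python
import hashlib
import hmac
import secrets


class PairingServer:
    """Hypothetical in-memory pairing server (illustration only)."""

    def __init__(self):
        self.pending = set()       # pairing identifiers currently shown as QR codes
        self.token_hashes = {}     # device id -> SHA-256 hex digest of its token

    def create_pairing_id(self) -> str:
        # The desktop would encode this identifier in the QR code.
        pairing_id = secrets.token_urlsafe(16)
        self.pending.add(pairing_id)
        return pairing_id

    def complete_pairing(self, pairing_id: str) -> dict:
        # Called when the phone submits the scanned identifier.
        if pairing_id not in self.pending:
            raise ValueError("unknown or already used pairing identifier")
        self.pending.remove(pairing_id)
        tokens = {}
        for role in ("desktop", "phone"):
            device_id = f"{role}:{pairing_id}"   # assumed id format
            token = secrets.token_urlsafe(32)
            # Assumption: the server keeps only a hash, not the token itself.
            self.token_hashes[device_id] = hashlib.sha256(token.encode()).hexdigest()
            tokens[device_id] = token
        return tokens

    def authenticate(self, device_id: str, token: str) -> bool:
        expected = self.token_hashes.get(device_id)
        if expected is None:
            return False
        presented = hashlib.sha256(token.encode()).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected, presented)
```

In this sketch a pairing identifier is single-use: once the phone has redeemed it, a second attempt with the same identifier is rejected.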

Recovery is handled via email and a 6-digit verification code. When recovery completes, new tokens are issued and old ones are invalidated.
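The recovery step can be sketched in the same spirit. The text does not say how long a verification code stays valid or how many guesses are allowed, so the ten-minute lifetime and five-attempt limit below are assumptions, as are the class and method names. The sketch only shows the shape of the idea: a random 6-digit code, a check against it, and a fresh token that replaces the old one on success.

```python
import hmac
import secrets
import time
from typing import Optional

CODE_TTL_SECONDS = 600   # assumed lifetime; not stated in the text
MAX_ATTEMPTS = 5         # assumed guess limit; not stated in the text


class Recovery:
    """Hypothetical recovery state for one device (illustration only)."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._code = None
        self._expires_at = 0.0
        self._attempts = 0
        self.token = secrets.token_urlsafe(32)   # the device's current token

    def start(self) -> str:
        # A uniformly random code from 000000 to 999999; it would be emailed.
        self._code = f"{secrets.randbelow(10**6):06d}"
        self._expires_at = self._now() + CODE_TTL_SECONDS
        self._attempts = 0
        return self._code

    def complete(self, submitted: str) -> Optional[str]:
        if self._code is None or self._now() > self._expires_at:
            return None
        self._attempts += 1
        if self._attempts > MAX_ATTEMPTS:
            self._code = None   # too many guesses: the code is burned
            return None
        if not hmac.compare_digest(self._code.encode(), submitted.encode()):
            return None
        self._code = None       # codes are single-use
        self.token = secrets.token_urlsafe(32)   # old token is replaced
        return self.token
```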

Permissions

Strict Action Allowlist

Vockal operates on an explicit allowlist. Only a defined set of action types can be executed. Everything else is rejected.

Permitted Actions

Click at coordinates
Type text
Scroll
Switch application
Navigate to URL
Take screenshot

Permanently Banned

Shell commands
File system access
Registry edits
Process management
Network configuration
System modifications

The desktop binary does not contain code to execute anything outside this list. This is not a policy that can be overridden - it is an architectural constraint. Even if the server, LLM, or network is fully compromised, the desktop physically cannot execute a banned action because the code path does not exist.

Validation happens at two independent layers: the server validates action types before dispatching, and the desktop validates independently before executing. Both must agree for an action to proceed.
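A minimal Python sketch of the two-layer check might look like this. The wire names ("click", "type_text", and so on), the message shape, and the function names are hypothetical; only the six permitted action types come from the list above. The key property is that the desktop's handler table has no entry for anything outside the enum, so there is nothing to dispatch a banned action to.

```python
from enum import Enum


class ActionType(Enum):
    # The six permitted actions; the string values are assumed wire names.
    CLICK = "click"
    TYPE_TEXT = "type_text"
    SCROLL = "scroll"
    SWITCH_APP = "switch_app"
    NAVIGATE = "navigate"
    SCREENSHOT = "screenshot"


def parse_action(message: dict) -> ActionType:
    """Return the action type, or raise ValueError if it is not allowlisted."""
    try:
        return ActionType(message.get("type"))
    except ValueError:
        raise ValueError(f"rejected action type: {message.get('type')!r}") from None


def server_dispatch(message: dict, send_to_desktop):
    # Layer 1: the server refuses to forward anything off the allowlist.
    parse_action(message)
    return send_to_desktop(message)


def desktop_execute(message: dict, handlers: dict):
    # Layer 2: the desktop re-validates and does not trust the server's check.
    action = parse_action(message)
    return handlers[action](message)
```

Because `desktop_execute` validates on its own, a message that bypasses `server_dispatch` entirely is still rejected before any handler runs.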

Data Flow

Where Your Data Goes

A transparent breakdown of what data leaves your devices and where it ends up.

Audio

Sent to Google Cloud Speech-to-Text for transcription. Not stored on Vockal servers. Subject to Google Cloud's data processing terms.

Screenshots

Captured on-demand when the AI needs visual context. Stored temporarily (10-minute window) for multi-step reasoning. Sent to the LLM provider (OpenRouter) for analysis.
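One simple way to enforce a 10-minute window is to store an expiry time next to each screenshot and purge on every read. The sketch below assumes an in-memory store with hypothetical names; the actual storage backend is not described in the text.

```python
import time

SCREENSHOT_TTL_SECONDS = 10 * 60   # the 10-minute window from the text


class ScreenshotStore:
    """Hypothetical in-memory store that forgets screenshots after the TTL."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._items = {}   # screenshot id -> (expires_at, image bytes)

    def put(self, shot_id: str, data: bytes) -> None:
        self._items[shot_id] = (self._now() + SCREENSHOT_TTL_SECONDS, data)

    def purge(self) -> int:
        # Delete every entry whose expiry time has passed; return the count.
        now = self._now()
        expired = [k for k, (exp, _) in self._items.items() if exp <= now]
        for k in expired:
            del self._items[k]
        return len(expired)

    def get(self, shot_id: str):
        # Purging before each read means an expired image is never returned.
        self.purge()
        entry = self._items.get(shot_id)
        return entry[1] if entry else None
```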

Commands

Your voice commands are transcribed and sent to the LLM provider to determine what actions to take. Transcriptions and action history are stored in our database for session continuity.

Context

Active app name and monitor layout are shared with the server to help the AI target the correct window and coordinates.

Transparency

Honest Threat Model

Server Compromise

Every action dispatched to your desktop is signed with a per-session cryptographic key that only your desktop holds. Even with full server access, an attacker cannot forge valid action signatures. Beyond that, the desktop accepts only actions from the allowlist: the most an attacker could do is click, type, and scroll. They could not execute shell commands, access your files, or modify your system.
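The text does not name the signature algorithm or say which party produces the signature. As one possible reading, the sketch below assumes HMAC-SHA256 over a canonical JSON encoding, with a session key that the signing device shares with the desktop and that the server never sees. The function names and message format are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(action: dict) -> bytes:
    # Sorted keys and fixed separators give the same bytes for the same action.
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode()


def sign_action(session_key: bytes, action: dict) -> str:
    # Assumed scheme: HMAC-SHA256 keyed with the per-session key.
    return hmac.new(session_key, canonical(action), hashlib.sha256).hexdigest()


def verify_action(session_key: bytes, action: dict, signature: str) -> bool:
    # The desktop recomputes the tag and compares in constant time.
    expected = sign_action(session_key, action)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Under these assumptions, a relay that can read and modify messages but lacks the session key cannot change an action's parameters without the check failing. A real design would also need replay protection, such as a counter or nonce inside the signed payload, which this sketch leaves out.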

Device Compromise

If an attacker gains root access to your phone or desktop, they can extract the authentication token. But at that point they already have full control of your device - Vockal adds no new attack surface.

Constraints

What Our Server Can and Cannot Do

The server can:

Relay actions between devices
Read transcriptions and action parameters
Send context to LLM providers
Store session history
Manage device pairing state

The server cannot:

Execute anything outside the action allowlist
Access your file system or registry
Run shell commands on your desktop
Control your desktop without an authenticated device session

Actions are signed with a per-session cryptographic key stored only on your desktop. Even with full server access, forging a valid action signature is not possible. Combined with the strict action allowlist, the blast radius of any compromise is minimal.

Privacy

Minimal Data, Honest Practices

Audio

Streamed to third-party STT. Never stored on our servers. Raw audio discarded after transcription.

Screenshots

Stored temporarily for multi-step AI reasoning. Expire after 10 minutes. Sent to LLM provider for visual analysis.

Accounts

None. No usernames, passwords, or profiles. Device tokens are the only credentials.

Data Sales

Never sold, rented, or traded. Not to advertisers, not to data brokers.

Third Parties

Audio processed by Google Cloud. AI reasoning by OpenRouter. Both operate under their own privacy policies.