Vockal's Security Architecture
Vockal is designed to accept commands only from your paired phone. Nobody can reach your desktop through Vockal without that phone. This follows from how the software is designed, as described below.
Authentication
No Accounts, Just Devices
Vockal has no user accounts. Instead, authentication is tied directly to your physical devices through a QR pairing flow:
1. Your desktop app displays a QR code containing a pairing identifier.
2. Your phone scans the QR code and sends the pairing request to the server.
3. The server issues unique authentication tokens to both devices. Each token is stored in OS-encrypted storage.
- Windows: DPAPI, Credential Manager
- iOS: Keychain, Keychain Services
- Android: Keystore, EncryptedSharedPreferences
These tokens authenticate all WebSocket and REST connections. There are no passwords, no OAuth flows, no session cookies - nothing to phish.
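The pairing flow and token check can be sketched roughly as follows. This is a minimal in-memory illustration, not Vockal's actual implementation: the class `PairingServer`, its method names, and the token format are all hypothetical, and real tokens would live in the OS-backed storage listed above rather than in a Python dict.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class PairingServer:
    """Hypothetical in-memory stand-in for the pairing server."""

    pending: set = field(default_factory=set)   # pairing IDs currently shown as QR codes
    tokens: dict = field(default_factory=dict)  # token -> (pairing_id, role)

    def create_pairing(self) -> str:
        # The desktop requests a pairing identifier to encode in its QR code.
        pairing_id = secrets.token_urlsafe(16)
        self.pending.add(pairing_id)
        return pairing_id

    def complete_pairing(self, pairing_id: str) -> dict:
        # The phone submits the scanned identifier; each identifier works once.
        if pairing_id not in self.pending:
            raise PermissionError("unknown or already used pairing identifier")
        self.pending.remove(pairing_id)
        issued = {}
        for role in ("desktop", "phone"):
            token = secrets.token_urlsafe(32)
            self.tokens[token] = (pairing_id, role)
            issued[role] = token
        return issued

    def authenticate(self, token: str):
        # Every WebSocket or REST request would present its device token.
        # Returns (pairing_id, role) for a known token, otherwise None.
        return self.tokens.get(token)


server = PairingServer()
qr_payload = server.create_pairing()
issued = server.complete_pairing(qr_payload)
```

Because the pairing identifier is removed from `pending` as soon as it is used, a photographed or replayed QR code cannot be used to pair a second phone in this sketch.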
Recovery is handled via email and a 6-digit verification code. When recovery completes, new tokens are issued and old ones are invalidated.
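One way the recovery step could look, as a sketch under stated assumptions: the function names, the in-memory `active_tokens` set, and the token format are hypothetical, and email delivery, code expiry, and rate limiting are left out.

```python
import hmac
import secrets


def new_recovery_code() -> str:
    # Six decimal digits, zero-padded, drawn from a cryptographic RNG.
    return f"{secrets.randbelow(10**6):06d}"


def complete_recovery(expected: str, submitted: str, active_tokens: set) -> str:
    # Constant-time comparison avoids leaking how much of the code matched.
    if not hmac.compare_digest(expected.encode(), submitted.encode()):
        raise PermissionError("verification code mismatch")
    active_tokens.clear()              # invalidate every old token
    fresh = secrets.token_urlsafe(32)  # issue a replacement token
    active_tokens.add(fresh)
    return fresh
```

A real deployment would also need to cap the number of attempts, since a 6-digit code has only one million possible values.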
Permissions
Strict Action Allowlist
Vockal operates on an explicit allowlist. Only a defined set of action types can be executed. Everything else is rejected.
Permitted Actions
- Click at coordinates
- Type text
- Scroll
- Switch application
- Navigate to URL
- Take screenshot
Permanently Banned
- Shell commands
- File system access
- Registry edits
- Process management
- Network configuration
- System modifications
The desktop binary contains no code to execute anything outside this list. This is an architectural constraint, not a policy that can be overridden. Even if the server, LLM, or network is fully compromised, the desktop cannot execute a banned action because the code path does not exist.
Validation happens at two independent layers: the server validates action types before dispatching, and the desktop validates independently before executing. Both must agree for an action to proceed.
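The two-layer allowlist check might be sketched like this. The enum values, handler bodies, and function names are assumptions made for illustration; the page does not publish Vockal's wire format. The point shown is that a closed enum plus a handler table leaves no branch that could run a banned action.

```python
from enum import Enum


class ActionType(Enum):
    # The six permitted action types; the string values are hypothetical.
    CLICK = "click"
    TYPE_TEXT = "type_text"
    SCROLL = "scroll"
    SWITCH_APP = "switch_app"
    NAVIGATE_URL = "navigate_url"
    SCREENSHOT = "screenshot"


def parse_action(raw: dict) -> ActionType:
    # Any value that is not one of the six enum members raises ValueError.
    return ActionType(raw.get("type"))


def server_validate(raw: dict) -> bool:
    # Layer 1: the server checks the action type before dispatching.
    try:
        parse_action(raw)
        return True
    except ValueError:
        return False


# Layer 2: the desktop only has handlers for permitted types.
# The handlers return descriptions here instead of driving real input.
HANDLERS = {
    ActionType.CLICK: lambda a: f"click at ({a['x']}, {a['y']})",
    ActionType.TYPE_TEXT: lambda a: f"type {a['text']!r}",
    ActionType.SCROLL: lambda a: f"scroll by {a['delta']}",
    ActionType.SWITCH_APP: lambda a: f"switch to {a['app']}",
    ActionType.NAVIGATE_URL: lambda a: f"navigate to {a['url']}",
    ActionType.SCREENSHOT: lambda a: "take screenshot",
}


def desktop_execute(raw: dict) -> str:
    # The desktop re-validates independently, even if the server approved.
    action = parse_action(raw)
    return HANDLERS[action](raw)
```

In this sketch a message such as `{"type": "shell", "cmd": "..."}` fails in `parse_action` on both sides, so a compromised server that skips its own check still cannot make the desktop act on it.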
Data Flow
Where Your Data Goes
A transparent breakdown of what data leaves your devices and where it ends up.
- Voice audio: Sent to Google Cloud Speech-to-Text for transcription. Not stored on Vockal servers. Subject to Google Cloud's data processing terms.
- Screenshots: Captured on demand when the AI needs visual context. Stored temporarily (10-minute window) for multi-step reasoning. Sent to the LLM provider (OpenRouter) for analysis.
- Voice commands: Transcribed and sent to the LLM provider to determine what actions to take. Transcriptions and action history are stored in our database for session continuity.
- Desktop context: Active app name and monitor layout are shared with the server to help the AI target the correct window and coordinates.
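The 10-minute screenshot window could be enforced with a small expiring store, sketched below. The class `ScreenshotStore` and its injectable `clock` parameter are hypothetical and exist only to illustrate the idea; how Vockal actually stores and purges screenshots is not described on this page.

```python
import time


class ScreenshotStore:
    """Hypothetical store that forgets each screenshot after 10 minutes."""

    TTL_SECONDS = 10 * 60

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._items = {}     # key -> (expires_at, data)

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = (self._clock() + self.TTL_SECONDS, data)

    def get(self, key: str):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            # Expired: drop the bytes instead of returning them.
            del self._items[key]
            return None
        return data
```

This version only purges on read. A production store would likely also run a periodic sweep so that expired screenshots do not linger when nobody asks for them.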
Transparency
Honest Threat Model
Server Compromise
Every action dispatched to your desktop is signed with a per-session cryptographic key that only your desktop holds. Even with full server access, an attacker cannot forge valid action signatures. As a second line of defense, the desktop accepts only actions from the allowlist: the most an attacker could do is click, type, and scroll. They could not execute shell commands, access your files, or modify your system.
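To illustrate why a party without the session key cannot produce an acceptable action, here is a sketch of signature checking. It rests on assumptions the page does not confirm: it uses HMAC-SHA256, it assumes the signer and the desktop share the session key while the server never sees it, and it serializes actions as canonical JSON. The actual algorithm, and which party signs, are not specified here.

```python
import hashlib
import hmac
import json
import secrets


def canonical(action: dict) -> bytes:
    # Stable serialization so signer and verifier hash identical bytes.
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode()


def sign(session_key: bytes, action: dict) -> str:
    return hmac.new(session_key, canonical(action), hashlib.sha256).hexdigest()


def verify(session_key: bytes, action: dict, signature: str) -> bool:
    # Constant-time comparison of the expected and presented signatures.
    return hmac.compare_digest(sign(session_key, action), signature)


session_key = secrets.token_bytes(32)  # hypothetical per-session key
action = {"type": "click", "x": 120, "y": 340}
signature = sign(session_key, action)
```

In this sketch, changing any field of `action` after signing, or signing with a different key, makes `verify` return `False`, so a relay that merely forwards messages cannot alter or invent them.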
Device Compromise
If an attacker gains root access to your phone or desktop, they can extract the authentication token. But at that point they already have full control of your device - Vockal adds no new attack surface.
Constraints
What Our Server Can and Cannot Do
The server can:
- Relay actions between devices
- Read transcriptions and action parameters
- Send context to LLM providers
- Store session history
- Manage device pairing state
The server cannot:
- Execute anything outside the action allowlist
- Access your file system or registry
- Run shell commands on your desktop
- Control your desktop without an authenticated device session
Actions are signed with a per-session cryptographic key stored only on your desktop. Even with full server access, forging a valid action signature is not possible. Combined with the strict action allowlist, the blast radius of any compromise is minimal.
Privacy
Minimal Data, Honest Practices
- Audio: Streamed to a third-party speech-to-text service. Never stored on our servers. Raw audio is discarded after transcription.
- Screenshots: Stored temporarily for multi-step AI reasoning. Expire after 10 minutes. Sent to the LLM provider for visual analysis.
- Accounts: None. No usernames, passwords, or profiles. Device tokens are the only credentials.
- Your data: Never sold, rented, or traded. Not to advertisers, not to data brokers.
- Third parties: Audio is processed by Google Cloud. AI reasoning is handled by OpenRouter. Both operate under their own privacy policies.