Heads-up: VoxityAI™ is powered by the advanced real-time voice transformation engine originally developed in the open-source W-Okada project - one of the most powerful free tools in this space today.

Despite its impressive capabilities, W-Okada can be complex to install and configure, especially for users without technical experience.

That's where VoxityAI™ steps in: a streamlined, plug-and-play solution designed to remove the friction and deliver results instantly.

✅ Built around the proven W-Okada core, but wrapped in a minimal and intuitive interface
✅ Ships with a curated selection of voice presets ready to use
✅ No complicated setup or technical steps required
✅ Tailored for English-speaking environments and workflows

You can spend hours configuring open-source tools, or launch VoxityAI™ in minutes and jump straight into voice changing.

👉 Get started now at voxityai.com

VoxityAI™ Documentation

Professional real-time voice transformation made simple

Welcome to the comprehensive documentation for VoxityAI™, the premier real-time AI voice changer that transforms your voice instantly for gaming, streaming, content creation, and professional applications. Built on the powerful W-Okada engine, the open-source voice changer developed by w-okada, VoxityAI™ delivers studio-quality voice transformation with unprecedented ease of use.

VoxityAI™ represents a breakthrough in real-time voice conversion technology, utilizing state-of-the-art artificial intelligence models including RVC (Retrieval-based Voice Conversion), Beatrice v2, MMVC, so-vits-svc, and DDSP-SVC. These cutting-edge algorithms enable natural-sounding voice transformations while preserving emotional content, intonation, and speech patterns.

  • 🚀 Instant Setup: Launch in minutes with a streamlined installation process; no complex configuration required
  • 🎯 Real-Time Processing: Voice transformation with latency under 150ms on capable GPU hardware, using advanced AI models
  • 🎭 Multiple Voice Models: Support for RVC, Beatrice v2, MMVC, so-vits-svc, and DDSP-SVC voice models
  • High Performance: GPU acceleration for NVIDIA, AMD, Intel, and Apple Silicon hardware, with CPU fallback support
  • 🔒 Privacy First: Processing runs locally on your own hardware, so your voice data never leaves your device unless you choose a remote setup such as Google Colab
  • 🌐 Cross-Platform: Works on Windows, macOS, Linux, and Google Colab environments

Core Technology

At its foundation, VoxityAI™ leverages the W-Okada voice changer engine, a revolutionary open-source project that has redefined real-time voice conversion. Unlike traditional pitch shifters or simple voice effects, our system uses sophisticated neural networks to analyze and transform vocal characteristics while maintaining the naturalness and emotional expression of human speech.

The technology supports a server-client architecture, allowing users to distribute computational load across multiple devices. This means you can run the voice processing server on a powerful desktop while using the converted voice on a lighter device, which suits gaming setups where system resources are precious.

Use Cases and Applications

VoxityAI™ excels across diverse applications where real-time voice transformation enhances user experience and creative expression:

Gaming and Entertainment

Transform your voice to match in-game characters in real-time, creating immersive role-playing experiences. Popular with VTubers and streamers who need consistent character voices across long streaming sessions. The low-latency processing ensures natural conversations without disruptive delays.

Content Creation

Professional voiceover artists use VoxityAI™ to expand their vocal range, creating multiple character voices for animations, podcasts, and audiobooks. The high-quality output rivals traditional studio processing while offering real-time flexibility.

Privacy and Communication

Maintain anonymity during online meetings, gaming sessions, or content creation. Perfect for streamers who want to protect their identity while engaging with audiences, or professionals conducting sensitive discussions.

Language Learning and Accessibility

Simulate different accents for language learning practice, or assist individuals with speech difficulties in achieving their desired vocal expression. The technology preserves speech patterns while transforming vocal characteristics.

Live Streaming and Broadcasting

Seamlessly integrate with OBS Studio, Discord, Twitch, and other platforms. Create engaging content with character voices that respond instantly to your speech, keeping audiences entertained without breaking immersion.

Technical Innovation

VoxityAI™ incorporates several breakthrough technologies that set it apart from conventional voice changers:

  • Neural Voice Conversion: Advanced AI models that understand vocal characteristics at a fundamental level, enabling natural transformations that preserve emotional content and speaking style.
  • Real-Time Processing: Optimized inference pipelines that deliver voice conversion with minimal latency, suitable for live conversations and interactive applications.
  • Cross-Platform Compatibility: Unified codebase that works seamlessly across Windows, macOS, and Linux, with automatic hardware optimization for different GPU vendors.
  • Modular Architecture: Support for multiple AI model types allows users to choose the best algorithm for their specific needs, from lightweight real-time models to high-quality offline processing.
  • Advanced Audio Processing: Built-in noise suppression, echo cancellation, and audio enhancement features ensure clean, professional output in any environment.

System Requirements

Ensure optimal performance with the right hardware configuration

VoxityAI™ is designed to work across a wide range of hardware configurations, from basic setups suitable for casual use to high-performance systems optimized for professional content creation. Understanding your system's capabilities helps optimize performance and achieve the best possible voice conversion quality.

Minimum Requirements

  • Operating System: Windows 10 (64-bit), macOS 10.15+, or Ubuntu 18.04+
  • Processor: Intel i5-4590 / AMD FX 8350 equivalent (4+ cores recommended)
  • Memory: 8 GB RAM system memory
  • Storage: 4 GB available space (additional space for voice models)
  • Audio Hardware: Compatible audio input/output device or USB headset
  • Network: Internet connection for initial setup and model access
  • Browser: Chrome 90+, Firefox 88+, or Safari 14+ for web interface

Expected Performance: 300-800ms latency, suitable for casual use and testing
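To check a machine against these minimums before installing, a short Python script can probe free disk space, CPU core count and (on Linux and macOS) installed RAM. This is a hypothetical helper, not part of VoxityAI™: the thresholds are copied from the list above, the 10% RAM slack is an assumption (operating systems usually report slightly less than the installed amount), and it does not check the GPU, OS version or browser.

```python
import os
import shutil
from typing import Optional

# Thresholds copied from the "Minimum Requirements" list above.
MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 4
RECOMMENDED_CORES = 4
RAM_SLACK = 0.9  # assumption: tolerate the OS reporting slightly less than installed RAM


def total_ram_gb() -> Optional[float]:
    """Best-effort physical RAM probe; os.sysconf is Unix-only, so Windows yields None."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def requirement_warnings(ram_gb: Optional[float], free_disk_gb: float,
                         cores: Optional[int]) -> list[str]:
    """Return human-readable warnings; an empty list means nothing obvious is missing."""
    warnings = []
    if ram_gb is not None and ram_gb < MIN_RAM_GB * RAM_SLACK:
        warnings.append(f"{ram_gb:.1f} GB RAM is below the {MIN_RAM_GB} GB minimum")
    if free_disk_gb < MIN_FREE_DISK_GB:
        warnings.append(f"only {free_disk_gb:.1f} GB free; {MIN_FREE_DISK_GB} GB plus model space needed")
    if cores is not None and cores < RECOMMENDED_CORES:
        warnings.append(f"{cores} CPU cores found; {RECOMMENDED_CORES}+ recommended")
    return warnings


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1024**3  # free space on the current drive
    found = requirement_warnings(total_ram_gb(), free_gb, os.cpu_count())
    print("\n".join(found) if found else "No minimum-requirement problems detected")
```

Run it from the drive where you plan to install, since the disk check looks at the current directory.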

GPU Acceleration Support

VoxityAI™ leverages GPU acceleration to minimize voice conversion latency and maximize quality. Different GPU architectures offer varying levels of performance optimization:

NVIDIA (CUDA)

Best overall performance with RTX series cards. Requires NVIDIA drivers that support CUDA 11.7 or later and a compatible PyTorch installation. Supports TensorRT optimization for maximum performance.

Recommended Models:
  • RTX 4060/4070/4080/4090 (Latest generation - optimal performance)
  • RTX 3060/3070/3080/3090 (Excellent performance; prefer 12GB+ VRAM variants such as the RTX 3060 12GB or RTX 3090)
  • RTX 2060/2070/2080 (Good performance for most models)
  • GTX 1660/1070/1080 (Basic support, limited to smaller models)
Performance Notes:
  • RTX 4090: 30-60ms latency with complex models
  • RTX 3070: 80-150ms latency typical
  • RTX 2060: 150-250ms latency acceptable

AMD (DirectML/ROCm)

Good performance with RX 6000+ series. DirectML support provides hardware acceleration on Windows, while ROCm enables Linux acceleration.

Recommended Models:
  • RX 7800 XT/7900 XTX (Latest generation with excellent DirectML)
  • RX 6700 XT/6800 XT/6900 XT (Proven performance with 12GB+ VRAM)
  • RX 580/590/5700 XT (Basic support, adequate for lightweight models)
Configuration Tips:
  • Use ONNX model format for best AMD compatibility
  • Enable Smart Access Memory for improved performance
  • DirectML works best on Windows 11 with latest drivers

Intel (DirectML/XPU)

Emerging support with Intel Arc series and integrated graphics. Excellent power efficiency on laptops with Intel processors.

Supported Hardware:
  • Intel Arc A750/A770 (Good acceleration for discrete cards)
  • Intel Arc A380 (Basic acceleration, entry-level)
  • Intel Iris Xe (Limited acceleration, CPU fallback recommended)
  • 12th/13th gen integrated graphics (Basic support)

Apple Silicon (Metal)

Excellent efficiency on M1/M2/M3 Macs with optimized Metal Performance Shaders integration. Unified memory architecture provides unique advantages.

Performance by Model:
  • M3 Pro/Max/Ultra (Exceptional performance, 40-100ms latency)
  • M2 Pro/Max/Ultra (Excellent performance, 60-150ms latency)
  • M1 Pro/Max/Ultra (Very good performance, 80-200ms latency)
  • M1/M2 Base (Good performance for lightweight models)
Unique Benefits:
  • No separate VRAM limitation due to unified memory
  • Excellent power efficiency for battery-powered use
  • Automatic thermal management prevents overheating
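The CUDA/ROCm, Metal, Intel XPU and DirectML paths above roughly correspond to the device checks a PyTorch-based engine can perform at startup. The sketch below is a hypothetical helper, not VoxityAI™'s actual selection logic: it assumes PyTorch may be installed, treats a successful import of the separate torch-directml package as a DirectML signal, and falls back to CPU otherwise. ROCm builds of PyTorch report AMD GPUs through the `torch.cuda` API, so they land in the CUDA branch.

```python
def pick_backend(cuda: bool, mps: bool, xpu: bool, directml: bool) -> str:
    """Pick an accelerator in a fixed preference order, falling back to CPU."""
    if cuda:
        return "cuda"      # NVIDIA CUDA (and AMD ROCm builds of PyTorch)
    if mps:
        return "mps"       # Apple Silicon via Metal Performance Shaders
    if xpu:
        return "xpu"       # Intel Arc / integrated graphics (PyTorch XPU backend)
    if directml:
        return "directml"  # AMD / Intel on Windows via torch-directml
    return "cpu"


def probe() -> str:
    """Best-effort hardware probe; a missing package simply means 'not available'."""
    cuda = mps = xpu = directml = False
    try:
        import torch
        cuda = torch.cuda.is_available()
        mps_backend = getattr(torch.backends, "mps", None)
        mps = mps_backend is not None and mps_backend.is_available()
        xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    except ImportError:
        pass
    try:
        import torch_directml  # noqa: F401  # separate, Windows-only package
        directml = True
    except ImportError:
        pass
    return pick_backend(cuda, mps, xpu, directml)


if __name__ == "__main__":
    print("Selected backend:", probe())
```

Keeping the decision in a pure function separate from the probing makes the preference order easy to test without any GPU present.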

Performance Optimization Guidelines

Voice conversion latency and quality depend on several interconnected factors. Understanding these relationships helps optimize your setup for specific use cases:

Hardware Impact on Performance

  • High-end GPU (RTX 4080+): 50-100ms typical latency, suitable for professional streaming and content creation
  • Mid-range GPU (RTX 3060/RX 6700 XT): 100-200ms typical latency, excellent for gaming and casual streaming
  • Entry-level GPU/CPU: 200-500ms typical latency, adequate for offline content creation and testing
  • Optimal buffer settings: Chunk size 256-512 samples, Extra buffer 8192-16384 samples

Model Complexity vs. Performance

  • Beatrice v2 models: Lightweight, 50-150ms latency, ideal for real-time applications
  • RVC models: Balanced quality/performance, 100-300ms latency, most versatile
  • MMVC models: Highest quality, 200-500ms latency, best for content creation
  • Custom trained models: Performance varies based on architecture and training parameters

Audio Configuration Impact

  • Sample Rate: 48kHz recommended for best quality, 44.1kHz for performance
  • Buffer Size: Smaller buffers reduce latency but require more processing power
  • Bit Depth: 32-bit float recommended for professional use, 16-bit adequate for casual use
  • Processing Mode: Server mode generally provides better performance than client mode

Storage and Network Requirements

While VoxityAI™ has modest storage requirements for the base installation, voice models and user data can accumulate significantly over time:

  • Base Installation: 2-4 GB depending on platform and included components
  • Voice Models: 100MB-2GB per model, depending on complexity and quality
  • Model Cache: 1-5 GB for temporary files and conversion cache
  • User Data: Audio recordings, configurations, and custom settings
  • Network Usage: Initial model access, updates, and cloud synchronization (optional)

SSD storage is strongly recommended for the main installation and frequently used models, as loading times significantly impact user experience when switching between voice models during live sessions.

Quick Start Guide

Get VoxityAI™ running in under 5 minutes

This quick start guide will have you using VoxityAI™ for real-time voice transformation in just a few minutes. Whether you're a streamer, gamer, or content creator, these steps will get you up and running with professional-quality voice conversion.

1. Launch VoxityAI™

Start the application from your desktop shortcut or applications menu. VoxityAI™ will automatically initialize the web-based interface in your default browser, typically opening at localhost:18888.

Default URL: http://localhost:18888

First Launch Notes: The initial startup may take 2-3 minutes as VoxityAI™ initializes core components, loads voice models, and optimizes settings for your hardware. This delay only occurs during the first launch.

Browser Compatibility: Chrome and Edge provide the best performance, though Firefox and Safari are also supported with slightly reduced features.

2. Configure Audio Devices

Proper audio configuration is crucial for optimal voice conversion. Select your input and output devices in the audio settings panel.

Input Device: Choose your microphone or audio interface. USB headsets generally provide the most consistent results.
Direct Output: Select your headphones/speakers to monitor the transformed voice in real-time.
Virtual Cable Output: Route to virtual audio device for integration with streaming software, Discord, or games.
Pro Setup: For streaming or Discord, install VB-Cable (Windows) or BlackHole (macOS) to route processed audio to broadcasting applications while maintaining separate monitoring.

3. Select Voice Model

VoxityAI™ includes several pre-configured voice models optimized for different use cases. You can also import custom models or access community-created voices.

RVC Models: Excellent for character voices and gender transformation with natural-sounding results
Beatrice v2: Optimized for real-time performance with minimal latency, perfect for gaming
MMVC: Highest quality conversion for professional content creation and voiceover work

Model Selection Tips: Start with Beatrice v2 models for lowest latency, then experiment with RVC models for higher quality. Each model type has different performance characteristics and optimal use cases.

4. Fine-Tune Settings

Adjust voice conversion parameters to achieve your desired sound. Most users can achieve excellent results with minimal adjustments.

Pitch: Transpose the voice in semitones - positive values raise pitch (male→female), negative values lower it
Index Ratio: Controls model feature influence (0.0-1.0) - higher values follow the model more closely
Chunk Size: Balance latency vs. quality - smaller values reduce delay but require more processing power
F0 Detector: Pitch detection algorithm - RMVPE offers best quality, DIO provides fastest processing
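Assuming the pitch value is a transpose in semitones (as the RVC configuration tips later in this guide describe), each step multiplies the detected fundamental frequency by the equal-temperament ratio 2^(n/12): +12 semitones doubles f0 and -12 halves it. The helper names below are hypothetical; the sketch only shows the arithmetic.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for an equal-tempered pitch shift: 2 ** (n / 12)."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(f0_hz: float, semitones: float) -> float:
    """Fundamental frequency after transposing by `semitones`."""
    return f0_hz * semitones_to_ratio(semitones)


if __name__ == "__main__":
    for n in (-12, -4, 0, 4, 12):
        print(f"{n:+d} semitones: 120 Hz -> {shifted_f0(120.0, n):.1f} Hz")
```

For example, a 120 Hz speaking voice moves to about 151 Hz at +4 semitones and to 240 Hz at +12.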

Ready to Transform!

VoxityAI™ is now processing your voice in real-time. The converted audio can be used with Discord, OBS Studio, Zoom, games, or any application that accepts audio input. Experiment with different models and settings to discover your perfect voice!

Testing Your Setup

Before using VoxityAI™ in your target application, perform these quick tests to ensure everything is working correctly:

Audio Quality Test

  1. Speak into your microphone and listen to the converted voice through your headphones
  2. Check for audio artifacts, distortion, or unnatural sounds
  3. Adjust the input gain if the voice sounds too quiet or distorted
  4. Test different voice models to find the best match for your voice type

Latency Test

  1. Count from 1 to 10 while monitoring the converted voice
  2. If delay is noticeable, reduce chunk size or switch to a lighter model
  3. For real-time applications, aim for sub-200ms total latency
  4. Consider your internet connection if using remote processing
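To put a number on the delay instead of judging it by ear, you can record the microphone signal and the converted output simultaneously (for example in a recording app fed by the virtual cable) and find the lag at which they line up best. The sketch below does a brute-force cross-correlation in plain Python. The function names are hypothetical, it works best on a sharp transient such as a clap that survives conversion, and for long recordings a vectorised routine such as `scipy.signal.correlate` is far faster.

```python
def estimate_delay(reference: list[float], converted: list[float], max_lag: int) -> int:
    """Return the lag in samples (0..max_lag) where `converted` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(converted) - lag)
        if overlap <= 0:
            break  # the converted signal is too short for larger lags
        score = sum(reference[i] * converted[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_ms(lag_samples: int, sample_rate: int) -> float:
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag_samples / sample_rate
```

At 48 kHz, a best lag of 9,600 samples corresponds to 200 ms, the upper bound suggested above.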

Integration Test

  1. Open your target application (Discord, OBS, game, etc.)
  2. Select VoxityAI™ output as the microphone input device
  3. Test voice chat or recording functionality
  4. Verify that the converted voice is transmitted clearly

Common First-Launch Issues

Browser doesn't open automatically

Manually navigate to http://localhost:18888 in your preferred browser. Ensure no firewall or antivirus software is blocking the connection.
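If the page still will not load, it helps to confirm whether anything is actually listening on port 18888. The snippet below uses only Python's standard `socket` module; the function name is hypothetical, and a successful connection only shows that some process accepted it, not that the process is VoxityAI™.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 18888, timeout: float = 1.0) -> bool:
    """Return True if some process accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Port 18888 is listening: open http://localhost:18888 manually")
    else:
        print("Nothing is listening on port 18888: check the server console for errors")
```

If nothing is listening, the problem is the server start-up rather than the browser or firewall.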

No microphone detected

Check browser permissions for microphone access. On Windows, verify privacy settings allow microphone access for browsers. Restart the browser after granting permissions.

High latency or poor quality

Start with smaller chunk sizes (256) and lighter models (Beatrice v2). Allow GPU drivers time to optimize, and close unnecessary applications consuming system resources.

Voice sounds robotic or unnatural

Reduce pitch adjustment to ±4 semitones maximum. Try different F0 detectors (RMVPE or Crepe). Ensure the voice model matches your gender and vocal range.

Next Steps

Once you have VoxityAI™ working with basic settings, explore these advanced features:

  • Custom Models: Import community-created voice models or train your own
  • Advanced Audio: Configure noise suppression, echo cancellation, and audio enhancement
  • Streaming Integration: Set up professional-grade audio routing for content creation
  • Performance Optimization: Fine-tune settings for your specific hardware and use case
  • Backup and Sync: Configure settings backup and voice model synchronization

Initial Configuration

Optimize VoxityAI™ settings for your specific setup and use case

After completing the quick start guide, this comprehensive configuration section will help you optimize VoxityAI™ for professional-grade performance. These settings can significantly impact both audio quality and system performance.

Audio System Deep Configuration

Professional audio configuration requires understanding the complete signal chain from microphone input through processing to final output. VoxityAI™ provides extensive control over every aspect of this pipeline.

Input Device Optimization

The quality of your voice conversion starts with clean input signals. Different microphone types require specific optimization approaches:

USB Headsets and Gaming Mics

  • Advantages: Plug-and-play convenience, built-in noise cancellation, consistent positioning
  • Configuration: Set input gain to 1.0, enable hardware noise suppression if available
  • Recommended Models: SteelSeries Arctis, HyperX Cloud series, Audio-Technica ATH-G1
  • Optimization Tips: Position the mic 2-3 finger-widths from your mouth, and avoid breathing directly into the capsule

Professional Audio Interfaces

  • Advantages: Superior audio quality, XLR microphone support, hardware monitoring
  • Configuration: Set interface as ASIO device, configure low-latency monitoring
  • Recommended Interfaces: Focusrite Scarlett series, PreSonus AudioBox, Behringer UMC series
  • Gain Staging: Adjust interface preamp for -12dB to -6dB peaks, avoid clipping
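The -12dB to -6dB peak target is measured relative to digital full scale (dBFS), where a sample value of ±1.0 is 0 dBFS. Assuming floating-point samples normalised to [-1.0, 1.0], the peak level is 20·log10 of the largest absolute sample. The helpers below are a hypothetical sketch for checking a recorded test take.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Peak level in dBFS for float samples in [-1.0, 1.0]; returns -inf for silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def in_target_range(samples: list[float], low: float = -12.0, high: float = -6.0) -> bool:
    """True if the peak falls inside the recommended gain-staging window."""
    return low <= peak_dbfs(samples) <= high


if __name__ == "__main__":
    take = [0.02, -0.31, 0.42, -0.18]  # illustrative sample values, not a real recording
    print(f"peak: {peak_dbfs(take):.1f} dBFS, in range: {in_target_range(take)}")
```

Halving the amplitude lowers the peak by about 6 dB, so a take peaking at 0.5 sits right at the top of the window.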

Studio Microphones

  • Dynamic Mics: Excellent noise rejection, require higher gain, ideal for noisy environments
  • Condenser Mics: Superior sensitivity and detail, require phantom power, best in treated rooms
  • Positioning: Maintain consistent distance, use pop filters, consider room acoustics
  • Popular Choices: Shure SM7B (dynamic), Audio-Technica AT2020 (condenser)

Processing Mode Selection

VoxityAI™ offers multiple processing architectures optimized for different use cases and hardware configurations:

Server Mode (Recommended)

Processes audio on the dedicated server instance, providing optimal performance and resource management.

  • ✓ Lower client system load
  • ✓ Better GPU utilization
  • ✓ Consistent performance
  • ✓ Multiple client support

Best For: Gaming setups, streaming, professional use

Client Mode

Processes audio locally within the browser or client application, useful for distributed setups.

  • ✓ No server dependency
  • ✓ Lower network usage
  • ✓ Privacy advantages
  • ✓ Offline capability

Best For: Portable setups, privacy-focused use, offline processing

AI Model Configuration

Understanding the characteristics and optimal use cases for different AI model types enables you to choose the best algorithm for your specific needs:

RVC (Retrieval-based Voice Conversion)

The most versatile and widely used voice conversion technology, offering an excellent balance between quality and performance.

Quality: Excellent (8.5/10)
Performance: Good (7/10)
VRAM Required: 4-8GB
Typical Latency: 100-250ms
Training Data: 10-60 minutes
Model Size: 50-200MB
Optimal Use Cases:
  • Character voice acting and content creation
  • Gender transformation with natural results
  • Celebrity voice impressions and parodies
  • Language learning and accent modification
Configuration Tips:
  • Use RMVPE f0 detector for best pitch accuracy
  • Index ratio 0.6-0.8 balances source and target characteristics
  • Enable voice activity detection to reduce artifacts during silence
  • Adjust transpose parameter gradually (±1 semitone increments)
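As we read the open-source RVC inference code, the index ratio (called `index_rate` there) linearly mixes features retrieved from the model's `.index` file with the features extracted from your own voice. The sketch below shows only that final blend on plain Python lists; the function name is hypothetical and the retrieval step (a nearest-neighbour search over the index) is omitted.

```python
def blend_features(source: list[float], retrieved: list[float], index_ratio: float) -> list[float]:
    """Mix source and index-retrieved features: 0.0 keeps the source, 1.0 uses the index."""
    if not 0.0 <= index_ratio <= 1.0:
        raise ValueError("index_ratio must be between 0.0 and 1.0")
    if len(source) != len(retrieved):
        raise ValueError("feature vectors must have the same length")
    return [index_ratio * r + (1.0 - index_ratio) * s for s, r in zip(source, retrieved)]
```

With a ratio of 0.7, each feature is 70% index and 30% source, in the middle of the 0.6-0.8 range suggested above.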

Beatrice v2

Lightweight model specifically engineered for real-time applications where low latency is critical.

Quality: Good (7/10)
Performance: Excellent (9.5/10)
VRAM Required: 2-4GB
Typical Latency: 50-120ms
Training Data: 20-120 minutes
Model Size: 20-80MB
Optimal Use Cases:
  • Real-time gaming communication
  • Live streaming with minimal delay
  • Interactive VR/AR applications
  • Low-resource hardware setups
Configuration Tips:
  • Works optimally with smaller chunk sizes (128-256)
  • Less sensitive to pitch adjustments than RVC models
  • Performs well with basic f0 detectors (DIO, Harvest)
  • Enable real-time optimization for lowest latency

MMVC (Many-to-Many Voice Conversion)

High-quality conversion model that excels at preserving emotional nuance and speaking patterns.

Quality: Excellent (9/10)
Performance: Moderate (6/10)
VRAM Required: 6-12GB
Typical Latency: 200-400ms
Training Data: 30-180 minutes
Model Size: 100-500MB
Optimal Use Cases:
  • Professional voiceover and dubbing
  • High-quality content creation
  • Audiobook and podcast production
  • Film and animation voice work

so-vits-svc

Specialized architecture for singing voice conversion, maintaining musical elements while transforming vocal timbre.

Quality: Excellent for singing (9/10)
Performance: Moderate (6/10)
VRAM Required: 4-8GB
Typical Latency: 250-500ms
Optimal Use Cases:
  • AI music covers and vocal synthesis
  • Singing voice transformation
  • Musical content creation
  • Vocal range extension for singers

Performance Optimization

Fine-tuning performance parameters allows you to achieve the optimal balance between audio quality, latency, and system resource usage:

Chunk Size Configuration

The chunk size parameter directly controls the trade-off between processing latency and audio quality:

128-256 samples:
  • Minimum latency (50-100ms additional)
  • Higher CPU/GPU usage
  • May reduce quality on slower hardware
  • Best for: Real-time gaming, live interaction
512-1024 samples:
  • Balanced latency (100-200ms additional)
  • Optimal for most use cases
  • Good quality/performance ratio
  • Best for: Streaming, casual content creation
1536-2048 samples:
  • Highest quality processing
  • Increased latency (200-400ms)
  • Better for complex transformations
  • Best for: Professional content, offline processing
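A chunk's own duration is its length divided by the sample rate. The hypothetical helper below computes it, assuming the chunk values above are raw sample counts; if your build expresses chunk size in blocks of several samples, multiply by the block size first.

```python
def buffer_ms(samples: int, sample_rate: int = 48_000) -> float:
    """Duration in milliseconds of a buffer holding `samples` samples."""
    return 1000.0 * samples / sample_rate


if __name__ == "__main__":
    for size in (128, 256, 512, 1024, 2048):
        print(f"{size:>5} samples @ 48 kHz = {buffer_ms(size):5.1f} ms")
```

At 48 kHz, 256 samples last about 5.3 ms and 2048 samples about 42.7 ms. If the values above are raw sample counts, most of the quoted additional latency therefore comes from model inference and other processing rather than from the chunk's duration alone.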

Extra Buffer Management

The extra buffer provides additional processing headroom and can prevent audio artifacts:

  • 4096-8192: Minimal buffering for real-time applications, may cause dropouts under load
  • 8192-16384: Standard buffering, good balance for most users and hardware
  • 16384-32768: Maximum stability, prevents artifacts but increases memory usage
  • Auto-adjustment: VoxityAI™ can automatically optimize buffer size based on system performance
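One way to picture the extra buffer, and why larger values cost more memory and compute, is as lookback context: each inference call sees the last `extra` samples of past audio plus the new chunk, and only the part of the output that corresponds to the new chunk is played. This is our assumption about the engine's internals rather than documented behaviour, and the class and function names below are hypothetical.

```python
from collections import deque


class ContextWindow:
    """Prepends the most recent `extra` samples of history to each incoming chunk."""

    def __init__(self, extra: int):
        self.history = deque([0.0] * extra, maxlen=extra)  # starts out as silence

    def next_window(self, chunk: list[float]) -> list[float]:
        window = list(self.history) + list(chunk)  # model input: extra + chunk samples
        self.history.extend(chunk)                 # slide the context forward
        return window


def trim_output(model_output: list[float], chunk_len: int) -> list[float]:
    """Keep only the newest `chunk_len` samples, i.e. the part matching the new chunk."""
    return model_output[len(model_output) - chunk_len:]


if __name__ == "__main__":
    ctx = ContextWindow(extra=8192)
    window = ctx.next_window([0.0] * 512)
    print(len(window))  # extra + chunk samples fed to the model
```

Under this model, a larger extra buffer gives the network more context per call, but every call processes extra + chunk samples, which is where the added memory and compute come from.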

F0 Detector Selection

Pitch detection algorithms vary significantly in quality, performance, and resource requirements:

RMVPE (Recommended):
  • Highest accuracy for pitch detection
  • Excellent handling of vibrato and pitch bends
  • Moderate CPU usage, GPU acceleration available
  • Best for: Professional use, singing, complex speech patterns
Crepe/Crepe-Full:
  • Very high quality, especially for singing
  • Excellent noise robustness
  • Higher CPU/GPU usage than RMVPE
  • Best for: Music applications, noisy environments
Harvest:
  • Good balance of quality and performance
  • Works well on older hardware
  • Reliable for most speech applications
  • Best for: General use, resource-constrained systems
DIO:
  • Fastest processing speed
  • Lowest resource usage
  • Basic quality suitable for simple transformations
  • Best for: Real-time applications on limited hardware
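To see what an f0 detector produces, the sketch below estimates pitch from one frame using plain autocorrelation: it picks the lag at which the signal best matches a shifted copy of itself and converts that period to Hz. This is a deliberately naive stand-in, not how RMVPE or Crepe (neural networks) or Harvest and DIO (estimators from the WORLD vocoder) actually work; the function name and the 60-1000 Hz search range are assumptions.

```python
import math


def estimate_f0(frame: list[float], sample_rate: int,
                fmin: float = 60.0, fmax: float = 1000.0) -> float:
    """Naive autocorrelation pitch estimate in Hz; returns 0.0 if no lag qualifies."""
    min_lag = max(1, int(sample_rate / fmax))               # shortest period considered
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)  # longest period considered
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 0.1 s test tone
    print(f"estimated f0: {estimate_f0(tone, sr):.1f} Hz")
```

On real voices, simple autocorrelation is prone to octave errors and is easily confused by noise, which is part of why the dedicated detectors above exist.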

Advanced Audio Processing Features

VoxityAI™ includes sophisticated audio processing capabilities that can significantly enhance voice conversion quality:

Noise Suppression Systems

Multiple noise reduction algorithms work together to provide clean input signals:

Spectral Noise Reduction:
  • Analyzes frequency spectrum to identify and reduce constant background noise
  • Effective against: Air conditioning, computer fans, electrical hum
  • Settings: Mild (preserve naturalness) to Aggressive (maximum reduction)
AI-Powered Noise Suppression:
  • Machine learning algorithms distinguish between voice and noise
  • Effective against: Keyboard typing, mouse clicks, intermittent sounds
  • Adapts in real-time to changing acoustic environments
Echo Cancellation:
  • Removes acoustic feedback and room reflections
  • Essential for speaker-based monitoring setups
  • Automatically adapts to room acoustics and speaker positioning

Dynamic Range and Level Control

Precise control over audio levels ensures optimal voice conversion and prevents artifacts:

  • Input Gain (0.1-3.0): Amplify or attenuate microphone signal before processing
  • Output Gain (0.1-5.0): Control converted voice volume for optimal integration
  • Silence Threshold: Minimum volume level required to trigger voice conversion
  • Voice Activity Detection: Intelligent detection of speech vs. silence periods
  • Auto-Leveling: Automatic gain control to maintain consistent output levels
  • Peak Limiting: Prevents audio clipping and distortion in loud passages
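The input gain, silence threshold and peak-limiting steps can be sketched as a per-block function. The 0.1-3.0 gain range comes from the list above; the RMS-based silence test, applying the threshold after gain, the default threshold of 0.01, and the use of hard clipping as a stand-in for a real limiter (which would smooth gain changes with attack and release times) are all assumptions, and the function names are hypothetical.

```python
import math
from typing import Optional


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def process_block(block: list[float], input_gain: float = 1.0,
                  silence_threshold: float = 0.01) -> Optional[list[float]]:
    """Apply gain, return None for blocks treated as silence, and clip peaks to [-1, 1]."""
    if not 0.1 <= input_gain <= 3.0:
        raise ValueError("input_gain must be within the 0.1-3.0 range")
    gained = [s * input_gain for s in block]
    if rms(gained) < silence_threshold:
        return None  # below threshold: skip voice conversion for this block
    return [max(-1.0, min(1.0, s)) for s in gained]  # crude peak limiting
```

Returning None for quiet blocks lets the caller skip inference entirely, which is how a silence threshold saves processing as well as suppressing artifacts.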

Sample Rate and Quality Settings

Audio quality parameters that balance fidelity with performance requirements:

  • 48kHz/32-bit float: Professional quality, highest fidelity, increased processing load
  • 44.1kHz/24-bit: High quality, good balance for most applications
  • 22kHz/16-bit: Basic quality, minimal processing requirements
  • Adaptive Quality: Automatically adjusts based on system performance and model requirements

Windows Installation Guide

Complete setup guide for Windows 10/11 with GPU optimization and professional configuration

Windows installation of VoxityAI™ offers the most comprehensive feature set and best performance optimization. This guide covers everything from basic installation to advanced GPU configuration and professional audio setup.

Pre-Installation System Preparation

Proper system preparation ensures smooth installation and optimal performance. Complete these steps before installing VoxityAI™:

Essential System Components

GPU-Optimized Installation Paths

VoxityAI™ provides hardware-specific builds to maximize performance on different GPU architectures. Choose the build that matches your system configuration:

🟢 NVIDIA GPU Installation (CUDA)

Recommended

Optimal for: RTX 20/30/40 series, GTX 16 series, and professional NVIDIA GPUs with 4GB+ VRAM

Hardware Requirements

  • NVIDIA GPU with Compute Capability 6.0+ (GTX 1060 or newer)
  • NVIDIA Game Ready or Studio Drivers 472.12+
  • CUDA Toolkit 11.7/11.8 (included with VoxityAI™)
  • 4GB VRAM minimum, 8GB+ recommended for large models
  • PCIe 3.0 x16 slot for optimal bandwidth

Installation Process

1. Access CUDA Build: Locate vcclient_win_cuda_[version].zip (~3.7GB)

This package includes optimized PyTorch with CUDA 11.8 support and cuDNN libraries

2. Extract Installation: Unzip to a clean directory path (avoid spaces and special characters)
Recommended: C:\VoxityAI\ or D:\Software\VoxityAI\

3. Initialize System: Run start_http.bat as Administrator

First launch installs CUDA runtime and dependencies (~500MB additional)

4. Verify GPU Detection: Check console output for CUDA initialization
✓ CUDA device detected: GeForce RTX 4070 Ti (12GB VRAM)
✓ cuDNN initialized successfully
✓ PyTorch CUDA backend ready

Expected Performance (CUDA)

  • RTX 4090: 30-60ms latency, handles any model size
  • RTX 4070/4080: 50-100ms latency, excellent for all use cases
  • RTX 3070/3080: 80-150ms latency, very good performance
  • RTX 2060/2070: 120-250ms latency, good for streaming

🔴 AMD GPU Installation (DirectML)

Optimal for: RX 5000/6000/7000 series AMD GPUs with DirectML acceleration support

Hardware Requirements

  • AMD GPU with DirectML support (RX 580 or newer)
  • AMD Adrenalin drivers 22.5.1 or newer
  • Windows 10 version 1903+ (DirectML framework requirement)
  • 6GB VRAM minimum for optimal performance
  • Smart Access Memory enabled (if supported)

Installation Process

1. Access Standard Build: Locate vcclient_win_std_[version].zip

Includes DirectML runtime and ONNX optimization for AMD hardware

2. Verify DirectML: DirectML ships as a built-in component of Windows 10 version 1903 and later, so no separate Windows Feature needs to be enabled; keep Windows and your AMD drivers up to date

3. Optimize for ONNX: Configure VoxityAI™ to prefer ONNX model format

ONNX models provide 20-40% better performance on AMD GPUs

AMD-Specific Optimizations

  • ONNX Conversion: Convert PyTorch models to ONNX for better performance
  • Memory Management: Enable GPU memory optimization in AMD drivers
  • Power Settings: Set GPU power limit to maximum for consistent performance
  • Compute Workloads: Enable GPU compute optimizations in Adrenalin software

🔵 Intel GPU Installation (XPU/DirectML)

Beta Support

Optimal for: Intel Arc discrete GPUs and 12th/13th gen integrated graphics

Hardware Requirements

  • Intel Arc A-series GPU or 11th gen+ CPU with Iris Xe
  • Intel Graphics drivers 30.0.101.1404 or newer
  • Intel XPU runtime and oneAPI toolkit (auto-installed)
  • 4GB+ system memory allocated to integrated graphics

⚠️ Beta Status Notice

Intel GPU support is experimental and may have stability issues. CPU fallback is automatically used if GPU acceleration fails. Performance varies significantly between different Intel architectures.

Expected Performance (Intel)

  • Arc A770: 150-300ms latency, decent for lightweight models
  • Arc A750: 200-400ms latency, basic voice conversion
  • Iris Xe: CPU fallback recommended for real-time use

⚫ CPU-Only Installation

Universal compatibility for systems without supported GPU acceleration

Access the standard version and VoxityAI™ automatically configures CPU-based processing. While slower than GPU acceleration, modern multi-core processors can achieve acceptable performance for many voice conversion tasks.

CPU Performance Optimization

  • Thread Allocation: VoxityAI™ uses 50-75% of available CPU cores
  • Priority Settings: Set process priority to "High" for better responsiveness
  • Power Management: Use "High Performance" power plan during processing
  • Background Apps: Close unnecessary applications to free CPU resources

Performance Expectations (CPU-Only)

  • High-end CPU (i9/Ryzen 9): 300-600ms latency, suitable for content creation
  • Mid-range CPU (i7/Ryzen 7): 500-1000ms latency, adequate for offline processing
  • Entry-level CPU (i5/Ryzen 5): 800ms+ latency, basic functionality only

Post-Installation Configuration

After successful installation, these configuration steps optimize VoxityAI™ for professional use:

System Security and Permissions

Windows Firewall Configuration

VoxityAI™ operates a local web server on port 18888. Configure firewall rules:

Windows Security → Firewall & network protection → Allow an app through firewall
Add: MMVCServerSIO.exe (both Private and Public networks)

Antivirus Exclusions

Add VoxityAI™ directory to antivirus exclusions to prevent interference:

  • Exclude entire VoxityAI™ installation folder
  • Exclude voice model directories
  • Exclude temporary processing folders
  • Allow network connections for localhost:18888

Browser Permissions

Configure browser settings for optimal functionality:

  • Allow microphone access for localhost
  • Enable hardware acceleration in browser settings
  • Disable browser audio processing/enhancement
  • Set localhost as trusted site for automatic media playback

Audio System Optimization

Windows Audio Configuration

Optimize Windows audio settings for low-latency performance:

Run → mmsys.cpl → select your recording or playback device → Properties → Advanced

  • Default Format: 48000 Hz, 24-bit
  • Exclusive Mode: Allow applications to take exclusive control of this device
  • Disable all audio enhancements and effects

ASIO Driver Support

For professional audio interfaces, install appropriate ASIO drivers:

  • Interface-specific ASIO: Use manufacturer drivers for best performance
  • ASIO4ALL: Universal ASIO driver for consumer hardware
  • Buffer Settings: 128-256 samples for low latency
  • Sample Rate: Match VoxityAI™ settings (48kHz recommended)
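To convert these buffer sizes into milliseconds, divide the buffer length by the sample rate. The Python sketch below is illustrative only. `buffer_latency_ms` is a hypothetical helper, and it counts buffering alone. Driver overhead and AI model inference time, which VoxityAI™ adds on top, are not included.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int, periods: int = 1) -> float:
    """Delay, in milliseconds, of `periods` buffers of `buffer_samples` frames each.

    Counts buffering only: driver overhead and model inference time come on top.
    """
    if buffer_samples <= 0 or sample_rate_hz <= 0 or periods <= 0:
        raise ValueError("buffer size, sample rate and periods must be positive")
    return 1000.0 * buffer_samples * periods / sample_rate_hz


if __name__ == "__main__":
    for size in (128, 256):
        print(f"{size} samples @ 48 kHz: {buffer_latency_ms(size, 48_000):.2f} ms per buffer")
```

At 48 kHz, 128 samples is about 2.7 ms and 256 samples about 5.3 ms per buffer. Pass `periods=2` to model a double-buffered driver.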

Virtual Audio Cable Setup

Configure virtual audio routing for application integration:

VB-Cable (Free - Basic)
  • ✓ Single virtual cable
  • ✓ Easy installation and setup
  • ✓ Compatible with most applications
  • ✗ Limited to one audio stream
  • ✗ May introduce slight latency

Best for: Discord, basic OBS streaming

VB-Audio Voicemeeter (Free - Advanced)
  • ✓ Full virtual mixing console
  • ✓ Multiple input/output channels
  • ✓ Built-in audio processing
  • ✓ Hardware integration support
  • ✗ More complex configuration

Best for: Complex streaming setups, multiple applications

Virtual Audio Cable (Professional - Paid)
  • ✓ Up to 256 virtual cables
  • ✓ Ultra-low latency design
  • ✓ Professional stability
  • ✓ Advanced configuration options
  • ✗ License cost (~$25)

Best for: Professional studios, commercial use

Performance Tuning

Windows Power Management

  • Set power plan to "High Performance" or "Ultimate Performance"
  • Disable CPU power management (prevent frequency scaling)
  • Set GPU power management to "Prefer maximum performance"
  • Disable Windows Game Mode (can interfere with audio processing)

Process Priority Optimization

Task Manager → Details → right-click MMVCServerSIO.exe:

  • Set priority → High
  • Set affinity → select specific CPU cores for dedicated processing

GPU Driver Optimization

  • NVIDIA: Enable "Prefer maximum performance" in NVIDIA Control Panel
  • AMD: Set GPU power limit to maximum in Adrenalin software
  • Intel: Enable "Hardware-accelerated GPU scheduling" in Windows Graphics settings
  • All GPUs: Disable power-saving features during voice processing

Windows-Specific Troubleshooting

Installation and Startup Issues

Application fails to start or crashes immediately

Diagnostic Steps:

  1. Run start_http.bat as Administrator
  2. Check Windows Event Viewer for application errors
  3. Verify all prerequisite software is installed
  4. Temporarily disable antivirus and Windows Defender
  5. Ensure no other software is using port 18888
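For step 5, running `netstat -ano | findstr :18888` in Command Prompt lists any process holding the port, along with its PID. You can also script the check. The Python sketch below assumes the server listens on TCP port 18888 on the local machine. `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on failure
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 18888  # VoxityAI(TM) default port from this guide
    if port_in_use(port):
        print(f"Port {port} is taken: VoxityAI(TM) is already running or another program holds it")
    else:
        print(f"Port {port} is free")
```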

Common Solutions:

  • Install both x86 and x64 versions of Visual C++ Redistributable
  • Update Windows to latest version
  • Run Windows System File Checker: sfc /scannow
  • Clear Windows audio device cache and restart audio services

CUDA/GPU acceleration not working

NVIDIA GPU Troubleshooting:

  1. Verify GPU compatibility: nvidia-smi in Command Prompt
  2. Update to latest Game Ready or Studio drivers
  3. Check CUDA installation: nvcc --version (available only if the separate CUDA Toolkit is installed)
  4. Monitor GPU usage during voice processing
  5. Verify sufficient VRAM availability

AMD GPU Troubleshooting:

  1. Confirm DirectML is available (it is built into Windows 10 version 1903 and later; there is no separate Windows Feature to enable)
  2. Update to latest Adrenalin drivers
  3. Check GPU recognition in Device Manager
  4. Enable GPU scheduling in Windows Graphics settings
  5. Convert models to ONNX format for better compatibility

High latency despite powerful hardware

System-Level Optimizations:

  • Set Windows Timer Resolution to 1ms using TimerTool
  • Disable Windows audio enhancements completely
  • Use dedicated audio hardware with ASIO drivers
  • Reduce Windows DPC latency using LatencyMon tool
  • Disable Windows Update during voice processing sessions

Application-Level Optimizations:

  • Use Server mode instead of Client mode processing
  • Reduce chunk size to 128-256 samples
  • Switch to a faster F0 detector such as DIO (Harvest is more accurate but noticeably slower)
  • Enable GPU memory optimization
  • Close unnecessary browser tabs and applications

Audio and Performance Issues

No audio output or silent processing

Audio System Diagnosis:

  1. Test microphone in Windows Sound settings
  2. Verify VoxityAI™ input/output device selection
  3. Check audio format compatibility (48kHz, 24-bit recommended)
  4. Disable exclusive mode temporarily
  5. Test with different audio devices

Voice quality issues (robotic, distorted, unnatural)

Quality Optimization Steps:

  • Reduce pitch adjustment to ±4 semitones maximum
  • Match voice model to your vocal characteristics
  • Use RMVPE or Crepe f0 detector for better pitch accuracy
  • Adjust Index ratio between 0.6-0.8 for natural blending
  • Ensure consistent microphone positioning and distance
  • Enable noise suppression if recording in noisy environment
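In the open-source RVC code that VoxityAI™ builds on, two of these settings are simple formulas:

  • A pitch shift of n semitones multiplies the detected F0 by 2^(n/12).
  • The index ratio linearly blends the features retrieved from the model's index with the features extracted from your voice.

The sketch below assumes VoxityAI™ behaves the same way. The function names are hypothetical, and plain lists stand in for the real feature tensors.

```python
def pitch_shift_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` (12-tone equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def blend_features(original: list[float], retrieved: list[float], index_ratio: float) -> list[float]:
    """RVC-style linear blend: index_ratio * retrieved + (1 - index_ratio) * original."""
    if not 0.0 <= index_ratio <= 1.0:
        raise ValueError("index_ratio must be between 0 and 1")
    if len(original) != len(retrieved):
        raise ValueError("feature vectors must have the same length")
    return [index_ratio * r + (1.0 - index_ratio) * o for o, r in zip(original, retrieved)]


if __name__ == "__main__":
    for st in (-4, 4, 12):
        ratio = pitch_shift_ratio(st)
        print(f"{st:+d} semitones -> x{ratio:.3f} (200 Hz -> {200 * ratio:.1f} Hz)")
    print(blend_features([0.0, 1.0], [1.0, 0.0], 0.7))
```

A shift of +4 semitones raises pitch by about 26%, so 200 Hz becomes roughly 252 Hz. Larger shifts move the input further from your natural range. An index ratio of 0.7 keeps 70% of the retrieved features and 30% of your own.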

macOS Installation Guide

Complete setup for Apple Silicon and Intel Macs with optimization tips

macOS installation provides excellent performance on Apple Silicon processors and good compatibility with Intel-based Macs. This guide covers both architectures with specific optimizations for each platform.

macOS Compatibility Overview

🍎 Apple Silicon Macs (Highly Recommended)

Supported Models:

M1, M1 Pro, M1 Max, M1 Ultra, M2, M2 Pro, M2 Max, M3, M3 Pro, M3 Max

macOS Version:

macOS 12.0 (Monterey) or later for optimal performance

Memory Requirements:

8GB unified memory minimum, 16GB+ strongly recommended

Performance Benefits:

Metal Performance Shaders acceleration, unified memory architecture

Apple Silicon Advantages:

  • Unified Memory: No separate VRAM limitations, models load faster
  • Metal Acceleration: Optimized neural network processing
  • Power Efficiency: Excellent performance per watt for battery use
  • Thermal Management: Intelligent scaling prevents overheating

💻 Intel Macs (Legacy Support)

Supported Models:

MacBook Pro 2016+, iMac 2017+, Mac Pro 2019+, iMac Pro

macOS Version:

macOS 10.15 (Catalina) minimum, 11.0+ recommended

Memory Requirements:

16GB RAM minimum for adequate performance

GPU Requirements:

Dedicated GPU recommended (Radeon Pro 560 or better)

Linux Installation Guide

Advanced installation for Linux distributions with manual compilation and GPU optimization

Advanced Users Only: Linux installation requires manual compilation, dependency management, and system configuration. Pre-built binaries are not available for Linux distributions.

Supported Linux Distributions

VoxityAI™ has been successfully compiled and tested on the following distributions. While other distributions may work, these are officially verified:

Ubuntu / Debian Family

Ubuntu 22.04 LTS, Ubuntu 20.04 LTS, Debian 11, Linux Mint 21, Pop!_OS 22.04

Package Manager: apt, snap

Red Hat Family

Fedora 37+, CentOS Stream 9, RHEL 9+, Rocky Linux 9

Package Manager: dnf, rpm

Arch Family

Arch Linux, Manjaro, EndeavourOS

Package Manager: pacman, AUR

Other Distributions

openSUSE Leap 15.4+, Gentoo (advanced)

Note: May require additional configuration

Build Dependencies and Prerequisites

Before compiling VoxityAI™, install all required development tools and libraries:

Core Development Tools

Ubuntu/Debian:

sudo apt update && sudo apt upgrade
sudo apt install -y build-essential cmake git curl wget unzip
sudo apt install -y python3.10 python3.10-dev python3.10-venv python3-pip
sudo apt install -y pkg-config libffi-dev libssl-dev zlib1g-dev
sudo apt install -y libbz2-dev libreadline-dev libsqlite3-dev llvm
sudo apt install -y libncurses5-dev libncursesw5-dev xz-utils tk-dev

Fedora/CentOS:

sudo dnf groupinstall -y "Development Tools" "Development Libraries"
sudo dnf install -y cmake git curl wget unzip python3-devel python3-pip
sudo dnf install -y pkgconfig libffi-devel openssl-devel zlib-devel
sudo dnf install -y bzip2-devel readline-devel sqlite-devel llvm-devel

Arch Linux:

sudo pacman -Syu
sudo pacman -S --needed base-devel cmake git curl wget unzip
sudo pacman -S python python-pip pkgconf libffi openssl zlib

Audio System Dependencies

Ubuntu/Debian:

# ALSA and PulseAudio
sudo apt install -y libasound2-dev portaudio19-dev libportaudio2
# PulseAudio development headers are packaged as libpulse-dev
sudo apt install -y libpulse-dev libjack-jackd2-dev jackd2
sudo apt install -y libsndfile1-dev libfftw3-dev libsamplerate0-dev
# Optional: JACK for professional audio
sudo apt install -y qjackctl jack-tools

Fedora/CentOS:

sudo dnf install -y alsa-lib-devel portaudio-devel
sudo dnf install -y pulseaudio-libs-devel jack-audio-connection-kit-devel
sudo dnf install -y libsndfile-devel fftw-devel libsamplerate-devel

Arch Linux:

sudo pacman -S alsa-lib portaudio pulseaudio jack2
sudo pacman -S libsndfile fftw libsamplerate qjackctl

GPU Acceleration Dependencies

NVIDIA CUDA Support:

# Ubuntu CUDA installation
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-11-8 nvidia-driver-530
# Verify installation (reboot first so the new driver loads;
# nvcc is installed under /usr/local/cuda/bin, which may need adding to PATH)
nvidia-smi
nvcc --version

AMD ROCm Support (Ubuntu 22.04):

# Add ROCm repository
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.6 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
sudo apt install -y rocm-dev rocm-libs rocm-utils
# Add user to the render and video groups (log out and back in afterwards)
sudo usermod -a -G render,video $USER

Compilation Process

Step 1: Source Code and Environment Setup

# Clone the VoxityAI repository (W-Okada base)
git clone https://github.com/w-okada/voice-changer.git
# The Python server (MMVCServerSIO.py, requirements.txt) lives in the server/ subdirectory
cd voice-changer/server
# Create Python virtual environment
python3.10 -m venv voxityai-env
source voxityai-env/bin/activate
# Upgrade pip and core packages
pip install --upgrade pip wheel setuptools
# Install build dependencies
pip install cython numpy

Step 2: PyTorch Installation (GPU-Specific)

For NVIDIA CUDA 11.8:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

For AMD ROCm 5.6:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

For CPU-only Installation:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Step 3: Install Application Dependencies

# Install Python requirements
pip install -r requirements.txt
# Install additional dependencies for Linux
pip install soundfile librosa pyaudio
# For better performance (optional), install ONE of the following:
pip install onnxruntime-gpu   # For NVIDIA
# pip install onnxruntime     # For CPU/other GPUs

Step 4: Audio Backend Configuration

Linux supports multiple audio backends. Choose the appropriate one for your setup:

PulseAudio (Recommended for Desktop)

# Configure PulseAudio for low latency
sudo tee -a /etc/pulse/daemon.conf << EOF
default-sample-rate = 48000
alternate-sample-rate = 44100
default-sample-channels = 2
default-fragments = 2
default-fragment-size-msec = 4
high-priority = yes
nice-level = -15
realtime-scheduling = yes
realtime-priority = 5
EOF
# Restart PulseAudio
systemctl --user restart pulseaudio

JACK (Professional Audio)

# Start JACK with optimized settings
jackd -d alsa -r 48000 -p 256 -n 2 -D -Chw:0,0 -Phw:0,0
# Or use QjackCtl for GUI configuration
# Recommended settings:
#   Sample Rate: 48000 Hz
#   Frames/Period: 256
#   Periods/Buffer: 2

ALSA (Direct Hardware Access)

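# List available audio devices
aplay -l
arecord -l
# Optional: route ALSA's default device through PulseAudio
# (requires the ALSA pulse plugin, e.g. libasound2-plugins; this overwrites /etc/asound.conf)
sudo tee /etc/asound.conf << EOF
pcm.!default {
    type pulse
}
ctl.!default {
    type pulse
}
EOF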

Step 5: Launch VoxityAI™

# Activate virtual environment
source voxityai-env/bin/activate
# Start the main server (W-Okada's server takes the port via -p; 18888 is the default)
python MMVCServerSIO.py -p 18888
# In another terminal, verify the web interface
curl http://localhost:18888

Access the VoxityAI™ web interface at http://localhost:18888 in your browser.

Linux-Specific Optimizations

System Performance Tuning

CPU Governor and Power Management

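# Set CPU governor to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Make permanent via /etc/rc.local
# (this overwrites an existing rc.local; if you have one, add the loop by hand before its final "exit 0")
sudo tee /etc/rc.local << 'EOF'
#!/bin/bash
# A redirect cannot target a glob that matches several files, so loop over the CPUs
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$gov"
done
exit 0
EOF
sudo chmod +x /etc/rc.local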
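To confirm that the governor change took effect and survives a reboot, a small Python check can read the same sysfs files. This sketch assumes the standard cpufreq layout under `/sys/devices/system/cpu`. It returns an empty result on systems without cpufreq, such as many virtual machines. `governors` is a hypothetical helper.

```python
from pathlib import Path


def governors(sysfs_cpu: Path = Path("/sys/devices/system/cpu")) -> dict[str, str]:
    """Map each CPU directory name (e.g. 'cpu0') to its current scaling governor."""
    result: dict[str, str] = {}
    # cpu[0-9]* skips sibling entries such as 'cpufreq' and 'cpuidle'
    for gov_file in sorted(sysfs_cpu.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        result[gov_file.parent.parent.name] = gov_file.read_text().strip()
    return result


if __name__ == "__main__":
    govs = governors()
    if not govs:
        print("No cpufreq data found (virtual machine, container, or unsupported driver)")
    else:
        slow = {cpu: g for cpu, g in govs.items() if g != "performance"}
        print("All CPUs use 'performance'" if not slow else f"Not in performance mode: {slow}")
```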

Real-time Priority and Resource Limits

# Add user to audio group for real-time priority
sudo usermod -a -G audio $USER
# Configure resource limits for real-time scheduling
sudo tee -a /etc/security/limits.conf << EOF
@audio - rtprio 95
@audio - memlock unlimited
@audio - nice -10
EOF
# Enable real-time scheduling in PAM
echo "session required pam_limits.so" | sudo tee -a /etc/pam.d/common-session
# Log out and back in for the group and limit changes to take effect
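Group membership and resource limits apply only to new login sessions, so check what a freshly started process actually sees. The sketch below uses the standard `grp`, `os`, and `resource` modules and runs on Linux only. `realtime_status` is a hypothetical name.

```python
import grp
import os
import resource


def realtime_status() -> dict:
    """Report audio-group membership and real-time limits for the current process (Linux)."""
    try:
        audio_gid = grp.getgrnam("audio").gr_gid
        in_audio = audio_gid in os.getgroups() or audio_gid == os.getgid()
    except KeyError:  # this system has no 'audio' group
        in_audio = False
    rtprio_limit = getattr(resource, "RLIMIT_RTPRIO", None)  # only defined on Linux
    soft_rtprio = resource.getrlimit(rtprio_limit)[0] if rtprio_limit is not None else 0
    soft_memlock = resource.getrlimit(resource.RLIMIT_MEMLOCK)[0]
    return {
        "in_audio_group": in_audio,
        "rtprio": soft_rtprio,
        "memlock_unlimited": soft_memlock == resource.RLIM_INFINITY,
    }


if __name__ == "__main__":
    status = realtime_status()
    print(status)
    if not status["in_audio_group"] or status["rtprio"] == 0:
        print("Real-time settings are not active in this session yet; log out and back in")
```

With the limits.conf entries above active, you would expect `rtprio` to report 95 and `memlock_unlimited` to be `True`.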

Kernel Optimization (Advanced)

# Edit GRUB configuration for low-latency kernel
sudo nano /etc/default/grub
# Add these parameters to GRUB_CMDLINE_LINUX_DEFAULT:
#   threadirqs processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll
# Example:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash threadirqs processor.max_cstate=1"
# Update GRUB and reboot
sudo update-grub
sudo reboot
Warning: These optimizations may increase power consumption and reduce battery life on laptops.

Audio System Optimization

PulseAudio Advanced Configuration

# Create custom PulseAudio configuration for VoxityAI™
mkdir -p ~/.config/pulse
# Create low-latency configuration
tee ~/.config/pulse/daemon.conf << EOF
default-sample-rate = 48000
alternate-sample-rate = 44100
default-sample-channels = 2
default-channel-map = front-left,front-right
default-fragments = 2
default-fragment-size-msec = 1
high-priority = yes
nice-level = -15
realtime-scheduling = yes
realtime-priority = 5
rlimit-rtprio = 9
daemonize = no
EOF

JACK Professional Setup

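# Install the low-latency kernel (Ubuntu)
sudo apt install linux-lowlatency
# Configure JACK for maximum performance
# Create ~/.jackdrc with optimal settings
# (~/.jackdrc is read when a JACK client auto-starts the server; jackdbus/jack_control ignores it)
tee ~/.jackdrc << EOF
/usr/bin/jackd -R -P75 -dalsa -dhw:0,0 -r48000 -p128 -n2 -D -Chw:0,0 -Phw:0,0
EOF
# Or configure the D-Bus JACK server with equivalent settings, then start it and verify
jack_control ds alsa
jack_control dps device hw:0,0
jack_control dps rate 48000
jack_control dps period 128
jack_control dps nperiods 2
jack_control start
jack_control status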

GPU Optimization

NVIDIA GPU Optimization

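# Enable persistence mode for consistent performance
sudo nvidia-smi -pm 1
# Pin application clocks as memory,graphics in MHz (adjust values for your GPU)
# Only some GPUs, mostly data-center/workstation models, support this; list valid pairs with:
#   nvidia-smi -q -d SUPPORTED_CLOCKS
sudo nvidia-smi -ac 877,1455
# Monitor GPU usage during voice processing
watch -n 1 nvidia-smi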

AMD GPU Optimization (ROCm)

# Force the highest GPU power/performance level
# (accepted values include auto, low, high and manual; "performance" is not valid here)
echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
# Monitor AMD GPU usage
watch -n 1 rocm-smi

Linux Troubleshooting

Common Installation Issues

Permission denied errors during compilation

# Ensure user is in necessary groups
sudo usermod -a -G audio,video,render,input $USER
# Fix common permission issues
sudo chmod 666 /dev/nvidia*
sudo chmod 666 /dev/dri/*
# Create udev rules for persistent permissions
sudo tee /etc/udev/rules.d/99-voxityai.rules << EOF
KERNEL=="nvidia*", GROUP="video", MODE="0666"
KERNEL=="card*", GROUP="video", MODE="0666"
EOF
sudo udevadm control --reload-rules

Audio system not detected or high latency

Diagnostic Commands:

# Check audio devices
aplay -l
arecord -l
# Test PulseAudio
pactl list sources short
pactl list sinks short
# Check real-time capabilities
ulimit -r
groups $USER
# Test audio latency
pa-info | grep -i latency

Solutions:

  • Install the low-latency kernel: sudo apt install linux-lowlatency
  • Add user to audio group and configure limits as shown above
  • Use JACK for professional low-latency audio
  • Disable audio power management: echo 0 | sudo tee /sys/module/snd_hda_intel/parameters/power_save

GPU acceleration not working

NVIDIA Troubleshooting:

# Verify NVIDIA driver installation
nvidia-smi
lsmod | grep nvidia
# Check CUDA installation
nvcc --version
python -c "import torch; print(torch.cuda.is_available())"
# Reinstall NVIDIA drivers if needed (quote the pattern so the shell does not expand it)
sudo apt purge 'nvidia-*'
sudo apt install nvidia-driver-530 nvidia-dkms-530

AMD Troubleshooting:

# Check AMD GPU recognition
lspci | grep -i amd
rocm-smi
# Verify ROCm installation
python -c "import torch; print(torch.version.hip)"
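The NVIDIA and AMD checks above can be combined into one script that reports which accelerator PyTorch in the active virtual environment can use. This is a sketch, not VoxityAI™'s own device-selection logic. ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, and `torch.backends.mps` covers Apple Silicon. The script falls back to `cpu` when PyTorch is not installed. `pick_device` is a hypothetical name.

```python
def pick_device() -> str:
    """Return 'cuda' (NVIDIA CUDA or AMD ROCm), 'mps' (Apple Silicon), or 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    if torch.cuda.is_available():
        # ROCm builds reuse the torch.cuda API; torch.version.hip is set on those builds
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on very old PyTorch versions
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch device: {pick_device()}")
    try:
        import torch
        print("CUDA build:", torch.version.cuda, "| ROCm/HIP build:", torch.version.hip)
    except ImportError:
        print("PyTorch is not installed")
```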

Creating Linux Service (Optional)

For automatic startup and system integration, create a systemd service:

# Create systemd service file
# Adjust both paths to where MMVCServerSIO.py (the server/ directory) and your virtual environment live
sudo tee /etc/systemd/system/voxityai.service << EOF
[Unit]
Description=VoxityAI Voice Changer Service
After=network.target sound.target

[Service]
Type=simple
User=$USER
WorkingDirectory=/path/to/voice-changer/server
ExecStart=/path/to/voice-changer/server/voxityai-env/bin/python MMVCServerSIO.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable voxityai
sudo systemctl start voxityai
# Check service status
sudo systemctl status voxityai

Audio Routing Setup

Professional audio routing and virtual cable configuration for seamless integration

Audio routing is the foundation of professional VoxityAI™ integration. This comprehensive guide covers virtual audio cable setup, application routing, and advanced multi-destination audio workflows for content creators, streamers, and professional users.

Virtual Audio Solutions by Platform

🪟 Windows Virtual Audio Solutions

VB-Audio Virtual Cable

Free
⭐⭐⭐⭐⭐

Most popular virtual audio solution with excellent compatibility across applications.

Advantages:
  • ✓ Completely free and well-supported
  • ✓ Simple installation and configuration
  • ✓ Excellent compatibility with Discord, OBS, games
  • ✓ Stable and reliable for most users
  • ✓ Low CPU overhead
Limitations:
  • ✗ Single virtual cable only
  • ✗ No built-in mixing capabilities
  • ✗ Limited advanced configuration options
Best For: Basic Discord integration, simple OBS setups, casual streaming
Installation:
  1. Download from VB-Audio.com
  2. Run installer as Administrator
  3. Restart computer after installation
  4. Configure VoxityAI™ output to "CABLE Input (VB-Audio Virtual Cable)"
  5. Set Discord/OBS input to "CABLE Output (VB-Audio Virtual Cable)"

VB-Audio Voicemeeter

Free
⭐⭐⭐⭐⭐

Complete virtual mixing console with multiple inputs, outputs, and professional audio processing.

Advantages:
  • ✓ Full virtual mixing console
  • ✓ Multiple virtual inputs/outputs (VAIO)
  • ✓ Built-in EQ, compression, and effects
  • ✓ Advanced routing and monitoring
  • ✓ Professional-grade features
Limitations:
  • ✗ Steeper learning curve
  • ✗ More complex setup process
  • ✗ Higher CPU usage than simple cables
Best For: Advanced streaming setups, multiple audio sources, professional content creation
Available Versions:
  • Voicemeeter: Basic version with 2 hardware + 1 virtual input
  • Voicemeeter Banana: 3 hardware + 2 virtual inputs
  • Voicemeeter Potato: 5 hardware + 3 virtual inputs (most advanced)

Virtual Audio Cable (Professional)

⭐⭐⭐⭐⭐

Professional-grade virtual audio solution with minimal latency and maximum stability.

Advantages:
  • ✓ Up to 256 virtual audio cables
  • ✓ Ultra-low latency design
  • ✓ Professional stability and reliability
  • ✓ Advanced configuration options
  • ✓ Dedicated technical support
Limitations:
  • ✗ Commercial license required
  • ✗ More complex than free alternatives
Best For: Professional studios, commercial broadcasting, complex multi-channel setups

🍎 macOS Virtual Audio Solutions

BlackHole

Free & Open Source
⭐⭐⭐⭐⭐

Modern virtual audio driver designed specifically for macOS with excellent system integration.

Key Features:
  • Zero latency virtual audio driver
  • Multiple channel configurations (2ch, 16ch, 64ch)
  • No trial period or license restrictions
  • Active development and macOS compatibility
  • Native Apple Silicon and Intel support
Installation Process:
  1. Install via Homebrew: brew install blackhole-2ch
  2. Or download installer from GitHub
  3. Open Audio MIDI Setup (Applications > Utilities)
  4. Create Multi-Output Device combining speakers + BlackHole
  5. Set VoxityAI™ output to BlackHole 2ch
  6. Configure target applications to use BlackHole as input

Loopback by Rogue Amoeba

⭐⭐⭐⭐⭐

Professional audio routing solution with visual interface and advanced mixing capabilities.

Professional Features:
  • Visual cable management interface
  • Multiple virtual devices with custom configurations
  • Real-time audio monitoring and level control
  • Session saving and recall
  • Advanced routing matrix
  • Professional customer support
Best For: Professional podcasters, musicians, complex audio setups requiring visual management

🐧 Linux Virtual Audio Solutions

PulseAudio Virtual Sinks

Native Linux solution using PulseAudio's built-in virtual device capabilities.

# Create virtual sink for VoxityAI output
pactl load-module module-null-sink sink_name=VoxityAI_Output sink_properties=device.description="VoxityAI_Output"
# Create virtual source from sink monitor
pactl load-module module-virtual-source source_name=VoxityAI_Input master=VoxityAI_Output.monitor source_properties=device.description="VoxityAI_Input"
# Make configuration persistent
# A user ~/.config/pulse/default.pa replaces the system one, so include the system defaults first
# (skip the .include line if your default.pa already has it)
mkdir -p ~/.config/pulse
echo ".include /etc/pulse/default.pa" >> ~/.config/pulse/default.pa
echo "load-module module-null-sink sink_name=VoxityAI_Output sink_properties=device.description=\"VoxityAI_Output\"" >> ~/.config/pulse/default.pa
echo "load-module module-virtual-source source_name=VoxityAI_Input master=VoxityAI_Output.monitor source_properties=device.description=\"VoxityAI_Input\"" >> ~/.config/pulse/default.pa

JACK Audio Routing

Professional audio routing system with ultra-low latency and advanced connection management.

# Start JACK with optimal settings
jackd -d alsa -r 48000 -p 256 -n 2
# List the real port names first; the port names below are placeholders
jack_lsp
# Connect VoxityAI output to applications using command line
jack_connect VoxityAI:output discord:input
jack_connect VoxityAI:output obs:microphone_input
# Or use QjackCtl for graphical connection management
qjackctl

PipeWire (Modern Alternative)

Next-generation audio server with JACK compatibility and PulseAudio replacement capabilities.

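# Install PipeWire (Ubuntu 22.04+)
sudo apt install pipewire pipewire-pulse pipewire-jack
systemctl --user enable pipewire pipewire-pulse
# Create a virtual sink (VoxityAI_Output) looped back into a virtual source (VoxityAI_Input)
# -C/-P only select existing target nodes; new virtual devices are described with the *-props options
pw-loopback --capture-props='media.class=Audio/Sink node.name=VoxityAI_Output' --playback-props='media.class=Audio/Source node.name=VoxityAI_Input'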

Application-Specific Integration Guides

Discord Integration

Set up VoxityAI™ for seamless Discord voice chat with real-time voice transformation.

Step 1: Configure Virtual Audio

Set up virtual audio cable using your platform's preferred solution (VB-Cable, BlackHole, etc.)

Step 2: VoxityAI™ Output Configuration

Set VoxityAI™ output device to virtual cable input:

  • Windows: "CABLE Input (VB-Audio Virtual Cable)"
  • macOS: "BlackHole 2ch"
  • Linux: "VoxityAI_Output"

Step 3: Discord Audio Settings

Configure Discord to receive processed audio:

  • Open Discord Settings > Voice & Video
  • Set Input Device to virtual cable output
  • Disable Discord's noise suppression and echo cancellation
  • Set Input Sensitivity to manual mode
  • Test microphone to verify voice transformation

Step 4: Optimize for Real-Time Use

Configure settings for minimal latency during voice chat:

  • Enable Push-to-Talk so background noise is not transmitted between phrases
  • Use lightweight voice models (Beatrice v2)
  • Set chunk size to 256 or lower
  • Monitor CPU/GPU usage during extended calls

Discord Pro Tips

  • Voice Activity: Fine-tune Discord's input sensitivity to match your converted voice levels
  • Quality Settings: Use a higher voice channel bitrate (set by server admins in channel settings) for the best voice transmission
  • Regional Servers: Choose Discord servers geographically close to reduce network latency
  • Bandwidth: Ensure stable internet connection for consistent voice quality

OBS Studio Integration

Professional streaming setup with VoxityAI™ for live content creation and broadcasting.

Basic OBS Integration

  1. Add "Audio Input Capture" source to your scene
  2. Select virtual cable output as the audio device
  3. Configure audio monitoring to "Monitor and Output"
  4. Adjust audio levels in OBS mixer

Advanced Multi-Source Setup

For complex streaming setups with multiple audio sources:

Microphone → VoxityAI™ → Virtual Cable → OBS (Stream), Discord (Chat), Game Chat, Local Monitor

OBS Performance Optimization

  • Audio Filters: Add noise gate, compressor, and EQ for professional sound
  • Sync Offset: Adjust audio sync if video/audio become misaligned
  • Sample Rate: Match OBS sample rate to VoxityAI™ (48kHz recommended)
  • Monitoring: Use OBS's advanced audio monitoring for real-time feedback

Video Conferencing (Zoom/Teams/Meet)

Professional voice transformation for business meetings and video conferences.

Professional Ethics Notice: Always inform participants when using voice modification in professional settings. Some organizations have policies regarding voice alteration in business communications.

Zoom Configuration

  1. Open Zoom Settings > Audio
  2. Select virtual cable as microphone input
  3. Disable automatic gain control and noise reduction
  4. Test audio using Zoom's microphone test feature
  5. Adjust input volume to optimal levels

Microsoft Teams Configuration

  1. Access Settings > Devices
  2. Choose virtual audio device as microphone
  3. Disable Teams' audio processing features
  4. Test audio quality in device settings

Google Meet Configuration

  1. Click Settings gear during meeting
  2. Select Audio tab
  3. Choose virtual audio device for microphone
  4. Verify audio levels are appropriate

Video Conferencing Best Practices

  • Quality Testing: Test voice transformation before important meetings
  • Backup Plan: Have fallback audio source ready in case of technical issues
  • Bandwidth: Monitor network performance during voice processing
  • Consistency: Use consistent voice settings throughout meetings
  • Transparency: Inform participants about voice modification when appropriate

Gaming Integration

Optimize VoxityAI™ for gaming scenarios with minimal performance impact and seamless voice chat.

Performance-First Gaming Setup

Voice Model Selection:
  • Use Beatrice v2 models for lowest latency
  • Avoid complex RVC models during competitive gaming
  • Pre-load models before gaming sessions
Resource Management:
  • Set VoxityAI™ process priority to "Normal" (not High) during gaming
  • Use dedicated GPU cores if available
  • Monitor system temperature during extended sessions
Audio Settings:
  • Chunk size: 256 samples maximum
  • F0 Detector: DIO for speed (Harvest is more accurate but noticeably slower)
  • Disable unnecessary audio processing

Popular Games Integration

Discord-Based Games:

For games using Discord for voice chat, follow Discord integration steps above.

In-Game Voice Chat:
  • Set game's microphone input to virtual cable output
  • Test voice chat in game settings before playing
  • Use Push-to-Talk for better control
Streaming While Gaming:
  • Use server-client setup to offload processing
  • Route game audio separately from voice
  • Monitor system performance continuously

Advanced Audio Routing Scenarios

Multi-Destination Broadcasting

Route VoxityAI™ output to multiple applications simultaneously with independent level control.

Using Voicemeeter for Multi-Routing:

  1. Hardware Input 1: Physical microphone
  2. Virtual Input (VAIO): VoxityAI™ output destination
  3. Hardware Output A1: Speakers/headphones for monitoring
  4. Virtual Output B1: Discord/game chat
  5. Virtual Output B2: OBS/streaming software
  6. Bus Assignment: Route VAIO to A1+B1+B2 with independent level control
Benefits of Multi-Routing:
  • Independent volume control for each destination
  • Different processing for chat vs. stream
  • Backup audio routing for redundancy
  • Real-time level monitoring and adjustment

Professional Studio Integration

Integrate VoxityAI™ into professional recording and broadcast environments.

Hardware Integration Chain:

Microphone → Audio Interface → VoxityAI™ Processing → Virtual Routing → DAW/Broadcast

Professional Considerations:

  • Latency Monitoring: Use professional audio interfaces with hardware monitoring
  • Backup Systems: Always have non-processed audio backup for live broadcasts
  • Quality Control: Monitor output quality continuously during long sessions
  • Documentation: Document all routing and settings for session recall

Remote Collaboration Setup

Configure VoxityAI™ for remote recording sessions and collaborative content creation.

Client-Server Voice Processing:

Run VoxityAI™ server on a powerful desktop while using lightweight clients for actual recording/communication:

Server Machine (High-Performance):
  • Runs MMVCServerSIO with GPU acceleration
  • Hosts voice models and processing engine
  • Provides web interface for multiple clients
Client Machine (Lightweight):
  • Connects to server via web browser
  • Handles only audio input/output
  • Minimal local processing requirements

Network Optimization:

  • Bandwidth: Minimum 1 Mbps upload/download for stable audio streaming
  • Latency: Local network preferred, VPN may add latency
  • Stability: Wired connection recommended over WiFi
  • Security: Use VPN or SSH tunneling for internet-based connections
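The bandwidth figure follows from the audio format. This sketch assumes the client streams uncompressed mono PCM to the server. That is an assumption: the exact transport format and protocol overhead are not documented here. Under it, the raw bitrate is sample rate × bit depth × channels. `pcm_bitrate_kbps` is a hypothetical helper.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000.0


if __name__ == "__main__":
    for depth in (16, 24):
        kbps = pcm_bitrate_kbps(48_000, depth)
        print(f"48 kHz / {depth}-bit mono: {kbps:.0f} kbps in each direction, before overhead")
```

Under this assumption, 48 kHz 16-bit mono works out to 768 kbps in each direction before protocol overhead. A 1 Mbps link therefore leaves little headroom, and 24-bit audio (1152 kbps) would exceed it.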

Audio Routing Troubleshooting

Common Audio Routing Issues

No audio in target application

Systematic Diagnosis:

  1. Verify VoxityAI™ is outputting to correct virtual device
  2. Check target application audio input settings
  3. Test virtual cable with system audio recorder
  4. Restart VoxityAI™ and target application
  5. Verify system audio permissions and privacy settings

Platform-Specific Solutions:

Windows:
  • Check Windows audio privacy settings
  • Verify virtual cable appears in mmsys.cpl
  • Restart Windows Audio service
macOS:
  • Grant microphone permissions to browser and applications
  • Check Audio MIDI Setup for device visibility
  • Verify BlackHole installation and version
Linux:
  • Check PulseAudio/PipeWire device list
  • Verify user audio group membership
  • Test with pavucontrol for visual debugging

Audio feedback and echo issues

Feedback Prevention:

  • Always use headphones instead of speakers when possible
  • Ensure virtual cable output is not set as system default playback
  • Disable "Listen to this device" in Windows sound properties
  • Check for audio loopback in virtual audio software settings
  • Use directional microphones to reduce ambient pickup

Echo Cancellation:

  • Enable VoxityAI™'s built-in echo cancellation
  • Adjust microphone positioning and gain levels
  • Use acoustic treatment in recording environment
  • Configure application-specific echo cancellation settings

High latency in audio routing

Latency Reduction Strategies:

  • Reduce VoxityAI™ chunk size and buffer settings
  • Use ASIO drivers for professional audio interfaces
  • Minimize virtual audio cable buffer sizes
  • Close unnecessary audio applications and effects
  • Use dedicated audio hardware for critical applications
  • Optimize system audio service priority and real-time settings

System-Level Optimizations:

  • Set audio-related processes to high priority
  • Disable Windows audio enhancements completely
  • Use exclusive mode for audio devices when possible
  • Configure system for real-time audio performance

Audio quality degradation through routing

Quality Preservation:

  • Match sample rates across all audio devices (48kHz recommended)
  • Use 24-bit audio depth throughout the signal chain
  • Minimize audio processing chain length
  • Adjust virtual cable buffer sizes for optimal quality/latency balance
  • Update audio drivers to latest versions
  • Use lossless audio formats where possible

Signal Chain Optimization:

  • Avoid multiple format conversions in the audio path
  • Set appropriate gain staging to prevent clipping
  • Monitor audio levels throughout the signal chain
  • Use professional audio interfaces for critical applications
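Gain staging is easiest to judge in dBFS, where 0 dBFS is digital full scale. The sketch below computes the peak level of a block of floating-point samples and flags clipping. The function names and the 0.999 clipping threshold are illustrative assumptions, not VoxityAI™ settings.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Peak level in dBFS for float samples in [-1.0, 1.0]; -inf for digital silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def is_clipping(samples: list[float], threshold: float = 0.999) -> bool:
    """True if any sample reaches the threshold, i.e. is at or near full scale."""
    return any(abs(s) >= threshold for s in samples)


if __name__ == "__main__":
    # 10 ms of a 440 Hz tone at half amplitude, sampled at 48 kHz
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48_000) for n in range(480)]
    print(f"peak: {peak_dbfs(tone):.1f} dBFS, clipping: {is_clipping(tone)}")
```

A half-scale signal peaks at about -6 dBFS. Leaving some headroom below 0 dBFS at each stage of the chain helps prevent the clipping described above.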