However, despite its impressive capabilities, W-Okada can be complex to install and configure - especially for users without technical experience.
That's where VoxityAI™ steps in: a streamlined, plug-and-play solution designed to remove the friction and deliver results instantly.
✅ Built around the proven W-Okada core, but wrapped in a minimal and intuitive interface
✅ Ships with a curated selection of voice presets ready to use
✅ No complicated setup or technical steps required
✅ Tailored for English-speaking environments and workflows
You can either spend hours configuring open-source tools — or launch VoxityAI™ in minutes and jump straight into voice-changing.
👉 Get started now at voxityai.com
VoxityAI™ Documentation
Professional real-time voice transformation made simple
Welcome to the comprehensive documentation for VoxityAI™, the premier real-time AI voice changer that transforms your voice instantly for gaming, streaming, content creation, and professional applications. Built on the powerful open-source W-Okada voice changer engine, VoxityAI™ delivers studio-quality voice transformation with unprecedented ease of use.
VoxityAI™ represents a breakthrough in real-time voice conversion technology, utilizing state-of-the-art artificial intelligence models including RVC (Retrieval-based Voice Conversion), Beatrice v2, MMVC, so-vits-svc, and DDSP-SVC. These cutting-edge algorithms enable natural-sounding voice transformations while preserving emotional content, intonation, and speech patterns.
Instant Setup
Launch in minutes with our streamlined installation process - no complex configuration required
Real-Time Processing
Low-latency voice transformation for live applications using advanced AI models, with latencies as low as roughly 50-150ms on recommended hardware
Multiple Voice Models
Support for RVC, Beatrice v2, MMVC, so-vits-svc, and DDSP-SVC voice models
High Performance
GPU acceleration for NVIDIA, AMD, and Intel hardware with CPU fallback support
Privacy First
100% local processing - your voice data never leaves your device
Cross-Platform
Works seamlessly on Windows, macOS, Linux, and Google Colab environments
Core Technology
At its foundation, VoxityAI™ leverages the W-Okada voice changer engine, a revolutionary open-source project that has redefined real-time voice conversion. Unlike traditional pitch shifters or simple voice effects, our system uses sophisticated neural networks to analyze and transform vocal characteristics while maintaining the naturalness and emotional expression of human speech.
The technology supports server-client architecture, allowing users to distribute computational load across multiple devices. This means you can run the voice processing server on a powerful desktop while using the converted voice on a lighter device, perfect for gaming setups where system resources are precious.
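For example, a minimal sketch of this split, assuming the Linux launch command shown later in this guide and using 192.168.1.50 as a placeholder for the server's LAN address:
# On the powerful desktop: start the processing server, listening on all interfaces
python MMVCServerSIO.py --host 0.0.0.0 --port 18888
# On the lighter device: open the server's web interface in any browser,
# e.g. from a Linux client (replace the address with your server's actual one)
xdg-open http://192.168.1.50:18888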
Use Cases and Applications
VoxityAI™ excels across diverse applications where real-time voice transformation enhances user experience and creative expression:
Gaming and Entertainment
Transform your voice to match in-game characters in real-time, creating immersive role-playing experiences. Popular with VTubers and streamers who need consistent character voices across long streaming sessions. The low-latency processing ensures natural conversations without disruptive delays.
Content Creation
Professional voiceover artists use VoxityAI™ to expand their vocal range, creating multiple character voices for animations, podcasts, and audiobooks. The high-quality output rivals traditional studio processing while offering real-time flexibility.
Privacy and Communication
Maintain anonymity during online meetings, gaming sessions, or content creation. Perfect for streamers who want to protect their identity while engaging with audiences, or professionals conducting sensitive discussions.
Language Learning and Accessibility
Simulate different accents for language learning practice, or assist individuals with speech difficulties in achieving their desired vocal expression. The technology preserves speech patterns while transforming vocal characteristics.
Live Streaming and Broadcasting
Seamlessly integrate with OBS Studio, Discord, Twitch, and other platforms. Create engaging content with character voices that respond instantly to your speech, keeping audiences entertained without breaking immersion.
Technical Innovation
VoxityAI™ incorporates several breakthrough technologies that set it apart from conventional voice changers:
- Neural Voice Conversion: Advanced AI models that understand vocal characteristics at a fundamental level, enabling natural transformations that preserve emotional content and speaking style.
- Real-Time Processing: Optimized inference pipelines that deliver voice conversion with minimal latency, suitable for live conversations and interactive applications.
- Cross-Platform Compatibility: Unified codebase that works seamlessly across Windows, macOS, and Linux, with automatic hardware optimization for different GPU vendors.
- Modular Architecture: Support for multiple AI model types allows users to choose the best algorithm for their specific needs, from lightweight real-time models to high-quality offline processing.
- Advanced Audio Processing: Built-in noise suppression, echo cancellation, and audio enhancement features ensure clean, professional output in any environment.
System Requirements
Ensure optimal performance with the right hardware configuration
VoxityAI™ is designed to work across a wide range of hardware configurations, from basic setups suitable for casual use to high-performance systems optimized for professional content creation. Understanding your system's capabilities helps optimize performance and achieve the best possible voice conversion quality.
Minimum Requirements
- Operating System: Windows 10 (64-bit), macOS 10.15+, or Ubuntu 18.04+
- Processor: Intel i5-4590 / AMD FX 8350 equivalent (4+ cores recommended)
- Memory: 8 GB RAM system memory
- Storage: 4 GB available space (additional space for voice models)
- Audio Hardware: Compatible audio input/output device or USB headset
- Network: Internet connection for initial setup and model access
- Browser: Chrome 90+, Firefox 88+, or Safari 14+ for web interface
Expected Performance: 300-800ms latency, suitable for casual use and testing
Recommended Configuration
- Operating System: Windows 11 (64-bit), macOS 12+, or Ubuntu 20.04+
- Processor: Intel i7-8700K / AMD Ryzen 7 2700X or better (8+ cores)
- Memory: 16 GB or more system memory
- Graphics: NVIDIA RTX 2060 / AMD RX 6700 XT / Intel Arc A750 or better
- Storage: 10+ GB available space on SSD (faster model loading)
- Audio Interface: Professional audio interface with low-latency drivers
- Network: Stable broadband connection for model access and updates
Expected Performance: 50-200ms latency, ideal for real-time applications
GPU Acceleration Support
VoxityAI™ leverages GPU acceleration to minimize voice conversion latency and maximize quality. Different GPU architectures offer varying levels of performance optimization:
NVIDIA (CUDA)
Best overall performance with RTX series cards. Requires drivers supporting CUDA 11.7+ and a compatible PyTorch installation. Supports TensorRT optimization for maximum performance.
- RTX 4060/4070/4080/4090 (Latest generation - optimal performance)
- RTX 3060/3070/3080/3090 (Excellent performance with 12GB+ VRAM)
- RTX 2060/2070/2080 (Good performance for most models)
- GTX 1660/1070/1080 (Basic support, limited to smaller models)
- RTX 4090: 30-60ms latency with complex models
- RTX 3070: 80-150ms latency typical
- RTX 2060: 150-250ms latency acceptable
AMD (DirectML/ROCm)
Good performance with RX 6000+ series. DirectML support provides hardware acceleration on Windows, while ROCm enables Linux acceleration.
- RX 7800 XT/7900 XTX (Latest generation with excellent DirectML)
- RX 6700 XT/6800 XT/6900 XT (Proven performance with 12GB+ VRAM)
- RX 580/590/5700 XT (Basic support, adequate for lightweight models)
- Use ONNX model format for best AMD compatibility
- Enable Smart Access Memory for improved performance
- DirectML works best on Windows 11 with latest drivers
Intel (DirectML/XPU)
Emerging support with Intel Arc series and integrated graphics. Excellent power efficiency on laptops with Intel processors.
- Intel Arc A750/A770 (Good acceleration for discrete cards)
- Intel Arc A380 (Basic acceleration, entry-level)
- Intel Iris Xe (Limited acceleration, CPU fallback recommended)
- 12th/13th gen integrated graphics (Basic support)
Apple Silicon (Metal)
Excellent efficiency on M1/M2/M3 Macs with optimized Metal Performance Shaders integration. Unified memory architecture provides unique advantages.
- M3 Pro/Max/Ultra (Exceptional performance, 40-100ms latency)
- M2 Pro/Max/Ultra (Excellent performance, 60-150ms latency)
- M1 Pro/Max/Ultra (Very good performance, 80-200ms latency)
- M1/M2 Base (Good performance for lightweight models)
- No separate VRAM limitation due to unified memory
- Excellent power efficiency for battery-powered use
- Automatic thermal management prevents overheating
Performance Optimization Guidelines
Voice conversion latency and quality depend on several interconnected factors. Understanding these relationships helps optimize your setup for specific use cases:
Hardware Impact on Performance
- High-end GPU (RTX 4080+): 50-100ms typical latency, suitable for professional streaming and content creation
- Mid-range GPU (RTX 3060/RX 6700 XT): 100-200ms typical latency, excellent for gaming and casual streaming
- Entry-level GPU/CPU: 200-500ms typical latency, adequate for offline content creation and testing
- Optimal buffer settings: Chunk size 256-512 samples, Extra buffer 8192-16384 samples
Model Complexity vs. Performance
- Beatrice v2 models: Lightweight, 50-150ms latency, ideal for real-time applications
- RVC models: Balanced quality/performance, 100-300ms latency, most versatile
- MMVC models: Highest quality, 200-500ms latency, best for content creation
- Custom trained models: Performance varies based on architecture and training parameters
Audio Configuration Impact
- Sample Rate: 48kHz recommended for best quality, 44.1kHz for performance
- Buffer Size: Smaller buffers reduce latency but require more processing power
- Bit Depth: 32-bit float recommended for professional use, 16-bit adequate for casual use
- Processing Mode: Server mode generally provides better performance than client mode
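As a rough sanity check for the buffer settings above, you can compute how much audio a given buffer holds; note this is only the buffer duration, not the total conversion latency:
# Buffer duration in milliseconds = samples / sample_rate * 1000
python3 -c "print(f'{512 / 48000 * 1000:.1f} ms of audio in a 512-sample buffer at 48 kHz')"
python3 -c "print(f'{256 / 48000 * 1000:.1f} ms of audio in a 256-sample buffer at 48 kHz')"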
Storage and Network Requirements
While VoxityAI™ has modest storage requirements for the base installation, voice models and user data can accumulate significantly over time:
- Base Installation: 2-4 GB depending on platform and included components
- Voice Models: 100MB-2GB per model, depending on complexity and quality
- Model Cache: 1-5 GB for temporary files and conversion cache
- User Data: Audio recordings, configurations, and custom settings
- Network Usage: Initial model access, updates, and cloud synchronization (optional)
SSD storage is strongly recommended for the main installation and frequently used models, as loading times significantly impact user experience when switching between voice models during live sessions.
Quick Start Guide
Get VoxityAI™ running in under 5 minutes
This quick start guide will have you using VoxityAI™ for real-time voice transformation in just a few minutes. Whether you're a streamer, gamer, or content creator, these steps will get you up and running with professional-quality voice conversion.
Launch VoxityAI™
Start the application from your desktop shortcut or applications menu. VoxityAI™ will automatically initialize the web-based interface in your default browser, typically opening at localhost:18888.
Default URL: http://localhost:18888
First Launch Notes: The initial startup may take 2-3 minutes as VoxityAI™ initializes core components, loads voice models, and optimizes settings for your hardware. This delay only occurs during the first launch.
Configure Audio Devices
Proper audio configuration is crucial for optimal voice conversion. Select your input and output devices in the audio settings panel.
Select Voice Model
VoxityAI™ includes several pre-configured voice models optimized for different use cases. You can also import custom models or access community-created voices.
Model Selection Tips: Start with Beatrice v2 models for lowest latency, then experiment with RVC models for higher quality. Each model type has different performance characteristics and optimal use cases.
Fine-Tune Settings
Adjust voice conversion parameters to achieve your desired sound. Most users can achieve excellent results with minimal adjustments.
Recommended Starting Settings:
- For Gaming: Chunk 256, F0 Detector: RMVPE, Pitch: ±4 semitones
- For Streaming: Chunk 512, F0 Detector: Crepe, Index Ratio: 0.7
- For Content Creation: Chunk 1024, F0 Detector: Crepe-Full, highest quality settings
Ready to Transform!
VoxityAI™ is now processing your voice in real-time. The converted audio can be used with Discord, OBS Studio, Zoom, games, or any application that accepts audio input. Experiment with different models and settings to discover your perfect voice!
Testing Your Setup
Before using VoxityAI™ in your target application, perform these quick tests to ensure everything is working correctly:
Audio Quality Test
- Speak into your microphone and listen to the converted voice through your headphones
- Check for audio artifacts, distortion, or unnatural sounds
- Adjust the input gain if the voice sounds too quiet or distorted
- Test different voice models to find the best match for your voice type
Latency Test
- Count from 1 to 10 while monitoring the converted voice
- If delay is noticeable, reduce chunk size or switch to a lighter model
- For real-time applications, aim for sub-200ms total latency
- Consider your internet connection if using remote processing
Integration Test
- Open your target application (Discord, OBS, game, etc.)
- Select VoxityAI™ output as the microphone input device
- Test voice chat or recording functionality
- Verify that the converted voice is transmitted clearly
Common First-Launch Issues
Browser doesn't open automatically
Manually navigate to http://localhost:18888 in your preferred browser. Ensure no firewall or antivirus software is blocking the connection.
No microphone detected
Check browser permissions for microphone access. On Windows, verify privacy settings allow microphone access for browsers. Restart the browser after granting permissions.
High latency or poor quality
Start with smaller chunk sizes (256) and lighter models (Beatrice v2). Allow GPU drivers time to optimize, and close unnecessary applications consuming system resources.
Voice sounds robotic or unnatural
Reduce pitch adjustment to ±4 semitones maximum. Try different F0 detectors (RMVPE or Crepe). Ensure the voice model matches your gender and vocal range.
Next Steps
Once you have VoxityAI™ working with basic settings, explore these advanced features:
- Custom Models: Import community-created voice models or train your own
- Advanced Audio: Configure noise suppression, echo cancellation, and audio enhancement
- Streaming Integration: Set up professional-grade audio routing for content creation
- Performance Optimization: Fine-tune settings for your specific hardware and use case
- Backup and Sync: Configure settings backup and voice model synchronization
Initial Configuration
Optimize VoxityAI™ settings for your specific setup and use case
After completing the quick start guide, this comprehensive configuration section will help you optimize VoxityAI™ for professional-grade performance. These settings can significantly impact both audio quality and system performance.
Audio System Deep Configuration
Professional audio configuration requires understanding the complete signal chain from microphone input through processing to final output. VoxityAI™ provides extensive control over every aspect of this pipeline.
Input Device Optimization
The quality of your voice conversion starts with clean input signals. Different microphone types require specific optimization approaches:
USB Headsets and Gaming Mics
- Advantages: Plug-and-play convenience, built-in noise cancellation, consistent positioning
- Configuration: Set input gain to 1.0, enable hardware noise suppression if available
- Recommended Models: SteelSeries Arctis, HyperX Cloud series, Audio-Technica ATH-G1
- Optimization Tips: Position 2-3 fingers away from mouth, avoid breathing directly into capsule
Professional Audio Interfaces
- Advantages: Superior audio quality, XLR microphone support, hardware monitoring
- Configuration: Set interface as ASIO device, configure low-latency monitoring
- Recommended Interfaces: Focusrite Scarlett series, PreSonus AudioBox, Behringer UMC series
- Gain Staging: Adjust interface preamp for -12dB to -6dB peaks, avoid clipping
Studio Microphones
- Dynamic Mics: Excellent noise rejection, require higher gain, ideal for noisy environments
- Condenser Mics: Superior sensitivity and detail, require phantom power, best in treated rooms
- Positioning: Maintain consistent distance, use pop filters, consider room acoustics
- Popular Choices: Shure SM7B (dynamic), Audio-Technica AT2020 (condenser)
Processing Mode Selection
VoxityAI™ offers multiple processing architectures optimized for different use cases and hardware configurations:
Server Mode (Recommended)
Processes audio on the dedicated server instance, providing optimal performance and resource management.
Best For: Gaming setups, streaming, professional use
Client Mode
Processes audio locally within the browser or client application, useful for distributed setups.
Best For: Portable setups, privacy-focused use, offline processing
AI Model Configuration
Understanding the characteristics and optimal use cases for different AI model types enables you to choose the best algorithm for your specific needs:
RVC (Retrieval-based Voice Conversion)
The most versatile and widely-used voice conversion technology, offering excellent balance between quality and performance.
- Character voice acting and content creation
- Gender transformation with natural results
- Celebrity voice impressions and parodies
- Language learning and accent modification
- Use RMVPE f0 detector for best pitch accuracy
- Index ratio 0.6-0.8 balances source and target characteristics
- Enable voice activity detection to reduce artifacts during silence
- Adjust transpose parameter gradually (±1 semitone increments)
Beatrice v2
Lightweight model specifically engineered for real-time applications where low latency is critical.
- Real-time gaming communication
- Live streaming with minimal delay
- Interactive VR/AR applications
- Low-resource hardware setups
- Works optimally with smaller chunk sizes (128-256)
- Less sensitive to pitch adjustments than RVC models
- Performs well with basic f0 detectors (DIO, Harvest)
- Enable real-time optimization for lowest latency
MMVC (Many-to-Many Voice Conversion)
High-quality conversion model that excels at preserving emotional nuance and speaking patterns.
- Professional voiceover and dubbing
- High-quality content creation
- Audiobook and podcast production
- Film and animation voice work
so-vits-svc
Specialized architecture for singing voice conversion, maintaining musical elements while transforming vocal timbre.
- AI music covers and vocal synthesis
- Singing voice transformation
- Musical content creation
- Vocal range extension for singers
Performance Optimization
Fine-tuning performance parameters allows you to achieve the optimal balance between audio quality, latency, and system resource usage:
Chunk Size Configuration
The chunk size parameter directly controls the trade-off between processing latency and audio quality:
Small chunks (≈128-256 samples):
- Minimum latency (50-100ms additional)
- Higher CPU/GPU usage
- May reduce quality on slower hardware
- Best for: Real-time gaming, live interaction
Medium chunks (≈512 samples):
- Balanced latency (100-200ms additional)
- Optimal for most use cases
- Good quality/performance ratio
- Best for: Streaming, casual content creation
Large chunks (≈1024+ samples):
- Highest quality processing
- Increased latency (200-400ms)
- Better for complex transformations
- Best for: Professional content, offline processing
Extra Buffer Management
The extra buffer provides additional processing headroom and can prevent audio artifacts:
- 4096-8192: Minimal buffering for real-time applications, may cause dropouts under load
- 8192-16384: Standard buffering, good balance for most users and hardware
- 16384-32768: Maximum stability, prevents artifacts but increases memory usage
- Auto-adjustment: VoxityAI™ can automatically optimize buffer size based on system performance
F0 Detector Selection
Pitch detection algorithms vary significantly in quality, performance, and resource requirements:
RMVPE:
- Highest accuracy for pitch detection
- Excellent handling of vibrato and pitch bends
- Moderate CPU usage, GPU acceleration available
- Best for: Professional use, singing, complex speech patterns
Crepe:
- Very high quality, especially for singing
- Excellent noise robustness
- Higher CPU/GPU usage than RMVPE
- Best for: Music applications, noisy environments
Harvest:
- Good balance of quality and performance
- Works well on older hardware
- Reliable for most speech applications
- Best for: General use, resource-constrained systems
DIO:
- Fastest processing speed
- Lowest resource usage
- Basic quality suitable for simple transformations
- Best for: Real-time applications on limited hardware
Advanced Audio Processing Features
VoxityAI™ includes sophisticated audio processing capabilities that can significantly enhance voice conversion quality:
Noise Suppression Systems
Multiple noise reduction algorithms work together to provide clean input signals:
Spectral noise reduction:
- Analyzes frequency spectrum to identify and reduce constant background noise
- Effective against: Air conditioning, computer fans, electrical hum
- Settings: Mild (preserve naturalness) to Aggressive (maximum reduction)
AI-based noise suppression:
- Machine learning algorithms distinguish between voice and noise
- Effective against: Keyboard typing, mouse clicks, intermittent sounds
- Adapts in real-time to changing acoustic environments
Echo cancellation:
- Removes acoustic feedback and room reflections
- Essential for speaker-based monitoring setups
- Automatically adapts to room acoustics and speaker positioning
Dynamic Range and Level Control
Precise control over audio levels ensures optimal voice conversion and prevents artifacts:
- Input Gain (0.1-3.0): Amplify or attenuate microphone signal before processing
- Output Gain (0.1-5.0): Control converted voice volume for optimal integration
- Silence Threshold: Minimum volume level required to trigger voice conversion
- Voice Activity Detection: Intelligent detection of speech vs. silence periods
- Auto-Leveling: Automatic gain control to maintain consistent output levels
- Peak Limiting: Prevents audio clipping and distortion in loud passages
Sample Rate and Quality Settings
Audio quality parameters that balance fidelity with performance requirements:
- 48kHz/32-bit float: Professional quality, highest fidelity, increased processing load
- 44.1kHz/24-bit: High quality, good balance for most applications
- 22kHz/16-bit: Basic quality, minimal processing requirements
- Adaptive Quality: Automatically adjusts based on system performance and model requirements
Windows Installation Guide
Complete setup guide for Windows 10/11 with GPU optimization and professional configuration
Windows installation of VoxityAI™ offers the most comprehensive feature set and best performance optimization. This guide covers everything from basic installation to advanced GPU configuration and professional audio setup.
Pre-Installation System Preparation
Proper system preparation ensures smooth installation and optimal performance. Complete these steps before installing VoxityAI™:
Essential System Components
GPU-Optimized Installation Paths
VoxityAI™ provides hardware-specific builds to maximize performance on different GPU architectures. Choose the build that matches your system configuration:
🟢 NVIDIA GPU Installation (CUDA)
Recommended. Optimal for: RTX 20/30/40 series, GTX 16 series, and professional NVIDIA GPUs with 4GB+ VRAM
Hardware Requirements
- NVIDIA GPU with Compute Capability 6.0+ (GTX 1060 or newer)
- NVIDIA Game Ready or Studio Drivers 472.12+
- CUDA Toolkit 11.7/11.8 (included with VoxityAI™)
- 4GB VRAM minimum, 8GB+ recommended for large models
- PCIe 3.0 x16 slot for optimal bandwidth
Installation Process
- Download vcclient_win_cuda_[version].zip (~3.7GB). This package includes optimized PyTorch with CUDA 11.8 support and cuDNN libraries.
- Extract the archive to the recommended location: C:\VoxityAI\ or D:\Software\VoxityAI\
- Run start_http.bat as Administrator. First launch installs CUDA runtime and dependencies (~500MB additional).
- Confirm GPU detection in the startup log:
✓ CUDA device detected: GeForce RTX 4070 Ti (12GB VRAM)
✓ cuDNN initialized successfully
✓ PyTorch CUDA backend ready
Expected Performance (CUDA)
- RTX 4090: 30-60ms latency, handles any model size
- RTX 4070/4080: 50-100ms latency, excellent for all use cases
- RTX 3070/3080: 80-150ms latency, very good performance
- RTX 2060/2070: 120-250ms latency, good for streaming
🔴 AMD GPU Installation (DirectML)
Optimal for: RX 5000/6000/7000 series AMD GPUs with DirectML acceleration support
Hardware Requirements
- AMD GPU with DirectML support (RX 580 or newer)
- AMD Adrenalin drivers 22.5.1 or newer
- Windows 10 version 1903+ (DirectML framework requirement)
- 6GB VRAM minimum for optimal performance
- Smart Access Memory enabled (if supported)
Installation Process
- Download vcclient_win_std_[version].zip. Includes DirectML runtime and ONNX optimization for AMD hardware.
- Enable DirectML support via Windows Features → Machine Learning Platform.
- Prefer ONNX-format models: ONNX models provide 20-40% better performance on AMD GPUs.
AMD-Specific Optimizations
- ONNX Conversion: Convert PyTorch models to ONNX for better performance
- Memory Management: Enable GPU memory optimization in AMD drivers
- Power Settings: Set GPU power limit to maximum for consistent performance
- Compute Workloads: Enable GPU compute optimizations in Adrenalin software
🔵 Intel GPU Installation (XPU/DirectML)
Beta support. Optimal for: Intel Arc discrete GPUs and 12th/13th gen integrated graphics
Hardware Requirements
- Intel Arc A-series GPU or 11th gen+ CPU with Iris Xe
- Intel Graphics drivers 30.0.101.1404 or newer
- Intel XPU runtime and oneAPI toolkit (auto-installed)
- 4GB+ system memory allocated to integrated graphics
⚠️ Beta Status Notice
Intel GPU support is experimental and may have stability issues. CPU fallback is automatically used if GPU acceleration fails. Performance varies significantly between different Intel architectures.
Expected Performance (Intel)
- Arc A770: 150-300ms latency, decent for lightweight models
- Arc A750: 200-400ms latency, basic voice conversion
- Iris Xe: CPU fallback recommended for real-time use
⚫ CPU-Only Installation
Universal compatibility for systems without supported GPU acceleration
Install the standard build and VoxityAI™ automatically configures CPU-based processing. While slower than GPU acceleration, modern multi-core processors can achieve acceptable performance for many voice conversion tasks.
CPU Performance Optimization
- Thread Allocation: VoxityAI™ uses 50-75% of available CPU cores
- Priority Settings: Set process priority to "High" for better responsiveness
- Power Management: Use "High Performance" power plan during processing
- Background Apps: Close unnecessary applications to free CPU resources
Performance Expectations (CPU-Only)
- High-end CPU (i9/Ryzen 9): 300-600ms latency, suitable for content creation
- Mid-range CPU (i7/Ryzen 7): 500-1000ms latency, adequate for offline processing
- Entry-level CPU (i5/Ryzen 5): 800ms+ latency, basic functionality only
Post-Installation Configuration
After successful installation, these configuration steps optimize VoxityAI™ for professional use:
System Security and Permissions
Windows Firewall Configuration
VoxityAI™ operates a local web server on port 18888. Configure firewall rules:
Windows Security → Firewall & network protection → Allow an app through firewall
Add: MMVCServerSIO.exe (both Private and Public networks)
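If you prefer the command line, equivalent rules can be added with netsh from an elevated Command Prompt; this is a sketch that assumes the default install path C:\VoxityAI\ suggested earlier in this guide:
netsh advfirewall firewall add rule name="VoxityAI Server" dir=in action=allow program="C:\VoxityAI\MMVCServerSIO.exe" enable=yes
netsh advfirewall firewall add rule name="VoxityAI Port 18888" dir=in action=allow protocol=TCP localport=18888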
Antivirus Exclusions
Add VoxityAI™ directory to antivirus exclusions to prevent interference:
- Exclude entire VoxityAI™ installation folder
- Exclude voice model directories
- Exclude temporary processing folders
- Allow network connections for localhost:18888
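For Microsoft Defender specifically, the folder exclusions listed above can be added from an elevated PowerShell prompt; the paths below are examples and should match your actual installation and model directories:
Add-MpPreference -ExclusionPath "C:\VoxityAI"
Add-MpPreference -ExclusionPath "C:\VoxityAI\model_dir"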
Browser Permissions
Configure browser settings for optimal functionality:
- Allow microphone access for localhost
- Enable hardware acceleration in browser settings
- Disable browser audio processing/enhancement
- Set localhost as trusted site for automatic media playback
Audio System Optimization
Windows Audio Configuration
Optimize Windows audio settings for low-latency performance:
Run → mmsys.cpl → Advanced → Default Format: 48000 Hz, 24-bit
Exclusive Mode: Allow applications to take exclusive control
Disable all audio enhancements and effects
ASIO Driver Support
For professional audio interfaces, install appropriate ASIO drivers:
- Interface-specific ASIO: Use manufacturer drivers for best performance
- ASIO4ALL: Universal ASIO driver for consumer hardware
- Buffer Settings: 128-256 samples for low latency
- Sample Rate: Match VoxityAI™ settings (48kHz recommended)
Virtual Audio Cable Setup
Configure virtual audio routing for application integration:
Performance Tuning
Windows Power Management
- Set power plan to "High Performance" or "Ultimate Performance"
- Disable CPU power management (prevent frequency scaling)
- Set GPU power management to "Prefer maximum performance"
- Disable Windows Game Mode (can interfere with audio processing)
Process Priority Optimization
Task Manager → Details → MMVCServerSIO.exe → Set Priority: High
Set Affinity: Use specific CPU cores for dedicated processing
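As an alternative to the Task Manager steps above, the same priority change can be scripted in an elevated PowerShell session (the process name is assumed from the executable shipped with VoxityAI™):
# Raise the server process priority to High
Get-Process MMVCServerSIO | ForEach-Object { $_.PriorityClass = 'High' }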
GPU Driver Optimization
- NVIDIA: Enable "Prefer maximum performance" in NVIDIA Control Panel
- AMD: Set GPU power limit to maximum in Adrenalin software
- Intel: Enable GPU compute scheduling in Windows settings
- All GPUs: Disable power-saving features during voice processing
Windows-Specific Troubleshooting
Installation and Startup Issues
Application fails to start or crashes immediately
Diagnostic Steps:
- Run start_http.bat as Administrator
- Check Windows Event Viewer for application errors
- Verify all prerequisite software is installed
- Temporarily disable antivirus and Windows Defender
- Ensure no other software is using port 18888
Common Solutions:
- Install both x86 and x64 versions of Visual C++ Redistributable
- Update Windows to latest version
- Run Windows System File Checker: sfc /scannow
- Clear Windows audio device cache and restart audio services
CUDA/GPU acceleration not working
NVIDIA GPU Troubleshooting:
- Verify GPU compatibility: run nvidia-smi in Command Prompt
- Update to latest Game Ready or Studio drivers
- Check CUDA installation: nvcc --version
- Monitor GPU usage during voice processing
- Verify sufficient VRAM availability
AMD GPU Troubleshooting:
- Ensure DirectML is enabled in Windows Features
- Update to latest Adrenalin drivers
- Check GPU recognition in Device Manager
- Enable GPU scheduling in Windows Graphics settings
- Convert models to ONNX format for better compatibility
High latency despite powerful hardware
System-Level Optimizations:
- Set Windows Timer Resolution to 1ms using TimerTool
- Disable Windows audio enhancements completely
- Use dedicated audio hardware with ASIO drivers
- Reduce Windows DPC latency using LatencyMon tool
- Disable Windows Update during voice processing sessions
Application-Level Optimizations:
- Use Server mode instead of Client mode processing
- Reduce chunk size to 128-256 samples
- Switch to lighter F0 detector (DIO or Harvest)
- Enable GPU memory optimization
- Close unnecessary browser tabs and applications
Audio and Performance Issues
No audio output or silent processing
Audio System Diagnosis:
- Test microphone in Windows Sound settings
- Verify VoxityAI™ input/output device selection
- Check audio format compatibility (48kHz, 24-bit recommended)
- Disable exclusive mode temporarily
- Test with different audio devices
Voice quality issues (robotic, distorted, unnatural)
Quality Optimization Steps:
- Reduce pitch adjustment to ±4 semitones maximum
- Match voice model to your vocal characteristics
- Use RMVPE or Crepe f0 detector for better pitch accuracy
- Adjust Index ratio between 0.6-0.8 for natural blending
- Ensure consistent microphone positioning and distance
- Enable noise suppression if recording in noisy environment
macOS Installation Guide
Complete setup for Apple Silicon and Intel Macs with optimization tips
macOS installation provides excellent performance on Apple Silicon processors and good compatibility with Intel-based Macs. This guide covers both architectures with specific optimizations for each platform.
macOS Compatibility Overview
🍎 Apple Silicon Macs (Highly Recommended)
- Supported chips: M1, M1 Pro, M1 Max, M1 Ultra, M2, M2 Pro, M2 Max, M3, M3 Pro, M3 Max
- Operating system: macOS 12.0 (Monterey) or later for optimal performance
- Memory: 8GB unified memory minimum, 16GB+ strongly recommended
- Acceleration: Metal Performance Shaders, unified memory architecture
Apple Silicon Advantages:
- Unified Memory: No separate VRAM limitations, models load faster
- Metal Acceleration: Optimized neural network processing
- Power Efficiency: Excellent performance per watt for battery use
- Thermal Management: Intelligent scaling prevents overheating
💻 Intel Macs (Legacy Support)
- Supported models: MacBook Pro 2016+, iMac 2017+, Mac Pro 2019+, iMac Pro
- Operating system: macOS 10.15 (Catalina) minimum, 11.0+ recommended
- Memory: 16GB RAM minimum for adequate performance
- Graphics: Dedicated GPU recommended (Radeon Pro 560 or better)
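To check which category your Mac falls into before proceeding, these two Terminal commands report the architecture and macOS version:
uname -m                 # arm64 = Apple Silicon, x86_64 = Intel
sw_vers -productVersion  # compare against the minimum macOS versions listed above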
Linux Installation Guide
Advanced installation for Linux distributions with manual compilation and GPU optimization
Supported Linux Distributions
VoxityAI™ has been successfully compiled and tested on the following distributions. While other distributions may work, these are officially verified:
- Ubuntu / Debian family (package managers: apt, snap)
- Red Hat family (package managers: dnf, rpm)
- Arch family (package managers: pacman, AUR)
- Other distributions (may require additional configuration)
Build Dependencies and Prerequisites
Before compiling VoxityAI™, install all required development tools and libraries:
Core Development Tools
Ubuntu/Debian:
sudo apt update && sudo apt upgrade
sudo apt install -y build-essential cmake git curl wget unzip
sudo apt install -y python3.10 python3.10-dev python3.10-venv python3-pip
sudo apt install -y pkg-config libffi-dev libssl-dev zlib1g-dev
sudo apt install -y libbz2-dev libreadline-dev libsqlite3-dev llvm
sudo apt install -y libncurses5-dev libncursesw5-dev xz-utils tk-dev
Fedora/CentOS:
sudo dnf groupinstall -y "Development Tools" "Development Libraries"
sudo dnf install -y cmake git curl wget unzip python3-devel python3-pip
sudo dnf install -y pkgconfig libffi-devel openssl-devel zlib-devel
sudo dnf install -y bzip2-devel readline-devel sqlite-devel llvm-devel
Arch Linux:
sudo pacman -Syu
sudo pacman -S --needed base-devel cmake git curl wget unzip
sudo pacman -S python python-pip pkgconf libffi openssl zlib
Audio System Dependencies
Ubuntu/Debian:
# ALSA and PulseAudio
sudo apt install -y libasound2-dev portaudio19-dev libportaudio2
sudo apt install -y libpulse-dev libjack-jackd2-dev jackd2
sudo apt install -y libsndfile1-dev libfftw3-dev libsamplerate0-dev
# Optional: JACK for professional audio
sudo apt install -y qjackctl jack-tools
Fedora/CentOS:
sudo dnf install -y alsa-lib-devel portaudio-devel
sudo dnf install -y pulseaudio-libs-devel jack-audio-connection-kit-devel
sudo dnf install -y libsndfile-devel fftw-devel libsamplerate-devel
Arch Linux:
sudo pacman -S alsa-lib portaudio pulseaudio jack2
sudo pacman -S libsndfile fftw libsamplerate qjackctl
GPU Acceleration Dependencies
NVIDIA CUDA Support:
# Ubuntu CUDA installation
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-11-8 nvidia-driver-530
# Verify installation
nvidia-smi
nvcc --version
AMD ROCm Support (Ubuntu 22.04):
# Add ROCm repository
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.6 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
sudo apt install -y rocm-dev rocm-libs rocm-utils
# Add user to render group
sudo usermod -a -G render,video $USER
Compilation Process
Step 1: Source Code and Environment Setup
# Clone the VoxityAI repository (W-Okada base)
git clone https://github.com/w-okada/voice-changer.git
cd voice-changer
# Create Python virtual environment
python3.10 -m venv voxityai-env
source voxityai-env/bin/activate
# Upgrade pip and core packages
pip install --upgrade pip wheel setuptools
# Install build dependencies
pip install cython numpy
Step 2: PyTorch Installation (GPU-Specific)
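The packaged builds bundle PyTorch, but a source build needs PyTorch matching your GPU backend. A minimal sketch using the official PyTorch wheel indexes (pick the one line that matches your hardware; CUDA 11.8 and ROCm 5.6 correspond to the driver setups described above):
# NVIDIA GPUs (CUDA 11.8 wheels)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
# AMD GPUs (ROCm 5.6 wheels, Linux only)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
# CPU-only fallback
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
# Verify which backend is active
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"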
Step 3: Install Application Dependencies
# Install Python requirements
pip install -r requirements.txt
# Install additional dependencies for Linux
pip install soundfile librosa pyaudio
# For better performance (optional)
pip install onnxruntime-gpu # For NVIDIA
# OR
pip install onnxruntime # For CPU/other GPUs
Step 4: Audio Backend Configuration
Linux supports multiple audio backends. Choose the appropriate one for your setup:
PulseAudio (Recommended for Desktop)
# Configure PulseAudio for low latency
sudo tee -a /etc/pulse/daemon.conf << EOF
default-sample-rate = 48000
alternate-sample-rate = 44100
default-sample-channels = 2
default-fragments = 2
default-fragment-size-msec = 4
high-priority = yes
nice-level = -15
realtime-scheduling = yes
realtime-priority = 5
EOF
# Restart PulseAudio
systemctl --user restart pulseaudio
JACK (Professional Audio)
# Start JACK with optimized settings
jackd -d alsa -r 48000 -p 256 -n 2 -D -Chw:0,0 -Phw:0,0
# Or use QjackCtl for GUI configuration
# Recommended settings:
# Sample Rate: 48000 Hz
# Frames/Period: 256
# Periods/Buffer: 2
ALSA (Direct Hardware Access)
# List available audio devices
aplay -l
arecord -l
# Configure ALSA for low latency
sudo tee /etc/asound.conf << EOF
pcm.!default {
type pulse
}
ctl.!default {
type pulse
}
EOF
Step 5: Launch VoxityAI™
# Activate virtual environment
source voxityai-env/bin/activate
# Start the main server
python MMVCServerSIO.py --host 0.0.0.0 --port 18888
# In another terminal, verify the web interface
curl http://localhost:18888
Access the VoxityAI™ web interface at http://localhost:18888 in your browser.
Linux-Specific Optimizations
System Performance Tuning
CPU Governor and Power Management
# Set CPU governor to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Make permanent by adding to /etc/rc.local (create the file with a shebang if it does not exist)
sudo tee -a /etc/rc.local << 'EOF'
#!/bin/bash
for governor in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$governor"
done
exit 0
EOF
sudo chmod +x /etc/rc.local
Real-time Priority and Resource Limits
# Add user to audio group for real-time priority
sudo usermod -a -G audio $USER
# Configure resource limits for real-time scheduling
sudo tee -a /etc/security/limits.conf << EOF
@audio - rtprio 95
@audio - memlock unlimited
@audio - nice -10
EOF
# Enable real-time scheduling in PAM
echo "session required pam_limits.so" | sudo tee -a /etc/pam.d/common-session
Kernel Optimization (Advanced)
# Edit GRUB configuration for low-latency kernel
sudo nano /etc/default/grub
# Add these parameters to GRUB_CMDLINE_LINUX_DEFAULT:
# threadirqs processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll
# Example:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash threadirqs processor.max_cstate=1"
# Update GRUB and reboot
sudo update-grub
sudo reboot
Audio System Optimization
PulseAudio Advanced Configuration
# Create custom PulseAudio configuration for VoxityAI™
mkdir -p ~/.config/pulse
# Create low-latency configuration
tee ~/.config/pulse/daemon.conf << EOF
default-sample-rate = 48000
alternate-sample-rate = 44100
default-sample-channels = 2
default-channel-map = front-left,front-right
default-fragments = 2
default-fragment-size-msec = 1
high-priority = yes
nice-level = -15
realtime-scheduling = yes
realtime-priority = 5
rlimit-rtprio = 9
daemonize = no
EOF
JACK Professional Setup
# Install real-time kernel (Ubuntu)
sudo apt install linux-lowlatency
# Configure JACK for maximum performance
# Create ~/.jackdrc with optimal settings:
tee ~/.jackdrc << EOF
/usr/bin/jackd -R -P75 -dalsa -dhw:0,0 -r48000 -p128 -n2 -D -Chw:0,0 -Phw:0,0
EOF
# Start JACK and verify low latency
jack_control start
jack_control status
GPU Optimization
NVIDIA GPU Optimization
# Enable persistence mode for consistent performance
sudo nvidia-smi -pm 1
# Set maximum performance mode
sudo nvidia-smi -ac 877,1455 # Adjust values for your GPU
# Monitor GPU usage during voice processing
watch -n 1 nvidia-smi
AMD GPU Optimization (ROCm)
# Set GPU power and performance profiles
echo "performance" | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
# Monitor AMD GPU usage
watch -n 1 rocm-smi
Linux Troubleshooting
Common Installation Issues
Permission denied errors during compilation
# Ensure user is in necessary groups
sudo usermod -a -G audio,video,render,input $USER
# Fix common permission issues
sudo chmod 666 /dev/nvidia*
sudo chmod 666 /dev/dri/*
# Create udev rules for persistent permissions
sudo tee /etc/udev/rules.d/99-voxityai.rules << EOF
KERNEL=="nvidia*", GROUP="video", MODE="0666"
KERNEL=="card*", GROUP="video", MODE="0666"
EOF
sudo udevadm control --reload-rules
Audio system not detected or high latency
Diagnostic Commands:
# Check audio devices
aplay -l
arecord -l
# Test PulseAudio
pactl list sources short
pactl list sinks short
# Check real-time capabilities
ulimit -r
groups $USER
# Test audio latency
pa-info | grep -i latency
Solutions:
- Install real-time kernel: sudo apt install linux-lowlatency
- Add user to audio group and configure limits as shown above
- Use JACK for professional low-latency audio
- Disable audio power management: echo 0 | sudo tee /sys/module/snd_hda_intel/parameters/power_save
GPU acceleration not working
NVIDIA Troubleshooting:
# Verify NVIDIA driver installation
nvidia-smi
lsmod | grep nvidia
# Check CUDA installation
nvcc --version
python -c "import torch; print(torch.cuda.is_available())"
# Reinstall NVIDIA drivers if needed
sudo apt purge nvidia-*
sudo apt install nvidia-driver-530 nvidia-dkms-530
AMD Troubleshooting:
# Check AMD GPU recognition
lspci | grep -i amd
rocm-smi
# Verify ROCm installation
python -c "import torch; print(torch.version.hip)"
Creating Linux Service (Optional)
For automatic startup and system integration, create a systemd service:
# Create systemd service file
sudo tee /etc/systemd/system/voxityai.service << EOF
[Unit]
Description=VoxityAI Voice Changer Service
After=network.target sound.target
[Service]
Type=simple
User=$USER
WorkingDirectory=/path/to/voice-changer
ExecStart=/path/to/voice-changer/voxityai-env/bin/python MMVCServerSIO.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable voxityai
sudo systemctl start voxityai
# Check service status
sudo systemctl status voxityai
Audio Routing Setup
Professional audio routing and virtual cable configuration for seamless integration
Audio routing is the foundation of professional VoxityAI™ integration. This comprehensive guide covers virtual audio cable setup, application routing, and advanced multi-destination audio workflows for content creators, streamers, and professional users.
Virtual Audio Solutions by Platform
🪟 Windows Virtual Audio Solutions
VB-Audio Virtual Cable
Free. The most popular virtual audio solution with excellent compatibility across applications.
- ✓ Completely free and well-supported
- ✓ Simple installation and configuration
- ✓ Excellent compatibility with Discord, OBS, games
- ✓ Stable and reliable for most users
- ✓ Low CPU overhead
- ✗ Single virtual cable only
- ✗ No built-in mixing capabilities
- ✗ Limited advanced configuration options
- Download from VB-Audio.com
- Run installer as Administrator
- Restart computer after installation
- Configure VoxityAI™ output to "CABLE Input (VB-Audio Virtual Cable)"
- Set Discord/OBS input to "CABLE Output (VB-Audio Virtual Cable)"
VB-Audio Voicemeeter
Free. A complete virtual mixing console with multiple inputs, outputs, and professional audio processing.
- ✓ Full virtual mixing console
- ✓ Multiple virtual inputs/outputs (VAIO)
- ✓ Built-in EQ, compression, and effects
- ✓ Advanced routing and monitoring
- ✓ Professional-grade features
- ✗ Steeper learning curve
- ✗ More complex setup process
- ✗ Higher CPU usage than simple cables
- Voicemeeter: Basic version with 2 hardware + 1 virtual input
- Voicemeeter Banana: 3 hardware + 2 virtual inputs
- Voicemeeter Potato: 5 hardware + 3 virtual inputs (most advanced)
Virtual Audio Cable (Professional)
$25. Professional-grade virtual audio solution with minimal latency and maximum stability.
- ✓ Up to 256 virtual audio cables
- ✓ Ultra-low latency design
- ✓ Professional stability and reliability
- ✓ Advanced configuration options
- ✓ Dedicated technical support
- ✗ Commercial license required
- ✗ More complex than free alternatives
🍎 macOS Virtual Audio Solutions
BlackHole
Free & open source. A modern virtual audio driver designed specifically for macOS with excellent system integration.
- Zero latency virtual audio driver
- Multiple channel configurations (2ch, 16ch, 64ch)
- No trial period or license restrictions
- Active development and macOS compatibility
- Native Apple Silicon and Intel support
- Install via Homebrew (brew install blackhole-2ch) or download the installer from GitHub
- Open Audio MIDI Setup (Applications > Utilities)
- Create Multi-Output Device combining speakers + BlackHole
- Set VoxityAI™ output to BlackHole 2ch
- Configure target applications to use BlackHole as input
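After installation, you can confirm that BlackHole is visible to Core Audio before pointing VoxityAI™ at it; a quick Terminal check:
# List audio devices known to the system and filter for BlackHole
system_profiler SPAudioDataType | grep -A 3 "BlackHole"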
Loopback by Rogue Amoeba
$109. Professional audio routing solution with visual interface and advanced mixing capabilities.
- Visual cable management interface
- Multiple virtual devices with custom configurations
- Real-time audio monitoring and level control
- Session saving and recall
- Advanced routing matrix
- Professional customer support
🐧 Linux Virtual Audio Solutions
PulseAudio Virtual Sinks
Native Linux solution using PulseAudio's built-in virtual device capabilities.
# Create virtual sink for VoxityAI output
pactl load-module module-null-sink sink_name=VoxityAI_Output sink_properties=device.description="VoxityAI_Output"
# Create virtual source from sink monitor
pactl load-module module-virtual-source source_name=VoxityAI_Input master=VoxityAI_Output.monitor source_properties=device.description="VoxityAI_Input"
# Make configuration persistent
echo "load-module module-null-sink sink_name=VoxityAI_Output sink_properties=device.description=\"VoxityAI_Output\"" >> ~/.config/pulse/default.pa
echo "load-module module-virtual-source source_name=VoxityAI_Input master=VoxityAI_Output.monitor source_properties=device.description=\"VoxityAI_Input\"" >> ~/.config/pulse/default.pa
JACK Audio Routing
Professional audio routing system with ultra-low latency and advanced connection management.
# Start JACK with optimal settings
jackd -d alsa -r 48000 -p 256 -n 2
# Connect VoxityAI output to applications using command line
jack_connect VoxityAI:output discord:input
jack_connect VoxityAI:output obs:microphone_input
# Or use QjackCtl for graphical connection management
qjackctl
PipeWire (Modern Alternative)
Next-generation audio server with JACK compatibility and PulseAudio replacement capabilities.
# Install PipeWire (Ubuntu 22.04+)
sudo apt install pipewire pipewire-pulse pipewire-jack
systemctl --user enable pipewire pipewire-pulse
# Create virtual devices using pw-loopback
pw-loopback -P "VoxityAI_Output" -C "VoxityAI_Input"
Application-Specific Integration Guides
Discord Integration
Set up VoxityAI™ for seamless Discord voice chat with real-time voice transformation.
Set up virtual audio cable using your platform's preferred solution (VB-Cable, BlackHole, etc.)
Set VoxityAI™ output device to virtual cable input:
- Windows: "CABLE Input (VB-Audio Virtual Cable)"
- macOS: "BlackHole 2ch"
- Linux: "VoxityAI_Output"
Configure Discord to receive processed audio:
- Open Discord Settings > Voice & Video
- Set Input Device to virtual cable output
- Disable Discord's noise suppression and echo cancellation
- Set Input Sensitivity to manual mode
- Test microphone to verify voice transformation
Configure settings for minimal latency during voice chat:
- Enable Push-to-Talk to avoid background processing
- Use lightweight voice models (Beatrice v2)
- Set chunk size to 256 or lower
- Monitor CPU/GPU usage during extended calls
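On Linux, if Discord does not pick up the virtual source directly, you can move its capture stream with pactl; this sketch assumes the VoxityAI_Input source created in the Linux routing section, with the stream ID looked up first:
# List active recording streams and note Discord's stream ID
pactl list short source-outputs
# Point that stream at the virtual source (replace <stream-id> with the number from above)
pactl move-source-output <stream-id> VoxityAI_Input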
Discord Pro Tips
- Voice Activity: Fine-tune Discord's input sensitivity to match your converted voice levels
- Quality Settings: Use Discord's "High" quality mode for best voice transmission
- Regional Servers: Choose Discord servers geographically close to reduce network latency
- Bandwidth: Ensure stable internet connection for consistent voice quality
OBS Studio Integration
Professional streaming setup with VoxityAI™ for live content creation and broadcasting.
Basic OBS Integration
- Add "Audio Input Capture" source to your scene
- Select virtual cable output as the audio device
- Configure audio monitoring to "Monitor and Output"
- Adjust audio levels in OBS mixer
Advanced Multi-Source Setup
For complex streaming setups with multiple audio sources:
OBS Performance Optimization
- Audio Filters: Add noise gate, compressor, and EQ for professional sound
- Sync Offset: Adjust audio sync if video/audio become misaligned
- Sample Rate: Match OBS sample rate to VoxityAI™ (48kHz recommended)
- Monitoring: Use OBS's advanced audio monitoring for real-time feedback
Video Conferencing (Zoom/Teams/Meet)
Professional voice transformation for business meetings and video conferences.
Zoom Configuration
- Open Zoom Settings > Audio
- Select virtual cable as microphone input
- Disable automatic gain control and noise reduction
- Test audio using Zoom's microphone test feature
- Adjust input volume to optimal levels
Microsoft Teams Configuration
- Access Settings > Devices
- Choose virtual audio device as microphone
- Disable Teams' audio processing features
- Test audio quality in device settings
Google Meet Configuration
- Click Settings gear during meeting
- Select Audio tab
- Choose virtual audio device for microphone
- Verify audio levels are appropriate
Video Conferencing Best Practices
- Quality Testing: Test voice transformation before important meetings
- Backup Plan: Have fallback audio source ready in case of technical issues
- Bandwidth: Monitor network performance during voice processing
- Consistency: Use consistent voice settings throughout meetings
- Transparency: Inform participants about voice modification when appropriate
Gaming Integration
Optimize VoxityAI™ for gaming scenarios with minimal performance impact and seamless voice chat.
Performance-First Gaming Setup
- Use Beatrice v2 models for lowest latency
- Avoid complex RVC models during competitive gaming
- Pre-load models before gaming sessions
- Set VoxityAI™ process priority to "Normal" (not High) during gaming
- Use dedicated GPU cores if available
- Monitor system temperature during extended sessions
Recommended settings:
- Chunk size: 256 samples maximum
- F0 Detector: DIO or Harvest for speed
- Disable unnecessary audio processing
Popular Games Integration
For games using Discord for voice chat, follow Discord integration steps above.
- Set game's microphone input to virtual cable output
- Test voice chat in game settings before playing
- Use Push-to-Talk for better control
- Use server-client setup to offload processing
- Route game audio separately from voice
- Monitor system performance continuously
Advanced Audio Routing Scenarios
Multi-Destination Broadcasting
Route VoxityAI™ output to multiple applications simultaneously with independent level control.
Using Voicemeeter for Multi-Routing:
- Hardware Input 1: Physical microphone
- Virtual Input (VAIO): VoxityAI™ output destination
- Hardware Output A1: Speakers/headphones for monitoring
- Virtual Output B1: Discord/game chat
- Virtual Output B2: OBS/streaming software
- Bus Assignment: Route VAIO to A1+B1+B2 with independent level control
Benefits of this routing:
- Independent volume control for each destination
- Different processing for chat vs. stream
- Backup audio routing for redundancy
- Real-time level monitoring and adjustment
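On Linux, a comparable multi-destination route can be sketched with PulseAudio's combine sink. The sink names below are examples (list yours with pactl list short sinks), and VoxityAI_Output refers to the virtual sink created in the Linux routing section:
# Create one sink that copies audio to both your headphones and the virtual cable
pactl load-module module-combine-sink sink_name=VoxityAI_Multi slaves=alsa_output.pci-0000_00_1f.3.analog-stereo,VoxityAI_Output
# Select "VoxityAI_Multi" as the output device in VoxityAI™; remove it later with: pactl unload-module module-combine-sink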
Professional Studio Integration
Integrate VoxityAI™ into professional recording and broadcast environments.
Hardware Integration Chain:
Professional Considerations:
- Latency Monitoring: Use professional audio interfaces with hardware monitoring
- Backup Systems: Always have non-processed audio backup for live broadcasts
- Quality Control: Monitor output quality continuously during long sessions
- Documentation: Document all routing and settings for session recall
Remote Collaboration Setup
Configure VoxityAI™ for remote recording sessions and collaborative content creation.
Client-Server Voice Processing:
Run VoxityAI™ server on a powerful desktop while using lightweight clients for actual recording/communication:
Processing server (desktop):
- Runs MMVCServerSIO with GPU acceleration
- Hosts voice models and processing engine
- Provides web interface for multiple clients
Client devices:
- Connects to server via web browser
- Handles only audio input/output
- Minimal local processing requirements
Network Optimization:
- Bandwidth: Minimum 1 Mbps upload/download for stable audio streaming
- Latency: Local network preferred, VPN may add latency
- Stability: Wired connection recommended over WiFi
- Security: Use VPN or SSH tunneling for internet-based connections
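For the SSH tunneling option above, a minimal sketch that forwards the server's web interface to the client (user@server.example.com is a placeholder for your own host):
# Forward the remote server's web interface to this machine over SSH
ssh -N -L 18888:localhost:18888 user@server.example.com
# Then open http://localhost:18888 locally as usual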
Audio Routing Troubleshooting
Common Audio Routing Issues
No audio in target application
Systematic Diagnosis:
- Verify VoxityAI™ is outputting to correct virtual device
- Check target application audio input settings
- Test virtual cable with system audio recorder
- Restart VoxityAI™ and target application
- Verify system audio permissions and privacy settings
Platform-Specific Solutions:
Windows:
- Check Windows audio privacy settings
- Verify virtual cable appears in mmsys.cpl
- Restart Windows Audio service
macOS:
- Grant microphone permissions to browser and applications
- Check Audio MIDI Setup for device visibility
- Verify BlackHole installation and version
Linux:
- Check PulseAudio/PipeWire device list
- Verify user audio group membership
- Test with pavucontrol for visual debugging
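On Linux, a quick end-to-end check of the route (complementing the pavucontrol suggestion above) is to loop the virtual source back to your headphones; this assumes the VoxityAI_Input source from the Linux routing section:
# Listen live to what the virtual source carries (Ctrl+C to stop; wear headphones to avoid feedback)
pacat --record --device=VoxityAI_Input | pacat --playback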
Audio feedback and echo issues
Feedback Prevention:
- Always use headphones instead of speakers when possible
- Ensure virtual cable output is not set as system default playback
- Disable "Listen to this device" in Windows sound properties
- Check for audio loopback in virtual audio software settings
- Use directional microphones to reduce ambient pickup
Echo Cancellation:
- Enable VoxityAI™'s built-in echo cancellation
- Adjust microphone positioning and gain levels
- Use acoustic treatment in recording environment
- Configure application-specific echo cancellation settings
High latency in audio routing
Latency Reduction Strategies:
- Reduce VoxityAI™ chunk size and buffer settings
- Use ASIO drivers for professional audio interfaces
- Minimize virtual audio cable buffer sizes
- Close unnecessary audio applications and effects
- Use dedicated audio hardware for critical applications
- Optimize system audio service priority and real-time settings
System-Level Optimizations:
- Set audio-related processes to high priority
- Disable Windows audio enhancements completely
- Use exclusive mode for audio devices when possible
- Configure system for real-time audio performance
Audio quality degradation through routing
Quality Preservation:
- Match sample rates across all audio devices (48kHz recommended)
- Use 24-bit audio depth throughout the signal chain
- Minimize audio processing chain length
- Adjust virtual cable buffer sizes for optimal quality/latency balance
- Update audio drivers to latest versions
- Use lossless audio formats where possible
Signal Chain Optimization:
- Avoid multiple format conversions in the audio path
- Set appropriate gain staging to prevent clipping
- Monitor audio levels throughout the signal chain
- Use professional audio interfaces for critical applications