However, despite its impressive capabilities, W-Okada can be complex to install and configure - especially for users without technical experience.
That's where VoxityAI™ steps in: a streamlined, plug-and-play solution designed to remove the friction and deliver results instantly.
✅ Built around the proven W-Okada core, but wrapped in a minimal and intuitive interface
✅ Ships with a curated selection of voice presets ready to use
✅ No complicated setup or technical steps required
✅ Tailored for English-speaking environments and workflows
You can either spend hours configuring open-source tools — or launch VoxityAI™ in minutes and jump straight into voice-changing.
👉 Get started now at voxityai.com
VoxityAI™ Documentation
Professional real-time voice transformation made simple
Welcome to the comprehensive documentation for VoxityAI™, the premier real-time AI voice changer that transforms your voice instantly for gaming, streaming, content creation, and professional applications. Built on the powerful open-source W-Okada voice changer engine, VoxityAI™ delivers studio-quality voice transformation with unprecedented ease of use.
VoxityAI™ represents a breakthrough in real-time voice conversion technology, utilizing state-of-the-art artificial intelligence models including RVC (Retrieval-based Voice Conversion), Beatrice v2, MMVC, so-vits-svc, and DDSP-SVC. These cutting-edge algorithms enable natural-sounding voice transformations while preserving emotional content, intonation, and speech patterns.
Instant Setup
Launch in minutes with our streamlined installation process - no complex configuration required
Real-Time Processing
Low-latency voice transformation for live applications using advanced AI models, with latencies as low as roughly 50-150ms on recommended hardware
Multiple Voice Models
Support for RVC, Beatrice v2, MMVC, so-vits-svc, and DDSP-SVC voice models
High Performance
GPU acceleration for NVIDIA, AMD, and Intel hardware with CPU fallback support
Privacy First
100% local processing - your voice data never leaves your device
Cross-Platform
Works seamlessly on Windows, macOS, Linux, and Google Colab environments
Core Technology
At its foundation, VoxityAI™ leverages the W-Okada voice changer engine, a revolutionary open-source project that has redefined real-time voice conversion. Unlike traditional pitch shifters or simple voice effects, our system uses sophisticated neural networks to analyze and transform vocal characteristics while maintaining the naturalness and emotional expression of human speech.
The technology supports server-client architecture, allowing users to distribute computational load across multiple devices. This means you can run the voice processing server on a powerful desktop while using the converted voice on a lighter device, perfect for gaming setups where system resources are precious.
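For example, a minimal sketch of this split, assuming the Linux launch command shown later in this guide and using 192.168.1.50 as a placeholder for the server's LAN address:
# On the powerful desktop: start the processing server, listening on all interfaces
python MMVCServerSIO.py --host 0.0.0.0 --port 18888
# On the lighter device: open the server's web interface in any browser,
# e.g. from a Linux client (replace the address with your server's actual one)
xdg-open http://192.168.1.50:18888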
Use Cases and Applications
VoxityAI™ excels across diverse applications where real-time voice transformation enhances user experience and creative expression:
Gaming and Entertainment
Transform your voice to match in-game characters in real-time, creating immersive role-playing experiences. Popular with VTubers and streamers who need consistent character voices across long streaming sessions. The low-latency processing ensures natural conversations without disruptive delays.
Content Creation
Professional voiceover artists use VoxityAI™ to expand their vocal range, creating multiple character voices for animations, podcasts, and audiobooks. The high-quality output rivals traditional studio processing while offering real-time flexibility.
Privacy and Communication
Maintain anonymity during online meetings, gaming sessions, or content creation. Perfect for streamers who want to protect their identity while engaging with audiences, or professionals conducting sensitive discussions.
Language Learning and Accessibility
Simulate different accents for language learning practice, or assist individuals with speech difficulties in achieving their desired vocal expression. The technology preserves speech patterns while transforming vocal characteristics.
Live Streaming and Broadcasting
Seamlessly integrate with OBS Studio, Discord, Twitch, and other platforms. Create engaging content with character voices that respond instantly to your speech, keeping audiences entertained without breaking immersion.
Technical Innovation
VoxityAI™ incorporates several breakthrough technologies that set it apart from conventional voice changers:
- Neural Voice Conversion: Advanced AI models that understand vocal characteristics at a fundamental level, enabling natural transformations that preserve emotional content and speaking style.
- Real-Time Processing: Optimized inference pipelines that deliver voice conversion with minimal latency, suitable for live conversations and interactive applications.
- Cross-Platform Compatibility: Unified codebase that works seamlessly across Windows, macOS, and Linux, with automatic hardware optimization for different GPU vendors.
- Modular Architecture: Support for multiple AI model types allows users to choose the best algorithm for their specific needs, from lightweight real-time models to high-quality offline processing.
- Advanced Audio Processing: Built-in noise suppression, echo cancellation, and audio enhancement features ensure clean, professional output in any environment.
System Requirements
Ensure optimal performance with the right hardware configuration
VoxityAI™ is designed to work across a wide range of hardware configurations, from basic setups suitable for casual use to high-performance systems optimized for professional content creation. Understanding your system's capabilities helps optimize performance and achieve the best possible voice conversion quality.
Minimum Requirements
- Operating System: Windows 10 (64-bit), macOS 10.15+, or Ubuntu 18.04+
- Processor: Intel i5-4590 / AMD FX 8350 equivalent (4+ cores recommended)
- Memory: 8 GB RAM system memory
- Storage: 4 GB available space (additional space for voice models)
- Audio Hardware: Compatible audio input/output device or USB headset
- Network: Internet connection for initial setup and model access
- Browser: Chrome 90+, Firefox 88+, or Safari 14+ for web interface
Expected Performance: 300-800ms latency, suitable for casual use and testing
Recommended Configuration
- Operating System: Windows 11 (64-bit), macOS 12+, or Ubuntu 20.04+
- Processor: Intel i7-8700K / AMD Ryzen 7 2700X or better (8+ cores)
- Memory: 16 GB or more system memory
- Graphics: NVIDIA RTX 2060 / AMD RX 6700 XT / Intel Arc A750 or better
- Storage: 10+ GB available space on SSD (faster model loading)
- Audio Interface: Professional audio interface with low-latency drivers
- Network: Stable broadband connection for model access and updates
Expected Performance: 50-200ms latency, ideal for real-time applications
GPU Acceleration Support
VoxityAI™ leverages GPU acceleration to minimize voice conversion latency and maximize quality. Different GPU architectures offer varying levels of performance optimization:
NVIDIA (CUDA)
Best overall performance with RTX series cards. Requires drivers supporting CUDA 11.7+ and a compatible PyTorch installation. Supports TensorRT optimization for maximum performance.
- RTX 4060/4070/4080/4090 (Latest generation - optimal performance)
- RTX 3060/3070/3080/3090 (Excellent performance with 12GB+ VRAM)
- RTX 2060/2070/2080 (Good performance for most models)
- GTX 1660/1070/1080 (Basic support, limited to smaller models)
- RTX 4090: 30-60ms latency with complex models
- RTX 3070: 80-150ms latency typical
- RTX 2060: 150-250ms latency acceptable
AMD (DirectML/ROCm)
Good performance with RX 6000+ series. DirectML support provides hardware acceleration on Windows, while ROCm enables Linux acceleration.
- RX 7800 XT/7900 XTX (Latest generation with excellent DirectML)
- RX 6700 XT/6800 XT/6900 XT (Proven performance with 12GB+ VRAM)
- RX 580/590/5700 XT (Basic support, adequate for lightweight models)
- Use ONNX model format for best AMD compatibility
- Enable Smart Access Memory for improved performance
- DirectML works best on Windows 11 with latest drivers
Intel (DirectML/XPU)
Emerging support with Intel Arc series and integrated graphics. Excellent power efficiency on laptops with Intel processors.
- Intel Arc A750/A770 (Good acceleration for discrete cards)
- Intel Arc A380 (Basic acceleration, entry-level)
- Intel Iris Xe (Limited acceleration, CPU fallback recommended)
- 12th/13th gen integrated graphics (Basic support)
Apple Silicon (Metal)
Excellent efficiency on M1/M2/M3 Macs with optimized Metal Performance Shaders integration. Unified memory architecture provides unique advantages.
- M3 Pro/Max/Ultra (Exceptional performance, 40-100ms latency)
- M2 Pro/Max/Ultra (Excellent performance, 60-150ms latency)
- M1 Pro/Max/Ultra (Very good performance, 80-200ms latency)
- M1/M2 Base (Good performance for lightweight models)
- No separate VRAM limitation due to unified memory
- Excellent power efficiency for battery-powered use
- Automatic thermal management prevents overheating
Performance Optimization Guidelines
Voice conversion latency and quality depend on several interconnected factors. Understanding these relationships helps optimize your setup for specific use cases:
Hardware Impact on Performance
- High-end GPU (RTX 4080+): 50-100ms typical latency, suitable for professional streaming and content creation
- Mid-range GPU (RTX 3060/RX 6700 XT): 100-200ms typical latency, excellent for gaming and casual streaming
- Entry-level GPU/CPU: 200-500ms typical latency, adequate for offline content creation and testing
- Optimal buffer settings: Chunk size 256-512 samples, Extra buffer 8192-16384 samples
Model Complexity vs. Performance
- Beatrice v2 models: Lightweight, 50-150ms latency, ideal for real-time applications
- RVC models: Balanced quality/performance, 100-300ms latency, most versatile
- MMVC models: Highest quality, 200-500ms latency, best for content creation
- Custom trained models: Performance varies based on architecture and training parameters
Audio Configuration Impact
- Sample Rate: 48kHz recommended for best quality, 44.1kHz for performance
- Buffer Size: Smaller buffers reduce latency but require more processing power
- Bit Depth: 32-bit float recommended for professional use, 16-bit adequate for casual use
- Processing Mode: Server mode generally provides better performance than client mode
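As a rough sanity check for the buffer settings above, you can compute how much audio a given buffer holds; note this is only the buffer duration, not the total conversion latency:
# Buffer duration in milliseconds = samples / sample_rate * 1000
python3 -c "print(f'{512 / 48000 * 1000:.1f} ms of audio in a 512-sample buffer at 48 kHz')"
python3 -c "print(f'{256 / 48000 * 1000:.1f} ms of audio in a 256-sample buffer at 48 kHz')"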
Storage and Network Requirements
While VoxityAI™ has modest storage requirements for the base installation, voice models and user data can accumulate significantly over time:
- Base Installation: 2-4 GB depending on platform and included components
- Voice Models: 100MB-2GB per model, depending on complexity and quality
- Model Cache: 1-5 GB for temporary files and conversion cache
- User Data: Audio recordings, configurations, and custom settings
- Network Usage: Initial model access, updates, and cloud synchronization (optional)
SSD storage is strongly recommended for the main installation and frequently used models, as loading times significantly impact user experience when switching between voice models during live sessions.
Quick Start Guide
Get VoxityAI™ running in under 5 minutes
This quick start guide will have you using VoxityAI™ for real-time voice transformation in just a few minutes. Whether you're a streamer, gamer, or content creator, these steps will get you up and running with professional-quality voice conversion.
Launch VoxityAI™
Start the application from your desktop shortcut or applications menu. VoxityAI™ will automatically initialize the web-based interface in your default browser, typically opening at localhost:18888.
Default URL: http://localhost:18888
First Launch Notes: The initial startup may take 2-3 minutes as VoxityAI™ initializes core components, loads voice models, and optimizes settings for your hardware. This delay only occurs during the first launch.
Configure Audio Devices
Proper audio configuration is crucial for optimal voice conversion. Select your input and output devices in the audio settings panel.
Select Voice Model
VoxityAI™ includes several pre-configured voice models optimized for different use cases. You can also import custom models or access community-created voices.
Model Selection Tips: Start with Beatrice v2 models for lowest latency, then experiment with RVC models for higher quality. Each model type has different performance characteristics and optimal use cases.
Fine-Tune Settings
Adjust voice conversion parameters to achieve your desired sound. Most users can achieve excellent results with minimal adjustments.
Recommended Starting Settings:
- For Gaming: Chunk 256, F0 Detector: RMVPE, Pitch: ±4 semitones
- For Streaming: Chunk 512, F0 Detector: Crepe, Index Ratio: 0.7
- For Content Creation: Chunk 1024, F0 Detector: Crepe-Full, highest quality settings
Ready to Transform!
VoxityAI™ is now processing your voice in real-time. The converted audio can be used with Discord, OBS Studio, Zoom, games, or any application that accepts audio input. Experiment with different models and settings to discover your perfect voice!
Testing Your Setup
Before using VoxityAI™ in your target application, perform these quick tests to ensure everything is working correctly:
Audio Quality Test
- Speak into your microphone and listen to the converted voice through your headphones
- Check for audio artifacts, distortion, or unnatural sounds
- Adjust the input gain if the voice sounds too quiet or distorted
- Test different voice models to find the best match for your voice type
Latency Test
- Count from 1 to 10 while monitoring the converted voice
- If delay is noticeable, reduce chunk size or switch to a lighter model
- For real-time applications, aim for sub-200ms total latency
- Consider your internet connection if using remote processing
Integration Test
- Open your target application (Discord, OBS, game, etc.)
- Select VoxityAI™ output as the microphone input device
- Test voice chat or recording functionality
- Verify that the converted voice is transmitted clearly
Common First-Launch Issues
Browser doesn't open automatically
Manually navigate to http://localhost:18888 in your preferred browser. Ensure no firewall or antivirus software is blocking the connection.
No microphone detected
Check browser permissions for microphone access. On Windows, verify privacy settings allow microphone access for browsers. Restart the browser after granting permissions.
High latency or poor quality
Start with smaller chunk sizes (256) and lighter models (Beatrice v2). Allow GPU drivers time to optimize, and close unnecessary applications consuming system resources.
Voice sounds robotic or unnatural
Reduce pitch adjustment to ±4 semitones maximum. Try different F0 detectors (RMVPE or Crepe). Ensure the voice model matches your gender and vocal range.
Next Steps
Once you have VoxityAI™ working with basic settings, explore these advanced features:
- Custom Models: Import community-created voice models or train your own
- Advanced Audio: Configure noise suppression, echo cancellation, and audio enhancement
- Streaming Integration: Set up professional-grade audio routing for content creation
- Performance Optimization: Fine-tune settings for your specific hardware and use case
- Backup and Sync: Configure settings backup and voice model synchronization
Initial Configuration
Optimize VoxityAI™ settings for your specific setup and use case
After completing the quick start guide, this comprehensive configuration section will help you optimize VoxityAI™ for professional-grade performance. These settings can significantly impact both audio quality and system performance.
Audio System Deep Configuration
Professional audio configuration requires understanding the complete signal chain from microphone input through processing to final output. VoxityAI™ provides extensive control over every aspect of this pipeline.
Input Device Optimization
The quality of your voice conversion starts with clean input signals. Different microphone types require specific optimization approaches:
USB Headsets and Gaming Mics
- Advantages: Plug-and-play convenience, built-in noise cancellation, consistent positioning
- Configuration: Set input gain to 1.0, enable hardware noise suppression if available
- Recommended Models: SteelSeries Arctis, HyperX Cloud series, Audio-Technica ATH-G1
- Optimization Tips: Position 2-3 fingers away from mouth, avoid breathing directly into capsule
Professional Audio Interfaces
- Advantages: Superior audio quality, XLR microphone support, hardware monitoring
- Configuration: Set interface as ASIO device, configure low-latency monitoring
- Recommended Interfaces: Focusrite Scarlett series, PreSonus AudioBox, Behringer UMC series
- Gain Staging: Adjust interface preamp for -12dB to -6dB peaks, avoid clipping
Studio Microphones
- Dynamic Mics: Excellent noise rejection, require higher gain, ideal for noisy environments
- Condenser Mics: Superior sensitivity and detail, require phantom power, best in treated rooms
- Positioning: Maintain consistent distance, use pop filters, consider room acoustics
- Popular Choices: Shure SM7B (dynamic), Audio-Technica AT2020 (condenser)
Processing Mode Selection
VoxityAI™ offers multiple processing architectures optimized for different use cases and hardware configurations:
Server Mode (Recommended)
Processes audio on the dedicated server instance, providing optimal performance and resource management.
Best For: Gaming setups, streaming, professional use
Client Mode
Processes audio locally within the browser or client application, useful for distributed setups.
Best For: Portable setups, privacy-focused use, offline processing
AI Model Configuration
Understanding the characteristics and optimal use cases for different AI model types enables you to choose the best algorithm for your specific needs:
RVC (Retrieval-based Voice Conversion)
The most versatile and widely-used voice conversion technology, offering excellent balance between quality and performance.
- Character voice acting and content creation
- Gender transformation with natural results
- Celebrity voice impressions and parodies
- Language learning and accent modification
- Use RMVPE f0 detector for best pitch accuracy
- Index ratio 0.6-0.8 balances source and target characteristics
- Enable voice activity detection to reduce artifacts during silence
- Adjust transpose parameter gradually (±1 semitone increments)
Beatrice v2
Lightweight model specifically engineered for real-time applications where low latency is critical.
- Real-time gaming communication
- Live streaming with minimal delay
- Interactive VR/AR applications
- Low-resource hardware setups
- Works optimally with smaller chunk sizes (128-256)
- Less sensitive to pitch adjustments than RVC models
- Performs well with basic f0 detectors (DIO, Harvest)
- Enable real-time optimization for lowest latency
MMVC (Many-to-Many Voice Conversion)
High-quality conversion model that excels at preserving emotional nuance and speaking patterns.
- Professional voiceover and dubbing
- High-quality content creation
- Audiobook and podcast production
- Film and animation voice work
so-vits-svc
Specialized architecture for singing voice conversion, maintaining musical elements while transforming vocal timbre.
- AI music covers and vocal synthesis
- Singing voice transformation
- Musical content creation
- Vocal range extension for singers
Performance Optimization
Fine-tuning performance parameters allows you to achieve the optimal balance between audio quality, latency, and system resource usage:
Chunk Size Configuration
The chunk size parameter directly controls the trade-off between processing latency and audio quality:
Small chunks (≈128-256 samples):
- Minimum latency (50-100ms additional)
- Higher CPU/GPU usage
- May reduce quality on slower hardware
- Best for: Real-time gaming, live interaction
Medium chunks (≈512 samples):
- Balanced latency (100-200ms additional)
- Optimal for most use cases
- Good quality/performance ratio
- Best for: Streaming, casual content creation
Large chunks (≈1024+ samples):
- Highest quality processing
- Increased latency (200-400ms)
- Better for complex transformations
- Best for: Professional content, offline processing
Extra Buffer Management
The extra buffer provides additional processing headroom and can prevent audio artifacts:
- 4096-8192: Minimal buffering for real-time applications, may cause dropouts under load
- 8192-16384: Standard buffering, good balance for most users and hardware
- 16384-32768: Maximum stability, prevents artifacts but increases memory usage
- Auto-adjustment: VoxityAI™ can automatically optimize buffer size based on system performance
F0 Detector Selection
Pitch detection algorithms vary significantly in quality, performance, and resource requirements:
RMVPE:
- Highest accuracy for pitch detection
- Excellent handling of vibrato and pitch bends
- Moderate CPU usage, GPU acceleration available
- Best for: Professional use, singing, complex speech patterns
Crepe:
- Very high quality, especially for singing
- Excellent noise robustness
- Higher CPU/GPU usage than RMVPE
- Best for: Music applications, noisy environments
Harvest:
- Good balance of quality and performance
- Works well on older hardware
- Reliable for most speech applications
- Best for: General use, resource-constrained systems
DIO:
- Fastest processing speed
- Lowest resource usage
- Basic quality suitable for simple transformations
- Best for: Real-time applications on limited hardware
Advanced Audio Processing Features
VoxityAI™ includes sophisticated audio processing capabilities that can significantly enhance voice conversion quality:
Noise Suppression Systems
Multiple noise reduction algorithms work together to provide clean input signals:
Spectral noise reduction:
- Analyzes frequency spectrum to identify and reduce constant background noise
- Effective against: Air conditioning, computer fans, electrical hum
- Settings: Mild (preserve naturalness) to Aggressive (maximum reduction)
AI-based noise suppression:
- Machine learning algorithms distinguish between voice and noise
- Effective against: Keyboard typing, mouse clicks, intermittent sounds
- Adapts in real-time to changing acoustic environments
Echo cancellation:
- Removes acoustic feedback and room reflections
- Essential for speaker-based monitoring setups
- Automatically adapts to room acoustics and speaker positioning
Dynamic Range and Level Control
Precise control over audio levels ensures optimal voice conversion and prevents artifacts:
- Input Gain (0.1-3.0): Amplify or attenuate microphone signal before processing
- Output Gain (0.1-5.0): Control converted voice volume for optimal integration
- Silence Threshold: Minimum volume level required to trigger voice conversion
- Voice Activity Detection: Intelligent detection of speech vs. silence periods
- Auto-Leveling: Automatic gain control to maintain consistent output levels
- Peak Limiting: Prevents audio clipping and distortion in loud passages
Sample Rate and Quality Settings
Audio quality parameters that balance fidelity with performance requirements:
- 48kHz/32-bit float: Professional quality, highest fidelity, increased processing load
- 44.1kHz/24-bit: High quality, good balance for most applications
- 22kHz/16-bit: Basic quality, minimal processing requirements
- Adaptive Quality: Automatically adjusts based on system performance and model requirements
Windows Installation Guide
Complete setup guide for Windows 10/11 with GPU optimization and professional configuration
Windows installation of VoxityAI™ offers the most comprehensive feature set and best performance optimization. This guide covers everything from basic installation to advanced GPU configuration and professional audio setup.
Pre-Installation System Preparation
Proper system preparation ensures smooth installation and optimal performance. Complete these steps before installing VoxityAI™:
Essential System Components
GPU-Optimized Installation Paths
VoxityAI™ provides hardware-specific builds to maximize performance on different GPU architectures. Choose the build that matches your system configuration:
🟢 NVIDIA GPU Installation (CUDA)
Recommended. Optimal for: RTX 20/30/40 series, GTX 16 series, and professional NVIDIA GPUs with 4GB+ VRAM
Hardware Requirements
- NVIDIA GPU with Compute Capability 6.0+ (GTX 1060 or newer)
- NVIDIA Game Ready or Studio Drivers 472.12+
- CUDA Toolkit 11.7/11.8 (included with VoxityAI™)
- 4GB VRAM minimum, 8GB+ recommended for large models
- PCIe 3.0 x16 slot for optimal bandwidth
Installation Process
- Download vcclient_win_cuda_[version].zip (~3.7GB). This package includes optimized PyTorch with CUDA 11.8 support and cuDNN libraries.
- Extract the archive to the recommended location: C:\VoxityAI\ or D:\Software\VoxityAI\
- Run start_http.bat as Administrator. First launch installs CUDA runtime and dependencies (~500MB additional).
- Confirm GPU detection in the startup log:
✓ CUDA device detected: GeForce RTX 4070 Ti (12GB VRAM)
✓ cuDNN initialized successfully
✓ PyTorch CUDA backend ready
Expected Performance (CUDA)
- RTX 4090: 30-60ms latency, handles any model size
- RTX 4070/4080: 50-100ms latency, excellent for all use cases
- RTX 3070/3080: 80-150ms latency, very good performance
- RTX 2060/2070: 120-250ms latency, good for streaming
🔴 AMD GPU Installation (DirectML)
Optimal for: RX 5000/6000/7000 series AMD GPUs with DirectML acceleration support
Hardware Requirements
- AMD GPU with DirectML support (RX 580 or newer)
- AMD Adrenalin drivers 22.5.1 or newer
- Windows 10 version 1903+ (DirectML framework requirement)
- 6GB VRAM minimum for optimal performance
- Smart Access Memory enabled (if supported)
Installation Process
- Download vcclient_win_std_[version].zip. Includes DirectML runtime and ONNX optimization for AMD hardware.
- Enable DirectML support via Windows Features → Machine Learning Platform.
- Prefer ONNX-format models: ONNX models provide 20-40% better performance on AMD GPUs.
AMD-Specific Optimizations
- ONNX Conversion: Convert PyTorch models to ONNX for better performance
- Memory Management: Enable GPU memory optimization in AMD drivers
- Power Settings: Set GPU power limit to maximum for consistent performance
- Compute Workloads: Enable GPU compute optimizations in Adrenalin software
🔵 Intel GPU Installation (XPU/DirectML)
Beta support. Optimal for: Intel Arc discrete GPUs and 12th/13th gen integrated graphics
Hardware Requirements
- Intel Arc A-series GPU or 11th gen+ CPU with Iris Xe
- Intel Graphics drivers 30.0.101.1404 or newer
- Intel XPU runtime and oneAPI toolkit (auto-installed)
- 4GB+ system memory allocated to integrated graphics
⚠️ Beta Status Notice
Intel GPU support is experimental and may have stability issues. CPU fallback is automatically used if GPU acceleration fails. Performance varies significantly between different Intel architectures.
Expected Performance (Intel)
- Arc A770: 150-300ms latency, decent for lightweight models
- Arc A750: 200-400ms latency, basic voice conversion
- Iris Xe: CPU fallback recommended for real-time use
⚫ CPU-Only Installation
Universal compatibility for systems without supported GPU acceleration
Install the standard build and VoxityAI™ automatically configures CPU-based processing. While slower than GPU acceleration, modern multi-core processors can achieve acceptable performance for many voice conversion tasks.
CPU Performance Optimization
- Thread Allocation: VoxityAI™ uses 50-75% of available CPU cores
- Priority Settings: Set process priority to "High" for better responsiveness
- Power Management: Use "High Performance" power plan during processing
- Background Apps: Close unnecessary applications to free CPU resources
Performance Expectations (CPU-Only)
- High-end CPU (i9/Ryzen 9): 300-600ms latency, suitable for content creation
- Mid-range CPU (i7/Ryzen 7): 500-1000ms latency, adequate for offline processing
- Entry-level CPU (i5/Ryzen 5): 800ms+ latency, basic functionality only
Post-Installation Configuration
After successful installation, these configuration steps optimize VoxityAI™ for professional use:
System Security and Permissions
Windows Firewall Configuration
VoxityAI™ operates a local web server on port 18888. Configure firewall rules:
Windows Security → Firewall & network protection → Allow an app through firewall
Add: MMVCServerSIO.exe (both Private and Public networks)
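If you prefer the command line, equivalent rules can be added with netsh from an elevated Command Prompt; this is a sketch that assumes the default install path C:\VoxityAI\ suggested earlier in this guide:
netsh advfirewall firewall add rule name="VoxityAI Server" dir=in action=allow program="C:\VoxityAI\MMVCServerSIO.exe" enable=yes
netsh advfirewall firewall add rule name="VoxityAI Port 18888" dir=in action=allow protocol=TCP localport=18888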
Antivirus Exclusions
Add VoxityAI™ directory to antivirus exclusions to prevent interference:
- Exclude entire VoxityAI™ installation folder
- Exclude voice model directories
- Exclude temporary processing folders
- Allow network connections for localhost:18888
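For Microsoft Defender specifically, the folder exclusions listed above can be added from an elevated PowerShell prompt; the paths below are examples and should match your actual installation and model directories:
Add-MpPreference -ExclusionPath "C:\VoxityAI"
Add-MpPreference -ExclusionPath "C:\VoxityAI\model_dir"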
Browser Permissions
Configure browser settings for optimal functionality:
- Allow microphone access for localhost
- Enable hardware acceleration in browser settings
- Disable browser audio processing/enhancement
- Set localhost as trusted site for automatic media playback
Audio System Optimization
Windows Audio Configuration
Optimize Windows audio settings for low-latency performance:
Run → mmsys.cpl → Advanced → Default Format: 48000 Hz, 24-bit
Exclusive Mode: Allow applications to take exclusive control
Disable all audio enhancements and effects
ASIO Driver Support
For professional audio interfaces, install appropriate ASIO drivers:
- Interface-specific ASIO: Use manufacturer drivers for best performance
- ASIO4ALL: Universal ASIO driver for consumer hardware
- Buffer Settings: 128-256 samples for low latency
- Sample Rate: Match VoxityAI™ settings (48kHz recommended)
Virtual Audio Cable Setup
Configure virtual audio routing for application integration:
Performance Tuning
Windows Power Management
- Set power plan to "High Performance" or "Ultimate Performance"
- Disable CPU power management (prevent frequency scaling)
- Set GPU power management to "Prefer maximum performance"
- Disable Windows Game Mode (can interfere with audio processing)
Process Priority Optimization
Task Manager → Details → MMVCServerSIO.exe → Set Priority: High
Set Affinity: Use specific CPU cores for dedicated processing
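As an alternative to the Task Manager steps above, the same priority change can be scripted in an elevated PowerShell session (the process name is assumed from the executable shipped with VoxityAI™):
# Raise the server process priority to High
Get-Process MMVCServerSIO | ForEach-Object { $_.PriorityClass = 'High' }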
GPU Driver Optimization
- NVIDIA: Enable "Prefer maximum performance" in NVIDIA Control Panel
- AMD: Set GPU power limit to maximum in Adrenalin software
- Intel: Enable GPU compute scheduling in Windows settings
- All GPUs: Disable power-saving features during voice processing
Windows-Specific Troubleshooting
Installation and Startup Issues
Application fails to start or crashes immediately
Diagnostic Steps:
- Run start_http.bat as Administrator
- Check Windows Event Viewer for application errors
- Verify all prerequisite software is installed
- Temporarily disable antivirus and Windows Defender
- Ensure no other software is using port 18888
Common Solutions:
- Install both x86 and x64 versions of Visual C++ Redistributable
- Update Windows to latest version
- Run Windows System File Checker: sfc /scannow
- Clear Windows audio device cache and restart audio services
CUDA/GPU acceleration not working
NVIDIA GPU Troubleshooting:
- Verify GPU compatibility: run nvidia-smi in Command Prompt
- Update to latest Game Ready or Studio drivers
- Check CUDA installation: nvcc --version
- Monitor GPU usage during voice processing
- Verify sufficient VRAM availability
AMD GPU Troubleshooting:
- Ensure DirectML is enabled in Windows Features
- Update to latest Adrenalin drivers
- Check GPU recognition in Device Manager
- Enable GPU scheduling in Windows Graphics settings
- Convert models to ONNX format for better compatibility
High latency despite powerful hardware
System-Level Optimizations:
- Set Windows Timer Resolution to 1ms using TimerTool
- Disable Windows audio enhancements completely
- Use dedicated audio hardware with ASIO drivers
- Reduce Windows DPC latency using LatencyMon tool
- Disable Windows Update during voice processing sessions
Application-Level Optimizations:
- Use Server mode instead of Client mode processing
- Reduce chunk size to 128-256 samples
- Switch to lighter F0 detector (DIO or Harvest)
- Enable GPU memory optimization
- Close unnecessary browser tabs and applications
Audio and Performance Issues
No audio output or silent processing
Audio System Diagnosis:
- Test microphone in Windows Sound settings
- Verify VoxityAI™ input/output device selection
- Check audio format compatibility (48kHz, 24-bit recommended)
- Disable exclusive mode temporarily
- Test with different audio devices
Voice quality issues (robotic, distorted, unnatural)
Quality Optimization Steps:
- Reduce pitch adjustment to ±4 semitones maximum
- Match voice model to your vocal characteristics
- Use RMVPE or Crepe f0 detector for better pitch accuracy
- Adjust Index ratio between 0.6-0.8 for natural blending
- Ensure consistent microphone positioning and distance
- Enable noise suppression if recording in noisy environment
macOS Installation Guide
Complete setup for Apple Silicon and Intel Macs with optimization tips
macOS installation provides excellent performance on Apple Silicon processors and good compatibility with Intel-based Macs. This guide covers both architectures with specific optimizations for each platform.
macOS Compatibility Overview
🍎 Apple Silicon Macs (Highly Recommended)
- Supported chips: M1, M1 Pro, M1 Max, M1 Ultra, M2, M2 Pro, M2 Max, M3, M3 Pro, M3 Max
- Operating system: macOS 12.0 (Monterey) or later for optimal performance
- Memory: 8GB unified memory minimum, 16GB+ strongly recommended
- Acceleration: Metal Performance Shaders, unified memory architecture
Apple Silicon Advantages:
- Unified Memory: No separate VRAM limitations, models load faster
- Metal Acceleration: Optimized neural network processing
- Power Efficiency: Excellent performance per watt for battery use
- Thermal Management: Intelligent scaling prevents overheating
💻 Intel Macs (Legacy Support)
- Supported models: MacBook Pro 2016+, iMac 2017+, Mac Pro 2019+, iMac Pro
- Operating system: macOS 10.15 (Catalina) minimum, 11.0+ recommended
- Memory: 16GB RAM minimum for adequate performance
- Graphics: Dedicated GPU recommended (Radeon Pro 560 or better)
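To check which category your Mac falls into before proceeding, these two Terminal commands report the architecture and macOS version:
uname -m                 # arm64 = Apple Silicon, x86_64 = Intel
sw_vers -productVersion  # compare against the minimum macOS versions listed above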
Linux Installation Guide
Advanced installation for Linux distributions with manual compilation and GPU optimization
Supported Linux Distributions
VoxityAI™ has been successfully compiled and tested on the following distributions. While other distributions may work, these are officially verified:
- Ubuntu / Debian family (package managers: apt, snap)
- Red Hat family (package managers: dnf, rpm)
- Arch family (package managers: pacman, AUR)
- Other distributions (may require additional configuration)
Build Dependencies and Prerequisites
Before compiling VoxityAI™, install all required development tools and libraries:
Core Development Tools
Ubuntu/Debian:
sudo apt update && sudo apt upgrade
sudo apt install -y build-essential cmake git curl wget unzip
sudo apt install -y python3.10 python3.10-dev python3.10-venv python3-pip
sudo apt install -y pkg-config libffi-dev libssl-dev zlib1g-dev
sudo apt install -y libbz2-dev libreadline-dev libsqlite3-dev llvm
sudo apt install -y libncurses5-dev libncursesw5-dev xz-utils tk-dev
Fedora/CentOS:
sudo dnf groupinstall -y "Development Tools" "Development Libraries"
sudo dnf install -y cmake git curl wget unzip python3-devel python3-pip
sudo dnf install -y pkgconfig libffi-devel openssl-devel zlib-devel
sudo dnf install -y bzip2-devel readline-devel sqlite-devel llvm-devel
Arch Linux:
sudo pacman -Syu
sudo pacman -S --needed base-devel cmake git curl wget unzip
sudo pacman -S python python-pip pkgconf libffi openssl zlib
Audio System Dependencies
Ubuntu/Debian:
# ALSA and PulseAudio
sudo apt install -y libasound2-dev portaudio19-dev libportaudio2
sudo apt install -y libpulse-dev libjack-jackd2-dev jackd2
sudo apt install -y libsndfile1-dev libfftw3-dev libsamplerate0-dev
# Optional: JACK for professional audio
sudo apt install -y qjackctl jack-tools
Fedora/CentOS:
sudo dnf install -y alsa-lib-devel portaudio-devel
sudo dnf install -y pulseaudio-libs-devel jack-audio-connection-kit-devel
sudo dnf install -y libsndfile-devel fftw-devel libsamplerate-devel
Arch Linux:
sudo pacman -S alsa-lib portaudio pulseaudio jack2
sudo pacman -S libsndfile fftw libsamplerate qjackctl
GPU Acceleration Dependencies
NVIDIA CUDA Support:
# Ubuntu CUDA installation
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-11-8 nvidia-driver-530
# Verify installation
nvidia-smi
nvcc --version
AMD ROCm Support (Ubuntu 22.04):
# Add ROCm repository
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.6 ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
sudo apt install -y rocm-dev rocm-libs rocm-utils
# Add user to render group
sudo usermod -a -G render,video $USER
Compilation Process
Step 1: Source Code and Environment Setup
# Clone the VoxityAI repository (W-Okada base)
git clone https://github.com/w-okada/voice-changer.git
cd voice-changer
# Create Python virtual environment
python3.10 -m venv voxityai-env
source voxityai-env/bin/activate
# Upgrade pip and core packages
pip install --upgrade pip wheel setuptools
# Install build dependencies
pip install cython numpy
Step 2: PyTorch Installation (GPU-Specific)
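The packaged builds bundle PyTorch, but a source build needs PyTorch matching your GPU backend. A minimal sketch using the official PyTorch wheel indexes (pick the one line that matches your hardware; CUDA 11.8 and ROCm 5.6 correspond to the driver setups described above):
# NVIDIA GPUs (CUDA 11.8 wheels)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
# AMD GPUs (ROCm 5.6 wheels, Linux only)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
# CPU-only fallback
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
# Verify which backend is active
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"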
Step 3: Install Application Dependencies
# Install Python requirements
pip install -r requirements.txt
# Install additional dependencies for Linux
pip install soundfile librosa pyaudio
# For better performance (optional)
pip install onnxruntime-gpu # For NVIDIA
# OR
pip install onnxruntime # For CPU/other GPUs
Step 4: Audio Backend Configuration
Linux supports multiple audio backends. Choose the appropriate one for your setup:
PulseAudio (Recommended for Desktop)
# Configure PulseAudio for low latency
sudo tee -a /etc/pulse/daemon.conf << EOF
default-sample-rate = 48000
alternate-sample-rate = 44100
default-sample-channels = 2
default-fragments = 2
default-fragment-size-msec = 4
high-priority = yes
nice-level = -15
realtime-scheduling = yes
realtime-priority = 5
EOF
# Restart PulseAudio
systemctl --user restart pulseaudio
JACK (Professional Audio)
# Start JACK with optimized settings
jackd -d alsa -r 48000 -p 256 -n 2 -D -Chw:0,0 -Phw:0,0
# Or use QjackCtl for GUI configuration
# Recommended settings:
# Sample Rate: 48000 Hz
# Frames/Period: 256
# Periods/Buffer: 2
ALSA (Direct Hardware Access)
# List available audio devices
aplay -l
arecord -l
# Configure ALSA for low latency
sudo tee /etc/asound.conf << EOF
pcm.!default {
type pulse
}
ctl.!default {
type pulse
}
EOF
Step 5: Launch VoxityAI™
# Activate virtual environment
source voxityai-env/bin/activate
# Start the main server
python MMVCServerSIO.py --host 0.0.0.0 --port 18888
# In another terminal, verify the web interface
curl http://localhost:18888
Access the VoxityAI™ web interface at http://localhost:18888 in your browser.
Linux-Specific Optimizations
System Performance Tuning
CPU Governor and Power Management
# Set CPU governor to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Make permanent by adding to /etc/rc.local (create the file with a shebang if it does not exist)
sudo tee -a /etc/rc.local << 'EOF'
#!/bin/bash
for governor in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$governor"
done
exit 0
EOF
sudo chmod +x /etc/rc.local
Real-time Priority and Resource Limits
# Add user to audio group for real-time priority
sudo usermod -a -G audio $USER
# Configure resource limits for real-time scheduling
sudo tee -a /etc/security/limits.conf << EOF
@audio - rtprio 95
@audio - memlock unlimited
@audio - nice -10
EOF
# Enable real-time scheduling in PAM
echo "session required pam_limits.so" | sudo tee -a /etc/pam.d/common-session
Kernel Optimization (Advanced)
# Edit GRUB configuration for low-latency kernel
sudo nano /etc/default/grub
# Add these parameters to GRUB_CMDLINE_LINUX_DEFAULT:
# threadirqs processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll
# Example:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash threadirqs processor.max_cstate=1"
# Update GRUB and reboot
sudo update-grub
sudo reboot
Audio System Optimization
PulseAudio Advanced Configuration
# Create custom PulseAudio configuration for VoxityAI™
mkdir -p ~/.config/pulse
# Create low-latency configuration
tee ~/.config/pulse/daemon.conf << EOF
default-sample-rate = 48000
alternate-sample-rate = 44100
default-sample-channels = 2
default-channel-map = front-left,front-right
default-fragments = 2
default-fragment-size-msec = 1
high-priority = yes
nice-level = -15
realtime-scheduling = yes
realtime-priority = 5
rlimit-rtprio = 9
daemonize = no
EOF
JACK Professional Setup
# Install real-time kernel (Ubuntu)
sudo apt install linux-lowlatency
# Configure JACK for maximum performance
# Create ~/.jackdrc with optimal settings:
tee ~/.jackdrc << EOF
/usr/bin/jackd -R -P75 -dalsa -dhw:0,0 -r48000 -p128 -n2 -D -Chw:0,0 -Phw:0,0
EOF
# Start JACK and verify low latency
jack_control start
jack_control status
GPU Optimization
NVIDIA GPU Optimization
# Enable persistence mode for consistent performance
sudo nvidia-smi -pm 1
# Set maximum performance mode
sudo nvidia-smi -ac 877,1455 # Adjust values for your GPU
# Monitor GPU usage during voice processing
watch -n 1 nvidia-smi
AMD GPU Optimization (ROCm)
# Set GPU power and performance profiles
echo "performance" | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
# Monitor AMD GPU usage
watch -n 1 rocm-smi
Linux Troubleshooting
Common Installation Issues
Permission denied errors during compilation
# Ensure user is in necessary groups
sudo usermod -a -G audio,video,render,input $USER
# Fix common permission issues
sudo chmod 666 /dev/nvidia*
sudo chmod 666 /dev/dri/*
# Create udev rules for persistent permissions
sudo tee /etc/udev/rules.d/99-voxityai.rules << EOF
KERNEL=="nvidia*", GROUP="video", MODE="0666"
KERNEL=="card*", GROUP="video", MODE="0666"
EOF
sudo udevadm control --reload-rules
Audio system not detected or high latency
Diagnostic Commands:
# Check audio devices
aplay -l
arecord -l
# Test PulseAudio
pactl list sources short
pactl list sinks short
# Check real-time capabilities
ulimit -r
groups $USER
# Test audio latency
pa-info | grep -i latency
Solutions:
- Install real-time kernel: sudo apt install linux-lowlatency
- Add user to audio group and configure limits as shown above
- Use JACK for professional low-latency audio
- Disable audio power management: echo 0 | sudo tee /sys/module/snd_hda_intel/parameters/power_save
GPU acceleration not working
NVIDIA Troubleshooting:
# Verify NVIDIA driver installation
nvidia-smi
lsmod | grep nvidia
# Check CUDA installation
nvcc --version
python -c "import torch; print(torch.cuda.is_available())"
# Reinstall NVIDIA drivers if needed
sudo apt purge nvidia-*
sudo apt install nvidia-driver-530 nvidia-dkms-530
AMD Troubleshooting:
# Check AMD GPU recognition
lspci | grep -i amd
rocm-smi
# Verify ROCm installation
python -c "import torch; print(torch.version.hip)"
Creating Linux Service (Optional)
For automatic startup and system integration, create a systemd service:
# Create systemd service file
sudo tee /etc/systemd/system/voxityai.service << EOF
[Unit]
Description=VoxityAI Voice Changer Service
After=network.target sound.target
[Service]
Type=simple
User=$USER
WorkingDirectory=/path/to/voice-changer
ExecStart=/path/to/voice-changer/voxityai-env/bin/python MMVCServerSIO.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable voxityai
sudo systemctl start voxityai
# Check service status
sudo systemctl status voxityai
Audio Routing Setup
Professional audio routing and virtual cable configuration for seamless integration
Audio routing is the foundation of professional VoxityAI™ integration. This comprehensive guide covers virtual audio cable setup, application routing, and advanced multi-destination audio workflows for content creators, streamers, and professional users.
Virtual Audio Solutions by Platform
🪟 Windows Virtual Audio Solutions
VB-Audio Virtual Cable
Free. The most popular virtual audio solution with excellent compatibility across applications.
- ✓ Completely free and well-supported
- ✓ Simple installation and configuration
- ✓ Excellent compatibility with Discord, OBS, games
- ✓ Stable and reliable for most users
- ✓ Low CPU overhead
- ✗ Single virtual cable only
- ✗ No built-in mixing capabilities
- ✗ Limited advanced configuration options
- Download from VB-Audio.com
- Run installer as Administrator
- Restart computer after installation
- Configure VoxityAI™ output to "CABLE Input (VB-Audio Virtual Cable)"
- Set Discord/OBS input to "CABLE Output (VB-Audio Virtual Cable)"
VB-Audio Voicemeeter
Free. A complete virtual mixing console with multiple inputs, outputs, and professional audio processing.
- ✓ Full virtual mixing console
- ✓ Multiple virtual inputs/outputs (VAIO)
- ✓ Built-in EQ, compression, and effects
- ✓ Advanced routing and monitoring
- ✓ Professional-grade features
- ✗ Steeper learning curve
- ✗ More complex setup process
- ✗ Higher CPU usage than simple cables
- Voicemeeter: Basic version with 2 hardware + 1 virtual input
- Voicemeeter Banana: 3 hardware + 2 virtual inputs
- Voicemeeter Potato: 5 hardware + 3 virtual inputs (most advanced)
Virtual Audio Cable (Professional)
$25. Professional-grade virtual audio solution with minimal latency and maximum stability.
- ✓ Up to 256 virtual audio cables
- ✓ Ultra-low latency design
- ✓ Professional stability and reliability
- ✓ Advanced configuration options
- ✓ Dedicated technical support
- ✗ Commercial license required
- ✗ More complex than free alternatives
🍎 macOS Virtual Audio Solutions
BlackHole
Free & open source. A modern virtual audio driver designed specifically for macOS with excellent system integration.
- Zero latency virtual audio driver
- Multiple channel configurations (2ch, 16ch, 64ch)
- No trial period or license restrictions
- Active development and macOS compatibility
- Native Apple Silicon and Intel support
- Install via Homebrew (brew install blackhole-2ch) or download the installer from GitHub
- Open Audio MIDI Setup (Applications > Utilities)
- Create Multi-Output Device combining speakers + BlackHole
- Set VoxityAI™ output to BlackHole 2ch
- Configure target applications to use BlackHole as input
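After installation, you can confirm that BlackHole is visible to Core Audio before pointing VoxityAI™ at it; a quick Terminal check:
# List audio devices known to the system and filter for BlackHole
system_profiler SPAudioDataType | grep -A 3 "BlackHole"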
Loopback by Rogue Amoeba
$109. Professional audio routing solution with visual interface and advanced mixing capabilities.
- Visual cable management interface
- Multiple virtual devices with custom configurations
- Real-time audio monitoring and level control
- Session saving and recall
- Advanced routing matrix
- Professional customer support
🐧 Linux Virtual Audio Solutions
PulseAudio Virtual Sinks
Native Linux solution using PulseAudio's built-in virtual device capabilities.
# Create virtual sink for VoxityAI output
pactl load-module module-null-sink sink_name=VoxityAI_Output sink_properties=device.description="VoxityAI_Output"
# Create virtual source from sink monitor
pactl load-module module-virtual-source source_name=VoxityAI_Input master=VoxityAI_Output.monitor source_properties=device.description="VoxityAI_Input"
# Make configuration persistent
echo "load-module module-null-sink sink_name=VoxityAI_Output sink_properties=device.description=\"VoxityAI_Output\"" >> ~/.config/pulse/default.pa
echo "load-module module-virtual-source source_name=VoxityAI_Input master=VoxityAI_Output.monitor source_properties=device.description=\"VoxityAI_Input\"" >> ~/.config/pulse/default.pa
JACK Audio Routing
Professional audio routing system with ultra-low latency and advanced connection management.
# Start JACK with optimal settings
jackd -d alsa -r 48000 -p 256 -n 2
# Connect VoxityAI output to applications using command line
jack_connect VoxityAI:output discord:input
jack_connect VoxityAI:output obs:microphone_input
# Or use QjackCtl for graphical connection management
qjackctl
PipeWire (Modern Alternative)
Next-generation audio server with JACK compatibility and PulseAudio replacement capabilities.
# Install PipeWire (Ubuntu 22.04+)
sudo apt install pipewire pipewire-pulse pipewire-jack
systemctl --user enable pipewire pipewire-pulse
# Create virtual devices using pw-loopback
pw-loopback -P "VoxityAI_Output" -C "VoxityAI_Input"
Application-Specific Integration Guides
Discord Integration
Set up VoxityAI™ for seamless Discord voice chat with real-time voice transformation.
Set up virtual audio cable using your platform's preferred solution (VB-Cable, BlackHole, etc.)
Set VoxityAI™ output device to virtual cable input:
- Windows: "CABLE Input (VB-Audio Virtual Cable)"
- macOS: "BlackHole 2ch"
- Linux: "VoxityAI_Output"
Configure Discord to receive processed audio:
- Open Discord Settings > Voice & Video
- Set Input Device to virtual cable output
- Disable Discord's noise suppression and echo cancellation
- Set Input Sensitivity to manual mode
- Test microphone to verify voice transformation
Configure settings for minimal latency during voice chat:
- Enable Push-to-Talk to avoid background processing
- Use lightweight voice models (Beatrice v2)
- Set chunk size to 256 or lower
- Monitor CPU/GPU usage during extended calls
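On Linux, if Discord does not pick up the virtual source directly, you can move its capture stream with pactl; this sketch assumes the VoxityAI_Input source created in the Linux routing section, with the stream ID looked up first:
# List active recording streams and note Discord's stream ID
pactl list short source-outputs
# Point that stream at the virtual source (replace <stream-id> with the number from above)
pactl move-source-output <stream-id> VoxityAI_Input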
Discord Pro Tips
- Voice Activity: Fine-tune Discord's input sensitivity to match your converted voice levels
- Quality Settings: Use Discord's "High" quality mode for best voice transmission
- Regional Servers: Choose Discord servers geographically close to reduce network latency
- Bandwidth: Ensure stable internet connection for consistent voice quality
OBS Studio Integration
Professional streaming setup with VoxityAI™ for live content creation and broadcasting.
Basic OBS Integration
- Add "Audio Input Capture" source to your scene
- Select virtual cable output as the audio device
- Configure audio monitoring to "Monitor and Output"
- Adjust audio levels in OBS mixer
Advanced Multi-Source Setup
For complex streaming setups with multiple audio sources:
OBS Performance Optimization
- Audio Filters: Add noise gate, compressor, and EQ for professional sound
- Sync Offset: Adjust audio sync if video/audio become misaligned
- Sample Rate: Match OBS sample rate to VoxityAI™ (48kHz recommended)
- Monitoring: Use OBS's advanced audio monitoring for real-time feedback
Video Conferencing (Zoom/Teams/Meet)
Professional voice transformation for business meetings and video conferences.
Zoom Configuration
- Open Zoom Settings > Audio
- Select virtual cable as microphone input
- Disable automatic gain control and noise reduction
- Test audio using Zoom's microphone test feature
- Adjust input volume to optimal levels
Microsoft Teams Configuration
- Access Settings > Devices
- Choose virtual audio device as microphone
- Disable Teams' audio processing features
- Test audio quality in device settings
Google Meet Configuration
- Click Settings gear during meeting
- Select Audio tab
- Choose virtual audio device for microphone
- Verify audio levels are appropriate
Video Conferencing Best Practices
- Quality Testing: Test voice transformation before important meetings
- Backup Plan: Have fallback audio source ready in case of technical issues
- Bandwidth: Monitor network performance during voice processing
- Consistency: Use consistent voice settings throughout meetings
- Transparency: Inform participants about voice modification when appropriate
Gaming Integration
Optimize VoxityAI™ for gaming scenarios with minimal performance impact and seamless voice chat.
Performance-First Gaming Setup
- Use Beatrice v2 models for lowest latency
- Avoid complex RVC models during competitive gaming
- Pre-load models before gaming sessions
- Set VoxityAI™ process priority to "Normal" (not High) during gaming
- Use dedicated GPU cores if available
- Monitor system temperature during extended sessions
Recommended settings:
- Chunk size: 256 samples maximum
- F0 Detector: DIO or Harvest for speed
- Disable unnecessary audio processing
Popular Games Integration
For games using Discord for voice chat, follow Discord integration steps above.
- Set game's microphone input to virtual cable output
- Test voice chat in game settings before playing
- Use Push-to-Talk for better control
- Use server-client setup to offload processing
- Route game audio separately from voice
- Monitor system performance continuously
Advanced Audio Routing Scenarios
Multi-Destination Broadcasting
Route VoxityAI™ output to multiple applications simultaneously with independent level control.
Using Voicemeeter for Multi-Routing:
- Hardware Input 1: Physical microphone
- Virtual Input (VAIO): VoxityAI™ output destination
- Hardware Output A1: Speakers/headphones for monitoring
- Virtual Output B1: Discord/game chat
- Virtual Output B2: OBS/streaming software
- Bus Assignment: Route VAIO to A1+B1+B2 with independent level control
Benefits of this routing:
- Independent volume control for each destination
- Different processing for chat vs. stream
- Backup audio routing for redundancy
- Real-time level monitoring and adjustment
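On Linux, a comparable multi-destination route can be sketched with PulseAudio's combine sink. The sink names below are examples (list yours with pactl list short sinks), and VoxityAI_Output refers to the virtual sink created in the Linux routing section:
# Create one sink that copies audio to both your headphones and the virtual cable
pactl load-module module-combine-sink sink_name=VoxityAI_Multi slaves=alsa_output.pci-0000_00_1f.3.analog-stereo,VoxityAI_Output
# Select "VoxityAI_Multi" as the output device in VoxityAI™; remove it later with: pactl unload-module module-combine-sink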
Professional Studio Integration
Integrate VoxityAI™ into professional recording and broadcast environments.
Hardware Integration Chain:
Professional Considerations:
- Latency Monitoring: Use professional audio interfaces with hardware monitoring
- Backup Systems: Always have non-processed audio backup for live broadcasts
- Quality Control: Monitor output quality continuously during long sessions
- Documentation: Document all routing and settings for session recall
Remote Collaboration Setup
Configure VoxityAI™ for remote recording sessions and collaborative content creation.
Client-Server Voice Processing:
Run VoxityAI™ server on a powerful desktop while using lightweight clients for actual recording/communication:
Processing server (desktop):
- Runs MMVCServerSIO with GPU acceleration
- Hosts voice models and processing engine
- Provides web interface for multiple clients
Client devices:
- Connects to server via web browser
- Handles only audio input/output
- Minimal local processing requirements
Network Optimization:
- Bandwidth: Minimum 1 Mbps upload/download for stable audio streaming
- Latency: Local network preferred, VPN may add latency
- Stability: Wired connection recommended over WiFi
- Security: Use VPN or SSH tunneling for internet-based connections
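For the SSH tunneling option above, a minimal sketch that forwards the server's web interface to the client (user@server.example.com is a placeholder for your own host):
# Forward the remote server's web interface to this machine over SSH
ssh -N -L 18888:localhost:18888 user@server.example.com
# Then open http://localhost:18888 locally as usual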
Audio Routing Troubleshooting
Common Audio Routing Issues
No audio in target application
Systematic Diagnosis:
- Verify VoxityAI™ is outputting to correct virtual device
- Check target application audio input settings
- Test virtual cable with system audio recorder
- Restart VoxityAI™ and target application
- Verify system audio permissions and privacy settings
Platform-Specific Solutions:
Windows:
- Check Windows audio privacy settings
- Verify virtual cable appears in mmsys.cpl
- Restart Windows Audio service
macOS:
- Grant microphone permissions to browser and applications
- Check Audio MIDI Setup for device visibility
- Verify BlackHole installation and version
Linux:
- Check PulseAudio/PipeWire device list
- Verify user audio group membership
- Test with pavucontrol for visual debugging
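On Linux, a quick end-to-end check of the route (complementing the pavucontrol suggestion above) is to loop the virtual source back to your headphones; this assumes the VoxityAI_Input source from the Linux routing section:
# Listen live to what the virtual source carries (Ctrl+C to stop; wear headphones to avoid feedback)
pacat --record --device=VoxityAI_Input | pacat --playback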
Audio feedback and echo issues
Feedback Prevention:
- Always use headphones instead of speakers when possible
- Ensure virtual cable output is not set as system default playback
- Disable "Listen to this device" in Windows sound properties
- Check for audio loopback in virtual audio software settings
- Use directional microphones to reduce ambient pickup
Echo Cancellation:
- Enable VoxityAI™'s built-in echo cancellation
- Adjust microphone positioning and gain levels
- Use acoustic treatment in recording environment
- Configure application-specific echo cancellation settings
High latency in audio routing
Latency Reduction Strategies:
- Reduce VoxityAI™ chunk size and buffer settings
- Use ASIO drivers for professional audio interfaces
- Minimize virtual audio cable buffer sizes
- Close unnecessary audio applications and effects
- Use dedicated audio hardware for critical applications
- Optimize system audio service priority and real-time settings
System-Level Optimizations:
- Set audio-related processes to high priority
- Disable Windows audio enhancements completely
- Use exclusive mode for audio devices when possible
- Configure system for real-time audio performance
Audio quality degradation through routing
Quality Preservation:
- Match sample rates across all audio devices (48kHz recommended)
- Use 24-bit audio depth throughout the signal chain
- Minimize audio processing chain length
- Adjust virtual cable buffer sizes for optimal quality/latency balance
- Update audio drivers to latest versions
- Use lossless audio formats where possible
Signal Chain Optimization:
- Avoid multiple format conversions in the audio path
- Set appropriate gain staging to prevent clipping
- Monitor audio levels throughout the signal chain
- Use professional audio interfaces for critical applications