Skip to content

PtiCalin/vault_voice

Repository files navigation

AI Voice Clone

Overview

AI Voice Clone is an open-source project aimed at developing advanced voice synthesis and cloning technologies using artificial intelligence. The project focuses on creating realistic voice replicas from audio samples, enabling applications in entertainment, accessibility, education, and more.

Technologies

  • Programming Language: Python 3.8+
  • Deep Learning Framework: PyTorch
  • Audio Processing: Torchaudio, Librosa
  • Voice Synthesis: Tacotron2, WaveGlow, or similar TTS models
  • Machine Learning: Scikit-learn for preprocessing
  • Web Framework (future): FastAPI or Flask for API deployment

Project Scope

Current Features

  • Basic voice recording and preprocessing
  • Audio feature extraction (MFCC, spectrograms)
  • Model training pipeline setup

Development Roadmap

  • Phase 1: Implement basic voice cloning with pre-trained models
  • Phase 2: Custom model training from user audio samples
  • Phase 3: Real-time voice conversion
  • Phase 4: Multi-speaker voice cloning
  • Phase 5: Web interface and API deployment

Model Improvements

The model architecture is currently a compact encoder/decoder skeleton. Studying Tacotron2-style architectures, attention mechanisms, and multi-speaker conditioning will align with upcoming roadmap goals.

Key Components

  • Data collection and preprocessing pipeline
  • Neural network architectures for voice synthesis
  • Training scripts and utilities
  • Evaluation metrics and testing framework
  • Deployment and inference scripts

Installation

git clone https://github.com/PtiCalin/ai_voice-clone.git
cd ai_voice-clone
pip install -r requirements.txt

Usage

Command Line Interface

# Record a voice sample
python ai_voice-clone/main.py --mode record --duration 5 --output my_voice.wav

# Train a model with your voice
python ai_voice-clone/main.py --mode train --input my_voice.wav

# Generate cloned voice
python ai_voice-clone/main.py --mode clone --input my_voice.wav --text "Hello, this is my cloned voice!" --output cloned_voice.wav

Python API

from ai_voice_clone import VoiceCloner, AudioInput, FeatureExtractor, Trainer, InferenceEngine, Config

# Initialize components
config = Config()
config.load()

audio_input = AudioInput()
feature_extractor = FeatureExtractor(config)
model = VoiceCloner(config)
trainer = Trainer(model, feature_extractor, config)
inference_engine = InferenceEngine(model, feature_extractor, config)

# Record or load audio
audio_data = audio_input.record_audio(duration=5)
# or
audio_data, sr = audio_input.load_audio("path/to/audio.wav")

# Train model (if needed)
trainer.train("path/to/training/audio.wav")

# Generate cloned voice
cloned_audio = inference_engine.generate_voice("Hello, world!", "path/to/reference/audio.wav")

# Save result
audio_input.save_audio(cloned_audio, "output.wav")

Project Structure

ai_voice-clone/
├── main.py                 # CLI entry point
├── config.py              # Configuration management
├── audio_input.py         # Audio recording and loading
├── feature_extraction.py  # Audio feature extraction
├── model.py               # Neural network models
├── training.py            # Model training logic
├── inference.py           # Voice generation
├── __init__.py            # Package initialization
├── requirements.txt       # Dependencies
└── ...

Configuration

The system uses a YAML configuration file (config.yaml) with the following main sections:

  • audio: Audio processing parameters
  • features: Feature extraction settings
  • model: Neural network architecture
  • training: Training hyperparameters
  • inference: Generation parameters
  • vocoder: Mel-to-audio backend selection (Griffin-Lim, HiFi-GAN, or WaveGlow)

Example vocoder configuration:

vocoder:
  backend: hifigan
  hifigan:
    model_path: /path/to/hifigan-torchscript.pt

Contributing

See CONTRIBUTION.md for guidelines.

License

This project is licensed under the MIT License - see LICENSE.md for details.

Testing

See TESTING.md for testing procedures, automated test expansion guidance, and evaluation metric reporting.

Update Log

See UPDATE-LOG.md for version history.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages