AI Voice Clone

Overview

AI Voice Clone is an open-source project for AI-based voice synthesis and cloning. It aims to create realistic voice replicas from audio samples, with applications in entertainment, accessibility, education, and other areas.

Technologies

Programming Language: Python 3.8+
Deep Learning Framework: PyTorch
Audio Processing: Torchaudio, Librosa
Voice Synthesis: Tacotron2 with a WaveGlow vocoder, or similar TTS models
Machine Learning: Scikit-learn for preprocessing
Web Framework (future): FastAPI or Flask for API deployment

Project Scope

Current Features

Basic voice recording and preprocessing
Audio feature extraction (MFCC, spectrograms)
Model training pipeline setup
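
The feature extraction step listed above (MFCCs and spectrograms) rests on two basic operations: slicing the waveform into overlapping frames and mapping frequencies onto the mel scale. The sketch below illustrates both using only the standard library. The function names, frame length, and hop size are illustrative assumptions, not the project's FeatureExtractor API, which would more likely delegate to Librosa or Torchaudio.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Return n_mels + 2 band edges in Hz, evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


def frame_signal(samples: List[float], frame_length: int, hop_length: int) -> List[List[float]]:
    """Split a waveform into overlapping frames, dropping any incomplete tail."""
    if len(samples) < frame_length:
        return []
    n_frames = 1 + (len(samples) - frame_length) // hop_length
    return [
        samples[i * hop_length : i * hop_length + frame_length]
        for i in range(n_frames)
    ]
```

Each frame would then be windowed and transformed to a spectrum, and the band edges define the triangular filters that pool that spectrum into mel bins.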

Development Roadmap

Phase 1: Implement basic voice cloning with pre-trained models
Phase 2: Custom model training from user audio samples
Phase 3: Real-time voice conversion
Phase 4: Multi-speaker voice cloning
Phase 5: Web interface and API deployment

Model Improvements

The model architecture is currently a compact encoder/decoder skeleton. Planned work on Tacotron2-style architectures, attention mechanisms, and multi-speaker conditioning is intended to bring it in line with the upcoming roadmap goals.
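
To make the attention idea above concrete, here is a minimal sketch of one decoder step attending over encoder states. It uses plain scaled dot-product attention written with the standard library, which is simpler than the location-sensitive attention used in Tacotron2. The function names and the list-based representation are assumptions for illustration, not the project's model code.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], encoder_states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoder query."""
    scale = math.sqrt(len(query))
    # One similarity score per encoder time step.
    scores = [
        sum(q * k for q, k in zip(query, state)) / scale
        for state in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context
```

In a real model the same computation would run on batched PyTorch tensors, with the decoder consuming the context vector at every output frame.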

Key Components

Data collection and preprocessing pipeline
Neural network architectures for voice synthesis
Training scripts and utilities
Evaluation metrics and testing framework
Deployment and inference scripts

Installation

git clone https://github.com/PtiCalin/ai_voice-clone.git
cd ai_voice-clone
pip install -r requirements.txt

Usage

Command Line Interface

# Record a voice sample
python ai_voice-clone/main.py --mode record --duration 5 --output my_voice.wav

# Train a model with your voice
python ai_voice-clone/main.py --mode train --input my_voice.wav

# Generate cloned voice
python ai_voice-clone/main.py --mode clone --input my_voice.wav --text "Hello, this is my cloned voice!" --output cloned_voice.wav
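
The three commands above share one entry point that switches on --mode. As a rough sketch of how such a parser could be wired with argparse: the flag names are taken from the commands above, while the defaults, help strings, and validation rules are assumptions rather than the behavior of the actual main.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="AI Voice Clone CLI (illustrative sketch)")
    parser.add_argument("--mode", required=True, choices=["record", "train", "clone"])
    parser.add_argument("--duration", type=float, default=5.0,
                        help="recording length in seconds (assumed default)")
    parser.add_argument("--input", help="path to an input WAV file")
    parser.add_argument("--output", help="path for the recorded or generated WAV file")
    parser.add_argument("--text", help="text to synthesize in clone mode")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and apply per-mode checks (the checks are assumptions)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: train and clone need a reference recording.
    if args.mode in ("train", "clone") and not args.input:
        parser.error("--input is required for train and clone modes")
    # Assumed rule: clone also needs the text to speak.
    if args.mode == "clone" and not args.text:
        parser.error("--text is required for clone mode")
    return args
```

Passing an explicit list, for example parse_cli(["--mode", "record", "--output", "my_voice.wav"]), makes the parser easy to exercise in tests without touching sys.argv.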

Python API

from ai_voice_clone import VoiceCloner, AudioInput, FeatureExtractor, Trainer, InferenceEngine, Config

# Initialize components
config = Config()
config.load()

audio_input = AudioInput()
feature_extractor = FeatureExtractor(config)
model = VoiceCloner(config)
trainer = Trainer(model, feature_extractor, config)
inference_engine = InferenceEngine(model, feature_extractor, config)

# Record or load audio
audio_data = audio_input.record_audio(duration=5)
# or
audio_data, sr = audio_input.load_audio("path/to/audio.wav")

# Train model (if needed)
trainer.train("path/to/training/audio.wav")

# Generate cloned voice
cloned_audio = inference_engine.generate_voice("Hello, world!", "path/to/reference/audio.wav")

# Save result
audio_input.save_audio(cloned_audio, "output.wav")
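
The last step above writes the generated samples to disk. For readers curious what such a save involves, here is a minimal standard-library sketch that writes mono floating-point samples as 16-bit PCM. The function name, the 22050 Hz default sample rate, and the clipping behavior are assumptions; the project's save_audio may be implemented differently, for example via Torchaudio.

```python
import struct
import wave
from typing import Sequence


def save_wav_mono16(samples: Sequence[float], path: str, sample_rate: int = 22050) -> None:
    """Write float samples in [-1.0, 1.0] to a mono 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Little-endian signed shorts, one per sample.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

Clipping before scaling avoids integer overflow when a model emits values slightly outside the nominal range.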

Project Structure

ai_voice-clone/
├── main.py                 # CLI entry point
├── config.py              # Configuration management
├── audio_input.py         # Audio recording and loading
├── feature_extraction.py  # Audio feature extraction
├── model.py               # Neural network models
├── training.py            # Model training logic
├── inference.py           # Voice generation
├── __init__.py            # Package initialization
├── requirements.txt       # Dependencies
└── ...

Configuration

The system uses a YAML configuration file (config.yaml) with the following main sections:

audio: Audio processing parameters
features: Feature extraction settings
model: Neural network architecture
training: Training hyperparameters
inference: Generation parameters
vocoder: Mel-to-audio backend selection (Griffin-Lim, HiFi-GAN, or WaveGlow)

Example vocoder configuration:

vocoder:
  backend: hifigan
  hifigan:
    model_path: /path/to/hifigan-torchscript.pt
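
As a sketch of how the vocoder section might be consumed once config.yaml has been parsed into a dictionary, the helper below picks the backend and checks its settings. Only the hifigan key and its model_path appear in the example above. The griffin_lim and waveglow key names, the default backend, and the rule that neural vocoders require a model_path are assumptions made for illustration.

```python
from typing import Any, Dict

# "hifigan" comes from the example above; the other two key names are assumed.
SUPPORTED_BACKENDS = ("griffin_lim", "hifigan", "waveglow")


def resolve_vocoder(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the selected backend name and its settings from a parsed config."""
    vocoder = config.get("vocoder") or {}
    backend = vocoder.get("backend", "griffin_lim")  # assumed default
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown vocoder backend: {backend!r}")
    settings = vocoder.get(backend) or {}
    # Assumption: neural vocoders need a checkpoint path, Griffin-Lim does not.
    if backend != "griffin_lim" and not settings.get("model_path"):
        raise ValueError(f"vocoder.{backend}.model_path must be set")
    return {"backend": backend, "settings": settings}
```

Validating the section up front lets a misconfigured backend fail at startup instead of midway through generation.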

Contributing

See CONTRIBUTION.md for guidelines.

License

This project is licensed under the MIT License - see LICENSE.md for details.

Testing

See TESTING.md for testing procedures, automated test expansion guidance, and evaluation metric reporting.

Update Log

See UPDATE-LOG.md for version history.
