Welcome to the official documentation for DATAMIMIC, a powerful synthetic data generation platform that combines domain-driven approaches with weighted distributions to produce high-quality, realistic data for development, testing, and demonstration.
Note: For comprehensive documentation including detailed model descriptions, exporters, importers, platform UI, and more, please visit our official online documentation at https://docs.datamimic.io/
This project documentation focuses specifically on:
- Using data domains
- Command-line interface
- Getting started with the Community Edition
- Developer integration
- Core Concepts
- Getting Started
- Domain-Driven Framework
- Examples
- DateTime Generator (weighted + DSL): examples/datetime_generator.md
- API Reference
- Advanced Topics
- Controlling Generator Caching
- Developer Guide
- Dataset Loading Standard
- Seeding & Reproducibility: see Developer Guide ("Seeding & Reproducibility") and DateTime examples
DATAMIMIC is built around several core concepts:
- Domain-Driven Generation - Industry-specific data models with realistic properties and relationships
- Weighted Distributions - Statistical accuracy through real-world distribution patterns
- Entity Relationships - Create complex, interconnected data entities
- Data Privacy - Built-in tools for anonymization and pseudonymization
- Multi-Domain Support - Healthcare, Finance, Insurance, E-commerce, and more
For detailed information on DATAMIMIC's architecture and model descriptions, please refer to our online documentation.
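To make the "Weighted Distributions" concept concrete, here is a minimal sketch that uses only the Python standard library. It does not use DATAMIMIC's own generators or datasets. The age bands, the weights, and the `sample_age` helper are assumptions made up for illustration.

```python
from collections import Counter
from random import Random

# Hypothetical age bands and weights, chosen only for illustration;
# they are not taken from any DATAMIMIC dataset.
AGE_BANDS = [(18, 29), (30, 44), (45, 64), (65, 90)]
WEIGHTS = [0.22, 0.28, 0.32, 0.18]


def sample_age(rng: Random) -> int:
    """Pick an age band by weight, then a uniform age inside that band."""
    low, high = rng.choices(AGE_BANDS, weights=WEIGHTS, k=1)[0]
    return rng.randint(low, high)


rng = Random(7)  # an injected, seeded RNG keeps the sample reproducible
ages = [sample_age(rng) for _ in range(10_000)]

# Count how many sampled ages fall into each (non-overlapping) band.
band_counts = Counter(
    band for age in ages for band in AGE_BANDS if band[0] <= age <= band[1]
)
for band, weight in zip(AGE_BANDS, WEIGHTS):
    share = band_counts[band] / len(ages)
    print(f"{band[0]}-{band[1]}: target {weight:.2f}, observed {share:.2f}")
```

With enough samples, the observed share of each band should approach its target weight, which is the property that makes weighted data look more realistic than uniform random values.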
To install DATAMIMIC Community Edition:
pip install datamimic-ce
The fastest way to get started with DATAMIMIC is using the Python API:
from datamimic_ce.domains.common.services import PersonService
# Create a service instance
person_service = PersonService(dataset="US")
# Generate a single person
person = person_service.generate()
# Access person attributes
print(f"Name: {person.name}")
print(f"Age: {person.age}")
print(f"Email: {person.email}")
print(f"Address: {person.address.street}, {person.address.city}, {person.address.state}")
DATAMIMIC provides specialized services for different domains:
from datamimic_ce.domains.healthcare.services import PatientService
# Create a patient service
patient_service = PatientService()
# Generate a patient with medical information
patient = patient_service.generate()
# Access patient-specific attributes
print(f"Patient ID: {patient.patient_id}")
print(f"Blood Type: {patient.blood_type}")
print(f"Medical Conditions: {patient.conditions}")
You can easily generate multiple entities at once:
# Generate a batch of people
people = person_service.generate_batch(count=100)
print(f"Generated {len(people)} unique people")
# Preview the first five people (export to other formats as needed)
for person in people[:5]:
    print(f"{person.name}, {person.age} years old")
DATAMIMIC favors explicit seeding via injected RNGs. Pass a seeded random.Random to services (when available) or directly to generators to get deterministic output:
from random import Random
from datamimic_ce.domains.common.services import PersonService
# Deterministic people for the same seed
svc_a = PersonService(dataset="US", rng=Random(123))
svc_b = PersonService(dataset="US", rng=Random(123))
assert svc_a.generate().to_dict() == svc_b.generate().to_dict()
# Seed a generator directly when a service doesn't expose rng
from datamimic_ce.domains.ecommerce.generators import ProductGenerator
from datamimic_ce.domains.ecommerce.models import Product
gen = ProductGenerator(dataset="US", rng=Random(42))
prod = Product(gen)
print(prod.to_dict())  # stable for the same seed
DATAMIMIC's domain-driven framework organizes synthetic data generation by industry domains:
- Common Domain - Person, Address, Company
- Healthcare Domain - Patient, Doctor, Hospital
- Finance Domain - BankAccount, Transaction, Loan
- Insurance Domain - Policy, Claim, Insured
- E-commerce Domain - Product, Order, Customer
See also the dataset loading rules in Dataset Loading Standard:
- Use dataset_path(...) or lightweight loaders for all file access
- Pass base filenames (helpers append _{CC}.csv)
- Keep I/O in generators, not models
- Declare supported datasets in services using compute_supported_datasets([...], start=Path(__file__))
For detailed documentation on the domain-driven framework, see the domain-specific documentation.
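The base-filename rule from the dataset loading list can be sketched as follows. This is not DATAMIMIC's `dataset_path` helper. `country_file` is a hypothetical stand-in, and it assumes that `CC` is a two-letter country code such as the `US` used in the examples above.

```python
from pathlib import Path


def country_file(base_name: str, country_code: str, root: Path) -> Path:
    """Hypothetical helper: append `_{CC}.csv` to a base filename."""
    cc = country_code.strip().upper()
    # Assumption: dataset codes are two-letter country codes like "US".
    if len(cc) != 2 or not cc.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return root / f"{base_name}_{cc}.csv"


root = Path("data")  # hypothetical data directory
print(country_file("first_names", "us", root).name)  # first_names_US.csv
```

Callers pass only the base name (`first_names`), so the suffix convention lives in one place and a different dataset code selects a different file without touching the calling code.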
DATAMIMIC includes detailed examples demonstrating various use cases:
- Person Generation - Generate and work with person entities
- Healthcare Data Generation - Create realistic healthcare data
- DateTime Generator - Deterministic, weighted date/time sampling
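As a rough illustration of what "deterministic, weighted date/time sampling" means, the sketch below uses only the Python standard library. It is not the DATAMIMIC DateTime generator or its DSL. The weights and the `sample_datetime` and `sample_many` helpers are assumptions made up for this example.

```python
from datetime import datetime, timedelta
from random import Random

# Hypothetical weights: weekdays are sampled more often than weekends,
# and business hours more often than the rest of the day.
WEEKDAY_WEIGHTS = [5, 5, 5, 5, 5, 1, 1]  # Monday .. Sunday
HOUR_WEIGHTS = [4 if 9 <= hour < 17 else 1 for hour in range(24)]


def sample_datetime(rng: Random, start: datetime, days: int) -> datetime:
    """Draw one timestamp on one of `days` consecutive dates beginning at start."""
    candidates = [start + timedelta(days=offset) for offset in range(days)]
    day_weights = [WEEKDAY_WEIGHTS[day.weekday()] for day in candidates]
    day = rng.choices(candidates, weights=day_weights, k=1)[0]
    hour = rng.choices(range(24), weights=HOUR_WEIGHTS, k=1)[0]
    return day.replace(hour=hour, minute=rng.randrange(60), second=0, microsecond=0)


def sample_many(seed: int, n: int, start: datetime) -> list:
    """Draw n timestamps from a single RNG seeded with `seed`."""
    rng = Random(seed)
    return [sample_datetime(rng, start, days=28) for _ in range(n)]


start = datetime(2024, 1, 1)
run_a = sample_many(42, 1000, start)
run_b = sample_many(42, 1000, start)
print(run_a == run_b)  # True: the same seed yields the same sequence
```

Injecting the RNG rather than calling the module-level `random` functions is what keeps two runs identical, which mirrors the seeding approach described earlier in this document.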
The API reference documentation covers:
- Python SDK for all domains
- Configuration options
- Extension APIs
For complete API documentation, see the API Reference.
DATAMIMIC offers advanced capabilities for power users:
- Enterprise Features - Advanced features available in the Enterprise Edition
- Custom generators and extensions
- Performance optimization
- Database integration
For detailed information on advanced topics, see the Advanced Topics documentation.
For comprehensive information on the DATAMIMIC platform UI, advanced models, and enterprise capabilities, please visit our official online documentation.
For developers looking to integrate DATAMIMIC into their applications or extend its functionality, we provide a comprehensive Developer Guide.
All examples in this documentation have been tested against the current version of DATAMIMIC. Each example is validated by the project's test scripts.