Olewave - Crunchbase Company Profile & Funding

Overview

Olewave

About Olewave

CB Rank

415005

Voice Datasets, Voice Data Processing Pipeline, and Voice AI Solutions.

Private

San Francisco, California, United States

1-10

www.olewave.com

Computer Vision

Consulting

Data Mining

Natural Language Processing

Software

Speech Recognition

Details

Legal Name

Olewave LLC

Operating Status

Active

Company Type

For Profit

Founders

Wei Chu

About the Company

Olewave proudly offers voice datasets, a voice data processing pipeline, and voice AI solutions:

1. Voice datasets

Our Olewave dataset outperforms NVIDIA's pre-labeled Granary speech dataset in label accuracy, annotation richness, and flexibility:

* Transcript quality – They use raw Whisper-v3 transcripts. We correct ASR errors with extra metadata.
* Transcript validation – Whisper often hallucinates. We validate transcripts with our Olign tool, which provides reliable word- and utterance-level confidence scores.
* Enriched labels – They do not include speaker names or talk-turns. We provide both.
* Original data – They give you segmented audio only. We deliver full recordings with precise timestamps and metadata, giving you more flexibility.
* Customized services – They leave you on your own. We provide tailored data processing services.

We have released a small portion of tier IV of the Olewave datasets free of charge so that researchers can train their conversational voice AI models: https://huggingface.co/datasets/olewave/OleSpeech-IV-2025-EN-AR-100

The full datasets are available for purchase only by US and Japanese entities. We are a California company with no satellite office in China and no plans to establish one.

2. Voice data processing pipeline

Olewave’s pipeline takes raw speech recordings, the corresponding transcripts, and optional metadata as inputs, and robustly cleans noisy transcriptions and labels. It delivers validated speaker labels and transcripts, along with rich features such as reliable word-level timestamps and confidence scores. By leveraging advanced alignment techniques, the pipeline is designed for high accuracy and usability, making it a practical tool for building high-performance yet cost-efficient speech-based systems.

The pipeline, or individual modules from it, can be licensed.
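
To make the idea of word-level timestamps and confidence scores concrete, here is a minimal Python sketch of how a downstream user might filter utterances by confidence. It is not Olewave's API or the Olign output format: the `Word` and `Utterance` types, the `keep_reliable` helper, the mean-based utterance score, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Word:
    """Hypothetical word record: text, time span in seconds, and a confidence score."""
    text: str
    start: float
    end: float
    confidence: float  # assumed to lie between 0.0 and 1.0


@dataclass
class Utterance:
    """Hypothetical utterance record: one speaker turn made of aligned words."""
    speaker: str
    words: List[Word]

    @property
    def confidence(self) -> float:
        # Assumption: utterance-level score is the mean of the word-level scores.
        return mean(w.confidence for w in self.words) if self.words else 0.0

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)


def keep_reliable(utterances: List[Utterance], threshold: float = 0.8) -> List[Utterance]:
    """Keep only utterances whose score reaches the (illustrative) threshold."""
    return [u for u in utterances if u.confidence >= threshold]


# Made-up example data, not taken from any Olewave dataset.
utterances = [
    Utterance("spk1", [Word("hello", 0.00, 0.40, 0.98), Word("there", 0.45, 0.80, 0.94)]),
    Utterance("spk2", [Word("thank", 1.10, 1.40, 0.35), Word("you", 1.40, 1.60, 0.41)]),
]
reliable = keep_reliable(utterances)
print([u.text for u in reliable])
```

A filter like this is one plausible way to drop likely hallucinated or misaligned segments before training; a real workflow might instead use the minimum word score or per-word masking.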

3. Voice AI services

We build avant-garde yet cost-effective in-house voice AI solutions, such as ASR, TTS, speaker diarization, and conversational speech models, in multiple languages, using our signature voice datasets and processing our clients' voice data with our data processing pipeline. We have 20 years of experience in voice processing and 10 years of experience in tech consulting. We strictly adhere to our NDAs, which is why client names are not shown on our website: https://www.olewave.com/

Contact Email

info@olewave.com

Products & Services

Customizable Pre-Labeled Speech/Audio/Video/Multimodal Datasets

Pre-labeled datasets of natural speech and multimodal data customized to client needs, covering multiple languages and diverse topics for AI training and evaluation.

Speech Foundation Models

Pre-trained voice generative and recognition models supporting multiple languages and accents, available for licensing to avoid third-party data exposure.