Live
Voice: Tibetan (Ü-Tsang) pilot launched in Dharamshala
Robotics: Teleop cohorts expanding across SE Asia
Expert Eval: Expert evaluator network expanding across domains
Voice: Sardinian pilot kicked off via Cagliari + Sassari studios
Multimodal: Cross-modality embodied-AI dataset scoping with frontier lab
Voice: Cebuano production cohort scaling

FOR AI LABS

Data your existing vendors can't quote on.

Underrepresented-language voice. Robotics demonstrations from regions your team can't recruit in. Credentialed expert evaluation. Multimodal RLHF traces. Quartz Labs sources, annotates, and ships data in the modalities your default vendors won't cover.

Request a sample bundle
BUILT BY OPERATORS AND RESEARCHERS FROM
Caltech
Harvard Business School
Goldman Sachs
McKinsey & Company
Tsinghua University
D. E. Shaw & Co.
LinkedIn
Microsoft
Y Combinator
Broadcom

Modality 01 — Voice

Voice.

Multispeaker conversational audio with full human-generated transcripts in the underrepresented languages below, with hundreds more available on request. Dialect annotation, speaker diarization, and demographic balancing are also available.

Native-speaker sourcing through universities, recording studios, language NGOs, and home-region and diaspora partner networks worldwide.

Pilot scope: set per engagement. Production scope: scaled per language on demand.

Sample language coverage

Language | Region | Family | Sourcing | Status | Tier
Tagalog | Luzon, PH | Austronesian | Studio | LIVE | T1
Cebuano | Visayas, PH | Austronesian | Studio | LIVE | T1
Hiligaynon | Western Visayas, PH | Austronesian | Studio | LIVE | T2
Ilocano | Northern Luzon, PH | Austronesian | Studio | LIVE | T2
Hmong (Daw) | Diaspora · MN, US | Hmong-Mien | Diaspora | LIVE | T2
Madurese | East Java, ID | Austronesian | University | LIVE | T2
Tamazight | Atlas, MA | Afro-Asiatic | NGO | LIVE | T2
Kabyle | Kabylia, DZ | Afro-Asiatic | NGO | LIVE | T2
Sepedi | Limpopo, ZA | Bantu | University | LIVE | T2
Guarani | Asunción, PY | Tupian | University | LIVE | T2
Antillean Creole | Guadeloupe, FR | French Creole | NGO | LIVE | T2
Jamaican Creole | Kingston, JM | English Creole | Studio | LIVE | T2
Sardinian | Cagliari · Sassari, IT | Romance | University | LIVE | T2
Luxembourgish | Luxembourg | Germanic | University | LIVE | T2
Bouyei | Guizhou, CN | Tai-Kadai | University | PILOT | NEXT
Rangpuri | North Bengal, BD/IN | Indo-Aryan | NGO | PILOT | T3
Bhili | Gujarat · MP, IN | Indo-Aryan | NGO | PILOT | T3
Tibetan (Ü-Tsang) | Dharamshala, IN | Sino-Tibetan | Diaspora | PILOT | T3
Balochi (Makrani) | Diaspora · Karachi · Muscat | Indo-Iranian | Diaspora | PILOT | T3
Turkmen | Diaspora · TR · DE | Turkic | Diaspora | PILOT | T3
Uyghur | Diaspora · KZ · DC, US | Turkic | Diaspora | PILOT | T3

Modality 02 — Robotics

Robotics.

Teleoperation demonstration data for foundation-model robotics teams, sourced through our SE Asia operator network: cohorts of trained teleoperators in Hanoi and Jakarta working on standard manipulation and locomotion setups.

Per-task structured demonstrations with full state, action, and sensory logging. Each session includes operator demographic data, task-success metadata, and provenance.

Pilot scope: 200-hour engagement. Production scope: scaled through cohort expansion.

Modality 03 — Expert evaluation

Expert evaluation.

Credentialed domain experts evaluating frontier model outputs in high-stakes domains. Medical (MD, PA, NP), legal (JD, bar-admitted), financial (CFA, CPA, regulatory), and scientific (PhD with current publication record).

Per-query structured evaluation with rubric, reasoning trace, and audit log. All evaluators under NDA. Provenance per evaluation verifiable on request.

Pilot scope: set per engagement. Production scope: standing evaluator networks per domain, scaling on demand.

Modality 04 — Multimodal & text

Multimodal & text.

RLHF, instruction tuning, multilingual classification, code corpora, vision-language pairs, cross-modality datasets. Bespoke pipelines built around your spec and schema — not an off-the-shelf product.

The same network and the same operational layer apply: specify, source, annotate, ship, with a consent log per row.

Scope: per engagement. Quoted within 48 hours of the first scoping call.


Deliverable

An abstracted voice sample.

Multitrack audio, word-level timestamps, speaker diarization, demographic metadata, and a signed consent log. Robotics, eval, and multimodal deliveries run on the same operational layer with different schemas.

Sample bundle
Consent ID: QLB-VOX-2026-04417
Language: Cebuano (ceb)
Region: Cebu City, PH
Speakers: 2 · F/M
Duration: 00:00:12.4
Sample rate: 48 kHz · 16-bit
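00:00.42  SPK_01  Maayong buntag. Asa man ta moadto karong adlawa, sa merkado o sa playa?
          (Good morning. Where are we going today, to the market or to the beach?)
00:03.18  SPK_02  Adto ta sa merkado sa, dayon sa hapon sa playa. Init kaayo karon.
          (Let's go to the market first, then to the beach in the afternoon. It's very hot right now.)
00:06.91  SPK_01  Sige. Magpalit pud ko og isda nga bag-o, para mag-sugba ta sa gabii.
          (Okay. I'll also buy some fresh fish so we can grill tonight.)
00:10.04  SPK_02  Hala, sulayi ang tindahan ni Nang Lita. Pinakabaratuhon didto.
          (Oh, try Nang Lita's stall. It's the cheapest there.)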

The operational layer

Request, source, deliver.

Per-row signed consent. Per-batch sample QA before scale. Daily drops or batched delivery to your bucket on your schema.

01 — Request

Your spec.

Language list, hour count, demographic targets, domain constraints, schema, rights model. We work from a spec you write or one we draft together on the first call.

Pilot quote
Within 48 hours. Indicative pricing published.

02 — Sourcing

Our network.

Universities, in-country studios, diaspora cohorts, operator networks, credentialed experts, dataset partners. Native-speaker or domain-expert sourcing per modality.

Quality bar
Acoustic, annotation, or domain QA at sample level before scale.

03 — Deliverable

Your bucket.

S3, GCS, or SFTP. Multitrack WAV plus JSON sidecars for voice, per-task logs for robotics, structured rubrics for eval. Daily drops or batched delivery.

Consent log
Per-row signed release. Provenance verifiable on request.

How we're different

The default vendor wasn't built for frontier data.

Off-the-shelf vendors are built for languages, modalities, and timelines the last AI cycle paid for. We're built for the cases the next one requires.

Dimension | Off-the-shelf vendor | Quartz Labs
Language coverage | 20 major world languages | Underrepresented languages live, hundreds more quotable
Modalities | Voice or text, rarely both | Voice + robotics + expert eval + multimodal
Ethical sourcing | Mass-scraped, opaque provenance | Per-row signed consent log, verifiable
Pilot to quote | 4-week scoping minimum | 48-hour quote, 14-day kickoff
Commitment | 6-month minimum, exclusivity | Two-week off-ramp, no exclusivity

Work we've shipped.

One live case so far. New cases will be added as each engagement passes its production milestones.

Case 01 · Voice · 2026 · Frontier voice AI lab

Seven languages, intro to first shipment in eighteen days.

The lab needed multispeaker conversational audio plus human transcripts in seven Southeast Asian and minority languages. We delivered all seven.

Modality: Voice · Status: Production

Get in touch

Tell us what you're training.

Book a 30-minute scoping call. We map your modality, scope, languages, and timeline — and quote indicative pricing within 48 hours of the intro.