Blog

Explore the blog

Field Notes is the Way With Words blog: long-form guidance on human transcription, broadcast and corporate captioning, interview and research audio, and speech dataset design for ASR and conversational AI. We write for programme managers, researchers, legal and compliance teams, and product leaders who care about accuracy, turnaround, and defensible data handling.

Newest posts appear first. Browse by topic to follow a theme across many articles, or use search on this page to filter titles, descriptions, authors, and tags. When you are ready to price work or talk through scope, the service pages and contact form are linked from the site header and footer.


Showing 12 posts on this page (527 total)

2 September 2025

Crowdsourced Speech Data: A Cornerstone of Dataset Acquisition

By Way With Words Team

This article explores the benefits of crowdsourced speech data collection, the platforms that enable it, dataset quality, and the ethical considerations involved.


1 September 2025

Scripted vs Unscripted Speech: Analysing Benefits and Challenges

By Way With Words Team

This article explores the differences between scripted and unscripted speech, their advantages, limitations, and practical applications in today’s AI landscape.


29 August 2025

How Do You Test for Fairness in Speech AI?

By Way With Words Team

How Do You Test for Fairness in Multilingual Speech Data? Building Equitable Systems that Function for Everyone. Artificial intelligence has rapidly become...


28 August 2025

What Multilingual Open Speech Corpora Exist for Research?

By Way With Words Team

What Multilingual Speech Corpora Exist for Open Research? Why is Access to Open Multilingual Speech Datasets Important? Artificial intelligence systems tha...


27 August 2025

Synthetic Dialect Generation: Training Machine Learning Models

By Way With Words Team

Machine learning has made remarkable progress in synthetic dialect generation, opening new opportunities for speech synthesis, data inclusivity, and language preservation.


26 August 2025

Best Practices for Tagging Multilingual Code-mixing in Audio

By Way With Words Team

Tagging multilingual code-mixing in audio files is one of the most complex but rewarding tasks in speech annotation.


25 August 2025

How Loanwords Impact Automatic Speech Recognition Models

By Way With Words Team

What is the Impact of Loanwords on ASR Models? Understanding How Loanwords Affect Automatic Speech Recognition. Automatic Speech Recognition (ASR) has becom...


22 August 2025

Why Is Phonetic Transcription Useful in Multilingual Datasets?

By Way With Words Team

This article explores the foundations of phonetic transcription and its benefits in multilingual ASR and TTS systems.


21 August 2025

Effectively Manage Multilingual ASR Training Pipelines at Scale

By Way With Words Team

This article explores the key components of multilingual ASR training pipelines and outlines the strategies and tools that engineers and developers use to manage complex datasets at scale.


20 August 2025

Avoid Common Speech Data Errors in Cross-Linguistic Corpora

By Way With Words Team

Understanding the most common types of speech data errors, their causes, and how they can be mitigated is essential for any team working on multilingual or cross-linguistic voice datasets.


19 August 2025

Practical Steps to Preserve Cultural Context in Speech Data

By Way With Words Team

Preserving cultural context in speech data has far-reaching effects, particularly in localisation and conversational AI.


18 August 2025

Why is Tonal Language Essential in African and Asian Speech Data?

By Way With Words Team

This article explores the essence of tonal languages, the challenges of recording them accurately, and the wide-ranging applications of tone-aware ASR models.
