Common Voice: The Language Learning Resource Created by the People

One of the main frustrations among language learners is that after spending so much time working to acquire the language, we still find ourselves unable to actually understand and converse with native speakers in a natural environment.
And there is a very obvious reason that this happens: despite our hard work and effort, we rarely train using authentic language, instead relying solely on the watered-down version found in our learning material or classes. This leaves us unprepared to confront the real thing in real life.
But what if there were a way to supplement our learning material with authentic voices speaking their language naturally, thus better preparing us for the real world?
Better yet, what if these came in bite-sized, individual audio files that we could use to create our own short, controlled lessons?
If you think this would be ideal (as I do), then I’ve got some great news for you.
Welcome to the Common Voice project.
You may never have heard of Common Voice, but if you are a serious language learner, please pay heed, for this resource is golden.
It not only provides language resources that take us far beyond our learning material or apps, but also puts them directly into our laps, allowing us, in essence, to create our own authentic mini-lessons and train ourselves in real language.
So Common Voice…What Is It?
Common Voice is a project established to collect a database of human voice recordings in multiple languages, created by us, the public. These are neither voice actors nor instructors attempting to enunciate every single word in a sentence for easier intelligibility.
These are real people speaking real language at a real pace. And this is exactly the material we need to train ourselves to be able to participate in the language we want to learn.
It is a crowd-sourced project, meaning anyone anywhere can participate by recording themselves reading a given sentence or by validating the accuracy and quality of the sentences already recorded.
You may have heard about a concept called sentence mining, which refers to the practice of collecting sentences from various authentic sources, thus giving the learner new material to work with. Traditionally these sentences were taken from written materials such as newspapers, magazines, or blogs, which meant that they were text-based only, with no audio recordings to accompany them.
With Common Voice we now have it all.
This resource, dear readers, is legit. And if you want to learn a language well, get yourself over to the Common Voice site now and get moving.
How to Use It
But before we get started, let’s first consider something very important. The mere fact that we have access to these individual media files of native speakers demonstrating how to say these sentences at a natural pace doesn’t mean much if we don’t understand how to use those files to train ourselves in the language.
We must ask ourselves what it is we want to be able to do with this material and what the desired outcomes will be. Those are the questions that will prompt us to begin turning these files into actual learning material.
In this case, we will want to be able to:
- Say the sentences as closely as possible to how the native speaker says them. This will include proper pronunciation, maintaining the correct rhythm, and keeping a similar speed.
- Understand the message of the sentence, phrase or word.
- Produce these sentences to express real meaning.
- Use them as material to prompt us to create new and unique sentences.
There are many ways to achieve the above and I encourage you to brainstorm and get creative. But, for the sake of this article, I am providing you with a basic plan for working toward the goals listed above.
Say It Like the Native Speaker
When learning a foreign language we want to be able to speak as a native speaker does. We are not going for perfection, but rather an intelligible ability in the language that allows us to participate in the community. So we must train ourselves toward that goal.
With audio files from Common Voice we have ready-made clips in a variety of accents, from people of varying ages, genders, and backgrounds. This is ideal since we are exposed to real-life language in all its messy glory and this takes us beyond the idealistic language often used in learning material.
So how do we get started?
It all begins with the two pillars of language learning: Listening and Repeating. This sounds simple and certainly not revolutionary (especially in today’s world of educational technology), but these two actions will actually help cement the material in your brain.
I offer a few suggestions to consider:
- Repeat, repeat, and then repeat some more. Seriously, listen to these files and continue to say them over and over again until you can do so at the pace of the speaker.
- Focus on your pronunciation and try to get it as close to the speaker’s as possible. Record yourself and then compare it to the recordings to find the areas you have to work on.
- Focus on the rhythm and cadence and again try to mimic the speaker as much as possible. This is what will get you sounding much more like a native.
- Challenge yourself – tell yourself that you’ll have to say the sentence ten times fast within the next minute, for example. This may seem childish, but these types of self-challenges actually motivate us. They also help us improve.
- Set a plan for yourself and say that, for example, you are going to try to work on a new sentence every day.
- Put them on your phone so you can practice while commuting, jogging, working out, shopping.
Let’s look at a quick example. In the chart below you can see that I have three sentences in Irish that I would like to work on. My document (more on this later) includes the name of the audio file, the sentence in Irish, the translation into English, and then a quality score.

Since I am fairly new to the Irish language, I set a goal of three sentences a week.
I work on one sentence at a time and move on to the next only after mastering the first. I listen to the recording a few times and then try to repeat after the speaker. This is challenging, but I do this at various intervals throughout the day, working in periods of 5 to 10 minutes at a time. After the first week, I had gotten through only two sentences, but I could say them with proper pronunciation and was beginning to build an ability to speak at the pace of the native speaker. The following week, I decided to tackle three sentences again.
Understand the Meaning
So mimicking sentences is all well and good (and may be a fun party trick), but if I don’t understand what I’m actually saying, the results are not very useful. Roger that.
We obviously want to be able to understand the meaning behind these sentences, so we must work on that too. This isn’t necessarily a different strategy from our repetition exercises above. We should be working on these two goals simultaneously.
In the Irish example just presented, we have the sentences in one column and their translations in another, so do the following:
- While listening to the sentences, focus on the meaning at the same time. Try to visualize the situation as you are listening in order to begin associating meaning to them.
- While repeating them as in the steps above, also mentally focus on the meaning. The more you do this the more the link will be made between what you are mimicking and what you are actually saying.
- Take a break from the repeating and simply listen to them throughout the day, trying to envision what the sentence actually means.
- Once you’ve listened to a few and have a good idea of what they mean, try playing them in random order. This is an additional challenge since you are making sure you haven’t simply memorized them in order.
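If you are comfortable with a little Python, the random-order drill can be set up as a shuffled playlist. This is only a minimal sketch: the clip names are placeholders, and it assumes your audio player can open a plain-text playlist (an .m3u file is simply a list of file paths, one per line) saved next to the clips.

```python
import random

# Placeholder clip names; in practice these would be the files you are studying.
CLIPS = ["clip_0001.mp3", "clip_0002.mp3", "clip_0003.mp3", "clip_0004.mp3"]


def shuffled_playlist(clip_names, seed=None):
    """Return a shuffled copy of the clip names, leaving the original untouched."""
    names = list(clip_names)
    # A seed makes the order repeatable; leave it as None for a fresh order each time.
    random.Random(seed).shuffle(names)
    return names


def playlist_text(clip_names):
    """Build the contents of a simple playlist: one file name per line."""
    return "\n".join(clip_names) + "\n"


order = shuffled_playlist(CLIPS)
print(playlist_text(order))
```

Saving the printed text as something like shuffled.m3u in the same folder as the clips should let most players run through the sentences in the new order.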
Produce These Sentences on Your Own to Express Meaning
Ok, you can say them pretty well and you have an idea of what you’re saying. Good, good, good. Now try to do a backwards translation.
What this means is that you need to challenge yourself in a different way (and this definitely is a challenge) by looking at the meaning in your native language and then saying the sentence perfectly in your target language. Place your hand over the foreign sentence, look at the translation in your native language, and then try to say the sentence fully, quickly, and perfectly in the target language.
Saying it perfectly means that you don’t leave anything out – not even those pesky small words which can often trip us up.
Once you can do that without flaw, you are demonstrating that you are able to express that full idea in your new language.
Well done, you!
Use These as Material to Prompt You to Create New and Unique Sentences
Obviously we don’t want to learn a language by simply parroting what other people say. In order to have an interesting and useful conversation, we have to produce language based on our own thoughts. So let’s use these sentences as a trigger to build that capability.
In my book, I refer to these types of exercises as guided production exercises and they are designed to help us master the patterns of a language and therefore express new ideas correctly.
We simply take a source sentence or idea such as those we are getting from Common Voice and then we manipulate it in a way that allows us to practice expressing new thoughts.
This is a classic model used in drilling exercises and is incredibly effective at building fluency.
You can, for example:
- Repeat the sentence and then swap out vocabulary in order to create new meaning and an original sentence.
- Change the tense of the verb so that you are now saying that sentence and practicing the past, present or future tenses.
- Turn the sentence into a question or a negative statement.
- Combine various sentences you find in Common Voice to create longer, more complicated sentences with new meanings.
The above are just a few examples, but you can see how you can use these sentences to practice anything you need to work on: vocabulary, grammatical patterns, or fluency building.
Note for Beginners
You may be saying, “Whoa, Mister. Let’s slow that down and take it back a couple notches. After all, I’ve just begun studying Swahili (or whatever language you’re studying).”
Touché. This might seem overwhelming since you are just starting out and are still getting used to the sounds of the language, learning some basic vocabulary, and have not yet learned how to create sentences in order to do the guided production exercises.
Understood.
But here’s the truth – even as a beginner you can do almost everything outlined above. That is, you can repeat until you can say the sentence well, you can listen until you understand it, and you can then work on translating it from your native language back into your target language.
It will be hard. It will. But take it slowly and know that the effort you are putting in will lead you to fluency much more quickly than if you simply stick to your learning material.
How to Organize the Material
I left this part to the end, since it really just deals with the how-to of accessing and downloading these files and then setting up a document that allows you to organize and work with them.
This is obviously important, but will read much more like an instructional manual and may not be of interest to everyone since for many of you this will be obvious.
Basically there are two things we must do in order to get started:
- Download a dataset of language files to our computer
- Create a spreadsheet of some type to help us organize our files and act as a type of learning manual
Downloading the Datasets:
Visit the Common Voice website and click on the “Datasets” tab at the top. Scroll down until you find the menu which allows you to choose the language you want to work with.

You will see a list of datasets for each language, with the most recent version at the top and older versions below it.
You will notice two different types: one called “Common Voice Corpus” and the other called “Common Voice Delta Segment.” The first one, Common Voice Corpus, is the full dataset for that language as of its release date, so choose the one with the most recent date to download.
The files entitled “Delta Segment” contain only the audio files that were added since the previous release. This means that they are basically updates: if you already have an earlier full dataset, you can download just the new clips without having to download the full dataset again.
So, for example, in the image above, I will download Common Voice Corpus 12.0 for Irish dated 12/15/2022, and this will give me the full dataset as of that release. The Delta Segment 12.0 contains only the clips added since the previous version, and those are already part of Corpus 12.0, so I do not need it as well. Later, as new versions are released, I will only need to download the newer Delta Segments to get the updates.
So, simply select the file you want to download and scroll to the bottom, enter your email and agree to the terms. Then click “download dataset bundle.”

Note: Some of these files are very large. In the case of these files for Irish the dataset is only 224MB. Totally doable. But, in the case of English, for example, we have a set which is 74GB.
When the file downloads to your computer, unzip it, and you will see a folder containing all the audio and text files.

We will be focusing on the “clips” folder which contains all the audio files and the document called “validated.tsv,” which contains the text for those recordings that have been validated as legitimate.
Creating a Useable Document to Help You Study
We now have all of our audio and text files, but we will need to turn one of those text files into a spreadsheet that will act as our learner manual.
To do so, we will be using the file called “validated.tsv.”
The “validated.tsv” file contains all of the information about the recordings, such as the sentences themselves, the names of their respective audio files, information about the speakers, and the votes of confidence for correctness and quality that each sentence has received from the public.
But the “TSV” format is not one that we simply open up and read. We must import it into a program such as Excel, LibreOffice, or Google Sheets which will allow us to read these in a column and row format, and therefore help us organize the information in a manner that makes it easy to work with.
For languages that do not use the Roman alphabet or for languages that contain unique symbols, accent marks, or diacritics, we will have to make sure to import these using the Unicode character set. This is simply a way for these documents to display the appropriate characters instead of a set of indecipherable symbols.
Depending on the program you are using and the version you have, this may happen automatically. If not, you may have some fiddling around to do.
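If you would rather skip the import dialogs altogether, a few lines of Python can read the file directly. This is only a minimal sketch and not part of the workflow I describe here: it assumes the TSV has a header row, and the column names and sample rows below are made up for illustration. The key detail is opening the file with encoding="utf-8" so that accents and non-Roman scripts come through intact.

```python
import csv
import io


def read_tsv(stream):
    """Parse tab-separated text into a list of dicts, one per row."""
    # QUOTE_NONE: quotation marks inside sentences are ordinary text, not CSV quoting.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def load_validated(tsv_path):
    """Open the file explicitly as UTF-8, then parse it."""
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        return read_tsv(handle)


# Tiny in-memory stand-in for a real validated.tsv (made-up rows and header names).
SAMPLE = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\n"
    "abc123\tclip_0001.mp3\tDia duit.\t2\t0\n"
    "def456\tclip_0002.mp3\tこんにちは。\t4\t1\n"
)

rows = read_tsv(io.StringIO(SAMPLE))
print(len(rows), "rows with columns:", list(rows[0].keys()))
```

With a real download, load_validated would be pointed at the validated.tsv file inside the unzipped folder.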
Let’s look at the sample below:
I’ve downloaded the Japanese dataset and find my TSV file. I’m on a Mac (and an older one at that) so what I’m showing you will differ based on your operating system and version. But the basics will still be the same.
Also for this example I will be using LibreOffice since my version of Excel has not been good at importing Unicode characters. The reason why is beyond my technical knowledge, and to be honest, is beyond my interest level too. The last thing I want to do is fight with an old version of Excel to get it to do what I want. I seriously have more important things to do with my life.
But, I will leave you here with some links which explain how to import a file and render it in Unicode characters for Excel or Google Sheets.
So back to me and my old computer and operating system. I click on the TSV file and choose to open it with LibreOffice. As you can see, it automatically gives me the option to import in Unicode (UTF-8), which is exactly what I want.

Once my file opens, the first column will be something called Client ID. This is of no interest to us, so we can simply delete this column and move on to the actual content we want.

This leaves us with the following columns:
- Path: Name of the audio file (which is located within the “clips” folder)
- Sentence: The actual sentence being spoken
- Up Votes: How many people have voted for this sentence as reliable, correct, and of fair quality
- Down Votes: the opposite of Up Votes
- Age: Age of the speaker if noted
- Gender: Gender of the speaker if noted
- Accents: For example, Southern German vs. Northern German
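- Locale: The language code of the dataset (for example, “ja” for Japanese); not important for our purposes
- Segment: Marks clips that belong to a special subset of the dataset; usually empty and also not important here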
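For those working in Python rather than a spreadsheet, removing the Client ID column could be sketched as below. The header name client_id and the sample rows are assumptions for illustration; check the first line of your own TSV for the exact spelling.

```python
# Made-up rows shaped like entries from validated.tsv (header names are assumptions).
SAMPLE_ROWS = [
    {"client_id": "abc123", "path": "clip_0001.mp3", "sentence": "Dia duit.", "up_votes": "2"},
    {"client_id": "def456", "path": "clip_0002.mp3", "sentence": "Go raibh maith agat.", "up_votes": "4"},
]


def drop_columns(rows, unwanted=("client_id",)):
    """Return new rows without the unwanted columns; the input rows are not modified."""
    return [
        {key: value for key, value in row.items() if key not in unwanted}
        for row in rows
    ]


cleaned = drop_columns(SAMPLE_ROWS)
print(list(cleaned[0].keys()))
```

The same function can drop any other columns you do not care about by passing a longer list of names as unwanted.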
What we will want to do now is to sort our files by “up votes” so that we are beginning our training with audio that has been voted as the best examples. This certainly isn’t necessary, but I prefer to organize my files this way. I then work my way down the list.
So, simply click in the “up votes” column and sort in descending order meaning that the ones with the most “up votes” will appear at the top.
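The same sort can be sketched in Python. The one thing to watch is that values read from a TSV arrive as text, so they need converting to numbers first; sorted as text, “10” would rank below “9”. The rows and the header name up_votes are made up for illustration.

```python
# Made-up rows; vote counts are strings, as they would be when read from a TSV.
SAMPLE_ROWS = [
    {"path": "clip_0001.mp3", "up_votes": "2"},
    {"path": "clip_0002.mp3", "up_votes": "10"},
    {"path": "clip_0003.mp3", "up_votes": "9"},
    {"path": "clip_0004.mp3", "up_votes": ""},
]


def sort_by_up_votes(rows):
    """Return the rows ordered from most to fewest up votes."""
    # An empty cell counts as zero votes; int() turns the text into a number.
    return sorted(rows, key=lambda row: int(row["up_votes"] or 0), reverse=True)


for row in sort_by_up_votes(SAMPLE_ROWS):
    print(row["up_votes"] or "0", row["path"])
```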

Now you may notice something missing. That’s right, there are no actual translations of the sentences themselves. These are not provided in Common Voice since translating is not the goal of this project. No problem, though. I add those translations in myself.
I insert another blank column into the document next to the sentences and call it English.
I then take the first ten sentences (since that is how many I want to work with first) and put them into Google Translate. I then copy those back into my document.
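In Python, picking out the first ten sentences and giving each an empty translation cell might look like the sketch below. The column name english and the sample rows are my own placeholders, and the translations themselves still have to be pasted in by hand; the code does not call any translation service.

```python
def add_translation_column(rows, limit=10, column="english"):
    """Return the first `limit` rows, each with a blank column right after 'sentence'."""
    result = []
    for row in rows[:limit]:
        new_row = {}
        for key, value in row.items():
            new_row[key] = value
            if key == "sentence":
                new_row[column] = ""  # left blank, to be filled in by hand
        result.append(new_row)
    return result


# Twelve made-up rows, to show that only the first ten are kept.
SAMPLE_ROWS = [
    {"path": f"clip_{number:04d}.mp3", "sentence": f"Sentence {number}", "up_votes": "2"}
    for number in range(1, 13)
]

study_rows = add_translation_column(SAMPLE_ROWS)
print(len(study_rows), list(study_rows[0].keys()))
```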

Now I have something I can work with.
You’ll notice that the translations aren’t always very good, but we are all aware that any online translation tool is flawed, sometimes comically so. But in most cases they will give us an idea of what the sentence is about and this is most often sufficient for our learning purposes.
In the case of this Japanese example, there are quite a few that I may choose to skip since the translation doesn’t make much sense to me.
“Battle content was implemented and retirees continued,” anyone?
But there are others that are perfectly fine and I will use them as learning material.
You’ll also notice that in many cases, we don’t actually have a complete sentence, but rather a mere fragment. For example:
“In addition, an authority on Edo literature.”
I still find this useful, however, since I am learning how to say a phrase such as “in addition,” which is very helpful when I want to add information to a statement I just made. These are the types of little phrases and binders that actually help us speak more naturally.
I can also use that sentence fragment to help me practice building a full sentence, such as this:
“In addition, she was an authority on Edo literature.”
Remember, any type of authentic language material is learning material. If you use it as such, that is.
So, when I have my new spreadsheet set up, I save it as a LibreOffice or Excel document since I no longer want to use a TSV format.
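If you have been preparing the rows in Python instead, one option is to save them as a CSV file, which both LibreOffice and Excel can open. This swaps in CSV for the native spreadsheet formats mentioned above, so treat it as a sketch. It writes the file as "utf-8-sig", which adds a byte-order mark at the start; Excel generally uses that mark to recognize the file as UTF-8. The rows and file name are made up.

```python
import csv
import os
import tempfile

# Made-up study rows; every value is text, as it would be after reading a TSV.
SAMPLE_ROWS = [
    {"path": "clip_0001.mp3", "sentence": "Dia duit.", "english": "Hello."},
    {"path": "clip_0002.mp3", "sentence": "Slán.", "english": "Goodbye."},
]


def save_csv(rows, out_path):
    """Write the rows to a CSV file, using the first row's keys as the header."""
    with open(out_path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Demonstration in a throwaway folder; use a path of your own in practice.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "my_sentences.csv")
    save_csv(SAMPLE_ROWS, target)
    print("saved", os.path.getsize(target), "bytes")
```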
And there we go! I’m ready to start training myself based on the goals we discussed above.
Before I go, I do want to mention one last thing. As we discussed, these are not professional recordings so the sound quality will vary. I find that even some of the highly up-voted recordings can be difficult to work with. So, I simply put those aside and move on to the next ones on the list. We are talking about thousands of sentences here – you will not be running out of material any time soon.
Using These in an SRS such as Anki
Many of you out there may be using Anki as a learning tool. I do.
These audio files are excellent material to add to your Anki decks providing you with cards that have both the audio and the written text to help you learn.
And the good news is that you certainly do not have to add this material and information manually. With a little bit of pre-work and organization, you can import both the text and the audio into an Anki deck in bulk and create hundreds of cards quickly and efficiently.
I’m not going to go into the details on how to set that up since the Anki support documentation explains how to do this.
You can find that here:
https://docs.ankiweb.net/importing.html
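To give a sense of what that pre-work can look like, here is a minimal Python sketch that turns study rows into a tab-separated text file for Anki’s importer. The rows are made up, and the three-field layout (sentence, translation, audio) is just one possible note design. It relies on Anki’s [sound:filename] tag, and it assumes you have copied the audio clips into your Anki profile’s collection.media folder so the tags can find them.

```python
import csv
import io

# Made-up study rows: clip file name, sentence, and a hand-checked translation.
SAMPLE_ROWS = [
    {"path": "clip_0001.mp3", "sentence": "Dia duit.", "english": "Hello."},
    {"path": "clip_0002.mp3", "sentence": "Go raibh maith agat.", "english": "Thank you."},
]


def build_anki_text(rows):
    """Return tab-separated lines: sentence, translation, and a sound tag per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        sound_tag = f"[sound:{row['path']}]"
        # A missing translation becomes an empty field rather than an error.
        writer.writerow([row["sentence"], row.get("english", ""), sound_tag])
    return buffer.getvalue()


print(build_anki_text(SAMPLE_ROWS))
```

Saving that text to a file encoded as UTF-8 and importing it with a note type that has three fields should produce one card per sentence. The Anki documentation linked above remains the authority on the import options.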
And…Over to You
We’ve covered a lot in this article and I hope you are excited, or at least curious, about the possibilities that Common Voice offers.
This ability to easily source content and create our own learning material is transformative and will be, I believe, what drives the future of education as we work to encourage more autonomy and independence on the part of the learner.
We must not forget, however, that this material on its own simply exists. We must put in the effort to think about how it can act as a learning tool and then put those ideas into action.
So I’m leaving you with that. Get out there and start exploring.