3Play Media: Paving the Way to a Better Captioning Future
How do you start a company?
First, you probably need an idea for a product. Then you need a team to build it. After that, you get customers...right? Not 3Play Media.
“We sort-of chose team before we had a product, and we sort-of had customers before we had a product, which is one of the reasons why we were able to bootstrap so early,” said Co-Founder and COO Chris Antunes on the founding of the Boston-based transcription, captioning, and subtitling company.
Such a thing sounds rare, and it took a unique set of circumstances for the company’s four co-founders (Antunes, Chief Revenue Officer Josh Miller, Chief Customer Officer Jeremy Barron, and CTO Chris “CJ” Johnson) to make it a reality.
This was in 2007, and the four were all MIT Sloan MBA students connected in various ways. Three of them were in the same cohort, and they had previously worked on other projects together, including an entrepreneurial venture that didn’t ultimately pan out.
Johnson had previously worked at MIT OpenCourseWare, the part of the school that publishes all of its undergrad and grad-level materials online for free.
“Serendipitously, OpenCourseWare reached out to CJ with this problem while we were at Sloan. OpenCourseWare was publishing courses online for free, they were one of the first to do that,” Antunes said. “Part of their funding stipulated that the video had to be accessible, so captioned, and back then captioning was pretty hard and expensive, and a substantial amount of time would be spent using traditional captioning measures.”
Traditional captioning involved humans listening to audio and typing each word manually, which is costly and too slow to scale. OpenCourseWare presented the problem to Johnson, who in turn approached the three other eventual co-founders to try to solve it.
They also gained the help of MIT Senior Research Scientist Jim Glass, who leads the school’s Spoken Language Systems Group at CSAIL. Johnson had the idea to use speech recognition as a starting point to create faster captions, and with Glass’s mentorship, the four co-founders got to work and a new, budding company was formed.
“Fundamentally, what we were trying to do was make captioning faster, meaning less human time required to create the captions, and if the human time went down, then we could bring the cost to a place where it made sense for our initial customer, meaning OCW,” Antunes said. “So that’s what I meant by having customers before product. There’s this need, and this customer—now we just need to build the product.”
What they built over the next year was a much more basic version of what they use today. It used speech recognition technology to create a quick, imperfect transcription that human editors would then polish into a finished product. This gave captioners a solid base to work from and ultimately saved a lot of time.
While such technology is better known today, 3Play Media was extremely early to this market. In 2007 and 2008, many eventual customers were too busy figuring out how to get video online to worry about captioning. “Now obviously that mindset has changed a ton, and as our customers got video online, they were looking for solutions,” Miller said.
Their work was a success, and the rest, as they say, is history.
As the captioning platform grew, they began to add more features, and the company has grown significantly as well. Now at 40 employees (excluding captioners), it offers a number of services, including captioning, transcription, audio description, translation & subtitling, and Spanish captioning, to over 2,500 customers around the world.
Their current, evolved captioning method works through a three-step process. First, the audio or video file is put through proprietary speech recognition software to get a basic transcription. Second, a human captioner does a full scrub of the transcription, fixing errors and properly formatting it. Lastly, another human does a round of QA testing and refinement.
3Play Media’s range of work is extensive; their three core customer types are media companies, corporate content publishers, and educational institutions, so their work ranges from lectures to TV shows and beyond.
To handle that scale, the company relies on a large number of contracted captioners. Antunes explained that they do so only while ensuring that their captioners are well paid and treated well. “We are really passionate about defending them and defending the rates we pay them, making their experience as good as possible by releasing new features and making their captioning experience better,” he said.
Looking ahead, 3Play Media plans to make a further push into the audio description space this year. While they offer some audio description services now, it’s a challenging business because, as Miller explained, audio descriptions traditionally require a voice actor, cost in the area of $20-30 per minute, and are hard to edit. They’re working on using synthesized speech, which Antunes said could be “even more disruptive” than their current captioning services.
The company is especially excited about this, and about their future as a whole, because while their customers and captioners are two of their core constituents, their focus is always on the end user (primarily people who are hard of hearing or have low vision) and on improving those users’ lives in every way they can.
“We want to facilitate the process of making every video accessible,” Miller said. “We do believe we as a society are going down a path where every video is going to be transcribed and captioned. There's a good reason to do that, and we want to be the leaders in it.”