Google AI video transcription

11/12/2023

Graph: The average seconds per entity by station between Speech-to-Text and closed captioning.

Immediately noticeable is that the automated transcript consistently produces a greater density of recognized entities than the station-provided captioning. This ranges from 1.4 times more for Fox News to 2.2 times more for PBS. The primary reason for this appears to be that the station-provided captioning is entirely uppercase, while the machine transcripts are correctly capitalized, using the linguistic capitalization model built into Google’s Speech-to-Text API. Google’s Natural Language API relies on proper capitalization to correctly identify entities and their textual boundaries and to distinguish proper nouns from ordinary words.

The significant difference between the transcript and captioning entities for PBS appears to be due to a particularly high density of typographical errors in the station-provided captioning, which both affected the entity mentions themselves and interrupted the grammatical flow of the transcript enough to impair the API’s ability to identify entity boundaries. In fact, a word-by-word examination of the station-provided captioning for each station shows that the graph above reflects to some degree the level of error in the captioning: the more closely the captioning matches the machine transcript, the higher its fidelity and the lower its error rate.

An even greater driving factor is that captioning typically does not include advertisements, while the machine transcript includes all spoken words. This means that stations devoting a greater proportion of their airtime to ads will show a greater difference.

One limitation of this graph is that it shows only the density of entity mentions, not how well they match up between the captioning and the transcript. The two could have a similar number of entities, but due to human or machine error the extracted entities could be completely different from one another. To test this, a master histogram of all extracted entities was compiled and the Pearson correlation computed for each station between its captioning entities and transcript entities, seen in the graph below. Only entities that did not include a number and appeared at least five times across the combined airtime of the seven stations were considered.

Graph: The Pearson correlation between captioning entities and transcript entities. Kalev Leetaru

Across all seven stations the total correlation was r=0.95, ranging from 0.96 for CNN and MSNBC and 0.95 for Fox News down through 0.75 for CBS. Interestingly, the three national stations have the highest correlations and the four network stations the lowest.
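The correlation test described above (a master histogram of entities, a filter on numbers and minimum counts, then a per-station Pearson correlation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the analysis: the function names, the input layout (one pair of entity counts per station) and the reading of "at least five times" as a combined captioning-plus-transcript count are all hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def eligible_entities(stations, min_count=5):
    """Build the master histogram and keep entities that pass both filters.

    `stations` maps a station name to (captioning_counts, transcript_counts).
    Assumption: the minimum count applies to the total over all stations and
    both sources combined.
    """
    master = Counter()
    for caption_counts, transcript_counts in stations.values():
        master.update(caption_counts)
        master.update(transcript_counts)
    return sorted(
        entity
        for entity, count in master.items()
        if count >= min_count and not any(ch.isdigit() for ch in entity)
    )


def station_correlations(stations, min_count=5):
    """Correlate each station's captioning counts with its transcript counts."""
    vocab = eligible_entities(stations, min_count)
    result = {}
    for name, (caption_counts, transcript_counts) in stations.items():
        cap = Counter(caption_counts)      # missing entities count as 0
        tra = Counter(transcript_counts)
        result[name] = pearson([cap[e] for e in vocab], [tra[e] for e in vocab])
    return result
```

Comparing both sources over one shared entity list means an entity that appears in only one of them counts as zero in the other, so mismatched extractions lower the correlation instead of being silently dropped.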

I'm James. This is my year of travel.
