IceWalkers.com - Linux Software downloads and news
Name : Password :
Linux SoftwareLinux RPMLinux HowtosLink UsAboutAdvertise

HOWTOs

Search Howtos :Match :

3. Introduction

3.1. Speech Recognition Basics

Speech recognition is the process by which a computer (or other type of machine) identifies spoken words. Basically, it means talking to your computer, AND having it correctly recognize what you are saying.

The following definitions are the basics needed for understanding speech recognition technology.

Utterance

An utterance is the vocalization (speaking) of a word or words that represent a single meaning to the computer. Utterances can be a single word, a few words, a sentence, or even multiple sentences.

Speaker Dependance

Speaker dependent systems are designed around a specific speaker. They generally are more accurate for the correct speaker, but much less accurate for other speakers. They assume the speaker will speak in a consistent voice and tempo. Speaker independent systems are designed for a variety of speakers. Adaptive systems usually start as speaker independent systems and utilize training techniques to adapt to the speaker to increase their recognition accuracy.

Vocabularies

Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the SR system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry doesn't have to be a single word. They can be as long as a sentence or two. Smaller vocabularies can have as few as 1 or 2 recognized utterances (e.g."Wake Up"), while very large vocabularies can have a hundred thousand or more!

Accuract

The ability of a recognizer can be examined by measuring its accuracy - or how well it recognizes utterances. This includes not only correctly identifying an utterance but also identifying if the spoken utterance is not in its vocabulary. Good ASR systems have an accuracy of 98% or more! The acceptable accuracy of a system really depends on the application.

Training

Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy.

Training can also be used by speakers that have difficulty speaking, or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.

3.2. Types of Speech Recognition

Speech recognition systems can be separated in several different classes by describing what types of utterances they have the ability to recognize. These classes are based on the fact that one of the difficulties of ASR is the ability to determine when a speaker starts and finishes an utterance. Most packages can fit into more than one class, depending on which mode they're using.

Isolated Words

Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on BOTH sides of the sample window. It doesn't mean that it accepts single words, but does require a single utterance at a time. Often, these systems have "Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). Isolated Utterance might be a better name for this class.

Connected Words

Connect word systems (or more correctly 'connected utterances') are similar to Isolated words, but allow separate utterances to be 'run-together' with a minimal pause between them.

Continuous Speech

Continuous recognition is the next step. Recognizers with continuous speech capabilities are some of the most difficult to create because they must utilize special methods to determine utterance boundaries. Continuous speech recognizers allow users to speak almost naturally, while the computer determines the content. Basically, it's computer dictation.

Spontaneous Speech

There appears to be a variety of definitions for what spontaneous speech actually is. At a basic level, it can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters.

Voice Verification/Identification

Some ASR systems have the ability to identify specific users. This document doesn't cover verification or security systems.

Search Howtos :Match :
VLC media player 0.9.7
Cross-platform media player and streaming server
Ruby 1.9.1 p2
Interpreted scripting language
NASM 2.06rc1
NASM is an 80x86 assembler designed for portability
Veejay 1.4.3
A Visual 'music' instrument and video tracking tool.
Evolution 2.25.2
GNOME mailer, calendar, contact manager and communications tool
Sylpheed 2.6.0rc
Mail User Agent based on GTK+
Nautilus 2.25.1
The Nautilus Environment -- Delivering a Richer User Experience
GtkHTML 3.25.2
HTML rendering/editing library
Pybliographer 1.2.12
Tool for managing bibliographic databases
GFTP 2.0.19
Free multithreaded ftp client
Free IT Magazines, White Papers, eBooks, and more !
Oracle Magazine

Contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more.

eWeek

The essential technology information source for builders of e-business.

BusinessWeek (Digital Edition)

Provides readers a deeper understanding of the trends that drive growth, and what best practices keep them ahead of the competition.

Linux Software Map
Find Linux RPM
Best Rated Linux Software
Most Rated Linux Software
Linux Distributions
Linux Howtos
Quick Survey

Please take our survey and help us improve our website to serve you better.

Thank you.
Linux Software
Linux / IT Resources
Site Resources
Google
Privacy Policy
Contact Us
Submit Software
Advertising info