Home | Fedora Core 4 Tutorial | Linux Tutorials | Linux Games | Linux Java | Linux Kernal | Linux Firewall | Linux Database | Linux Distributions | Linux Firewall GUI | Linux Distributions | Linux Firewall

 


 

Search Host

Monthly Fee($)
Disk Space (MB)
Register With us for Newsletter!
Visit Forum! Post Questions!
Jobs At RoseIndia.net!

Have tutorials?
Add your tutorial to our Java Resource and get tons of hits.

We offer free hosting for your tutorials. and exposure for thousands of readers. drop a mail
roseindia_net@yahoo.com
 
   

Tutorials

Java Server Pages

JAXB

Java Beans

JDBC

MySQL

Java Servlets

Struts

Bioinformatics

Java Code Examples

Interview Questions

 
Join For Newsletter

Powered by groups.yahoo.com
Visit Group! Post Questions!

Web Promotion

Web Submission

Submit Sites

Manual Submission?

Web Promotion Guide

Hosting Companies

Web Hosting Guide

Web Hosting

Linux

Beginner Guide to Linux Server

Linux Distribution

Major Linux Distribution

Linux FTP Software

Frameworks

Persistence Framework

Web Frameworks

Free EAI Tools

Web Servers

Aspect Oriented Programming

Free Proxy Servers

Softwares

Adware & Spyware Remover

Open Source Softwares

6. Inside Speech Recognition

6.1. How Recognizers Work

Recognition systems can be broken down into two main types. Pattern Recognition systems compare patterns to known/trained patterns to determine a match. Acoustic Phonetic systems use knowledge of the human body (speech production, and hearing) to compare speech features (phonetics such as vowel sounds). Most modern systems focus on the pattern recognition approach because it combines nicely with current computing techniques and tends to have higher accuracy.

Most recognizers can be broken down into the following steps:

  1. Audio recording and Utterance detection

  2. Pre-Filtering (pre-emphasis, normalization, banding, etc.)

  3. Framing and Windowing (chopping the data into a usable format)

  4. Filtering (further filtering of each window/frame/freq. band)

  5. Comparison and Matching (recognizing the utterance)

  6. Action (Perform function associated with the recognized pattern)

Although each step seems simple, each one can involve a multitude of different (and sometimes completely opposite) techniques.

(1) Audio/Utterance Recording: can be accomplished in a number of ways. Starting points can be found by comparing ambient audio levels (acoustic energy in some cases) with the sample just recorded. Endpoint detection is harder because speakers tend to leave "artifacts" including breathing/sighing,teeth chatters, and echoes.

(2) Pre-Filtering: is accomplished in a variety of ways, depending on other features of the recognition system. The most common methods are the "Bank-of-Filters" method which utilizes a series of audio filters to prepare the sample, and the Linear Predictive Coding method which uses a prediction function to calculate differences (errors). Different forms of spectral analysis are also used.

(3) Framing/Windowing involves separating the sample data into specific sizes. This is often rolled into step 2 or step 4. This step also involves preparing the sample boundaries for analysis (removing edge clicks, etc.)

(4) Additional Filtering is not always present. It is the final preparation for each window before comparison and matching. Often this consists of time alignment and normalization.

There are a huge number of techniques available for (5), Comparison and Matching. Most involve comparing the current window with known samples. There are methods that use Hidden Markov Models (HMM), frequency analysis, differential analysis, linear algebra techniques/shortcuts, spectral distortion, and time distortion methods. All these methods are used to generate a probability and accuracy match.

(6) Actions can be just about anything the developer wants. *GRIN*

6.2. Digital Audio Basics

Audio is inherently an analog phenomenon. Recording a digital sample is done by converting the analog signal from the microphone to an digital signal through the A/D converter in the sound card. When a microphone is operating, sound waves vibrate the magnetic element in the microphone, causing an electrical current to the sound card (think of a speaker working in reverse). Basically, the A/D converter records the value of the electrical voltage at specific intervals.

There are two important factors during this process. First is the "sample rate", or how often to record the voltage values. Second, is the "bits per sample", or how accurate the value is recorded. A third item is the number of channels (mono or stereo), but for most ASR applications mono is sufficient. Most applications use pre-set values for these parameters and user's shouldn't change them unless the documentation suggests it. Developers should experiment with different values to determine what works best with their algorithms.

So what is a good sample rate for ASR? Because speech is relatively low bandwidth (mostly between 100Hz-8kHz), 8000 samples/sec (8kHz) is sufficient for most basic ASR. But, some people prefer 16000 samples/sec (16kHz) because it provides more accurate high frequency information. If you have the processing power, use 16kHz. For most ASR applications, sampling rates higher than about 22kHz is a waste.

And what is a good value for "bits per sample"? 8 bits per sample will record values between 0 and 255, which means that the position of the microphone element is in one of 256 positions. 16 bits per sample divides the element position into 65536 possible values. Similar to sample rate, if you have enough processing power and memory, go with 16 bits per sample. For comparison, an audio Compact Disc is encoded with 16 bits per sample at about 44kHz.

The encoding format used should be simple - linear signed or unsigned. Using a U-Law/A-Law algorithm or some other compression scheme is usually not worth it, as it will cost you in computing power, and not gain you much.

Search Tutorials

Linux Distributions

Fedora

Slackware
SuSe
Mandrake
Knoppix
Mepis
Debian
All Distors....
 

 

 

Send your comments, Suggestions or Queries regarding this site at roseindia_net@yahoo.com.

Copyright © 2004. All rights reserved.