Steves Computer Vision Blog: PiAUISuite Update and Voicecommand v3.1

Voicecommand allows you to control your raspberry pi using only your voice. More information and videos on this can be found on my YouTube channel or the original blog posts.

I've made a couple of key changes and their is a new option for people who want to help with the sox implementation to make the speech recognition more continuous rather than chunk-based.

The bug in the voicecommand -s hardware has been fixed.
Allows multilingual support with !lang and !language
Fixed casing bug when matching multiple variables
Install, Uninstall, and Update scripts are now seperated by project. So now if you want to only update youtube, just run UpdateAUISuite.sh youtube
tts and tts-nofill have been combined.
Moving away from yt.js to browse youtube in the browser. Now adding node.js youtube browsing API. See https://github.com/StevenHickson/RaspberryPiTV
Building https://github.com/StevenHickson/RaspberryPiTV to work with voicecommand and adding omxcontrols using https://github.com/StevenHickson/omxplayer_fifo
With the above, this allows a control panel that can control videos, play pandora, browse youtube, control music, and run voicecommand. Note that this is in beta and will require a lot of manual installation as their is no installation or readme yet (Hopefully soon to come).
Added youtube-dl cron update so that youtube-dl updates automatically every night. Often if someone says the youtube script doesn't work, it is because youtube-dl is out of date and YouTube has updated their security algorithms. Running sudo youtube-dl -U often fixes this problem.
Added an option in speech-recog.sh to use sox instead of arecord. Simply uncomment out the sox portion and comment the arecord portion in /usr/bin/speech-recog.sh as below:


sox -r 16000 -t alsa $hardware /dev/shm/out.flac silence 1 0.3 1% 1 0.5 1%

wget -q -U "rate=16000" -O - --post-file /dev/shm/out.flac --header="Content-Type: audio/x-flac; rate=16000" "http://www.google.com/speech-api/v1/recognize?lang=en&client=Mozilla/5.0" | sed -e 's/[{}]/''/g'| awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]; exit }' | awk -F: 'NR==3 { print $3; exit }'

#arecord -D $hardware -f cd -t wav -d $duration -r 16000 | flac - -f --best --sample-rate 16000 -o /dev/shm/out.flac 1>/dev/shm/voice.log 2>/dev/shm/voice.log; wget -O - -o /dev/null --post-file /dev/shm/out.flac --header="Content-Type: audio/x-flac; rate=16000" http://www.google.com/speech-api/v1/recognize?lang="$lang" | sed -e 's/[{}]/''/g'| awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]; exit }' | awk -F: 'NR==3 { print $3; exit }'

rm /dev/shm/out.flac



Please let me know how this works for people so I can debug and get this working permanently. 
As always, you can find the install, update, and new YouTube videos at my YouTube channel here:

https://www.youtube.com/channel/UCxa9JQjCl8ij_7za1_sRCVQ/videos





If you are wondering why I've been so quiet, it's because I moved, started grad school at Georgia Tech, and have been doing a technical review for a computer vision book.




Since I'm a poor graduate student, please support my tinkering:

at Yescom USA. Valid until October 2013! Retractable Banner Stand at Yescom USA . Valid until October 2013!

Steves Computer Vision Blog

Monday, September 16, 2013

PiAUISuite Update and Voicecommand v3.1

No comments:

Post a Comment