Mobile Speech Recognition

A project of the Robotics 2020 class of the School of Information Science and Technology (SIST) of ShanghaiTech University. Course Instructor: Prof. Sören Schwertfeger.

Tianyi Zhang, Yixiao Feng, Yuezi Chen

System description

We use the baidu_speech package for text-to-speech (TTS) and speech recognition. Once a command has been recognized, the robot should carry out the task smoothly, so we implement the task logic as a state machine in FlexBE. The first problem we met was running speech recognition inside ROS. Because of compatibility issues with our environment and ROS version, we tried several packages before settling on baidu_speech. It provides both TTS and speech recognition, so the robot can listen and speak. The remaining step is for the robot to understand the commands and carry out the corresponding work.

Speech recognition returns the command as an English sentence. We parse it into three types of keywords: action, place, and time. The action and the place are required. Each keyword type has its own dictionary, so the type of any word can be found with a simple lookup. After the command is parsed and tagged, we execute it in FlexBE, in the same way as the fetch operation in homework 3.
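The dictionary-based tagging described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: the names `KEYWORDS`, `parse_command`, and `missing_slots` are hypothetical, the action and place words come from the evaluation vocabulary, and the time words are placeholders. A reverse index (word to keyword type) turns tagging into one lookup per word.

```python
import re

# Hypothetical keyword dictionaries, one per keyword type. The action and
# place words come from the evaluation vocabulary; the time words are
# illustrative placeholders.
KEYWORDS = {
    "action": {"go", "fetch"},
    "place": {"office", "kitchen", "lab"},
    "time": {"now", "today", "tomorrow"},
}
# Action and place must be present before a command can run.
REQUIRED = ("action", "place")

# Reverse index built once: word -> keyword type, so tagging is one lookup.
WORD_TYPE = {word: kind for kind, words in KEYWORDS.items() for word in words}


def parse_command(sentence):
    """Tag the first known word of each type; all other words are ignored."""
    tags = {kind: None for kind in KEYWORDS}
    for word in re.findall(r"[a-z']+", sentence.lower()):
        kind = WORD_TYPE.get(word)
        if kind is not None and tags[kind] is None:
            tags[kind] = word
    return tags


def missing_slots(tags):
    """List the required keyword types the sentence did not provide."""
    return [kind for kind in REQUIRED if tags[kind] is None]


print(parse_command("Go to the office now"))
# → {'action': 'go', 'place': 'office', 'time': 'now'}
print(missing_slots(parse_command("fetch the can")))
# → ['place']
```

The tagged dictionary is what a state machine would consume; a non-empty `missing_slots` result is the signal to ask the user a follow-up question instead of executing.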

The pipeline of our project:

System evaluation

The goal of the evaluation is to test whether the robot can hold a simple conversation and carry out simple missions. We also test how the robot reacts to ambiguous commands and how quickly it responds.

To test our robot, we have designed:

  • Three positions: office, kitchen, lab
  • Two actions: go, fetch
  • Three responses: hi/hello, what's your name, say a joke
  • One object: can

A normal speech command looks like: go (to the) office / fetch (the) can in (the) lab (for me)

An incomplete speech command looks like: go (where) / fetch (what) in (the) lab (for me)

A speech command with unknown keywords looks like: fetch (an) apple in (the) lab (for me)
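One way to tell these three command types apart is sketched below. It is a hypothetical illustration, not the project's implementation: the `FILLER` word list, the per-action `REQUIRED` slots, and the function name `classify` are all assumptions. A word that is neither vocabulary nor filler marks the command as unknown; a missing required slot marks it as incomplete; anything else is normal.

```python
import re

# Vocabulary from the test set: word -> keyword type.
VOCAB = {
    "go": "action", "fetch": "action",
    "office": "place", "kitchen": "place", "lab": "place",
    "can": "object",
}
# Hypothetical filler words that carry no task information.
FILLER = {"to", "the", "a", "an", "in", "for", "me", "please"}
# Assumed slots each action needs before it can be executed.
REQUIRED = {"go": ("place",), "fetch": ("object", "place")}


def classify(sentence):
    """Return ("unknown" | "incomplete" | "normal", detail)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    unknown = [w for w in words if w not in VOCAB and w not in FILLER]
    if unknown:
        return "unknown", unknown          # detail: unrecognized words
    slots = {}
    for w in words:
        if w in VOCAB:
            slots.setdefault(VOCAB[w], w)  # keep the first word of each type
    action = slots.get("action")
    if action is None:
        return "incomplete", ["action"]
    missing = [s for s in REQUIRED[action] if s not in slots]
    if missing:
        return "incomplete", missing       # detail: slots still to ask for
    return "normal", slots                 # detail: the filled slots


print(classify("fetch an apple in the lab for me"))
# → ('unknown', ['apple'])
print(classify("fetch in the lab"))
# → ('incomplete', ['object'])
```

Checking for unknown words first means "fetch an apple" is reported as unknown rather than as a fetch with a missing object, which matches treating it as a separate test category.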

We test how the robot reacts to each of these three types of commands. The expected results are:

  • For normal commands: execute the command correctly;
  • For incomplete commands: ask for more information until the robot has collected everything it needs, then execute the command correctly;
  • For commands with unknown keywords: the robot cannot learn new commands on its own, so the corresponding commands must be implemented manually.
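The "ask until all information is collected" behaviour for incomplete commands can be sketched as a small slot-filling loop. This is a minimal sketch under assumptions: `complete_command`, the question strings, and the `max_turns` safeguard are hypothetical, and `ask` stands in for whatever wraps TTS plus speech recognition on the robot (for example, around baidu_speech). Passing `ask` in as a function keeps the loop testable without a microphone.

```python
import re

# Valid answers per slot, taken from the test vocabulary.
VALID = {"place": {"office", "kitchen", "lab"}, "object": {"can"}}
# Assumed slots each action needs, and the follow-up question for each.
REQUIRED = {"go": ("place",), "fetch": ("object", "place")}
QUESTIONS = {"place": "Which place?", "object": "Which object?"}


def complete_command(action, slots, ask, max_turns=5):
    """Ask for missing slots until the command can run.

    `ask(question)` returns the recognized reply as a string. Returns the
    filled slot dict, or None after `max_turns` unanswered questions.
    """
    slots = dict(slots)
    turns = 0
    while True:
        missing = [s for s in REQUIRED[action] if s not in slots]
        if not missing:
            return slots
        if turns == max_turns:
            return None  # added safeguard so the robot does not ask forever
        turns += 1
        slot = missing[0]
        reply = ask(QUESTIONS[slot])
        for word in re.findall(r"[a-z']+", reply.lower()):
            if word in VALID[slot]:
                slots[slot] = word
                break


# Usage with scripted replies in place of real speech recognition.
replies = iter(["The can, please.", "In the lab."])
print(complete_command("fetch", {}, lambda q: next(replies)))
# → {'object': 'can', 'place': 'lab'}
```

In a FlexBE behavior the same loop would map onto states that speak a question, listen, and re-check the slots, with the `None` case routed to a failure outcome.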

The robot: