
This page contains notes from my various tests and experiments. It is a raw record of what I did, without corrections for errors or later updates for things I learn. Use at your own risk.

Jetbot ROS2 #6: ROS2 voice nodes

I want to make a node that will do voice recognition on a Jetson. But first let me make a node that does voice detection. I can also compare that to the result from the ReSpeaker mic.

2022-05-04

jetson_voice already has much of what I want. Let me experiment with that. That should also get me some experience running Docker-based ROS2 nodes on different systems.

Try to publish from the microphone on the Nano

I’m trying to get the asr node of jetson_voice_ros to run under Docker using the launch file. This seems to start but then dies:

docker run -it --rm --network=host --privileged jetson-voice:r32.7.1-ros-galactic ros2 launch jetson_voice_ros asr.launch.py input_device:="'ReSpeaker 4 Mic Array (UAC1.0): USB Audio (hw:2,0)'"

Note the name used for the device. It had to be wrapped in an extra layer of quotes; otherwise yaml thought it was a dictionary, since the device name contains ‘: ’.
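
A quick way to see the quoting issue — a minimal sketch with PyYAML, assuming the launch argument parsing behaves the same way:

# An unquoted "key: value" string parses as a yaml mapping,
# so the device name needs its own inner layer of quotes.
import yaml

name = 'ReSpeaker 4 Mic Array (UAC1.0): USB Audio (hw:2,0)'
print(yaml.safe_load(name))         # {'ReSpeaker 4 Mic Array (UAC1.0)': 'USB Audio (hw:2,0)'}
print(yaml.safe_load(f"'{name}'"))  # the full string, as intended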

Getting FileNotFoundError: [Errno 2] No such file or directory: '/jetson-voice/data/networks/manifest.json', which I assume means there is an issue finding the data files. Checking, there is indeed no such file there.

I can make it work by starting with docker/run.sh --ros galactic --run bash and then running the launch command above inside the container (presumably docker/run.sh mounts the data directory that my plain docker run was missing). But I cannot seem to get a direct launch to work using:

docker/run.sh --ros galactic --run ros2 launch jetson_voice_ros asr.launch.py input_device:="'ReSpeaker 4 Mic Array (UAC1.0): USB Audio (hw:2,0)'"

I tried all sorts of things, but nothing seemed to work. Finally this worked:

docker/run.sh --ros galactic --run "ros2 launch jetson_voice_ros asr.launch.py input_device:='11'"

Now try input and output together:

docker/run.sh --ros galactic --run "ros2 launch jetson_voice_ros audio_playback.launch.py input_device:='11' output_device:='11'"

This works, but it picks up and then plays back its own noise.

2022-05-05

I had trouble installing librosa, which is used by the audio code. This is a well-known issue on the Jetson. I’m going to follow the recommendations from https://github.com/epicmario7133/jetson-nano-tricks/blob/main/README.md, which they claim might take 30 hours! (The following has been slightly adapted by me.)

# Build and install TBB from source (used by numba's threading layer)
git clone https://github.com/wjakob/tbb.git
cd tbb/build
cmake ..
make -j
sudo make install
# llvmlite needs a matching LLVM; point its build at llvm-10
sudo apt install -y llvm-10
export LLVM_CONFIG=/usr/bin/llvm-config-10
python3 -m pip install llvmlite
python3 -m pip install Cython
python3 -m pip install numba

Tried it once; not enough memory. Trying again with 8 GB of swap. That stopped the compiler failure, but the build of numba still fails.

I tried switching to root, and that seemed to let the compile proceed. But it failed because it tried to install a new numpy: it wanted >= 1.15 but we have 1.13.3 installed, and the version it wanted to install required Python > 3.8.

2022-05-06

Switching out of root, user kent sees numpy version 1.19.5, but trying to import that into python3 gives an illegal instruction. Uninstalling and then reinstalling with -U gives the same result. Now trying to use pip to install 1.15.0. Worked. OK, how about numpy==1.19.4? Also worked. I’ll do that for root.

Nope; first I retried bdbd2_jetbot/scripts/install.sh, and now it works! The only thing I really changed was numpy. Let me add that line, then be done.

2022-05-07

I’ve been looking through various references, trying to figure out what the default ways to approach audio should be.

In https://peps.python.org/pep-0594/, audioop is deprecated; the comment there says to use ‘NumPy-based projects to deal with audio processing’ instead. It looks like the wave module will survive, though, but all it does is read/write .wav files.

I see that librosa is also used in a patch to torch functional. That seems to be disabled in current versions, but generally librosa seems to be a reasonable choice.

As to formats, librosa seems to focus on float (its load command says ‘Load an audio file as a floating point time series.’).

If you have to resample in jetson_voice/utils/audio/AudioMicStream, it converts the original int16 samples into float and then leaves them that way. It’s not clear what happens to the chunk size then. Let me investigate, per http://librosa.org/doc/main/generated/librosa.resample.html#librosa.resample

I tested with this script:

# Timing test of librosa resampling with different res_type settings

import time

import librosa

FILENAME = '/usr/share/sounds/alsa/Front_Center.wav'
RES_TYPES = ('kaiser_best', 'kaiser_fast', 'linear')

# Load as mono float32 at 44100 Hz
samples, sr = librosa.load(FILENAME, sr=44100, mono=True)
print(f'sr: {sr} samples: {type(samples)} shape: {samples.shape} dtype: {samples.dtype}')

# Resample timing test, averaged over several runs

new_sr = 16000
count = 10

for res_type in RES_TYPES:
    start = time.time()
    for c in range(count):
        new_samples = librosa.resample(samples, orig_sr=sr, target_sr=new_sr, res_type=res_type)
    end = time.time()

    print(f'type {res_type} new_samples.shape {new_samples.shape} delta_time {(end-start)/count}')

On the nano, results were:

(main) $ python3 resample.py 
sr: 44100 samples: <class 'numpy.ndarray'> shape: (62976,) dtype: float32
type kaiser_best new_samples.shape (22849,) delta_time 0.18325407505035402
type kaiser_fast new_samples.shape (22849,) delta_time 0.046108794212341306
type linear new_samples.shape (22849,) delta_time 0.005861401557922363

To use ‘linear’ you have to make sure that the samplerate package is installed. For now, I think that kaiser_fast should be fine: 0.046 s to resample 62976 samples (about 1.4 s of audio at 44100 Hz) is roughly 1/30th of real time.

More work modifying audio.py to use callbacks. I’m learning that it makes sense to keep samples in float, so that you can do operations on them when needed.
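
My audio.py isn’t shown here, but this is roughly the callback pattern I mean — a minimal sketch assuming pyaudio, with placeholder rate, chunk size, and handoff:

import numpy as np
import pyaudio

RATE = 16000
CHUNK = 1024

def callback(in_data, frame_count, time_info, status):
    # Convert the raw int16 bytes to float32 in [-1.0, 1.0) right away,
    # so resampling, gain, etc. can all stay in float.
    samples = np.frombuffer(in_data, dtype=np.int16).astype(np.float32) / 32768.0
    # ... hand samples off to the rest of the pipeline here ...
    return (None, pyaudio.paContinue)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True,
                frames_per_buffer=CHUNK, stream_callback=callback)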

As I experimented, I faced the issue of how to deal with different sample rates on the input and output streams. I used a wav file that I did not convert to the desired output format. Where do I do the conversion? And what about chunk sizes? Do I buffer to keep the chunk size at a desired value after sample-rate conversion? See the sketch below.
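
One possible answer to the chunk-size question — a sketch of my own (the class name and sizes are made up, not from jetson_voice): buffer the resampled floats and re-slice them to a constant chunk size.

import numpy as np

class ChunkBuffer:
    """Accumulate float samples and emit fixed-size chunks.

    After resampling, each block has an awkward length (1024 samples at
    44100 Hz become about 372 at 16000 Hz), so keep the leftovers and
    hand constant-size chunks downstream.
    """
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.pending = np.zeros(0, dtype=np.float32)

    def add(self, samples):
        self.pending = np.concatenate((self.pending, samples))
        while len(self.pending) >= self.chunk_size:
            chunk = self.pending[:self.chunk_size]
            self.pending = self.pending[self.chunk_size:]
            yield chunk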

2022-05-10

Another observation: AudioInfo.msg has a format field claiming wav, mp3, etc. But IIRC jetson_voice did not support any of that. Checking the code, the ROS msg defines a ‘coding format’ which could be mp3 or wav, but it is never used in the demos.

I had trouble getting Audacity to play through my headphones. I fixed it (I think) by restarting the computer with the headphones plugged in.

I created some simple voice files in different formats with Audacity. I could play them with:

aplay -D plughw:CARD=ArrayUAC10,DEV=0 1234_float32_16000.wav
aplay -D plughw:CARD=ArrayUAC10,DEV=0 1234_16bit.wav

I could not play the .mp3 or the .ogg files (I got loud noise). Also, I could not use the direct devices; it claimed:

(main) $ aplay -D hw:CARD=ArrayUAC10,DEV=0 1234_16bit.wav 
Playing WAVE '1234_16bit.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Mono
aplay: set_params:1299: Sample format non available
Available formats:
- S24_3LE

I tried to create a file that would work directly per this format. The first try failed: channel count not supported. When I created it as stereo, it worked:

aplay -D hw:CARD=ArrayUAC10,DEV=0 24bit_16000st.wav

The format that works is “s24le: signed 24-bit little-endian integer (note: ALSA calls this ‘S24_3LE’)” per https://www.freedesktop.org/wiki/Software/PulseAudio/Documentation/User/SupportedAudioFormats/
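
I made the file in Audacity, but for reference the same conversion could probably be done in code — a sketch with soundfile (which comes up again below; not what I actually ran, and I skip the resample to 16000):

import numpy as np
import soundfile as sf

# Read the mono 16-bit file and write it back out as stereo 24-bit PCM,
# the only format the direct hw device accepts.
data, sr = sf.read('1234_16bit.wav', dtype='float32')
stereo = np.column_stack((data, data))   # duplicate mono into two channels
sf.write('24bit_st.wav', stereo, sr, subtype='PCM_24')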

I wrote a short script to read a wav file. I learned that the built-in Python module wave does not support the 32-bit float format! (There is a PyPI project PyWave that does, but it seems odd to require that to be installed.)
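
What that failure looks like, approximately (the exact message is from memory and may differ by Python version):

# Audacity's float wav uses WAVE format tag 3 (IEEE float), which the
# standard-library wave module does not understand.
import wave

try:
    wave.open('1234_float32_16000.wav')
except wave.Error as e:
    print(e)   # something like: unknown format: 3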

I rewrote it with scipy.io.wavfile. That works, but when I read the file I get:

./show_wav.py:12: WavFileWarning: Chunk (non-data) not understood, skipping it.

Googling, that is because Audacity adds extra metadata that scipy does not understand. It’s a harmless warning.
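
Roughly what my show_wav.py does (a reconstruction; the real script may differ):

#!/usr/bin/env python3
# Print basic info about a wav file; scipy handles 32-bit float wavs.
import sys

from scipy.io import wavfile

rate, data = wavfile.read(sys.argv[1])
print(f'rate: {rate} dtype: {data.dtype} shape: {data.shape}')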

scipy is not a standard dependency in https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml but it is used in jetson_voice, so including it is probably OK.

Oops, apparently python packages are in https://github.com/ros/rosdistro/blob/master/rosdep/python.yaml and scipy IS there. So fine.

I see that soundfile is used by jetson_voice. I tried it, and it works.
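
The trial was something like this (a reconstruction, not the exact commands): soundfile reads the 32-bit float wav that the wave module rejects, directly as float.

import soundfile as sf

data, sr = sf.read('1234_float32_16000.wav', dtype='float32')
print(f'sr: {sr} dtype: {data.dtype} shape: {data.shape}')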

But apparently there is a well-known bug in writing ogg files.

Also note that librosa under the hood uses soundfile, so maybe I should try that instead.

2022-05-16

Back to work after a long weekend with grandkids. Where am I at?

I’m switching from running tests in bdbd2_jetbot/bdbd2_jetbot/libpy/audio.py to bdbd2_jetbot/test/test_audio.py. I just have to build the workspace and source it.

Re-establish plan for documentation and testing

Previously I developed fqdemo, but looking back now I see that I no longer have any clue how to make it work. I want basic documentation and testing on bdbd2, but it is all missing.

What I am migrating toward is this: each execution platform (e.g. robot, workstation, cloud) has its own GitHub repo that contains the stuff needed to build on that platform. Below that, under src, there will be checkouts of any repos that are needed for the build. Specific scripts will be provided at the base level for various operations like building, testing, doc generation, etc., as sketched below.
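
Something like this layout (all the names here are placeholders; nothing is decided yet):

<platform_repo>/        # one repo each for e.g. robot, workstation, cloud
  build.sh              # base-level scripts for building, testing,
  test.sh               # doc generation, etc.
  gen_docs.sh
  src/
    <checked-out repo 1>/
    <checked-out repo 2>/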

2022-05-17

I set up a script to run tests. Zillions of failures, mostly formatting issues. It’s going to be a project to clean this up. Let me instead just run pytest directly on audio so that I don’t get distracted today.