Updated:

This page contains notes from my various tests and experiments. It is a raw record of what I did, without correction of errors or later updates for things that I learn. Use at your own risk.

Jetbot ROS2 #5: Working with dusty-nv stuff

jetson-voice

On the xavier, let me try to work with jetson-voice.

I downloaded it, tried to run the asr example, and got this:

docker/run.sh
examples/asr.py --wav data/audio/dusty.wav
... lots of output
[04/26/2022-12:49:16] [TRT] [V] *************** Autotuning Reformat: Float((* 64 (# 2 (SHAPE audio_signal))),1,64,64) -> Float((* 64 (# 2 (SHAPE audio_signal))),(# 2 (SHAPE audio_signal)),1,1) ***************
[04/26/2022-12:49:16] [TRT] [V] --------------- Timing Runner: Optimizer Reformat((Unnamed Layer* 5) [Shuffle]_output -> <out>) (Reformat)
Illegal instruction (core dumped)

Let me try on the nano

kent@nano:~/github/dusty-nv/jetson-voice…
(master) $ docker/run.sh 
ARCH:  aarch64
reading L4T version from /etc/nv_tegra_release
L4T BSP Version:  L4T R32.5.2
[sudo] password for kent: 
CONTAINER:     dustynv/jetson-voice:r32.5.0-ros-galactic
DEV_VOLUME:    
DATA_VOLUME:   --volume /home/kent/github/dusty-nv/jetson-voice/data:/jetson-voice/data
USER_VOLUME:   
USER_COMMAND:  
Unable to find image 'dustynv/jetson-voice:r32.5.0-ros-galactic' locally
docker: Error response from daemon: manifest for dustynv/jetson-voice:r32.5.0-ros-galactic not found: manifest unknown: manifest unknown.
See 'docker run --help'.

But note: this https://forums.developer.nvidia.com/t/illegal-instruction-core-dumped/165488 implies that the problem is with numpy 1.19.5 (https://github.com/numpy/numpy/issues/18131) and can be fixed by installing an earlier version.

Or by running OPENBLAS_CORETYPE=ARMV8 python3 -c "import numpy"
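In other words, there seem to be two workarounds; a rough sketch of both (the exact numpy pin is my assumption based on the linked issue):

# Workaround A: install a numpy release from before 1.19.5 (exact pin is my guess from the issue)
pip3 install 'numpy<1.19.5'

# Workaround B: force the OpenBLAS core type for the whole shell session
export OPENBLAS_CORETYPE=ARMV8
python3 -c "import numpy; print(numpy.__version__)"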

On nano I tried docker/build.sh galactic nvcr.io/nvidia/l4t-ml:r32.5.0-py3

This failed with:

[ 89%] Linking CXX shared library ../../../release/libarrow_cuda.so
/usr/lib/gcc/aarch64-linux-gnu/7/../../../aarch64-linux-gnu/libcuda.so: file not recognized: File truncated
collect2: error: ld returned 1 exit status

Trying to get xavier working, I made an rkent fork of jetson-voice and I will try to set that environment variable.

Tried to build; for some reason the link to nvcr.io in build.sh was commented out. Fixed that, and now I'm trying to build the dockerfiles for xavier so I can see where I need to intervene to fix numpy.

Tried to build jetson-voice; getting errors building pyarrow. I think this is why they built 3.0.0 from source (which works), but transformers needs >= 5.0.0. I'll try a newer pyarrow.

2022-04-27

OK, pyarrow 5.0.0 worked; 7.0.0 failed (it does not support Python 3.6). Now trying 6.0.1.

I need to figure out what I really need to change in all of this. I need:

  • Support for L4T 32.5.2, which is missing in the original
  • Updated pyarrow, as above
  • The correct base image for 32.7.1

These changes will be in rkent/jetson-voice. Now running compiles on both xavier (32.7.1) and nano (32.5.2) to see what works.
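For the record, the build invocations look roughly like this (the r32.7.1 base-image tag is my assumption; only the r32.5.0 one has actually appeared above):

# nano (L4T 32.5.2): same base image as before
docker/build.sh galactic nvcr.io/nvidia/l4t-ml:r32.5.0-py3

# xavier (L4T 32.7.1): assumed matching l4t-ml base image tag
docker/build.sh galactic nvcr.io/nvidia/l4t-ml:r32.7.1-py3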


nano build failed, same error I have been seeing:

[ 90%] Linking CXX shared library ../../../release/libarrow_cuda.so
/usr/lib/gcc/aarch64-linux-gnu/7/../../../aarch64-linux-gnu/libcuda.so: file not recognized: File truncated
collect2: error: ld returned 1 exit status
src/arrow/gpu/CMakeFiles/arrow_cuda_shared.dir/build.make:81: recipe for target 'release/libarrow_cuda.so.600.1.0' failed

Why does xavier succeed but nano fail? Both have the same content at /usr/lib/gcc/aarch64-linux-gnu

Aha. The actual file is /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1.1. This is length 0 in the nano image nvcr.io/nvidia/l4t-ml:r32.5.0-py3, but full size on the xavier. It is also full size on the actual nano, outside the container.

What gives? https://forums.developer.nvidia.com/t/libcublas-file-size-is-0-in-jetson-docker-image/180676 shows that we need to set the default docker runtime to nvidia. I'd already done that on the xavier, so that is the difference.

To enable access to the CUDA compiler (nvcc) during docker build operations, add "default-runtime": "nvidia" to your /etc/docker/daemon.json configuration file before attempting to build the containers:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },

    "default-runtime": "nvidia"
}

You will then want to restart the Docker service or reboot your system before proceeding.
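Concretely, after editing daemon.json (these are standard docker commands, nothing jetson-voice specific):

# restart docker so the new default runtime takes effect
sudo systemctl restart docker

# confirm that nvidia is now the default runtime
docker info | grep -i 'default runtime'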

Nano build failed, with lots of errors like:

pyarrow/lib.pxd:470:55: unknown type in template argument

Error compiling Cython file:
------------------------------------------------------------
...
    # extension classes are technically virtual in the C++ sense) we can expose
    # the arrow::io abstract file interfaces to other components throughout the
    # suite of Arrow C++ libraries
    cdef set_random_access_file(self, shared_ptr[CRandomAccessFile] handle)
    cdef set_input_stream(self, shared_ptr[CInputStream] handle)
    cdef set_output_stream(self, shared_ptr[COutputStream] handle)
                                                        ^

Maybe it is a version issue for gcc? Let me retry with pyarrow version 5.0.0 which is the minimum accepted.

Sorry, still failing with the same errors.

This https://github.com/google/jax/issues/375 suggests that these errors might be due to the Cython version. Running manually in the Docker container, I did pip3 install -U Cython, which installed a newer version. Now the compile is proceeding. The Cython that works is version 0.29.28.
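So the manual fix inside the container amounted to (the version-check line is just my way of confirming what got installed):

# upgrade Cython before (re)building pyarrow from source
pip3 install -U Cython
python3 -c "import Cython; print(Cython.__version__)"   # 0.29.28 worked for me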

Back to GCP, I tried to create a GPU instance but that failed, claiming there were not enough resources in the region. Isn’t that the same thing I got before? I tried again with a different region, same results.

The xavier build of jetson-voice is now complete, and it works really well.

Back to the nano: apparently the torchtext version must be tied to the pytorch version. The pytorch version was changed for later jetpack releases, and the current jetson-voice only works with that. I am adding that option in the files.

More issues: rapidfuzz, which is a nemo prerequisite, requires some additional modules to build. I installed these in the Dockerfile. But I need to change the Dockerfile so that it will work on multiple L4T versions.

The issue is that nemo 1.0.0rc1 is the last version to support the version of torch we are using. Some patches are specific to that version, but those are not going to be compatible with later versions of nemo, so we need to improve the scripting of all of this.

BUT the Riva documentation says it needs at least a Xavier NX and Jetpack 4.6 to work. Not sure what will happen on the nano.

OK it builds. I run the jetson-voice tests and many of them fail, though when I try things manually they seem to work. Not sure what to make of that.

2022-04-30

I’d like to use the same versions of things on the nano as on the xavier if possible. What was that nano issue again that required a downgrade? From 2022-03-07 I have: “Checking through issues, it may be that jetson inference is not compatible with tensorrt 8. To downgrade to tensorrt7”. Where are we on that now?

Checking the current jetson-inference, it claims to be compatible now with Jetpack 7.1. So I should be able to reflash the nano to run that just like the current xavier. Let me do that, rather than try to keep two different versions of things working. Correction: I need to run Jetpack 4.6.1, which is L4T 32.7.1 (just to keep things confusing).

Failed to get SDK Manager working. Trying again with +5V power connected directly. I tried the download-only step; the download is proceeding.

Understanding the asr code

While I am trying to load the new OS on the nano, I'm trying to understand the asr code (that is, STT, speech to text) in jetson_voice.

examples/asr.py gets results from asr(samples). asr is ASR(args.model), which is ASR('quartznet').

ASR() simply returns load_resource(…).

load_resource generates a config, which is an instance of ConfigDict.

_default_global_config has 'default_backend' : 'tensorrt'

so config.backend is 'tensorrt'.

factory_map[config.backend] is 'jetson_voice.models.asr.ASREngine'

I think what that means is that the class_type becomes the same as:

from jetson_voice.models.asr import ASREngine

so load_resource returns an instance of ASREngine, passing in the ConfigDict.

2022-05-02

Back to the nano jetpack install: I seem to have installed Jetpack 4.6.2 (L4T 32.7.2), but the xavier (and dusty-nv support) is on 4.6.1. But I know how to do it now, so let me try again. I'll make sure I document how to do it.

Run SDK Manager on an Ubuntu 18 system. (It claims it supports Ubuntu 20 now, but it lies; that is only for the target, not the host.)

Start with the jetson nano running a later version of jetpack (like 4.6.1). Connect to ethernet.

Make sure the nano is powered directly from a 5V supply. Also connect the USB port on the nano to the SDK Manager system.

On Step 3, “SDK Manager is about to flash your Jetson Nano module”, choose “Automatic setup (developer kit version)” with an OEM config of “Runtime”. (It is possible that pre-config will work, but I have had good luck using runtime and then switching to ethernet for the later install phases.)
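Once it boots, a quick way to confirm which L4T/JetPack version actually got flashed (the nvidia-jetpack package query is from memory, so treat it as an assumption):

# shows the L4T release, e.g. "# R32 (release), REVISION: 7.1, ..."
cat /etc/nv_tegra_release

# shows the JetPack metapackage version (assumed package name)
apt-cache show nvidia-jetpack | grep -i version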

Setup of the nano

Needed to do these things:

  • copy the .ssh directory from uburog
  • set up DRYRAIN wifi on the 5 GHz band
  • sudo apt upgrade
  • sudo apt install python-pip
  • sudo pip install jetson-stats
  • sudo apt install nano htop
  • Set "default-runtime": "nvidia" (see above)
  • sudo usermod -aG docker kent
  • in the file /etc/systemd/nvzramconfig.sh, change the line mem=$((("${totalmem}" / 2 / "${NRDEVICES}") * 1024)) to mem=$((("${totalmem}" / "${NRDEVICES}") * 1024)), which doubles the zram swap from half of RAM to all of RAM (see the check below)
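After the zram edit a reboot is needed for it to take effect; a quick check afterwards:

# total swap should now be roughly equal to total RAM
free -m

# lists the individual /dev/zramN swap devices
swapon --show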

Now I am trying to build rkent/jetson-voice on this.

Back to xavier: I installed dusty-nv/jetson-inference and I’m following instructions to build https://github.com/dusty-nv/jetson-inference/blob/master/docs/building-repo-2.md

That built fine on xavier with jetpack 4.6.1


In summary, I have successfully built ros2 on the nano, and successfully built jetson-inference on the xavier. I checked in the ros2 build script in bdbd2_core.