Setting up an Extreme Search™ 2000 Series Appliance (January 2024 Version)

Prerequisites

Bootable USB stick with Ubuntu 20.04 Server

You need the ubuntu-20.04.4-live-server-amd64.iso file, available from https://ubuntu.com/download/server. That link opens the Ubuntu Server download page. Download version 20.04, not 22.04 or any newer version: scroll down to find the "Ubuntu Server 20.04 LTS" section.

Next, write the ISO to a USB stick to make it bootable. On Windows, you can follow this tutorial: https://ubuntu.com/tutorials/create-a-usb-stick-on-windows#1-overview. On other operating systems, searching for 'create a bootable ubuntu flashdrive' turns up many suitable guides.
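
Before writing the ISO to the USB stick, it can be worth confirming that the download is intact. The sketch below computes a file's SHA-256 digest so that it can be compared by eye against the value listed in the SHA256SUMS file that Ubuntu publishes alongside its release images. The function name and the assumption that the ISO sits in the current directory are illustrative and not part of the LRL tooling.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte ISO is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location: the ISO was downloaded into the current directory.
    iso = "ubuntu-20.04.4-live-server-amd64.iso"
    if os.path.exists(iso):
        print(sha256_of_file(iso), iso)
    else:
        print("ISO not found:", iso)
```

If the printed digest differs from the published one, download the ISO again before creating the USB stick.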

Dell R750xa and 4 Hitek Kuona Cards

  • Physically install the 4 Kuona cards in the 4 GPU slots on the server. See the manufacturer instructions for installing GPUs and install the Kuona cards as if they were GPUs. You will need all the power cables necessary to install 4 GPUs.
  • On a Dell R750xa the Kuona cards go in slots 31-34. There is a "Left GPU Riser" and a "Right GPU Riser" located near the front of the server. 2 Kuona cards go in the "Left GPU Riser" and 2 Kuona cards go in the "Right GPU Riser".
  • Ensure that the minimum fan speed for the R750xa is set to 100%. This can be done via iDRAC; see the manufacturer instructions for changing fan speed. The fan speed has to be set manually because the R750xa temperature sensors are not designed for Kuona cards.

Install OS

Boot from the USB stick prepared above. To reach the 'Boot Manager' page, press F11 (or whichever key is displayed on the screen) during bootup. Leave most of the setup options at their defaults, but take care with the following:

  • Storage configuration: Make sure that the operating system is not installed on any of the drives set aside for search. The many drives with roughly 3.6 TB of space are all set aside for search, so do not choose one of those. Each server should also have a boot drive, generally around 500 GB, which should be chosen instead.

  • Profile setup: Unless specified otherwise, please use these options:

    Prompt: Response
    Your name: The name of whoever is actually setting up the device, or some other identifier, so that there is a record of who set it up.
    Your server's name: extremesearch
    Pick a username: lrl_admin
    Choose a password: lrl_admin
  • SSH Setup: Please check the "Install OpenSSH server" box.

From the Ubuntu command line

Download and install source code

  • mkdir LRL
  • cd LRL
  • (The next few lines change with the software version; they are current as of January 2024.)
  • wget https://lewis-rhodes-labs.s3.amazonaws.com/2023.10/npusearch-install-scripts-2023.10.1-6_ubuntu_20.04_kuona.tar.gz
  • tar -xzf npusearch-install-scripts-2023.10.1-6_ubuntu_20.04_kuona.tar.gz
  • cd npusearch-install-scripts-2023.10.1-6_ubuntu_20.04_kuona
  • ./npusearch_setup_kuona.sh

The final output will be the license info. It should have a format such as this:

Name: extremesearch
Model: PowerEdge R750xa (SKU=NotProvided;ModelName=PowerEdge R750xa)
OS: 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
CPUs: [28 x Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz] [28 x Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz]
Memory: 64GB
Power: [2400W : PWR SPLY,2400W,RDNT,ARTESYN] [2400W : PWR SPLY,2400W,RDNT,ARTESYN]
NPUs: [8 x IMNN]
SSDs: [32 x 3.8TB Micron_7450_MTFDKBG3T8TFR]

Enet: ***PLEASE SELECT ONE OF THE FOLLOWING INTERFACES TO USE FOR THE NODE-LOCKED LICENSE HOST ID***
 #1 eno8303    b1:7b:25:e4:29:e2 linkup   NetXtreme BCM5720 2-port Gigabit Ethernet PCIe [14E4:165F]]
 #2 eno8403    b1:7b:25:e4:29:e3 linkdown NetXtreme BCM5720 2-port Gigabit Ethernet PCIe [14E4:165F]]
 #3 eno12399   69:05:ca:db:45:7e linkdown I350 Gigabit Network Connection [8086:1521]]
 #4 eno12409   69:05:ca:db:45:7f linkdown I350 Gigabit Network Connection [8086:1521]]
 #5 eno12419   69:05:ca:db:45:80 linkdown I350 Gigabit Network Connection [8086:1521]]
 #6 eno12429   69:05:ca:db:45:81 linkdown I350 Gigabit Network Connection [8086:1521]]
 #7 ens3f0     f9:f2:1e:e0:12:80 linkdown Ethernet Controller X710 for 10GbE SFP+ [8086:1572]]
 #8 ens3f1     f9:f2:1e:e0:12:81 linkdown Ethernet Controller X710 for 10GbE SFP+ [8086:1572]]

216325e71aa1e75919eec5a9cc7bb6a245e9e767a6b0ece5a1eac4ce2dcf4d77
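
In the example license shown later in this guide, the hostid value (b17b25e429e2) equals the MAC address of interface #1 with the colons removed. Assuming that convention holds in general, the sketch below parses the interface lines of the license info output and prints the host ID each interface would produce, which may help when deciding which interface to name in the license request. The regular expression is derived only from the sample output above, and the function names are illustrative.

```python
import re

# Matches interface lines such as:
#  #1 eno8303    b1:7b:25:e4:29:e2 linkup   NetXtreme BCM5720 ...
IFACE_LINE = re.compile(
    r"^\s*#(?P<index>\d+)\s+(?P<name>\S+)\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\s+"
    r"(?P<link>linkup|linkdown)\b"
)


def parse_interfaces(license_info):
    """Extract index, name, mac and link state from the license info text."""
    records = []
    for line in license_info.splitlines():
        match = IFACE_LINE.match(line)
        if match:
            record = match.groupdict()
            record["index"] = int(record["index"])
            records.append(record)
    return records


def mac_to_hostid(mac):
    """Assumed convention: the host ID is the MAC, lowercased, without colons."""
    return mac.replace(":", "").lower()


if __name__ == "__main__":
    sample = (
        " #1 eno8303    b1:7b:25:e4:29:e2 linkup   NetXtreme BCM5720\n"
        " #2 eno8403    b1:7b:25:e4:29:e3 linkdown NetXtreme BCM5720\n"
    )
    for iface in parse_interfaces(sample):
        print(iface["index"], iface["name"], iface["link"],
              mac_to_hostid(iface["mac"]))
```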

Install license

  • Copy the license info output (example displayed above) and email it to LRL (support@lewis-rhodes.com), who will generate a license.
  • Here is an example of what the license file will look like:
LICENSE lrl npusearchhtk 2024.11 permanent uncounted
  hostid=b17b25e429e2 issuer="Lewis Rhodes Labs" customer="EXAMPLE CUSTOMER" contract="EXAMPLE CONTRACT" disable=VM _ck=6f04fef967
  sig="60PG45390DW97KNMGJBHEHXN9CPQX80B6GHFXA822M085T9PBU7W4G3TJF0FUN8
  J4D6QQGU43NUG"
  • Once you have the license file, either copy it into place with sudo cp $LICENSE_FILE /opt/lrl/lib/npusearch/ or create it with sudo vim /opt/lrl/lib/npusearch/$LICENSE_FILE and paste the license text in.
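
Because the license is node-locked, a quick sanity check before starting the service is to confirm that its hostid corresponds to a network interface on this machine. The sketch below is one way to do that, assuming the license follows the example format above (a hostid= field holding a MAC address without colons) and that the machine exposes MAC addresses under /sys/class/net, as Linux normally does. All function names are illustrative and not part of the LRL software.

```python
import glob
import re


def license_hostid(license_text):
    """Return the hostid= value from the license text, or None if absent."""
    match = re.search(r"\bhostid=([0-9a-fA-F]+)", license_text)
    return match.group(1).lower() if match else None


def local_hostids(sysfs_pattern="/sys/class/net/*/address"):
    """Collect local MAC addresses, colons stripped, from Linux sysfs."""
    hostids = set()
    for path in glob.glob(sysfs_pattern):
        try:
            with open(path) as handle:
                mac = handle.read().strip()
        except OSError:
            continue  # Skip interfaces whose address cannot be read.
        hostids.add(mac.replace(":", "").lower())
    return hostids


def license_matches_host(license_text, hostids):
    """True if the license's host ID is one of the given local host IDs."""
    return license_hostid(license_text) in hostids
```

Reading the installed license file and calling license_matches_host(text, local_hostids()) should return True on the machine the license was issued for. A False result suggests the license was generated for a different interface or server.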

Enable and start the NPUsearch system service:

  • Reboot or power cycle as instructed (this can be done before or after installing the license). The 2000 series of the product only needs a warm reboot. If you do power cycle, be aware that it sometimes does not completely remove power from the FPGAs or SSDs: inspect their lights, and physically unplug the server if the lights do not turn off.
  • If the configuration file needs changes, edit it with sudo vim /opt/lrl/etc/npusearch.conf
  • sudo systemctl enable npusearch.service && sudo systemctl start npusearch.service
  • Wait about 40 seconds, then run systemctl status npusearch.service. If the output shows anything other than an active state, the service is not set up correctly.

Here is an example of active status:

● npusearch.service - NPUSearch search backends
     Loaded: loaded (/etc/systemd/system/npusearch.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-03-09 01:33:53 UTC; 16h ago
   Main PID: 1223658 (bash)
      Tasks: 147 (limit: 76695)
     Memory: 273.8M
     CGroup: /system.slice/npusearch.service
             ├─1223658 bash /opt/lrl/lib/npusearch/npusearch-service.sh
             ├─1223675 python3 /opt/lrl/bin/npusearch_startup_kuona.py
             ├─1223676 tee /tmp/npusearch.log
             ├─1223716 /opt/lrl/lib/npusearch/kuona-serve-generic-shared
             ├─1223718 /opt/lrl/lib/npusearch/kuona-serve-generic-shared
             ├─1223720 /opt/lrl/lib/npusearch/kuona-serve-generic-shared
             ├─1223721 python3 /opt/lrl/lib/npusearch/glob_persistent.py
             ├─1223723 python3 /opt/lrl/lib/npusearch/glob_persistent.py
.
.
.
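
Rather than waiting a fixed interval and reading the status by eye, the check can be scripted. The following is a minimal sketch that polls systemctl is-active (which exits with status 0 when a unit is active) until the service comes up or a timeout expires. The 60-second timeout and 5-second interval are arbitrary assumptions chosen to cover the roughly 40-second startup mentioned above, and the function names are illustrative.

```python
import subprocess
import time


def service_is_active(unit="npusearch.service"):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def wait_until_active(check=service_is_active, timeout=60.0, interval=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `check` until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Calling wait_until_active() after the systemctl start command returns True once the service is active and False if it has not come up within the timeout, in which case the full systemctl status output is the place to look next.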

Write sample data and test NPUsearch

  • Run sudo ./run_all_tests.py -vv. The -vv flag sets verbosity to the maximum; omit it if you do not want the script to print status info to the terminal. The script first checks that the backends are up, so give it a few seconds to confirm that. It then runs the backends for 48 hours (this can be changed by passing an argument to -t). When finished, it reports whether all the tests passed, both on the terminal and in the file npusearch_install.log. Run ./run_all_tests.py --help for help configuring the verbosity and timing options.

Test Thermals

The scripts perpetual_test.py and collect_temp_data.py are included to check that the SSDs do not exceed temperature margins when Extreme Search is run under max load. Install nvme-cli (sudo apt install nvme-cli) if it is not already installed, then run both scripts simultaneously. collect_temp_data.py prints the temperatures and also writes the temperature data to ssd_nvme_smart_log_data.txt. With 100% fan speed in a properly cooled data center, no problems are expected.
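
For a spot check independent of the bundled scripts, drive temperatures can also be read directly with nvme-cli. The sketch below is not the implementation of collect_temp_data.py. It assumes that nvme smart-log with JSON output reports a temperature field in Kelvin, which is how current nvme-cli releases behave as far as I know, and it takes the temperature limit as a parameter because this guide does not state the margin. Device paths and function names are illustrative.

```python
import json
import subprocess


def composite_temp_celsius(smart_log_json):
    """Convert the `temperature` field (assumed Kelvin) to whole degrees Celsius."""
    data = json.loads(smart_log_json)
    return int(data["temperature"]) - 273


def read_smart_log(device):
    """Run `nvme smart-log` with JSON output for one device and return stdout."""
    result = subprocess.run(
        ["sudo", "nvme", "smart-log", device, "--output-format=json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def drives_over_limit(temps_by_device, limit_celsius):
    """Return the devices whose temperature exceeds the limit, sorted by name."""
    return sorted(
        dev for dev, temp in temps_by_device.items() if temp > limit_celsius
    )
```

A caller would build a mapping such as {dev: composite_temp_celsius(read_smart_log(dev))} over the search drives and pass it to drives_over_limit together with the limit taken from the SSD datasheet.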