# Setting up an Extreme Search™ 2000 Series Appliance (January 2024 Version)
## Prerequisites
### Bootable USB stick with Ubuntu 20.04 Server
Download the `ubuntu-20.04.4-live-server-amd64.iso` image from [https://ubuntu.com/download/server](https://ubuntu.com/download/server).

Next, write the image to a bootable USB stick. On Windows, you can follow this tutorial: [https://ubuntu.com/tutorials/create-a-usb-stick-on-windows#1-overview](https://ubuntu.com/tutorials/create-a-usb-stick-on-windows#1-overview). For other operating systems, searching for 'create a bootable ubuntu flash drive' turns up many guides.

### Dell R750xa and 4 Hitek Kuona Cards
- Physically install the 4 Kuona cards in the 4 GPU slots on the server. Follow the manufacturer's instructions for installing GPUs, treating the Kuona cards as if they were GPUs. You will need all the power cables required to install 4 GPUs.
- On a Dell R750xa the Kuona cards go in slots 31-34. There is a "Left GPU Riser" and a "Right GPU Riser" located near the front of the server. 2 Kuona cards go in the "Left GPU Riser" and 2 Kuona cards go in the "Right GPU Riser".
- Ensure that the minimum fan speed for the R750xa is set to 100%. This can be done via iDRAC; see the manufacturer's instructions for changing the fan speed. The R750xa's temperature sensors are not designed for Kuona cards, which is why the fan speed has to be set manually.

## Install OS
Boot from the USB flash drive prepared above. The 'Boot Manager' page is accessed by pressing F11 (or another key, as displayed on the screen) during bootup. Most setup options can be left at their defaults, but a few need care:

- Storage configuration: Make sure the operating system is not installed on any of the drives set aside for search. Every drive with roughly 3.6 TB of space is set aside for search, and there should be many of them; do not choose one of those. Each server should also have a boot drive, generally around 500 GB, which should be chosen instead.
- Profile setup: Unless specified otherwise, please use these options:

| Prompt | Response |
|---|---|
| Your name: | Whoever is actually setting up the device, or some other identifier so that there is a record of who set up the device. |
| Your server's name: | temp_hostname |
| Pick a username: | temp_user |
| Choose a password: | Please generate a random string of characters, preferably at least 8 characters long. Please make note of it so the customer can log on... |

- SSH Setup: Please check the "Install OpenSSH server" box.
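
The storage rule above (boot drive of roughly 500 GB, search drives of roughly 3.6 TB) can be double-checked from a shell before committing to a target disk. The following is a minimal sketch, not part of the LRL tooling: it assumes `lsblk` is available, and the 3 TB cutoff and the function names are hypothetical choices made for illustration.

```python
import json
import subprocess

# Hypothetical cutoff: disks at or above this size are treated as search drives.
SEARCH_DRIVE_MIN_BYTES = 3 * 10**12


def classify_disks(lsblk_json, cutoff=SEARCH_DRIVE_MIN_BYTES):
    """Split whole disks into (boot_candidates, search_drives) by size."""
    devices = json.loads(lsblk_json)["blockdevices"]
    boot, search = [], []
    for dev in devices:
        if dev.get("type") != "disk":
            continue  # skip loop devices, CD-ROMs and similar entries
        size = int(dev["size"])  # some lsblk versions emit sizes as strings
        (search if size >= cutoff else boot).append((dev["name"], size))
    return boot, search


def read_lsblk():
    """Run lsblk with JSON output, sizes in bytes, and no partitions."""
    result = subprocess.run(
        ["lsblk", "--json", "--bytes", "--nodeps", "--output", "NAME,SIZE,TYPE"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling `classify_disks(read_lsblk())` returns two lists of `(name, size_in_bytes)` pairs. Note that the installer USB stick will also show up among the small disks, so the boot list still needs a human check.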
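
The profile table asks for a random password of at least 8 characters. One way to produce such a string is sketched below using Python's standard `secrets` module; the function name and the default length of 16 are illustrative assumptions, not an LRL requirement.

```python
import secrets
import string


def generate_password(length=16):
    """Return a random alphanumeric password of the requested length."""
    if length < 8:
        # The setup table asks for at least 8 characters.
        raise ValueError("password should be at least 8 characters long")
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Run `print(generate_password())` on the machine you are working from, and record the result so it can be handed to the customer.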
## From the Ubuntu command line
### Download and install source code
- `mkdir LRL`
- `cd LRL`
- (The next few commands change with the software version; they are current as of January 2024.)
- `wget https://lewis-rhodes-labs.s3.amazonaws.com/2023.10/npusearch-install-scripts-2023.10.1-6_ubuntu_20.04_kuona.tar.gz`
- `tar -xzf npusearch-install-scripts-2023.10.1-6_ubuntu_20.04_kuona.tar.gz`
- `cd npusearch-install-scripts-2023.10.1-6_ubuntu_20.04_kuona`
- `./npusearch_setup_kuona.sh`

The final output will be the license info. It should look similar to this:

```bash
Name: rambutan
Model: PowerEdge R750xa (SKU=NotProvided;ModelName=PowerEdge R750xa)
OS: 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
CPUs: [28 x Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz] [28 x Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz]
Memory: 64GB
Power: [2400W : PWR SPLY,2400W,RDNT,ARTESYN] [2400W : PWR SPLY,2400W,RDNT,ARTESYN]
NPUs: [8 x IMNN]
SSDs: [32 x 3.8TB Micron_7450_MTFDKBG3T8TFR]

Enet: ***PLEASE SELECT ONE OF THE FOLLOWING INTERFACES TO USE FOR THE NODE-LOCKED LICENSE HOST ID***
#1 eno8303 b1:7b:25:e4:29:e2 linkup NetXtreme BCM5720 2-port Gigabit Ethernet PCIe [14E4:165F]]
#2 eno8403 b1:7b:25:e4:29:e3 linkdown NetXtreme BCM5720 2-port Gigabit Ethernet PCIe [14E4:165F]]
#3 eno12399 69:05:ca:db:45:7e linkdown I350 Gigabit Network Connection [8086:1521]]
#4 eno12409 69:05:ca:db:45:7f linkdown I350 Gigabit Network Connection [8086:1521]]
#5 eno12419 69:05:ca:db:45:80 linkdown I350 Gigabit Network Connection [8086:1521]]
#6 eno12429 69:05:ca:db:45:81 linkdown I350 Gigabit Network Connection [8086:1521]]
#7 ens3f0 f9:f2:1e:e0:12:80 linkdown Ethernet Controller X710 for 10GbE SFP+ [8086:1572]]
#8 ens3f1 f9:f2:1e:e0:12:81 linkdown Ethernet Controller X710 for 10GbE SFP+ [8086:1572]]

216325e71aa1e75919eec5a9cc7bb6a245e9e767a6b0ece5a1eac4ce2dcf4d77
```
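
The download, unpack, and setup commands at the start of this section all repeat the same version string, and that string changes between software releases. The sketch below derives the URL, archive name, and directory name from a single version value. The naming pattern is inferred only from the January 2024 example, so other releases may not follow it; the function name and its parameters are hypothetical.

```python
def install_bundle(version, os_tag="ubuntu_20.04", card="kuona"):
    """Build the names used by the install steps from one version string.

    Assumption: the pattern is inferred from the single 2023.10.1-6 example
    and is not confirmed for other releases.
    """
    release = ".".join(version.split(".")[:2])  # "2023.10.1-6" -> "2023.10"
    stem = f"npusearch-install-scripts-{version}_{os_tag}_{card}"
    return {
        "url": f"https://lewis-rhodes-labs.s3.amazonaws.com/{release}/{stem}.tar.gz",
        "archive": f"{stem}.tar.gz",
        "directory": stem,
        "setup_script": f"./npusearch_setup_{card}.sh",
    }
```

For example, `install_bundle("2023.10.1-6")["url"]` reproduces the `wget` URL shown earlier. Confirm the current version with LRL before relying on a generated URL.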
### Install license
- Copy the license info output (example displayed above) and send it to LRL (email [support@lewis-rhodes.com](mailto:support@lewis-rhodes.com)) to generate a license.
- Here is an example of what the license file will look like:

```
LICENSE lrl npusearchhtk 2024.11 permanent uncounted
hostid=b17b25e429e2 issuer="Lewis Rhodes Labs" customer="EXAMPLE CUSTOMER" contract="EXAMPLE CONTRACT" disable=VM _ck=6f04fef967
sig="60PG45390DW97KNMGJBHEHXN9CPQX80B6GHFXA822M085T9PBU7W4G3TJF0FUN8
J4D6QQGU43NUG"
```

- Once you have the license, either copy it into place with `sudo cp $LICENSE_FILE /opt/lrl/lib/npusearch/`, or open `sudo vim /opt/lrl/lib/npusearch/$LICENSE_FILE` and paste the license in.
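
In the examples above, the license `hostid` (`b17b25e429e2`) matches the MAC address of interface #1 (`b1:7b:25:e4:29:e2`) with the colons removed. The sketch below parses the `Enet` lines of the license info output and converts a MAC to that form, which can help confirm that a returned license matches the intended interface. The mapping is inferred from this one example pair, and the function names and regular expression are assumptions for illustration.

```python
import re

# Matches lines such as "#1 eno8303 b1:7b:25:e4:29:e2 linkup NetXtreme ..."
ENET_LINE = re.compile(
    r"^#(?P<index>\d+)\s+(?P<name>\S+)\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\s+"
    r"(?P<link>linkup|linkdown)\b"
)


def parse_interfaces(license_info):
    """Return one dict per numbered interface line in the license info text."""
    interfaces = []
    for line in license_info.splitlines():
        match = ENET_LINE.match(line.strip())
        if match:
            interfaces.append({
                "name": match["name"],
                "mac": match["mac"].lower(),
                "link_up": match["link"] == "linkup",
            })
    return interfaces


def mac_to_hostid(mac):
    """Strip the colons from a MAC address (assumed hostid format)."""
    return mac.replace(":", "").lower()
```

Comparing `mac_to_hostid(...)` for the chosen interface against the `hostid=` field is a quick sanity check before installing the license file.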
### Enable and start the NPUsearch system service
- Reboot or power cycle as instructed (this can be done before or after installing the license). Sometimes a power cycle does not completely remove power from the FPGAs or SSDs. Check their lights to confirm that it has, and physically unplug the server if the lights do not turn off. The 2000 series of the product only needs a warm reboot.
- If changes to the configuration file are needed, use `sudo vim /opt/lrl/etc/npusearch.conf`
- `sudo systemctl enable npusearch.service && sudo systemctl start npusearch.service`
- Wait ~40 seconds, then run `systemctl status npusearch.service`. If the output indicates that npusearch is not active, it is not set up correctly.

Here is an example of active status:

```bash
● npusearch.service - NPUSearch search backends
Loaded: loaded (/etc/systemd/system/npusearch.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-03-09 01:33:53 UTC; 16h ago
Main PID: 1223658 (bash)
Tasks: 147 (limit: 76695)
Memory: 273.8M
CGroup: /system.slice/npusearch.service
├─1223658 bash /opt/lrl/lib/npusearch/npusearch-service.sh
├─1223675 python3 /opt/lrl/bin/npusearch_startup_kuona.py
├─1223676 tee /tmp/npusearch.log
├─1223716 /opt/lrl/lib/npusearch/kuona-serve-generic-shared
├─1223718 /opt/lrl/lib/npusearch/kuona-serve-generic-shared
├─1223720 /opt/lrl/lib/npusearch/kuona-serve-generic-shared
├─1223721 python3 /opt/lrl/lib/npusearch/glob_persistent.py
├─1223723 python3 /opt/lrl/lib/npusearch/glob_persistent.py
.
.
.
```
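
The wait-then-check step can also be scripted. The sketch below polls `systemctl is-active`, which prints `active` when a unit is running, instead of reading the full status output by eye. The function names, the 60-second timeout, and the 5-second interval are illustrative assumptions.

```python
import subprocess
import time


def service_is_active(unit="npusearch.service", run=subprocess.run):
    """Return True if `systemctl is-active` reports the unit as active."""
    # No check=True: systemctl exits non-zero when the unit is not active.
    result = run(["systemctl", "is-active", unit], capture_output=True, text=True)
    return result.stdout.strip() == "active"


def wait_until_active(unit="npusearch.service", timeout=60.0, interval=5.0,
                      check=service_is_active, sleep=time.sleep):
    """Poll until the unit is active or the timeout (in seconds) expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check(unit):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

If `wait_until_active()` returns `False`, inspect the output of `systemctl status npusearch.service` for details.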
### Write sample data and test NPUsearch
- `./write_big_data.py`
- `./run_npusearch_test.py`
### Test Thermals
The scripts `perpetual_test.py` and `collect_temp_data.py` are included to check that the SSDs do not exceed temperature margins when Extreme Search runs under maximum load. You will need to install `nvme-cli` (`sudo apt install nvme-cli`) if it is not already installed. Run both scripts simultaneously. `collect_temp_data.py` prints the temperatures and writes the temperature data to `ssd_nvme_smart_log_data.txt`. With 100% fan speed in a properly cooled data center, no problems are expected.
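
For a quick spot check outside the included scripts, a drive's temperature can be read from the JSON form of `nvme smart-log`. The sketch below is not how `collect_temp_data.py` works (its internals are not shown here). It assumes the JSON `temperature` field is reported in kelvin, and the 70 °C limit is a hypothetical placeholder; use the SSD vendor's rated figure.

```python
import json

# Hypothetical limit for illustration only; use the SSD vendor's rated value.
MAX_TEMP_C = 70.0


def composite_temp_c(smart_log_json):
    """Convert the smart-log JSON temperature (assumed kelvin) to Celsius."""
    kelvin = json.loads(smart_log_json)["temperature"]
    return kelvin - 273.15


def over_limit(readings, limit_c=MAX_TEMP_C):
    """Return the sorted device names whose temperature exceeds the limit."""
    return sorted(dev for dev, temp in readings.items() if temp > limit_c)
```

The input string would come from a command such as `sudo nvme smart-log /dev/nvme0 --output-format=json`, run once per drive while the load test is active.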