
# Extreme Search™ Quick Start
Version 2023.03
[support@lewis-rhodes.com](mailto:support@lewis-rhodes.com)
---
## Contents
1. Overview
2. Adding data
3. Extreme Search™ software architecture
4. "Hello World" example
5. Supported (regular) expressions
6. Frequently Asked Questions
7. Support
---
### 1. Overview
**Note**: Previous versions of this document used the term "neurons". That terminology is replaced with "processing element" (PE) in the current documentation.
Extreme Search™ is a computational storage appliance that enables fast, fixed-throughput, regular-expression-based search of files. Its unique capability is that search time is independent of the complexity or number of expressions. NPU*search* is a specific implementation of a neuromorphic processor, invented by LRL and optimized for search. NPU*search* IP allows all files on an appliance to be searched within 12 or 25 minutes (depending on the model). Extreme Search™ functionality is exposed through a Python library: users submit a list of regular expressions to search for and a list of files to search, and Extreme Search™ returns information about which files matched which expressions.
### 2. Adding data
Searches execute against stored data. The Extreme Search™ appliance comes with the option to store data under GlusterFS, an open source distributed file system that scales to multiple petabytes of storage. A user interacts with Gluster in the same way as with any normal file system. LRL preconfigures each appliance with a 1-node, multi-brick (the exact number depends on the model) Gluster volume named `gv0`. This volume is auto-mounted at `/mnt/gv0` at boot time. Gluster need not be used; however, to maximize the performance of Extreme Search™, the data must be evenly distributed across the SSDs.
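As a concrete illustration, the sketch below copies a batch of files onto the volume using ordinary file operations; the source directory is hypothetical, and the destination follows the `gv0`/`/mnt/gv0` defaults described above. Gluster's distribute translator places each file on a brick based on a hash of its name, so copying many similarly sized files usually keeps the SSDs evenly loaded.
```python
# Minimal sketch, assuming the defaults above: volume gv0 mounted at /mnt/gv0.
# The source directory /data/incoming/wiki is hypothetical.
import glob
import os
import shutil

src_files = glob.glob('/data/incoming/wiki/*.wiki')  # hypothetical staging area
dst_dir = '/mnt/gv0/wiki'                            # Gluster mount from this guide
os.makedirs(dst_dir, exist_ok=True)

for path in src_files:
    # Gluster hashes each file name to pick a brick, which keeps many
    # similarly sized files roughly balanced across the underlying SSDs.
    shutil.copy2(path, os.path.join(dst_dir, os.path.basename(path)))
```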
### 3. Extreme Search™ software architecture
The search capabilities of Extreme Search™ are enabled with 3 interacting software components:
1. **Backend**: Each backend is an NPU*search* processor that executes searches against locally resident SSDs. Each Extreme Search™ appliance runs multiple backends (the exact number depends on the model).
2. **Redis**: Redis mediates all communication between the frontend(s) and the backends.
3. **Client Access (Frontend)**: The frontend exposes search functions, parses search requests, distributes work to backends, and returns results to the user. Access is provided by the `npusearch` Python package.
A software license is required for Extreme Search™ to function. The license file should be in `/opt/lrl/lib/npusearch/`.
Details such as where the Redis server runs, what the Gluster hosts are, and where logs are stored can be configured in the backend configuration file located at `/opt/lrl/etc/npusearch.conf`.
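As a quick sanity check, the sketch below (using the default install paths named above and the `.lic` extension mentioned in the FAQ) confirms that a license file and the backend configuration are present:
```python
# Sketch: verify the license and backend config exist at the default locations
# given in this guide (/opt/lrl/lib/npusearch/ and /opt/lrl/etc/npusearch.conf).
import glob
import os

licenses = glob.glob('/opt/lrl/lib/npusearch/*.lic')
print('license file(s) found:', licenses if licenses else 'NONE')
print('backend config present:', os.path.exists('/opt/lrl/etc/npusearch.conf'))
```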
### 4. "Hello World" example
From an IPython prompt, here is a simple search followed by a slightly more in-depth analysis of the matching file(s). (*Note: This example assumes Gluster is running on host `server01`, Redis is running on `localhost`, the registry tag is `server01`, the Gluster volume is named `gv0`, and the volume is mounted at `/mnt/gv0`. See the Python documentation included with `npusearch` for more detail.*)
```python
In [1]: import re, time, glob, os, npusearch
In [2]: client = npusearch.NPUGlusterClient(gluster_hosts=['server01'], gluster_volname='gv0')
In [3]: all_files = glob.glob('/mnt/gv0/wiki/*.wiki')
In [4]: len(all_files)
Out[4]: 2304
In [5]: totalbytes = sum(os.path.getsize(p) for p in all_files)
...: totaldata = totalbytes/10**9 #convert to GB
...: totaldata #in GB
Out[5]: 2473.034409005
In [6]: t0 = time.perf_counter()
...: npu_res = client.scan([r'(?i)celebratory.*neuroscience'], 'wiki/*.wiki')
...: runtime = time.perf_counter()-t0
...: runtime
Out[6]: 43.677002266049385
In [7]: totaldata/runtime #search throughput (GB/s)
Out[7]: 56.62097398399791
In [8]: npu_res
Out[8]:
{'pe_usage': 8,
'matches': [{'matches': [0], #indicates the index/indices of the expression(s) that matched
'overflow': False,
'path': 'wiki/00000026.wiki',
'brick': 'server01:/mnt/npusearch_8/gv0/wiki/00000026.wiki'}],
'errors': [],
'exceptions': {}}
In [9]: with open('/mnt/gv0/wiki/00000026.wiki', 'rb') as fb:
...: data = fb.read().decode('latin_1')
...: re_res = re.search(r'(?i)celebratory.*neuroscience', data)
...: re_res
Out[9]: <re.Match object; span=(…, …), match='…'>
```
### 5. Supported (regular) expressions
NPU*search* is similar to grep or ripgrep in functionality, but faster and at constant speed.
### 6. Frequently Asked Questions
#### Functionality
**Q1. My scan returns with `'pe_count': 0`. What happened?**
**A1.** The backends may be powered off. Try running `client.check()` in your Python session. If an empty list is returned, the backends are powered off. To power the backends back on, run `sudo systemctl restart npusearch.service`. To confirm that the backends are powered on, run `client.check()` again and confirm that the returned list contains all of the backends. Alternatively, running `npusearch_check` from the command line returns the same result as `client.check()` in Python.
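A minimal sketch of that recovery flow (host names follow the earlier example; the `systemctl` invocation is the one given above):
```python
# Sketch of the flow in A1: if check() returns an empty list, restart the
# npusearch service and confirm the backends come back.
import subprocess
import npusearch

client = npusearch.NPUGlusterClient(gluster_hosts=['server01'], gluster_volname='gv0')

if not client.check():                       # empty list => backends are powered off
    subprocess.run(['sudo', 'systemctl', 'restart', 'npusearch.service'], check=True)

print('backends online:', client.check())    # should now list all of the backends
```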
**Q2. My backends will not power on. What do I do?**
**A2.** Confirm that your license is present and covers your current release version. To check your release version, open `/opt/lrl/VERSION` and see what `LICENSE_VERSION=` states. To check which versions your license covers, open your `.lic` file in `/opt/lrl/lib/npusearch/` and note the date. If your release version date is later than the date in the license file, you need either a new license or an older version of Extreme Search™ for your backends to work.
If the license is correct, try rebooting your server. If that is not possible, or the problem persists, contact LRL at [support@lewis-rhodes.com](mailto:support@lewis-rhodes.com).
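A minimal sketch of that manual check, assuming the paths above (the exact formats of `VERSION` and the `.lic` file are not specified here, so the date comparison is left to you):
```python
# Sketch: print the release's LICENSE_VERSION line from /opt/lrl/VERSION so it
# can be compared by hand against the date inside your .lic file.
with open('/opt/lrl/VERSION') as fh:
    for line in fh:
        if line.startswith('LICENSE_VERSION='):
            print(line.strip())
```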
**Q3. What does `scan()` return other than a list of matching files/expressions? Does it return number of matches, matching text, or match location?**
**A3.** No; unlike some regular expression search tools, `scan()` is deliberately simple: it tells you which files, from the list you pass it, matched which of the expressions you pass it. That's it. It does not tell you what the matching text was, where it matched, or how many matches there were. This is why we describe the typical use case as a "prefilter": we reduce a customer's dataset to the relevant files, and they then run the analysis they were already going to run (to find matching text, etc.) on the reduced dataset. This is also why, earlier in this guide, we show a `scan()` search on the wiki data followed by a Python `re.search()` call on the single file that matched, to recover the matching text, offsets, etc. Replace `re.search()` with whatever tool or tools are relevant for your needs.
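A minimal sketch of that prefilter pattern, reusing the hosts, volume, and expression from the earlier example (the result fields are the ones shown in the "Hello World" output):
```python
# Sketch: use scan() to narrow the dataset, then run Python's re on only the
# files that matched to recover matching text and offsets.
import re
import npusearch

exprs = [r'(?i)celebratory.*neuroscience']
client = npusearch.NPUGlusterClient(gluster_hosts=['server01'], gluster_volname='gv0')
res = client.scan(exprs, 'wiki/*.wiki')

for match in res['matches']:
    local_path = '/mnt/gv0/' + match['path']   # volume-relative path -> local mount
    with open(local_path, 'rb') as fh:
        text = fh.read().decode('latin_1')
    for idx in match['matches']:               # indices of the expressions that hit
        hit = re.search(exprs[idx], text)
        if hit:
            print(local_path, exprs[idx], hit.span())
```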
**Q4. What factors influence the speed of `scan()`? If I pass in many expressions, will `scan()` run slower? Will `scan()` always take 12-25 minutes to run?**
**A4.** The speed of `scan()` is asymptotically linear in file size and is independent of expression complexity or number, as long as the compiled expressions fit on the device. By default, compiled expressions must fit within 300 PEs. Note in the example above that `'pe_usage': 8` indicates only 8 of the 300 PEs were used. The data passes through all 300 PEs even if only 8 are used, so search time does not change with how many of the 300 PEs are used. Each backend searches the data that is local to its corresponding SSD(s) in time asymptotically linear in how much data is searched, and results are returned when every backend is finished. The speed of `scan()` is thus the speed of the slowest backend, so to minimize runtime the data must be distributed as evenly as possible over the SSDs. The 12-25 minute figure is the time needed to search all of the data on all of the SSDs when the SSDs are completely full of data; searching a small fraction of the data can be as fast as 200 milliseconds.
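For reference, the `pe_usage` field in the scan result (shown in the "Hello World" output) reports how many of the 300 PEs the compiled expressions consumed, which is an easy way to gauge remaining headroom; a short sketch, with client parameters following the earlier example:
```python
# Sketch: check how many of the 300 PEs a set of expressions consumed.
import npusearch

client = npusearch.NPUGlusterClient(gluster_hosts=['server01'], gluster_volname='gv0')
res = client.scan([r'(?i)celebratory.*neuroscience'], 'wiki/*.wiki')
print(f"PEs used: {res['pe_usage']} of 300")
```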
**Q5. Does NPU*search* support compression/decompression?**
**A5.** No, it currently does not, since decompression would require passing the files through the CPU. All of the search functionality is done in the NPU and no data touches the CPU, which is what enables Extreme Search™ to run so fast.
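Because of this, compressed inputs need to be expanded on the host before they land on the volume; a sketch (file names hypothetical) for a gzip'd Zeek log:
```python
# Sketch: expand a gzip'd log on the host before placing it on the Gluster
# volume, since NPUsearch itself does not decompress data. Paths are hypothetical.
import gzip
import os
import shutil

dst_dir = '/mnt/gv0/zeek'           # on the Gluster volume from this guide
os.makedirs(dst_dir, exist_ok=True)

with gzip.open('/data/incoming/conn.log.gz', 'rb') as src, \
        open(os.path.join(dst_dir, 'conn.log'), 'wb') as dst:
    shutil.copyfileobj(src, dst)    # decompress on the CPU, store plaintext for scanning
```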
**Q6. Does NPU*search* support encryption/decryption?**
**A6.** No, it currently does not. That being said, Samsung's SmartSSDs, which are included in one of the Extreme Search™ models, support self-encrypting drive (SED) functionality that can be transparently enabled so all data at rest is encrypted.
**Q7. Is NPU*search* applicable to field x?**
**A7.** NPU*search* is applicable to any field or any dataset where the granularity of the data is the file, and where the data within the file is amenable to regular expression based search. In the field of cyber forensics, we expect NPU*search* to excel at searching plaintext log files (like Bro/Zeek logs). It may also be adequate at searching PCAP network captures, though with the caveat that it doesn't do packet parsing, TCP stream reassembly, or gzip decompression. In the case where images are a primary form of data, NPU*search* may be excellent at searching the metadata associated with each image.
### 7. Support
Please submit questions, comments, and/or issues to LRL at [support@lewis-rhodes.com](mailto:support@lewis-rhodes.com).