DOE Develops ‘Superfast’ Search Engine

June 6, 2011

To help researchers sift through gigantic scientific databases, computer scientists at the Department of Energy’s Lawrence Berkeley National Laboratory have developed an open source tool called FastBit that is 10 to 100 times faster than its commercial counterparts, depending on the type of search task.

Jon Bashor of LBL explains:

This kind of analysis calls for an approach fundamentally different from that of an Internet search engine or a typical commercial database.

Google and Yahoo! have developed massive infrastructures to make keyword searches quick and easy. However, their search engine techniques are not appropriate for analysis tasks where most of the data records are represented by numerical values instead of text, and most search operations require full and complete answers instead of some “Top 10” records.

Commercial database systems, meanwhile, are generally designed to manage a relatively small number of searchable attributes. For example, a banking application might only be able to search for accounts based on account number and customer name. In addition, commercial database management systems are normally designed to locate an individual record (or a very small number of records) efficiently, while most scientific data analysis tasks require a much larger number of records. Furthermore, scientific data analysis requires flexibility: researchers will generally wish to examine many different scenarios or combinations of conditions and attributes.

So how did they go about fixing the problem? Bashor explains:

FastBit organizes data into formats known as “Bitmap indices.” Bitmap indices translate variable values into strings of bits, or 1’s and 0’s. Bitmap indices tend to be very efficient because computer processors are optimized to perform so-called logical operations on bits. Typically, however, bitmap indices have been used where variables have what is called low “cardinality”—that is, a limited number of possible values. Examples would include the gender or the state or zip code of the customer in the database; there are only so many genders, states, or even zip codes.
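To make the idea concrete, here is a minimal sketch of a bitmap index for low-cardinality columns. It is an illustration only, not FastBit's code: the column names, the sample data, and the helper functions `build_bitmap_index` and `rows_of` are all invented for this example, and Python integers stand in for bit strings.

```python
# Illustrative sketch only: a tiny equality-encoded bitmap index.
# Python integers serve as arbitrary-length bit strings; bit i is 1
# when row i holds the value. Column names and data are made up.

def build_bitmap_index(column):
    """Map each distinct value to an integer whose set bits mark matching rows."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

state = ["CA", "NY", "CA", "TX", "NY", "CA"]
gender = ["F", "M", "M", "F", "F", "F"]

state_index = build_bitmap_index(state)
gender_index = build_bitmap_index(gender)

# "state == CA AND gender == F" becomes a single bitwise AND.
matches = rows_of(state_index["CA"] & gender_index["F"])
print(matches)  # [0, 5]
```

Each distinct value gets its own bitmap, which is why the approach is traditionally reserved for columns with few distinct values: a column with millions of distinct values would need millions of bitmaps.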

Scientific data, by contrast, typically has an enormous range of values, so further techniques for developing the index were needed. These included an alternative method for partitioning or organizing the data; innovative ways of encoding the indices; and a revolutionary patented data compression system.

Typically, commercial database programs partition or split up data by record or groups of records. A record might include the following variables: customer name, address, phone, account balance, and date and amount of last payment. That is known as “horizontal” partitioning. (Imagine the record as a horizontal row in a spreadsheet with customer name followed by the rest of the variables.) By contrast, in the STAR application, there are billions of events (records) stored, each of which has multiple variables. But searches are usually looking for just a few variables. It would be enormously time-consuming to call up billions of whole events or records with all their variables when searching for just a few variables. So rather than partition the data by events or records, FastBit partitions data by variable—so-called “vertical” partitioning. This cuts down enormously on memory overhead and speeds processing.
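The difference between the two layouts can be sketched in a few lines. This is a toy illustration under stated assumptions, not the STAR data model or FastBit's storage format: the field names (`energy`, `momentum`, `charge`, `n_tracks`) and the helpers `to_columns` and `select` are hypothetical.

```python
# Hypothetical event records; field names are invented for illustration.
events = [
    {"energy": 12.5, "momentum": 3.1, "charge": 1, "n_tracks": 40},
    {"energy": 48.0, "momentum": 9.7, "charge": -1, "n_tracks": 310},
    {"energy": 51.2, "momentum": 2.2, "charge": 1, "n_tracks": 95},
    {"energy": 7.9, "momentum": 8.8, "charge": -1, "n_tracks": 12},
]

def to_columns(rows):
    """Vertical partitioning: one list per variable instead of one dict per event."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def select(columns, predicate_column, predicate, output_column):
    """Scan only the two columns the query names; all others stay untouched."""
    tested = columns[predicate_column]
    wanted = columns[output_column]
    return [wanted[i] for i, v in enumerate(tested) if predicate(v)]

columns = to_columns(events)

# "Give me n_tracks for events with energy above 40" reads two columns, not four.
high_energy_tracks = select(columns, "energy", lambda e: e > 40.0, "n_tracks")
print(high_energy_tracks)  # [310, 95]
```

With billions of events and many variables per event, a query that names only a few variables reads a correspondingly small fraction of the stored data.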

In addition, FastBit provides multiple nested levels of encoding, with the top level providing a relatively coarse index to the data and each successive lower level providing finer detail. In effect, the top level indices provide pre-computed answers to anticipated queries. This enables a rapid narrowing of the search as the software zeros in from a general picture to ever more precise detail.
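One way to picture a coarse level backed by a finer one is a two-level binned index. The sketch below is an assumption-laden toy, not FastBit's actual encoding: it handles only non-negative integers, uses a coarse bin width of 10 and a fine bin width of 1, and the names `build_level` and `less_than` are invented here.

```python
def build_level(values, width):
    """One bitmap per bin; bin k covers the range [k*width, (k+1)*width)."""
    level = {}
    for row, v in enumerate(values):
        k = v // width
        level[k] = level.get(k, 0) | (1 << row)
    return level

def less_than(coarse, fine, coarse_width, bound):
    """Answer `value < bound` for non-negative integers (fine bins have width 1)."""
    result = 0
    full_bins = bound // coarse_width  # coarse bins lying entirely below the bound
    for k in range(full_bins):
        result |= coarse.get(k, 0)
    # Only the partially covered coarse bin needs the finer level.
    for v in range(full_bins * coarse_width, bound):
        result |= fine.get(v, 0)
    return result

values = [3, 36, 41, 18, 37, 29, 30, 55]
coarse = build_level(values, 10)
fine = build_level(values, 1)

hits = less_than(coarse, fine, 10, 37)
rows = [i for i in range(len(values)) if (hits >> i) & 1]
print(rows)  # [0, 1, 3, 5, 6]
```

For the query `value < 37`, three coarse bitmaps cover everything below 30 in one step each, and the fine level is consulted only for the values 30 through 36.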

Finally, FastBit’s authors devised an ingenious, patented method of compressing the bitmap indices that enables rapid performance of logical operations simultaneously on large swaths of data.
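The report excerpt does not describe the patented method, so the sketch below deliberately uses plain run-length encoding as a much simpler stand-in. It shows only the general idea of performing a logical AND on compressed bitmaps without expanding them first; the functions `compress` and `and_runs` are hypothetical and should not be read as FastBit's algorithm.

```python
def compress(bits):
    """Run-length encode a list of 0/1 bits as [bit, run_length] pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

def and_runs(a, b):
    """AND two run-length-encoded bitmaps of equal total length, run by run."""
    out = []
    i = j = 0
    left_a = a[0][1] if a else 0
    left_b = b[0][1] if b else 0
    while i < len(a) and j < len(b):
        step = min(left_a, left_b)  # bits both current runs cover together
        bit = a[i][0] & b[j][0]
        if out and out[-1][0] == bit:
            out[-1][1] += step      # extend the previous output run
        else:
            out.append([bit, step])
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            left_a = a[i][1] if i < len(a) else 0
        if left_b == 0:
            j += 1
            left_b = b[j][1] if j < len(b) else 0
    return out

x = compress([0] * 1000 + [1] * 8 + [0] * 992)
y = compress([0] * 1004 + [1] * 996)
print(and_runs(x, y))  # [[0, 1004], [1, 4], [0, 992]]
```

Here two 2,000-bit bitmaps are combined in four loop iterations, because each iteration consumes a whole run rather than a single bit.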

Read Bashor’s full report and more about the consumer uses of the technology.
