Inferno Overview¶
Inferno Query Language¶
In its simplest form, you can think of Inferno as a query language for large amounts of structured text.
This structured text could be a CSV file, or a file containing many lines of valid JSON, etc.
people.json
{"first":"Homer", "last":"Simpson"}
{"first":"Manjula", "last":"Nahasapeemapetilon"}
{"first":"Herbert", "last":"Powell"}
{"first":"Ruth", "last":"Powell"}
{"first":"Bart", "last":"Simpson"}
{"first":"Apu", "last":"Nahasapeemapetilon"}
{"first":"Marge", "last":"Simpson"}
{"first":"Janey", "last":"Powell"}
{"first":"Maggie", "last":"Simpson"}
{"first":"Sanjay", "last":"Nahasapeemapetilon"}
{"first":"Lisa", "last":"Simpson"}
{"first":"Maggie", "last":"Términos"}
people.csv
first,last
Homer,Simpson
Manjula,Nahasapeemapetilon
Herbert,Powell
Ruth,Powell
Bart,Simpson
Apu,Nahasapeemapetilon
Marge,Simpson
Janey,Powell
Maggie,Simpson
Sanjay,Nahasapeemapetilon
Lisa,Simpson
Maggie,Términos
people.db
If you had this same data in a database, you would just use SQL to query it.
SELECT last_name, COUNT(*) FROM users GROUP BY last_name;
Nahasapeemapetilon, 3
Powell, 3
Simpson, 5
Términos, 1
Terminal
Or if the data was small enough, you might just use command line utilities.
diana@ubuntu:~$ awk -F ',' '{print $2}' people.csv | sort | uniq -c
3 Nahasapeemapetilon
3 Powell
5 Simpson
1 Términos
Inferno
But those methods don’t necessarily scale when you’re processing terabytes of structured text per day.
Here’s what a similar query using Inferno would look like:
InfernoRule(
name='last_names_json',
source_tags=['example:chunk:users'],
map_input_stream=chunk_json_keyset_stream,
parts_preprocess=[count],
key_parts=['last'],
value_parts=['count'],
)
diana@ubuntu:~$ inferno -i names.last_names_json
last,count
Nahasapeemapetilon,3
Powell,3
Simpson,5
Términos,1
Don’t worry about the details - we’ll cover this rule in depth here. For now, all you need to know is that Inferno will yield the same results by starting a Disco map/reduce job to distributing the work across the many nodes in your cluster.