Infovore is a map/reduce framework for processing large RDF data sets such as Freebase and DBpedia. It is based on Hadoop.
|Tags||RDF hadoop pig semantic NLP|
|Operating Systems||Linux Unix|
|Implementation||Jena Semantic Web framework spring hadoop Guava|
Release Notes: This release adds a job cost accounting function.
Release Notes: Haruhi now writes a tag with the Hadoop job ID to all line items for the job, so this release can add up line items with this tag to calculate the cost of a job after the fact. When running a flow (multiple jobs), Haruhi now uses the command line arguments of the flow to determine the name of the flow.
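The after-the-fact cost calculation described above amounts to grouping billing line items by their job-ID tag and summing them. The following is a minimal sketch of that idea, not Haruhi's actual code: the line-item layout (a dict with `tags` and `cost` fields), the tag name `hadoop_job_id`, and the function name `cost_by_job` are all assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_job(line_items, tag_key="hadoop_job_id"):
    """Sum line-item costs per job, using a job-ID tag on each item.

    `line_items` is assumed to be an iterable of dicts shaped like
    {"tags": {"hadoop_job_id": "..."}, "cost": "0.50"}; this layout is
    hypothetical and only serves the example.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for item in line_items:
        job_id = item.get("tags", {}).get(tag_key)
        if job_id is None:
            continue  # untagged items cannot be attributed to a job
        totals[job_id] += Decimal(item["cost"])
    return dict(totals)


items = [
    {"tags": {"hadoop_job_id": "job_001"}, "cost": "0.50"},
    {"tags": {"hadoop_job_id": "job_001"}, "cost": "0.25"},
    {"tags": {"hadoop_job_id": "job_002"}, "cost": "1.00"},
    {"tags": {}, "cost": "9.99"},
]
totals = cost_by_job(items)
```

Costs are parsed as `Decimal` rather than `float` so that currency amounts add up without binary rounding error.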
Release Notes: Tuning job parameters has cut the run time of the weekly flow from 2.5 hours to about 57 minutes, with a small cost reduction. A job to smush objects has been created, so it is now possible to import DBpedia PageLinks into the :BaseKB space.
Release Notes: This adds the "sumRDF" tool, which sums up RDF values and is necessary for the conversion of DBpedia-derived subjective importance scores to :BaseKB-compatible scores.
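As a rough illustration of what summing RDF values can look like in map/reduce terms, the sketch below groups triples by subject and predicate and adds up their numeric objects. It is an assumption about the general approach, not the sumRDF implementation: the tuple representation, the grouping key, and the name `sum_rdf` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sum_rdf(triples):
    """Sum numeric object values for each (subject, predicate) pair.

    `triples` is an iterable of (subject, predicate, value) tuples, where
    `value` is a number or a numeric string. Sorting stands in for the
    shuffle phase; groupby stands in for the reducer.
    """
    keyed = sorted(((s, p), float(o)) for s, p, o in triples)
    return [
        (s, p, sum(value for _, value in group))
        for (s, p), group in groupby(keyed, key=itemgetter(0))
    ]


scores = [
    ("ex:a", "ex:importance", "0.5"),
    ("ex:b", "ex:importance", "1.0"),
    ("ex:a", "ex:importance", "0.25"),
]
summed = sum_rdf(scores)
```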
Release Notes: This release adds the "smushSubject" tool, which can change the vocabulary used in the subject field using a reduce-side join.
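A reduce-side join of this kind can be sketched in a few lines: both inputs are keyed by the old subject identifier, records with the same key are grouped together, and the reducer emits each triple with its subject replaced. The code below simulates that in memory and is not the smushSubject implementation. The function name, the `ex:` identifiers, and the choice to drop triples with no mapping (an inner join) are assumptions.

```python
from collections import defaultdict


def smush_subject(mapping, triples):
    """Rewrite triple subjects using an (old_subject, new_subject) mapping.

    Simulates a reduce-side join: records from both inputs are grouped by
    the join key (the old subject), then each group is reduced.
    """
    # "Map" and "shuffle": tag each record with its source, group by key.
    groups = defaultdict(lambda: {"mapping": [], "data": []})
    for old, new in mapping:
        groups[old]["mapping"].append(new)
    for s, p, o in triples:
        groups[s]["data"].append((p, o))

    # "Reduce": emit every data record once per replacement subject.
    out = []
    for key in sorted(groups):
        group = groups[key]
        for new in group["mapping"]:
            for p, o in group["data"]:
                out.append((new, p, o))
    return out


mapping = [("ex:old1", "ex:new1")]
triples = [
    ("ex:old1", "ex:linksTo", "ex:x"),
    ("ex:old2", "ex:linksTo", "ex:y"),  # no mapping, so it is dropped
]
smushed = smush_subject(mapping, triples)
```

Joining on the reduce side avoids holding the whole mapping in memory on each mapper, at the cost of shuffling both inputs.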