Software: Case Study

BROWN DOG

Project Team: Kenton McHenry (PI), Jong Lee, Michael Dietze, Barbara Minsker, Praveen Kumar (Co-PIs) 

Funder: CIF21 DIBBs, ACI-1261582, Brown Dog (2013-2019)

Website: browndog.ncsa.illinois.edu

Publications: doi.org/10.7717/peerj-cs.963  

Status: ACTIVE

The objective of Brown Dog is to construct a service that allows past and present un-curated data to be used in science, while simultaneously demonstrating the novel science that can be conducted with such data.

The proposed effort will focus on large, distributed, and heterogeneous bodies of past and present un-curated data, often referred to in the scientific community as long-tail data: data that would have great value to science if its contents were readily accessible. The proposed framework will be made up of two re-purposable cyberinfrastructure building blocks, a Data Access Proxy (DAP) and a Data Tilling Service (DTS). These building blocks will be developed and tested in the context of three use cases that will advance science in geoscience, biology, engineering, and social science.

Questions about Brown Dog, or would you like to contribute to this project?

Kenton McHenry, mchenry@illinois.edu, 217-333-3593

The DAP aims to enable a new era of file-format-agnostic applications through a tool called a Software Server, which will itself serve as a workflow tool for accessing functionality within third-party applications. By chaining together open/save operations within arbitrary software, the DAP will provide a consistent means of accessing content stored across the many file formats that plague long-tail data.

The DTS will use the DAP to access data contents and will index unstructured data sources (e.g., instrument data or data without text metadata). Building on the Versus content-based comparison framework and the Medici extraction services for auto-curation, the DTS will assign content-specific identifiers to untagged data, allowing users to search collections of such data.

The intellectual merit of this work lies in its approach: rather than attempting to build a single piece of software that magically understands all data, it aims to use every available source of automatable help, in a robust and provenance-preserving manner, to create a service that can handle as much of this data as possible. This proverbial “super mutt” of software, or Brown Dog, will serve as a low-level data infrastructure for other software to interface with.
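The idea of chaining open/save operations across third-party software can be illustrated as a path search over a graph of file formats, where each edge is one application that can open one format and save another. The sketch below is a minimal illustration under stated assumptions, not Brown Dog's actual implementation: the application names (`AppA` to `AppE`), the format pairs in `CONVERSIONS`, and the helper `find_conversion_path` are all hypothetical.

```python
from collections import deque

# Hypothetical catalogue of conversion capabilities: each entry states that an
# application can open the first format and save it as the second.
# Application names and format pairs are illustrative assumptions only.
CONVERSIONS = [
    ("AppA", "dwg", "dxf"),
    ("AppB", "dxf", "obj"),
    ("AppC", "obj", "stl"),
    ("AppD", "ps", "pdf"),
    ("AppE", "pdf", "txt"),
]


def find_conversion_path(source, target, conversions=CONVERSIONS):
    """Return the shortest list of (app, in_fmt, out_fmt) steps from source
    to target, or None if no chain of open/save operations connects them."""
    # Adjacency list: input format -> list of (app, output format).
    graph = {}
    for app, in_fmt, out_fmt in conversions:
        graph.setdefault(in_fmt, []).append((app, out_fmt))

    # Breadth-first search finds a chain with the fewest conversion steps.
    queue = deque([source])
    previous = {source: None}  # format -> (previous format, app used)
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            break
        for app, out_fmt in graph.get(fmt, []):
            if out_fmt not in previous:
                previous[out_fmt] = (fmt, app)
                queue.append(out_fmt)

    if target not in previous:
        return None

    # Walk back from the target to reconstruct the chain of steps.
    steps = []
    fmt = target
    while previous[fmt] is not None:
        prev_fmt, app = previous[fmt]
        steps.append((app, prev_fmt, fmt))
        fmt = prev_fmt
    return list(reversed(steps))


if __name__ == "__main__":
    # Chains AppA, AppB, and AppC to turn a dwg file into an stl file.
    for step in find_conversion_path("dwg", "stl"):
        print(step)
```

Breadth-first search is used here only because it minimizes the number of hops, on the assumption that each intermediate conversion risks losing information; a real service might instead weight edges by conversion quality or availability of the Software Server hosting each application.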

Research Software Engineering

NCSA’s Research Software Engineers enable new capabilities and advance discovery through innovative software/data solutions.