JESSE Archives

JESSE@LISTSERV.UTK.EDU


Subject: DataONE Summer Internship Opportunity
From: "Monroe, Wanda G." <[log in to unmask]>
Reply-To: Open Lib/Info Sci Education Forum <[log in to unmask]>
Date: Tue, 15 Mar 2011 20:41:52 +0000


PLEASE POST AND DISTRIBUTE

The Data Observation Network for Earth (DataONE) is a virtual organization dedicated to providing open, persistent, robust, and secure access to biodiversity and environmental data, supported by the U.S. National Science Foundation. DataONE is pleased to announce the availability of summer research internships for undergraduates, graduate students and recent postgraduates.

Program Structure
Up to eight interns will be accepted in 2011, each paired with one primary mentor and, in some cases, secondary mentors. Interns need not necessarily be at the same location or institution as their mentor(s). Interns and mentors are expected to have a face-to-face meeting at the beginning of the summer, and interns are encouraged to attend the DataONE All-Hands Meeting in the fall to present the results of their work. DataONE will pay all necessary travel expenses.


Schedule 

*	March 15 - Application period opens 
*	April 8 - Deadline for receipt of applications at midnight Pacific time 
*	April 15 - Notification of acceptance. Scheduling of face-to-face kickoff meetings based on availability of interns and mentors 
*	May 23 - Program begins* 
*	June 27 - Midterm evaluations 
*	July 29 - Program concludes 
*	October 18-20 - DataONE All-Hands Meeting, New Mexico (attendance encouraged) 

* Allowance will be made for students who are unavailable during these dates due to their school calendar.


Eligibility
The program is open to all undergraduate students, graduate students, and postgraduates who have received their master's or doctorate within the past five years. Given the broad range of projects, there are no restrictions on academic background or field of study. Interns must be at least 18 years of age by the program start date, must be currently enrolled or employed at a university or other research institution, and must currently reside in, and be eligible to work in, the United States. Interns are expected to be available approximately 40 hours/week during the internship period (see the Schedule above), with significant availability during normal business hours. Interns from previous years are eligible to participate.


Financial Support
Interns will receive a stipend of $4,500 for participation, paid in two installments (one at the midterm and one at the conclusion of the program). In addition, required travel expenses will be borne by DataONE. Participation in the program after the mid-term is contingent on satisfactory performance. The University of New Mexico will administer funds. Interns will need to supply their own computing equipment and Internet connection. For students who are not US citizens or permanent residents, complete visa information will be required, and it may be necessary for the funds to be paid through the student's university or research institution. In such cases, the student will need to provide the necessary contact information for their organization. 


Project Ideas
Projects cover a range of topic areas and vary in the extent and type of prior background required of the intern. The interests and expertise of the applicants will, in part, determine which projects will be selected for the program. Off-list projects are also eligible, in which case potential applicants are strongly encouraged to contact the organizers and/or potential mentors with their ideas prior to applying. The titles of this year's projects (see below for more detailed descriptions) are:

1.	DATA MANAGEMENT: Best practices of data management for public participation in science and research 
2.	DATA MANAGEMENT: Online learning modules related to best practices throughout the data lifecycle 
3.	EDUCATION: Accessing and analyzing environmental data in the classroom 
4.	SOCIOLOGY OF SCIENCE: Understanding how scientists analyze data 
5.	DATA SCIENCE: How much ecological data is out there? 
6.	DATA SCIENCE: Tracking the reuse of 1000 datasets 
7.	PROGRAMMING: Subsetting and publishing "dynamic" scientific datasets 
8.	PROGRAMMING: Scientific workflow provenance repository and publishing toolkit 
9.	PROGRAMMING: Integrating loosely structured data into the Linked Open Data cloud 
10.	SCIENCE COMMUNICATION: Developing video animations for DataONE community engagement 


To Apply
Application materials should be sent to [log in to unmask] by 11:59 PM (Pacific time) on April 8th, and should include a cover letter, resume, and letter of reference, all in PDF format. The applicant should send the cover letter and resume, while the letter of reference should be sent directly by its author.

1.	The cover letter should address the following questions: 

	*	What DataONE Summer Internship projects are you most interested in and why? 
	*	What contributions do you expect to be able to make to the project(s)? 
	*	What background do you have which is relevant to the project(s)? 
	*	What do you expect to learn and/or achieve by participating? 
	*	What are your thoughts and ideas about the project, including particular suggestions for ways of achieving the project objectives? 
	*	How will participation in this program help you achieve your educational and career objectives? 
	*	Are there any factors that would affect your ability to participate, including other summer employment, university schedules, and other commitments? 

2.	The resume should include the applicant's educational history, current position, any publications or honors, and full contact information (including phone number, e-mail address, and mailing address). 
3.	The letter of reference should be sent directly to [log in to unmask] and should be from a professor, supervisor, or mentor. 


Evaluation of applications

Applications will be judged by the following criteria:

*	The academic and technical qualifications of the applicant. 
*	Evidence of strong written and oral communication skills. 
*	The extent to which the applicant can provide substantive contributions to one or more projects, including the applicant's ideas for project implementation. 
*	The extent to which the internship would be of value to the career development of the applicant. 
*	The availability of the applicant during the period of the internship. 

Intellectual Property
DataONE is predicated on openness and universal access. Software is developed under one of several open source licenses, and copyrightable content produced during the course of the project will be made available under a Creative Commons (CC-BY 3.0) license. Where appropriate, projects may result in published articles and conference presentations, on which the intern is expected to make a substantive contribution and to receive credit for that contribution.

Funding acknowledgement
The Summer Internships are supported by The National Science Foundation: "INTEROP: Creation of an International Virtual Data Center for the Biodiversity, Ecological and Environmental Sciences" (NSF Award 0753138) and "DataNet Full Proposal: DataNetONE (Observation Network for Earth)" (NSF Award 0830944).


For more information
If you have questions about, or problems with, the application process or the internship program in general, please send e-mail to [log in to unmask] 


Project Ideas

1.	Best practices of data management for public participation in science and research
	Description: The DataONE Citizen Science Working Group (CSWG) is working to organize and develop best practices for management of data and information for the increasing number of local, regional and national projects that focus on "Public Participation in Science and Research (PPSR)," also called Citizen Science projects. The 2011 CSWG intern will assist in the inventory and description of data practices for PPSR projects, based on the responses to an earlier survey conducted as part of the CSWG. The goals of the intern project are to develop a metadata description for key aspects of the data held by each group, and make this information available back to the CSWG as a small database. The intern will then help identify and document best practices for data management by PPSR projects, assist in vetting the best practice documents across the PPSR community, and work with CSWG to make the best practices available via the DataONE website as well as other outlets. Products will include a suite of best practices for data management by PPSR projects; in addition, the intern will be encouraged to give a formal presentation at a scientific, data management or PPSR conference or meeting. Local work at Tucson or Ithaca is preferred; remote work would be possible for outstanding candidates, although one trip for an organization meeting would be required.
	Qualifications needed: Undergraduate or graduate student or equivalent; simple database management (e.g., MS Access) skills preferred; public engagement; writing; organization; small project management
	Skills to be learned: Metadata management; best practices template; database management; communications and outreach; project management
	Primary mentor: Jake Weltzin (USA National Phenology Network)
	Secondary mentor: Rick Bonney (Cornell Laboratory of Ornithology)
2.	Developing online learning modules related to the best practices throughout the data lifecycle
	Description: DataONE is developing online learning modules designed to educate DataONE users in various aspects of the data lifecycle. This project involves: 1) researching and acquiring software that can produce high quality online learning; 2) developing online learning modules using pre-prepared PowerPoint slides produced by the DataONE Community Engagement and Education Working Group; 3) adding content about data management; and 4) participating in a workshop hosted by DataONE to refine and add content to the educational modules (July 2011).
	Qualifications needed: A science data management background; Familiarity with aspects of the data lifecycle; Ability to quickly learn new software; Some work in development of educational materials helpful
	Skills to be learned: Creative ways to educate a varied audience on the data lifecycle; familiarity with the software chosen to develop online learning modules; collaboration techniques with a dispersed working group.
	Primary mentor: Viv Hutchison (USGS NBII)
	Secondary mentors: Stephanie Hampton (National Center for Ecological Analysis and Synthesis), Carly Strasser (National Center for Ecological Analysis and Synthesis)
3.	Understanding how scientists analyze data
	Description: Scientists use a wide variety of tools and techniques to manage and analyze data. However, to our knowledge no one has taken a systematic look at how scientists do their work. In this project, we will examine a large number of the scientific workflows that have been constructed. We will develop a way of categorizing workflows based on their complexity, types of processing steps employed, and other factors. The goal is to develop new and significant understanding of the scientific process and how it is being enabled by science workflows.
	Qualifications needed: Self-starter, determined, enthusiastic, willing to keep a research notebook up-to-date openly online. Experience with a modern programming language, statistics and data analysis, and R would be helpful.
	Skills to be learned: Kepler and Taverna workflow languages, research methods, research analysis, keeping an open science research notebook, communicating research results. A peer-reviewed publication is envisioned.
	Primary mentor: William Michener (University of New Mexico)
	Secondary mentors: Rebecca Koskela (University of New Mexico), Bertram Ludaescher (University of California Davis)
4.	Accessing and analyzing environmental data in the classroom
	Description: A graduate student intern will create an educational module for use in undergraduate classrooms - the module will be designed to teach basic principles in ecology or environmental science using data that are publicly available through the DataONE network. The student will work with mentors to choose appropriate data sets, questions and analyses, and create a simple program to access and analyze the data in R. The student will create documentation that accompanies the exercise, potentially in multimedia formats, to train instructors to use the exercise in classrooms.
	Qualifications needed: Basic background in ecology or environmental science, and statistics is necessary. Experience implementing statistics in a scripted statistical package such as R, Matlab or SAS is necessary. Experience with online training materials and multimedia presentation - e.g., screencasts - is useful.
	Skills to be learned: The student will hone skills in statistical analysis, programming in R, working with large data sets, and creating teaching materials. The student will gain a well-rounded perspective on the importance of all aspects of the data life cycle in environmental sciences, and build a diverse professional network with leaders in environmental informatics and data-driven environmental science research.
	Primary mentor: Stephanie Hampton (National Center for Ecological Analysis and Synthesis)
	Secondary mentors: Carly Strasser (National Center for Ecological Analysis and Synthesis), Amber Budden (University of New Mexico)
5.	How much ecological data is out there?
	Description: No one is certain how much ecological data exists, or how this amount compares to the volume of data currently housed in repositories such as Knowledge Network for Biocomplexity (KNB). It would be useful to determine this for designing infrastructure, but also as a call to arms for ecologists to start sharing this "dark data". For this project, we will develop a method for estimating the amount of ecological data being generated, with a focus on "small science" projects. Initially this project will involve brainstorming about the best way to estimate such a complex figure, and the intern will then be tasked with producing the estimate using the agreed-upon methods. Potential methods for estimation may include sampling publications, surveying scientists, or exploring existing databases. We foresee that results from this project will be highly cited since such an estimate is useful for discussions about data sharing, data reuse, and repository development in ecology.
	Qualifications needed: Applicants should be graduate students, have a strong background in the field of ecology or environmental science, and have statistics experience. Experience using computer scripts for data retrieval would be helpful, along with programming experience in R and/or MATLAB. The intern will need to be creative and excited about tackling complex problems.
	Skills to be learned: The student will be exposed to topics in data management, reuse, and archiving, and will learn to work with ecological databases. They will learn to work collaboratively on complex problems with several members of the DataONE team, and have the opportunity to write a peer-reviewed publication with the potential for high citation rates. Particular skills related to computer scripting, statistics, and data mining will be specific to the methods determined by the student and mentors.
	Primary mentor: Carly Strasser (National Center for Ecological Analysis and Synthesis)
	Secondary mentor: Stephanie Hampton (National Center for Ecological Analysis and Synthesis)
6.	Tracking the reuse of 1000 datasets
	Description: We believe that openly archiving raw data facilitates valuable reuse. Can we measure this? What contribution does data reuse make to the published literature? Who reanalyzes data? For what? Does this vary across disciplines and repositories? These questions are the focus of an exploratory study, "Tracking data reuse: Following one thousand datasets from public repositories into the published literature." In this internship you'll work directly with Heather to collect, extract, annotate, and analyze data to explore these important questions. See http://bit.ly/cPsek0 for more info on the project.
	Qualifications needed: Self-starter, determined, enthusiastic, willing to keep a research notebook up-to-date openly online. Experience with statistics, the academic literature, PubMed, ISI Web of Science, Python, R, and blogging would be helpful.
	Skills to be learned: Research methods, research data collection, text extraction from the scientific literature, keeping an open science research notebook, communicating research results
	Primary mentor: Heather Piwowar (National Evolutionary Synthesis Center)
	Secondary mentor: Todd Vision (University of North Carolina Chapel Hill/National Evolutionary Synthesis Center)
7.	Subsetting and publishing "dynamic" scientific datasets
	Description: The Avian Knowledge Network (AKN) is a federation of bird monitoring datasets, the largest and most dynamic of which is eBird. Datasets such as these, that are constantly being edited and expanded, are challenging to incorporate into the DataONE framework because of the way they are currently published. This project involves researching issues around dataset subsetting and duplication to recommend a publishing approach that works for "dynamic" datasets. Expected outcomes: (1) Implement that strategy by migrating the AKN repository to a DataONE-integrated Metacat deployment, making AKN into a DataONE Member Node; (2) Produce a case-study article that captures the implementation process that could act as a guide to future Member Nodes making similar efforts.
	Qualifications needed: metadata mapping; high level programming language (e.g., Perl, Java); SQL; shell scripting
	Skills to be learned: data repository implementation; scientific data organization and publishing
	Primary mentor: Paul Allen (Cornell Laboratory of Ornithology)
	Secondary mentor: Kevin Webb (Cornell Laboratory of Ornithology)
8.	Scientific workflow provenance repository and publishing toolkit
	Description: Scientific workflow systems are increasingly used to automate scientific computations and data analysis and visualization pipelines. An important feature of scientific workflow systems is their ability to record and subsequently query and visualize provenance information. Provenance includes the processing history and lineage of data, and can be used, e.g., to validate/invalidate outputs, debug workflows, document authorship and attribution chains, etc. and thus facilitate "reproducible science". We aim to develop (1) a provenance repository system for publishing and sharing data provenance collected from runs of a number of scientific workflow systems (Kepler, Taverna, VisTrails), together with (2) a provenance trace publication system that allows scientists to interactively and graphically select relevant fragments of a provenance trace for publishing. The selection may be driven by the need to protect private information, thus including hiding, abstracting, or anonymizing irrelevant or sensitive parts. Part (1) will be based on a DataONE-extension of the Open Provenance Model (D1-OPM) and leverage an earlier Summer of Code project. In particular, the provenance toolkit includes an API for managing workflow provenance (i.e., uploading into and retrieving from a data storage back-end). Part (2) will implement a new policy-aware approach to publishing provenance, which aims at reconciling a user's (selective) provenance publication requests with agreed-upon provenance integrity constraints. For an existing rule-based backend, a graphical user environment needs to be developed that lets users select, abstract, hide, and anonymize provenance graph fragments prior to their publication.
	Qualifications needed: For Part 1, applicants should have experience in SQL and Java or a scripting language (e.g., Python or Perl). For Part 2, programming of GUIs with Rich Internet Application (RIA) technologies (e.g., GWT) is a plus.
	Skills to be learned: Collaborative open source software development using state-of-the-art languages and tools (databases, workflow systems, interactive information visualization).
	Primary mentor: Bertram Ludaescher (University of California Davis)
	Secondary mentor: Paolo Missier (Newcastle University)
9.	Integrating loosely structured data into the Linked Open Data cloud
	Description: The Linked Data conventions describe four principles that allow data of any kind and from any online source to form a global interconnected web of data: i) name every "thing" that has some data or information associated with it; ii) use HTTP URIs to do so; iii) provide useful information or data in Resource Description Framework (RDF) format to someone looking up such URIs; and iv) within information provided this way, link to other common "things", such as points or axes of reference, and use common vocabularies to attach meaning to links wherever possible. These seemingly simple principles have nonetheless been highly effective in facilitating the creation of large, globally distributed, and constantly growing aggregations of Linked Open Data (LOD), a universally applicable framework for machines and users alike to integrate, navigate, and discover data by following links that are semantically of interest. Trying to apply the Linked Data principles to data holdings of non-specialized digital repositories, such as DataONE and many of its member nodes, is challenging. These data are often highly heterogeneous, and not natively expressed in RDF, or a format structured enough that would lend itself to automatic conversion to RDF. Instead, they are typically represented in formats that are either loosely structured in an ad-hoc manner (such as spreadsheets), or according to one of a myriad of formats output by instruments or analysis programs. It is thus not clear what the universe of "things" to name is, what are common points or axes of reference, what kinds (semantics) of links are needed, and how data archived in this way can be exposed in RDF such that the conversion can be automated, yet is still useful for science-motivated discovery and integration. 
The idea of this project is to develop an exploratory prototype, and practical recommendations resulting from it, for how the heterogeneous and loosely structured data held in non-specialized DataONE member nodes can be exposed to the Linked (Open) Data cloud. The approach would consist of obtaining a sufficiently representative sample of data sets from DataONE's initial 3 member nodes (Dryad, KNB, and ORNL-DAAC), and using them as instance data for which to define the RDF predicate vocabularies, domain ontologies, resource URIs, and conversion mechanisms that are necessary to create a LOD representation of these data. This representation can then be uploaded to, navigated, and queried in either one of the web-based LOD browsers (such as URIburner), or for example in a local installation of OpenLink Virtuoso.
	Qualifications needed: Knowledge of RDF and one of its widely used serializations (XML, N3). Familiarity with either C or Java programming, or a scripting language that has good support for RDF and OWL, will be needed. Familiarity with Linked Data, and experience with metadata vocabularies and domain ontologies in RDF and OWL will be very helpful.
	Skills to be learned: Designing and executing an exploratory study through all phases. Identifying and communicating alternatives and their advantages and drawbacks. Developing practical semantic web resources for existing instance data.
	Primary mentor: Hilmar Lapp (National Evolutionary Synthesis Center)
10.	Developing video animations for DataONE community engagement
	Description: DataONE wishes to develop a set of video animations to help explain DataONE's value and capabilities to a range of audiences. Several topics have been identified for these short animations, a couple of storyboards have been developed, and one animation created. The intern will work with the mentors to continue building this set of animations according to the principles of 'universal design'.
	Qualifications needed: Applicants should have strong visual design skills and a high level of expertise in development of digital animation. Expertise in communicating scientific information to a variety of audiences is desirable.
	Skills to be learned: Video / animation development; science communications.
	Primary mentor: Paul Allen (Cornell Laboratory of Ornithology)
	Secondary mentors: Amber Budden (University of New Mexico), Will Morris (Cornell Laboratory of Ornithology) 

This information is also available at: http://www.dataone.org/content/2011-summer-internship-program


**********************************
Wanda Monroe
Director of Communications
School of Information and Library Science
University of North Carolina at Chapel Hill
100 Manning Hall, CB 3360
Chapel Hill, NC  27599-3360
Phone: 919-843-8337
Web: sils.unc.edu
Follow us on Twitter at: UNC SILS


 
