Dataset Production Checklist

Task Personnel * Start Finish Interfaces **
Prepare Data Management Plan/Resource Estimates
(The Product Lead should prepare this and submit it to the Data Set Review Board through the operations manager, who has a template for the Data Management Plan)
    
Hold Data Set Launch (kick off) meeting
(The product lead should gather the product team together for an initial planning meeting)
    
Algorithm development
(This may or may not be necessary and may or may not be done within the data center. You may want to include a more detailed description of this activity including sub tasks. Often it's a matter of making an existing algorithm "operational" and efficient. It probably involves a "scientific programmer")
    
Name the data set
(See or work with the NSIDC Database Administrator (DBA))
    
Determine means of distribution (ftp, CD-ROM, tape, etc.).
(It's important to determine this as soon as possible so that database, archive, and distribution issues such as data granularity can be resolved in advance.)
    
Estimate computer system requirements and coordinate with Systems Engineering and Administration Team (SEAT)
(This is to ensure adequate disk space, mass storage, processing capacity, etc. The SEAT contact would be either Vince or Graham)
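A rough disk estimate can be worked out before talking to SEAT. The following is only a minimal sketch: the function name, its parameters, and the 25% overhead default are illustrative assumptions, not figures from the data center.

```python
def estimate_storage_gb(granule_count: int, mean_granule_mb: float,
                        processing_levels: int = 1,
                        overhead_fraction: float = 0.25) -> float:
    """Rough disk estimate in gigabytes (1 GB = 1024 MB).

    All inputs are hypothetical planning numbers:
    - granule_count: expected number of files (granules)
    - mean_granule_mb: average granule size in megabytes
    - processing_levels: copies kept (e.g. raw plus one processed level = 2)
    - overhead_fraction: assumed headroom for browse, scratch space, etc.
    """
    raw_mb = granule_count * mean_granule_mb * processing_levels
    return raw_mb * (1.0 + overhead_fraction) / 1024.0


# Example: 2048 granules of 2 MB each, kept at two processing levels.
print(estimate_storage_gb(2048, 2.0, processing_levels=2))
```

The estimate is deliberately coarse; actual requirements depend on granularity and distribution decisions made in the earlier steps.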
    
Prepare initial metadata ("skinny DIF")
(This is the first step to determine Guide and IMS valids and input into the incipient metadata database. The primary contact is the DBA (Karen R.))
    
Data Ingest & Processing
(The following steps are very broad and generic. Different data sets will have different specific steps which will need to be worked out by the lead in conjunction with programmer and operations staff. Use extra rows as necessary)
Data ingest/Media Migration
(This is essentially getting the data and transferring it from whatever medium it is on to the jukebox or appropriate server. It will most likely involve the data ingest tech. (Renea E.))
    
Quality control
(making sure the data transferred correctly)
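One common way to confirm that data transferred correctly is to compare checksums of the source and destination files. The sketch below assumes both copies are reachable as directories on one machine; the function names and the choice of SHA-256 are illustrative, not a procedure prescribed by this checklist.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_transfer_errors(source_dir: Path, dest_dir: Path) -> list[str]:
    """List relative paths that are missing from dest_dir or differ there."""
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        # A file fails the check if it is absent or its contents changed.
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `find_transfer_errors` means every source file has a byte-identical copy at the destination. Files that exist only at the destination are not reported.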
    
Processing (reformatting, run algorithms, etc.)
(this is the first of perhaps several steps of data processing.)
    
QA
(a necessary step after the processing which should be considered by both the lead and programmer)
    
Media Migration
(after the first step of processing it may be necessary to migrate the processed data again, to the jukebox or elsewhere, so that it is available for further processing)
    
QC     
Processing (generate browse, produce CD one-off, etc.)
(The next (usually final) processing step, such as producing browse and preparing for distribution. Ultimately there may be several iterative processing steps. Note: All CD one-offs must be produced by the data center, even for data we do not process. This assures ISO standards are met and the CDs work on all platforms)
    
QA
(Final processing QA)
    
Develop/gather necessary ancillary data (This includes land/coastline masks, overlays, etc.)     
Develop appropriate tools for extraction, display, subsetting, browse, etc.
(This is often unnecessary. One should coordinate with the Lead Programmer or his designee on whether specific or generic tools can be applied)
    
Documentation
(These steps are just high level. A writer can provide a separate, more detailed documentation checklist used by the documentation team and user services)
Generate/update DIF and submit to GCMD (The Data Interchange Format document is a standard document necessary for the Global Change Master Directory (GCMD). All data sets have a DIF, and the team writer can provide a template. The initial input for this should come from the initial skinny DIF above)     
Acquire and edit documentation
(This primarily refers to any documentation or publications created by the data producer. This documentation as well as any software or processing documentation needs to be assembled and given to the documentation team for final editing and synthesis)
    
Input into Guide/put on http server
(The documentation team creates or edits platform, instrument, campaign, and data set documents from the above documents. These are put into a standard Guide format and made available through the WWW. This is primarily for DAAC data sets, but should ultimately involve all data sets.)
    
Review and document walk through
(There may be several interim reviews by the lead, user services, or documentation team, but it should culminate with a final walk through of the documentation with all parties. Include Guide or other general documentation, DIF, catalog entry, readme, and software documentation)
    
Notify User Services
(This step does not necessarily occur at this point. It is meant simply as a placeholder so everyone remembers to notify USO of new data sets)
    
IMS--Data base administration
(The IMS refers to the large cross-DAAC (V0) IMS. All DAAC data sets need to be in the IMS, and non-DAAC data sets should ultimately be made available through it as well. The primary contact for the following steps is the DBA (Karen R.))
Create metadata schema
(The DBA will work with the lead to develop an appropriate schema)
    
Generate/Acquire metadata
(Once the schema is defined the metadata needs to be created, preferably during the processing)
    
Ingest metadata into CIMS
(Our local IMS, the Cryospheric Information Management System)
    
Transfer valids to Project/IMS
(The IMS has a set of defined keywords or valids for searches, etc. Also each data set may have a unique set of valids)
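Before transferring valids, it can help to check that each metadata record uses only defined keywords. This is a minimal sketch under stated assumptions: the function name, the field names, and the example valids are hypothetical and do not reflect the actual IMS or CIMS schema.

```python
def find_invalid_keywords(record: dict[str, str],
                          valids: dict[str, set[str]]) -> dict[str, str]:
    """Return the fields in record whose value is not a defined valid.

    Fields with no entry in valids are treated as free text and skipped.
    """
    return {
        field: value
        for field, value in record.items()
        if field in valids and value not in valids[field]
    }


# Hypothetical valids and record, for illustration only.
example_valids = {
    "sensor": {"SSM/I", "AVHRR"},
    "parameter": {"SEA ICE CONCENTRATION", "SNOW COVER"},
}
example_record = {
    "sensor": "SSMI",  # misspelled, so it is reported
    "parameter": "SNOW COVER",
    "comment": "free text is not checked",
}
print(find_invalid_keywords(example_record, example_valids))
```

Any field reported by such a check would need either a corrected value or a new valid agreed with the DBA.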
    
Review/test
(Make sure the IMS was actually updated and is working. This should include a valid/keyword search in the IMS and Guide)
    
Determine the appropriate citation for users to give when they cite the data.
(Please refer to the NSIDC In-house Style Sheet on the Writer's Web. In some cases a new citation may need to be developed. Consult a writer.)
    
Write a descriptive cover letter to be delivered with the data (if appropriate).     
Hold final product team meeting
(This is the final check with the team to ensure all the last little details have been attended to)
    
Final review/QA before release
(The final check by the product lead in conjunction with others to make sure all the t's are crossed, etc.)
    
"Advertise" data set--ensure that funding agencies get proper credit.
(i.e., through "NSIDC Notes", special announcements, newsgroups, etc.)
    
Distribution
(need to consider the means of distribution (ftp, CD-ROM, etc.) as well as the timing)
    
Archive Administration
(Ongoing operational task, but the archive administrator (Renea) should be formally aware of new data sets as they are created)
    
Data set/Documentation update
(The product lead is responsible for ensuring that the data set remains timely, relevant, and accurate)
    
* Personnel = Who does the task
** Interfaces include necessary machines for processing and storage as well as interfaces to things outside the data center such as data producers, Sony for CD production, or the means to announce a new data set, etc.
