GPS Infrastructure: Data Archiving

Hadley Johnson, Duncan Agnew and Greg Anderson

UC San Diego, Scripps Institution of Oceanography

 

This project supports the collection and archiving of GPS data gathered throughout southern California from ``survey'' measurements (as opposed to data from the continuous GPS network, SCIGN). Since 1986, a large number of organizations (academic, federal, state, and local government) have collected GPS data at hundreds of points (Figure 1). Much of this was gathered specifically for crustal-motion studies, but some data, though collected for other purposes, will also be useful in constructing a crustal-motion model for this area. The purpose of this effort is to collect these data and archive them at the SCEC Data Center so that they are readily available in a standardized format to all interested researchers: primarily those within SCEC who are producing the geodetic velocity model by analyzing these data, but also other investigators. The SCEC GPS archive is one of several that are intended to form a ``seamless archive'' for GPS data, an effort being led by the UNAVCO Boulder facility.

The activities covered by this grant include:

1. The collection of GPS data (datafiles and logsheets) from the investigators and agencies that are making, or have made, GPS measurements of ``crustal-deformation'' quality at points of interest.

2. The conversion of these data to a standard RINEX format, with inconsistencies removed and missing information filled in. These RINEX files are placed at the SCEC Data Center (SCEC-DC) in Pasadena.

3. The production of a machine-readable file of all ``metadata'': primarily logsheet information and information originally in the RINEX header, but also what we call ``audit-trail'' information, which documents the steps taken in task (2) and will allow us to track down any processing errors easily.

4. The construction of indices of stations and of their occupations, to guide those wishing to use the data and to help in planning additional field measurements.

5. The maintenance of a ``campaign list'' summarizing data available and yet to be gathered.

6. The maintenance of site descriptions, as much as possible in machine-readable form.
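As an illustration of the metadata described in task (3), the sketch below shows one way a machine-readable record could carry logsheet information, RINEX header information, and an audit trail together. It is not the format used in this project, which is not described here: the class name, field names, file name, and values are all hypothetical, and JSON is assumed only for convenience.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FileMetadata:
    """Hypothetical metadata record for one archived RINEX file."""
    rinex_file: str   # name of the archived file (made-up example below)
    logsheet: dict    # values transcribed from the field logsheet
    header: dict      # values originally found in the RINEX header
    audit_trail: list = field(default_factory=list)

    def record_step(self, action, detail):
        """Append one processing step, numbered in the order applied."""
        self.audit_trail.append({
            "step": len(self.audit_trail) + 1,
            "action": action,
            "detail": detail,
        })

    def to_json(self):
        """Serialize the whole record, audit trail included."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # All values here are invented for illustration.
    meta = FileMetadata(
        rinex_file="abcd1230.92o",
        logsheet={"antenna_height_m": 1.234},
        header={"MARKER NAME": "ABCD"},
    )
    meta.record_step("convert", "raw receiver file converted to RINEX")
    meta.record_step("fix_header", "antenna height taken from logsheet")
    print(meta.to_json())
```

Keeping the steps in an ordered list means that a suspect value in an archived file can be traced back to the specific processing step that set it.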

Our focus this year has been on the first three tasks, and this work has been quite successful. We chose to begin by setting up a system of programs for the efficient processing of files, automating as many tasks as possible and reducing many of the rest to routine data entry. The development of this system occupied much of last year's effort, with the final touches added at the start of 1997. With this system complete, and with the hiring of two extremely efficient undergraduate assistants (Heidi Buck and Pam Lehr), we were able to process files quite rapidly. Figure 2 shows our progress in terms of RINEX files present at the SCEC Data Center. Over the last year we have archived 6230 RINEX files, bringing the total at the Data Center to 7844. The files archived included data that had originally been collected by UCLA as part of the archiving effort, and also a considerable amount of new data from MIT, JPL, the USGS office in Pasadena, Harvey Mudd, NGS, NASA, and Caltrans--as well as UCSD. Obtaining the data has required, in a number of cases, reading original files off diskette (successfully, even for 11-year-old data) and converting the raw data to RINEX. Figure 3 shows the distribution of files by contributing agency: the UCLA/SCEC effort has been the primary source, but many others have contributed as well. At present we have no backlog of files to be processed, though there remain data that we still need to obtain.

In addition to archiving files, we have undertaken several other efforts:

1. Site index. We have prepared an index of geodetic monuments that includes all points on the current velocity map plus many others (Figure 1 shows the current distribution). All points have coordinates good to 1 m, a unique identifier, a (brief) description, and a suggested 4-character ID. When possible the list includes the NGS PID, since this can be used to get a fuller site description from the NGS Web-server. This list is available at the SCEC-DC.

2. Matching data files to logsheets. We have a machine-readable index to all logsheets, and as part of our regular processing we match data files to logsheets. As a subsidiary task we have done the same for the RINEX data already at the Data Center; completing this cross-index allows us to check the completeness of the different archive holdings (for example, to see if there are logsheets without data--something that has already turned up data that would otherwise have been overlooked).

3. Conversion of older data. Most of the pre-1990 data (about 1400 files) are in an older format called FICA, which can now be read only with specialized tools. We have begun to convert these files to RINEX, and in the process have uncovered a number of problems in the conversion routines.

4. Documentation. All the procedures and programs we have developed, and the format of all files of metadata, are fully documented. We expect this documentation to be quite useful in working with other groups (such as UNAVCO).
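The cross-index described in item 2 can be illustrated with a short sketch. This is not the project's own software. It assumes, purely for illustration, that each data file and each logsheet can be reduced to a station ID, a year, and a day of year, and that these three values identify one occupation. The function names, station IDs, and file names are hypothetical.

```python
from collections import defaultdict


def occupation_key(station_id, year, day_of_year):
    """Normalize the fields assumed to identify one station occupation."""
    return (station_id.upper(), int(year), int(day_of_year))


def cross_index(data_files, logsheets):
    """Match data files to logsheets by occupation.

    Each input is an iterable of (name, station_id, year, day_of_year).
    Returns (matched, sheets_without_data, data_without_sheets).
    """
    files_by_key = defaultdict(list)
    for name, station, year, doy in data_files:
        files_by_key[occupation_key(station, year, doy)].append(name)

    sheets_by_key = defaultdict(list)
    for name, station, year, doy in logsheets:
        sheets_by_key[occupation_key(station, year, doy)].append(name)

    # Occupations that have both a data file and a logsheet.
    matched = {
        key: (files_by_key[key], sheets_by_key[key])
        for key in files_by_key.keys() & sheets_by_key.keys()
    }
    # Logsheets with no data file: data that may still need to be obtained.
    sheets_without_data = sorted(k for k in sheets_by_key if k not in files_by_key)
    # Data files with no logsheet: metadata that may still need to be found.
    data_without_sheets = sorted(k for k in files_by_key if k not in sheets_by_key)
    return matched, sheets_without_data, data_without_sheets


if __name__ == "__main__":
    # Invented example entries.
    data = [("abcd1230.92o", "abcd", 1992, 123)]
    logs = [("log-001", "ABCD", 1992, 123), ("log-002", "WXYZ", 1992, 124)]
    matched, no_data, no_sheet = cross_index(data, logs)
    print("matched:", matched)
    print("logsheets without data:", no_data)
    print("data without logsheets:", no_sheet)
```

The second returned list corresponds to the completeness check mentioned above: a logsheet with no matching data file points to data that exist somewhere but have not yet reached the archive.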