Research Data Collection Guidelines 2023.05.

1. Overview
Purpose
  • To present the research data collection guidelines that the Geoscience Data Center of the Korea Institute of Geoscience and Mineral Resources (KIGAM) applies when collecting research data.
Target
  • Researchers and users who want to deposit research data in the Geo Big Data Open Platform.
Scope of Application
  • Applies to research data produced during in-house research activities and to research data donated by external organizations and individuals.
Application
  • Matters not specified in these guidelines may be subject to the research data management guidelines of the National Research Council of Science & Technology (NST) and the KIGAM Data Management Regulations.
2. Research Data Definition
Research Data Definition
  • "Research data" refers to factual data generated through the experiments, observations, investigations, and analyses carried out in research projects, which are essential for verifying research results.
Research Data Classification
  • Research data is categorized into primary data, secondary data, digitally converted data, metadata, external data, and collections.
  • Research data excludes research notes, drafts of scientific papers, future research plans, communications with colleagues, and physical materials themselves; however, it includes metadata about physical materials such as drill cores, rocks, and fossils.

    ◦ Primary data: Data generated through surveys, experiments, monitoring, observations, measurements, analyses, etc. in a research project; these are the essential, objective factual data needed to reproduce research results.

    ◦ Secondary data: Data produced by processing and analyzing primary data, such as tables, images, graphs, models, geological maps, geophysical anomaly maps, and geochemical maps.

    ◦ Digitally converted data: Data that has been converted from analog to digital form.

    ◦ Metadata: Data that describes other data, consisting of the elements needed to describe it: the title of the data, its producer, the equipment and method used to produce it, its content, the location (coordinates) and time of acquisition, its format, its quality, etc.

    ◦ External data: Data provided by external sources, whether paid or free of charge.

    ◦ Collection: Any aggregation of physical or digital resources.

Classification by Data Production Method Criteria
  • Observational data: Data produced through observation or monitoring.
  • Experimental data: Data produced through experimentation.
  • Simulation data: Data produced through simulation.
  • Derived or compiled data: Data produced through compilation or extraction.
Importance and Necessity of Sharing Research Data
  • With the shift toward data-intensive research, which seeks scientific discoveries in the vast amounts of data produced by diverse measurement and experimental equipment, the reuse and management of research data are becoming increasingly important.
  • Driven by the open science and open access movements, data and publications from publicly funded research projects are increasingly published in public data repositories.
  • Sharing research data makes it available to other researchers and institutions.
  • Sharing is an opportunity to enhance the reputation of individual researchers and research institutions through data citation, and to improve research through data verification.
  • Sharing research data reduces the cost of producing and publishing duplicate data, allowing greater focus on future research.
  • Most pre-processed research data can be shared while still respecting privacy, research ethics, and other sensitive issues.
  • When a research project management organization promotes the disclosure of research data through a Data Management Plan (DMP), standardizing data acquisition, storage, and metadata creation is the best way to manage research data so that it can be shared.
3. Research Data Collection
Research Data Collection Scope
  • KIGAM collects and manages geoscience and resource datasets in research fields such as national geology, mineral resources, petroleum and marine geoscience, and the geological environment. These include research data generated through research activities at the institute and data obtained from external sources.
  • The collection scope specified above (research data acquired through research activities) also includes DMPs.

    ◦ Under KIGAM's Research Data Management Regulations, a DMP must be submitted when conducting a research project, so DMPs fall within the collection scope and are managed and preserved as research assets.

  • The types of research data specified in the DMP are as follows:

    ◦ Field investigation data

    ◦ Outdoor exploration/measurement data

    ◦ Sample and assay data

    ◦ Map data (final deliverables in the form of drawings, such as geologic maps and geologic thematic maps)

    ◦ Course materials (International School for Geoscience Resources)

    ◦ Literature/survey data

    ◦ External data

    ◦ Other data (data not included in the above categories)

Select and Evaluate Research Data
  • Research data collection criteria:

    ◦ Data with demonstrated, tangible value for research and education and lasting archival significance.

    ◦ Data produced in core areas defined by the organization.

    ◦ Data produced by institute-funded geoscience and resources projects.

  • Security, privacy, and confidentiality considerations:

    ◦ For matters not covered here, follow the Research Data Ethics·Copyright·Licensing Guidelines.

    ◦ Sensitive data that is collected and stored must meet recognized standards for privacy and confidentiality.

    ◦ Licenses applicable to specific data collections should be managed in accordance with the law, taking into account the repository's resources, goals, and mission.

  • Copyright and licenses:

    ◦ For other details, follow the Research Data Ethics·Copyright·Licensing Guidelines.

    ◦ When collecting research data, the holder of the intellectual property rights must be identifiable.

    ◦ Individuals or organizations holding intellectual property rights to the submitted research data must agree to the deposit conditions set by the repository.

    ◦ Copyright holders of research data select and apply the appropriate license type when licensing the use of their data.

    ◦ Creative Commons (CC) licenses are given priority for research data sharing and reuse.

  • Research data quality:

    ◦ Encourage the collection of research data accompanied by comprehensive technical documentation that gives users the information needed to assess data quality and reliability.

    ◦ Prefer research data in its original form.

  • Metadata:

    ◦ Metadata used for ingestion and deposit follows the Dublin Core format shown in <Table 1>.

<Table 1> Dublin Core metadata elements

Element | Contents
Title | A name given to the resource.
Creator | An entity responsible for making the resource.
Type | The nature or genre of the resource.
Contributor | An entity responsible for making contributions to the resource.
Publisher | An entity responsible for making the resource available.
Date | A point or period of time associated with an event in the lifecycle of the resource.
Language | A language of the resource.
Format | The file format, physical medium, or dimensions of the resource.
Description | An account of the resource.
Subject | A topic of the resource.
Relation | A related resource.
Identifier | An unambiguous reference to the resource within a given context.
Rights | Information about rights held in and over the resource.
Source | A related resource from which the described resource is derived.
Coverage | The spatial or temporal topic of the resource, spatial applicability of the resource, or jurisdiction under which the resource is relevant.
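As an illustration only, the sketch below serializes a record built from the elements of <Table 1> as XML in the widely used oai_dc style, with each element in the Dublin Core namespace http://purl.org/dc/elements/1.1/. These guidelines do not state which serialization the Geo Big Data Open Platform accepts, so the choice of oai_dc, the helper name dc_record_to_xml, and the example values are assumptions.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Dublin Core element namespace (DCMES 1.1) and the OAI-PMH oai_dc wrapper namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The 15 elements listed in <Table 1>, in table order.
DC_ELEMENTS = (
    "title", "creator", "type", "contributor", "publisher", "date",
    "language", "format", "description", "subject", "relation",
    "identifier", "rights", "source", "coverage",
)

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def dc_record_to_xml(record: dict[str, list[str]]) -> str:
    """Serialize {element: [values]} as an oai_dc XML string (hypothetical helper)."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    # Every Dublin Core element is optional and repeatable, so each value gets its own tag.
    for name in DC_ELEMENTS:
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    example = {
        "title": ["Example borehole core log"],
        "creator": ["Example Researcher"],
        "type": ["Dataset"],
        "format": ["text/csv"],
        "coverage": ["Example survey area"],
    }
    print(dc_record_to_xml(example))
```

Rejecting unknown keys catches misspelled element names early; any mapping to the platform's own deposit form would still need to be confirmed against that system.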
Preferred Research Data Format
  • Data that can be used in a variety of computing and technology environments.
  • Data in a format that users can easily access.
  • Data that can be accessed and used without compromising its research value.
  • Data that can be converted to formats usable by various statistical or analytical software.
  • Data that does not require additional software to be interpreted.
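For tabular data, one common way to meet these preferences is to export to plain-text CSV, which most statistical and analytical packages can read. The sketch below is a minimal illustration using Python's standard csv module; the helper name records_to_csv, the column names, and the values are hypothetical, and CSV is offered as one example of an open format rather than a format these guidelines mandate.

```python
from __future__ import annotations

import csv
import io


def records_to_csv(records: list[dict[str, object]], fieldnames: list[str]) -> str:
    """Return records as CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    # "\n" line endings keep the output identical across operating systems;
    # fields containing commas or quotes are quoted automatically.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical assay results; real column names belong in the dataset's documentation.
    rows = [
        {"sample_id": "S-001", "element": "Cu", "value_ppm": 120.5},
        {"sample_id": "S-002", "element": "Zn", "value_ppm": 88.0},
    ]
    print(records_to_csv(rows, ["sample_id", "element", "value_ppm"]), end="")
```

Missing values are written as empty fields, so the column meanings and units should be explained in the accompanying documentation files.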
Research Data Ethics
  • For other matters, follow the Research Data Ethics·Copyright·Licensing Guidelines.
  • To prevent ethical issues that may arise in producing and collecting research data, KIGAM collects data in compliance with research data ethics.
4. Research Data Deposit
Meaning of Research Data Deposit
  • Registering produced research data in a system (repository) to facilitate future reuse and ensure continuous access to the data.
Necessity of Research Data Deposit
  • Ensures continuous access to research data.
  • Provides professional research data management, preservation, and access.
  • Reduces storage costs and enables stable storage of large amounts of data.
  • Enables long-term management and preservation of research data in a secure environment.
  • Allows research data to be used as a resource with potential value for future reuse.
Materials to Be Submitted by Researchers When Depositing Research Data
  • Research data files

    ◦ The data should be in a state that allows software to process (verify, modify, convert, extract, etc.) its individual contents or internal structure.

    ◦ It is recommended that research datasets or research data files be provided in a universal format for future reuse, or in a format commonly used by the relevant domain-specific community.

  • Documentation

    ◦ Provide, along with the research data files, the documentation files needed to interpret them.

    ◦ Documentation files may include codebooks, data collection instruments, summary statistics, project summaries, and a list of data-related publications.

  • Metadata

    ◦ Metadata describing the contents of the research data files must be provided along with the files.

    ◦ The metadata format (Dublin Core, DC) may include: title, creator, type, contributor, format, description, subject, and terms (see <Table 1> of the Research Data Collection Guidelines).
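Before submitting, a depositor might check a metadata record against the elements listed above. The sketch below is a hypothetical pre-submission check, not part of any KIGAM system: it only reports which listed elements are empty, leaving the decision to the depositor. Because "terms" is not a Dublin Core element name, the sketch assumes it corresponds to the Dublin Core Rights element (terms of use).

```python
from __future__ import annotations

# Elements named in the deposit guidance above.
# ASSUMPTION: "terms" is mapped to the Dublin Core "rights" element.
LISTED_ELEMENTS = (
    "title", "creator", "type", "contributor",
    "format", "description", "subject", "rights",
)


def missing_elements(record: dict[str, list[str]]) -> list[str]:
    """Return listed elements that are absent or contain only blank values."""
    missing = []
    for name in LISTED_ELEMENTS:
        values = [v for v in record.get(name, []) if v.strip()]
        if not values:
            missing.append(name)
    return missing


if __name__ == "__main__":
    draft = {
        "title": ["Example geochemical survey dataset"],
        "creator": ["Example Researcher"],
        "subject": ["  "],  # a blank value counts as missing
    }
    print(missing_elements(draft))
    # ['type', 'contributor', 'format', 'description', 'subject', 'rights']
```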

Research Data Registration Procedure
  • Registration

    ◦ Researchers check the research data for quality and anomalies and submit it to the data repository (<Figure 1>).

  • Review and approval

    ◦ Principal investigators or institutional/departmental research data management staff review and approve the submitted research data (<Figure 1>).

    ◦ Key review points: data format, compliance with metadata technical formats, compliance with research data disclosure and licensing requirements, and whether sensitive information is included.

  • What to check when making a deposit

    ◦ Checklist

    - Follow naming conventions

    - Check for typos

    - Check file size (maximum single-file size: 100 GB)

    - Check the DMP classification code

    <Figure 1> Data registration procedure when depositing research data
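Parts of the deposit checklist can be checked automatically before upload. The sketch below checks the 100 GB single-file limit and a file-naming rule. These guidelines do not define the naming convention, so NAME_PATTERN (letters, digits, hyphens, and underscores, then one or more extensions, no spaces) is purely hypothetical, and 100 GB is assumed to mean 100 × 10^9 bytes; both should be adjusted to the repository's actual rules. Typos and the DMP classification code still need a manual check.

```python
from __future__ import annotations

import re
from pathlib import Path

# ASSUMPTION: "100 GB" is read as 100 * 10**9 bytes (decimal gigabytes).
MAX_FILE_BYTES = 100 * 10**9

# HYPOTHETICAL naming rule: letters, digits, hyphens, and underscores,
# followed by one or more extensions, with no spaces.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)+")


def check_file(path: Path) -> list[str]:
    """Return checklist problems for one file; an empty list means it passed."""
    problems = []
    if not NAME_PATTERN.fullmatch(path.name):
        problems.append(f"{path.name}: does not follow the naming convention")
    size = path.stat().st_size
    if size > MAX_FILE_BYTES:
        problems.append(f"{path.name}: {size} bytes exceeds the single-file limit")
    return problems


def check_deposit(folder: Path) -> list[str]:
    """Check every regular file under a deposit folder, in sorted order."""
    problems = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            problems.extend(check_file(path))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_deposit.py FOLDER [FOLDER ...]
    for arg in sys.argv[1:]:
        for problem in check_deposit(Path(arg)):
            print(problem)
```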

5. DMP Writing
The Meaning of DMP
  • A DMP is a plan for the production, preservation, management, and shared use of research data.
  • It is a document, prepared and submitted together with the research plan, that sets out how research data will be collected, managed, preserved, opened, and utilized.
Purpose of DMP Writing
  • To plan in advance the types of data to be produced during the research and how they will be acquired, and to manage the produced data efficiently.
Preparation, Implementation, and Changes
  • Matters regarding the preparation and implementation of the DMP, including who must prepare it and what must be reviewed after preparation, are specified in Article 13 (Preparation and implementation of the data management plan) of KIGAM's Research Data Management Regulations.
  • Changes to the DMP are governed by Article 14 (Changes to the Data Management Plan) of KIGAM's Research Data Management Regulations.
Configuration
  • A DMP can be written as shown in <Table 2>.

<Table 2> Example of DMP writing

Project Title:
Research Period: | Research Project Manager:
Open Status: Open/Closed | Reasons: (if closed, state the reasons)
1. Field survey data
ID | Classification Code | Title | Survey Method | Producer
1-1
1-2
2. Field exploration/measurement data
ID | - | Title | Exploration Method | Producer
2-1
2-2
3. Sample and sample analysis data
ID | Classification Code | Sample Name | Method of Analysis | Number of Samples | Producer
3-1
3-2
4. Geological thematic maps
ID | Classification Code | Title | Scale | Producer
4-1
4-2
5. Lecture materials
ID | Classification Code | Title of Education Course | Title | Producer
5-1
5-2
6. Literature/survey data
ID | Classification Code | Title | Purpose of Survey | Producer
6-1
6-2
7. External data
ID | Classification Code | Title | Organization | Producer
7-1
7-2
8. Other (unclassified) data
ID | Classification Code | Title | Contents | Producer
8-1
8-2
  • Metadata standards can be specified as shown in <Table 3>.

<Table 3> Example of DMP writing

Division | Content and examples
Metadata standards |
  • TTAS.IS-19115: Metadata standard for geographic information management
  • Dublin Core: A set of metadata elements, standardized by ISO, for efficient search and management of various digital resources on the Internet
  • Darwin Core: An extension of Dublin Core for biodiversity information
  • ABCD (Access to Biological Collection Data): A standard for access to and exchange of primary biodiversity data, including specimens and observations
  • ABCDEFG (Access to Biological Collection Databases Extended for Geosciences): An extension of ABCD for geoscience data
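To keep a DMP inventory like <Table 2> consistent (for example, IDs such as 1-1 and 1-2 numbered within each data category), a project could maintain it as structured data. The sketch below is one hypothetical representation, not a KIGAM schema: the class and field names are illustrative, only the columns shared by all categories are modeled, and the category-specific column (survey method, scale, and so on) is stored in a single detail field.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Data categories numbered as in <Table 2>.
CATEGORIES = {
    1: "Field survey data",
    2: "Field exploration/measurement data",
    3: "Sample and sample analysis data",
    4: "Geological thematic maps",
    5: "Lecture materials",
    6: "Literature/survey data",
    7: "External data",
    8: "Other (unclassified) data",
}


@dataclass
class DmpItem:
    item_id: str              # e.g. "3-2": category 3, second item
    classification_code: str
    title: str
    detail: str               # category-specific column (method, scale, ...)
    producer: str


@dataclass
class DataManagementPlan:
    project_title: str
    is_open: bool
    closed_reason: str = ""
    metadata_standards: list[str] = field(default_factory=list)  # e.g. entries from <Table 3>
    items: list[DmpItem] = field(default_factory=list)

    def add_item(self, category: int, code: str, title: str,
                 detail: str, producer: str) -> DmpItem:
        """Append an item, numbering it sequentially within its category."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown DMP category: {category}")
        count = sum(1 for i in self.items if i.item_id.split("-")[0] == str(category))
        item = DmpItem(f"{category}-{count + 1}", code, title, detail, producer)
        self.items.append(item)
        return item

    def validate(self) -> None:
        """A closed plan must state why it is closed, as <Table 2> asks."""
        if not self.is_open and not self.closed_reason.strip():
            raise ValueError("a closed DMP must give a reason")


if __name__ == "__main__":
    plan = DataManagementPlan("Example project", is_open=True)
    plan.add_item(3, "(code)", "Example assay set", "XRF", "Example Lab")
    plan.add_item(3, "(code)", "Second assay set", "XRF", "Example Lab")
    plan.validate()
    print([item.item_id for item in plan.items])  # ['3-1', '3-2']
```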
Expected Benefits of DMP Writing
  • Makes it easy to track the creation, management, and sharing of research data generated through research projects.
  • Ensures the reproducibility of research data.
  • Ensures research integrity by allowing research data to be reused.
  • Helps identify duplicate studies, improving the efficiency of researchers' work.

Version No. | Date | Contents
0.1 | 2023. 03. 20. | Create document outline
0.6 | 2023. 04. 28. | Create draft
0.8 | 2023. 05. 08. | Guideline review
1.0 | 2023. 05. 19. | Accept review comments