Dataset Introduction:
Welcome to NuSEA-dataset v1.0, a pathological image dataset for nuclei segmentation. Nuclei segmentation is often a prerequisite for quantitative analysis, since the size, shape, axis, and distribution of nuclei can inform disease diagnosis, treatment, and prognosis. This dataset provides:
(1) Diverse Organs: We collected pathological images of 12 different organs from the TCGA dataset (https://portal.gdc.cancer.gov/exploration, accessed on July 17, 2023). The images originate from multiple medical centers and show a wide variety of staining, so the dataset closely resembles real-world pathological scenarios.
(2) High-Resolution Images: The dataset comprises 1185 high-resolution images that cover diverse cell morphologies and distributions, as encountered in real pathological cases.
(3) Manual Nuclei Annotations: Over 100,000 nuclei have been manually annotated as elliptical instances. Their positions are recorded in JSON files, which are accompanied by the corresponding elliptical contour images.
(4) NuSEA v1.0 Model: The NuSEA v1.0 model is based on the forthcoming NuSEA article (already submitted). Its core code is available on GitHub (https://github.com/dreambamboo/NuSEA/) as a reference for future research.
(5) Coarse-grained Nuclei Contour Annotations: Using the NuSEA v1.0 model, we generated coarse-grained annotations of nucleus contours, accompanied by binary files and instance visualization images.
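As an illustration of how the elliptical annotations in item (3) might be consumed, the sketch below turns a list of ellipses into an instance label map. The JSON layout and the field names (cx, cy, a, b, angle_deg) are assumptions made for this example only. The actual schema of the NuSEA JSON files may differ, so adapt the field access to the files you download.

```python
import json
import math

# Hypothetical annotation record; the real NuSEA JSON schema may differ.
EXAMPLE_JSON = """
{"nuclei": [
  {"cx": 5.0, "cy": 4.0, "a": 3.0, "b": 1.5, "angle_deg": 0.0},
  {"cx": 12.0, "cy": 9.0, "a": 2.0, "b": 2.0, "angle_deg": 30.0}
]}
"""


def inside_ellipse(x, y, cx, cy, a, b, angle_deg):
    """Return True if pixel centre (x, y) lies inside the rotated ellipse."""
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    # Rotate the offset into the ellipse's own axis frame.
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def rasterize(nuclei, height, width):
    """Build an instance label map: 0 is background, k is the k-th nucleus."""
    labels = [[0] * width for _ in range(height)]
    for k, n in enumerate(nuclei, start=1):
        for y in range(height):
            for x in range(width):
                if inside_ellipse(x, y, n["cx"], n["cy"],
                                  n["a"], n["b"], n["angle_deg"]):
                    labels[y][x] = k
    return labels


nuclei = json.loads(EXAMPLE_JSON)["nuclei"]
label_map = rasterize(nuclei, height=14, width=16)
# Pixel count per instance, a rough proxy for nucleus area.
areas = {k: sum(row.count(k) for row in label_map) for k in (1, 2)}
```

The brute-force loop over all pixels keeps the sketch dependency-free. For full-size images, a vectorized NumPy mask or a drawing routine from an imaging library would be the more practical choice.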
The following images show examples of the manual elliptical annotations and of the coarse-grained nuclei segmentation.
Dataset Applications:
We believe that the NuSEA-dataset can support a wide range of quantitative and qualitative analyses in the future. So far, it has proven useful in the following tasks:
(1) Enriched Weak Supervision with Elliptical Annotations: The manual elliptical annotations serve as weak supervision for weakly supervised learning algorithms. They provide more informative cues than point or rectangular annotations, because an ellipse also encodes the approximate axis lengths and orientation of each nucleus.
(2) Coarse-grained Nuclei Annotations: Fine-grained manual annotation of cell nuclei is time-consuming. Our generated coarse-grained annotations achieve performance notably better than existing fully supervised algorithms and closely approximate the actual contours of cell nuclei, which makes them suitable as supervision for fully supervised learning approaches.
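To make the comparison in task (1) concrete, the sketch below derives a point annotation and an axis-aligned rectangular annotation from a single ellipse. The parameterization (centre, semi-axes a and b, rotation angle in degrees) and the function names are assumptions for illustration, not the dataset's documented format. The reverse derivation is not possible: two ellipses with different orientations can share the same centre and the same bounding box.

```python
import math


def ellipse_to_point(cx, cy, a, b, angle_deg):
    """A point annotation keeps only the centre of the nucleus."""
    return (cx, cy)


def ellipse_to_box(cx, cy, a, b, angle_deg):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a rotated ellipse."""
    t = math.radians(angle_deg)
    # Half extents of a rotated ellipse along the image x and y axes.
    half_w = math.hypot(a * math.cos(t), b * math.sin(t))
    half_h = math.hypot(a * math.sin(t), b * math.cos(t))
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Two nuclei that differ only in orientation (hypothetical values).
box_45 = ellipse_to_box(10.0, 20.0, 4.0, 2.0, 45.0)
box_135 = ellipse_to_box(10.0, 20.0, 4.0, 2.0, 135.0)
# Their boxes and centres coincide, so the orientation is lost
# once the ellipse is reduced to a rectangle or a point.
same_box = all(math.isclose(p, q) for p, q in zip(box_45, box_135))
```

In this example same_box evaluates to True, which shows the extra cue (orientation) that only the elliptical annotation retains.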