Fuzzy C-means Clustering Algorithm for Cluster Membership Determination

Document Type : Original Article

Authors

1 National Research Institute of Astronomy and Geophysics (NRIAG); Dean of Thebes Institute of Computer Science, Cairo, Egypt

2 Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt

Abstract

Determining the membership of open star clusters (Galactic open clusters) plays a very important role in studying the formation and evolution of gravitationally bound systems. In the field of an image, the relative x and y coordinate positions of each star with respect to all the other stars are adopted. In this paper, a new method for determining open star cluster membership based on the Fuzzy C-means clustering algorithm is proposed. In fuzzy clustering, a data point may belong to several clusters with different degrees of membership, so the membership values of a data point represent the degree to which that point belongs to each cluster. The proposed method efficiently discriminates cluster members from field stars. The membership probabilities have been calculated and compared with those obtained by other methods. To validate the method, we applied it to the open star clusters Berkeley 39 and NGC 188 and obtained the member stars of these clusters. The membership probabilities assigned by this approach provide the number of probable members.

Highlights

In this paper, we presented a new method for open star cluster membership determination based on the Fuzzy C-means algorithm. The method can handle large databases both objectively and automatically. It includes functions to determine the structure of an open star cluster and to statistically estimate the integrated color range for star membership. To demonstrate the quality of the method and test how well it handles real clusters, we applied it to CCD observations of two open star clusters and to data collected from WEBDA. We obtained the cluster center and cluster radius, and calculated further parameters, e.g. the color range and membership, using the available UBV and 2MASS data. The resulting values were compared with studies using the photometric system or other methods where available. The number of cluster members found with this method is smaller than in previous results, while the resulting cluster center, cluster radius and membership estimates agree very well with other studies.

The method is one of the first to address the automation and standardization of membership studies of open star clusters, and it does not require initial values of the central coordinates. The results show that the Fuzzy C-means algorithm can converge to the best solution with a high convergence rate and high accuracy. The proposed algorithm provides a better and clearer way to determine the cluster center, cluster radius and cluster membership.

The good results of the Fuzzy C-means algorithm confirm its convergence rate for the quantities of interest (i.e. the cluster center, cluster radius and membership of the star cluster) in comparison with the work in [14].

 


Open star clusters, or Galactic open clusters, are gravitationally bound systems of stars that formed together. They range from very sparse clusters with only a few members to large agglomerations containing thousands of stars. They usually consist of a quite distinct dense core surrounded by a more diffuse 'corona' of cluster members. The core is typically about 3–4 light years across, with the corona extending to about 20 light years from the cluster center. Typical star densities in the center of a cluster are about 1.5 stars per cubic light year (the stellar density near the Sun is about 0.003 stars per cubic light year) [1]. Trumpler in 1930 [2] defined open star clusters as groupings of stars that form physical systems (stars situated at the same distance and probably of the same origin) and that are at the same time sufficiently rich in stars for statistical investigation. All star clusters have been discovered either by visual examination of the sky with a telescope or by inspection of photographic or electronic images in the visual or infrared.

Various methods based on the analysis of positions, proper motions, radial velocities, magnitudes and their combinations have been proposed to determine the members of open clusters [3]. Many of them rely on mathematical algorithms applied to open star cluster data. The first mathematical procedure for the determination of open cluster membership was developed in [4], using a statistical analysis of proper motions; it is also the most widely used method. Sanders's approach is based on the model of overlapping distributions of field and cluster stars in the neighborhood of, and within, the region of the visible grouping of stars, introduced in [5]. However, this method is highly time-consuming, so the problem can instead be treated as a machine learning clustering problem.
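The idea of overlapping cluster and field distributions can be illustrated with a small sketch. It assumes, purely for illustration, that both populations are described by axis-aligned two-dimensional Gaussians in proper-motion space and that their parameters are already known; the function names, the parameter layout and all numerical values are hypothetical, and the maximum-likelihood fitting step that makes such methods expensive is not shown.

```python
import math

def gauss2d(x, y, mean_x, mean_y, sigma_x, sigma_y):
    """Density of an axis-aligned 2-D Gaussian evaluated at (x, y)."""
    z = ((x - mean_x) / sigma_x) ** 2 + ((y - mean_y) / sigma_y) ** 2
    return math.exp(-0.5 * z) / (2.0 * math.pi * sigma_x * sigma_y)

def membership_probability(mu_x, mu_y, cluster, field, cluster_fraction):
    """Probability that a star with proper motion (mu_x, mu_y) is a member.

    cluster and field are (mean_x, mean_y, sigma_x, sigma_y) tuples;
    cluster_fraction is the assumed share of cluster stars in the sample.
    """
    phi_c = cluster_fraction * gauss2d(mu_x, mu_y, *cluster)
    phi_f = (1.0 - cluster_fraction) * gauss2d(mu_x, mu_y, *field)
    return phi_c / (phi_c + phi_f)

# Hypothetical parameters: a narrow cluster distribution, a broad field one.
cluster_params = (0.0, 0.0, 1.0, 1.0)
field_params = (3.0, 3.0, 5.0, 5.0)

p_near = membership_probability(0.0, 0.0, cluster_params, field_params, 0.3)
p_far = membership_probability(8.0, 8.0, cluster_params, field_params, 0.3)
```

A star whose proper motion lies close to the cluster mean receives a high probability, while a star far from it is dominated by the field term.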

In recent years, clustering has attracted more attention since it has been used in many applications, such as data mining, pattern recognition, machine learning, image segmentation, fault diagnosis, bioinformatics, computer vision and information retrieval [6]. Clustering is defined as the process of partitioning an unlabeled data set into groups of similar objects based on similarity measures. Therefore, objects in the same cluster are more similar to each other than to objects in other clusters.

Clustering algorithms can be classified into hierarchical and partitioning algorithms [7, 8, 9, 10]. A hierarchical clustering algorithm creates a hierarchy of clusters, which may be represented in a tree structure called a dendrogram. The root of the tree consists of a single cluster containing all observations, and the leaves correspond to individual observations.

A partitioning clustering algorithm (such as K-means) splits the data into a set of groups (whose number may be predefined) and then updates the members of these groups based on some measure. However, in many real-world applications there are no sharp boundaries between different clusters. Fuzzy clustering is a good alternative for solving this problem. K-means is also time-consuming, since it needs an exhaustive search in a huge space, whereas in a fuzzy model all the variables are continuous, so that derivatives can be computed to find the right direction for the search. In fuzzy clustering, a data point may belong to several clusters with different degrees of membership, so the membership values of a data point represent the degree to which that point belongs to each cluster. The Fuzzy C-means (FCM) algorithm is one of the most efficient methods for solving fuzzy clustering problems. In FCM, the goal is to minimize a criterion function that takes into account the similarity between the elements and the cluster centers. It is particularly useful for data sets with highly overlapping groups, and it has become a popular clustering algorithm.
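The alternating update behind fuzzy memberships can be sketched as follows. This is a minimal pure-Python illustration of the textbook FCM iteration on two-dimensional coordinates, not the implementation used in this paper; the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the stopping tolerance, the random initialization and the toy coordinates are all assumptions made for the example.

```python
import math
import random

def fuzzy_c_means(points, c=2, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means on 2-D points.

    Returns (centers, u), where u[i][j] is the membership degree of
    point i in cluster j and each row of u sums to 1.
    """
    rng = random.Random(seed)
    n = len(points)
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            cx = sum(wi * p[0] for wi, p in zip(w, points)) / total
            cy = sum(wi * p[1] for wi, p in zip(w, points)) / total
            centers.append((cx, cy))

        # Membership update from the distances to the new centers.
        shift = 0.0
        for i, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                # The point sits exactly on a center: full membership there.
                new_row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            else:
                new_row = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            total = sum(new_row)
            new_row = [v / total for v in new_row]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships stopped changing
            break
    return centers, u

# Toy coordinates (hypothetical): two compact groups of four points each.
stars = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
         (10.0, 10.1), (10.2, 9.9), (9.9, 10.0), (10.1, 10.2)]
centers, u = fuzzy_c_means(stars, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
```

Unlike K-means, every point keeps a graded membership in each cluster; a hard label is taken here only at the end, as the cluster with the largest membership value.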

The aim of this paper is to improve our previous work [3] by using the Fuzzy C-means algorithm to avoid the drawbacks of the K-means algorithm. The membership function of Fuzzy C-means is used to represent the degree of membership of each element in the open cluster. The experiments are performed using data for the open clusters NGC 188 and Berkeley 39.

The paper is organized as follows: Section 2 introduces the Fuzzy C-means algorithm. Section 3 introduces the proposed algorithm. Section 4 presents the data and results. The conclusion and summary are presented in Section 5.

[1] Subramaniam A., Gorti U., Sagar R., Bhatt H. C., "Probable binary open star clusters in the Galaxy", Astronomy and Astrophysics, v.302, p.86, 1995.
[2] Platais I., Platais V.K., Mathieu R.D., Girard T.M., van Altena W.F., 2003, Astron. J. 126, 2922
[3] Mohamed Abd El Aziz, I. M. Selim and A. Essam, Exp Astronomy, 2016
[4] Sanders, W.L.: A&A 15, 368 (1971)
[5] Gao, Xin-hua; Chen, Li; Hou, Zhen-jie, 2014, ChA&A..38..257G; Javakhishvili, V. Kukhianidze, M. Todua, and R. Inasaridze, 2006, A&A 447, 915–919
[6] Kaluzny, Janusz; Richtler, CCD BV photometry of the old open cluster Berkeley 39. AcA.39.139K, (1989).
[7] Blake, R. M.; Rucinski, S. M. (2004) Photometry and Spectroscopy of Short-Period Binary Stars in Four Old Open Clusters. AAS.205.8504B
[8] Bragaglia, A.; Gratton, R. G.; Carretta, E.; D'Orazi, V.; Sneden, C.; Lucatello, S. (2012) Searching for multiple stellar populations in the massive, old open cluster Berkeley 39. A&A...548A.122B
[9] Kassis, Marc; Janes, Kenneth A.; Friel, Eileen D.; Phelps, Randy L. (1997) Deep CCD Photometry of Old Open Clusters. AJ.113.1723K
[10] MacMinn, Donn; Phelps, Randy L.; Janes, Kenneth A.; Friel, Eileen D. (1994) Berkeley 20: an unusual old open cluster. AJ....107.1806M
[11] Selim I. M., A. A. Haroon, H. A. Ismail, N. M. Ahmed, A. Essam, G. B. Ali, Romanian Astron. J., 2014, Vol. 24, No. 2, p. 159–168, Bucharest
[12] Zhen-Yu Wu, Xu Zhou, Jun Ma, Zhao-Ji Jiang, and Jian-Sheng Chen, Publications of the Astronomical Society of the Pacific, 2006, 118: 1104–1111
[13] Alaa Ali, H. A. Ismail, Z. Alsolami, Astrophys Space Sci, 2015, 357-21
[14] Essam, A., Selim, I. M.: IJAA, 5, 173–181 (2015)
[15] Platais I., Platais V.K., Mathieu R.D., Girard T.M., van Altena W.F., 2003, Astron. J. 126, 2922
[16] Fornal B., Tucker D.L., Smith J.A., Allam S.S., Rider C.J., Sung, 2007, Astron. J. 133, 1409
[17] Amir Ben-Dor, Ron Shamir and Zohar Yakini, 1999, Journal of Computational Biology, 6(3/4): 281-297
[18] Fornal B., Tucker D.L., Smith J.A., Allam S.S., Rider C.J., Sung, 2007, Astron. J. 133, 1409