Method: apply an SVD and throw away the points that are not estimated well (look at the estimation error of those points) [this is important since the SVD representation of those points is likely noise]
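A minimal numpy sketch of this filtering step; the rank and the error cutoff (a quantile here) are illustrative placeholders, not values fixed by these notes:

```python
import numpy as np

def svd_filter(X, rank=20, err_quantile=0.9):
    # Low-rank SVD reconstruction; `rank` and `err_quantile` are
    # illustrative knobs, not values from these notes.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    err = np.linalg.norm(X - X_hat, axis=1)       # per-point estimation error
    keep = err <= np.quantile(err, err_quantile)  # drop the worst-estimated tail
    return X[keep], keep
```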

for each data point, compute its top-k nearest neighbors (e.g. top 100 or top 1000, or keep going until a distance threshold is reached)
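For example, with brute-force exact search via scikit-learn (an implementation choice, not prescribed here):

```python
from sklearn.neighbors import NearestNeighbors

def top_k_neighbors(X, k=100):
    # k = 100 as one of the example settings; the notes also mention 1000
    # or growing the list until a distance threshold is hit.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)          # each point's nearest hit is itself...
    return dist[:, 1:], idx[:, 1:]        # ...so drop the self-neighbor column
```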

create the data-point-to-data-point matrix, and make sure the graph represented by this matrix is not too disconnected (if it is, and that is desired, apply the following method to each connected component separately)
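A sketch of building the sparse point-to-point matrix from the kNN lists and checking its connectivity; the Gaussian similarity is an assumption, any monotone function of distance would do:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def neighbor_graph(dist, idx, n):
    # Build the n x n point-to-point matrix from each point's kNN list.
    rows = np.repeat(np.arange(n), idx.shape[1])
    sims = np.exp(-dist.ravel() ** 2)     # Gaussian similarity (an assumption)
    A = csr_matrix((sims, (rows, idx.ravel())), shape=(n, n))
    A = A.maximum(A.T)                    # symmetrize the graph
    n_comp, labels = connected_components(A, directed=False)
    return A, n_comp, labels              # if n_comp > 1, consider running the
                                          # rest of the method per component
```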

Until AA’ is well approximated:
- compute the principal direction of the current (residual) matrix
- project all the data onto this direction
- split the data points by their projections into three groups: +1, -1, and 0 (the ambiguous middle)
- truncate the direction by setting its entries for the 0 group to 0
- set AA’ = AA’ - lambda * uu’ (u is the truncated version; lambda is the learning rate)

This is essentially a sparse additive model with a sigmoid (the hard +1/-1/0 split being the limiting case); a sketch of the loop follows this list. Note that AA’ is not the ground truth but only a ~20% approximation, so we may want to recompute the actual distances to the points in the +1/-1 buckets (also, an inverted index typically handles positive entries only). [Make sure no point appears in more than 100 directions; you can put that constraint in the objective.]
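A minimal dense-numpy sketch of the loop, under several assumptions: the residual is kept dense, the principal direction comes from scipy's eigsh, the +1/-1/0 split uses a hard threshold on the direction's entries (the sigmoid would be a soft version of this), and the rank-one coefficient is fit by least squares before the learning rate is applied. All thresholds are placeholders.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def ternary_decompose(R, n_dirs=50, lam=0.5, tau=0.5, max_uses=100):
    # R: dense symmetric similarity matrix (the AA' approximation).
    R = np.array(R, dtype=np.float64)
    uses = np.zeros(R.shape[0], dtype=int)   # how many directions each point joined
    dirs = []
    for _ in range(n_dirs):
        _, v = eigsh(R, k=1, which='LA')     # principal direction of the residual
        u = v[:, 0]
        # Hard +1/-1/0 split: strong entries keep their sign, the rest go to 0.
        t = np.where(np.abs(u) > tau * np.abs(u).max(), np.sign(u), 0.0)
        t[uses >= max_uses] = 0.0            # cap: no point in more than 100 directions
        if not t.any():
            break
        c = (t @ (R @ t)) / (t @ t) ** 2     # least-squares coefficient for t t'
        R -= lam * c * np.outer(t, t)        # AA' = AA' - lambda * uu'
        uses += t != 0
        dirs.append((lam * c, t))
    return dirs, R                           # (weight, ternary direction) pairs, residual
```

Each ternary direction is directly postable to an inverted index: a point is indexed under a direction iff its entry is nonzero (the positive-only caveat above applies to the -1 bucket).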

Because in the end you may not get the residual to be 0, you can repeat a few times by feeding the residual back into AA’ (the final residual is noise)
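One reading of that, as a sketch: rerun the peeling loop on what is left, for a few passes, and accept the final leftover as noise (reuses ternary_decompose from the sketch above):

```python
import numpy as np

def decompose_with_restarts(A, n_passes=3, **kw):
    # Repeatedly peel directions; each pass starts from the previous residual.
    R = np.array(A, dtype=np.float64)
    all_dirs = []
    for _ in range(n_passes):
        dirs, R = ternary_decompose(R, **kw)  # from the sketch above
        if not dirs:                          # nothing left worth peeling
            break
        all_dirs.extend(dirs)
    return all_dirs, R                        # R is the residual we write off as noise
```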

Intuitively, each direction is the overlap with a very strong cluster. You can also do some kind of “sparse” k-means and maybe improve on this.
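One possible reading of the “sparse” k-means idea (an assumption, the notes don't spell it out): run plain k-means over the points and use each cluster's 0/1 indicator vector as a sparse direction, which is also positive-only and hence inverted-index friendly:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_directions(X, n_clusters=50):
    # Each cluster indicator is a sparse, nonnegative direction over the points.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    dirs = np.zeros((n_clusters, X.shape[0]))
    dirs[labels, np.arange(X.shape[0])] = 1.0  # point i joins its cluster's direction
    return dirs
```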