@rjurney
Created October 3, 2023 18:14
Seeking feedback on my ChatGPT prompting. What can I do to improve this result?

I ran the following code to reduce the dimensionality of my embeddings with unsupervised UMAP and then cluster them with DBSCAN, so that variant names for the same academic journal are grouped into one cluster per journal.

The UMAP code is:

# Step 2: Dimension reduction with UMAP (defaults to 2 output dimensions)
import umap  # provided by the umap-learn package

reducer = umap.UMAP()
reduced_embeddings = reducer.fit_transform(scaled_embeddings)

The DBSCAN clustering is:

# Step 3: Clustering with DBSCAN - eps and min_samples can be tuned
from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, min_samples=100)
clusters = dbscan.fit_predict(reduced_embeddings)

To describe the clusters, I run:

np.unique(clusters, return_counts=True)

In a Jupyter notebook, this returns:

(array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]),
 array([2332, 7005, 2474,  379,  188, 2381, 3074,  210,  261, 1032,  264,
         468,  149, 1955, 1136,  497,  575,  242,  360,  275,  336,  287,
         512,  269,  112,  132,  190,  444,  105,  126]))
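
For readers less familiar with this output: DBSCAN assigns the label -1 to points it treats as noise, so the first count is the number of unclustered points rather than the size of a cluster. A minimal sketch of separating the two follows; the helper name `summarize_clusters` and the toy labels are hypothetical and not part of my code.

```python
import numpy as np

def summarize_clusters(clusters):
    """Split DBSCAN labels into a noise count and a per-cluster size mapping."""
    labels, counts = np.unique(clusters, return_counts=True)
    sizes = {int(label): int(count) for label, count in zip(labels, counts)}
    # DBSCAN uses the label -1 for noise points; remove it from the cluster sizes.
    noise_count = sizes.pop(-1, 0)
    return noise_count, sizes

# Toy labels standing in for the real `clusters` array (illustrative only).
example_labels = np.array([-1, 0, 0, 1, 1, 1, -1])
noise_count, sizes = summarize_clusters(example_labels)
print(f"noise points: {noise_count}, cluster sizes: {sizes}")
```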

These clusters look good. Now I want to plot the data with the seaborn library as a two-dimensional scatter plot, with each point colored by its cluster ID. Please take a deep breath, and write code to do this. Include comments so that students can understand what you are doing.
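
For reference, here is a rough sketch of the kind of answer this prompt is asking for. It is one possible approach under stated assumptions, not a definitive solution. It assumes `reduced_embeddings` has shape (n_samples, 2), which is what `umap.UMAP()` produces by default, and that `clusters` holds one DBSCAN label per row. The function name `plot_clusters`, the axis labels, the gray color for noise, and the synthetic demo data are all my own illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def plot_clusters(reduced_embeddings, clusters):
    """Scatter-plot 2-D embeddings, coloring each point by its cluster ID."""
    # Put the coordinates and labels in one DataFrame so seaborn can map columns.
    df = pd.DataFrame({
        "x": reduced_embeddings[:, 0],
        "y": reduced_embeddings[:, 1],
        "cluster": clusters,
    })

    # Build one distinct color per label; "husl" spaces hues evenly for any count.
    labels = [int(label) for label in np.unique(clusters)]
    palette = dict(zip(labels, sns.color_palette("husl", len(labels))))
    if -1 in palette:
        palette[-1] = (0.6, 0.6, 0.6)  # assumption: draw DBSCAN noise in gray

    fig, ax = plt.subplots(figsize=(10, 8))
    sns.scatterplot(
        data=df, x="x", y="y", hue="cluster",
        palette=palette,  # a dict palette makes seaborn treat labels as categories
        s=5, linewidth=0, legend="full", ax=ax,
    )
    ax.set_xlabel("UMAP dimension 1")
    ax.set_ylabel("UMAP dimension 2")
    ax.set_title("UMAP projection colored by DBSCAN cluster")
    # Move the (long) legend outside the plotting area.
    ax.legend(title="Cluster", bbox_to_anchor=(1.02, 1), loc="upper left", ncol=2)
    fig.tight_layout()
    return ax

# Synthetic stand-ins for the real arrays (illustrative only).
rng = np.random.default_rng(0)
demo_points = rng.normal(size=(300, 2))
demo_labels = rng.integers(-1, 3, size=300)  # labels in {-1, 0, 1, 2}
ax = plot_clusters(demo_points, demo_labels)
```

With the real data, the call would be `plot_clusters(reduced_embeddings, clusters)`, followed by `plt.show()` outside a notebook. Passing the colors as a dict keeps seaborn from treating the integer cluster IDs as a continuous scale, which would otherwise shade neighboring IDs in nearly identical colors.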
