
@psychemedia
Last active December 21, 2023 17:30
Example of running GPT4all local LLM via langchain in a Jupyter notebook (Python)
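A minimal sketch of the pattern the notebook demonstrates, assuming a locally converted ggml model file and the langchain llama.cpp wrapper discussed in the comments below (the model path and prompt are hypothetical):

```python
# Minimal sketch: drive a locally converted GPT4All/ggml model through
# langchain's llama.cpp wrapper. The model path and prompt are hypothetical.
from langchain import LLMChain, PromptTemplate
from langchain.llms import LlamaCpp

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm = LlamaCpp(model_path="./models/ggml-gpt4all-converted.bin")

chain = LLMChain(prompt=prompt, llm=llm)
print(chain.run("What is a local large language model?"))
```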
@cg-sachink

But this is not working

@rain1024

rain1024 commented Apr 7, 2023

I'm having trouble with the following code:

```python
# download llama.cpp 7B model
#%pip install pyllama
#!python3.10 -m llama.download --model_size 7B --folder llama/
```

I installed pyllama successfully with the following command:

```
$ pip install pyllama
$ pip freeze | grep pyllama
pyllama==0.0.9
pyllamacpp==1.0.6
```

However, when I run

```
$ python -m llama.download --model_size 7B --folder llama/
```

I received an error message:

No module named llama.download

Has anyone else encountered the same issue?

Update 1:

When I clone the pyllama repository and run the download from inside the checkout instead of from the installed package, I can download the llama folder.
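For anyone else stuck at the same point, that workaround looks roughly like this in a notebook (a sketch; the repo URL is assumed to be the juncongmoo/pyllama project the package is published from):

```python
# Sketch of the clone-and-run workaround for the missing llama.download module.
# Assumes the package comes from https://github.com/juncongmoo/pyllama .
!git clone https://github.com/juncongmoo/pyllama
%cd pyllama
!python -m llama.download --model_size 7B --folder llama/
```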

@monabiyan

`from langchain.embeddings import LlamaCppEmbeddings` does not work.
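For comparison, this is roughly how that import is meant to be used (a sketch assuming an up-to-date langchain plus the llama-cpp-python package; the model path is hypothetical). If the import itself fails, upgrading langchain is usually the first thing to try.

```python
# Sketch: local embeddings via langchain's llama.cpp embeddings wrapper.
# Requires llama-cpp-python; the model path below is hypothetical.
#%pip install --upgrade langchain llama-cpp-python
from langchain.embeddings import LlamaCppEmbeddings

embeddings = LlamaCppEmbeddings(model_path="./models/ggml-model-q4_0.bin")
vector = embeddings.embed_query("Hello world")
print(len(vector))
```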

@psychemedia
Author

psychemedia commented Apr 11, 2023 via email

@JeffreyShran

@psychemedia - Any ideas on where to seek out performance gain opportunities?

The final run on #129 is a killer. I was expecting VectorstoreIndexCreator to be something that helped improve performance, but it still takes a long time.

@psychemedia
Author

@JeffreyShran For performance, I'd probably use a different way of semantically indexing items; the above demo was trying to keep things bounded by reusing the same model for everything.
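One way to do that, sketched below rather than taken from the notebook, is to build the index with a small sentence-transformers model and keep the local LLM just for generation (assumes the sentence-transformers and faiss-cpu packages are installed; file names are hypothetical):

```python
# Sketch: build the vector index with a lightweight embedding model instead of
# reusing the local LLM for embeddings; retrieval then stays fast.
#%pip install sentence-transformers faiss-cpu
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

with open("my_notes.txt") as f:  # hypothetical source document
    text = f.read()

chunks = CharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_text(text)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_texts(chunks, embeddings)

# Only the final answer generation needs the (slow) local LLM.
docs = db.similarity_search("What does this note say about GPT4All?", k=4)
```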

@MakkiNeutron

`ValueError: Requested tokens exceed context window of 512` when using GPT4All with langchain's llama.cpp wrapper.
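For what it's worth, the langchain llama.cpp wrapper exposes an `n_ctx` parameter that defaults to 512, so one hedged workaround is to raise it when the LLM is constructed, up to what the underlying model supports (the model path below is hypothetical):

```python
# Sketch: raise the llama.cpp context window from its 512-token default.
# The model family still caps the usable context (2048 for LLaMA derivatives).
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/ggml-gpt4all-converted.bin",  # hypothetical path
    n_ctx=2048,      # context window in tokens
    max_tokens=256,  # keep prompt + completion inside the window
)
```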

@JeffreyShran

JeffreyShran commented Apr 17, 2023

> @JeffreyShran For performance, I'd probably use a different way of semantically indexing items; the above demo was trying to keep things bounded by reusing the same model for everything.

Thanks @psychemedia, my longer-term goal is to load in a git repo by swapping the loader for the official git one, in the hope that I can ask questions about my private projects.

RE: the error reported by @MakkiNeutron. As a preliminary test I tried loading a fairly small Python script as a text file and hit the same error. Some digging showed that llama.cpp can be modified to increase the token limit, but that feels like the wrong step; instead we should probably a) use a different indexing approach, as you suggested to my earlier question, and/or b) split the text into smaller chunks than the 500 used in your code.

I'm keen but very much a noob in this space, so any insights you can share would be useful and appreciated, particularly on my expected use case. Do you feel I'm taking the smartest route, or should I use a different approach within langchain or some other tool?
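On option (b), splitting more aggressively than the 500-character chunks is straightforward; a minimal sketch (chunk sizes are illustrative, not tuned, and the file name is hypothetical):

```python
# Sketch: smaller, overlapping chunks so retrieved context plus the question
# stays inside the model's context window.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=25)
with open("small_script.py") as f:  # hypothetical test file
    chunks = splitter.split_text(f.read())
print(len(chunks), "chunks")
```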

@jrobles98

@JeffreyShran Hmm, I just arrived here, but increasing the number of tokens Llama can handle is still a blurry topic: the model was trained from the beginning with that context size, so technically you would need to redo the whole training of Llama with a larger input size. In other words, it is an inherent property of the model, immutable from the beginning.
Good news is that the maximum input Llama was trained on (and therefore the maximum possible) is 2048 tokens!

You can see that limit in the HF docs by looking at the max_position_embeddings parameter.

BTW, here is a similar thread if you want to take a sneak peek.

Nevertheless, there are ways to give Llama more "memory scope"; here are some conversational approaches, and the last section is the most interesting one for any purpose.

Hope you found it helpful ✌🏼
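The same limit can also be read programmatically from a model's Hugging Face config; a sketch, assuming the transformers package and using a LLaMA-derivative repo id purely as an example:

```python
# Sketch: inspect the maximum sequence length a checkpoint was trained with.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("decapoda-research/llama-7b-hf")  # example repo id
print(config.max_position_embeddings)  # 2048 for LLaMA-family models
```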

@JeffreyShran

> @JeffreyShran Hmm, I just arrived here, but increasing the number of tokens Llama can handle is still a blurry topic: the model was trained from the beginning with that context size, so technically you would need to redo the whole training of Llama with a larger input size. In other words, it is an inherent property of the model, immutable from the beginning. Good news is that the maximum input Llama was trained on (and therefore the maximum possible) is 2048 tokens!
>
> You can see that limit in the HF docs by looking at the max_position_embeddings parameter.
>
> BTW, here is a similar thread if you want to take a sneak peek.
>
> Nevertheless, there are ways to give Llama more "memory scope"; here are some conversational approaches, and the last section is the most interesting one for any purpose.
>
> Hope you found it helpful ✌🏼

Thanks, that is helpful. However, it appears these settings already default to the maximum of 2048.

The file I tested with had only a few lines in it, so I think the problem might lie elsewhere.

@jrobles98

> Thanks, that is helpful. However, it appears these settings already default to the maximum of 2048.
>
> The file I tested with had only a few lines in it, so I think the problem might lie elsewhere.

Yes, indeed. I was hoping to find that limit for GPT4All but only found that the standard model used 1024 input tokens. So maybe... the quantized LoRA version uses a limit of 512 tokens for some reason, although that doesn't make much sense, since quantized and LoRA versions only lose precision rather than dimensionality.

Anyway, I think the best way to improve things here is to try other models that we know can already handle a 2048-token input. I suggest Vicuna, which was created mainly with the purpose of maxing out input/output.

If somebody can test this it would be so great.
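A quick smoke test along those lines could look like this (a sketch; the filename is whatever your local quantised Vicuna conversion is called, here hypothetical):

```python
# Sketch: point the same langchain llama.cpp wrapper at a Vicuna ggml file
# with the context window raised to the model's 2048-token maximum.
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="./models/ggml-vicuna-7b-4bit.bin", n_ctx=2048)
print(llm("Summarise what Vicuna is in one sentence."))
```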

@JeffreyShran

JeffreyShran commented Apr 18, 2023

I'm actually using ggml-vicuna-7b-4bit.bin. This is the one I'm having the most trouble with. :)

@mtthw-meyer

This would be much easier to follow with the working code in one place instead of only scattered fragments.

@stellarkllc

Is it possible to use GPT4All as the LLM with sql_agent or pandas_agent instead of OpenAI?
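In principle yes: the agent constructors accept any langchain LLM, so a local model can be dropped in where OpenAI would go. A sketch (the model path is hypothetical, and small quantised models often struggle to follow the agent's expected output format):

```python
# Sketch: wire a local llama.cpp model into langchain's pandas agent.
import pandas as pd
from langchain.agents import create_pandas_dataframe_agent
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="./models/ggml-gpt4all-converted.bin", n_ctx=2048)  # hypothetical path
df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

agent = create_pandas_dataframe_agent(llm, df, verbose=True)
agent.run("What is the mean of the score column?")
```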

@zeke-john

I've installed all the packages and still get this: `zsh: command not found: pyllamacpp-convert-gpt4all`

@lingjiekong

> I've installed all the packages and still get this: `zsh: command not found: pyllamacpp-convert-gpt4all`

Try an older version of pyllamacpp: `pip install pyllamacpp==1.0.7`.
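For anyone hitting the same thing, the pinned install plus the conversion step look roughly like this in a notebook (a sketch; the file paths are hypothetical, and the argument order follows the pyllamacpp documentation of that era):

```python
# Sketch: pin the pyllamacpp release that ships the conversion script, then
# convert the original GPT4All weights into a llama.cpp-compatible file.
%pip install pyllamacpp==1.0.7
!pyllamacpp-convert-gpt4all ./gpt4all-lora-quantized.bin ./llama/tokenizer.model ./ggml-gpt4all-converted.bin
```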
