@psychemedia
Last active December 21, 2023 17:30

Example of running GPT4all local LLM via langchain in a Jupyter notebook (Python)
{
"cells": [
{
"cell_type": "markdown",
"id": "5a005358",
"metadata": {},
"source": [
"# GPT4All Langchain Demo\n",
"\n",
"Example of locally running [`GPT4All`](https://github.com/nomic-ai/gpt4all), a 4GB, *llama.cpp* based large language model (LLM) under [`langchain`](https://github.com/hwchase17/langchain), in a Jupyter notebook running a Python 3.10 kernel.\n",
"\n",
"*Tested on a mid-2015 16GB Macbook Pro, concurrently running Docker (a single container running a sepearate Jupyter server) and Chrome with approx. 40 open tabs).*"
]
},
{
"cell_type": "markdown",
"id": "d226d19c",
"metadata": {},
"source": [
"## Model preparation"
]
},
{
"cell_type": "markdown",
"id": "dde35bcc",
"metadata": {},
"source": [
"- download `gpt4all` model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5237686",
"metadata": {},
"outputs": [],
"source": [
"#https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin"
]
},
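{
"cell_type": "markdown",
"id": "a0e1f2d3",
"metadata": {},
"source": [
"For example (a sketch, not run here; assumes `wget` is available and that the mirror URL above is still live), the model can be fetched into the path expected by the conversion step below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c2d3e4",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: fetch the quantized model into the path used below (assumes wget)\n",
"#!wget -O ./gpt4all-main/chat/gpt4all-lora-quantized.bin https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin"
]
},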
{
"cell_type": "markdown",
"id": "57520ec5",
"metadata": {},
"source": [
"- download `llama.cpp` 7B model"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "37bed5fc",
"metadata": {},
"outputs": [],
"source": [
"#%pip install pyllama\n",
"#!python3.10 -m llama.download --model_size 7B --folder llama/"
]
},
{
"cell_type": "markdown",
"id": "0c21e391",
"metadata": {},
"source": [
"- transform `gpt4all` model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c11c606",
"metadata": {},
"outputs": [],
"source": [
"#%pip install pyllamacpp\n",
"#!pyllamacpp-convert-gpt4all ./gpt4all-main/chat/gpt4all-lora-quantized.bin llama/tokenizer.model ./gpt4all-main/chat/gpt4all-lora-q-converted.bin"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "a858d90f",
"metadata": {},
"outputs": [],
"source": [
"GPT4ALL_MODEL_PATH = \"./gpt4all-main/chat/gpt4all-lora-q-converted.bin\""
]
},
{
"cell_type": "markdown",
"id": "72d09e27",
"metadata": {},
"source": [
"## `langchain` Demo\n",
"\n",
"Example of running a prompt using `langchain`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "65c9ca4d",
"metadata": {},
"outputs": [],
"source": [
"#https://python.langchain.com/en/latest/ecosystem/llamacpp.html\n",
"#%pip uninstall -y langchain\n",
"#%pip install --upgrade git+https://github.com/hwchase17/langchain.git\n",
"\n",
"from langchain.llms import LlamaCpp\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "markdown",
"id": "500eb501",
"metadata": {},
"source": [
"- set up prompt template:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "60202ce2",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"\n",
"Question: {question}\n",
"Answer: Let's think step by step.\n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"id": "ad12a227",
"metadata": {},
"source": [
"- load model:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5d83c596",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"llama_model_load: loading model from './gpt4all-main/chat/gpt4all-lora-q-converted.bin' - please wait ...\n",
"llama_model_load: n_vocab = 32001\n",
"llama_model_load: n_ctx = 512\n",
"llama_model_load: n_embd = 4096\n",
"llama_model_load: n_mult = 256\n",
"llama_model_load: n_head = 32\n",
"llama_model_load: n_layer = 32\n",
"llama_model_load: n_rot = 128\n",
"llama_model_load: f16 = 2\n",
"llama_model_load: n_ff = 11008\n",
"llama_model_load: n_parts = 1\n",
"llama_model_load: type = 1\n",
"llama_model_load: ggml map size = 4017.70 MB\n",
"llama_model_load: ggml ctx size = 81.25 KB\n",
"llama_model_load: mem required = 5809.78 MB (+ 2052.00 MB per state)\n",
"llama_model_load: loading tensors from './gpt4all-main/chat/gpt4all-lora-q-converted.bin'\n",
"llama_model_load: model size = 4017.27 MB / num tensors = 291\n",
"llama_init_from_file: kv self size = 512.00 MB\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 572 ms, sys: 711 ms, total: 1.28 s\n",
"Wall time: 1.42 s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"llm = LlamaCpp(model_path=GPT4ALL_MODEL_PATH)"
]
},
{
"cell_type": "markdown",
"id": "a40b30f6",
"metadata": {},
"source": [
"- create language chain using prompt template and loaded model:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "112a5b4b",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"id": "5351ee46",
"metadata": {},
"source": [
"- run prompt:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "41b6cbbd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5min 2s, sys: 4.17 s, total: 5min 6s\n",
"Wall time: 43.7 s\n"
]
},
{
"data": {
"text/plain": [
"'1) The year Justin Bieber was born (2005):\\n2) Justin Bieber was born on March 1, 1994:\\n3) The Buffalo Bills won Super Bowl XXVIII over the Dallas Cowboys in 1994:\\nTherefore, the NFL team that won the Super Bowl in the year Justin Bieber was born is the Buffalo Bills.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"question = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "b9b80c84",
"metadata": {},
"source": [
"Another example..."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a2946ade",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"\n",
"Question: {question}\n",
"Answer: \n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "12b59f4a",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3d56cc93",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 14min 37s, sys: 5.56 s, total: 14min 42s\n",
"Wall time: 2min 4s\n"
]
},
{
"data": {
"text/plain": [
"\"A relational database is a type of database management system (DBMS) that stores data in tables where each row represents one entity or object (e.g., customer, order, or product), and each column represents a property or attribute of the entity (e.g., first name, last name, email address, or shipping address).\\n\\nACID stands for Atomicity, Consistency, Isolation, Durability:\\n\\nAtomicity: The transaction's effects are either all applied or none at all; it cannot be partially applied. For example, if a customer payment is made but not authorized by the bank, then the entire transaction should fail and no changes should be committed to the database.\\nConsistency: Once a transaction has been committed, its effects should be durable (i.e., not lost), and no two transactions can access data in an inconsistent state. For example, if one transaction is in progress while another transaction attempts to update the same data, both transactions should fail.\\nIsolation: Each transaction should execute without interference from other concurrently executing transactions, thereby ensuring its properties are applied atomically and consistently. For example, two transactions cannot affect each other's data\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"question = \"What is a relational database and what is ACID in that context?\"\n",
"\n",
"llm_chain.run(question2)"
]
},
{
"cell_type": "markdown",
"id": "99219ee6",
"metadata": {},
"source": [
"## Generating Embeddings\n",
"\n",
"We can also use the model to generate embddings."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "110cace5",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"llama_model_load: loading model from './gpt4all-main/chat/gpt4all-lora-q-converted.bin' - please wait ...\n",
"llama_model_load: n_vocab = 32001\n",
"llama_model_load: n_ctx = 512\n",
"llama_model_load: n_embd = 4096\n",
"llama_model_load: n_mult = 256\n",
"llama_model_load: n_head = 32\n",
"llama_model_load: n_layer = 32\n",
"llama_model_load: n_rot = 128\n",
"llama_model_load: f16 = 2\n",
"llama_model_load: n_ff = 11008\n",
"llama_model_load: n_parts = 1\n",
"llama_model_load: type = 1\n",
"llama_model_load: ggml map size = 4017.70 MB\n",
"llama_model_load: ggml ctx size = 81.25 KB\n",
"llama_model_load: mem required = 5809.78 MB (+ 2052.00 MB per state)\n",
"llama_model_load: loading tensors from './gpt4all-main/chat/gpt4all-lora-q-converted.bin'\n",
"llama_model_load: model size = 4017.27 MB / num tensors = 291\n",
"llama_init_from_file: kv self size = 512.00 MB\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 548 ms, sys: 795 ms, total: 1.34 s\n",
"Wall time: 1.36 s\n"
]
}
],
"source": [
"%%time\n",
"#https://abetlen.github.io/llama-cpp-python/\n",
"#%pip uninstall -y llama-cpp-python\n",
"#%pip install --upgrade llama-cpp-python\n",
"\n",
"from langchain.embeddings import LlamaCppEmbeddings\n",
"\n",
"llama_embeddings = LlamaCppEmbeddings(model_path=GPT4ALL_MODEL_PATH)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "c90c768e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 9.71 s, sys: 1.5 s, total: 11.2 s\n",
"Wall time: 1.59 s\n"
]
}
],
"source": [
"%%time\n",
"text = \"This is a test document.\"\n",
"\n",
"query_result = llama_embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "7e8efbf4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10.4 s, sys: 59.7 ms, total: 10.4 s\n",
"Wall time: 1.47 s\n"
]
}
],
"source": [
"%%time\n",
"doc_result = llama.embed_documents([text])"
]
},
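{
"cell_type": "markdown",
"id": "c2d3e4f5",
"metadata": {},
"source": [
"As a quick sanity check, we can compare the query embedding against the document embedding (a minimal sketch; assumes `numpy` is installed):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3e4f5a6",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# embed_query returns a list of floats; embed_documents returns a list of such lists\n",
"v_q = np.array(query_result)\n",
"v_d = np.array(doc_result[0])\n",
"\n",
"# Cosine similarity; embedding the same text twice should score (close to) 1.0\n",
"np.dot(v_q, v_d) / (np.linalg.norm(v_q) * np.linalg.norm(v_d))"
]
},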
{
"cell_type": "markdown",
"id": "20f62806",
"metadata": {},
"source": [
"## Example Query Supported by a Document Based Knowledge Source\n",
"\n",
"Example document query using the example from the [`langchain` docs](https://python.langchain.com/en/latest/use_cases/question_answering.html).\n",
"\n",
"The idea is to run the query against a document source to retrieve some relevant context, and use that as part of the prompt context."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "84ca9bb4",
"metadata": {},
"outputs": [],
"source": [
"#https://python.langchain.com/en/latest/use_cases/question_answering.html\n",
"\n",
"template = \"\"\"\n",
"Question: {question}\n",
"Answer: \n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"id": "38c43eca",
"metadata": {},
"source": [
"A naive prompt gives an irrelevant answer:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "aa8c287c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 58.3 s, sys: 3.59 s, total: 1min 1s\n",
"Wall time: 9.75 s\n"
]
},
{
"data": {
"text/plain": [
"'\\nAnswer: The Pittsburgh Steelers'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"llm_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "2d8fe74f",
"metadata": {},
"source": [
"Now let's try with a source document."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "62169830",
"metadata": {},
"outputs": [],
"source": [
"#!wget https://raw.githubusercontent.com/hwchase17/langchainjs/main/examples/state_of_the_union.txt\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"# Ideally....\n",
"loader = TextLoader('./state_of_the_union.txt')"
]
},
{
"cell_type": "markdown",
"id": "2baf9db9",
"metadata": {},
"source": [
"However, creating the embeddings is qute slow so I'm going to use a fragment of the text:"
]
},
{
"cell_type": "code",
"execution_count": 108,
"id": "b8d2d522",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. We can do both. At our border, we’ve installed new technology like cutting-edge"
]
}
],
"source": [
"#ish via chatgpt...\n",
"def search_context(src, phrase, buffer=100):\n",
" with open(src, 'r') as f:\n",
" txt=f.read()\n",
"\n",
" words = txt.split()\n",
" index = words.index(phrase)\n",
" start_index = max(0, index - buffer)\n",
" end_index = min(len(words), index + buffer+1)\n",
" return ' '.join(words[start_index:end_index])\n",
"\n",
"fragment = './fragment.txt'\n",
"with open(fragment, 'w') as fo:\n",
" _txt = search_context('./state_of_the_union.txt', \"Ketanji\")\n",
" fo.write(_txt)\n",
"\n",
"!cat $fragment"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "10480de4",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('./fragment.txt')"
]
},
{
"cell_type": "code",
"execution_count": 102,
"id": "4d4dea15",
"metadata": {},
"outputs": [],
"source": [
"#%pip install chromadb\n",
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "markdown",
"id": "92170711",
"metadata": {},
"source": [
"Generate an index from the knowledge source text:"
]
},
{
"cell_type": "code",
"execution_count": 103,
"id": "3eff4ff5",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using embedded DuckDB with persistence: data will be stored in: db\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 2 µs, sys: 2 µs, total: 4 µs\n",
"Wall time: 7.87 µs\n"
]
}
],
"source": [
"%time\n",
"# Time: ~0.5s per token\n",
"# NOTE: \"You must specify a persist_directory oncreation to persist the collection.\"\n",
"# TO DO: How do we load in an already generated and persisted index?\n",
"index = VectorstoreIndexCreator(embedding=llama_embeddings,\n",
" vectorstore_kwargs={\"persist_directory\": \"db\"}\n",
" ).from_loaders([loader])"
]
},
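{
"cell_type": "markdown",
"id": "e4f5a6b7",
"metadata": {},
"source": [
"On the TO DO above, an untested sketch of how a persisted collection might be reloaded (assumes the `Chroma` vector store accepts `persist_directory` and `embedding_function` arguments on construction):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5a6b7c8",
"metadata": {},
"outputs": [],
"source": [
"# Untested sketch: reload a previously persisted Chroma collection\n",
"#from langchain.vectorstores import Chroma\n",
"#db = Chroma(persist_directory=\"db\", embedding_function=llama_embeddings)"
]
},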
{
"cell_type": "code",
"execution_count": 106,
"id": "21107c72",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 2 µs, sys: 2 µs, total: 4 µs\n",
"Wall time: 10 µs\n"
]
}
],
"source": [
"%time\n",
"pass\n",
"\n",
"# The following errors...\n",
"#index.query(query, llm=llm)\n",
"\n",
"# With the full SOTH text, I got:\n",
"# Error: llama_tokenize: too many tokens;\n",
"# Also getting:\n",
"# ValueError: Requested tokens exceed context window of 512\n",
"# If we do get passed that, \n",
"# NotEnoughElementsException\n",
"# For the latter, somehow need to set something like search_kwargs={\"k\": 1}"
]
},
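{
"cell_type": "markdown",
"id": "a6b7c8d9",
"metadata": {},
"source": [
"The `Requested tokens exceed context window of 512` error suggests the model was loaded with too small a context. One possible workaround (a sketch; assumes the `langchain` `LlamaCpp` wrapper exposes *llama.cpp*'s `n_ctx` parameter) would be to reload the model with a larger context window:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7c8d9e0",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: request a larger context window when loading the model\n",
"#llm = LlamaCpp(model_path=GPT4ALL_MODEL_PATH, n_ctx=2048)"
]
},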
{
"cell_type": "markdown",
"id": "81059984",
"metadata": {},
"source": [
"It seems the retriever is expecting, by default, 4 results documents. I can't see how to pass in a lower limit (a single response document is acceptable in this case), so we nd to roll our own chain..."
]
},
{
"cell_type": "code",
"execution_count": 121,
"id": "035d144a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using embedded DuckDB without persistence: data will be transient\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5min 59s, sys: 1.62 s, total: 6min 1s\n",
"Wall time: 49.2 s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"# Roll our own....\n",
"\n",
"#https://github.com/hwchase17/langchain/issues/2255\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.chains import RetrievalQA\n",
"\n",
"documents = loader.load()\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"# Again, we should persist the db and figure out how to reuse it\n",
"docsearch = Chroma.from_documents(texts, llama_embeddings)"
]
},
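{
"cell_type": "markdown",
"id": "c8d9e0f1",
"metadata": {},
"source": [
"On persisting the roll-our-own store (a sketch; assumes `Chroma.from_documents` accepts a `persist_directory` argument and that the store has a `persist()` method):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9e0f1a2",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: persist the document store so it can be reused across sessions\n",
"#docsearch = Chroma.from_documents(texts, llama_embeddings,\n",
"#                                  persist_directory=\"db\")\n",
"#docsearch.persist()"
]
},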
{
"cell_type": "code",
"execution_count": 122,
"id": "b6c9d033",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 861 µs, sys: 2.97 ms, total: 3.83 ms\n",
"Wall time: 7.09 ms\n"
]
}
],
"source": [
"%%time\n",
"\n",
"# Just getting a single result document from the knowledge lookup is fine...\n",
"MIN_DOCS = 1\n",
"\n",
"qa = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\",\n",
" retriever=docsearch.as_retriever(search_kwargs={\"k\": MIN_DOCS}))"
]
},
{
"cell_type": "markdown",
"id": "2bb80c55",
"metadata": {},
"source": [
"What do we get in response to our original query now?"
]
},
{
"cell_type": "code",
"execution_count": 127,
"id": "05dcdc74",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"What did the president say about Ketanji Brown Jackson\n",
"CPU times: user 7min 39s, sys: 2.59 s, total: 7min 42s\n",
"Wall time: 1min 6s\n"
]
},
{
"data": {
"text/plain": [
"' The president honored Justice Stephen Breyer and acknowledged his service to this country before introducing Justice Ketanji Brown Jackson, who will be serving as the newest judge on the United States Court of Appeals for the District of Columbia Circuit.'"
]
},
"execution_count": 127,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"print(query)\n",
"\n",
"qa.run(query)"
]
},
{
"cell_type": "code",
"execution_count": 128,
"id": "edf7ca99",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10min 20s, sys: 4.2 s, total: 10min 24s\n",
"Wall time: 1min 35s\n"
]
},
{
"data": {
"text/plain": [
"' The president said that she was nominated by Barack Obama to become the first African American woman to sit on the United States Court of Appeals for the District of Columbia Circuit. He also mentioned that she was an Army veteran, a Constitutional scholar, and is retiring Justice of the United States Supreme Court.'"
]
},
"execution_count": 128,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"query = \"Identify three things the president said about Ketanji Brown Jackson\"\n",
"\n",
"qa.run(query)"
]
},
{
"cell_type": "code",
"execution_count": 129,
"id": "be183a65",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 12min 31s, sys: 4.24 s, total: 12min 35s\n",
"Wall time: 1min 45s\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nITEM 1: President Trump honored Justice Breyer for his service to this country, but did not specifically mention Ketanji Brown Jackson.\\n\\nITEM 2: The president did not identify any specific characteristics about Justice Breyer that would be useful in identifying her.\\n\\nITEM 3: The president did not make any reference to Justice Breyer's current or past judicial rulings or cases during his speech.\""
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"\n",
"query = \"\"\"\n",
"Identify three things the president said about Ketanji Brown Jackson. Provide the answer in the form: \n",
"\n",
"- ITEM 1\n",
"- ITEM 2\n",
"- ITEM 3\n",
"\"\"\"\n",
"\n",
"qa.run(query)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@cg-sachink

But this is not working

@rain1024

rain1024 commented Apr 7, 2023

I'm having trouble with the following code:

download llama.cpp 7B model
#%pip install pyllama
#!python3.10 -m llama.download --model_size 7B --folder llama/

I installed pyllama successfully with the following commands:

$ pip install pyllama
$ pip freeze | grep pyllama
pyllama==0.0.9
pyllamacpp==1.0.6

However when I run

$ python -m llama.download --model_size 7B --folder llama/

I received an error message:

No module named llama.download

Has anyone else encountered the same issue?

Update 1:

When I clone the pyllama repository and run from inside it, I can download the llama folder.

@monabiyan

from langchain.embeddings import LlamaCppEmbeddings does not work.

@psychemedia
Author

psychemedia commented Apr 11, 2023 via email

@JeffreyShran

@psychemedia - Any ideas on where to seek out performance gain opportunities?

The final run in #129 is a killer. I was expecting VectorstoreIndexCreator to be something that would help improve performance, but it still takes a long time.

@psychemedia
Author

@JeffreyShran For performance, I'd probably use a different way of semantically indexing items; the above demo was trying to keep things bounded by reusing the same model for everything.

@MakkiNeutron

ValueError: Requested tokens exceed context window of 512, using GPT4All with langchain LlamaCpp.

@JeffreyShran

JeffreyShran commented Apr 17, 2023

@JeffreyShran For performance, I'd probably use a different way of semantically indexing items; the above demo was trying to keep things bounded by reusing the same model for everything.

Thanks @psychemedia, my longer-term goal is to load in a git repo by swapping the loader for the official git one, in the hope that I can ask questions about my private projects.

RE: the error reported by @MakkiNeutron. I tried to load a semi-small Python script as a text file as a preliminary test and also hit this error. Some digging showed that llama.cpp can be modified to increase the token limit. However, I feel that is the wrong step, and that instead we should try a) as you suggested in reply to my earlier question, a different indexing approach, and/or b) splitting the text into smaller batches than the 500 characters in your code.

I'm keen but very much a noob in this space, so any insights you might be able to share would be useful and appreciated, particularly on my expected use case. Do you feel I'm taking the smartest route, or should I use a different approach within langchain or some other tool?

@jrobles98

@JeffreyShran Humm, I just arrived here, but talk of increasing the token amount that Llama can handle is still somewhat blurry, since it was trained from the beginning with that amount; technically you would need to redo the whole training of Llama with a larger input size. In other words, it is an inherent property of the model that is immutable from the beginning.
Good news is that the input length that Llama was trained on (and therefore the maximum possible) is 2048 tokens!

Here you can see that limit in the HF docs, looking at the max_position_embeddings parameter.

BTW here is a similar thread if you want to take a sneak peek.

Nevertheless, there are ways to give Llama more "memory scope"; here are some conversational approaches, of which the last section is the most interesting for any purpose.

Hope you found it helpful ✌🏼

@JeffreyShran

@JeffreyShran Humm, I just arrived here, but talk of increasing the token amount that Llama can handle is still somewhat blurry, since it was trained from the beginning with that amount; technically you would need to redo the whole training of Llama with a larger input size. In other words, it is an inherent property of the model that is immutable from the beginning. Good news is that the input length that Llama was trained on (and therefore the maximum possible) is 2048 tokens!

Here you can see that limit in the HF docs, looking at the max_position_embeddings parameter.

BTW here is a similar thread if you want to take a sneak peek.

Nevertheless, there are ways to give Llama more "memory scope"; here are some conversational approaches, of which the last section is the most interesting for any purpose.

Hope you found it helpful ✌🏼

Thanks, that is helpful. However, it appears that these settings already default to the maximum of 2048.

The file I tested with had only a few lines in it, so I think the problem might lie elsewhere.

@jrobles98

Thanks, that is helpful. However, it appears that these settings already default to the maximum of 2048.

The file I tested with had only a few lines in it, so I think the problem might lie elsewhere.

Yes, indeed. I was hoping to find that limit for GPT4All but only found that the standard model used 1024 input tokens. So maybe... the quantized LoRA version uses a limit of 512 tokens for some reason, although that doesn't make much sense, since quantized and LoRA versions only lose precision rather than dimensionality.

Anyway, I think the best way to improve this is to try other models that we know can already handle 2048-token inputs. I suggest Vicuna, which was created mainly with the purpose of maxing out input/output.

If somebody can test this, it would be great.

@JeffreyShran

JeffreyShran commented Apr 18, 2023

I'm actually using ggml-vicuna-7b-4bit.bin. This is the one I'm having the most trouble with. :)

@mtthw-meyer

This would be much easier to follow with the working code in one place instead of only scattered fragments.

@stellarkllc

Is it possible to use GPT4All as the llm with sql_agent or pandas_agent instead of OpenAI?

@zeke-john

I've installed all the packages and still get this: zsh: command not found: pyllamacpp-convert-gpt4all

@lingjiekong

I've installed all the packages and still get this: zsh: command not found: pyllamacpp-convert-gpt4all

Try an older version of pyllamacpp: pip install pyllamacpp==1.0.7.
