Step 1: Setting Up Ollama
Installing a local model used to be cumbersome, but Ollama makes it straightforward. It's a free, open-source download available for macOS, Linux, and Windows (via Windows Subsystem for Linux).
After downloading, you can install Mixtral with a single command:
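ollama run mixtral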
This command downloads the model, which may take some time. Note that Mixtral needs 48GB of RAM to run well. If that's more than you have available, consider Mistral 7B, installed the same way:
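ollama run mistral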
For this tutorial, we'll focus on Mixtral, but the steps are similar for Mistral.
Once the model is running, you can chat with it directly through Ollama. To use the model with your own data, however, we'll integrate it with LlamaIndex. The following steps walk through the code in detail, but you can also find the complete code in our open-source repository.
Step 2: Install Necessary Dependencies
First, install LlamaIndex and other required packages:
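The exact package set depends on your LlamaIndex version; for the pre-0.10 imports used below, something like this works (torch and transformers are needed by the local embedding model):
pip install llama-index qdrant_client torch transformers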
Step 3: Conducting a Smoke Test
Ensure Ollama and LlamaIndex are working together with this simple script:
from llama_index.llms import Ollama

# Ollama serves the model locally, so no API key is needed
llm = Ollama(model="mixtral")
response = llm.complete("Who is Laurie Voss?")
print(response)
Step 4: Load and Index Data
Next, prepare your data for indexing. In this example we use a collection of tweets and store the embeddings in Qdrant, an open-source vector database:
from pathlib import Path
import qdrant_client
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    download_loader,
)
from llama_index.llms import Ollama
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Load the tweets from a local JSON file
JSONReader = download_loader("JSONReader")
loader = JSONReader()
documents = loader.load_data(Path('./data/tinytweets.json'))

# Create a local Qdrant vector store to hold the embeddings
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="tweets")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Use Mixtral as the LLM and a local model for embeddings
llm = Ollama(model="mixtral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Index the documents, then run a test query
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=storage_context,
)
query_engine = index.as_query_engine()
response = query_engine.query("What does the author think about Star Trek? Give details.")
print(response)
Step 5: Using the Pre-built Index
To use the existing index without re-indexing, start a new file:
import qdrant_client
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.llms import Ollama
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Re-open the Qdrant collection created in Step 4
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="tweets")

llm = Ollama(model="mixtral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Build the index directly from the existing vector store
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)

query_engine = index.as_query_engine(similarity_top_k=20)
response = query_engine.query("Does the author like SQL? Give details.")
print(response)
Step 6: Creating a Web Service
To make your index accessible via an API, install Flask along with flask-cors, which the server below uses:
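pip install flask flask-cors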
Then, set up a basic Flask server:
from flask import Flask, request, jsonify
from flask_cors import CORS, cross_origin
import qdrant_client
from llama_index.llms import Ollama
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Load the existing index, as in Step 5
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="tweets")
llm = Ollama(model="mixtral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)

app = Flask(__name__)
cors = CORS(app)
app.config['CORS_HEADERS'] = 'Content-Type'

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/process_form', methods=['POST'])
@cross_origin()
def process_form():
    query = request.form.get('query')
    if query:
        query_engine = index.as_query_engine(similarity_top_k=20)
        response = query_engine.query(query)
        return jsonify({"response": str(response)})
    else:
        return jsonify({"error": "query field is missing"}), 400

if __name__ == '__main__':
    app.run()
Run the server with python app.py and test it with cURL:
curl --location 'http://127.0.0.1:5000/process_form' \
  --form 'query="What does the author think about Star Trek?"'
Conclusion
We explored setting up Mixtral 8x7B with LlamaIndex via Ollama, creating and querying an index backed by Qdrant, and exposing it through a simple web API. All of these tools are open source, free, and run entirely on your own machine. This guide should serve as a practical introduction to running local models with LlamaIndex.