Jan 24, 2024

Running Mixtral 8x7b locally with LlamaIndex and Ollama

There's been quite a buzz around the latest offering from European AI company Mistral AI: Mixtral 8x7b. This "mixture of experts" model combines eight individual experts, each with 7 billion parameters – hence its name. Initially announced via a surprising tweet, it was soon followed by a detailed blog post demonstrating its capability to rival GPT-3.5 and even outperform Llama 2 70b on various benchmarks.

Step 1: Setting Up Ollama

Installing a local model used to be cumbersome, but Ollama simplifies the process. It's available for macOS, Linux, and Windows (via the Windows Subsystem for Linux), and it's a free, open-source download.
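On Linux (and inside WSL on Windows), for instance, the official one-line install script gets you set up; on macOS it's a standard application download:

curl -fsSL https://ollama.com/install.sh | sh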

After downloading, you can install Mixtral with a single command:
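ollama run mixtral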

This command downloads the model, which may take some time. Note that Mixtral requires 48GB of RAM for optimal performance. If this is too much, consider Mistral 7b, installed in the same manner:
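ollama run mistral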

For this tutorial, we'll focus on Mixtral, but the steps are similar for Mistral.

Once the model is operational, Ollama facilitates direct interaction. However, to leverage the model with your data, we integrate it with LlamaIndex. The following steps provide detailed code instructions, but you can also access the complete code in our open-source repository.

Step 2: Install Necessary Dependencies

First, install LlamaIndex and the other required packages – the Qdrant client, plus torch and transformers, which the local embedding model used below depends on:
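pip install llama-index qdrant_client torch transformers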

Step 3: Conducting a Smoke Test

Ensure Ollama and LlamaIndex are working together with this simple script:

from llama_index.llms import Ollama

# Point LlamaIndex at the Mixtral model served locally by Ollama
llm = Ollama(model="mixtral")
response = llm.complete("Who is Laurie Voss?")
print(response)

Step 4: Load and Index Data

Next, prepare your data for indexing. In this example, we use a collection of tweets. We utilize Qdrant, an open-source vector database, for data storage:

from pathlib import Path
import qdrant_client
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    download_loader,
)
from llama_index.llms import Ollama
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Load the tweets from a local JSON file
JSONReader = download_loader("JSONReader")
loader = JSONReader()
documents = loader.load_data(Path('./data/tinytweets.json'))

# Create a local Qdrant vector store, persisted to disk
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="tweets")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Use Mixtral (via Ollama) as the LLM and a local model for embeddings
llm = Ollama(model="mixtral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Embed the documents and store them in Qdrant
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=storage_context,
)

# Ask a question over the indexed data
query_engine = index.as_query_engine()
response = query_engine.query("What does the author think about Star Trek? Give details.")
print(response)

Step 5: Using the Pre-built Index

To utilize the existing index, start a new file:

import qdrant_client
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.llms import Ollama
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Reconnect to the Qdrant collection created in the previous step
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="tweets")

llm = Ollama(model="mixtral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Rebuild the index directly from the existing vector store (no re-embedding needed)
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    service_context=service_context,
)
query_engine = index.as_query_engine(similarity_top_k=20)
response = query_engine.query("Does the author like SQL? Give details.")
print(response)

Step 6: Creating a Web Service

To make your index accessible via an API, install Flask:
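pip install flask flask-cors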

Then, set up a basic Flask server:

from flask import Flask, request, jsonify
from flask_cors import CORS, cross_origin
import qdrant_client
from llama_index.llms import Ollama
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Reconnect to the persisted Qdrant collection and rebuild the index
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="tweets")

llm = Ollama(model="mixtral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)

app = Flask(__name__)
cors = CORS(app)
app.config['CORS_HEADERS'] = 'Content-Type'

@app.route('/')
def hello_world():
    return 'Hello, World!'

@app.route('/process_form', methods=['POST'])
@cross_origin()
def process_form():
    query = request.form.get('query')
    if query:
        query_engine = index.as_query_engine(similarity_top_k=20)
        response = query_engine.query(query)
        return jsonify({"response": str(response)})
    else:
        return jsonify({"error": "query field is missing"}), 400

if __name__ == '__main__':
    app.run()

Run the server with python app.py and use cURL to test:

curl --location 'http://127.0.0.1:5000/process_form' \
  --form 'query="What does the author think about Star Trek?"'

Conclusion

We explored setting up Mixtral 8x7b with LlamaIndex, creating and querying an index using Qdrant, and developing a simple web API. All these tools are open-source, free, and run locally. This guide should serve as a practical introduction to running models locally with LlamaIndex.
