The Random Walk Blog

2024-09-27

The Story of a Bill: How Well Can AI Models Handle Real-World Math

Large Language Models (LLMs) have proven effective in tasks like natural language processing, sentiment analysis, data extraction, and answering questions.

According to ChatGPT, complex mathematical operations include advanced techniques such as calculus, matrix operations, differential equations, optimization, and probability, which are often used to solve real-world problems in fields like engineering, physics, and economics. Complex data analysis, in turn, involves multivariate analysis, statistical modeling, time series analysis, machine learning, and big data handling, used for discovering patterns, making predictions, and drawing insights from large datasets.

Recently, we explored multiple LLMs’ ability to handle basic mathematical and analytical operations, including addition, subtraction, multiplication, division, percentage calculations, and financial data analysis. While LLMs can manage basic arithmetic, we tested their ability to solve more complex tasks using a restaurant bill: converting the bill into a table, splitting the total amount, and calculating each person's percentage share.

First, let’s take the restaurant bill and see how LLMs answer complex questions through prompt engineering. We start with basic math and data problems.

Bill.svg

We started by using ChatGPT 4.0 to transform the restaurant bill into a structured table, and it handled the task flawlessly.

ChatGPT example 1.svg

Our second question was to compute the total of the bill, excluding and including taxes. The result is as follows:

To compute the total of the bill:
Excluding Taxes:
The subtotal given on the receipt is ₹13,545.00.
Including Taxes:
We need to include the service charge and all applicable taxes:
Subtotal: ₹13,545.00
Service Charge: ₹406.35
VAT (14.5%): ₹1,424.92
CGST (2.5%): ₹92.95
SGST (2.5%): ₹92.95
Round off: -₹0.17
Total including taxes:
Total = 13,545.00 + 406.35 + 1,424.92 + 92.95 + 92.95 − 0.17 = ₹15,562.00
Summary:
Total Excluding Taxes: ₹13,545.00
Total Including Taxes: ₹15,562.00
The receipt’s calculations are accurate, showing a total of ₹15,562.00 including taxes.

As seen here, ChatGPT accurately performed all the calculations based on the data from the table.
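The sum in the response above is easy to verify independently. The snippet below is a minimal sketch that uses only the line items quoted in the response; the dictionary and variable names are illustrative and not part of the original experiment. It uses `Decimal` so that the currency values add up exactly.

```python
from decimal import Decimal

# Line items as quoted in the response above (names are illustrative).
bill_lines = {
    "Subtotal": Decimal("13545.00"),
    "Service Charge": Decimal("406.35"),
    "VAT (14.5%)": Decimal("1424.92"),
    "CGST (2.5%)": Decimal("92.95"),
    "SGST (2.5%)": Decimal("92.95"),
    "Round off": Decimal("-0.17"),
}

# The subtotal is the total excluding taxes and service charge.
total_excluding_taxes = bill_lines["Subtotal"]

# Adding every line, including the negative round-off, gives the grand total.
total_including_taxes = sum(bill_lines.values())

print(f"Excluding taxes: ₹{total_excluding_taxes:,.2f}")
print(f"Including taxes: ₹{total_including_taxes:,.2f}")
```

Run as written, this reproduces the ₹13,545.00 and ₹15,562.00 figures reported on the receipt.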

Next, we asked ChatGPT, “What would be the bill total including and excluding taxes if Adana Kebab was not purchased?”

To calculate the total of the bill excluding and including taxes if the Adana Kebab (₹449.00) was not purchased, we need to subtract its cost from the totals.
Excluding Taxes:
Original Subtotal: ₹13,545.00
Adana Kebab Cost: -₹449.00
New Subtotal:
New Subtotal= 13,545.00−449.00 = ₹13,096.00
Including Taxes:
We now recalculate the service charge and taxes based on the new subtotal:
Service Charge: (Keeping the original ratio)
New Service Charge = 406.35/13,545.00×13,096.00 = ₹392.80
VAT (14.5%): New VAT = 1,424.92/13,545.00×13,096.00 = ₹1,378.92
CGST (2.5%): New CGST = 92.95/13,545.00×13,096.00 = ₹89.92
SGST (2.5%): New SGST = 92.95/13,545.00×13,096.00 = ₹89.92
Round off: (keeping it the same, as it won't change much): -₹0.17
New Total including taxes:
New Total = 13,096.00+392.80+1,378.92+89.92+89.92−0.17 = ₹15,047.39
Summary:
Total Excluding Taxes (without Adana Kebab): ₹13,096.00
Total Including Taxes (without Adana Kebab): ₹15,047.39
If the Adana Kebab was not purchased, the total bill including taxes would be approximately ₹15,047.39.

Here, ChatGPT successfully calculated the total amount excluding service charges and taxes. However, it struggled to provide the correct total amount when taxes were included. This was due to its inability to recognize that VAT applies only to alcoholic products, while GST is applied to food and non-alcoholic items. After we provided the model with this information, it generated the following result:

Given that VAT is only applied to alcohol and CGST & SGST are applied to non-alcoholic drinks and food, we need to adjust the calculations accordingly when removing the Adana Kebab from the bill.
Total Non-Alcohol Cost (Before Removing Adana Kebab): ₹6509.00
Total Non-Alcohol Cost (After Removing Adana Kebab): ₹6509.00 − 449.00 = ₹6060.00
Step 1: Recalculate VAT
Original VAT: 14.5% on ₹7036.00 = 7036.00×0.145 = ₹1020.22
Step 2: Recalculate CGST and SGST
Original CGST: 2.5% on ₹6509.00 = 6509.00×0.025=₹162.73
Original SGST: 2.5% on ₹6509.00 = 6509.00×0.025=₹162.73
New CGST: 2.5% on ₹6060.00 = 6060.00×0.025=₹151.50
New SGST: 2.5% on ₹6060.00 = 6060.00×0.025=₹151.50
Step 3: Recalculate the Total Bill
Subtotal After Removing Adana Kebab: ₹13,545.00 - ₹449.00 = ₹13,096.00
Service Charge: ₹392.80 (as calculated earlier)
New Total Including Taxes:
New Total = 13,096.00+392.80+1020.22+151.50+151.50−0.17 = ₹14,811.85
Summary:
Total Excluding Taxes (without Adana Kebab): ₹13,096.00
Total Including Taxes (without Adana Kebab): ₹14,811.85
The new total bill, after removing the Adana Kebab and correctly applying the taxes, would be approximately ₹14,811.85.

In this case, the LLM confused food and beverages, leading to inaccuracies in calculating their respective costs. To clarify, we provided the total cost of food, excluding taxes and service charges.

Thank you for the clarification. Let's recalculate the bill with the correct breakdown:
Initial Totals:
Total Cost of Food (including non-alcoholic drinks): ₹3,720.00
Total Cost of Alcohol: ₹9,825.00
Taxes:
VAT (14.5% on Alcohol):
VAT on ₹9,825.00 = 9,825.00×0.145=₹1,424.63
CGST & SGST (2.5% each on Food):
CGST on ₹3,720.00 = 3,720.00×0.025=₹93.00
SGST on ₹3,720.00 = 3,720.00×0.025=₹93.00
Service Charge: ₹406.35
If Adana Kebab (₹449.00) is Removed:
New Total Food Cost: ₹3,720.00 - ₹449.00 = ₹3,271.00
Alcohol Cost: ₹9,825.00 (unchanged)
Recalculated Taxes:
VAT on Alcohol (unchanged): ₹1,424.63
New CGST & SGST on Food:
CGST on ₹3,271.00 = 3,271.00×0.025=₹81.78
SGST on ₹3,271.00 = 3,271.00×0.025=₹81.78
Recalculate the Bill:
New Subtotal:
New Subtotal = 3,271.00+9,825.00=₹13,096.00
Service Charge: ₹406.35
New Total Including Taxes:
New Total=13,096.00+406.35+1,424.63+81.78+81.78−0.17 = ₹15,090.37
Summary:
Total Excluding Taxes (without Adana Kebab): ₹13,096.00
Total Including Taxes (without Adana Kebab): ₹15,090.37
The new total bill, after removing Adana Kebab and correctly applying the taxes, would be approximately ₹15,090.37.

This prompt enabled the LLM to accurately calculate the costs of alcoholic items and determine the new totals, both with and without taxes and service charges. However, one issue remained: it did not proportionally reduce the service charge after removing the Adana Kebab. To address this, we provided two additional prompts, which resulted in the correct calculation.
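The recalculation discussed above can be sketched in a few lines. This is a minimal illustration, not the method any of the models used. It assumes the category totals supplied in the clarifying prompt (₹3,720.00 for food and ₹9,825.00 for alcohol), a 3% service charge inferred from 406.35 / 13,545.00, and half-up rounding to two decimals, and it ignores the round-off line. The function and variable names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def money(value):
    """Round to two decimals, half up (an assumed rounding rule)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Category totals as given in the clarifying prompt.
FOOD_TOTAL = Decimal("3720.00")     # food and non-alcoholic drinks
ALCOHOL_TOTAL = Decimal("9825.00")
ADANA_KEBAB = Decimal("449.00")     # a food item, so VAT is unaffected

SERVICE_RATE = Decimal("0.03")      # assumption: inferred from 406.35 / 13,545.00
VAT_RATE = Decimal("0.145")         # applies to alcohol only
GST_RATE = Decimal("0.025")         # CGST and SGST each, on food only

def bill_total(food, alcohol):
    """Return the bill components for the given category subtotals."""
    subtotal = food + alcohol
    service = money(subtotal * SERVICE_RATE)
    vat = money(alcohol * VAT_RATE)
    cgst = money(food * GST_RATE)
    sgst = cgst
    total = subtotal + service + vat + cgst + sgst
    return {"subtotal": subtotal, "service": service, "vat": vat,
            "cgst": cgst, "sgst": sgst, "total": total}

without_kebab = bill_total(FOOD_TOTAL - ADANA_KEBAB, ALCOHOL_TOTAL)
for name, value in without_kebab.items():
    print(f"{name:>8}: ₹{value:,.2f}")
```

Under these assumptions the service charge falls to ₹392.88 while the VAT stays unchanged. Because the category totals are the ones typed into the prompt, the tax figures differ by a few paise from those printed on the bill.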

Our next query was to calculate each person’s total and share in percentage. For this, we gave each person’s share. The following is the query: “Jaya-2328.60, Vijay-2127.60, Jithin-2628.60, Gautham-3077.60, Kiran-3382.60. Calculate total and each persons share in percentage.”
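For reference, the percentage shares asked for in this query can be computed directly. The following is a minimal sketch using the five amounts from the query; the variable names are illustrative, and half-up rounding to two decimals is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Amounts per person, copied from the query.
amounts = {
    "Jaya": Decimal("2328.60"),
    "Vijay": Decimal("2127.60"),
    "Jithin": Decimal("2628.60"),
    "Gautham": Decimal("3077.60"),
    "Kiran": Decimal("3382.60"),
}

total = sum(amounts.values())

# Each share is the person's amount divided by the total, as a percentage.
percentages = {
    name: (amount / total * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    for name, amount in amounts.items()
}

print(f"Total: ₹{total:,.2f}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The five amounts add up to the pre-tax subtotal of ₹13,545.00, so these percentages are shares of the bill before taxes and service charge.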

ChatGPT's response was close but not entirely accurate. For the next query, we required the individual split for food and drinks. To facilitate this, we provided a list of the drinks each person had, as the food was shared equally among them. The prompt is given below.

Drinks are charged on the person and the food is shared commonly. Here are the drinks and who had it: Jaya-Cupid, Cosmopolitan, Don Azus (2). Jithin-Mango Meltdown, Mojito (1), Don Azus (2). Gautham-Jagger Bomb, Mojito (1), Don Azus (1). Kiran-Jhonnie Walker Red Label, Monkey Shoulder, Don Azus (2). Vijay-Beefeater, tonic water, Don Azus (1). Show me each persons split

# Initial amounts based on the previous breakdown
total_food_cost = 3271.00  # New food subtotal after removing Adana Kebab
total_service_charge = 392.94  # New service charge after removing Adana Kebab
total_food_cost_with_tax = total_food_cost + (81.78 + 81.78)  # Adding CGST and SGST

# Drinks cost per person
drinks_cost = {
    "Jaya": 499.00 + 599.00 + (249.00 * 2),
    "Jithin": 799.00 + 599.00 + (249.00 * 2),
    "Gautham": 499.00 + 599.00 + 249.00,
    "Kiran": 299.00 + 599.00 + (249.00 * 2),
    "Vijay": 349.00 + 99.00 + 249.00
}

# Calculate each person's total cost including their drinks and their share of the food cost
total_cost = drinks_cost.copy()
num_people = len(total_cost)

# Each person's share of the food and service charge
shared_cost_per_person = (total_food_cost_with_tax + total_service_charge) / num_people

for person in total_cost:
    total_cost[person] += shared_cost_per_person

total_cost

Result
{'Jaya': 2361.5,
 'Jithin': 2661.5,
 'Gautham': 2112.5,
 'Kiran': 2161.5,
 'Vijay': 1462.5}

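In this instance, we requested the total split of the bill, excluding taxes and service charges. However, ChatGPT mistakenly excluded the cost of the Adana Kebab from its calculations. When we asked for a breakdown that included the Adana Kebab, it again provided an incorrect answer. After a second prompt, the LLM finally applied the intended method, charging drinks to each person and sharing the food, GST, and service charge equally, although a few of its sums are off by small amounts (for example, (3,720 + 406.35 + 186.00) / 5 is ₹862.47, not ₹862.07).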

Apologies for the confusion. Let's calculate the correct split for the total bill amount of ₹13,545 based on the drinks each person had and the food cost being shared equally.
Step-by-Step Breakdown:
Total Bill Amount: ₹13,545
Drinks Total (Alcohol): ₹9,825 (VAT: ₹1,424.63)
Food Total (including Adana Kebab): ₹3,720 (CGST & SGST: ₹186.00)
Drinks Cost Per Person:
Jaya: Cupid (₹499) + Cosmopolitan (₹599) + Don Azus (2 × ₹249) = ₹1,596
Jithin: Mango Meltdown (₹799) + Mojito (₹599) + Don Azus (2 × ₹249) = ₹1,895
Gautham: Jagger Bomb (₹499) + Mojito (₹599) + Don Azus (₹249) = ₹1,347
Kiran: Jhonnie Walker Red Label (₹299) + Monkey Shoulder (₹599) + Don Azus (2 × ₹249) = ₹1,395
Vijay: Beefeater (₹349) + tonic water (₹99) + Don Azus (₹249) = ₹697
Shared Food and Service Charge Cost:
Total Food Cost: ₹3,720
Service Charge: ₹406.35
CGST & SGST: ₹186.00
Shared Cost Per Person = (Food Cost + Service Charge + CGST & SGST) / 5 = (3,720 + 406.35 + 186.00) / 5 = ₹862.07
Final Calculation:
Jaya's Total: ₹1,596 (Drinks) + ₹862.07 (Shared Food) = ₹2,458.07
Jithin's Total: ₹1,895 (Drinks) + ₹862.07 (Shared Food) = ₹2,757.07
Gautham's Total: ₹1,347 (Drinks) + ₹862.07 (Shared Food) = ₹2,209.07
Kiran's Total: ₹1,395 (Drinks) + ₹862.07 (Shared Food) = ₹2,257.07
Vijay's Total: ₹697 (Drinks) + ₹862.07 (Shared Food) = ₹1,559.07
Summary of Each Person’s Total:
Jaya: ₹2,458.07
Jithin: ₹2,757.07
Gautham: ₹2,209.07
Kiran: ₹2,257.07
Vijay: ₹1,559.07
These amounts correctly split the total bill of ₹13,545 among everyone based on their drinks and an equal share of the food and service charges.
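The same split can be recomputed from the prices listed in the response, which is a useful cross-check on a model's arithmetic. The sketch below is only an illustration: it takes the per-drink prices, the food total, the GST, and the service charge exactly as quoted above, and the names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

DON_AZUS = Decimal("249")

# Drinks per person, using the prices quoted in the response above.
drinks = {
    "Jaya": [Decimal("499"), Decimal("599"), 2 * DON_AZUS],
    "Jithin": [Decimal("799"), Decimal("599"), 2 * DON_AZUS],
    "Gautham": [Decimal("499"), Decimal("599"), DON_AZUS],
    "Kiran": [Decimal("299"), Decimal("599"), 2 * DON_AZUS],
    "Vijay": [Decimal("349"), Decimal("99"), DON_AZUS],
}

# Costs shared equally: food, service charge, and CGST + SGST on food.
shared_pool = Decimal("3720.00") + Decimal("406.35") + Decimal("186.00")
shared_per_person = (shared_pool / len(drinks)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

drink_totals = {name: sum(items) for name, items in drinks.items()}
person_totals = {
    name: drink_total + shared_per_person
    for name, drink_total in drink_totals.items()
}

print(f"Shared cost per person: ₹{shared_per_person:,.2f}")
for name in drinks:
    print(f"{name}: drinks ₹{drink_totals[name]:,.2f}, total ₹{person_totals[name]:,.2f}")
```

Run as written, this gives ₹1,896 for Jithin's drinks, ₹1,396 for Kiran's, and ₹862.47 for the shared cost per person, each slightly different from the figures in the response. Small slips like these are easy to miss without an independent recomputation.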

Next, we requested a list of food and non-alcoholic drinks from the LLM. However, it hallucinated and provided an inaccurate response, including items that were not on the bill.

chatgpt example 2.svg

The next query was, “Due to discount assume the bill is 12877 and not 15562. Now what is each persons share?”

# Re-defining the amounts and total to correct the issue
amounts = {
    "Jaya": 2328.60,
    "Vijay": 2127.60,
    "Jithin": 2628.60,
    "Gautham": 3077.60,
    "Kiran": 3382.60
}

# Total of the initial amounts
total_initial_amount = sum(amounts.values())

# New total bill after discount
new_total_bill = 12877

# Calculate the new percentage share and total amount for each person
new_total_amounts = {}

for person, amount in amounts.items():
    percentage_share = amount / total_initial_amount
    new_total_amounts[person] = percentage_share * new_total_bill

new_total_amounts

ChatGPT's response was nearly correct, reflecting each person's proportional share of the discounted bill. However, it required multiple prompts and follow-up queries to grasp the logic of adding and subtracting taxes and service charges when removing an item and splitting each person’s bill by percentage. Additionally, it began to hallucinate when we asked it to list the names of the items on the bill, providing incorrect answers. To streamline the process, we consolidated all the prompts and follow-up queries into a single prompt and posed the same question in a separate chat. Here’s the prompt:

Convert the above bill into a table format and answer the below questions:

1. Compute the total bill cost with and without - tax and service charge.

2. What would be the bill total including and excluding taxes and service charge if Adana Kebab was not purchased?

3. Jaya-2328.60, Vijay-2127.60, Jithin-2628.60, Gautham-3077.60, Kiran-3382.60. Calculate each persons share in percentage. The total amount should be taken without the tax and service charge.

4. Jaya-2328.60, Vijay-2127.60, Jithin-2628.60, Gautham-3077.60, Kiran-3382.60. Calculate each persons share in percentage. The total amount should be taken with the tax and service charge.

Considerations:

1. VAT is calculated only for alcohol and CGST and SGST are calculated for non-alcoholic drinks and food.

2. Below is a list of food, alcoholic and non-alcoholic answers.

Alcoholic drinks: Cupid, Beefeater, Jagger Bomb, Don Azus, Mango meltdown, Water Melon Jalapeno Tini, Panda Orange, Dice N Nice, Cosmopolitan, Monkey shoulders, Johnnie Walker Red label,  Vodka Prawn

Non-Alcoholic drinks: Tonic Water, Mojito

Food: Crispy Corn, Jalapeno pops, Chermoula Paneer, Adana Kebab, Muttai Paniyaram, Tiramisu, Ice cream, Kunafeh

3. Do no make approximations, give the correct answer upto 2 decimal points

4. Do not make assumptions

5. Do not provide code

6. Give the exact calculations for every step

ChatGPT converted the bill into a table and gave the following response.

1. Compute the total bill cost with and without tax and service charge.
Without tax and service charge: ₹13,545.00
With tax and service charge: ₹15,562.00
2. What would be the bill total including and excluding taxes and service charge if Adana Kebab was not purchased?
Without tax and service charge (excluding Adana Kebab):
Total (before tax): ₹13,545.00 - ₹449.00 = ₹13,096.00
Tax and Service Charges Calculation (excluding Adana Kebab):
Service Charge: ₹406.35 - (₹406.35 / ₹13,545.00) × ₹449.00 = ₹389.81
VAT: ₹1424.92 (no change since Adana Kebab is food, not alcohol)
CGST: ₹92.95 - (₹92.95 / ₹13,545.00) × ₹449.00 = ₹89.87
SGST: ₹92.95 - (₹92.95 / ₹13,545.00) × ₹449.00 = ₹89.87
New Grand Total: ₹13,096.00 + ₹389.81 + ₹1424.92 + ₹89.87 + ₹89.87 = ₹15,090.47
3. Jaya-₹2328.60, Vijay-₹2127.60, Jithin-₹2628.60, Gautham-₹3077.60, Kiran-₹3382.60.
Calculate each person's share in percentage of the total amount without tax and service charge (₹13,545.00):
Total Contribution: ₹2328.60 + ₹2127.60 + ₹2628.60 + ₹3077.60 + ₹3382.60 = ₹13545.00
Jaya's Share: (₹2328.60 / ₹13,545.00) × 100 = 17.18%
Vijay's Share: (₹2127.60 / ₹13,545.00) × 100 = 15.70%
Jithin's Share: (₹2628.60 / ₹13,545.00) × 100 = 19.40%
Gautham's Share: (₹3077.60 / ₹13,545.00) × 100 = 22.71%
Kiran's Share: (₹3382.60 / ₹13,545.00) × 100 = 24.98%
4. Jaya-₹2328.60, Vijay-₹2127.60, Jithin-₹2628.60, Gautham-₹3077.60, Kiran-₹3382.60.
Calculate each person's share in percentage of the total amount with tax and service charge (₹15,562.00):
Total Contribution: ₹2328.60 + ₹2127.60 + ₹2628.60 + ₹3077.60 + ₹3382.60 = ₹13,545.00
Jaya's Share: (₹2328.60 / ₹15,562.00) × 100 = 14.96%
Vijay's Share: (₹2127.60 / ₹15,562.00) × 100 = 13.68%
Jithin's Share: (₹2628.60 / ₹15,562.00) × 100 = 16.89%
Gautham's Share: (₹3077.60 / ₹15,562.00) × 100 = 19.78%
Kiran's Share: (₹3382.60 / ₹15,562.00) × 100 = 21.73%

For the second question, the service charge calculated after removing the Adana Kebab (₹389.81) differed slightly from the correct value of ₹392.88, which is 3% of the new subtotal of ₹13,096.00, leading to a small deviation in the final total. In response to the third question about percentage calculations, the LLM provided an answer that was approximately correct. However, it faltered on the fourth question, failing to calculate the split of the total amount with and without taxes and service charges. We followed up by asking for each person's amount after including their percentage share of tax and service charges, and it generated an answer that was approximately correct. To further clarify, we asked it to split the tax and service charges by each person's percentage, which resulted in the correct response. We then asked it to add these charges to the individual amounts, and it provided an accurate response.
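The follow-up steps described above, splitting the tax and service charges in proportion to each person's share and then adding them to the individual amounts, can be sketched as follows. This is an illustration under stated assumptions: it uses the pre-tax amounts from the prompt and the ₹15,562.00 grand total from the bill, rounds half up to two decimals, and uses hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

def money(value):
    """Round to two decimals, half up (an assumed rounding rule)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Pre-tax amounts per person, from the prompt.
amounts = {
    "Jaya": Decimal("2328.60"),
    "Vijay": Decimal("2127.60"),
    "Jithin": Decimal("2628.60"),
    "Gautham": Decimal("3077.60"),
    "Kiran": Decimal("3382.60"),
}

subtotal = sum(amounts.values())           # total without tax and service charge
grand_total = Decimal("15562.00")          # total with tax and service charge
extras = grand_total - subtotal            # tax + service charge to distribute

# Each person carries the same fraction of the extras as of the subtotal.
extras_share = {name: money(amount / subtotal * extras) for name, amount in amounts.items()}
amount_with_extras = {name: amounts[name] + extras_share[name] for name in amounts}

for name in amounts:
    print(f"{name}: extras ₹{extras_share[name]:,.2f}, total ₹{amount_with_extras[name]:,.2f}")
print(f"Sum of totals: ₹{sum(amount_with_extras.values()):,.2f}")
```

Note that dividing the pre-tax amounts by the grand total, as the response to the fourth question does, yields percentages that add up to less than 100%. Keeping the pre-tax shares and distributing only the extras avoids that problem.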

It gave the correct answer after the two follow-up questions. We also asked about the discounted bill: “Due to discount assume the bill is 12877 and not 15562. Now what is each persons share”. The LLM’s response was accurate.
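The discount question reduces to scaling each person's proportional share to the new total. A minimal sketch, assuming the same pre-tax amounts as above and half-up rounding to two decimals; `scale_shares` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

def scale_shares(amounts, new_total):
    """Scale each amount so the shares keep their proportions under new_total."""
    original_total = sum(amounts.values())
    return {
        name: (amount / original_total * new_total).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        for name, amount in amounts.items()
    }

# Pre-tax amounts per person, from the prompt.
amounts = {
    "Jaya": Decimal("2328.60"),
    "Vijay": Decimal("2127.60"),
    "Jithin": Decimal("2628.60"),
    "Gautham": Decimal("3077.60"),
    "Kiran": Decimal("3382.60"),
}

discounted = scale_shares(amounts, Decimal("12877"))
for name, value in discounted.items():
    print(f"{name}: ₹{value:,.2f}")
```

Because each share is rounded independently, the rounded amounts can differ from the discounted total by a paisa or two; any such difference would need to be assigned to one person.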

Thus, we concluded that, given accurate prompts and instructions, ChatGPT yields approximately accurate responses to math and financial data analysis questions, requiring only four follow-up inquiries. This highlights the potential benefits of fine-tuning LLMs for specific use cases. In this scenario, fine-tuning ChatGPT on a dataset of bill-related queries and arithmetic calculations could improve its ability to accurately handle tasks like splitting bills, calculating taxes, and processing multiple related prompts in one go. By adapting the model to better understand financial and tabular data, its performance in similar contexts would likely improve, reducing the need for multiple clarifications and minimizing hallucinations.

We then tested the same prompt with various LLMs and VLMs, and the accuracy of their responses is summarized below:

Claude AI

Claude AI was successful in generating the table from the bill and calculating the basic total including and excluding taxes and service charge. However, it struggled to accurately calculate the query involving the removal of the Adana Kebab, incorrectly reducing the VAT despite our instruction that VAT applies only to alcoholic items. For the fourth question, which asked for each person’s amount including taxes and service charges, it failed to provide an accurate result. After prompting it again for the calculation with tax and service charge, it returned a nearly accurate figure, though it still differed slightly from the correct answer. In the subsequent query, we requested a split of the tax and service charge amounts in both percentage and absolute values for each person, to which it responded with an almost accurate number. When we inquired about the discount, it provided an approximately correct answer. Thus, similar to ChatGPT, Claude AI was also able to deliver nearly correct responses to the mathematical and data analysis queries following effective prompt engineering.

def calculate_bill(items, exclude_item=None):
    subtotal = sum(item['price'] * item['quantity'] for item in items if item['name'] != exclude_item)

    alcohol_total = sum(item['price'] * item['quantity'] for item in items if item['category'] == 'Alcoholic' and item['name'] != exclude_item)
    non_alcohol_total = subtotal - alcohol_total

    service_charge = subtotal * 0.03
    vat = alcohol_total * 0.145
    cgst = non_alcohol_total * 0.025
    sgst = non_alcohol_total * 0.025

    total = subtotal + service_charge + vat + cgst + sgst
    return subtotal, total

def calculate_share(amount, total):
    return (amount / total) * 100

# Define items
items = [
    {'name': 'Cupid', 'price': 499, 'quantity': 1, 'category': 'Alcoholic'},
    # ... Add all other items here
]

# Calculate totals
subtotal, total_with_tax = calculate_bill(items)
subtotal_without_kebab, total_without_kebab = calculate_bill(items, 'Adana Kebab')

# Calculate shares
shares = [2328.60, 2127.60, 2628.60, 3077.60, 3382.60]
names = ['Jaya', 'Vijay', 'Jithin', 'Gautham', 'Kiran']

shares_without_tax = [calculate_share(share, subtotal) for share in shares]
shares_with_tax = [calculate_share(share, total_with_tax) for share in shares]

tax_and_service = total_with_tax - subtotal
shares_with_tax_amount = [share + (percentage / 100 * tax_and_service) for share, percentage in zip(shares, shares_with_tax)]

# Print results
print(f"Total without tax: {subtotal:.2f}")
print(f"Total with tax: {total_with_tax:.2f}")
print(f"Total without Adana Kebab (without tax): {subtotal_without_kebab:.2f}")
print(f"Total without Adana Kebab (with tax): {total_without_kebab:.2f}")

for name, share_without, share_with, amount in zip(names, shares_without_tax, shares_with_tax, shares_with_tax_amount):
    print(f"{name}: {share_without:.2f}% (without tax), {share_with:.2f}% (with tax), ₹{amount:.2f}")

Mistral AI

Mistral AI initially failed to present the bill as a complete table. Its basic calculations for VAT, GST, the total without the Adana Kebab, and the service charge were incorrect, despite clear instructions. However, it managed to provide an approximate calculation of each person's percentage share. It also calculated the split amounts based on these percentages with reasonable accuracy and produced an approximate split amount after applying the discount.

# Total Cost Calculation
total-cost = 499.00 + (3 * 349.00) + 99.00 + (2 * 599.00) + (3 * 499.00) + 249.00 + 699.00 + 699.00 + 299.00 + 449.00 + 599.00 + 449.00 + 299.00 + 799.00 + 599.00 + 2 * 599.00 + 299.00 + 249.00 + 279.00 + 349.00 + 199.00 + 349.00
total-cost = 13545.00
# Service Charge
service-charge = 406.35
# VAT Calculation
vat = 0.145 * total-cost
vat = 1968.625
# CGST Calculation
cgst = 0.025 * 3054.00
cgst = 76.35
# SGST Calculation
sgst = 0.025 * 3054.00
sgst = 76.35
# Grand Total Calculation
grand-total = total-cost + service-charge + vat + cgst + sgst
grand-total = 15562.00
# Total Amount without Adana Kebab
total-cost-without-adana = 499.00 + (3 * 349.00) + 99.00 + (2 * 599.00) + (3 * 499.00) + 249.00 + 699.00 + 699.00 + 299.00 + 599.00 + 449.00 + 299.00 + 799.00 + 599.00 + 2 * 599.00 + 299.00 + 249.00 + 279.00 + 349.00 + 199.00 + 349.00
total-cost-without-adana = 13046.00
# VAT Calculation without Adana Kebab
vat-without-adana = 0.145 * total-cost-without-adana
vat-without-adana = 1878.47
# Grand Total without Adana Kebab
grand-total-without-adana = total-cost-without-adana + service-charge + vat-without-adana + cgst + sgst
grand-total-without-adana = 15563.52
# Calculate each person's share in percentage without tax and service charge
total-amount-without-tax = 13545.00
jayas-share = (2328.60 / total-amount-without-tax) * 100
jayas-share = 17.20
vijays-share = (2127.60 / total-amount-without-tax) * 100
vijays-share = 15.69
jithins-share = (2628.60 / total-amount-without-tax) * 100
jithins-share = 19.44
gauthams-share = (3077.60 / total-amount-without-tax) * 100
gauthams-share = 22.74
kirans-share = (3382.60 / total-amount-without-tax) * 100
kirans-share = 25.00
# Calculate each person's share in percentage with tax and service charge
total-amount-with-tax = 15562.00
jayas-share-with-tax = (2328.60 / total-amount-with-tax) * 100
jayas_share_with_tax = 14.95
vijays-share-with-tax = (2127.60 / total-amount-with-tax) * 100
vijays-share-with-tax = 13.67
jithins-share-with-tax = (2628.60 / total-amount-with-tax) * 100
jithins-share-with-tax = 16.88
gauthams-share-with-tax = (3077.60 / total-amount-with-tax) * 100
gautham-share-with-tax = 19.76
kirans-share-with-tax = (3382.60 / total-amount-with-tax) * 100
kirans-share-with-tax = 21.73
# Amount of each person after including the percentage share of tax and service charge
jayas-amount-with-tax = total-amount-with-tax * (jayas-share-with-tax / 100)
jayas-amount-with-tax = 2328.60
vijays-amount-with-tax = total-amount-with-tax * (vijays-share-with-tax / 100)
vijays-amount-with-tax = 2127.60
jithins-amount-with-tax = total-amount-with-tax * (jithins-share-with-tax / 100)
jithins-amount-with-tax = 2628.60
gauthams-amount-with-tax = total-amount-with-tax * (gauthams-share-with-tax / 100)
gauthams-amount-with-tax = 3077.60
kirans-amount-with-tax = total-amount-with-tax * (kirans-share-with-tax / 100)
kirans-amount-with-tax = 3382.60

Copilot

Copilot's answers were largely inaccurate. The data it converted into a table was incorrect, and it failed to accurately translate the GST amount from the bill into text format. Additionally, it hallucinated figures for the service charge and VAT. While it could calculate the percentage share, it was unable to accurately determine the amount for each person after including taxes and service charges.

Copilot example.svg

Gemini

Gemini successfully converted the bill into a table, but the data was inaccurate. It provided incorrect answers when calculating the amounts with taxes and service charges. However, it was able to give an approximately correct answer when calculating the percentage share and splitting each person’s amount. When we asked about the split after applying the discount, it hallucinated and initially failed to provide a response. After repeating the question, it delivered approximate figures, though they still varied slightly from the correct amounts.

import pandas as pd

# Create a list of items from the bill
items = [
    "Cupid", "Beefeater", "Tonic Water", "Mojito", "Crispy Corn",
    "Water Melon", "Jalapeno Tini", "Panda Orange", "Dice 'n' Nice",
    "Vodka Prawn", "Chermoula Paneer", "Adana Kebab", "Don Azus",
    "Mango Meltdown", "Cosmopolitan", "Monkey Shoulder", "Jalapeno Pops",
    "Muttai Paniyaram", "Johnnie Walker Red Label", "Tiramisu", "Kunafeh",
    "Choice Of Ice Cream"
]

# Create a list of quantities and prices 
quantities = [1, 1, 2, 2, 3, 1, 1, 1, 1, 1, 1, 1, 8, 1, 1, 2, 1, 1, 1, 1, 1] 
prices = [499.00, 99.00, 59.00, 299.00, 249.00, 699.00, 299.00, 299.00, 449.00, 449.00, 299.00, 449.00, 249.00, 799.00, 599.00, 299.00, 249.00, 279.00, 299.00, 349.00, 349.00, 199.00] 

# Create a DataFrame 
df = pd.DataFrame({"Item": items, "Quantity": quantities, "Price": prices}) 

# Calculate subtotal 
df["Subtotal"] = df["Quantity"] * df["Price"] 

# Calculate total bill without tax and service charge 
total_without_tax_service = df["Subtotal"].sum() 


# Calculate tax and service charge 
tax_rate = 0.145 
service_charge = 406.35 
cgst_rate = 0.025 
sgst_rate = 0.025 
 

tax = total_without_tax_service * tax_rate 
total_with_tax_service = total_without_tax_service + tax + service_charge + (total_without_tax_service * cgst_rate) + (total_without_tax_service * sgst_rate) 

# Print the DataFrame and total bill 
print(df) 
print("\nTotal bill without tax and service charge:", total_without_tax_service) 
print("Total bill with tax and service charge:", total_with_tax_service)

Qwen2-VL-Max

Qwen VLM successfully generated an accurate table, but faced challenges with larger prompts, leading to slow response times. While the first question was answered correctly, it struggled with the second, failing to accurately calculate the total bill after removing the Adana Kebab, as it didn’t proportionally adjust the taxes and service charge. The responses to the third, fourth, and fifth questions were mostly correct, with only slight numerical deviations by a few points.

QWEN example.svg

LLaVA OneVision

LLaVA OneVision was unable to generate tables from the bill, though it accurately answered the first question. For the second question, the model failed to calculate the proportional amount after excluding the Adana Kebab. It provided reasonably accurate responses for the fourth and fifth questions, but struggled to process large prompts, requiring multiple prompts to be submitted.

LlaVA one vision.svg

InternVL2

InternVL2 was able to partially generate a table from the bill but struggled with processing long prompts in a single instance, necessitating follow-up questions. This often led to incomplete responses. While it calculated the first question accurately, the second answer was partial and inaccurate. The model provided fairly accurate answers for questions three and four. However, for the fifth question, it only partially generated the response and failed to complete it even after a follow-up prompt.

intern vl2.svg

Ovis 1.6 Gemma

Ovis 1.6 Gemma was unable to convert the bill into a table and struggled with lengthy prompts, requiring the queries to be broken down into follow-up questions. The first answer was accurate, but the response to the second question was incorrect, as the model failed to calculate the proportionate amount after excluding the Adana Kebab. It completely miscalculated the third question, while the fourth was roughly correct. The model also provided an incorrect response to the fifth question, where it miscalculated the amount after including the percentage share with tax and service charges.

ovis.svg

The analysis highlights the limitations of both large language models (LLMs) and visual language models (VLMs) when it comes to handling slightly complex mathematical tasks and data analysis. While models like ChatGPT were able to solve basic arithmetic and perform proportional splits in a restaurant bill, they struggled with more intricate calculations involving taxes, percentages, and logical reasoning. In cases where the prompt involved removing items from the bill and recalculating taxes or service charges, some models required multiple prompts and follow-up queries to arrive at the correct answer. Others failed altogether, producing hallucinations—incorrect or unrelated outputs—especially when faced with large or more detailed queries.

Interestingly, some models were able to arrive at the correct answer only after being given very specific instructions, while others simply could not handle the complexity, especially when multiple related calculations were involved. VLMs, in particular, showed limitations in processing large prompts and struggled with maintaining accuracy when dealing with mathematical operations that require logical progression.

This comparison underscores a broader challenge with LLMs and VLMs: despite being trained on vast datasets and capable of answering a wide range of queries, they still exhibit significant inaccuracies when it comes to handling slightly complex mathematics and logic-based tasks. Many of these models, though powerful, are not yet fully reliable for real-world applications that demand precise calculations or nuanced data interpretation. This calls for further fine-tuning and refinement of these models, particularly in areas like arithmetic and financial reasoning, to ensure they can handle even basic math with a higher degree of accuracy. Improvements in these areas will be crucial for making LLMs and VLMs more effective in practical use cases where such calculations are essential.

Related Blogs

Top 5 AI Tools That Should Be Used in the UAE: Do you use any of these already?

The United Arab Emirates (UAE) is not only keeping pace with the AI revolution; it's leading the way. With forward-looking policies such as the UAE Strategy for Artificial Intelligence 2031, the nation has established itself as a global testbed for AI innovation.

The When, Why and for Whom: a comparison of Frontend Frameworks React, Svelte and Solid.js

As a developer, choosing the right frontend framework can significantly impact the performance, maintainability, and scalability of your web applications. This article provides an in-depth comparison of three popular frameworks: React, Svelte, and Solid.js, from a developer's perspective.

Matplotlib vs. Plotly: Choosing the Right Data Visualization Tool

In a data-driven world, effective visualization is essential for analyzing complex datasets. Well-crafted visuals simplify intricate information, enhance storytelling, and make insights more accessible. Among the many tools available, Matplotlib and Plotly stand out as two of the most widely used Python libraries for data visualization. Each offers distinct features catering to different user needs. Let's explore their strengths, differences, and ideal use cases.

AI-Driven Social Listening: Decode Your Gamers' Minds & Boost Revenue

The gaming industry is a multi-billion-dollar battlefield where player sentiment shifts rapidly. Every day, millions of gamers voice their opinions, frustrations, and desires on platforms like Reddit, Twitter, Discord, and Twitch. But are you truly listening?

DeepSeek Rising: How an Open-Source Challenger Is Cracking OpenAI’s Fortress

The AI race has long been dominated by proprietary giants like OpenAI, but a new contender is making waves—DeepSeek. With its latest open-source models, DeepSeek V3 and DeepThink R1, this Chinese AI company is challenging OpenAI’s dominance by offering competitive performance at a fraction of the cost. DeepSeek’s Mixture of Experts (MoE) architecture, efficient GPU utilization, and strategic innovations have enabled it to deliver high-performance AI models with minimal computational expense. But how does it truly compare to OpenAI’s GPT-4o and GPT-o1? Let's break it down.
