The Random Walk Blog

2024-09-27

The Story of a Bill: How Well Can AI Models Handle Real-World Math

Large Language Models (LLMs) have proven effective in tasks like natural language processing, sentiment analysis, data extraction, and answering questions.

When asked, ChatGPT describes complex mathematical operations as advanced techniques such as calculus, matrix operations, differential equations, optimization, and probability, often used to solve real-world problems in fields like engineering, physics, and economics. It describes complex data analysis as multivariate analysis, statistical modeling, time series analysis, machine learning, and big data handling, used to discover patterns, make predictions, and draw insights from large datasets.

Recently, we explored several LLMs’ ability to handle basic mathematical and analytical operations, including addition, subtraction, multiplication, division, percentage calculations, and financial data analysis. While LLMs can manage basic arithmetic, we tested their ability to solve more complex tasks using a restaurant bill: converting the bill into a table, splitting the total amount, and calculating each person's percentage share.

First, let’s take the restaurant bill and try to understand how LLMs can answer complex questions through prompt engineering. We start with basic math and data problems.

Bill.svg

We started by using ChatGPT 4.0 to transform the restaurant bill into a structured table, and it handled the task flawlessly.

ChatGPT example 1.svg

Our second question was to compute the total of the bill, excluding and including taxes. The result is as follows:

To compute the total of the bill:
Excluding Taxes:
The subtotal given on the receipt is ₹13,545.00.
Including Taxes:
We need to include the service charge and all applicable taxes:
Subtotal: ₹13,545.00
Service Charge: ₹406.35
VAT (14.5%): ₹1,424.92
CGST (2.5%): ₹92.95
SGST (2.5%): ₹92.95
Round off: -₹0.17
Total including taxes:
Total = 13,545.00 + 406.35 + 1,424.92 + 92.95 + 92.95 − 0.17 = ₹15,562.00
Summary:
Total Excluding Taxes: ₹13,545.00
Total Including Taxes: ₹15,562.00
The receipt’s calculations are accurate, showing a total of ₹15,562.00 including taxes.

As seen here, ChatGPT accurately performed all the calculations based on the data from the table.
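The arithmetic behind this check is easy to reproduce. The following is a minimal Python sketch, not part of the original experiment, that sums the components exactly as they are listed in ChatGPT's response. It uses `Decimal` rather than floats so that currency values are not affected by binary rounding; the dictionary keys are illustrative labels.

```python
from decimal import Decimal

# Components copied from the response above (illustrative labels).
components = {
    "Subtotal": Decimal("13545.00"),
    "Service Charge": Decimal("406.35"),
    "VAT (14.5%)": Decimal("1424.92"),
    "CGST (2.5%)": Decimal("92.95"),
    "SGST (2.5%)": Decimal("92.95"),
    "Round off": Decimal("-0.17"),
}

# Start the sum from Decimal("0") so the result stays a Decimal.
grand_total = sum(components.values(), Decimal("0"))
print(f"Total including taxes: {grand_total}")
```

The printed total matches the ₹15,562.00 on the receipt.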

Next, we asked ChatGPT, _“What would be the bill total including and excluding taxes if Adana Kebab was not purchased?”_

To calculate the total of the bill excluding and including taxes if the Adana Kebab (₹449.00) was not purchased, we need to subtract its cost from the totals.
Excluding Taxes:
Original Subtotal: ₹13,545.00
Adana Kebab Cost: -₹449.00
New Subtotal:
New Subtotal= 13,545.00−449.00 = ₹13,096.00
Including Taxes:
We now recalculate the service charge and taxes based on the new subtotal:
Service Charge: (Keeping the original ratio)
New Service Charge = 406.35/13,545.00×13,096.00 = ₹392.80
VAT (14.5%): New VAT = 1,424.92/13,545.00×13,096.00 = ₹1,378.92
CGST (2.5%): New CGST = 92.95/13,545.00×13,096.00 = ₹89.92
SGST (2.5%): New SGST = 92.95/13,545.00×13,096.00 = ₹89.92
Round off: (keeping it the same, as it won't change much): -₹0.17
New Total including taxes:
New Total = 13,096.00+392.80+1,378.92+89.92+89.92−0.17 = ₹15,047.39
Summary:
Total Excluding Taxes (without Adana Kebab): ₹13,096.00
Total Including Taxes (without Adana Kebab): ₹15,047.39
If the Adana Kebab was not purchased, the total bill including taxes would be approximately ₹15,047.39.

Here, ChatGPT successfully calculated the total amount excluding service charges and taxes. However, it struggled to provide the correct total amount when taxes were included. This was due to its inability to recognize that VAT applies only to alcoholic products, while GST is applied to food and non-alcoholic items. After we provided the model with this information, it generated the following result:

Given that VAT is only applied to alcohol and CGST & SGST are applied to non-alcoholic drinks and food, we need to adjust the calculations accordingly when removing the Adana Kebab from the bill.
Total Non-Alcohol Cost (Before Removing Adana Kebab): ₹6509.00
Total Non-Alcohol Cost (After Removing Adana Kebab): ₹6509.00 − 449.00 = ₹6060.00
Step 1: Recalculate VAT
Original VAT: 14.5% on ₹7036.00 = 7036.00×0.145 = ₹1020.22
Step 2: Recalculate CGST and SGST
Original CGST: 2.5% on ₹6509.00 = 6509.00×0.025=₹162.73
Original SGST: 2.5% on ₹6509.00 = 6509.00×0.025=₹162.73
New CGST: 2.5% on ₹6060.00 = 6060.00×0.025=₹151.50
New SGST: 2.5% on ₹6060.00 = 6060.00×0.025=₹151.50
Step 3: Recalculate the Total Bill
Subtotal After Removing Adana Kebab: ₹13,545.00 - ₹449.00 = ₹13,096.00
Service Charge: ₹392.80 (as calculated earlier)
New Total Including Taxes:
New Total = 13,096.00+392.80+1020.22+151.50+151.50−0.17 = ₹14,811.85
Summary:
Total Excluding Taxes (without Adana Kebab): ₹13,096.00
Total Including Taxes (without Adana Kebab): ₹14,811.85
The new total bill, after removing the Adana Kebab and correctly applying the taxes, would be approximately ₹14,811.85.

In this case, the LLM confused food and beverages, leading to inaccuracies in calculating their respective costs. To clarify, we provided the total cost of food, excluding taxes and service charges.

Thank you for the clarification. Let's recalculate the bill with the correct breakdown:
Initial Totals:
Total Cost of Food (including non-alcoholic drinks): ₹3,720.00
Total Cost of Alcohol: ₹9,825.00
Taxes:
VAT (14.5% on Alcohol):
VAT on ₹9,825.00 = 9,825.00×0.145=₹1,424.63
CGST & SGST (2.5% each on Food):
CGST on ₹3,720.00 = 3,720.00×0.025=₹93.00
SGST on ₹3,720.00 = 3,720.00×0.025=₹93.00
Service Charge: ₹406.35
If Adana Kebab (₹449.00) is Removed:
New Total Food Cost: ₹3,720.00 - ₹449.00 = ₹3,271.00
Alcohol Cost: ₹9,825.00 (unchanged)
Recalculated Taxes:
VAT on Alcohol (unchanged): ₹1,424.63
New CGST & SGST on Food:
CGST on ₹3,271.00 = 3,271.00×0.025=₹81.78
SGST on ₹3,271.00 = 3,271.00×0.025=₹81.78
Recalculate the Bill:
New Subtotal:
New Subtotal = 3,271.00+9,825.00=₹13,096.00
Service Charge: ₹406.35
New Total Including Taxes:
New Total=13,096.00+406.35+1,424.63+81.78+81.78−0.17 = ₹15,090.37
Summary:
Total Excluding Taxes (without Adana Kebab): ₹13,096.00
Total Including Taxes (without Adana Kebab): ₹15,090.37
The new total bill, after removing Adana Kebab and correctly applying the taxes, would be approximately ₹15,090.37.

This prompt enabled the LLM to accurately calculate the costs of alcoholic items and determine the new totals, both with and without taxes and service charges. However, one issue remained: it did not proportionally reduce the service charge after removing the Adana Kebab. To address this, we provided two additional prompts, which resulted in the correct calculation.
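The rules the model had to be told can be written down in a few lines. The sketch below is our own illustration under stated assumptions, not the code ChatGPT produced: the food and alcohol totals are the figures given to the model in the clarification above, the 3% service rate is inferred from 406.35 / 13,545.00, the function name `bill_breakdown` is hypothetical, and the receipt's round-off line is ignored.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Assumption: 3% service charge, inferred from 406.35 / 13,545.00.
SERVICE_RATE = Decimal("0.03")
VAT_RATE = Decimal("0.145")  # applies to alcohol only
GST_RATE = Decimal("0.025")  # CGST and SGST each, on food and non-alcoholic drinks


def to_paise(value):
    """Round a rupee amount to two decimal places (halves round up)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def bill_breakdown(food, alcohol):
    """Apply the service charge to the subtotal and each tax to its own category."""
    subtotal = food + alcohol
    service = to_paise(subtotal * SERVICE_RATE)
    vat = to_paise(alcohol * VAT_RATE)
    cgst = to_paise(food * GST_RATE)
    sgst = to_paise(food * GST_RATE)
    return {
        "subtotal": subtotal,
        "service": service,
        "vat": vat,
        "cgst": cgst,
        "sgst": sgst,
        "total": subtotal + service + vat + cgst + sgst,
    }


food_total = Decimal("3720.00")     # figure given to the model
alcohol_total = Decimal("9825.00")  # figure from the model's response
adana_kebab = Decimal("449.00")

# Removing a food item lowers the subtotal, the service charge and the GST,
# but leaves the VAT on alcohol unchanged.
without_kebab = bill_breakdown(food_total - adana_kebab, alcohol_total)
for name, value in without_kebab.items():
    print(f"{name:>8}: {value}")
```

Under these assumptions the service charge drops to ₹392.88 once the kebab is removed, which is the proportional reduction the model initially missed.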

Our next query was to calculate each person’s total and share in percentage. For this, we gave each person’s share. The following is the query: _“Jaya-2328.60, Vijay-2127.60, Jithin-2628.60, Gautham-3077.60, Kiran-3382.60. Calculate total and each persons share in percentage.”_

ChatGPT's response was close but not entirely accurate. For the next query, we required the individual split for food and drinks. To facilitate this, we provided a list of the drinks each person had, as the food was shared equally among them. The prompt is given below.

_Drinks are charged on the person and the food is shared commonly. Here are the drinks and who had it: Jaya-Cupid, Cosmopolitan, Don Azus (2). Jithin-Mango Meltdown, Mojito (1), Don Azus (2). Gautham-Jagger Bomb, Mojito (1), Don Azus (1). Kiran-Jhonnie Walker Red Label, Monkey Shoulder, Don Azus (2). Vijay-Beefeater, tonic water, Don Azus (1). Show me each persons split._

    # Initial amounts based on the previous breakdown
    total_food_cost = 3271.00  # New food subtotal after removing Adana Kebab
    total_service_charge = 392.94  # New service charge after removing Adana Kebab
    total_food_cost_with_tax = total_food_cost + (81.78 + 81.78)  # Adding CGST and SGST

    # Drinks cost per person
    drinks_cost = {
        "Jaya": 499.00 + 599.00 + (249.00 * 2),
        "Jithin": 799.00 + 599.00 + (249.00 * 2),
        "Gautham": 499.00 + 599.00 + 249.00,
        "Kiran": 299.00 + 599.00 + (249.00 * 2),
        "Vijay": 349.00 + 99.00 + 249.00
    }

    # Calculate each person's total cost including their drinks and their share of the food cost
    total_cost = drinks_cost.copy()
    num_people = len(total_cost)

    # Each person's share of the food and service charge
    shared_cost_per_person = (total_food_cost_with_tax + total_service_charge) / num_people
    for person in total_cost:
        total_cost[person] += shared_cost_per_person

    total_cost

Result

    {'Jaya': 2361.5, 'Jithin': 2661.5, 'Gautham': 2112.5, 'Kiran': 2161.5, 'Vijay': 1462.5}

In this instance, we requested the total split of the bill, excluding taxes and service charges. However, ChatGPT mistakenly excluded the cost of the Adana Kebab from its calculations. When we asked for a breakdown that included the Adana Kebab, it again provided an incorrect answer. After giving a second prompt, the LLM finally delivered the correct calculation.

Apologies for the confusion. Let's calculate the correct split for the total bill amount of ₹13,545 based on the drinks each person had and the food cost being shared equally.
Step-by-Step Breakdown:
Total Bill Amount: ₹13,545
Drinks Total (Alcohol): ₹9,825 (VAT: ₹1,424.63)
Food Total (including Adana Kebab): ₹3,720 (CGST & SGST: ₹186.00)
Drinks Cost Per Person:
Jaya: Cupid (₹499) + Cosmopolitan (₹599) + Don Azus (2 × ₹249) = ₹1,596
Jithin: Mango Meltdown (₹799) + Mojito (₹599) + Don Azus (2 × ₹249) = ₹1,895
Gautham: Jagger Bomb (₹499) + Mojito (₹599) + Don Azus (₹249) = ₹1,347
Kiran: Jhonnie Walker Red Label (₹299) + Monkey Shoulder (₹599) + Don Azus (2 × ₹249) = ₹1,395
Vijay: Beefeater (₹349) + tonic water (₹99) + Don Azus (₹249) = ₹697
Shared Food and Service Charge Cost:
Total Food Cost: ₹3,720
Service Charge: ₹406.35
CGST & SGST: ₹186.00
Shared Cost Per Person = (Food Cost + Service Charge + CGST & SGST) / 5 = (3,720 + 406.35 + 186.00) / 5 = ₹862.07
Final Calculation:
Jaya's Total: ₹1,596 (Drinks) + ₹862.07 (Shared Food) = ₹2,458.07
Jithin's Total: ₹1,895 (Drinks) + ₹862.07 (Shared Food) = ₹2,757.07
Gautham's Total: ₹1,347 (Drinks) + ₹862.07 (Shared Food) = ₹2,209.07
Kiran's Total: ₹1,395 (Drinks) + ₹862.07 (Shared Food) = ₹2,257.07
Vijay's Total: ₹697 (Drinks) + ₹862.07 (Shared Food) = ₹1,559.07
Summary of Each Person’s Total:
Jaya: ₹2,458.07
Jithin: ₹2,757.07
Gautham: ₹2,209.07
Kiran: ₹2,257.07
Vijay: ₹1,559.07
These amounts correctly split the total bill of ₹13,545 among everyone based on their drinks and an equal share of the food and service charges.
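The splitting rule itself (drinks charged to the person, food and its charges shared equally) can be sketched in Python as follows. This is an illustration rather than the model's own code: the per-person drink totals and the shared amounts are copied from the response above, and the variable names are ours.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-person drink totals, copied from the response above.
drinks = {
    "Jaya": Decimal("1596"),
    "Jithin": Decimal("1895"),
    "Gautham": Decimal("1347"),
    "Kiran": Decimal("1395"),
    "Vijay": Decimal("697"),
}

# Amounts shared equally among everyone, also copied from the response.
shared_costs = {
    "food": Decimal("3720.00"),
    "service_charge": Decimal("406.35"),
    "cgst_sgst": Decimal("186.00"),
}

shared_total = sum(shared_costs.values(), Decimal("0"))
shared_per_person = (shared_total / len(drinks)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)

# Each person pays for their own drinks plus an equal slice of the shared costs.
split = {person: cost + shared_per_person for person, cost in drinks.items()}

print(f"Shared cost per person: {shared_per_person}")
for person, amount in split.items():
    print(f"{person}: {amount}")
```

Run as written, the shared cost comes to ₹862.47 per person, so the ₹862.07 in the response is worth double-checking.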

Next, we requested a list of food and non-alcoholic drinks from the LLM. However, it hallucinated and provided an inaccurate response, including items that were not on the bill.

chatgpt example 2.svg

The next query was, _“Due to discount assume the bill is 12877 and not 15562. Now what is each persons share?”_

    # Re-defining the amounts and total to correct the issue
    amounts = {
        "Jaya": 2328.60,
        "Vijay": 2127.60,
        "Jithin": 2628.60,
        "Gautham": 3077.60,
        "Kiran": 3382.60
    }

    # Total of the initial amounts
    total_initial_amount = sum(amounts.values())

    # New total bill after discount
    new_total_bill = 12877

    # Calculate the new percentage share and total amount for each person
    new_total_amounts = {}
    for person, amount in amounts.items():
        percentage_share = amount / total_initial_amount
        new_total_amounts[person] = percentage_share * new_total_bill

    new_total_amounts

ChatGPT's response was nearly correct, reflecting each person's proportional share of the discounted bill. However, it required multiple prompts and follow-up queries to grasp the logic of adding and subtracting taxes and service charges when removing an item and splitting each person’s bill by percentage. Additionally, it began to hallucinate when we asked it to list the names of the items on the bill, providing incorrect answers. To streamline the process, we consolidated all the prompts and follow-up queries into a single prompt and posed the same question in a separate chat. Here’s the prompt:

_Convert the above bill into a table format and answer the below questions:_

_1. Compute the total bill cost with and without - tax and service charge._

_2. What would be the bill total including and excluding taxes and service charge if Adana Kebab was not purchased?_

_3. Jaya-2328.60, Vijay-2127.60, Jithin-2628.60, Gautham-3077.60, Kiran-3382.60. Calculate each persons share in percentage. The total amount should be taken without the tax and service charge._

_4. Jaya-2328.60, Vijay-2127.60, Jithin-2628.60, Gautham-3077.60, Kiran-3382.60. Calculate each persons share in percentage. The total amount should be taken with the tax and service charge._

_Considerations:_

_1. VAT is calculated only for alcohol and CGST and SGST are calculated for non-alcoholic drinks and food._

_2. Below is a list of food, alcoholic and non-alcoholic answers._

_Alcoholic drinks: Cupid, Beefeater, Jagger Bomb, Don Azus, Mango meltdown, Water Melon Jalapeno Tini, Panda Orange, Dice N Nice, Cosmopolitan, Monkey shoulders, Johnnie Walker Red label,  Vodka Prawn_

_Non-Alcoholic drinks: Tonic Water, Mojito_

_Food: Crispy Corn, Jalapeno pops, Chermoula Paneer, Adana Kebab, Muttai Paniyaram, Tiramisu, Ice cream, Kunafeh_

_3. Do no make approximations, give the correct answer upto 2 decimal points_

_4. Do not make assumptions_

_5. Do not provide code_

_6. Give the exact calculations for every step_

ChatGPT converted it into a table and gave us a response which is given below.

1. Compute the total bill cost with and without tax and service charge.
Without tax and service charge: ₹13,545.00
With tax and service charge: ₹15,562.00
2. What would be the bill total including and excluding taxes and service charge if Adana Kebab was not purchased?
Without tax and service charge (excluding Adana Kebab):
Total (before tax): ₹13,545.00 - ₹449.00 = ₹13,096.00
Tax and Service Charges Calculation (excluding Adana Kebab):
Service Charge: ₹406.35 - (₹406.35 / ₹13,545.00) × ₹449.00 = ₹389.81
VAT: ₹1424.92 (no change since Adana Kebab is food, not alcohol)
CGST: ₹92.95 - (₹92.95 / ₹13,545.00) × ₹449.00 = ₹89.87
SGST: ₹92.95 - (₹92.95 / ₹13,545.00) × ₹449.00 = ₹89.87
New Grand Total: ₹13,096.00 + ₹389.81 + ₹1424.92 + ₹89.87 + ₹89.87 = ₹15,090.47
3. Jaya-₹2328.60, Vijay-₹2127.60, Jithin-₹2628.60, Gautham-₹3077.60, Kiran-₹3382.60.
Calculate each person's share in percentage of the total amount without tax and service charge (₹13,545.00):
Total Contribution: ₹2328.60 + ₹2127.60 + ₹2628.60 + ₹3077.60 + ₹3382.60 = ₹13545.00
Jaya's Share: (₹2328.60 / ₹13,545.00) × 100 = 17.18%
Vijay's Share: (₹2127.60 / ₹13,545.00) × 100 = 15.70%
Jithin's Share: (₹2628.60 / ₹13,545.00) × 100 = 19.40%
Gautham's Share: (₹3077.60 / ₹13,545.00) × 100 = 22.71%
Kiran's Share: (₹3382.60 / ₹13,545.00) × 100 = 24.98%
4. Jaya-₹2328.60, Vijay-₹2127.60, Jithin-₹2628.60, Gautham-₹3077.60, Kiran-₹3382.60.
Calculate each person's share in percentage of the total amount with tax and service charge (₹15,562.00):
Total Contribution: ₹2328.60 + ₹2127.60 + ₹2628.60 + ₹3077.60 + ₹3382.60 = ₹13,545.00
Jaya's Share: (₹2328.60 / ₹15,562.00) × 100 = 14.96%
Vijay's Share: (₹2127.60 / ₹15,562.00) × 100 = 13.68%
Jithin's Share: (₹2628.60 / ₹15,562.00) × 100 = 16.89%
Gautham's Share: (₹3077.60 / ₹15,562.00) × 100 = 19.78%
Kiran's Share: (₹3382.60 / ₹15,562.00) × 100 = 21.73%
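For comparison, the percentage shares in question 3 can be computed directly. The snippet below is a minimal sketch of our own; the helper name `percent_share` is hypothetical, and rounding uses half-up to two decimal places as the prompt requested.

```python
from decimal import Decimal, ROUND_HALF_UP

# Individual amounts from the prompt.
amounts = {
    "Jaya": Decimal("2328.60"),
    "Vijay": Decimal("2127.60"),
    "Jithin": Decimal("2628.60"),
    "Gautham": Decimal("3077.60"),
    "Kiran": Decimal("3382.60"),
}

total = sum(amounts.values(), Decimal("0"))


def percent_share(amount, whole):
    """Return amount as a percentage of whole, rounded to two decimals."""
    return (amount / whole * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


shares = {person: percent_share(amount, total) for person, amount in amounts.items()}

print(f"Total: {total}")
for person, share in shares.items():
    print(f"{person}: {share}%")
```

The individual amounts add up to the pre-tax subtotal of ₹13,545.00, and the rounded shares sum to exactly 100%.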

For the second question, the service charge that ChatGPT calculated after removing the Adana Kebab (₹389.81) differed slightly from the correct value of ₹392.88, which shifted the final total by a small amount. In response to the third question about percentage calculations, the LLM provided an answer that was approximately correct. However, it faltered on the fourth question, failing to calculate the split of the total amount with and without taxes and service charges. We followed up by asking for each person's amount after including their percentage share of tax and service charges, and it generated an answer that was approximately correct. To further clarify, we inquired about splitting the tax and service charges as a percentage for each person, which resulted in the correct response. We asked it to add these expenses to the individual amounts, and it provided an accurate response.

It gave the correct answer after the two follow-up questions. We also asked a query on the discounted bill: _“Due to discount assume the bill is 12877 and not 15562. Now what is each persons share”_. The LLM’s response was accurate.
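Both follow-ups reduce to the same operation: scale each person's pre-tax amount so that the amounts add up to a new total while keeping the same proportions. The sketch below illustrates this under that assumption; `scale_to` is a hypothetical helper, and because each amount is rounded separately the scaled amounts can differ from the target total by a paisa or two.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Individual pre-tax amounts from the prompt.
amounts = {
    "Jaya": Decimal("2328.60"),
    "Vijay": Decimal("2127.60"),
    "Jithin": Decimal("2628.60"),
    "Gautham": Decimal("3077.60"),
    "Kiran": Decimal("3382.60"),
}
pre_tax_total = sum(amounts.values(), Decimal("0"))


def scale_to(new_total):
    """Scale every amount proportionally so the shares target new_total."""
    return {
        person: (amount / pre_tax_total * new_total).quantize(
            CENT, rounding=ROUND_HALF_UP
        )
        for person, amount in amounts.items()
    }


with_tax = scale_to(Decimal("15562.00"))    # bill including tax and service charge
discounted = scale_to(Decimal("12877.00"))  # bill after the assumed discount

for person in amounts:
    print(f"{person}: with tax {with_tax[person]}, discounted {discounted[person]}")
```

If the rounded amounts must add up exactly, the leftover paise can be assigned to one person; that step is omitted here.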

Thus, we concluded that with accurate prompts and instructions, ChatGPT yields approximately accurate responses to math and financial data analysis questions, requiring only four follow-up inquiries. This highlights the potential benefits of fine-tuning LLMs for specific use cases. In this scenario, fine-tuning ChatGPT on a dataset of bill-related queries and arithmetic calculations could help improve its ability to accurately handle tasks like splitting bills, calculating taxes, and processing multiple related prompts in one go. By adapting the model to better understand financial and tabular data, its performance in similar contexts would likely improve, reducing the need for multiple clarifications and minimizing hallucinations.

We then tested the same prompt with various LLMs and VLMs, and the accuracy of their responses is summarized below:

Claude AI

Claude AI was successful in generating the table from the bill and calculating the basic total including and excluding taxes and service charge. However, it struggled to accurately calculate the query involving the removal of the Adana Kebab, incorrectly reducing the VAT despite our instruction that VAT applies only to alcoholic items. For the fourth question, which asked for each person’s amount including taxes and service charges, it failed to provide an accurate result. After prompting it again for the calculation with tax and service charge, it returned a nearly accurate figure, though it still differed slightly from the correct answer. In the subsequent query, we requested a split of the tax and service charge amounts in both percentage and absolute values for each person, to which it responded with an almost accurate number. When we inquired about the discount, it provided an approximately correct answer. Thus, similar to ChatGPT, Claude AI was also able to deliver nearly correct responses to the mathematical and data analysis queries following effective prompt engineering.

    def calculate_bill(items, exclude_item=None):
        subtotal = sum(item['price'] * item['quantity'] for item in items if item['name'] != exclude_item)
        alcohol_total = sum(item['price'] * item['quantity'] for item in items if item['category'] == 'Alcoholic' and item['name'] != exclude_item)
        non_alcohol_total = subtotal - alcohol_total
        service_charge = subtotal * 0.03
        vat = alcohol_total * 0.145
        cgst = non_alcohol_total * 0.025
        sgst = non_alcohol_total * 0.025
        total = subtotal + service_charge + vat + cgst + sgst
        return subtotal, total

    def calculate_share(amount, total):
        return (amount / total) * 100

    # Define items
    items = [
        {'name': 'Cupid', 'price': 499, 'quantity': 1, 'category': 'Alcoholic'},
        # ... Add all other items here
    ]

    # Calculate totals
    subtotal, total_with_tax = calculate_bill(items)
    subtotal_without_kebab, total_without_kebab = calculate_bill(items, 'Adana Kebab')

    # Calculate shares
    shares = [2328.60, 2127.60, 2628.60, 3077.60, 3382.60]
    names = ['Jaya', 'Vijay', 'Jithin', 'Gautham', 'Kiran']
    shares_without_tax = [calculate_share(share, subtotal) for share in shares]
    shares_with_tax = [calculate_share(share, total_with_tax) for share in shares]
    tax_and_service = total_with_tax - subtotal
    shares_with_tax_amount = [share + (percentage / 100 * tax_and_service) for share, percentage in zip(shares, shares_with_tax)]

    # Print results
    print(f"Total without tax: {subtotal:.2f}")
    print(f"Total with tax: {total_with_tax:.2f}")
    print(f"Total without Adana Kebab (without tax): {subtotal_without_kebab:.2f}")
    print(f"Total without Adana Kebab (with tax): {total_without_kebab:.2f}")
    for name, share_without, share_with, amount in zip(names, shares_without_tax, shares_with_tax, shares_with_tax_amount):
        print(f"{name}: {share_without:.2f}% (without tax), {share_with:.2f}% (with tax), ₹{amount:.2f}")

Mistral AI

Mistral AI initially failed to present the bill as a complete table. Its basic calculations for VAT, GST, the total without the Adana Kebab, and the service charge were incorrect, despite clear instructions. However, it managed to provide an approximate calculation of each person's percentage share. It also calculated the split amounts based on these percentages with reasonable accuracy and produced an approximate split amount after applying the discount.

    # Total Cost Calculation
    total_cost = 499.00 + (3 * 349.00) + 99.00 + (2 * 599.00) + (3 * 499.00) + 249.00 + 699.00 + 699.00 + 299.00 + 449.00 + 599.00 + 449.00 + 299.00 + 799.00 + 599.00 + 2 * 599.00 + 299.00 + 249.00 + 279.00 + 349.00 + 199.00 + 349.00
    total_cost = 13545.00
    # Service Charge
    service_charge = 406.35
    # VAT Calculation
    vat = 0.145 * total_cost
    vat = 1968.625
    # CGST Calculation
    cgst = 0.025 * 3054.00
    cgst = 76.35
    # SGST Calculation
    sgst = 0.025 * 3054.00
    sgst = 76.35
    # Grand Total Calculation
    grand_total = total_cost + service_charge + vat + cgst + sgst
    grand_total = 15562.00
    # Total Amount without Adana Kebab
    total_cost_without_adana = 499.00 + (3 * 349.00) + 99.00 + (2 * 599.00) + (3 * 499.00) + 249.00 + 699.00 + 699.00 + 299.00 + 599.00 + 449.00 + 299.00 + 799.00 + 599.00 + 2 * 599.00 + 299.00 + 249.00 + 279.00 + 349.00 + 199.00 + 349.00
    total_cost_without_adana = 13046.00
    # VAT Calculation without Adana Kebab
    vat_without_adana = 0.145 * total_cost_without_adana
    vat_without_adana = 1878.47
    # Grand Total without Adana Kebab
    grand_total_without_adana = total_cost_without_adana + service_charge + vat_without_adana + cgst + sgst
    grand_total_without_adana = 15563.52
    # Calculate each person's share in percentage without tax and service charge
    total_amount_without_tax = 13545.00
    jayas_share = (2328.60 / total_amount_without_tax) * 100
    jayas_share = 17.20
    vijays_share = (2127.60 / total_amount_without_tax) * 100
    vijays_share = 15.69
    jithins_share = (2628.60 / total_amount_without_tax) * 100
    jithins_share = 19.44
    gauthams_share = (3077.60 / total_amount_without_tax) * 100
    gauthams_share = 22.74
    kirans_share = (3382.60 / total_amount_without_tax) * 100
    kirans_share = 25.00
    # Calculate each person's share in percentage with tax and service charge
    total_amount_with_tax = 15562.00
    jayas_share_with_tax = (2328.60 / total_amount_with_tax) * 100
    jayas_share_with_tax = 14.95
    vijays_share_with_tax = (2127.60 / total_amount_with_tax) * 100
    vijays_share_with_tax = 13.67
    jithins_share_with_tax = (2628.60 / total_amount_with_tax) * 100
    jithins_share_with_tax = 16.88
    gauthams_share_with_tax = (3077.60 / total_amount_with_tax) * 100
    gauthams_share_with_tax = 19.76
    kirans_share_with_tax = (3382.60 / total_amount_with_tax) * 100
    kirans_share_with_tax = 21.73
    # Amount of each person after including the percentage share of tax and service charge
    jayas_amount_with_tax = total_amount_with_tax * (jayas_share_with_tax / 100)
    jayas_amount_with_tax = 2328.60
    vijays_amount_with_tax = total_amount_with_tax * (vijays_share_with_tax / 100)
    vijays_amount_with_tax = 2127.60
    jithins_amount_with_tax = total_amount_with_tax * (jithins_share_with_tax / 100)
    jithins_amount_with_tax = 2628.60
    gauthams_amount_with_tax = total_amount_with_tax * (gauthams_share_with_tax / 100)
    gauthams_amount_with_tax = 3077.60
    kirans_amount_with_tax = total_amount_with_tax * (kirans_share_with_tax / 100)
    kirans_amount_with_tax = 3382.60

Copilot

Copilot's answers were largely inaccurate. The data it converted into a table was incorrect, and it failed to accurately translate the GST amount from the bill into text format. Additionally, it hallucinated figures for the service charge and VAT. While it could calculate the percentage share, it was unable to accurately determine the amount for each person after including taxes and service charges.

Copilot example.svg

Gemini

Gemini successfully converted the bill into a table, but the data was inaccurate. It provided incorrect answers when calculating the amounts with taxes and service charges. However, it was able to give an approximately correct answer when calculating the percentage share and splitting each person’s amount. When we asked about the split after applying the discount, it hallucinated and initially failed to provide a response. After repeating the question, it delivered approximate figures, though they still varied slightly from the correct amounts.

    import pandas as pd

    # Create a list of items from the bill
    items = [
        "Cupid", "Beefeater", "Tonic Water", "Mojito", "Crispy Corn",
        "Water Melon", "Jalapeno Tini", "Panda Orange", "Dice 'n' Nice",
        "Vodka Prawn", "Chermoula Paneer", "Adana Kebab", "Don Azus",
        "Mango Meltdown", "Cosmopolitan", "Monkey Shoulder", "Jalapeno Pops",
        "Muttai Paniyaram", "Johnnie Walker Red Label", "Tiramisu", "Kunafeh",
        "Choice Of Ice Cream"
    ]

    # Create a list of quantities and prices
    quantities = [1, 1, 2, 2, 3, 1, 1, 1, 1, 1, 1, 1, 8, 1, 1, 2, 1, 1, 1, 1, 1]
    prices = [499.00, 99.00, 59.00, 299.00, 249.00, 699.00, 299.00, 299.00, 449.00, 449.00, 299.00, 449.00, 249.00, 799.00, 599.00, 299.00, 249.00, 279.00, 299.00, 349.00, 349.00, 199.00]

    # Create a DataFrame
    df = pd.DataFrame({"Item": items, "Quantity": quantities, "Price": prices})

    # Calculate subtotal
    df["Subtotal"] = df["Quantity"] * df["Price"]

    # Calculate total bill without tax and service charge
    total_without_tax_service = df["Subtotal"].sum()

    # Calculate tax and service charge
    tax_rate = 0.145
    service_charge = 406.35
    cgst_rate = 0.025
    sgst_rate = 0.025
    tax = total_without_tax_service * tax_rate
    total_with_tax_service = total_without_tax_service + tax + service_charge + (total_without_tax_service * cgst_rate) + (total_without_tax_service * sgst_rate)

    # Print the DataFrame and total bill
    print(df)
    print("\nTotal bill without tax and service charge:", total_without_tax_service)
    print("Total bill with tax and service charge:", total_with_tax_service)

Qwen2-VL-Max

Qwen VLM successfully generated an accurate table, but faced challenges with larger prompts, leading to slow response times. While the first question was answered correctly, it struggled with the second, failing to accurately calculate the total bill after removing the Adana Kebab, as it didn’t proportionally adjust the taxes and service charge. The responses to the third, fourth, and fifth questions were mostly correct, with only slight numerical deviations by a few points.

QWEN example.svg

LLaVA OneVision

LLaVA OneVision was unable to generate tables from the bill, though it accurately answered the first question. For the second question, the model failed to calculate the proportional amount after excluding the Adana Kebab. It provided reasonably accurate responses for the fourth and fifth questions, but struggled to process large prompts, requiring multiple prompts to be submitted.

LlaVA one vision.svg

InternVL2

InternVL2 was able to partially generate a table from the bill but struggled with processing long prompts in a single instance, necessitating follow-up questions. This often led to incomplete responses. While it calculated the first question accurately, the second answer was partial and inaccurate. The model provided fairly accurate answers for questions three and four. However, for the fifth question, it only partially generated the response and failed to complete it even after a follow-up prompt.

intern vl2.svg

Ovis 1.6 Gemma

Ovis 1.6 Gemma was unable to convert the bill into a table and struggled with lengthy prompts, requiring the queries to be broken down into follow-up questions. The first answer was accurate, but the response to the second question was incorrect, as the model failed to calculate the proportionate amount after excluding the Adana Kebab. It completely miscalculated the third question, and the fourth was roughly correct. The model also provided an incorrect response to the fifth question, where it miscalculated the amount after including the percentage share with tax and service charges.

ovis.svg

The analysis highlights the limitations of both large language models (LLMs) and visual language models (VLMs) when it comes to handling slightly complex mathematical tasks and data analysis. While models like ChatGPT were able to solve basic arithmetic and perform proportional splits in a restaurant bill, they struggled with more intricate calculations involving taxes, percentages, and logical reasoning. In cases where the prompt involved removing items from the bill and recalculating taxes or service charges, some models required multiple prompts and follow-up queries to arrive at the correct answer. Others failed altogether, producing hallucinations—incorrect or unrelated outputs—especially when faced with large or more detailed queries.

Interestingly, some models were able to arrive at the correct answer only after being given very specific instructions, while others simply could not handle the complexity, especially when multiple related calculations were involved. VLMs, in particular, showed limitations in processing large prompts and struggled with maintaining accuracy when dealing with mathematical operations that require logical progression.

This comparison underscores a broader challenge with LLMs and VLMs: despite being trained on vast datasets and capable of answering a wide range of queries, they still exhibit significant inaccuracies when it comes to handling slightly complex mathematics and logic-based tasks. Many of these models, though powerful, are not yet fully reliable for real-world applications that demand precise calculations or nuanced data interpretation. This calls for further fine-tuning and refinement of these models, particularly in areas like arithmetic and financial reasoning, to ensure they can handle even basic math with a higher degree of accuracy. Improvements in these areas will be crucial for making LLMs and VLMs more effective in practical use cases where such calculations are essential.
