Now, AI agents are being adopted virtually everywhere. Theyāre in healthcare, education, architecture, the legal services, the list goes on…
Their adoption is being driven by advanced AI models, both proprietary ones like OpenAI’s GPT, and open-source frameworks like Meta’s LLaMA, which are becoming increasingly accessible to developers by the day.
Nevertheless, significant challenges remain. And while each challenge is specific to its industry, system integration and the need to ensure accurate and reliable outputs is a challenge that unites all developers working with AI.
To get a better understanding of the real-world developer experience when working with these models, we spoke with Imran Gadzhiev, founder of the Zema app and an experienced software engineer. He has had unique experience in applying AI to culturally/socially sensitive domains such as religion and healthcare, where the accuracy and impartiality of the output is especially crucial.
Above: Imran Gadzhiev, founder of the Zema app
In this article, we bring you insights from Imranās first hand experience in integrating these proprietary AI models with different types of data, which is arguably one of the biggest challenges that developers now face in deploying AI applications.
AI is actively being integrated in so many industries, and across all stages of development, too. What made you decide to start using AI agents in your own projects and the ones youāve worked on for other companies?
“Initially, it was the need to analyse large volumes of data. Iāve turned to AI in my project, Zema, which focuses on religion, and in cases Iāve worked on in Health Tech. The second use case was a natural next step ā response generation. And that doesnāt just mean chatbots answering direct questions. For example, in Health Tech, AI is widely used for image or scan analysis and text generation. Weāve worked with images for diagnoses and combined models for both image and text processing. The AI would generate a response ā an image description ā which is a response, even if it requires a specialist to decode this data.”
“Another case where Iāve seen AI be extremely useful in my work is contextual analysis, which is applicable in both medical and religious fields. Contextual analysis allows models to understand the relationships and dependencies between different elements of data more deeply. Maybe itās understanding authorship, historical contexts, or certain events. All of these intricate interconnectionsā¦ For example, letās say there was a war in the 10th century, and there are remnants of published information about it. The model would need to take that into account too, along with more recent interpretations of the same war.”
“These three use cases ā processing large volumes of data, generating responses, and performing contextual analysis ā are the main ones I deal with as a developer.”
Both medicine and religion seem like fields that demand a lot from AI. How does that affect developers like yourself?
“That is absolutely true ā both of these examples are quite challenging in terms of working with AI. Take Zema, for example. Itās a chatbot that answers user questions based on religious texts, but we had to make sure that the answers were generated solely from those specific sources, not from GPTās general knowledge and interpretations. This is a huge responsibility, and a big task too. What I did was feed the sources into the GPT model, and fine-tune it to ensure it responded strictly according to these sources.”
“From a technical standpoint, this required fully isolating the model from being able to make its own conclusions. The idea was to completely prevent GPT from adding any creative input whatsoever.”
“In both medicine and religion, accuracy is crucial. I knew how significant a role it plays in these fields, and made sure I could guarantee that for our users.”
What steps did you take to minimize hallucinations from the AI model?
“This is a big question. Plus, a crucial one. I took a comprehensive approach. First, fine-tuning. I feel I need to emphasize how important an iterative approach is here. In the early versions, Zema didnāt even respond to user questions ā it simply pulled relevant lines from our database. This was because I first wanted to focus on collecting benchmark datasets, which I could later use to fine-tune the GPT model. The task was to structure the datasets correctly, preprocess them, and then fine-tune the model in repeated cycles. Then, with each iteration, Zema got better and better, learning from the data and reducing distortions.”
“It wasnāt just about fine-tuning, though. I focused on ensuring it remained relevant. As new materials were added and new articles appeared on the internet, everything had to be kept up to date.”
“I also used prompt engineering ā structuring the prompts in ways that would guide the model toward more accurate responses. So, when asking the model a question, weād include context that made it easier for the AI to pull from the correct source. For example, you can say: ‘summarize this article‘, but an improved prompt would look more like this: ‘can you provide a concise summary of the following article, focusing on its main findings, conclusions, and any significant implications for future research? Article: [insert article text].’“
“Additionally, I worked to prevent model overfitting during fine-tuning. Overfitting is a common problem in machine learning, where a model performs well on training data but does poorly with new data. If we update materials or if the AI agent accesses the internet, the model might not perform as well. To combat this, I used regularization techniques to prevent overfitting.”
“Here, iterative training and fine-tuning are still very relevant. Cycles of testing and retraining to make the model much more accurate every time. We even found a specialist who helped create a quality control system for the answers. I also built a mini-monitoring service to track any deviations or errors in performance. And of course, we gathered feedback from real users via email.”
“As you can see, it was a very multi-layered approach. Not everything was automated, but this helped maintain the modelās ability to generate accurate answers and minimize hallucinations as much as possible.”
“Fortunately, developers of large proprietary models have done a lot of work to adapt AI to the real world, so both doctors (in the medical fieldās case) and users understand the possibility of hallucinations occurring, even despite developersā best efforts to eliminate them.”
For working in such sensitive fields, do you use one or multiple proprietary models, or are you developing something of your own?
“Iām a full-stack developer and solo founder, and my personal projects are startups, so naturally, I use ready-made models, as most do and would in my case. For most of my products, I use OpenAI, Meta, and Mistral models. Initially, I only used Mistral, but now I exclusively use OpenAI, because they have quite powerful NLP tools.”
“As I said, I donāt create custom models, but I do use external libraries or mathematical algorithms for certain tasks. For example, logistic regression is great for solving binary classification tasks. I usually use such algorithms for preprocessing data for further model training.”
“For example, some data isnāt suitable for us ā like information that doesnāt relate to medicine, religion, finance, or any specific jurisdiction, it doesnāt really matter. So, how do we determine that itās irrelevant? In theory, the agent can pull information from any category. However, it can also retrieve data from categories we donāt want or want to exclude. Thatās exactly why we create a layer to classify this data and figure out whether itās relevant to us or not.”
Speaking of data, how important is it for fine-tuning a proprietary model for a just small product?
“Data quality is still key. It essentially determines how viable the products such as chatbots will be. I even have a saying, ‘a model in a product is only as good as its data’.ā
“The better we gather and preprocess the data, the better the results from the model will be. As I’ve mentioned, I put a lot of resources into this part of my work as a developer, using logistic regression to help with it. Ultimately, the potential of GPT being successful in my product will hinge on how well we handle this aspect.”
Lastly, could you share your thoughts on the importance of working with ML for developers’ professional development? Regardless of whether they’re on a founder’s path or following a traditional career, how essential is it to have ML skills if you’re not an ML engineer?
“Everything I’ve described, including classifications, are common tasks. But whoās going to do that? Surely, data scientists wonāt be the ones dealing with it. So even if you’re not working directly with the technology, if you are a backend or a full-stack developer ā you will be the one integrating it.”
“Given this info, if you want to stay afloat in the market, then learning how to work with AI is key, regardless if you are planning on launching your own product and leading it tech-wise, or working with companies that are set on making AI a key component of their products.”