Block link to some URLs
Hi, I’m having an issue with links to URLs. In general I want all URLs to be linked and referenced in my answers, except for some PDFs that I have uploaded. I’ve tried setting up my prompt in a way that forbids referencing certain PDFs, but somehow the system still overrides it. Would you have any tips in that regard?
In terms of the system, everything is up to date. I use GPT-4.1 and Voyage 3 Large.
Hi @ebzeta,
Glad to see you got the language issue sorted out. I probably should have started with your prompt last time; I had assumed you already had something like that in place! As for this, let’s start where that one ended. Would you mind sharing your AI behavior instructions related to hyperlinking citations, including URLs, and what you have tried so far to stop it from linking certain URLs?
This may be a little tricky: if you add an instruction like “You must cite the source URL you got the information from, but not PDFs”, the model will likely misinterpret it, because technically the PDFs are URLs on your website as well. It may treat “PDFs” as URLs because they are URLs that lead to a PDF.
I’ll also think about this from a non-prompt-engineering angle, to see whether there is an easy way to enforce this without relying on prompting.
Hey, thanks for the feedback!
As for the language issue, I should let you know that my solution seems to be a workaround: from what I’ve understood, once the knowledge base is in several languages, models strongly tend to reply in the language their reference source is written in, and working only via the prompt is a limited solution. Apparently you can go further if you work on the code / embeddings, but that’s way beyond my skills.
As for the PDFs, it seems to be trickier. This is what I have thus far:
- In my knowledge base there are several sources. Some of them, in PDF format, are sources that I don’t want to make publicly available.
- My prompt includes three hyperlink rules:
- 1 – hyperlink all URLs, with the following exceptions:
- 2 – never provide a hyperlink to this URL: XXXXX; hyperlink to this one instead: YYYYY
- 3 – never provide a hyperlink to this URL: ZZZZZ; extract content from the source instead
The first two rules work, but the third one always gets overridden, even with a different placement in the prompt or a different (stronger) formulation.
I’m afraid this may be something that lies deeper in the code. Would you have any ideas on how to proceed?
Thank you!
Hi @ebzeta,
Here is the solution I have for now. You would need to add this script to your functions.php file:
add_action('wp_footer', 'mxchat_pdf_filter_javascript');

function mxchat_pdf_filter_javascript() {
    ?>
    <script>
    function removePdfLinks() {
        // Target only the MxChat chatbot container
        const chatbotContainer = document.querySelector('#mxchat-chatbot');
        if (!chatbotContainer) return;

        // Replace PDF links inside the chatbot with their plain text
        // (the "i" flag makes the match case-insensitive, e.g. ".PDF")
        const pdfLinks = chatbotContainer.querySelectorAll('a[href*=".pdf" i]');
        pdfLinks.forEach(function (link) {
            // A text node avoids re-parsing the link text as HTML
            link.replaceWith(document.createTextNode(link.textContent));
        });

        // Remove plain PDF URLs from chatbot text
        const walker = document.createTreeWalker(chatbotContainer, NodeFilter.SHOW_TEXT);
        const textNodes = [];
        let node;
        while ((node = walker.nextNode())) {
            textNodes.push(node);
        }
        textNodes.forEach(function (textNode) {
            if (/\.pdf/i.test(textNode.textContent)) {
                textNode.textContent = textNode.textContent.replace(/https?:\/\/[^\s]+\.pdf(?:#[^\s]*)?\s?/gi, '');
            }
        });
    }

    document.addEventListener('DOMContentLoaded', function () {
        removePdfLinks();

        // Watch only the chatbot container for changes. This is set up here,
        // after the DOM is ready, so the container is found even if it is
        // printed later in the footer than this script.
        const chatbotContainer = document.querySelector('#mxchat-chatbot');
        if (chatbotContainer) {
            const observer = new MutationObserver(removePdfLinks);
            observer.observe(chatbotContainer, { childList: true, subtree: true });
        }
    });
    </script>
    <?php
}
This will automatically remove all PDF links from the chat UI as soon as they are rendered. I’m going to look into a more robust and streamlined solution that will be integrated directly into MxChat, but that will take some time to implement. If you aren’t comfortable modifying your functions.php file, I could likely turn this into a little plugin that you could upload instead.
Thanks again,
Maxwell
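To make the text cleanup step concrete, here is a minimal standalone sketch of the same regular expression applied to plain strings, outside the browser. The helper name `stripPdfUrls` and the example URLs are made up for illustration and are not part of MxChat:

```javascript
// Same pattern as the text-node cleanup in the snippet above: an http(s) URL
// ending in ".pdf", an optional "#fragment", and at most one trailing
// whitespace character.
const PDF_URL_PATTERN = /https?:\/\/[^\s]+\.pdf(?:#[^\s]*)?\s?/gi;

// Hypothetical helper: returns the text with matching PDF URLs removed.
function stripPdfUrls(text) {
  return text.replace(PDF_URL_PATTERN, '');
}

// A PDF URL is removed together with the space that follows it.
console.log(stripPdfUrls('See https://example.com/files/guide.pdf for details.'));

// Non-PDF URLs are left untouched.
console.log(stripPdfUrls('Read https://example.com/blog/post first.'));

// The match is case-insensitive and includes a "#page=2" style fragment.
console.log(stripPdfUrls('Open https://example.com/a.PDF#page=2 now'));
```

One limitation worth knowing: a query string after `.pdf` (for example `?download=1`) is not covered by this pattern, so that part would be left behind in the text.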
Hi Maxwell, thank you for this! I really appreciate it, though unfortunately it doesn’t fully solve my problem. My case is a bit more complex than this. I have:
– PDF files of books I’ve written that, when used, should hyperlink to the book’s URL on my website
– PDF files of private work I’ve done that I don’t want publicly available, so no hyperlink at all
– PDF files of examples for the chatbot to rely on, which I want used to develop an answer but not referenced in the answer itself
– PDF files of canvases or similar that I want linked to directly, so that users can download and use them should they need to.
I understand that this is turning into much more than I originally thought. I will leave this answer here as feedback for you, should you decide to use it for the more robust solution you mentioned, and in the meantime mark this thread as complete.
Thank you so much for your help!
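The four cases listed above could be expressed as a per-file policy table. This is only a sketch under assumptions: the file names, the target URL, and the helpers `fileNameFromUrl` and `resolveHref` are hypothetical and not part of MxChat. It also only decides which link is displayed; it does not change which sources the chatbot draws on.

```javascript
// Hypothetical policy table; file names and URLs are placeholders.
const pdfPolicies = new Map([
  // Book PDF: link to the book's page instead of the file
  ['my-book.pdf', { action: 'redirect', target: 'https://example.com/books/my-book/' }],
  // Private work: no hyperlink at all
  ['private-report.pdf', { action: 'strip' }],
  // Example material: used for answers, but never referenced
  ['style-example.pdf', { action: 'strip' }],
  // Downloadable canvas: link directly to the file
  ['canvas-template.pdf', { action: 'allow' }],
]);

// Extracts the lower-cased file name from a URL, ignoring ?query and #fragment.
function fileNameFromUrl(url) {
  const path = url.split(/[?#]/)[0];
  return path.substring(path.lastIndexOf('/') + 1).toLowerCase();
}

// Returns the href to display for a link, or null when the link should be removed.
function resolveHref(url) {
  const policy = pdfPolicies.get(fileNameFromUrl(url));
  if (!policy || policy.action === 'allow') return url; // unlisted files pass through
  if (policy.action === 'redirect') return policy.target;
  return null; // 'strip'
}

console.log(resolveHref('https://example.com/wp-content/uploads/my-book.pdf'));
console.log(resolveHref('https://example.com/wp-content/uploads/private-report.pdf'));
```

In the browser, a `null` result would mean replacing the link with its plain text, and any other result would be written back to the link's `href`.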
It’s no problem. That is an easy adjustment if I understand you correctly. You simply want to have control over which PDFs you block. Instead of the function working as a catch-all, it can be adjusted to block only certain PDF file names from a list. I’m going to build a much more robust solution directly into MxChat that will allow you to label sources, but this is just a temporary solution for you until I can get that completed (no ETA atm).
Use this instead in your functions.php:
add_action('wp_footer', 'mxchat_pdf_filter_javascript');

function mxchat_pdf_filter_javascript() {
    // Add your blocked PDF filenames here
    $blocked_pdfs = [
        'example.pdf',
        'example2.pdf',
        'example3.pdf'
    ];
    $blocked_pdfs_json = wp_json_encode($blocked_pdfs);
    ?>
    <script>
    function removePdfLinks() {
        const blockedPdfs = <?php echo $blocked_pdfs_json; ?>;
        const chatbotContainer = document.querySelector('#mxchat-chatbot');
        if (!chatbotContainer || blockedPdfs.length === 0) return;

        // True when the URL contains any blocked filename (case-insensitive)
        function isBlockedPdf(url) {
            const lowerUrl = url.toLowerCase();
            return blockedPdfs.some(filename => lowerUrl.includes(filename.toLowerCase()));
        }

        // Replace blocked PDF links with their plain text
        const pdfLinks = chatbotContainer.querySelectorAll('a[href*=".pdf" i]');
        pdfLinks.forEach(function (link) {
            if (isBlockedPdf(link.getAttribute('href'))) {
                // A text node avoids re-parsing the link text as HTML
                link.replaceWith(document.createTextNode(link.textContent));
            }
        });

        // Remove blocked PDF URLs from text
        const walker = document.createTreeWalker(chatbotContainer, NodeFilter.SHOW_TEXT);
        const textNodes = [];
        let node;
        while ((node = walker.nextNode())) {
            textNodes.push(node);
        }
        textNodes.forEach(function (textNode) {
            if (/\.pdf/i.test(textNode.textContent)) {
                let content = textNode.textContent;
                blockedPdfs.forEach(function (filename) {
                    // Escape regex metacharacters in the filename (e.g. the dot)
                    const escapedFilename = filename.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
                    const regex = new RegExp('https?:\\/\\/[^\\s]*' + escapedFilename + '(?:#[^\\s]*)?\\s?', 'gi');
                    content = content.replace(regex, '');
                });
                textNode.textContent = content;
            }
        });
    }

    document.addEventListener('DOMContentLoaded', function () {
        removePdfLinks();

        // Watch only the chatbot container; set up after the DOM is ready so
        // the container is found even if it is printed after this script.
        const chatbotContainer = document.querySelector('#mxchat-chatbot');
        if (chatbotContainer) {
            const observer = new MutationObserver(removePdfLinks);
            observer.observe(chatbotContainer, { childList: true, subtree: true });
        }
    });
    </script>
    <?php
}
At the top you will see an array list:
// Add your blocked PDF filenames here
$blocked_pdfs = [
    'example.pdf',
    'example2.pdf',
    'example3.pdf'
];
Simply add the file names of the PDFs you want to block to that list. This will allow you to block some PDFs while allowing others through.
Give that a try and let me know!
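One thing to be aware of: the list is matched as a substring of the URL, so an entry like `example.pdf` would also block a file named `my-example.pdf`. If that ever becomes a problem, a stricter comparison on the file name itself is possible. The following is only a sketch; `isBlockedPdfExact` and the `base` parameter are hypothetical names, and the URLs are placeholders:

```javascript
// Same kind of list as $blocked_pdfs in the snippet above.
const blockedPdfs = ['example.pdf', 'example2.pdf'];

// Hypothetical stricter variant: compares the last path segment of the URL
// with each entry instead of using a substring match.
function isBlockedPdfExact(url, base = 'https://example.com') {
  try {
    // URL parsing drops the ?query and #fragment; `base` resolves relative links.
    const pathname = new URL(url, base).pathname;
    const fileName = decodeURIComponent(pathname.split('/').pop()).toLowerCase();
    return blockedPdfs.some(name => name.toLowerCase() === fileName);
  } catch (e) {
    return false; // unparsable URL: leave the link alone
  }
}

console.log(isBlockedPdfExact('https://site.test/uploads/example.pdf'));
console.log(isBlockedPdfExact('https://site.test/uploads/my-example.pdf'));
console.log(isBlockedPdfExact('/uploads/Example2.PDF#page=3'));
```

The trade-off is that an exact match requires the entries to be full file names, whereas the substring version also accepts partial names.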
EDIT:
For clarification: I did test it on my end with several PDFs and it worked perfectly. When I said “give it a try and let me know”, I didn’t want it to come off as me sending you untested code. I meant it more as: let me know if that is not the functionality you needed!