• Resolved Cameron Barrett

    (@cameronbarrett)


    We love, love, love TablePress and are very heavy users of it. Wonderful plugin.

    One of the things we use TablePress for is to display staff and teacher lists for schools. These are stored in Excel CSV files that get updated frequently and tied to TablePress tables.

    One of the data columns we store is email addresses, which is a requirement because our parents need contact info for their children’s teachers. Obvious, right?

    Example: http://www.nps.k12.nj.us/ABG/school-staff/faculty-staff/

    Anyway, our IT department wants us to remove all email addresses from our web sites because the spambot scrapers are grabbing those addresses.

    I did a lot of searching around for a way to obfuscate email addresses, and short of storing a PHP eval() snippet of code for every cell that contains an email address and then calling the antispambot function in WordPress, I couldn’t find any way to obfuscate email addresses that are stored in TablePress. That approach also wouldn’t work because we’d have to store that PHP code in Excel before exporting to CSV.

    However, I have an idea. What if you took your Automatic URL Conversion extension for TablePress and extended it so that all email addresses are converted to mailto: links (and also obfuscated)? It would also be nice (but not required) if a linked email address could be passed to a Gravity Forms contact form.

    This way, all we’d have to do is add something like automatic_email_conversion=true to the TablePress shortcodes.

    https://wordpress.org/plugins/tablepress/

Viewing 5 replies - 1 through 5 (of 5 total)
  • Plugin Author TobiasBg

    (@tobiasbg)

    Hi,

    thanks for your question, and sorry for the trouble.

    Using the antispambot() function is indeed the only real automatic solution that I can think of here. Other methods, like using JavaScript to obfuscate, would be harder to add.

    Note that there are already scrapers that can work around antispambot(), so I usually just write the email address with some extra (hidden) HTML code, the cost being that it’s not clickable. For that, I just write the address as e.g.

    mail<span style="display:none">no-spam</span>@<span style="display:none">mail</span>example.com

    It will appear as mail@example.com, and copy/paste will only copy that. Spammers will, however, have trouble with the extra HTML.
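
A minimal sketch of how that hidden-span markup could be generated from a plain address, for example while preparing cell content. The function name hidden_span_email() and its decoy parameters are hypothetical, not part of TablePress or WordPress; the decoy defaults simply mirror the example above.

```php
<?php
/**
 * Hypothetical helper: inserts decoy text in hidden spans around the "@"
 * so that the address in the raw HTML differs from the visible one.
 */
function hidden_span_email( string $email, string $decoy_local = 'no-spam', string $decoy_domain = 'mail' ): string {
    $at = strrpos( $email, '@' );
    if ( false === $at ) {
        return htmlspecialchars( $email ); // No "@" found: return the text unchanged (escaped).
    }

    $local  = htmlspecialchars( substr( $email, 0, $at ) );
    $domain = htmlspecialchars( substr( $email, $at + 1 ) );
    $hide   = '<span style="display:none">%s</span>';

    return $local
        . sprintf( $hide, htmlspecialchars( $decoy_local ) )
        . '@'
        . sprintf( $hide, htmlspecialchars( $decoy_domain ) )
        . $domain;
}

echo hidden_span_email( 'mail@example.com' ), "\n";
```

A scraper that merely strips the tags would read mailno-spam@mailexample.com here, while a browser hides the decoys. As noted above, the result is not a clickable link.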

    Now, if you wanted to add antispambot() to the Extension, that would certainly be possible, but would require some changes, as the function itself does not do email address parsing (it already requires an address). One would therefore have to replace the call to make_clickable() with a call to a copy of that function, which then uses a modified _make_email_clickable_cb callback that is in turn extended with a call to antispambot().
    Unfortunately, I’m pretty busy at the moment and don’t have time to implement that myself, so I suggest that you take a look at that, if possible.

    Regards,
    Tobias

    Thread Starter Cameron Barrett

    (@cameronbarrett)

    I am close to getting this working, but my PHP skills are not very good. Perhaps you can point me in the right direction.

    I cloned your Automatic URL Conversion extension and renamed it tablepress-automatic-email.conversion.php and network-activated it.

    I can get the str_replace function to work, sort of, but I think I need to do both a str_replace and a join using the PHP dot operator. This is where I’m failing.

    Your code:

    $output = str_replace( '<a href="http', '<a rel="nofollow" href="http', $output );

    I was able to successfully get an email obfuscated by using the Email Obfuscate Shortcode plugin and embedding that shortcode into each table cell, but this would then require me to manually edit every TablePress table across 68+ sites, and anytime someone updates a table they’d blow away these changes.

    So, what I need is a way to automatically convert an email address as stored in TablePress to an obfuscated email address using the [email-obfuscate email="test@test.com"] shortcode.

    I’ve tried code like this:

    elseif ( $render_options['automatic_email_conversion_obfuscate'] ) {
    	$output = str_replace( '<a href="mailto:', '[email-obfuscate email=' . ']', $output );

    But I’m not doing something right.

    The TablePress Table is using the following shortcode with passed parameters:

    [table id=1 automatic_email_conversion=true automatic_email_conversion_obfuscate=true /]
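
For the two parameters in that Shortcode to reach the Extension, they would presumably need default values registered the same way the Automatic URL Conversion Extension registers its own parameters. A minimal sketch, assuming the same tablepress_shortcode_table_default_shortcode_atts filter is used; the function name demo_add_email_conversion_atts() is hypothetical.

```php
<?php
/**
 * Hypothetical function: adds the two email-related parameters, disabled by
 * default, to the list of attributes passed in by the filter.
 */
function demo_add_email_conversion_atts( array $default_atts ): array {
    $default_atts['automatic_email_conversion']           = false;
    $default_atts['automatic_email_conversion_obfuscate'] = false;
    return $default_atts;
}

// Register the filter only when WordPress is loaded, so this file also runs standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'tablepress_shortcode_table_default_shortcode_atts', 'demo_add_email_conversion_atts' );
}

// Standalone demonstration: existing attributes are kept, new ones are added.
var_dump( demo_add_email_conversion_atts( array( 'id' => '1' ) ) );
```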

    Plugin Author TobiasBg

    (@tobiasbg)

    Hi Cameron,

    I’m afraid that won’t work like that, and using an extra plugin with an extra Shortcode should be overkill here, if we need to modify the Extension anyways.

    The construction of that new Shortcode that you are trying in your PHP code basically suffers from the same problem that prevents us from directly using the antispambot() PHP function: We simply don’t have direct access to the plain email address. We only have it as part of the surrounding HTML.

    My suggestion therefore is that idea from my previous reply: Copy the make_clickable() function from WordPress into the Extension, with a new name (like cb_make_clickable()). Also copy and rename the _make_email_clickable_cb() function. Modify the new cb_make_clickable() to use that new callback _cb_make_email_clickable_cb. Modify that _cb_make_email_clickable_cb() function to also call antispambot() on the email address that it gets.
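
The core of that idea, reduced to the email step, might look like the sketch below. It is not the Extension's code: all demo_* names are hypothetical, the regular expression is the one quoted later in this thread, and a deterministic entity encoder stands in for antispambot() when WordPress is not loaded (the real function randomizes its encoding).

```php
<?php
/**
 * Uses antispambot() when WordPress is loaded; otherwise falls back to a
 * simple stand-in that encodes every character as a decimal HTML entity.
 */
function demo_obfuscate( string $email ): string {
    if ( function_exists( 'antispambot' ) ) {
        return antispambot( $email );
    }
    $out = '';
    foreach ( str_split( $email ) as $char ) {
        $out .= '&#' . ord( $char ) . ';';
    }
    return $out;
}

/**
 * Callback in the spirit of _make_email_clickable_cb():
 * $matches[1] is the leading whitespace or ">", [2] the local part, [3] the domain.
 */
function demo_email_clickable_cb( array $matches ): string {
    $email = demo_obfuscate( $matches[2] . '@' . $matches[3] );
    // Re-emit the leading character so the surrounding markup stays intact.
    return $matches[1] . '<a href="mailto:' . $email . '">' . $email . '</a>';
}

/** Finds plain addresses in the text and hands each one to the callback. */
function demo_obfuscate_emails( string $text ): string {
    return preg_replace_callback(
        '#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i',
        'demo_email_clickable_cb',
        $text
    );
}

echo demo_obfuscate_emails( '<td>a@b.co</td>' ), "\n";
```

The callback receives the address already split into its parts, which is exactly the access to the plain address that a fixed-string str_replace() cannot provide.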

    Regards,
    Tobias

    Thread Starter Cameron Barrett

    (@cameronbarrett)

    Just an update on this. I actually came up with two solutions.

    The first is a custom plugin that extends the make_clickable_cb function to call antispambot():

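    function _make_email_clickable_antispambot_cb( $content ) {
        // $content[1]: leading whitespace or ">", [2]: local part, [3]: domain.
        $email = $content[2] . '@' . $content[3];
        if ( ! is_email( $email ) ) {
            return $content[0]; // Not a valid address: leave the matched text unchanged.
        }
    
        // Keep the leading character so the surrounding text and markup stay intact.
        return $content[1] . '<a href="mailto:' . antispambot( $email ) . '">' . antispambot( $email ) . '</a>';
    }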

    Then, for every email address that the regular expression matches, this callback is invoked and runs the address through antispambot():

    $ret = preg_replace_callback( '#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', '_make_email_clickable_antispambot_cb', $ret );

    Here’s the full modified plugin file. If you want to add this work to your plugin, that would be great.

    <?php
    /*
    Plugin Name: TablePress Extension: Automatic URL conversion
    Plugin URI: https://tablepress.org/extensions/automatic-url-conversion/
    Description: Custom Extension for TablePress to automatically make URLs (www, ftp, and email) in table cells clickable
    Version: 1.3
    Author: Tobias Bäthge
    Author URI: https://tobias.baethge.com/
    */
    
    /*
     * Usage and possible parameters:
     * [table id=1 automatic_url_conversion=true automatic_url_conversion_new_window=true automatic_url_conversion_rel_nofollow=true /]
     *
     * automatic_url_conversion: Whether URLs shall be made clickable.
     * automatic_url_conversion_new_window: Whether http(s) links shall open in a new window.
     * automatic_url_conversion_rel_nofollow: Whether http(s) links shall get the <code>nofollow</code> attribute.
     */
    
    add_filter( 'tablepress_table_output', 'tablepress_auto_url_conversion', 10, 3 );
    add_filter( 'tablepress_shortcode_table_default_shortcode_atts', 'tablepress_add_shortcode_parameter_auto_url_conversion' );
    
    /**
     * Add Extension's parameters as a valid parameters to the [table /] Shortcode.
     */
    function tablepress_add_shortcode_parameter_auto_url_conversion( $default_atts ) {
    	$default_atts['automatic_url_conversion'] = false;
    	$default_atts['automatic_url_conversion_new_window'] = false;
    	$default_atts['automatic_url_conversion_rel_nofollow'] = false;
    	return $default_atts;
    }
    
    function _make_email_clickable_antispambot_cb( $content ) {
        // $content[1]: leading whitespace or ">", [2]: local part, [3]: domain.
        $email = $content[2] . '@' . $content[3];
        if ( ! is_email( $email ) ) {
            return $content[0]; // Not a valid address: leave the matched text unchanged.
        }
    
        // Keep the leading character so the surrounding text and markup stay intact.
        return $content[1] . '<a href="mailto:' . antispambot( $email ) . '">' . antispambot( $email ) . '</a>';
    }
    
    /**
     * Convert plaintext URI to HTML links.
     * Modified to call the local antispambot version of _make_email_clickable_cb()
     *
     * Converts URI, www and ftp, and email addresses. Finishes by fixing links
     * within links.
     *
     * @since 0.71
     *
     * @param string $text Content to convert URIs.
     * @return string Content with converted URIs.
     */
    function make_clickable_cb( $text ) {
        $r = '';
    
        // Strip existing mailto links from the text first. Otherwise they are skipped by the following code, and while
        // the displayed email address would be obfuscated, the one actually in the mailto link would not.
        $text = preg_replace("~<a\s+?href=[\'|\"]mailto:(.*?)[\'|\"].*?>.*?</a>~", "$1", $text);
    
        $textarr = preg_split( '/(<[^<>]+>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE ); // split out HTML tags
        $nested_code_pre = 0; // Keep track of how many levels link is nested inside <pre> or <code>
        foreach ( $textarr as $piece ) {
    
            if ( preg_match( '|^<code[\s>]|i', $piece ) || preg_match( '|^<pre[\s>]|i', $piece ) )
                $nested_code_pre++;
            elseif ( ( '</code>' === strtolower( $piece ) || '</pre>' === strtolower( $piece ) ) && $nested_code_pre )
                $nested_code_pre--;
    
            if ( $nested_code_pre || empty( $piece ) || ( $piece[0] === '<' && ! preg_match( '|^<\s*[\w]{1,20}+://|', $piece ) ) ) {
                $r .= $piece;
                continue;
            }
    
            // Long strings might contain expensive edge cases ...
            if ( 10000 < strlen( $piece ) ) {
                // ... break it up
                    foreach ( _split_str_by_whitespace( $piece, 2100 ) as $chunk ) { // 2100: Extra room for scheme and leading and trailing parentheses
                    if ( 2101 < strlen( $chunk ) ) {
                        $r .= $chunk; // Too big, no whitespace: bail.
                    } else {
                        $r .= make_clickable_cb( $chunk );
                    }
                }
            } else {
                $ret = " $piece "; // Pad with whitespace to simplify the regexes
    
                $url_clickable = '~
    				([\\s(<.,;:!?])                                        # 1: Leading whitespace, or punctuation
    				(                                                      # 2: URL
    					[\\w]{1,20}+://                                # Scheme and hier-part prefix
    					(?=\S{1,2000}\s)                               # Limit to URLs less than about 2000 characters long
    					[\\w\\x80-\\xff#%\\~/@\\[\\]*(+=&$-]*+         # Non-punctuation URL character
    					(?:                                            # Unroll the Loop: Only allow punctuation URL character if followed by a non-punctuation URL character
    						[\'.,;:!?)]                            # Punctuation URL character
    						[\\w\\x80-\\xff#%\\~/@\\[\\]*(+=&$-]++ # Non-punctuation URL character
    					)*
    				)
    				(\)?)                                                  # 3: Trailing closing parenthesis (for parenthesis balancing post processing)
    			~xS'; // The regex is a non-anchored pattern and does not have a single fixed starting character.
                // Tell PCRE to spend more time optimizing since, when used on a page load, it will probably be used several times.
    
                $ret = preg_replace_callback( $url_clickable, '_make_url_clickable_cb', $ret );
    
                $ret = preg_replace_callback( '#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]+)#is', '_make_web_ftp_clickable_cb', $ret );
                $ret = preg_replace_callback( '#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', '_make_email_clickable_antispambot_cb', $ret );
    
                $ret = substr( $ret, 1, -1 ); // Remove our whitespace padding.
                $r .= $ret;
            }
        }
    
        // Cleanup of accidental links within links
        return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', "$1$3</a>", $r );
    }
    
    /**
     * Convert URLs to links, if Shortcode parameter is set,
     * add the <code>target</code> attribute in http(s):// links,
     * or add the <code>rel</code> attribute, if the Shortcode parameter is set.
     */
    function tablepress_auto_url_conversion ( $output, $table, $render_options ) {
    	if ( $render_options['automatic_url_conversion'] ) {
    		$output = make_clickable_cb( $output );
    	}
    
    	if ( $render_options['automatic_url_conversion_new_window'] && $render_options['automatic_url_conversion_rel_nofollow'] ) {
    		$output = str_replace( '<a href="http', '<a target="_blank" rel="nofollow" href="http', $output );
    	} elseif ( $render_options['automatic_url_conversion_new_window'] ) {
    		$output = str_replace( '<a href="http', '<a target="_blank" href="http', $output );
    	} elseif ( $render_options['automatic_url_conversion_rel_nofollow'] ) {
    		$output = str_replace( '<a href="http', '<a rel="nofollow" href="http', $output );
    	}
    
    	return $output;
    }

    The second solution is that I found an existing email obfuscation plugin that does convert email addresses that are stored in TablePress. Apparently, these two solutions work independently and don’t conflict with each other.

    https://wordpress.org/plugins/obfuscate-email/

    Plugin Author TobiasBg

    (@tobiasbg)

    Hi,

    nice work! Thanks a lot for sharing this!

    Best wishes,
    Tobias

  • The topic ‘Obfuscating email addresses stored in TablePress’ is closed to new replies.