• Hi there,
    I have a problem while importing data from a CSV file.
    Unfortunately this error doesn’t always occur, but it does most of the time, which makes the import unreliable. It also doesn’t matter which delimiter or enclosure character I use.

    I have a file that looks like this (of course in UTF-8):

    hull_number;ship_name;year_built;location___city;country;owner_imported;description;data_source;status;email_imported;email
    1751;Rigoletto;1973;Down Cruising Club;United Kingdom;Michael McDermott;RIGOLETTO (formerly Elkedor) moved from Fareham to Strangford in 2008;VEGADB;imported;mjmcd1@test.com;mig1751@hotmail.com
    0002;Elvira;1966;Traelhavets baatklubb;Sweden;Ingemar Baeck;Original Albin O21 is replaced with a Beta 13 hp;VEGADB;imported;ibackaa@test.com;mig0002@hotmail.com

    When I import it I get the following messages:

    The file ximport_test1.csv has been uploaded.
    
    File upload complete: 2 lines received.
    The data is importing in the background, refresh the page to get the current status of the import.
    
    Participants Database:
    
    Import Complete:
    2 records added 2 records imported

    After the import I see that several columns are empty, e.g. hull_number or email, and, most importantly, no private ID is created.
    When I turn on the debug log, I see that all columns are recognized while populating the array, but some columns are missing from the subsequent INSERT statement:

    [03/05/22 11:55am UTC]
    
    xnau_CSV_Import::insert_from_csv
        
    columns:hull_number, ship_name, year_built, location___city, country, owner_imported, description, data_source, status, email_imported, email
      
    csv line= Array
    (
      [hull_number] => 1751
      [ship_name] => Rigoletto
      [year_built] => 1973
      [location___city] => Down Cruising Club
      [country] => United Kingdom
      [owner_imported] => Michael McDermott
      [description] => RIGOLETTO (formerly Elkedor) moved from Fareham to Strangford in 2008
      [data_source] => VEGADB
      [status] => imported
      [email_imported] => mjmcd1@test.com
      [email] => mig1751@hotmail.com
    )
    
    PDb_submission\match\record::field_value_exists query: SELECT EXISTS( SELECT 1 FROM wp1_participants_database p WHERE p.email = 'mig1751@hotmail.com' AND p.id <> '0' )
    
    PDb_submission\main_query\base_query::execute_query storing record: INSERT INTO wp1_participants_database SET  'date_recorded' = "2022-03-05 11:55:21",  'date_updated' = "2022-03-05 11:55:21", 'ship_name' = 'Rigoletto', 'year_built' = '1973', 'location___city' = 'Down Cruising Club', 'country' = 'United Kingdom', 'description' = 'RIGOLETTO (formerly Elkedor) moved from Fareham to Strangford in 2008', 'data_source' = 'VEGADB', 'status' = 'imported', 'mailing_list' = 'No', 'hide_record' = 'Yes', 'owner_imported' = 'Michael McDermott', 'email_imported' = 'mjmcd1@test.com', 'approved' = 'no'

    What am I doing wrong?

    • This topic was modified 2 years, 10 months ago by maybe42.
Viewing 13 replies - 1 through 13 (of 13 total)
  • Plugin Author xnau webdesign

    (@xnau)

    I’ve taken some time to look into this and I have so far failed to duplicate the problem or even identify how this might happen.

    Can you tell me: does it happen at random or are there specific lines of data (in the importing CSV) that always have this issue?

    If you can identify the exact circumstances that lead to the problem every time, that will be enormously helpful. If there is a bug I really want to find it.

    It’s very strange to see the query getting truncated like that without it causing some kind of error.

    Thread Starter maybe42

    (@maybe42)

    Hi Roland, many thanks for your effort.

    As a software developer I understand very well that you want to fix the problem: it should either ALWAYS work or ALWAYS fail with an error.
    I did a few runs but as soon as I thought I’d spotted a pattern, it just couldn’t be reproduced.

    This is what the result looks like without changing anything in between – I just delete the two imported records again and again before a new import:

    1 okay
    2 okay
    3 NOK
    4 okay
    5 NOK
    6 okay
    7 okay
    8 NOK
    9 okay
    10 NOK
    11 okay
    12 NOK
    13 okay
    14 NOK
    15 okay

    Can the whole thing have something to do with “session status” or “caching”?

    I can work around it by breaking the import into smaller chunks (I have about 600 records to import in total), and if a run doesn’t work, I delete the records without a “Private ID” and keep trying until it works.

    Unfortunately I don’t have the time to set up a private development environment to debug, but I’ll do some more test runs later to help troubleshoot.

    Best regards
    Peter

    Thread Starter maybe42

    (@maybe42)

    Well, I have run a lot of tests, still no luck.

    1. I streamlined the header line and removed an unused column
    2. I changed the delimiter from semicolon to comma
    3. I added double quotes as the enclosure character
    4. I tried with ‘Auto’ settings but also with explicit settings
    5. I removed the email addresses
    6. Finally I removed even any spaces 🙂

    Sometimes it works fine a few times in a row, then it doesn’t work again.
    Now I’m a bit clueless.

    regards, Peter

    Plugin Author xnau webdesign

    (@xnau)

    Thanks, Peter, I appreciate the effort and the good documentation of the problem. I need to look again at what could be causing intermittent problems, especially one that could cause the query to be missing a couple of terms… that’s very odd; typically it’s a total failure, not such a specific, limited failure that doesn’t result in an error.

    One thing we can look at is what is special about your server setup, since I haven’t gotten similar reports from other users. For example, the background process uses WP transients to store some values that are used to process the queue. Since it is the WP Transients API, this means that other plugins or object/database caching can change how it works.
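    For readers unfamiliar with transients: a transient is a cached key/value pair with an optional expiry, and when an object cache is in play the same lookup can return fresh data, stale data, or nothing at all. The sketch below is a rough Python stand-in for that behavior, not the actual PHP Transients API (`set_transient`/`get_transient`); the class and keys here are invented purely for illustration:

```python
import time

class TransientStore:
    """Toy key/value store with expiry, loosely analogous to WP transients.

    Illustrative sketch only: the real Transients API is PHP and may be
    backed by the options table or an object cache, which is why other
    plugins and caching layers can change how it behaves.
    """

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        # ttl in seconds; None means "no expiry"
        expires = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires)

    def get(self, key, default=False):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires = entry
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # expired: behave as if it never existed
            return default
        return value

store = TransientStore()
store.set("pdb_import_settings", {"delimiter": ";"}, ttl=60)
print(store.get("pdb_import_settings"))  # -> {'delimiter': ';'}
```

    The point of the sketch: whether a `get` returns a value depends on timing and on what stored it, which is exactly the kind of mechanism that produces intermittent rather than consistent failures.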

    Thread Starter maybe42

    (@maybe42)

    Hi Roland,
    I have installed a transients inspector plugin to see things that I never wanted to see 🙂
    After removing all non-persistent transients I imported again, with no success.
    BTW, I changed all my test changes back to the previous state as mentioned in my first posting.

    So here again is my test record; settings are Auto/Auto, duplicate record preference = don’t import the record, duplicate record check field = email

    hull_number;ship_name;year_built;location___city;country;owner;description;data_source;status;email
    1751;Rigoletto;1973;Down Cruising Club;United Kingdom;Michael McDermott;RIGOLETTO (formerly Elkedor) moved from Fareham to Strangford;VEGADB;imported;mjmcd1@btinternet.com

    Now two things are interesting.

    1.
    The import filled all columns except “hull_number” and “email”, and in the list view there is no “private ID”. But when I dive into the edit view, there is a “private ID” – one that is already in the database and used by another, older record.

    (can’t attach screenshots here, but I have some)

    2.
    After the failed import there is a transient called ‘participant_cache_4’.
    This is the content (I formatted it a bit to be more readable; it’s one long string):

    a:1:{i:446;O:8:"stdClass":30:{
    s:2:"id";s:3:"446";
    s:10:"private_id";N;
    s:7:"country";s:14:"United Kingdom";
    s:5:"email";N;
    s:12:"mailing_list";s:2:"No";
    s:5:"photo";N;
    s:7:"website";N;
    s:9:"interests";N;
    s:8:"approved";s:2:"no";
    s:13:"date_recorded";s:19:"2022-03-07 08:55:57";
    s:12:"date_updated";s:19:"2022-03-07 08:55:57";
    s:13:"last_accessed";N;
    s:16:"last_update_user";N;
    s:11:"hull_number";N;
    s:5:"owner";s:17:"Michael McDermott";
    s:9:"ship_name";s:9:"Rigoletto";
    s:15:"location___city";s:18:"Down Cruising Club";
    s:15:"location___area";N;
    s:11:"data_source";s:6:"VEGADB";
    s:6:"status";s:8:"imported";
    s:10:"year_built";s:4:"1973";
    s:14:"more_equipment";N;
    s:13:"delete_record";N;
    s:6:"engine";N;
    s:11:"description";s:61:"RIGOLETTO (formerly Elkedor) moved from Fareham to Strangford";
    s:11:"hide_record";s:3:"Yes";
    s:14:"previous_names";N;
    s:12:"current_flag";N;
    s:14:"owner_imported";N;
    s:14:"email_imported";N;
    }}

    If I read it correctly, it says the following:
    private_id, email and hull_number are treated as ‘not in CSV list’, which of course is not true.
    The last two columns, owner_imported and email_imported, are no longer in the database; I added them for testing and removed them afterwards. They will surely still be in the column table but are not active anymore.

    This points in a direction where caching comes to mind?

    I hope my findings are of some help, if needed you can have access to my site, it’s just a testbed for the time being.
    Another thing I will try is to set up a quick test environment on my personal linux server where I have more control than on my provider’s server.

    best regards, Peter

    Plugin Author xnau webdesign

    (@xnau)

    The transient you’re looking for will be named “pdb_import_settings”; the transient you have there is a cache for existing records, so it’s not directly related.

    When a CSV file is imported, the list of columns is taken from the CSV header, stored, and used for each line that is imported. The fact that some lines are imported OK and some are missing those columns is hard to explain: if there is a mismatch between the number of items in the header and the data line, there will be an error. Also, since the header is only read once when the file is uploaded, there is no explanation for why some lines would simply be missing those columns with no error.
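    The read-the-header-once behavior described above can be sketched roughly like this (a minimal Python stand-in, not the plugin’s actual PHP code; the sample data comes from the first post):

```python
import csv
import io

raw = """hull_number;ship_name;year_built
1751;Rigoletto;1973
0002;Elvira;1966
"""

reader = csv.reader(io.StringIO(raw), delimiter=";")
header = next(reader)  # read once, then reused for every data line

records = []
for line in reader:
    if len(line) != len(header):
        # a header/line length mismatch would be a visible error,
        # not a silently shortened record
        raise ValueError("column count mismatch")
    records.append(dict(zip(header, line)))

print(records[0]["hull_number"])  # -> 1751
```

    Because every data line is zipped against the same stored header, a column can only go missing for all lines or for none, which is why the intermittent per-line loss reported here is so hard to explain.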

    The fact that the same line can be imported OK sometimes tells me there isn’t a problem with the data in the line.

    Even caching is not a good explanation for this: caching fails when it provides stale data, which wouldn’t explain how some lines in a single upload get imported OK and some don’t. That said, caching issues are usually indicated when things happen intermittently, because sometimes the data is stale and sometimes it is not.

    If you are using an object or database cache, disabling it may be a way to determine if that is the problem or not.

    Thread Starter maybe42

    (@maybe42)

    Hi Roland, my problem still persists, but as I mentioned before I was able to delete the corrupt entries and try the import again until all columns were imported correctly.

    I just updated to 2.0.8, but I still have a second error that is more than ugly:
    the imported private IDs are all the same (in my case “RPNE2”).
    I use the option to expand the private ID to 9 characters, so normal registration IDs look like “IS5T1U0F4”.

    Even if I say that private ID is the duplicate record check field and the preference is not to import these records, the import creates duplicates, all with private ID “RPNE2” 🙁

    Do you have any idea where I could look into this?

    Plugin Author xnau webdesign

    (@xnau)

    Thanks for letting me know about this.

    You can’t import private IDs with your CSV that do not conform to the validity test. This is on line 106 of the plugin file classes/PDb_submission/main_query/internal_column.php

    There is a regex that checks the incoming value and rejects it if it doesn’t pass the test. Your private ID values must pass that test to be accepted.

    Filling in the static value should not be happening, that is probably a bug, but it won’t keep you from importing your private IDs as long as they pass the validity test.
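    For readers following along, a validity check of this kind might look roughly like the following. Note that the pattern here is a guess based on the ID formats quoted in this thread (“RPNE2”, “IS5T1U0F4”); the actual regex on line 106 of internal_column.php is not quoted anywhere in this discussion, so treat this as an illustrative Python sketch only:

```python
import re

# HYPOTHETICAL pattern -- the real regex lives in
# classes/PDb_submission/main_query/internal_column.php and is not shown
# in this thread. Based on the IDs seen here ("RPNE2", "IS5T1U0F4"),
# 5 to 9 uppercase letters/digits is a plausible shape.
PRIVATE_ID_PATTERN = re.compile(r"^[A-Z0-9]{5,9}$")

def is_valid_private_id(value: str) -> bool:
    """Accept the incoming CSV value only if it matches the pattern."""
    return bool(PRIVATE_ID_PATTERN.fullmatch(value))

print(is_valid_private_id("IS5T1U0F4"))  # -> True  (9-char uppercase ID)
print(is_valid_private_id("abc"))        # -> False (rejected, not imported)
```

    The practical takeaway matches xnau’s point: any private IDs placed in the CSV must conform to whatever the real pattern is, or the import will discard them.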

    Thread Starter maybe42

    (@maybe42)

    I am very sorry for not being clear with my description 🙁

    I do not import private IDs; I expected them to be generated, as is the case when I register a single entry using the registration form.
    My CSV file contains just the columns I mentioned before, i.e. hull_number, owner_name, year_built, email_address and so on.

    A few weeks ago these private IDs were generated as expected, but now they all have the same value. When I find the time I will set up another test case on my own server so I can look into it more closely and maybe extend the debug output a bit.

    Thread Starter maybe42

    (@maybe42)

    Hi Roland,

    I have found out one thing at least.
    The column “hull_number”, which was imported only sometimes, had the read-only flag set. The same applies to the column email_address.
    But just changing that flag didn’t help; I had to delete the column and create a completely new one, and now the import does it right every time.
    So there might be a problem with some kind of caching.

    A second problem arises, as I mentioned before: the private ID is either empty or it always has the same value. I do not import it, and moreover I have set the private ID as the duplicate record check field… even though that should not be necessary, because it must always be unique. I suppose this really is a bug.

    I have inserted some debug output of my own, and here is the output of my test case (I reduced it to a minimum):

    xnau_CSV_Import::insert_from_csv
        
    columns:ship_name, vega, year_built
    csv line= Array
    (
      [ship_name] => Rigoletto
      [vega] => 1751
      [year_built] => 1973
    )
    
    xnau_CSV_Import::insert_from_csv
    columns:ship_name, vega, year_built
    csv line= Array
    (
      [ship_name] => Las Vegas
      [vega] => 1002
      [year_built] => 1970
    )
    
     [pbe] after do_action [column_name]:ship_name
     [pbe][column_object->import_value()]:Rigoletto
     [pbe][$column_object->query_clause()]:`ship_name` = %s
     [pbe][$column_object->add_to_query($action)]:1
    
     [pbe] after do_action [column_name]:vega
     [pbe][column_object->import_value()]:1751
     [pbe][$column_object->query_clause()]:`vega` = %s
     [pbe][$column_object->add_to_query($action)]:1
    
     [pbe] after do_action [column_name]:year_built
     [pbe][column_object->import_value()]:1973
     [pbe][$column_object->query_clause()]:`year_built` = %s
     [pbe][$column_object->add_to_query($action)]:1
    
     [pbe] after do_action [column_name]:private_id
     [pbe][column_object->import_value()]:RPNE2
     [pbe][$column_object->query_clause()]:`private_id` = %s
     [pbe][$column_object->add_to_query($action)]:1
    
    PDb_submission\main_query\base_query::execute_query storing record: INSERT INTO wp_participants_database SET  `date_recorded` = "2022-03-18 21:25:42",  `date_updated` = "2022-03-18 21:25:42", `ship_name` = 'Rigoletto', `vega` = '1751', `year_built` = '1973', `private_id` = 'RPNE2'
    
    PDb_Participant_Cache::refresh_cache: Refreshing Participants Database cache for cache group 2
    
     [pbe] after do_action [column_name]:ship_name
     [pbe][column_object->import_value()]:Las Vegas
     [pbe][$column_object->query_clause()]:`ship_name` = %s
     [pbe][$column_object->add_to_query($action)]:1
     [pbe] after do_action [column_name]:vega
    
     [pbe][column_object->import_value()]:1002
     [pbe][$column_object->query_clause()]:`vega` = %s
     [pbe][$column_object->add_to_query($action)]:1
    
     [pbe] after do_action [column_name]:year_built
     [pbe][column_object->import_value()]:1970
     [pbe][$column_object->query_clause()]:`year_built` = %s
     [pbe][$column_object->add_to_query($action)]:1
    
     [pbe] after do_action [column_name]:private_id
     [pbe][column_object->import_value()]:RPNE2
     [pbe][$column_object->query_clause()]:`private_id` = %s
     [pbe][$column_object->add_to_query($action)]:1
    
    PDb_submission\main_query\base_query::execute_query storing record: INSERT INTO wp_participants_database SET  `date_recorded` = "2022-03-18 21:25:42",  `date_updated` = "2022-03-18 21:25:42", `ship_name` = 'Las Vegas', `vega` = '1002', `year_built` = '1970', `private_id` = 'RPNE2'
    
    PDb_Participant_Cache::refresh_cache: Refreshing Participants Database cache for cache group 2

    I think I will try to generate my own private IDs in my CSV file; maybe that will help me import my data 🙂

    Plugin Author xnau webdesign

    (@xnau)

    Just to confirm: you’re using the latest version of Participants Database?

    I’m trying to find a reason for what you’re seeing there; I’m unable to get the same result in my tests here.

    Does it help to not use the private id field as the duplicate check? It is normally set to “record id”.

    Plugin Author xnau webdesign

    (@xnau)

    I’m still checking on this, but it occurs to me that you can probably keep the private_id from getting its default value on the import by directly editing the field definition in phpMyAdmin:

    Edit the “private_id” line, blanking out the “default” value.

    Thread Starter maybe42

    (@maybe42)

    Hi Roland,

    I have found the following workaround(s):
    I set up a new temporary column for the hull number (not read-only, of course), so now it is imported in every row. After the import I simply copy the values over using SQL.
    That works around problem #1.

    For the private ID I generate one in my CSV file; since it is generated using Java, this is very simple. With that, the duplicate check and skip work well.
    That works around problem #2.
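    For anyone wanting to reproduce workaround #2, generating unique IDs for the CSV can be sketched like this. This is a Python stand-in for the Java approach Peter describes (his code is not shown in the thread), and the 9-character uppercase-alphanumeric format is assumed from the “IS5T1U0F4” example earlier; check it against the plugin’s actual validity test before relying on it:

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def make_private_id(length=9):
    # 9 characters to match the expanded private-ID format ("IS5T1U0F4");
    # secrets gives cryptographically strong randomness
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# Add a private_id column to each row before writing the import CSV
rows = [
    {"hull_number": "1751", "ship_name": "Rigoletto"},
    {"hull_number": "0002", "ship_name": "Elvira"},
]
seen = set()
for row in rows:
    pid = make_private_id()
    while pid in seen:  # guarantee uniqueness within the file
        pid = make_private_id()
    seen.add(pid)
    row["private_id"] = pid
```

    With IDs pre-generated like this, a duplicate-record check on private_id can skip rows that were already imported, which is exactly the behavior the workaround depends on.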

    I will try your suggestion of removing the default value; that might be easier than my solution.

    On the other hand, I have set up a debugging environment using NetBeans with Xdebug, but I have to find time to dig into it. It will take some time, sorry.

    brgds, Peter

  • The topic ‘CSV import problems’ is closed to new replies.