[Resolved] Problem with bulk import
Dear Mr. Heller,
I have another question: when I upload a CSV file to my collection, the plugin says that every entry in the file has been accepted (say, 100 out of a file containing 100 entries), yet the number of items actually present on the site and searchable is smaller (50 out of 100). Do you have any idea what could be causing this? I don’t think it is a problem with the format of the CSV, since no error is displayed; and I have no way of knowing which of the entries have actually been accepted (hence the clogging of the collection mentioned in my other topic…).
I don’t know if it could be related, but if I export the whole collection from the site as a CSV file, the barcodes are sometimes quoted and sometimes not. This causes some entries to be bundled together on a single line of the file: consecutive entries starting with quoted barcodes appear as a single cell in the CSV file, while every entry starting with an unquoted barcode has its own line. The entries that appear bundled together (that is, the ones with quoted barcodes) show up normally when searched.
Hmmm.. Barcodes should not need to be quoted, since they should only be ASCII letters and digits. If they are being quoted, that suggests that they have ‘strange’ characters in them, which is an error. Try the ‘Fix Broken Barcodes’ button on the collection page.
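The rule above can be checked with a short script before uploading or after exporting. This is a minimal sketch, not the plugin's code. It assumes a valid barcode is 1 to 16 ASCII letters or digits (the 16-character limit comes from a later reply in this thread), and `is_valid_barcode` and `to_csv_line` are hypothetical helpers. It also shows why a stray character leads to quoting: Python's `csv` writer, with its default minimal quoting, only quotes a field that contains the delimiter, the quote character, or a newline.

```python
import csv
import io
import re

# Assumption: a well-formed barcode is 1-16 ASCII letters or digits.
BARCODE_RE = re.compile(r"[A-Za-z0-9]{1,16}")

def is_valid_barcode(code: str) -> bool:
    """True if the whole string is 1-16 ASCII letters/digits."""
    return BARCODE_RE.fullmatch(code) is not None

def to_csv_line(fields) -> str:
    """Serialize one row using the csv module's default minimal quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()

# A clean code is written bare; codes with a quote or comma get wrapped in quotes.
for code in ["000000000000001F", 'AB"12', "AB,12"]:
    print(repr(code), is_valid_barcode(code), repr(to_csv_line([code, "Title"])))
```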
About the 100 items showing up as 50 items — there might be duplicate barcodes and the later elements are *replacing* the earlier elements. That is, your 100 items have only 50 *unique* barcodes, so only 50 new items are added.
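To check a file for this before uploading, you can count rows against unique barcodes. This is a minimal sketch. It assumes the file has a header row with a column named `barcode` and that the import behaves like an "insert or replace" keyed on barcode, as described above. `simulate_import` is a hypothetical helper, not part of the plugin.

```python
import csv
import io
from collections import Counter
from typing import Dict, Tuple

def simulate_import(csv_text: str, key: str = "barcode") -> Tuple[int, int, Dict[str, int]]:
    """Model an import where a later row with the same barcode replaces an
    earlier one. Returns (rows read, items stored, duplicate barcode counts)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    stored = {}
    for row in rows:
        stored[row[key]] = row  # same barcode: the later row overwrites the earlier
    counts = Counter(row[key] for row in rows)
    duplicates = {code: n for code, n in counts.items() if n > 1}
    return len(rows), len(stored), duplicates

sample = "barcode,title\nA1,First\nA2,Second\nA1,Third\nA3,Fourth\n"
print(simulate_import(sample))  # → (4, 3, {'A1': 2})
```

If "rows read" is larger than "items stored", the difference is the number of rows that would silently replace an earlier item.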
I did not input the barcodes; they are assigned automatically by the plugin, so I do not think there are strange characters in them, and trying to ‘fix’ them shows no results. But I think this could just be a matter of how the file is displayed: I had this quoted/unquoted problem with opencalc, while opening the CSV file with Microsoft Office doesn’t show it. Also, viewing a single item in the plugin shows the barcode without quotes.
And about the 100/50 problem: could the automatically generated barcodes still cause it? If so, I could try assigning the barcodes manually and see what happens.
The same thing also happened with small files: after uploading a CSV file with 7 entries, the plugin reported 7 entries successfully inserted, but not one of them showed up when searched for.
I thank you for your assistance, I am using your plugin for a library of about 150k entries so I am trying to get the bulk adding process right…
The 50/100 problem should not occur when all barcodes are auto-generated. *But* if you are uploading duplicate items, the search results might not list the duplicate entries (or might not list them *obviously*), so this may be some weirdness in how the listing works.
You *probably* should have tested things with a *small* CSV file (say, 10-20 items) and cleared out the database before each new attempt. That would have avoided your current ‘mess’. Once you had figured out the CSV file format and the procedure for creating it correctly, you could have moved on to production mode and started uploading your 150K items.
Yes, I should have. I manually deleted the whole database (luckily I had not uploaded the whole 150k, but “just” about 15k) and started trying again.
Something peculiar is now happening: I am uploading a 250-entry CSV file, and if I select “use CSV header” (as I should, since the file includes the header line), the plugin claims to have successfully inserted all 250, but only 146 are present, and none of the items in this list are duplicates.
If instead I upload the same file (after deleting the collection) with the CSV header option unticked, all 250 are claimed inserted, but only 45 show up. In this case the plugin lists the barcodes it assigns, so I can see that after the 46th entry it always uses the same barcode. Because the CSV header option is off, the plugin takes the first column as the barcode, but that column actually contains the authors’ names. Since these contain spaces and other odd characters, the plugin uses the first usable value as a model and adjusts it for the other entries. Here the word ‘author’, the first word of the header, is taken as the model, so the subsequent barcodes become ‘auth[letters and numbers]’. But, as I said, after 45 entries it gets stuck on ‘authpa.’ and uses it for every remaining entry, which effectively prevents their insertion.
Now, I understand that this happens because of the CSV header checkbox. My question is: is this normal? Or could my file be formatted in a way that induces these mistakes in the barcode-assigning process? If so, this could also be the cause of the 146/250 problem.
Thanking you again for your attention,
If instead I try to upload the same file (after deleting the collection) unticking the CSV header, the whole 250 are claimed inserted, but only 45 show up: what …
OK, since the first column does not really contain barcodes and you do in fact have column headers, deselecting ‘use column headers’ is going to do ‘strange’ things.
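A small sketch of why the header setting matters: when headers are ignored, the column-name row becomes an ordinary data row, and whatever sits in the first column is read positionally. This only illustrates the behaviour described in the question. The rule "with a header, look for a column named `barcode`; without one, use the first column" and the helper `barcode_candidates` are assumptions, not the plugin's documented logic.

```python
import csv
import io
from typing import List, Optional

def barcode_candidates(csv_text: str, use_header: bool,
                       barcode_col: str = "barcode") -> List[Optional[str]]:
    """Return the value each data row would use as its barcode.
    None means 'no barcode given, auto-generate one' (an assumption)."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    if use_header:
        header, data = rows[0], rows[1:]
        if barcode_col not in header:
            return [None] * len(data)
        idx = header.index(barcode_col)
        return [r[idx] for r in data]
    # No header: every row, including the column-name row, is treated as data,
    # and the first column is assumed to hold the barcode.
    return [r[0] for r in rows]

sample = 'author,title\n"Doe, Jane",A Book\n"Roe, R.",Another Book\n'
print(barcode_candidates(sample, use_header=True))   # → [None, None]
print(barcode_candidates(sample, use_header=False))  # → ['author', 'Doe, Jane', 'Roe, R.']
```

With headers ignored, the literal word `author` and the authors' names (with spaces and commas) end up as barcode candidates, which matches the ‘auth…’ codes reported above.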
Now, I understand that this happens because of the csv ticking – my question is, is this normal? …
It is possible that something odd is happening here. Note that barcodes are limited to 16 ‘digits’, and an auto-generated barcode *should be* 16 digits in base 36 (0-9a-z), but if the generator somehow gets ‘stuck’, there could be problems. That would really be a bug that I would have to look into (and will, sometime soon).
It is possible that your repeated uploads and deletes have left the database confused in some way. It might make sense to completely *drop* the collection table and have the plugin re-create it from scratch, so that you start from a ‘clean slate’.
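To make the barcode format concrete, here is a minimal sketch of incrementing a 16-digit base-36 barcode. It assumes uppercase digits (as in the 000000000000001F value that appears later in the thread) and uses the hypothetical name `increment_barcode`. It is not the plugin's implementation. It also shows one hard limit a generator can hit: once the 16-digit space is used up, there is no valid next code.

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # base-36 alphabet: 0-9, A-Z
WIDTH = 16  # barcode length limit mentioned in the thread

def increment_barcode(code: str) -> str:
    """Return the 16-digit base-36 barcode that follows `code`."""
    value = int(code, 36)  # int() parses base 36 case-insensitively
    nxt = value + 1
    if nxt >= 36 ** WIDTH:
        raise OverflowError("no 16-digit base-36 barcodes left")
    # Convert back to base 36, most significant digit first, zero-padded.
    out = []
    while nxt:
        nxt, rem = divmod(nxt, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out)).rjust(WIDTH, "0")

print(increment_barcode("000000000000001E"))  # → 000000000000001F
print(increment_barcode("000000000000000Z"))  # → 0000000000000010
```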
I see – how should I proceed to make the plugin create a new collection table?
You would need to enter a SQL statement like:
drop table wp_weblib_collection;
using a plugin or tool that allows you to directly enter database statements.
Then log into the dashboard as an administrator and go to the WebLibrarian settings page (Settings->Web Librarian). Turn ON “Debug Database:”, click the “Make Database” button, and then turn OFF “Debug Database:”.
The “SQL Executioner” plugin should let you do what you need to do.
After my latest attempts, I think the barcode issue is at the heart of the whole problem. My impression is that even after the files are deleted, the barcodes they used remain “occupied” for some time, which prevents new items from being added.
Here are my latest steps:
1) I add a 50-entry CSV file
2) it says 50 entries were added, but only 35 items are actually in the collection
3) I delete the collection
4) I take 10 of the items that were not loaded from the original CSV and create a new file containing only them, with no other changes to the text
5) I add this 10-entry CSV file: it says 10 entries were added, and the 10 items are in the collection
6) I delete the collection
7) I add the original 50-entry CSV again
8) this time it adds only 25 items, and I notice that the barcodes continue from those used for the 10 items: if the last of the 10 had, say, 00000B, the first of the newly added 25 has 00000C. The way I see it, the 10 barcodes are still considered in use by the (now deleted) 10 items, so the new entries that would have received those values are not added
9) I delete everything again
10) I add the 50, getting 35 as at first, and then try to add the 10-entry CSV once more: this time the 10 are not inserted.
So my question is: how are the automatically generated codes generated? Could they depend on some value internal to the entry, so that two entries could both get the 00000B code and only the first one is inserted?
Unfortunately, I did not pay attention to whether the 25 added in step 8 were the first of the 35 added in step 1, or entries 11 to 35. In other words, did the site skip the first 10 entries because they would have received barcodes still considered in use by the 10 added in step 5? I would try again right away, but now I can’t seem to add anything to the collection anymore! So yes, I think I managed to break it, and starting afresh could be of some use 🙂
I apologise for the length of the post, but I tried to describe everything as clearly as possible
I thank you for your assistance in the SQL matter – before cleaning the table, I will await your comment on my latest post
I think what is happening is that the barcode ‘counter’ is somehow getting stuck. Completely deleting the table and then re-creating it should cure this.
The code generates barcodes by finding the ‘highest value’ barcode and ‘incrementing’ it. Somehow it is either not finding the truly highest barcode or failing to increment it properly. This is something I have to look into deeply.
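Here is a sketch of the "find the highest and increment" approach described above, together with one way it can go wrong. It is not the plugin's code and does not reproduce its actual bug. It assumes 16-digit base-36 codes, and `to_base36` and `next_barcode` are hypothetical names. The pitfall shown is comparing codes as plain strings: with mixed letter case, the string maximum is not the numeric maximum, so the "next" code can be one that already exists.

```python
import string
from typing import List

DIGITS = string.digits + string.ascii_uppercase  # base-36 alphabet, 0-9 then A-Z

def to_base36(n: int, width: int = 16) -> str:
    """Encode a non-negative integer as a zero-padded base-36 string."""
    out = ""
    while True:
        n, rem = divmod(n, 36)
        out = DIGITS[rem] + out
        if n == 0:
            break
    return out.rjust(width, "0")

def next_barcode(existing: List[str], numeric: bool = True) -> str:
    """Find the 'highest' existing barcode and return the one after it.

    numeric=True compares codes by base-36 value (case-insensitive);
    numeric=False compares them as plain strings, which can pick the wrong
    maximum when letter case (or width) differs between codes.
    """
    if not existing:
        return to_base36(1)  # hypothetical starting value
    if numeric:
        highest = max(existing, key=lambda c: int(c, 36))
    else:
        highest = max(existing)
    return to_base36(int(highest, 36) + 1)

codes = ["000000000000000a", "000000000000000B"]
print(next_barcode(codes))                 # → 000000000000000C
print(next_barcode(codes, numeric=False))  # → 000000000000000B (already taken)
```

With the string comparison, the generator hands out a code that already exists. An import keyed on barcode would then overwrite the earlier item instead of adding a new one.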
A long post, giving all of the details is always good — it helps with debugging the code. Often people post a ‘short’ message without any details and most of the time my ‘crystal ball’ does not provide the answers I need…
OK, I found an actual bug: the bulk upload code was using some old (broken) code for generating new (unique) barcodes. The new release, 3.2.10, fixes the problem.
I really thank you for your effort and dedication, I think we are getting really close to the solution!
First I ran the SQL command (though I have doubts about this step: after running it, the resulting message said that 0 rows were affected) and then followed your other instructions about the database.
Then I updated to the new version of the plugin.
I uploaded a 50-entry CSV: the message says 50 were inserted, and 50 items are actually in the collection. This is nice, and a first in all my attempts.
I deleted the collection and inserted a 250-entry CSV: the message says 250 were inserted, but only 200 are in the collection. Once again, the first barcode continues from the last of the deleted ones: in this case, the first of the 200 is 000000000000001F.
The options that I’m now seeing are:
– it’s a matter of time: the server needs more time to react to the changes, so adding, deleting and adding again makes it gasp for air, resulting in mistakes
– I did not wipe out the old database correctly (maybe the SQL command needs to be different? Or applied at a different moment? I applied it before turning on the debug database option and the rest), so old values linger
– maybe some small error is still hiding in the code? And I do not mean this as an accusation in any way, I am truly grateful for your fast and thorough assistance
Another thing I’m going to try in the morning is assigning barcodes manually and seeing how it goes.
Dear Mr. Heller,
I believe all is solved, thanks to your new release.
The issues in my last post happened because, when I did operations one after another, the URL of the last operation stayed in the address bar. That blocked the next operation, because the resulting URL would have been too long. The solution is either to open a new tab or to delete the URL manually.
Both automatic and manual barcodes now work correctly, and I managed to import 200,000 items, losing only 800. I think the losses come from something in the input file, maybe items containing tabs in some of the fields (and this also leads to a new thread of mine…).
Really thanking you for your fast assistance,