Houdah Software Forums
mlondon

Registered:
Posts: 2
 #1 
I use HoudahSpot to find text within PDFs.
For the most part this is working fine, and I'm able to search for a particular phrase successfully.

However, I recently searched for a phrase that I am 100% certain is in a particular PDF, but HoudahSpot (and Spotlight) do not return a result.

Yes, the file is searchable. I confirmed this by opening it in Preview, where I am able to fully search the PDF.

I removed the folder from Spotlight by adding it to the Privacy pane, and then forced a reindex by removing it from Privacy. I then confirmed that the folder, and that particular file, had been reindexed by running the following command in Terminal:

sudo fs_usage -w -f filesys mdworker | egrep "open"

But I still cannot search within that file!

Why wouldn't Spotlight/HoudahSpot be able to search this file?
If there is a way to test and fix this, is it possible to run that fix across my entire folder (or drive), so that it also catches other PDFs that, unbeknownst to me, are not being indexed properly?

Many thanks.


houdah

Moderator
Registered:
Posts: 3,048
 #2 
You can force Spotlight to index a file and see what information it got out of it:

  1. Open /Applications/Terminal.app
  2. Type or paste in the following command "mdimport -d 4 " (without the quotes, but with the trailing space)
  3. Drag the file from Finder or HoudahSpot into the Terminal window. This will append its path to the above command
  4. Press return / enter

Spotlight will find the appropriate importer plug-in and have it process the file. In Terminal you will see information on what it did and what metadata it got out of the file. Scroll to the top to see which .mdimporter plug-in was used. The system includes a plug-in to process PDF files. That should have been used. If a third-party plug-in was used, you may want to uninstall that.

Towards the end of the output you should see kMDItemTextContent. This is the text the importer got out of the file and made available for indexing.

One possible cause of failure I can think of is some kind of copy protection or password on the file.
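
To check for kMDItemTextContent without scrolling through the output, you could save the output to a file and search it. This is only a minimal sketch: the function name `has_text_content` and the log path are hypothetical, and the layout of the mdimport output (and the options it accepts) differs between macOS versions, so check `man mdimport` on your system. The helper only looks for the attribute name; it does not verify that the extracted text is complete.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a saved mdimport debug log mentions
# kMDItemTextContent, i.e. the importer reported some text for the file.
has_text_content() {
    grep -q 'kMDItemTextContent' "$1"
}

# Usage on macOS (paths are placeholders):
#   mdimport -d 4 /path/to/file.pdf > /tmp/mdimport.log 2>&1
#   if has_text_content /tmp/mdimport.log; then
#       echo "text was extracted"
#   else
#       echo "no text content found"
#   fi
```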

__________________
Houdah Software s. à r. l.
https://www.houdah.com

HoudahGeo: One-stop photo geocoding
HoudahSpot: Advanced file search utility
Tembo: Easy and effective file search
mlondon

Registered:
Posts: 2
 #3 
Thanks for the quick reply.
I ran the command you suggested.
It used the same importer as when I ran my command earlier:

/System/Library/Spotlight/PDF.mdimporter

However, after I ran YOUR command, I can now search for my phrase!

This is very strange. As mentioned earlier, when I ran:

sudo fs_usage -w -f filesys mdworker | egrep "open"

on the enclosing folder, it listed all the files that mdworker had indexed, and this problematic file was listed there.

Why would it not have worked the first time and worked the second time?

Can I run:

mdimport -d 4

on an entire folder? 

Thanks again.
houdah

Moderator
Registered:
Posts: 3,048
 #4 
Yes, you can run mdimport on an entire folder. You may want to leave out the "-d 4" option. That is the maximum debug level and produces a lot of output. When importing a whole folder, you will most likely not be reading all the debug information.
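
As a rough sketch of the folder import without the debug option: the wrapper below checks that the path is a folder and then hands it to mdimport. The function name `reimport_folder` and the `MDIMPORT_CMD` override are hypothetical, added only so the call can be previewed with `echo` before running it for real.

```shell
#!/bin/sh
# Sketch: re-import the file hierarchy under a folder, without debug output.
# MDIMPORT_CMD is a hypothetical override (handy for a dry run);
# it defaults to the real mdimport tool.
reimport_folder() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a folder: $dir" >&2
        return 1
    fi
    # mdimport accepts a folder path and imports the files below it.
    ${MDIMPORT_CMD:-mdimport} "$dir"
}

# Usage (placeholder path):
#   MDIMPORT_CMD=echo reimport_folder "$HOME/Documents/PDFs"   # preview
#   reimport_folder "$HOME/Documents/PDFs"                     # real import
```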

When things like this happen - a file appearing in the index on second try - I generally doubt the integrity of the index and prefer rebuilding it from scratch rather than trying to patch it folder by folder.

  1. Go to System Preferences > Spotlight > Privacy
  2. Add your startup disk – not just individual folders – to the list
  3. Wait a bit, to be sure the old Spotlight index is deleted
  4. Remove the startup disk from the Privacy list
  5. Leave the computer running overnight to rebuild the index
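
A rebuild can also be triggered from Terminal with `mdutil`, which is a different route from the Privacy list steps above. The following is a minimal sketch, assuming the startup disk is mounted at `/`. The `rebuild_index` function and the `DRY_RUN` switch are hypothetical names, added so the commands can be previewed before anything is erased.

```shell
#!/bin/sh
# Sketch: erase and rebuild the Spotlight index of a volume via mdutil.
# With DRY_RUN=1 (the default here) the commands are only printed.
rebuild_index() {
    volume=${1:-/}
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sudo mdutil -E $volume"
        echo "mdutil -s $volume"
        return 0
    fi
    sudo mdutil -E "$volume"   # -E erases the index so that it is rebuilt
    mdutil -s "$volume"        # -s prints the indexing status of the volume
}

# Usage:
#   rebuild_index /              # preview only
#   DRY_RUN=0 rebuild_index /    # actually erase and rebuild
```

As with the Privacy list method, the rebuild itself runs in the background and can take hours on a large disk.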
