
mark6400
Hello,

The Text Preview feature is potentially quite useful for scanning hundreds of research notes stored as text files; it would be very useful (meaning "I would happily purchase and recommend this software") with a few additional features:

- Support for other encodings: right now, my UTF-8 text files aren't read as UTF-8, which leaves junk characters wherever I have diacritics, smart quotes, and the like.

- Font selection: choosing the font and size would help those who like big fonts or monospace fonts.

- Phrase highlight style selection: perhaps bold, or underline, instead of (or in addition to) a change of color

Regards,
Mark

houdah
Hi!

I have been able to reproduce the problem with displaying UTF-8 files. I'll see about fixing this soon.

Best,
Pierre Bernard
Houdah Software s.à r.l.

Houdah Software s. à r. l.
https://www.houdah.com

HoudahGeo: One-stop photo geocoding
HoudahSpot: Advanced file search utility
Tembo: Easy and effective file search

mark6400
Hello Pierre,

Thanks for looking into this so quickly! Since yesterday, I've found another area where UTF-8 is not supported: the search criteria.

Example: I have several files with the name "Varèse" (accent grave on the first E) in the title or contents. Searching for "Var" finds these files -- but when I expand the search to either "Vare" or "Varè", the files in question are not found. Note that Spotlight returns these files for both forms of the search, with and without accent ("Vare" and "Varè"). Needless to say, this is an important issue for someone who consistently uses UTF-8 (and writes about French composers). 

Thanks again for your attention to these issues.

Regards,
Mark

houdah
Hi Mark!

I saved your email to a text file and had Spotlight, the Finder and HoudahSpot search for it by the word Varèse. None of the three applications found the file.

Then I added the word Varese without the accent to the file. Now the file is found even when searching for Varèse. It seems that the accented character did not even get indexed by Spotlight. In that case it can neither be searched for by Spotlight nor shown in Text Preview.

Could you please verify that your test files also contained the variant without accent?

What file format is the file you are looking for?

Best,
Pierre Bernard
Houdah Software s.à r.l.

mark6400
Hello Pierre,

I just confirmed that two files -- which are also text files (.txt) -- contain "Varèse" with accent grave, but not "Varese". The word is present both in the file name and in the contents. Spotlight returns both files on searches for both "Varèse" and "Varese", so it appears that Spotlight has indexed the accented word (though I know little about this indexing process).

I tried HoudahSpot search again, using "Varèse", and this time it found one of the two files! But then I opened the file from the results window, and it disappeared; subsequent searches couldn't find either of the two files.

I've attached the two files, for your reference. Interesting... one filename shows a Unicode burp, the other does not. Even more interesting... the filename with the burp (Varèse and the path...) is the one that HoudahSpot briefly found.

 Thanks again!

Regards,
Mark


houdah
Hi!

Spotlight searches the file name and contents. I guess it found the match only on the name.

Best,
Pierre Bernard
Houdah Software s.à r.l.

houdah
Hi!

The problem seems to run deeper than I thought. QuickLook also gets the accented characters wrong. That would mean that both the Spotlight and the QuickLook plug-ins for plain text files are broken.

For text preview I may be able to bypass the Spotlight importer. This could fix the problem in this particular instance.

Best,
Pierre Bernard
Houdah Software s.à r.l.

houdah
Hi Mark!

Upon further investigation, I conclude that the problem is with whichever text editor created those UTF-8 files. In my case that was TextMate.

When I open the file using TextEdit, I also get garbage in place of the accented characters.

When I create a UTF-8 file using TextEdit, it is displayed correctly in both QuickLook and Text Preview.

Were you also using TextMate?

Best,
Pierre Bernard
Houdah Software s.à r.l.

mark6400
Hello Pierre,

I believe that I used BBEdit to make these files. I just made another test file with TextEdit, and it shows up properly in both QuickLook and Text Preview.

Sorry to take up your time with a user-side error! (That said, adjustable fonts for Text Preview would still be really helpful.) And now, off to find a way to clean up all of these spotty text files with bad UTF-8 implementations.

Regards,
Mark

houdah
Hi Mark!

I did some further testing. Now it seems that the bug is with RichText.mdimporter. At the command line both my test files are flagged as being UTF-8. One was created using TextEdit. The other using TextMate. My guess is that TextEdit stores additional info some place to tip off RichText.mdimporter about text encoding.

[CODE]$ file -I ~/Downloads/Varèse.txt
/Users/pierre/Downloads/Varèse.txt: text/plain; charset=utf-8
$ file -I ~/Desktop/V2.txt
/Users/pierre/Desktop/V2.txt: text/plain; charset=utf-8
[/CODE]

Your suggestions are duly noted.

Best,
Pierre Bernard
Houdah Software s.à r.l.

mark6400
Hello Pierre,

I did "convert" the problem files by opening them in TextEdit and saving them. This changes the document type from "Document" to "Plain Text". And now they all search and display properly.

I hope this means I'm not stuck using TextEdit forever...

Thanks for your help!

Regards,
Mark

mark6400
Hello Pierre,

I think the culprit may be the BOM (byte order mark). When I convert a file from BOM to no BOM in BBEdit, I get garbage in TextEdit; the inverse happens when I open a file saved by TextEdit in BBEdit. Anyway, it's a big mess, but the cause appears to be multiple text editors that do not all use the same form of UTF-8. Let this be a lesson to those of us who try to adhere to standards...
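
For anyone wanting to check their own files for the BOM described above, here is a minimal sketch in plain shell. It assumes the UTF-8 BOM is the three bytes EF BB BF at the very start of the file, and that `head -c`, `od`, and `tail -c` are available (they are on macOS and most Unix systems). The function names are made up for illustration and are not part of any of the tools discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# has_utf8_bom FILE: succeed if FILE starts with the UTF-8 BOM (EF BB BF).
has_utf8_bom() {
    # Dump the first three bytes as hex and strip spaces and newlines.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_utf8_bom FILE: print FILE to stdout without a leading BOM.
strip_utf8_bom() {
    if has_utf8_bom "$1"; then
        tail -c +4 "$1"    # start at byte 4, skipping the 3 BOM bytes
    else
        cat "$1"
    fi
}

# add_utf8_bom FILE: print FILE to stdout with a leading BOM.
add_utf8_bom() {
    if has_utf8_bom "$1"; then
        cat "$1"
    else
        printf '\357\273\277'    # EF BB BF written as octal escapes
        cat "$1"
    fi
}
```

With these functions loaded, something like `has_utf8_bom notes.txt && echo "has BOM"` reports whether a file (here a placeholder name) carries the mark, and `add_utf8_bom notes.txt > notes-bom.txt` writes a converted copy without touching the original. Which form a given editor or importer expects is something to verify on your own system.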

Regards,
Mark

houdah
Hi Mark!

Good to know. I was unable to determine what the difference was.

Best,
Pierre Bernard
Houdah Software s.à r.l.

TripleToe
I too love the text preview feature but there are two things that would really put it over the top:

1. Provide the option to attach the preview to the main window frame rather than have it float outside the main window, almost like a popup properties window does. I would like to have it appear in a frame to the right of the 'results list' or perhaps beneath it.

2. Allow the addition of files that are valid text files but have different extensions.  For example, my .as (actionscript) and .mxml (flex) files are valid text files but the text preview will not work with them.  Also, it would be great if the highlighting of the word in the text preview was more visible. Perhaps surround it with a yellow highlight border like Safari does when you find words within a web page.

Thanks!

houdah
Hi!

Thank you for the feedback.

Could you try the following commands in Terminal.app while substituting actual files?

[CODE]mdimport -d 4 someFile.as[/CODE]
[CODE]mdimport -d 4 someFile.mxml[/CODE]

The thing is, I don't have the appropriate importers for those file types, so I cannot tell whether these importers are able to provide text content.

Please also try:

[CODE]mdls someFile.as[/CODE]
[CODE]mdls someFile.mxml[/CODE]

Best,
Pierre Bernard
Houdah Software s.à r.l.

TripleToe
What am I looking for in the results of that command's output?  I'm getting lots of MDI information like this:
...
kMDItemContentCreationDate     = 2010-05-13 10:58:10 -0500
kMDItemContentModificationDate = 2010-05-13 10:58:10 -0500
kMDItemContentType             = "com.adobe.mxml"
kMDItemContentTypeTree         = (
    "com.adobe.mxml",
    "public.source-code",
    "public.plain-text",
    "public.text",
    "public.data",
    "public.item",
    "public.content"
)
...

Is that what you are looking to find?

houdah
Hi!

What I am looking for are the values for kMDItemContentTypeTree and kMDItemTextContent.

The content type tree tells me what to look for when trying to identify the files. The text content is what is shown in the Text Preview window.

Best,
Pierre Bernard
Houdah Software s.à r.l.

TripleToe
I do not see an entry for "kMDItemTextContent" in the generated output. Several other entries do appear, and I've included the ones that seem relevant below. Also, when I was setting up my Mac so I could do QuickLook previews of these types of files, I recall that there is some sort of naming conflict where .as files are listed on the Mac as 'applesingle-archive' types rather than ActionScript files (see http://tekkie.flashbit.net/flash/as/enable-quick-look-of-actionscript-and-flex-files-on-snow-leopard).

Regardless, perhaps this information will help:

For .mxml files:
kMDItemContentType             = "com.adobe.mxml"
kMDItemContentTypeTree =     (
        "com.adobe.mxml",
        "public.source-code",
        "public.plain-text",
        "public.text",
        "public.data",
        "public.item",
        "public.content"
    );
kMDItemKind =     {
        "" = "Adobe MXML Document";
    };


For .as (actionscript) files:
kMDItemContentType             = "com.apple.applesingle-archive"
kMDItemContentTypeTree =     (
        "com.apple.applesingle-archive",
        "public.data",
        "public.item",
        "public.archive"
    );
kMDItemKind =     {
        "" = PlainTextType;
        en = "Plain Text File";
        ja = "\U6a19\U6e96\U30c6\U30ad\U30b9\U30c8";
        nl = "Tekstbestand zonder opmaak";
    };


houdah
Hi!

The lack of kMDItemTextContent means that the Spotlight importer in charge of the file does not provide text content. Thus you will not be able to search these files by text content.

Currently Text Preview also relies on kMDItemTextContent to peek at the file's contents. Seeing that the file is of type public.plain-text, I may however bypass Spotlight and give you a direct view of the file's contents.

The output for .as files shows that your system considers them to be AppleSingle archives. Thus they are not correctly imported and can neither be searched by Spotlight nor displayed by HoudahSpot's Text Preview.
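
As a rough illustration of the check described above (not HoudahSpot's actual code), the sketch below reads `mdls` output on standard input and reports whether the content type tree lists `public.plain-text`. The function name is hypothetical, and it assumes the tree entries are printed in double quotes, as in the output posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.

# conforms_to_plain_text: read mdls output on stdin and succeed if the
# content type tree contains the quoted entry "public.plain-text".
conforms_to_plain_text() {
    grep -q '"public\.plain-text"'
}
```

On a Mac, it could be used along the lines of `mdls -name kMDItemContentTypeTree someFile.mxml | conforms_to_plain_text && echo "plain text"`. Going by the output posted above, the .mxml file would pass this check and the .as file, seen as an AppleSingle archive, would not.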

Best,
Pierre Bernard
Houdah Software s.à r.l.


TripleToe
Actually, I've tweaked the MDI settings so that .as files do indeed get indexed by Spotlight. I know that is not the case on every user's system, since they would have to follow the same steps to get that working, but since those files are indexed on my system, can I also preview them in Text Preview?

houdah
Hi!

Could you provide the full output from mdimport now that you have tweaked the settings?

Best,
Pierre Bernard
Houdah Software s.à r.l.

abbe
Hi, 

While Text Preview is a very nice feature (basically, it brings HoudahSpot on par with the otherwise excellent FoxTrot desktop search engine, but without the separate indexing overhead), I am wondering why we see only text in this window. Most of my searches concern PDF files with a text overlay. When I use Text Preview, I see the unformatted text of my PDF files, which makes the preview hard to read at a glance.

And that is the point, I would think.

Is there any way to let Text Preview render the image of the PDF (or other file types), so that we end up with a QuickLook-like preview of the file, but with search terms highlighted throughout the text?

That would be an incredible value-add to the application.

Looking forward to hearing from you,

Abbe


houdah
Hi!

I agree that a combination of QuickLook and Text Preview would be ideal.

Unfortunately that is hardly feasible. While Apple shipped QuickLook plug-ins for many standard file formats, third party vendors are in charge of shipping plug-ins for proprietary formats.
I don't think we could ever talk a third party into shipping plug-ins targeted exclusively at HoudahSpot.

Thus such a preview feature would be limited to a select few file formats like PDF.
Previewing PDFs in this way would probably require the development of a full-featured PDF viewer.

Best,
Pierre Bernard
Houdah Software s.à r.l.
 
