tika
I am writing a class to recursively extract files from inside a zip file and produce them to a Kafka queue for further processing. My intent is to be able to extract files from multiple levels of zip. The code below is my implementation of the tika ContainerExtractor to do...
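The recursive idea described in this question can be sketched without Tika. The following is a minimal illustration using Python's standard zipfile module, not the asker's ContainerExtractor implementation; the function name iter_nested_zip and the "!/" path separator are hypothetical, and the Kafka step is left to the caller.

```python
import io
import zipfile


def iter_nested_zip(data, prefix=""):
    """Yield (path, bytes) for every non-zip member, descending into nested zips.

    `data` is the raw bytes of a zip archive. `prefix` records the chain of
    enclosing archives so that each yielded path stays unique.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no payload
            payload = archive.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # The member is itself a zip archive: recurse one level deeper.
                yield from iter_nested_zip(payload, path + "!/")
            else:
                yield path, payload
```

Each yielded pair could then be handed to whatever producer publishes to the queue. Note that zip-based formats such as .docx also pass the is_zipfile check, so a real implementation would probably want an explicit rule for which members to descend into.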
ruby,popen,tika,apache-tika
I'm using the Tika jar to extract metadata from Microsoft Word doc files, but when Tika encounters a problem my rescue does not catch the error; instead the script exits. I'm on Windows 7 with MRI Ruby 1.9.3. I could adapt the doc file but I want to...
java,row,talend,tika,filelist
I'm trying to use the tTikaExtractor component to extract the content of several files in a folder. It works with a single file, but when I add a tFileList component, I don't understand how to get the content of the 2 different files. I think it is something related to...
tika,apache-tika
Is there a way to extract content from a file with a Tika server without explicitly defining the header? For example for a specific file named "file.pdf" if I do curl -X PUT --data-binary @file.pdf localhost:9998/tika --header "Content-type: application/pdf" > file.txt I get the extracted content in "file.txt" but if...
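One client-side workaround, sketched under assumptions: instead of typing the header by hand, guess it from the file name. This does not make the server detect the type itself; it only fills in the same Content-Type header that the curl command above sets. The URL is taken from that command, the helper name build_tika_put is hypothetical, and the "Accept: text/plain" header is an assumption about the desired output format.

```python
import mimetypes
import urllib.request

# Endpoint taken from the curl command in the question; adjust for your server.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_put(filename, body, url=TIKA_URL):
    """Build (but do not send) a PUT request for `body`.

    The Content-Type is guessed from the file extension, so the caller
    does not have to spell it out for each file.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    headers = {
        # Fall back to a generic binary type when the extension is unknown.
        "Content-Type": guessed or "application/octet-stream",
        # Assumption: plain-text output is wanted, as in the question.
        "Accept": "text/plain",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")
```

With a server running, the request could be sent with urllib.request.urlopen(build_tika_put("file.pdf", data)) and the response body written to file.txt.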
php,tika
I need to make an internal website which allows users to upload .doc, .pdf, .xls files and see the text in a textarea box. I have created the site in PHP to the point where a user can upload the files. I have installed Tika on my server and...
java,apache,metadata,key-value,tika
I'm trying to get the metadata values from an Office document, and the only key-value pair it shows is this one: Content-Type: application/zip I just can't tell what the issue is. Why does it only show the Content-Type? What I'm interested in are keys like title. import java.io.FileInputStream; import...
mysql,solr,tika
Solr, more specifically Tika, is having problems finding my file, whose file path is retrieved from a database. Whenever I index, it logs errors saying that it can't find the file. I'm basically doing what this guy is doing here, which is taking a file path from a...
java,jax-rs,resteasy,tika
I want to PUT, via binary, to an endpoint that can consume one of many possible mimetypes. Specifically, I am communicating with an Apache Tika server, which could take, say, a PDF or a Word .docx file. I've set up a client proxy interface that I can hardcode, say, the...
solr,solrnet,tika,solr-cell
I am doing a POC to index PDF and Word documents using the Solr search engine. I searched for detailed information or articles but did not find any. What I found is to use an example provided with a Solr package. That is not I...
java,tika,apache-tika
I am using Tika with Java for a crawling program. I have used BSF_Recursive for that. After some results, it shows me this... http://www.google.com Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://translate.google.com/ at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source) ...
java,tika,apache-tika
I'm using the Tika facade, per the example of the elasticsearch-mapper-attachments plugin. Here's my test code: Tika tika = new Tika(); Metadata md = new Metadata(); try { String content = tika.parseToString(src, md, 100000); System.out.println("Content length: " + content.length()); for (String s: md.names()) { System.out.println(s + ": " + md.get(s));...
apache,tomcat,war,tika
How do I deploy tika-server as a WAR file under a servlet container such as Tomcat? I would prefer to deploy without using Maven.