I managed to run filecrush for the first time. Everything seemed to finish successfully, but then I got the error below. Although it reported lots of files to crush, it did not actually crush any of them.
Exception in thread "main" java.io.IOException: not a gzip file
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:496)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:257)
at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:186)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:72)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2281)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2304)
at com.m6d.filecrush.crush.Crush.cloneOutput(Crush.java:769)
at com.m6d.filecrush.crush.Crush.run(Crush.java:666)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.m6d.filecrush.crush.Crush.main(Crush.java:1330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
My command line:
hadoop jar ./filecrush-2.2.2-SNAPSHOT.jar com.m6d.filecrush.crush.Crush --info --clone --verbose --compress gzip --input-format text --output-format text /user/camus/tests/topics/ /user/camus/tests/topics_orig/ 20101121121212
Why does it say "SequenceFile"? My input is gzipped JSON (i.e. text), soon to be Snappy-compressed JSON.
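One check that might help narrow this down is to look at the first few bytes of whichever file is being read when the exception is thrown. Below is a minimal sketch, not part of filecrush, assuming the standard magic numbers: gzip streams start with the bytes 0x1f 0x8b, and Hadoop SequenceFiles start with the ASCII bytes SEQ. The function name `sniff_format` and the sample inputs are hypothetical.

```python
import gzip

# Standard magic numbers: a gzip stream starts with 0x1f 0x8b,
# a Hadoop SequenceFile starts with the ASCII bytes "SEQ".
GZIP_MAGIC = b"\x1f\x8b"
SEQFILE_MAGIC = b"SEQ"


def sniff_format(header: bytes) -> str:
    """Guess a file's container format from its leading bytes (hypothetical helper)."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(SEQFILE_MAGIC):
        return "sequencefile"
    return "unknown"


if __name__ == "__main__":
    # Illustrative inputs only: a locally gzipped JSON line and a fake SequenceFile header.
    gz_sample = gzip.compress(b'{"topic": "test"}\n')
    print(sniff_format(gz_sample[:4]))
    print(sniff_format(b"SEQ\x06"))
    print(sniff_format(b'{"topic": "test"}'))
```

The header bytes could be pulled from HDFS with something like `hadoop fs -cat <path> | head -c 4`. If a file that is supposed to be gzipped text comes back as "sequencefile" or "unknown", that would at least be consistent with the "not a gzip file" message, though it would not by itself explain why `Crush.cloneOutput` is using a SequenceFile reader.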