How to extract files from a zip archive in S3


Tag: amazon-web-services,amazon-s3,cloud

I have a zip archive uploaded to S3 at a certain location (say /foo/). I would like to extract its contents and place them under /foo without downloading or re-uploading the extracted files. How can I do this, so that S3 is treated pretty much like a file system?


S3 isn't really designed to allow this; normally you would have to download the file, process it and upload the extracted files.

However, there may be a few options:

  1. You could mount the S3 bucket as a local filesystem using s3fs and FUSE (see article and GitHub site). This still requires the files to be downloaded and uploaded, but it hides these operations behind a filesystem interface.

  2. If your main concern is to avoid downloading data out of AWS to your local machine, then of course you could download the data onto a remote EC2 instance and do the work there, with or without s3fs. This keeps the data within Amazon data centers.

  3. You may be able to perform remote operations on the files, without downloading them onto your local machine, using AWS Lambda.

You would need to create, package and upload a small program, written in Node.js or another runtime that Lambda supports, to access, decompress and upload the files. This processing takes place on AWS infrastructure, so you won't need to download any files to your own machine. See the FAQs.
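As a rough illustration of what such a program might look like, here is a minimal sketch in Python rather than Node.js (Lambda also offers a Python runtime). It assumes the boto3 S3 client and its `get_object` / `put_object` calls; the function names `extract_zip_to_s3` and `lambda_handler` are hypothetical, and the whole archive is read into memory, so it only suits archives that fit in the function's memory.

```python
import io
import posixpath
import zipfile
from urllib.parse import unquote_plus


def extract_zip_to_s3(s3, bucket, zip_key, dest_prefix):
    """Read a zip object from S3 and write each member back under dest_prefix.

    `s3` is assumed to be a boto3 S3 client (or any object with compatible
    get_object / put_object methods). The archive is held in memory.
    """
    body = s3.get_object(Bucket=bucket, Key=zip_key)["Body"].read()
    written = []
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # S3 has no real directories; skip directory entries
            dest_key = posixpath.join(dest_prefix, info.filename)
            s3.put_object(Bucket=bucket, Key=dest_key, Body=archive.read(info))
            written.append(dest_key)
    return written


def lambda_handler(event, context):
    """Hypothetical Lambda entry point for an S3 'object created' event."""
    import boto3  # assumed to be available in the Lambda Python runtime

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    zip_key = unquote_plus(record["object"]["key"])  # keys arrive URL-encoded
    dest_prefix = posixpath.dirname(zip_key)  # extract next to the archive
    return extract_zip_to_s3(boto3.client("s3"), bucket, zip_key, dest_prefix)
```

Because the extracted files land in the same bucket, an upload trigger for this function would probably need to be restricted to keys ending in `.zip`, so that the extracted files do not invoke it again.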

Finally, you need a way to trigger this code. Typically, in Lambda, it would be triggered automatically by the upload of the zip file to S3. If the file is already there, you may need to trigger it manually, via the invoke-async command provided by the AWS API (since deprecated in favor of invoke with an asynchronous "Event" invocation type). See the AWS Lambda walkthroughs and API docs.
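For a zip file that is already in the bucket, a manual trigger could be sketched as follows. This uses the boto3 Lambda client's `invoke` call with `InvocationType="Event"` instead of the older invoke-async command, and assumes the function reads the bucket and key from an S3-style event. The helper names and the hand-built event are hypothetical; real S3 notifications contain many more fields.

```python
import json
from urllib.parse import quote_plus


def build_s3_event(bucket, key):
    """Build a minimal, hand-written stand-in for an S3 'object created' event.

    Only the fields a simple handler would read are included. The key is
    URL-encoded (spaces become '+'), mimicking real notifications.
    """
    return {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": quote_plus(key, safe="/")},
                }
            }
        ]
    }


def trigger_for_existing_zip(lambda_client, function_name, bucket, key):
    """Asynchronously invoke the function for a zip already sitting in S3."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: Lambda queues the event
        Payload=json.dumps(build_s3_event(bucket, key)).encode("utf-8"),
    )
    return response["StatusCode"]  # 202 indicates the event was accepted


# Example call (placeholder names; requires boto3 and AWS credentials):
#   import boto3
#   trigger_for_existing_zip(
#       boto3.client("lambda"), "unzip-in-s3", "my-bucket", "foo/archive.zip")
```

An asynchronous invocation returns as soon as the event is queued, so any failure during extraction would show up in the function's logs rather than in the return value here.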

However, this is quite an elaborate way of avoiding downloads, and probably only worth it if you need to process large numbers of zip files. Note also that Lambda functions have a maximum duration (60 seconds when this was written, since raised to 15 minutes; the default timeout is 3 seconds), so they may run out of time if your files are extremely large.

