

Designing an API on top of BigQuery

Question:

Tag: google-app-engine,bigdata,google-bigquery

I have an AppEngine app that tracks various sorts of user impression data across several websites. We currently gather roughly 40 million records a month, and the main BigQuery table is closing in on 15 GB after 6 weeks of collection. Our estimates show that within 6 more weeks we will be gathering over 100 million records a month. That is a relatively small dataset in big-data terms, but it has the potential to grow quite a bit, quite fast.

Now, faced with a successful trial, we need to build an API on top of BigQuery that lets us analyze the data and deliver the results to a dashboard we provide.

My concern is that most of the data a customer analyzes spans only a few days at most (per request). Since BigQuery queries are in fact full table scans, the API may become slower to respond over time as the table grows and BQ has to process more data to return the results.

My question is therefore this: should we shard the BigQuery log tables, for instance by month or week, to reduce the amount of data that needs processing, or would it be "wiser" to pre-process the data and store the results in the NDB datastore? The latter would give us a blazingly fast API, but it requires us to pre-process everything, even things some customers may never need.

Or am I perhaps optimizing prematurely?


Answer:

This is based on my experience analyzing the performance of similar projects in BigQuery. If you are concerned with performance only, you don't have to change anything. BigQuery's optimizer can figure out many things, and if a query uses WHERE to restrict the data to only a few days, performance will be good. From a billing point of view, however, you will pay more and more as your data grows, so to save money it is wise to shard the data by month or even by week. With table wildcard functions such as TABLE_DATE_RANGE (or TABLE_QUERY) you can still query all the data when you need it, so you don't lose any functionality.
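As a rough illustration of the sharding approach, here is a minimal sketch that assumes daily shards, because legacy SQL's TABLE_DATE_RANGE matches tables named with a prefix followed by a YYYYMMDD suffix. The dataset name `logs`, the prefix `impressions_`, and the `site` column are made up for the example, and the helper functions are hypothetical; they only build table names and a query string, and do not call the BigQuery API.

```python
from datetime import date, timedelta


def shard_name(prefix, day):
    """Name of the daily shard for `day`, e.g. impressions_20150614."""
    return "%s%s" % (prefix, day.strftime("%Y%m%d"))


def shards_for_range(prefix, start, end):
    """List the daily shard tables covering start..end (inclusive)."""
    days = (end - start).days
    return [shard_name(prefix, start + timedelta(days=i))
            for i in range(days + 1)]


def range_query(dataset, prefix, start, end):
    """Build a legacy SQL query that reads only the shards in the range.

    `site` is an assumed column name, used purely for illustration.
    """
    return (
        "SELECT site, COUNT(*) AS impressions "
        "FROM TABLE_DATE_RANGE([%s.%s], TIMESTAMP('%s'), TIMESTAMP('%s')) "
        "GROUP BY site"
        % (dataset, prefix, start.isoformat(), end.isoformat())
    )


if __name__ == "__main__":
    start, end = date(2015, 6, 14), date(2015, 6, 16)
    print(shards_for_range("impressions_", start, end))
    print(range_query("logs", "impressions_", start, end))
```

With this layout, a dashboard request for a few days touches (and is billed for) only those days' tables, while widening the date range still reaches the full history.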

