FAQ Database Discussion Community


BigQuery streaming - our data is not showing up anymore

google-bigquery
We're using the BigQuery streaming API, and have been for some time now. We noticed that at about 4:05am UTC (June 18th) BigQuery stopped reporting any newly streamed data. We checked all our logs, and everything looks good; we're even getting back 200s from the...

How to get distinct values on GROUP_CONCAT using Google Big Query

distinct,google-bigquery,group-concat
I'm trying to get distinct values when using GROUP_CONCAT in BigQuery. I'll recreate the situation using a simpler, static example. EDIT: I've modified the example to better represent my real situation: two columns with GROUP_CONCAT that need to be distinct: SELECT category, GROUP_CONCAT(id) as ids, GROUP_CONCAT(product) as products FROM (SELECT...
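The core of the question is that each concatenated column has to be deduplicated on its own: deduplicating whole (id, product) pairs would still repeat values. The plain-Python sketch below only models that semantics over hypothetical in-memory rows (it is not BigQuery SQL); where standard SQL is available, STRING_AGG(DISTINCT ...) expresses the same idea directly.

```python
def distinct_group_concat(rows, key, columns, sep=","):
    """Group rows by `key` and join the DISTINCT values of each column separately.

    Values keep first-seen order here; GROUP_CONCAT itself makes no ordering promise.
    """
    groups = {}
    for row in rows:
        per_column = groups.setdefault(row[key], {c: {} for c in columns})
        for c in columns:
            # A dict doubles as an insertion-ordered set of already-seen values.
            per_column[c].setdefault(str(row[c]), None)
    return {
        k: {c: sep.join(seen) for c, seen in per_column.items()}
        for k, per_column in groups.items()
    }


# Hypothetical sample rows with duplicates in both id and product.
rows = [
    {"category": "books", "id": 1, "product": "novel"},
    {"category": "books", "id": 1, "product": "novel"},
    {"category": "books", "id": 2, "product": "atlas"},
    {"category": "toys", "id": 3, "product": "ball"},
]
print(distinct_group_concat(rows, "category", ["id", "product"]))
# {'books': {'id': '1,2', 'product': 'novel,atlas'}, 'toys': {'id': '3', 'product': 'ball'}}
```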

unable to configure apprtc.appspot with own url

python,google-bigquery,webrtc,apprtcdemo
This is the error I get when I try to configure apprtc with my own URL. I tried to set up my own TURN server and also tried to give a client URL, but it still did not work. <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects/esuioswebrtc/datasets/prod/tables/analytics/insertAll?alt=json returned "Not Found: Table esuioswebrtc:prod.analytics">...

'Immediate Follow' Page Path in BigQuery

google-bigquery
I am working in BigQuery to understand how many users complete a specific page path (at any point in the session). Let's say the page path is Page 1 -> Page 2 -> Page 3. The pages must be followed in sequential order. I am able to use BQ to...

TABLE_QUERY fails when trying to use last_modified_time

google-bigquery
I have an existing query which uses TABLE_QUERY(), and filters the results based on creation_time: SELECT * FROM (TABLE_QUERY( project_114151_dataset , "MSEC_TO_TIMESTAMP(creation_time) > DATE_ADD(CURRENT_TIMESTAMP(), -45, 'DAY') AND REGEXP_MATCH(table_id, r'^fact_[0-9]{8}$') ")) I want to change the query to run based on last_modified_time; since it is also a timestamp in msec, I...

BigQuery running totals

google-bigquery,window-functions,running-total
I'm having trouble making running totals work for me in BigQuery. I've found an example that works here: BigQuery SQL running totals SELECT word, word_count, SUM(word_count) OVER(ORDER BY word DESC) FROM [publicdata:samples.shakespeare] WHERE corpus = 'hamlet' AND word > 'a' LIMIT 30 But what I really want to do -...
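For reference, the example's SUM(word_count) OVER(ORDER BY word DESC) is a cumulative sum taken in the window's sort order. A minimal Python sketch of that semantics, assuming the ordering column has no ties (with ties, the default window frame in standard SQL treats tied rows as peers and gives them the same total, which this row-by-row version does not):

```python
from itertools import accumulate


def running_total(rows, order_key, value_key, descending=False):
    """Return (order value, value, cumulative sum) tuples in window order.

    Mirrors SUM(value) OVER (ORDER BY order_key [DESC]) only when order_key is unique.
    """
    ordered = sorted(rows, key=lambda r: r[order_key], reverse=descending)
    totals = accumulate(r[value_key] for r in ordered)
    return [(r[order_key], r[value_key], total) for r, total in zip(ordered, totals)]


# Hypothetical stand-in for a few (word, word_count) rows.
rows = [
    {"word": "b", "word_count": 2},
    {"word": "c", "word_count": 3},
    {"word": "a", "word_count": 1},
]
for line in running_total(rows, "word", "word_count", descending=True):
    print(line)
# ('c', 3, 3)
# ('b', 2, 5)
# ('a', 1, 6)
```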

BigQuery: which tool is used to produce reports online? [closed]

report,google-bigquery
I'm new to BigQuery, coming to Google Cloud from "Oracle Forms/Reports"... Can someone point me toward a tool for producing reports connected to BigQuery? So far I have tried exporting BigQuery data to CSV and using that data in Excel or...

'TRIM' or 'PROPER' in BigQuery

google-bigquery,trim
Is there a way to normalize strings in BigQuery? My dataset looks like: "Alfa Beta", "Alfa BETA", "alfa beta " (with a space after 'beta'). So far I can use LOWER or UPPER to normalize the letters, but I don't know how to eliminate spaces before and after the text. Does...
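The normalization being asked for is "trim the ends, then fold case". BigQuery has TRIM (plus LTRIM/RTRIM) alongside LOWER/UPPER for this; the Python below is only a local sketch of the intended result on the sample values, showing that all three collapse into one group.

```python
from collections import Counter


def normalize(value: str) -> str:
    # Same idea as LOWER(TRIM(value)): drop leading/trailing whitespace, then lowercase.
    return value.strip().lower()


samples = ["Alfa Beta", "Alfa BETA", "alfa beta "]  # the last one has a trailing space
print(Counter(normalize(v) for v in samples))  # Counter({'alfa beta': 3})
```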

Export Google Cloud Datastore and import to BigQuery programmatically

gae-datastore,google-bigquery,google-datastore
I'm looking for a way to export my Cloud Datastore and import it into BigQuery daily. The manual way is described on a Google page, but I cannot find a clean way to automate it.

Combining multiple BigQuery SELECT FROM statements

sql,google-bigquery
I'm somewhat new to BigQuery and SQL, so part of my difficulty may be that I don't know how to describe the problem well enough to search for an answer, but I have looked, so please be gentle. What I'm trying to do...

What does a BigQuery dry run return?

google-bigquery
The BigQuery docs say: "Test your queries on smaller partitions of the table rather than one large table. If using the API, validate queries for syntax and get data processing statistics using the dryRun flag." But they also say, for dryRun: "If set, don't actually run the query. A valid...

BigQuery SPLIT() and grouping by result

google-bigquery
Using SPLIT() & NTH(), I'm splitting a string value, and taking the 2nd substring as the result. I then want to group on that result. However, when I use SPLIT() in conjunction with a GROUP BY, it keeps giving the error: Error: (L1:55): Cannot group by an aggregate The result...

How do I cast dd/mm/yyyy string into date in BigQuery?

datetime,casting,google-bigquery,string-to-datetime
I have three columns: (1) a date in dd/mm/yyyy format, stored as a string; (2) app_id; and (3) the number of downloads of each app. I have to find the unique IDs of apps downloaded within a week. Thank you ...
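Because dd/mm/yyyy strings do not sort or compare chronologically, the usual approach is to parse them into real dates first and then filter on a date range (in BigQuery standard SQL, PARSE_DATE('%d/%m/%Y', ...) does the parsing). The Python sketch below uses hypothetical field names and reads "within a week" as a 7-day window starting on a given date, which is an assumption about the question.

```python
from datetime import date, datetime, timedelta


def parse_ddmmyyyy(text: str) -> date:
    # "%d/%m/%Y" matches strings such as "05/06/2015" (5 June 2015).
    return datetime.strptime(text, "%d/%m/%Y").date()


def apps_downloaded_within_week(rows, start: date):
    """Distinct app_ids with a download date in the half-open range [start, start + 7 days)."""
    end = start + timedelta(days=7)
    return sorted({r["app_id"] for r in rows if start <= parse_ddmmyyyy(r["date"]) < end})


# Hypothetical rows: date string, app_id, downloads.
rows = [
    {"date": "01/06/2015", "app_id": "a1", "downloads": 10},
    {"date": "05/06/2015", "app_id": "a2", "downloads": 3},
    {"date": "05/06/2015", "app_id": "a1", "downloads": 4},
    {"date": "09/06/2015", "app_id": "a3", "downloads": 7},
]
print(apps_downloaded_within_week(rows, date(2015, 6, 1)))  # ['a1', 'a2']
```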

BigQuery: Using threshold with COUNT DISTINCT in WINDOW function returns error

google-bigquery,window-functions
With COUNT DISTINCT, I often use a threshold to make it more precise, e.g. COUNT(DISTINCT users, 100000). If I use a window function, though, I get an error when I try to use a threshold: COUNT_DISTINCT must have at most 1 argument(s), found 2. E.g. here's a made-up...

Using external .csv file in Google BigQuery

google-bigquery
I want to use an external .csv file of line-separated words in Google BigQuery. I want to do something like this (please excuse the pseudocode): import data_file SELECT some_words FROM publicdata:samples.shakespeare WHERE some_words CONTAINS data_file.word I want to do this for each word in the data file,...

Running count of appearances of a customer ID in BigQuery

google-bigquery
There are similar questions here, but either I cannot figure out how to adapt them to my situation (likely) or they are not that similar, even though they read close to what I want to do (BigQuery: How to calculate the running count of distinct visitors for each day and category?) Anyway......

Cannot use calculated offset in BigQuery's DATE_ADD function

google-bigquery,tableau,google-cloud-platform
I'm trying to create a custom query in Tableau to use on Google's BigQuery. The goal is to have an offset parameter in Tableau that changes the offsets used in a date based WHERE clause. In Tableau it would look like this: SELECT DATE_ADD(UTC_USEC_TO_MONTH(CURRENT_DATE()),<Parameters.Offset>-1,"MONTH") as month_index, COUNT(DISTINCT user_id, 1000000) as...

BigQuery select distinct values

google-bigquery
How do I select distinct values in Google BigQuery? Query: SELECT DISTINCT cc_info FROM user WHERE date = ? Thanks!...

BigQuery first record of a specific column

google-bigquery
Here's a sample of my output: I need to take just the first record for each visitId (the one with the minimum time). I've tried to use the MIN function, excluding hits.time from the GROUP BY list: SELECT STRFTIME_UTC_USEC(date, '%U') AS WK, visitId, date AS SALES_DATE, hits.eventInfo.eventLabel AS SEARCH_DD, year(date) as...
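Taking the "first record per visitId" means keeping the whole row whose time is the minimum for that visitId, not just the MIN value itself. In SQL that is typically done either by joining back to a MIN(time)-per-visitId subquery or with ROW_NUMBER() OVER (PARTITION BY visitId ORDER BY time) = 1. A minimal Python sketch of the row-selection logic, with hypothetical field names:

```python
def first_record_per_key(rows, key, time_key):
    """For each key, keep the row with the smallest time_key (the first row seen wins ties)."""
    best = {}
    for row in rows:
        current = best.get(row[key])
        if current is None or row[time_key] < current[time_key]:
            best[row[key]] = row
    return best


# Hypothetical hits: two for visitId 1, one for visitId 2.
rows = [
    {"visitId": 1, "time": 30, "label": "b"},
    {"visitId": 1, "time": 10, "label": "a"},
    {"visitId": 2, "time": 5, "label": "c"},
]
result = first_record_per_key(rows, "visitId", "time")
print({k: r["label"] for k, r in result.items()})  # {1: 'a', 2: 'c'}
```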

Google spreadsheet script authorisation to BigQuery

google-apps-script,google-spreadsheet,google-bigquery
I have a Google spreadsheet with a script that connects to BigQuery (using this tutorial - https://developers.google.com/apps-script/advanced/bigquery?hl=ar-AE). It adds an extra menu option and users can run the script that executes a query to BigQuery. It works fine for me and I want to share this spreadsheet with other users...

Getting A Better Understanding Of Streaming Inserts With BigQuery

google-bigquery,google-cloud-platform
I understand that there has been a material change relating to the BigQuery streaming API, as stated in a message I received from the Google Cloud team on Thursday, May 14th: "In 2013, we launched Google BigQuery streaming API, making it easy to analyze large amounts of data quickly. This product...

BigQuery bq command with asterisk (*) doesn't work in Compute Engine

google-bigquery,google-compute-engine,google-cloud-platform
I have a directory with a file named file1.txt, and I run the command: bq query "SELECT * FROM [publicdata:samples.shakespeare] LIMIT 5" On my local machine it works fine, but in Compute Engine I receive this error: Waiting on bqjob_r2aaecf624e10b8c5_0000014d0537316e_1 ... (0s) Current status: DONE BigQuery error in query operation:...

BigQuery - Check if table already exists

google-api,export,google-bigquery,google-cloud-storage
I have a dataset in BigQuery. This dataset contains multiple tables. I am doing the following steps programmatically using the BigQuery API: Querying the tables in the dataset - Since my response is too large, I am enabling allowLargeResults parameter and diverting my response to a destination table. I am...

Value cannot be null. Parameter name: baseUri

c#,google-api,google-bigquery,google-api-dotnet-client,service-accounts
I am using the Google BigQuery API with service account authorization in a C# console application. When I try to run a CSV load job, it throws a "Value cannot be null. Parameter name: baseUri" exception. It happens for only a single table; when I change the table name, everything works. Here...

Can a BigQuery query convert flattened tables into a nested data structure?

google-bigquery
I'm facing a problem: there are five tables in BigQuery. Imagine that table A has a record, and table B has five records connected to it. Here is the example: Table A: record a Table B: record b,c,d,e,f Now I'm using this SQL: select...

Regex QueryString Parsing for a specific in BigQuery

regex,google-app-engine,logging,google-bigquery
Last week I was able to start streaming my App Engine logs into BigQuery, and I am now attempting to pull some data out of the log entries into a table. The data in protoPayload.resource is the requested page with the querystring parameters included. The contents of protoPayload.resource look like...

Is it good to call Thread.Sleep during polling Google Big Query results in ASP.NET? Alternatives?

c#,asp.net,asp.net-mvc,async-await,google-bigquery
I am using ASP.NET MVC 5, which gets data from Google Big Query. Due to the way Google Big Query is designed, I need to poll for results if the job is not finished. Here is my code, var qr = new QueryRequest { Query = string.Format(myQuery, param1, param2) };// all...

How to Pivot in Google BigQuery

python,pandas,google-bigquery
Suppose I have the following query sent to BQ: SELECT shipmentID, category, quantity FROM [myDataset.myTable] Further, suppose that the query returns data such as (shipmentID, category, quantity): (1, shoes, 5), (1, hats, 3), (2, shirts, 1), (2, hats, 2), (3, toys, 3), (2, books, 1), (3, shirts, 1). How can...
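A pivot turns each distinct category into its own column, summing quantity per shipmentID and filling 0 where a shipment has no row for that category. In SQL this is commonly written as one SUM(IF(category = '...', quantity, 0)) expression per category; in pandas, DataFrame.pivot_table(..., aggfunc="sum", fill_value=0) does the same. The stdlib-only Python sketch below applies that logic to the sample rows from the question:

```python
def pivot(rows, fill=0):
    """(shipmentID, category, quantity) rows -> (sorted categories, {shipmentID: {category: total}})."""
    categories = sorted({category for _, category, _ in rows})
    table = {}
    for shipment_id, category, quantity in rows:
        row = table.setdefault(shipment_id, dict.fromkeys(categories, fill))
        row[category] += quantity  # like SUM(IF(category = '<name>', quantity, 0))
    return categories, table


rows = [(1, "shoes", 5), (1, "hats", 3), (2, "shirts", 1), (2, "hats", 2),
        (3, "toys", 3), (2, "books", 1), (3, "shirts", 1)]
categories, table = pivot(rows)
print("shipmentID", *categories)
for shipment_id in sorted(table):
    print(shipment_id, *(table[shipment_id][c] for c in categories))
# shipmentID books hats shirts shoes toys
# 1 0 3 0 5 0
# 2 1 2 1 0 0
# 3 0 0 1 0 3
```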

Get MAX from row with column name (SQL)

google-bigquery
Sorry if my question is simple, but I spent a day googling and still can't figure out how to solve this. I have a table like (userId: A, B, C, D, E): 1: 5, 0, 2, 3, 2; 2: 3, 2, 0, 7, 3. And I need each MAX per...
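Per row, the task is to find the largest of several columns and report which column it came from. In SQL, GREATEST(A, B, C, D, E) gives the value, and a CASE comparing each column against it can recover the name. The Python sketch below shows the same per-row logic on the two sample rows; resolving ties to the first column listed is an assumption the question does not address.

```python
def row_max(row, columns):
    """Return (column name, value) for the largest value in `row` among `columns`.

    max() keeps the first maximal item, so ties resolve to the earliest column.
    """
    best = max(columns, key=lambda c: row[c])
    return best, row[best]


columns = ["A", "B", "C", "D", "E"]
table = {
    1: dict(zip(columns, [5, 0, 2, 3, 2])),
    2: dict(zip(columns, [3, 2, 0, 7, 3])),
}
for user_id, row in table.items():
    print(user_id, *row_max(row, columns))
# 1 A 5
# 2 D 7
```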

BigQuery AllowLargeResults and setMaxResults

java,google-bigquery
Instead of setting AllowLargeResults to true for a job in BigQuery, can we use the setMaxResults property to get the response in pages? Or do we still need to set AllowLargeResults even if we have set setMaxResults and are getting the response in pages?

BigQuery error: Cannot query the cross product of repeated fields

google-analytics,google-bigquery
I am running the following query in the Google BigQuery web interface, on data provided by Google Analytics: SELECT * FROM [dataset.table] WHERE hits.page.pagePath CONTAINS "my-fun-path" I would like to save the results into a new table; however, I get the following error message when using Flatten Results = False:...

Designing an API on top of BigQuery

google-app-engine,bigdata,google-bigquery
I have an App Engine app that tracks various sorts of user impression data across several websites. Currently we're gathering roughly 40 million records a month, the main BigQuery table is closing in on 15 GB in size after 6 weeks of gathering data, and our estimates show that within 6...

BigQuery SPLIT() ignores empty values

google-bigquery
It appears that SPLIT() treats empty values as though they don't exist at all, which yields unexpected results. For example: SELECT NTH(3, SPLIT(values, ",")) FROM (SELECT "a,b,,d,e" as values) returns "d", when I would expect it to return NULL. You can see how this would be problematic for several rows...
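The reported behaviour amounts to SPLIT() discarding empty pieces before NTH() counts positions, so the third element of "a,b,,d,e" becomes "d". The Python sketch below models that behaviour as described in the question (it is not BigQuery's implementation) next to a positional split that keeps the empty field, which is what would let the third position come back as NULL:

```python
def split_dropping_empty(value: str, sep: str = ","):
    # Models the behaviour reported above: empty pieces vanish, so later items shift left.
    return [piece for piece in value.split(sep) if piece != ""]


def nth(n: int, items):
    # 1-based like NTH(); None stands in for NULL when n is out of range or the field is empty.
    if 1 <= n <= len(items):
        return items[n - 1] or None
    return None


print(nth(3, split_dropping_empty("a,b,,d,e")))  # d     (the surprising result)
print(nth(3, "a,b,,d,e".split(",")))             # None  (positional: the empty 3rd field)
```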

Subscriber names from GitHub

google-bigquery
I am experimenting with BigQuery. So far I've been able to get the number of people who have watched a repository. Is it possible to get the user names of the subscribers who have watched a repository? Thanks...

BigQuery - select timestamp as human-readable datetime

google-bigquery
How do I select a timestamp (stored as seconds) as a human-readable datetime in Google BigQuery? Schema: id (STRING) | signup_date (TIMESTAMP). I wrote a query using the DATE function, but I get an error: SELECT DATE(create_date) FROM [accounts] Error: Invalid function name: DATE; did you mean CASE? Thanks!...
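Converting a seconds-since-epoch value into a readable UTC datetime is a fixed conversion; inside BigQuery, legacy SQL's SEC_TO_TIMESTAMP() performs it. A small Python check of the conversion, using 1412136000 as an example value:

```python
from datetime import datetime, timezone


def seconds_to_readable(seconds: int) -> str:
    """Format seconds since the Unix epoch as a UTC 'YYYY-MM-DD HH:MM:SS' string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(seconds_to_readable(1412136000))  # 2014-10-01 04:00:00
```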

Hard limit on number of tables in a BQ project

google-bigquery
I've got some highly partitionable data that I'd like to store in BigQuery, where each partition would get its own table. My question is if BQ will support the number of tables I'll need. With my data set, I'd be creating approximately 2,000 new tables daily. All tables would have...

Syntax to run a distinct GROUP_CONCAT in Google BigQuery

google-bigquery
I have this query: SELECT campaign.id AS campaign_id, GROUP_CONCAT(utm.campaign) AS utm_campaign FROM [email_event] WHERE (TIMESTAMP BETWEEN SEC_TO_TIMESTAMP(1412136000) AND SEC_TO_TIMESTAMP(1414814340)) GROUP BY campaign_id; And I would love to run a distinct GROUP_CONCAT, as now same entries are repeated in the output. UPDATE I've extended your solution to this: SELECT campaign.id AS...

How to integrate Google BigQuery with a C# console application

c#,integration,google-bigquery
Is it possible to integrate Google BigQuery with a C# console application? If so, how? I searched the internet but could not find a proper answer. What is the connection string format? I have created a client ID in the Google Developers Console; how is authentication done? It...

Substring in Google BigQuery

date,substring,google-bigquery
If I have a date parameter such as "03. Mar at 5:00pm PST" and I only need to break it down by date, so that my end result is "03. Mar", how can I achieve that? Is there a substring-equivalent syntax in Google BigQuery? Or maybe a date/time function...

Can we perform joins on tables in two different projects in BigQuery?

database,join,google-bigquery
I have two projects, each with datasets. I want to join a table in the first project with a table in the second project. How can I do that? What would the query look like?

Google BigQuery - simulate pandas drop_duplicates() in Google BigQuery SQL

sql,data,pandas,analytics,google-bigquery
Given a Google BigQuery dataset with col_1....col_m, how can you use Google BigQuery SQL to return the dataset where there are no duplicates in say... [col1, col3, col7] such that when there are rows with duplicates in [col1, col3, col7], then the first row among those duplicates is returned, and...
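One caveat behind "the first row among those duplicates": unlike a pandas DataFrame, a BigQuery table has no inherent row order, so "first" needs an explicit ordering column (e.g. ROW_NUMBER() OVER (PARTITION BY col1, col3, col7 ORDER BY <ordering column>) = 1). The Python sketch below shows the keep-first semantics that pandas' drop_duplicates(subset=..., keep="first") applies, with list order playing the role of that ordering; the column names and values are placeholders.

```python
def drop_duplicates(rows, subset):
    """Keep the first row for each distinct combination of the `subset` columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"col1": 1, "col2": "x", "col3": "a", "col7": True},
    {"col1": 1, "col2": "y", "col3": "a", "col7": True},  # duplicate on (col1, col3, col7)
    {"col1": 2, "col2": "z", "col3": "a", "col7": True},
]
print([r["col2"] for r in drop_duplicates(rows, ["col1", "col3", "col7"])])  # ['x', 'z']
```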

How do I share a bigquery dataset with another project?

google-bigquery
How do I share a bigquery table/dataset with another project? I do not see an option to share with a specific project.

BigQuery: extract numbers only from a string

split,extract,google-bigquery
My data looks like a 1x1000 vector with a variable number of inputs per line. Sometimes it is just age, but sometimes weight and state ID are added: 85 age 15 age; 68 Weight 25 age; 80 Weight; 02 Alaska 72 Weight; 50 Wyoming What I would like to get...
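Since each entry mixes numbers with a following label, a regular expression can pull out either just the digit runs or (label, number) pairs; BigQuery's REGEXP_EXTRACT with a capture group works on the same principle, one value at a time. The Python sketch below uses fragments of the sample as example inputs; how the real rows are delimited is an assumption.

```python
import re

NUMBER = re.compile(r"\d+")
NUMBER_LABEL = re.compile(r"(\d+)\s+([A-Za-z]+)")


def extract_numbers(text: str):
    # Digit runs only, kept as strings so IDs like "02" keep their leading zero.
    return NUMBER.findall(text)


def labelled_numbers(text: str):
    # (label, number) pairs such as ("Weight", "72"), for when the label matters.
    return [(label, number) for number, label in NUMBER_LABEL.findall(text)]


print(extract_numbers("68 Weight 25 age"))      # ['68', '25']
print(labelled_numbers("02 Alaska 72 Weight"))  # [('Alaska', '02'), ('Weight', '72')]
```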

Big Query Table Last Modified Timestamp does not correspond to time of last table insertion

google-bigquery
I have a table, rising-ocean-426:metrics_bucket.metrics_2015_05_09. According to the Node.js API, retrieving metadata for this table: Table was created Sat, 09 May 2015 00:12:36 GMT-Epoch 1431130356251 Table was last modified Sun, 10 May 2015 02:09:43 GMT-Epoch 1431223783125 By my records, the last batch insertion to this table was actually on:...

Google BigQuery query execution using Google Cloud Dataflow

google-bigquery,google-cloud-dataflow
Is it possible to execute a BigQuery query directly from Google Cloud Dataflow and fetch the data, rather than reading data from a table and then applying conditions? For example, PCollections res=p.apply(BigqueryIO.execute("Select col1,col2 from publicdata:samples.shakeseare where ....")) Instead of reimplementing with an iterative method what BigQuery queries already implement, we could use them directly....

BigQuery completed job returns 404 on getting query results (immediately after)

google-bigquery
We run a set of queries on a 2-hour interval, which have been running for a week now without issues. Recently, on 2015-06-04 00:00:26 UTC, we had a job (job_OY8G2_I-F6dbXFW93GdB94wc_W0) marked as done, but we received a 404 HTTP exception when trying to get the query results. I...

IGNORE CASE query problems saving to a table and using Allow large results

google-bigquery
I need case insensitivity in my queries so I found IGNORE CASE which works superbly when used in queries that target the browser (I am talking about BQ web UI). If I choose a destination table (an absolute must for me) and select Allow Large Results (with unchecked Flatten Results)...

Append a column and its data to a BigQuery table

google-bigquery
I have successfully appended a column to a BigQuery table using this link. This only updates the schema, but I'd also like to fill in the field of the newly added column. Is there a way to do it? Thank you!...

How to flatten with a table wildcard in BigQuery?

google-bigquery
We recently switched to a standard setup with tables that are labeled by month (foo_2015_05) with a common format that contains a repeated field. Originally when I created a view based on one, large table, it forces me to FLATTEN the table on the repeated field. When trying to update...

Job configuration for Synchronous query in Google Big Query

python,google-app-engine,google-bigquery
Can a job be configured to allow large results when using a synchronous BigQuery query?

How do you calculate a boolean aggregate over a column in BigQuery?

sql,aggregate-functions,google-bigquery
I have a table of events of users, and I want to project those events into a new column with some predicate, and then aggregate the events together per user into a new projection that tells me if a user has ever had the predicate match for them, or if...

BigQuery query without join gives join error on “not in” usage

google-bigquery
When I select from a sub-select and finally want to apply "not in" to the results, BQ gives the following error: Error: Join attribute is not defined: t1.customer_id When I replace the "not in" part with a simple where t1.customer_id = 1, for example, the query...

BigQuery Join If

sql,google-bigquery
For a project about books, I have a large table 'Books' with details about a large number of book titles [author, title, pubDate, etc...]. I also have a table that contains pseudonym information for authors who used them [authorName, pseudName]. What I want to do is add a column to...

Trying to find exact word match within separate table field, accounting for negative words

google-bigquery
I have tried so many different queries to get this one right, but it's become a complete mess. Long story short, I am trying to find an exact word match (an isolated word separated by spaces) based on 3 separate keywords, excluding any matches that contain negative keywords. field_name_1, field_name_2...

Unable to extract length of an integer column

google-bigquery
I am trying to extract the length of an integer column but I get this error as below. Can someone share their thoughts on this? Thanks. Error: Argument type mismatch in function LENGTH: first argument is type int64...

'Allow Large Results' option in Browser Tool is not honored

google-bigquery
I'm trying to query a relatively small table (1.3M rows, 517MB) and do an order by on one of the columns. The results are configured to write to another table and "Allow Large Results" is checked. But BigQuery still gives the error: Error: Response too large to return. Consider setting...

BigQuery union/join error

google-analytics,google-bigquery
I am getting an error when trying to pull from my Google Analytics BigQuery export tables... I want to look at a month's worth of data with some filters (including one that narrows it down to a list of specific fullvisitorids of interest). However, when I run the following query,...

Error when I try to create different BigQuery tables in the same pipeline execution

google-bigquery,google-cloud-dataflow
I have a pipeline execution with the below code: PCollection<TableRow> test1 = ... test1 .apply(BigQueryIO.Write .named("test1 write") .to("project_name:dataset_name.test1") .withSchema(tableSchema) .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)); PCollection<TableRow> test2 = ... test2 .apply(BigQueryIO.Write .named("test2 write") .to("project_name:dataset_name.test2") .withSchema(tableSchema)...

Hits per day in Google Big Query

sql,google-bigquery
I am using Google Big Query to find hits per day. Here is my query: SELECT COUNT(*) AS Key, DATE(EventDateUtc) AS Value FROM [myDataSet.myTable] WHERE ..... GROUP BY Value ORDER BY Value DESC LIMIT 1000; This works fine, but it ignores dates with 0 hits. I want to include...
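GROUP BY can only count dates that actually occur in the data, so days with no hits have to come from a separate list of all dates that the counts are joined onto (in SQL, a LEFT JOIN from a calendar of dates). A minimal Python sketch of that fill-with-zero step, with made-up dates:

```python
from collections import Counter
from datetime import date, timedelta


def hits_per_day(event_dates, start: date, end: date):
    """Return (day, hits) for every day from start to end inclusive, using 0 for days without hits."""
    counts = Counter(event_dates)
    days = (end - start).days + 1
    calendar = (start + timedelta(days=i) for i in range(days))
    return [(day, counts.get(day, 0)) for day in calendar]


events = [date(2015, 5, 1), date(2015, 5, 1), date(2015, 5, 3)]
for day, hits in hits_per_day(events, date(2015, 5, 1), date(2015, 5, 3)):
    print(day, hits)
# 2015-05-01 2
# 2015-05-02 0
# 2015-05-03 1
```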

Using multiple '*' patterns when loading into BigQuery won't work

google-bigquery
We're trying to use a glob pattern when loading into BigQuery, for example: gs://<bucket_name>/Network*Impressions_12345_20150201* We have both "..NetworkImpressions_.." and "..NetworkBackfillImpressions_.." in our bucket, so we use the first '*' to scoop up both types of files. But BQ borks with: "Not found: URI gs://backup-gdfp-7415/Network*Impressions_232503_20150101_20*" The files definitely exist. If we...

Using more than one field with IN ( ) for a sub-query

google-analytics,google-bigquery
In Google BigQuery, I would have to do something like: SELECT hits.item.productName FROM [‘Dataset Name’ ] WHERE date, visitId, fullVisitorId IN ( SELECT date, visitId, fullVisitorId FROM [‘Dataset Name’ ] WHERE hits.item.productName CONTAINS 'Product Item Name A' AND totals.transactions>=1) However, this does not seem to be supported. What alternatives do...

Use the JOIN command with multiple conditions

sql,google-bigquery
I am using Google BigQuery and I got this query to work: SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM suppliers INNER JOIN orders ON suppliers.supplier_id = orders.supplier_id; but I would also like to add another condition that must be met, e.g. that suppliers.order_date must equal orders.order_date. Something like ON suppliers.supplier_id =...

Using ignoreUnknownValues from Hadoop BigQuery Connector

hadoop,google-bigquery,google-hadoop
I'm piping unstructured event data through Hadoop and want to land it in BigQuery. I have a schema that includes most of the fields, but there are some fields I want to ignore or don't know about. BigQuery has a configuration field called ignoreUnknownValues, but I can't figure out how...

Is there any way to write to BigQuery specifying the names of destination tables dynamically?

google-bigquery,google-cloud-dataflow
Is there any way to write to BigQuery specifying the names of destination tables dynamically? Now I have: bigQueryRQ .apply(BigQueryIO.Write .named("Write") .to("project_name:dataset_name.table_name") .withSchema(Table.create_auditedTableSchema()) .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)); But I need "table_name" to be a dynamic table name that depends on the "tablerow" data that I want to write....

How can I generate a BigQuery result without JSON formatting?

python,google-bigquery
I've been adapting the asynch_query.py script found at the bigquery-python-samples GitHub repository to return data from Google BigQuery. As written, the section of code that returns the results produces a list of items in JSON format. def run(project_id, query_string, batch, num_retries, interval): service = get_service() query_job = async_query(service, project_id, query_string, batch,...

Inner Joining big tables in Big Query

google-bigquery
I am trying to perform an inner join between two big tables where each table consists of almost 30 million records. When I try running a simple INNER JOIN between these two tables I get an error as below asking me to use JOIN EACH syntax but I didn't find...

Row larger than the maximum allowed size

google-bigquery
I have successfully imported many gzipped JSON files on several occasions, but the BQ import choked on two files. Both files reported the same error: File: 0 / Offset:0 / Line:1 / Column:20971521, Row larger than the maximum allowed size Now I've read about the row limit of 20MB and I...

Unexpected behaviour of Google BigQuery WHERE NOT list CONTAINS string

sql,google-bigquery,contains
I have a small example table temp.ty: From this table I want to extract only the rows where ipc is not present in not_ipc (and where exclude is not present in other_classes), which seems rather straightforward. Yet the following query returns zero results: SELECT * FROM temp.ty where not ipc...

Google BigQuery asking Gmail Confirmation, Best way to handle in Production Environment

google-bigquery,dev-to-production
I have created a C# console application to read BigQuery data. When I run this console application for the first time, it opens a browser window asking for Google consent; on later runs on the same machine it does not ask for permission. Question: It asks for permission on each machine the first time,...

BigQuery streaming best practice

bigdata,google-bigquery
I have been using Google BigQuery for some time now, uploading files. As I get some delays with this method, I am now trying to convert my code to streaming. I'm looking for the best solution here; what is more correct when working with BQ: 1. Using multiple (up to 40) different streaming machines...

Error when importing gz files into BigQuery

google-bigquery
I ran into an error when importing gzipped tab-delimited files into BigQuery. The output I got was: bq show -j bqjob_r5720e2f2267a5a5b_0000014d09571f27_1 Job infra-bedrock-861:bqjob_r5720e2f2267a5a5b_0000014d09571f27_1 Job Type State Start Time Duration Bytes Processed ---------- --------- ----------------- ---------- ----------------- load FAILURE 30 Apr 08:00:44 0:02:05 Errors encountered during job execution. Bad...

Regexp in BigQuery

regex,google-bigquery
How can I search BigQuery for expressions and group them even if they are messed up by semicolons? Database example: ":Adidas", "Adidas", "Adidas;", "null", "adidas", "7up", "7UP", "7UP;", ":7UP", "null". I'd like to group them and count, to get this result: adidas 4, 7up 4, null 2...
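The grouping works once every value is cleaned the same way before counting: remove the stray ':' and ';' characters, trim, and fold case (in BigQuery, something along the lines of LOWER(TRIM(REGEXP_REPLACE(value, r'[:;]', ''))) used as the grouping key). A Python sketch on the sample values from the question reproduces the expected counts:

```python
import re
from collections import Counter


def clean(value: str) -> str:
    # Drop every ':' and ';', trim whitespace, then lowercase.
    return re.sub(r"[:;]", "", value).strip().lower()


values = [":Adidas", "Adidas", "Adidas;", "null", "adidas", "7up", "7UP", "7UP;", ":7UP", "null"]
counts = Counter(clean(v) for v in values)
print(dict(counts))  # {'adidas': 4, 'null': 2, '7up': 4}
```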

BigQuery - same query works when submitted from UI and reports SQL syntax error from batch

batch-processing,google-bigquery
I have a SQL query involving two joins on different fields. When I run this query interactively in the UI, I get back a result set, no problem. When I submit the exact same query in batch, I get back a SQL syntax error: Ambiguous field name 'video' in JOIN....

BigQuery raises Pagination token expired on first getQueryResults

google-bigquery
We're seeing sporadic cases (4x today) of query errors that BigQuery raises at the first attempt to call getQueryResults (i.e. without a pagination token). The error is: HttpError 400 when requesting https://www.googleapis.com/bigquery/v2/projects/.../queries/job_...?alt=json returned "Pagination token expired"> The status of the job on a get() call returned 'DONE'. This is the...

BigQuery: split a string into chars

google-bigquery
Suppose I have a table in which one of the columns is a string (id | value): 1 | HELLO; 2 | BYE. How would I split each STRING into its chars, to create the following table (id | value): 1 | H; 1 | E...

The Python script configuration for a Big Query job requires a sourceUri value, but there is no sourceUri

python,python-2.7,google-bigquery
I am attempting to write a Python script for a Google BigQuery job, following the configuration guidelines found in the job configuration properties. They indicate that the configuration parameter configuration.query.tableDefinitions.(key).sourceUris[] is required. This parameter is described as "The fully-qualified URIs that point to your data in Google Cloud Storage." However, the...

Error “Login Required” when trying to query Google BigQuery with Python

python,google-bigquery
I'd like to access BigQuery data from my local Linux machine with Python. The code from Google help https://cloud.google.com/bigquery/authorization#service-accounts-server works fine, giving me the list of datasets. But the query sent via the service library, SELECT id, name FROM [test_articles.countries] LIMIT 100, fails with a "Login Required" message: googleapiclient.errors.HttpError: <HttpError 401 when...

Error with BQ command line tool: Cannot start a job without a project id

google-bigquery
I am having issues with the bq command-line tool. Specifically, when trying to query a dataset/table, whether one of the public datasets or my own, I get the error: BigQuery error in query operation: Cannot start a job without a project id. I have a project id set as default as...

When does a cached Big Query job expire?

google-bigquery
I am currently using BigQuery, and whenever a query is run, the job is cached in memory and subsequent pages can be fetched from the cached table. So, is there a fixed expiration date for the cached tables? What factors does the job data persistence depend on? I am trying...

Beginner GA export bigquery questions

google-analytics,google-bigquery
I am new to BigQuery and have a couple of rookie questions: Is there any way to do a select top x * query, laid out like the preview pane in table details? It can be a lot easier to understand when you can visually see the data...

BigQuery - filtering without losing 'null' values

null,google-bigquery,contains
I'm trying to filter a database, but unfortunately I lose the 'null' values either way. The sample looks like (Name | City | Sold): Nike | NYC | 15; null | SFO | 20; Mega | SEA | 10; null | null | 8; nike | CHI | 12. I...

BigQuery Bug: SELECT of aliased field fails if scoped aggregation in subquery

google-bigquery
This query fails: SELECT g.repository.url, cnt, FROM (SELECT repository.url, COUNT(payload.pages.action) WITHIN RECORD as cnt, FROM publicdata:samples.github_nested) g LIMIT 10 With the error: Field 'g.repository.url' not found; did you mean 'repository.url'? It looks like aliased fields in the SELECT clause don't work when the SELECT includes a field calculated from a...

How to count push events on GitHub using BigQuery?

github,google-bigquery
I'm trying to use the public GitHub dataset on BigQuery to count events - PushEvents, in this case - on a per repository basis over time. SELECT COUNT(*) FROM [githubarchive:github.timeline] WHERE type = 'PushEvent' AND repository_name = "account/repo" GROUP BY pushed_at ORDER BY pushed_at DESC Basically just retrieve the count...

BigQuery - concatenate strings horizontally

google-bigquery,string-concatenation
I have data with a column for the first name and a column for the last name. I try to combine them into one column with this code: SELECT GROUP_CONCAT_UNQUOTED(full_name,' ') from (Select first_name as check from [DATA]), (select last_name as check from [DATA]) But it returns a one...

BigQuery command line tool - append to table using query

google-bigquery
Is it possible to append the results of running a query to a table using the bq command-line tool? I can't see flags available to specify this, and when I run it, it fails and states "table already exists": bq query --allow_large_results --destination_table=project:DATASET.table "SELECT * FROM [project:DATASET.another_table]" BigQuery error...

Google BigQuery asking for JOIN EACH but I'm already using it

google-bigquery
I'm trying to run a query in BigQuery that has two subselects and a join, but I can't get it to work. As a workaround, I run the subselects by themselves, save them as tables, and then run another query with a join, but I...

Alter table or select/copy to new table with new columns

google-bigquery
I have a huge BQ table with a complex schema (lots of repeated and record fields). Is there a way for me to add more columns to this table and/or create a select that would copy the entire table into a new one with the addition of one (or more)...

Google Bigquery says “Response too large to return” with simple select

google-bigquery
The allowLargeResults option is enabled, and I have also tried both interactive and batch query priority. There are 70M records in the search_results table, 10M records in searches, and only about 900 in the buy table. The WHERE clause also reduces the number of rows considerably. SELECT s.flyFrom, s.to, s.typeFlight, r.price, b.price,...

DateTime offset in Google BigQuery

google-bigquery
I'm having some trouble with Google BigQuery. I need to build results in the UTC+05:45 time zone, but I get this error: DATE_ADD 2nd argument must have INT32 type. Query example: SELECT DATE(DATE_ADD(time, 5.75, 'HOUR')) AS day, ... FROM ... WHERE ... AND ( DATE_ADD(time, 5.75, "HOUR") >= '2015-05-01 00:00:00' AND DATE_ADD(time,...
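The error arises because 5.75 is not an integer, while +05:45 is exactly 345 minutes. In legacy SQL, `DATE_ADD(time, 345, 'MINUTE')` should therefore satisfy the INT32 requirement. The Python sketch below only checks that arithmetic with the standard library; the timestamp is hypothetical.

```python
from datetime import datetime, timedelta

# UTC+05:45 as a whole number of minutes: 5 * 60 + 45 = 345.
OFFSET_MINUTES = 5 * 60 + 45

# Hypothetical UTC timestamp late on 30 April.
utc_time = datetime(2015, 4, 30, 18, 30, 0)
local_time = utc_time + timedelta(minutes=OFFSET_MINUTES)

print(OFFSET_MINUTES)  # 345
print(local_time)      # 2015-05-01 00:15:00
```

It also shows why the day boundary matters: a row stamped 18:30 UTC on 30 April belongs to 1 May in UTC+05:45.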

Truncate a table in GBQ

google-bigquery
I am trying to truncate an existing table in GBQ, but the command below fails when I run it. Is there a specific command or syntax for that? I looked through the GBQ documentation with no luck. TRUNCATE TABLE [dw_test.test]; ...

BigQuery export with TIMESTAMP of derived tables broken?

google-bigquery
I noticed that exporting a BigQuery derived table (a table constructed from a query on another table) to storage strips the TIMESTAMP from the result. Take a table with a TIMESTAMP column. Run a query on that table, for example "SELECT user_id,subscription_date FROM [All.Users] LIMIT 1000" (reproduced it with one row, two columns)...

Extracting data using regexp_extract in Google BigQuery

sql,regex,extract,google-bigquery
I am trying to extract data from a column that contains multiple characters, and I am only interested in getting a specific string from the input string. My sample inputs and outputs are below. How can I implement this using the regexp_extract function? Can someone share their thoughts on this if...
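REGEXP_EXTRACT returns the part of the string matched by the capturing group in the pattern, so the usual approach is to wrap only the wanted substring in parentheses. The question's sample inputs are not shown here, so the input string and pattern below are hypothetical. Python's `re` module stands in for BigQuery, which uses RE2 syntax (compatible for simple patterns like this one).

```python
import re
from typing import Optional


def regexp_extract(value: str, pattern: str) -> Optional[str]:
    """Rough stand-in for REGEXP_EXTRACT: first capturing group of the first match, else None."""
    match = re.search(pattern, value)
    return match.group(1) if match else None


# Hypothetical input: pull out only the value that follows "status=".
print(regexp_extract("order_id=A123;status=shipped;carrier=ups", r"status=([^;]+)"))  # shipped
print(regexp_extract("order_id=A124", r"status=([^;]+)"))  # None
```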

Is data appended to an existing table, or does it overwrite it, when streaming data into BigQuery?

google-bigquery
When streaming data into a BigQuery table, is the default to append the JSON data if the table already exists? The API documentation for tabledata().insertAll() is very brief and doesn't mention parameters like configuration.load.writeDisposition, as a load job does.
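Streaming inserts append rows: tabledata.insertAll has no write disposition, and truncating or replacing data goes through load, query, or copy jobs (writeDisposition) instead. The sketch below only builds the JSON request body for insertAll, with field names from the v2 REST API (`rows`, `insertId`, `json`, `skipInvalidRows`, `ignoreUnknownValues`); the records are hypothetical and no request is sent.

```python
import json
import uuid


def build_insert_all_body(records):
    """Build a tabledata.insertAll request body; every record becomes one appended row."""
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "skipInvalidRows": False,
        "ignoreUnknownValues": False,
        "rows": [
            # insertId lets BigQuery de-duplicate retried rows on a best-effort basis.
            {"insertId": str(uuid.uuid4()), "json": record}
            for record in records
        ],
    }


body = build_insert_all_body([{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}])
print(json.dumps(body, indent=2))
```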

Insert nested data into BigQuery using Golang

go,google-bigquery
I can insert a flat object into BigQuery using Golang - how can I insert nested data into a table? My BigQuery schema looks like this (from the example): [{ "name": "kind", "mode": "nullable", "type": "string" }, { "name": "fullName", "type": "string", "mode": "required" }, { "name": "visit", "type": "record",...

BigQuery Basics - FROM clause while querying

sql,google-analytics,google-bigquery
A very basic question. I have a query as follows: What does the FROM clause mean here? SELECT [device.deviceCategory] as [device_Category] FROM [project-id:dataset.ga_sessions_20150309] [ga_sessions_20150309] GROUP BY 1 ORDER BY [device_Category] ASC What is the difference between the following two FROM clauses? FROM [project-id:dataset.ga_sessions_20150309] [ga_sessions_20150309] FROM [project-id:dataset.ga_sessions_20150309] ...

Is there a way to determine or specify what geo region BigQuery stores data in?

google-bigquery
Is there a way to determine what region (like these) BigQuery is storing my data in? More to the point, is there a way to specify where my data gets stored when sent into BigQuery? If it matters, I'm using both the POST method for bulk loading data and streaming...

BigQuery - select values with max function

sql,max,google-bigquery
I have this kind of table (just an example; my real table has over 60000 records). I would like to know how to write a SELECT that shows which gender rated a movieId (or songId) higher. I wrote a query like this: select * from ( SELECT...
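One common pattern for "row with the maximum value per group" is to compute MAX per group in a subquery and join back on it. The sketch assumes a hypothetical `ratings` table that already holds one (average) rate per movieId and gender, since the real layout is cut off. It runs on SQLite; in BigQuery legacy SQL a large join may need JOIN EACH. Ties produce one row per tied gender.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movieId INTEGER, gender TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, "M", 3.5), (1, "F", 4.2), (2, "M", 4.8), (2, "F", 4.1)],
)

# Per movie, keep the gender whose rate equals that movie's maximum rate.
winners = conn.execute(
    """
    SELECT r.movieId, r.gender, r.rate
    FROM ratings AS r
    JOIN (SELECT movieId, MAX(rate) AS max_rate FROM ratings GROUP BY movieId) AS m
      ON r.movieId = m.movieId AND r.rate = m.max_rate
    ORDER BY r.movieId
    """
).fetchall()
print(winners)  # [(1, 'F', 4.2), (2, 'M', 4.8)]
```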

Select first row from group each by with count using Big Query

sql,google-bigquery
I've got over 500 million rows stored in BigQuery, which basically represent the exact position of a device at a certain (irregular) time. I'm trying to find a fast and efficient way to determine the first and the last seen position of the device. So far, I have it working with...
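A portable way to get each device's first and last position is to find MIN and MAX of the timestamp per device and join back twice. The table and column names (`positions`, `device_id`, `ts`, `lat`, `lon`) are hypothetical. The example runs on SQLite; at 500 million rows, BigQuery legacy SQL would likely need GROUP EACH BY / JOIN EACH. Duplicate timestamps for the same device would yield extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (device_id TEXT, ts INTEGER, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        ("d1", 100, 1.0, 1.0),
        ("d1", 300, 3.0, 3.0),
        ("d1", 200, 2.0, 2.0),
        ("d2", 50, 5.0, 5.0),
        ("d2", 60, 6.0, 6.0),
    ],
)

# Bounds per device, then join back to fetch the rows at those timestamps.
first_last = conn.execute(
    """
    SELECT b.device_id, f.lat, f.lon, l.lat, l.lon
    FROM (SELECT device_id, MIN(ts) AS first_ts, MAX(ts) AS last_ts
          FROM positions GROUP BY device_id) AS b
    JOIN positions AS f ON f.device_id = b.device_id AND f.ts = b.first_ts
    JOIN positions AS l ON l.device_id = b.device_id AND l.ts = b.last_ts
    ORDER BY b.device_id
    """
).fetchall()
print(first_last)  # [('d1', 1.0, 1.0, 3.0, 3.0), ('d2', 5.0, 5.0, 6.0, 6.0)]
```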

Exporting data from BigQuery to GCS - Partial transfer possible?

asynchronous,export,google-bigquery,google-cloud-storage,callblocking
I am currently exporting my data (from a destination table in BigQuery) to a bucket in GCS, programmatically through the BigQuery API. There is a constraint when exporting data from BigQuery to GCS: the data should not be greater than 1GB. Since my data in the destination...
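The 1 GB limit applies per exported file; a wildcard (`*`) in the destination URI lets BigQuery shard the output across several files. The sketch builds a jobs.insert request body for an extract job (field names from the v2 REST API); the project, dataset, table, bucket, and prefix are hypothetical, and nothing is submitted.

```python
def build_extract_job(project, dataset, table, bucket, prefix):
    """Build a jobs.insert body for an extract job whose output is sharded by a wildcard URI."""
    return {
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # '*' is replaced by a zero-padded shard number, so exports over
                # 1 GB are split across multiple files instead of failing.
                "destinationUris": [f"gs://{bucket}/{prefix}-*.csv"],
                "destinationFormat": "CSV",
            }
        }
    }


job = build_extract_job("my-project", "my_dataset", "results", "my-bucket", "export")
print(job["configuration"]["extract"]["destinationUris"][0])  # gs://my-bucket/export-*.csv
```

Extract jobs run asynchronously, so a caller that needs to block would poll jobs.get until `status.state` is `DONE` and then check `status.errorResult`.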

How can I pivot dataset in Google BigQuery?

sql,pivot,google-bigquery
I have a massive dataset with this schema: Customer INTEGER CategoryID INTEGER CategoryName STRING ProjectStage INTEGER NextStepID INTEGER NextStepName STRING NextStepIsAnchor BOOLEAN I need to get a result set where each customer appears on only one row and his/her next steps are in the columns, like this: Customer...
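A common workaround that needs no PIVOT operator is conditional aggregation: one MAX(CASE ...) column per value, grouped by customer. The sketch uses a hypothetical `steps` table with only `Customer` and `NextStepName`, and hypothetical step names 'Call' and 'Email', since the desired output is cut off. It runs on SQLite; CASE and MAX work the same way in BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (Customer INTEGER, NextStepName TEXT)")
conn.executemany(
    "INSERT INTO steps VALUES (?, ?)",
    [(1, "Call"), (1, "Email"), (2, "Email")],
)

# One output column per known step name; MAX collapses the per-row 1/0 flags
# into a single flag per customer.
pivoted = conn.execute(
    """
    SELECT Customer,
           MAX(CASE WHEN NextStepName = 'Call' THEN 1 ELSE 0 END) AS step_call,
           MAX(CASE WHEN NextStepName = 'Email' THEN 1 ELSE 0 END) AS step_email
    FROM steps
    GROUP BY Customer
    ORDER BY Customer
    """
).fetchall()
print(pivoted)  # [(1, 1, 1), (2, 0, 1)]
```

The column list has to be known up front; with many distinct step names, the CASE expressions are usually generated from a list in client code.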