Using Gzip for Storage Optimisation in Large CSV Data Sets

How to work with CSV.gzip files in Python and decompress them through the CLI.


Working with CSV files can be a hassle, especially when the files are large. One way to make the process easier is to compress the files using gzip, which can significantly reduce the file size.

In this post, I’ll show you how to work with CSV.gzip files using Python and how you can decompress them through the command line interface so they can be opened in an application such as Excel.

Working with CSV.gzip files in Python

First, you’ll need to import the gzip module and the csv module. You can do this by running the following code:

import gzip
import csv

Next, you’ll need to open the gzipped CSV file. You can do this using the gzip.open() function, which works just like the built-in open() function, but automatically decompresses the file. Here’s an example:

with gzip.open('data.csv.gz', 'rt') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

In this example, we’re using the with statement to open the file data.csv.gz. The 'rt' mode means “read, text”: gzip.open() decompresses the file on the fly and returns a text stream, just as the built-in open() would in plain 'r' mode. The csv.reader() function then wraps that stream in a reader object that can be iterated over to read the rows of the CSV file.
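If your file has a header row, csv.DictReader can return each row as a dictionary keyed by column name, which is often more readable than indexing into lists. A minimal sketch, assuming headers like the ones written in the next example:

import gzip
import csv

with gzip.open('data.csv.gz', 'rt') as f:
    reader = csv.DictReader(f)  # uses the first row as field names
    for row in reader:
        # values are read as strings, e.g. {'Ticker': 'TSLA', 'Price': '143.0', ...}
        print(row['Ticker'], row['Price'])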

It is also possible to write data to a CSV.gzip file by using the gzip.open() function in write mode. Here’s an example:

with gzip.open('data.csv.gz', 'wt', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Ticker', 'Price', 'P/E Ratio'])
    writer.writerow(['TSLA', 143.00, 44.33])
    writer.writerow(['AAPL', 140.30, 23.32])

In this example, we’re using the with statement to open the file data.csv.gz in write mode. The 'wt' mode means “write, text”: gzip.open() compresses the text you write to the stream. Passing newline='' follows the csv module’s recommendation, so the writer’s own line endings aren’t translated. The csv.writer() function then returns a writer object whose writerow() method writes the rows of the CSV file.
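If you already have an uncompressed data.csv on disk, you don’t need to go through the csv module at all; you can compress the whole file in one go with shutil.copyfileobj(). A minimal sketch:

import gzip
import shutil

# stream data.csv into a gzip-compressed copy without loading it all into memory
with open('data.csv', 'rb') as src, gzip.open('data.csv.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)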

Working with CSV.gzip files in Python is a great way to save space and make your data processing tasks more efficient. With the gzip and csv modules, you can easily read and write compressed CSV files with minimal code.
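As an aside, if you use pandas, recent versions infer the compression from the .gz extension, so reading and writing gzipped CSV files is a one-liner each. A sketch, assuming pandas is installed:

import pandas as pd

# compression='infer' is the default, so the .gz extension is enough
df = pd.read_csv('data.csv.gz')
df.to_csv('data.csv.gz', index=False)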

How to decompress a CSV.gzip file using the CLI

You can decompress a CSV.gzip file from the command line interface (CLI) with the gunzip command, which decompresses files that were compressed with gzip. Here’s an example of how to use it on a CSV.gzip file:

gunzip data.csv.gz

This command decompresses the file data.csv.gz and creates a new file named data.csv, which you can then open in Excel. Note that gunzip removes the original .gz file by default; most versions accept a -k flag to keep it.

Alternatively, you can use the zcat command:

zcat data.csv.gz > data.csv

This command decompresses data.csv.gz to standard output and redirects the output into a new file named data.csv, leaving the original data.csv.gz in place.

If you don’t have the gunzip or zcat command installed, you can install the gzip package using your package manager, such as apt or yum.

Once the command has run, you will have the decompressed file data.csv, which you can open in Excel and work with as you would any other CSV file.
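If you’d rather stay in Python than shell out, the same one-shot decompression can be done with the gzip and shutil modules. A minimal sketch, equivalent to gunzip but keeping the original file:

import gzip
import shutil

# write the decompressed data.csv; data.csv.gz is left untouched
with gzip.open('data.csv.gz', 'rb') as src, open('data.csv', 'wb') as dst:
    shutil.copyfileobj(src, dst)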

How to download a directory from S3 using the AWS CLI

By using the AWS CLI and its `aws s3 cp` command, you can download a folder directly from an S3 bucket to your local machine.


The AWS CLI has the functionality you need to download a folder directly from an AWS S3 bucket to your local machine.

To get started, make sure you have the AWS CLI installed, then create a folder such as ~/data on your local machine where you wish to store your S3 bucket downloads.

Using the aws s3 cp [bucketURI] [localDirPath] command, you can download a single file directly from an S3 bucket to your local machine, but to make this work with folders or directories you also need to pass the --recursive flag.

The command below tells the CLI to recursively download all files and folders from the S3 bucket URI to the ~/data directory on your local machine.

aws s3 cp s3://your-s3-bucket/path ~/data --recursive

Performing a dry run

If it’s a large folder with a lot of files, you may wish to do a dry run first by passing the --dryrun flag. This simulates the download without actually transferring any files, highlighting any issues or errors along the way.

aws s3 cp s3://your-s3-bucket/path ~/data --recursive --dryrun

Filtering file types

By default, downloading with the --recursive flag includes every file under the given S3 path. If you only want to include files of a certain type in your download request, you can filter them using the --exclude and --include flags.

It’s important to note that to use the --include flag correctly, you first have to exclude all files with the --exclude "*" flag, then add --include flags for your chosen file types. The order matters when setting both of these, as filters that appear later in the command take precedence.

The example below recursively downloads all files from the specified S3 bucket location that have a .csv extension.

aws s3 cp s3://your-s3-bucket/path ~/data --recursive --exclude "*" --include "*.csv"

To download multiple file types in a single request, you can pass additional --include flags, as in the example below, which downloads both .csv and .xls files.

aws s3 cp s3://your-s3-bucket/path ~/data --recursive --exclude "*" --include "*.csv" --include "*.xls"
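Finally, if you need the same recursive download from a Python script rather than the shell, here’s a minimal boto3 sketch. The bucket name and prefix are placeholders, and it assumes boto3 is installed and your AWS credentials are already configured:

import os
import boto3

s3 = boto3.client('s3')
bucket = 'your-s3-bucket'   # placeholder bucket name
prefix = 'path/'            # placeholder key prefix ("folder")
dest = os.path.expanduser('~/data')

# list every object under the prefix, one page at a time
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if key.endswith('/'):
            continue  # skip zero-byte "directory" placeholder keys
        local_path = os.path.join(dest, os.path.relpath(key, prefix))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(bucket, key, local_path)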

References

You can read more about the available flags and options in the official AWS CLI documentation.
