Using libraries from one language in another language

I recently ran into a challenge while building an application in Ruby: for one specific task there were no Ruby libraries, only JavaScript ones.
Obviously I could not rewrite the entire application in JavaScript for the sake of a single library, but I needed that library nonetheless.

I looked into a few different ways of solving this problem:

  1. Creating an HTTP API in JavaScript to consume from Ruby
  2. Using a cross-language RPC framework like Apache Thrift
  3. Creating a Node script and executing it from Ruby

I am not a fan of premature optimization, and YAGNI ("you aren't gonna need it") probably applies here. Keeping that in mind, I wanted the simplest solution that wouldn't take too much time to develop or maintain, and that could be swapped out for a more optimized approach when one is actually needed.

With approach 1, here's the problem:
You need to create a whole separate microservice and take care of hosting it; it's one more server to run during development and in production. You also need to add error handling code, because what happens if that microservice goes down? It's added complexity when you don't yet know whether you're going to benefit from it.
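To make that overhead concrete, here is a rough sketch of what the Ruby side of approach 1 might look like. The endpoint, the `url` query parameter, and the JSON response shape are all assumptions made up for this illustration; no such service exists in my project.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical client for a page-parsing microservice (approach 1).
# The endpoint and the response format are assumptions for this sketch.
def fetch_via_service(page_url, endpoint:)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(url: page_url)

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: 2, read_timeout: 30) do |http|
    http.get(uri.request_uri)
  end

  unless response.is_a?(Net::HTTPSuccess)
    return { error: "service responded with HTTP #{response.code}" }
  end

  { data: JSON.parse(response.body) }
rescue SystemCallError, SocketError, Timeout::Error => e
  # The service is down, unreachable, or too slow to answer.
  { error: "service unavailable: #{e.class}" }
rescue JSON::ParserError => e
  { error: "service returned invalid JSON: #{e.message}" }
end
```

Every caller then has to decide what to do with the `:error` case, and that is before deployment, monitoring, and keeping the service running locally are taken into account.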

With approach 2, here's the problem:
Ultimately this would be the ideal solution, but it is complicated and time-consuming to implement, and the time spent setting it up could be spent working on features or bug fixes instead.

I decided to go with approach 3 instead. My specific problem was parsing web pages using Puppeteer.

  • I created a Node script that takes the web page URL as a command-line argument.
  • The script loads the URL using Puppeteer (headless Chrome) and does some manipulation and parsing of the page.
  • It dumps the resulting data into a JSON file and prints the filename to STDOUT.

'use strict';

const puppeteer = require('puppeteer');
const JSDOM = require("jsdom").JSDOM;
const Readability = require("readability");
const fs = require("fs");
const crypto = require("crypto");
const mkdirp = require("mkdirp");

const loadPage = async () => {
    let browser;
    let res = {};
    const baseDir = "/tmp/project_name/pagedata/";
    mkdirp.sync(baseDir);
    const filename = baseDir + crypto.randomBytes(20).toString('hex') + ".json";
    try {
        browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
        const page = await browser.newPage();
        // Don't load images, saves a lot of bandwidth and loading time
        await page.setRequestInterception(true);
        page.on('request', request => {
            if (request.resourceType() === 'image') {
                request.abort();
            } else {
                request.continue();
            }
        });
        await page.goto(process.argv[2], {timeout: 30000, waitUntil: 'networkidle2'});
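        // Give late scripts a moment to settle; page.waitFor() is deprecated in newer Puppeteer releases
        await new Promise(resolve => setTimeout(resolve, 250));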
        
        // Additional processing steps omitted for brevity

        let html = await page.content();
        let text = await page.$eval("body", e => e.innerText);
        res = {
            page_text: text,
            page_html: html
        };
        let res_string = JSON.stringify(res);
        fs.writeFileSync(filename, res_string, "utf8");
    } catch (err) {
        // An Error object typically serializes to "{}" in JSON, so store its message instead
        res.error = err.message || String(err);
        let res_string = JSON.stringify(res);
        fs.writeFileSync(filename, res_string, "utf8");
    } finally {
        if (browser) {
            // Wait for Chrome to shut down before the process exits
            await browser.close();
        }
        console.log(filename);
        process.exit();
    }
};

loadPage();

And from Ruby, this is how I consume this script:

require "json"
require "http"
require "nokogiri"
require_relative "base"

module FetchUrl
  include Service

  def self.call(url)
    url = ParseUrl.call(url)
    # Pass the arguments as an array so the URL never goes through a shell
    headless_req = IO.popen(["node", "node_scripts/load_page_headlessly.js", url.to_s], &:read).chomp
    data_file = JSON.parse(File.read(headless_req))
    if data_file["error"]
      puts data_file["error"]
    else
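      html = data_file["page_html"]
      text = data_file["page_text"]
      # Assumption: the title is taken from the page's <title> element
      title = Nokogiri::HTML(html).title
      {
          title: title,
          html: html,
          text: text
      }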
    end
  end
end
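
The hand-off between the two processes (the script prints a filename, Ruby reads and parses that file) can be wrapped in a small helper that also checks the exit status and cleans up the temporary file. This is only a sketch of one way to harden it, not what my application actually runs; `run_helper` is a name invented for this example.

```ruby
require "json"
require "open3"

# Runs a helper command that prints the path of a JSON file on STDOUT,
# then loads and returns the parsed contents of that file.
# `command` is an argument array, so no shell is involved.
def run_helper(command)
  stdout, stderr, status = Open3.capture3(*command)
  unless status.success?
    return { "error" => "helper exited with #{status.exitstatus}: #{stderr.strip}" }
  end

  path = stdout.strip
  return { "error" => "helper printed no filename" } if path.empty?

  JSON.parse(File.read(path))
rescue Errno::ENOENT, JSON::ParserError => e
  # Raised when the command or the output file is missing, or the file is not valid JSON.
  { "error" => "helper failed: #{e.message}" }
ensure
  # Remove the temporary file so the output directory does not fill up over time.
  File.delete(path) if path && !path.empty? && File.exist?(path)
end
```

With the Node script from this post, the call would be something like `run_helper(["node", "node_scripts/load_page_headlessly.js", url.to_s])`, and the caller checks for the "error" key exactly as before.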

I agree that this might not be the most elegant solution there is, but it works, it's quick and easy to implement, and I can easily replace it once it no longer meets the application's needs.