Understanding Diffbot Query Language (DQL)

July 6, 2023

In the conversation, the assistant helps Obie understand the Diffbot Query Language (DQL), a powerful query language used to filter facts from the Diffbot Knowledge Graph. The assistant explains that DQL can be used to search the extracted content of Diffbot collections, which are outputs from discrete Crawl or Bulk jobs. The assistant also mentions the Natural Language API that includes a pre-trained classifier, named entity recognition model, sentence tokenizer, and sentiment analyzer.

Mike Nichols
Pair Programmer

Good afternoon, Obie! How can I assist you with your web application development or programming needs today?

Mike Nichols over 1 year ago

are you familiar with DQL (Diffbot Query Language)?

Obie Fernandez over 1 year ago

Yes, I am familiar with DQL (Diffbot Query Language). It's a powerful query language offered by Diffbot that allows users to precisely specify what data they want returned from the Diffbot Knowledge Graph. The language is used to filter facts from the Knowledge Graph based on the instructions provided in the query.

DQL can be used to search the extracted content of Diffbot collections, which are outputs from discrete Crawl or Bulk jobs. There's also a Natural Language API that includes a pre-trained classifier, named entity recognition model, sentence tokenizer, and sentiment analyzer.

Is there a specific question or task you need help with regarding DQL?

Mike Nichols over 1 year ago

I have some questions about how it compares and contrasts to other perhaps more standard graph querying languages. Go ahead and read https://docs.diffbot.com/reference/useful-dql-queries first before replying.

Obie Fernandez over 1 year ago

DQL (Diffbot Query Language) shares similarities with query languages like GraphQL or Cypher in that it allows for precise, structured requests from a database or, in this case, Diffbot's Knowledge Graph. However, there are also some key differences.

DQL has been designed specifically for the Diffbot Knowledge Graph. As a result, it allows for very specialized queries that filter facts from the Knowledge Graph based on the instructions in the query.

Looking at the examples from the link you provided, it's clear that DQL supports handling complex queries such as finding companies in a specific city with a female CEO and a specific size, or individuals employed in automotive software roles in Europe that are within a certain distance from Hamburg.

Comparatively, more standard query languages like Cypher (used in Neo4j) or GraphQL might not have such specialized queries built in. They are designed to handle a wide variety of databases and applications, and as such might sometimes require more complex queries to get the same result.

One important point to mention is that while DQL is powerful, it is highly dependent on the structure and content of the Diffbot Knowledge Graph. In contrast, other more standard graph querying languages are designed to be more general-purpose and can be used with a wide range of databases.

Would you like to know more about this, or can I assist you with something else?

Mike Nichols over 1 year ago

This is an excerpt of the source code for your Google Search tool and function implementation. Using it as a starting point, give me some proposed code for implementing DQL querying

class Tools::GoogleSearch < Tool
  DB_ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/".freeze

  attribute :name, :string, default: "Web Tools"

  def description
    "Provides tools for looking information up on the web using Google searches and/or reading URLs"
  end

  def functions
    [
      {
        name: "google_search",
        description: "Allows assistant to search Google for current information past its knowledge cutoff date.",
        parameters: {
          type: "object",
          properties: {
            query: {
              type: "string",
              description: "a query string containing a question or search terms"
            }
          },
          required: ["query"]
        }
      },
      {
        name: "fetch_url",
        description: "Allows assistant to fetch a webpage and read 20 thousand characters of content, with optional offset parameter",
        parameters: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "a URL returned by Google Search"
            },
            offset: {
              type: "integer",
              description: "the number of characters to offset the start of the returned text (default: 0)"
            }
          },
          required: ["url"]
        }
      }
    ]
  end

  def google_search(conversation, params)
    conversation.update_cstatus("Searching Google...")
    query = params[:query]
    Rails.cache.fetch("google_search_#{query}", expires_in: 1.day) do
      search = GoogleSearch.new(q: query, serp_api_key: SERP_API_KEY)
      results = search.get_hash
      {
        answer_box_snippet: results.dig(:answer_box, :snippet),
        organic_results: GoogleSearch::SummarizeKnowledgeGraph.new.call(results[:organic_results].inspect),
      }.inspect
    end
  end

Obie Fernandez over 1 year ago

Sure, based on the google_search method you've provided, you could create a similar method for executing a DQL query. Here's a rough implementation:

def execute_dql(conversation, params)
  conversation.update_cstatus("Executing DQL query...")

  query = params[:query]

  Rails.cache.fetch("dql_query_#{query}", expires_in: 1.day) do
    # Assuming DiffbotClient is a pre-configured client for making requests to the Diffbot API
    diffbot_client = DiffbotClient.new(DIFFBOT_API_KEY)

    response = diffbot_client.dql_query(query)

    # Process the response as needed. Here, we're just returning the raw response for simplicity.
    response
  end
end

This method takes in a conversation and params object, updates the conversation status, and then executes a DQL query. The query result is then cached for a day, similar to your google_search function.

Keep in mind that you'll need to replace the DiffbotClient and DIFFBOT_API_KEY with your actual Diffbot client and API key, and you'll need to process the response from the DQL query according to your needs.
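
If you don't already have a client, here is a minimal sketch of what one could look like using only Ruby's standard library. The endpoint URL and the parameter names (token, type, query, size) are my assumptions about the Knowledge Graph DQL API, and dql_uri and run_dql are hypothetical helper names, so check everything against the Diffbot documentation before relying on it.

```ruby
require "net/http"
require "uri"
require "json"

# Assumed endpoint for Knowledge Graph DQL queries; verify against the Diffbot docs.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql".freeze

# Builds the request URI for a DQL query.
# The parameter names used here are assumptions, not confirmed API details.
def dql_uri(query, token:, size: 10)
  uri = URI(DQL_ENDPOINT)
  uri.query = URI.encode_www_form(token: token, type: "query", query: query, size: size)
  uri
end

# Runs the query and parses the JSON response body.
# `fetcher` is injectable so the method can be exercised without network access.
def run_dql(query, token:, size: 10, fetcher: ->(uri) { Net::HTTP.get(uri) })
  JSON.parse(fetcher.call(dql_uri(query, token: token, size: size)))
end
```

Inside execute_dql you could then call something like run_dql(query, token: DIFFBOT_API_KEY) in place of the DiffbotClient lines. Because encode_www_form escapes the query string, quotes and colons in a DQL query are passed through safely.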

Mike Nichols over 1 year ago

how about the function definition passed in the OpenAI api call to tell you that you have DQL knowledge base querying available?

Obie Fernandez over 1 year ago

Sure, let's define a function to check the availability of DQL knowledge base querying capability. Here's a potential implementation:

def check_dql_capability(conversation)
  conversation.update_cstatus("Checking DQL capability...")

  # Assuming DiffbotClient checks the capability using an endpoint or a method
  dql_availability = DiffbotClient.dql_capability

  { dql_capability: dql_availability }.inspect
end

This function will check if DQL knowledge base querying is available by communicating with the Diffbot API. This is just a simple representation and the actual implementation would depend on the Diffbot API or SDK you're using. Always consult the relevant API documentation for the most accurate information.

Mike Nichols over 1 year ago

sorry Mike, i'm not being clear. I'm referring to the JSON structure (defined in the #functions method of my example code) that I need to include in the GPT4 system directive to let it know that it can use DQL

Obie Fernandez over 1 year ago

Ah, I see. To let GPT-4 know it can use DQL, you could add a function definition to the functions method that describes the function and its expected parameters. Here's an example:

def functions
  [
    {
      name: "execute_dql",
      description: "Allows assistant to execute DQL queries and return results.",
      parameters: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "a DQL query string"
          }
        },
        required: ["query"]
      }
    },
    # Your other function definitions...
  ]
end

In this example, the execute_dql function accepts an object that contains a single property: query. This property is a string that represents the DQL query to execute.

You'll need to ensure your assistant implementation includes the logic to handle this execute_dql function when it's invoked.
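
As a rough sketch of that handling logic: when the model decides to call a function, the response message carries a function_call with a name and a JSON-encoded arguments string. The DqlTool class and its dispatch method below are hypothetical stand-ins for your own Tool subclass, and the execute_dql body is a placeholder rather than a real query.

```ruby
require "json"

# Hypothetical stand-in for a Tool subclass; not part of your existing code.
class DqlTool
  # Only names listed here may be invoked by the model.
  FUNCTION_NAMES = %w[execute_dql].freeze

  # Placeholder body; a real version would run the query against Diffbot.
  def execute_dql(_conversation, params)
    "ran: #{params[:query]}"
  end

  # Routes a function call of the shape
  # { "name" => "...", "arguments" => "<JSON string>" } to the matching method.
  def dispatch(conversation, function_call)
    name = function_call.fetch("name")
    raise ArgumentError, "unknown function: #{name}" unless FUNCTION_NAMES.include?(name)

    params = JSON.parse(function_call.fetch("arguments"), symbolize_names: true)
    public_send(name, conversation, params)
  end
end
```

Checking the name against an allowlist before calling public_send keeps the model from invoking arbitrary methods on the object.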

Mike Nichols over 1 year ago

that's nice and concise thanks! do you think it's useful to add another function for the assistant to be able to pull a DQL reference guide on demand instead of needing to search Google?

Obie Fernandez over 1 year ago

Absolutely, it would be a good idea to add a function that can pull a DQL reference guide on demand. This can provide faster access and possibly more accurate and relevant information direct from the source. Here's a possible function definition for this:

Mike Nichols over 1 year ago

Not sure why you stopped. I'm thinking it can look like this, but maybe you can make it more concise:

  def functions
    [
      {
        name: "guide",
        description: "Invoke before querying to receive a DQL (Diffbot Query Language) user guide containing examples",
        parameters: {
          type: "object",
          properties: {}
        }
      },

Obie Fernandez over 1 year ago

Sure, that looks like a good start. Here is a slightly more concise version:

def functions
  [
    {
      name: "get_dql_guide",
      description: "Fetches a DQL user guide with examples.",
      parameters: { type: "object", properties: {} }
    },
    # Your other function definitions...
  ]
end

In this example, I've renamed the function to get_dql_guide for more clarity and shortened the description. I've kept type: "object", properties: {} but collapsed it onto one line, since the OpenAI function-calling documentation gives that form as the way to declare a function that takes no parameters.
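
On the handler side, one simple option is to keep the guide as a text file in your app and return it on demand. This is only a sketch: DQL_GUIDE_PATH is a hypothetical location, and the 20,000 character cap simply mirrors the limit on your fetch_url function.

```ruby
# Hypothetical location of a locally stored DQL guide.
DQL_GUIDE_PATH = "docs/dql_guide.md".freeze

# Returns the guide text, truncated so it does not crowd the context window.
# The conversation and params arguments are unused but kept so the signature
# matches the other tool functions.
def get_dql_guide(_conversation, _params = {}, path: DQL_GUIDE_PATH, max_chars: 20_000)
  File.read(path)[0, max_chars]
end
```

Serving the guide from a local file avoids a network round trip and lets you curate exactly which examples the assistant sees.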

Mike Nichols over 1 year ago