Real Estate Consulting Data Project


Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Recently, I took on the task of advising an imaginary real estate investment firm on the top 5 zip codes to invest in, based on insights from a dataset sourced from Zillow containing the average home sale price for each of 14,723 zip codes at a monthly frequency from April 1996 through April 2018. The analysis and modeling for the project were done using a combination of Python and R Jupyter notebooks. The data and project notebooks are available here in the project repository.

Since the data used for the project end in April 2018, it was appropriate to imagine that this was the time at which the consultation was provided; recent developments in the world are having widespread effects on financial and real estate markets, and there is no way to relate the project data to current events. It is important to keep in mind that macroeconomic and sociopolitical issues can cause unforeseen shocks in the market, and any attempt at forecasting should take into account the volatility clustering caused by such shocks. Fortunately, the data span a time frame which includes the 2007–2010 financial and housing market collapse, providing an opportunity to observe how well individual zip codes weathered the last storm and some insight into which ones could be expected to show resilience to adverse economic and market conditions in the future. A number of zip codes did not have complete data over the time period of the dataset, making proper comparisons of their behavior against those which did impossible, so they were left out of consideration.

The investment recommendations were based on two main criteria: a high expected capital appreciation rate with low expected variance, and the profitability of leasing the properties within the zip code over the investment horizon. To project capital appreciation rates, a method was devised to quickly generate an appropriate SARIMAX model for each zip code so that the forecasted mean and volatility of returns over the investment horizon could be compared. To compare the profitability of leasing properties within each zip code, a database was obtained from a 2017 Zillow research article containing P2R (price-to-rent) ratios for various metropolitan areas. These ratios can be compared to the national average and give an idea of how many years of rent payments would pay off the value of a property in a given area.

Code blocks will follow the workflow presented in this article for demonstration. For a more in-depth investigation into the methods employed, please see the Jupyter notebooks in the repository. This article will cover an efficient way to generate well-fit SARIMAX models for time series data within a Python workflow, fast enough that each zip code can be individually modeled and the models compared to find the zip codes with the best outlooks. The method combines the strengths of time series modeling packages from Python and R (statsmodels and forecast, respectively): the speedy model-generating auto.arima function from R's forecast package is called from within a Python workflow using the rpy2 package to obtain optimal orders and coefficients for a SARIMAX model, and a statsmodels SARIMAX model is then smoothed with the values obtained. This was found to consistently generate satisfactory models while avoiding the time-consuming process of grid searching for optimal model orders using loops, and it retains all of the statistical explicitness and functionality of statsmodels while working in a Python environment.

Please note that the contents of this article are intended only as a demonstration of a data science workflow and methods related to analyzing and modeling time-series data in the context of real estate, and are not to be taken as any form of investment advice. Unless otherwise noted, all images are the property of the author.

Loading in the data and initial observations

First, we need to import all the libraries to use; we will be using many standard Python data science packages, especially pandas and statsmodels.
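The notebook's import cell is not reproduced here, but a minimal set of imports for the workflow shown in this article would look roughly like the following (the project notebooks may import more):

```python
# Core data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Time series modeling and diagnostics in Python
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
```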

Now using pandas to read the data:
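Reading the file is a one-liner; the file name below is an assumption, so substitute the path to your copy of the Zillow data:

```python
# Read the raw Zillow export (file name assumed)
df = pd.read_csv('zillow_data.csv')
print(df.shape)  # expect (14723, 273): one row per zip code, one column per month plus identifiers
df.head()
```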

[Figure: first rows of the raw Zillow dataframe in wide format]

Looking at the raw data, we can see that there are two columns with 5-digit numbers to identify the "regions." A few quick Google searches reveal that the 'RegionName' column contains the zip codes; the other column is not useful to us, and testing also found that the SizeRank feature was not useful. We can also observe that the data are in what is called "wide format," where the dates and their corresponding values for each zip code are spread across columns that expand the dataframe to the right, which is why it has 273 columns. For our purposes, we will want the data in long format, which we can get with pandas' melt method.
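A sketch of that reshaping step, assuming the identifier columns are named as in the raw Zillow export:

```python
# Identifier columns in the wide-format export (names assumed from the raw file)
id_cols = ['RegionID', 'RegionName', 'City', 'State', 'Metro', 'CountyName', 'SizeRank']

# Melt the date columns into rows: one (zip code, date, value) row per month
melted = df.melt(id_vars=id_cols, var_name='date', value_name='value')
```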

[Figure: the dataframe after melting to long format]

This operation has duplicated each zip code for each date that was found in the columns, leading to a dataframe with 3,901,595 rows. It would be possible to use a MultiIndex DataFrame to work with panel data at this point, but I will use a GroupBy object here. First, the 'date' column needs to be converted to datetime, and we need to check for missing values.
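Those two steps might look like this:

```python
# Convert the date strings to proper datetimes
melted['date'] = pd.to_datetime(melted['date'])

# Count missing sale values
print(melted['value'].isna().sum())  # 156,891 missing values
```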

This last operation reveals that there are 156,891 missing values in the ‘value’ column, which we will need to keep in mind moving forward. Now, let’s create our GroupBy object, and take our first look at some zip codes.
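Grouping by zip code and plotting the first few groups could be done roughly as follows:

```python
# One group per zip code
groups = melted.groupby('RegionName')

# Plot the value history of the first six zip codes
fig, ax = plt.subplots(figsize=(12, 6))
for name in list(groups.groups.keys())[:6]:
    groups.get_group(name).set_index('date')['value'].plot(ax=ax, label=name)
ax.set_ylabel('Average sale price ($)')
ax.legend(title='Zip code')
plt.show()
```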

[Figure: average home value histories for the first six zip codes]

Having our first look at some data, we can see that the individual zip codes have many differences. Just looking at the first 6 out of 14,723, we can see differences in average value and growth rate. We can also see that there is a general trend that they are following, with rising values up until the subprime mortgage crisis between 2007 and 2010 crashed the market, and then a subsequent recovery beginning around 2013. Just looking at these 6, we can see that some zip codes weathered the storm better than others, and this would be an attractive feature for investors who realize that future downturns may be caused by unforeseen macroeconomic effects.

We now have some ideas about what we might look for when identifying the "5 best zip codes" to recommend to an investor: growth rate and resilience to a poor economic environment; but there are other factors to consider. First, one would feel most confident in an investment if the expected rate of growth is complemented by low expected volatility. Second, it would be best to lease the owned properties to renters during the holding period, so that the capital is not just sitting around doing nothing. This means that we should recommend zip codes that not only have strong expected growth, low expected variance, and resilience to poor economic conditions, but that also have better-than-average leasing profitability. To determine this, we can turn to a metric called the price-to-rent (P2R) ratio, which gives us an idea of the proportion between the price to purchase and own a property in a given area and the amount renters are willing to pay per year there. This Zillow research article by Jamie Anderson discusses further considerations real estate investors should make about the profitability of renting out properties, and it provides the data which we will import now.
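Assuming the price-to-rent table from that article was saved locally as a CSV (the file name is an assumption), it can be read the same way:

```python
# Price-to-rent ratios by metro area (file name assumed)
p2r = pd.read_csv('price_to_rent.csv')
p2r.head()
```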

[Figure: head of the price-to-rent dataframe from the Zillow research article]

We can see that the national average P2R at the time of the article (June 19, 2017, conveniently near the end of our data set) was 11.44. According to the article, the majority of homes in most "major markets" can be rented for a profit, so it stands to reason that a real estate investment in any zip code with a P2R below the national average will not only be profitable to the property owner as a rental, but will be more profitable than the average across the country.

Another rich source of real estate investment advice comes from this Financial Samurai article written by Sam Dogen. In the article, Dogen suggests investors follow what he calls a "key real estate investing rule": Buy Utility, Rent Luxury (BURL). Dogen explains that luxury properties (especially in coastal cities) tend not to have nearly the leasing profitability of utility homes (especially in the Midwest), and often have no leasing profitability at all, with the rent failing to cover the yearly costs of ownership. Dogen cites the Zillow research article from above and uses the P2R ratio to distinguish luxury from utility, labeling everything with a P2R below 9.6 as utility and anything with a P2R above 13.3 as luxury. This means that we should be looking for zip codes with P2R below the national average, and ideally below 9.6. Since expected capital appreciation is one of our criteria, we may not be able to find only zip codes with P2R below 9.6, because expected growth is often reflected in the price of capital, but this gives us a solid target for our contenders. It is also safe to assume that a zip code full of homes that would be considered utility would not have an average sale price over $500,000, so this will help narrow our list as well.

Modeling a single zip code

Before we can iterate through all of our zip codes and produce a model for each, we need a good way to quickly generate a well-fit model for a single zip code. Statsmodels gives us great functionality with SARIMAX models, but it does not contain a method to determine the optimal orders for such a model. Grid searching using loops and comparing AIC scores is painfully slow, especially if one wants to consider lags up to, say, 5 or 6 for the AR and MA terms. Luckily, the forecast package for R has an auto.arima function which can quickly find optimal orders and coefficients given a time series. With this, we have the option of either taking the suggested orders and having statsmodels estimate the coefficients, or smoothing a statsmodels SARIMAX of the suggested order with the given coefficients. It was found during this study that the coefficients estimated by the auto.arima function consistently produced lower AIC scores than those estimated using statsmodels, so we will make our models using the former. To see how we will do this, we first need to look at how we can call an R function from Python using rpy2. We will start by importing the necessary items.
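The rpy2 pieces needed are roughly the following; note that the forecast package must already be installed in the local R environment:

```python
# Bridge to R: the forecast package and R's ts() constructor
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import FloatVector

forecast = importr('forecast')  # auto.arima is exposed as forecast.auto_arima
stats = importr('stats')        # ts() for building R time series objects
```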

We can now use functions from the forecast package in our Python workflow, but making use of the outputs will take a bit of tinkering. The coefficients and their names are available in vectors attached to the returned object, but the orders are not stored in such a convenient way. Let's see what a printout of the return from the auto.arima function looks like, using the first zip code in our data set for the test. Note that we must cast the time series as a FloatVector so it is compatible with the R function. Also, take note of how the time series is accessed from the pandas GroupBy object: by getting a group by its name (zip code), setting the date column as the index, then taking the value column. Note as well that Python drops the leading zeros of these "lower" zip codes because they are numeric values, but we know each is really a 5-digit zip code with a zero on the front. In financial time series analysis, it is general practice to model the returns of a series rather than the prices, because this gives us a common scale on which to compare all of our zip codes. For mathematical convenience, log returns are generally favored over percentage changes, since they can be cumulatively summed over time. They are generated by taking the differenced log values of a series; this produces a NaN value on the first date of the series, which we drop. Finally, the frequency 'MS' is assigned to the resulting returns series to tell pandas that the data are monthly.
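Extracting the first zip code's series and converting it to log returns could look like this:

```python
# Pull the first zip code's monthly price series from the GroupBy object.
# The group key 1001 is really zip code 01001; Python drops the leading zero.
prices = groups.get_group(1001).set_index('date')['value']

# Log returns: differenced log prices, dropping the leading NaN,
# with monthly-start ('MS') frequency so pandas knows the data are monthly
returns = np.log(prices).diff().dropna().asfreq('MS')
returns.plot(figsize=(12, 4), title='Zip code 01001 - monthly log returns')
plt.show()
```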

[Figure: monthly log returns for the first zip code]

Here we can see the return stream of the first zip code. We can see that the series has trends, and the variance is not constant, meaning it is not stationary. The trends can be removed by another order of differencing, but the heteroscedasticity will remain, invariably leading to heteroscedastic errors in our model. Although the residuals will be heteroscedastic, as long as the errors are centered around zero across time, with a roughly normal distribution, and not serially correlated, the model will be mostly effective. The model estimates parameters by maximizing the log likelihood of the data, so what the inconsistent volatility will really affect is the estimate of sigma2 (variance) for the model, which will settle on a happy medium between the lower variance of the early years and the higher variance after around 2009. This matters for our forecasting because the model will generate confidence intervals based on a variance estimate that is somewhat biased toward an outdated, lower-variance market regime, leading to narrower confidence bands than might actually be appropriate.

However, despite this issue with forecasted confidence intervals, there is an advantage to using the entire time series to estimate model parameters when comparing models of many zip codes, even with heteroscedasticity present. Zip codes that did not handle the crash well experienced more intense volatility during that period, which leads to higher variance estimates in their models. This means that once all of the models are generated, those with more optimistic outlooks for future variance will be the ones fit to data without large spikes in volatility; in other words, those that showed resilience during the crash. The comparisons of the models therefore still work in our favor to find what we are looking for. One could adjust for the biased sigma2 estimate after the fact, changing just this parameter in the smoothing step to a value more reflective of the variance in recent years before forecasting, but since this is a hypothetical project with plenty to cover already, I will leave this out.

The traditional method for finding the orders for ARMA models is the Box-Jenkins method which uses the ACF and PACF to visually find spikes in autocorrelation. We will be using a more modern approach with the auto.arima function, but it would be interesting to take a look at the ACF and PACF for our first zip code for a visual reference along with the suggested orders. Remember the returns will need an order of differencing, so we apply this before looking at the autocorrelation to be modeled, and drop the leading NaN value.
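Plotting the ACF and PACF of the differenced returns is straightforward with statsmodels:

```python
# One more order of differencing, then inspect the autocorrelation structure
diffed = returns.diff().dropna()

fig, axes = plt.subplots(2, 1, figsize=(12, 6))
plot_acf(diffed, lags=36, ax=axes[0])
plot_pacf(diffed, lags=36, ax=axes[1])
plt.tight_layout()
plt.show()
```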

[Figure: ACF and PACF of the differenced log returns]

We can see there certainly is some serial correlation in the series, but it is difficult to tell exactly what orders would best model it. It looks like the yearly seasonality is evident at 12 lags, and there is some strong autocorrelation up to lag 5. Let’s see what the object returned from the auto.arima call looks like. Note that when making the time series object using the ts function in R, the frequency is given as 12 to indicate monthly data.
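Calling auto.arima through rpy2 and capturing its printed output might look like this; capture.output is used so the same text can be parsed later (one possible approach, not necessarily the exact code in the notebooks):

```python
# Build an R time series (frequency 12 = monthly) from the returns and fit it
r_ts = stats.ts(FloatVector(returns.values), frequency=12)
fit = forecast.auto_arima(r_ts)

# Capture R's printout of the fitted model as plain text
fit_text = '\n'.join(robjects.r['capture.output'](fit))
print(fit_text)
```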

Series: structure(c(-0.00265604405811537, -0.00177462335836864, -0.00266785396119396, -0.00178253166628295, -0.00178571476023492, -0.000894054596880522, -0.000894854645842713, 0, 0.00178890924272324, 0.00178571476023492, 0.00178253166628295, 0.0017793599000786, 0.002663117419484, 0.0017714796483812, 0.0026513493216207, 0.00264433825308963, 0.00263736416608573, 0.00263042676877312, 0.00262352577238545, 0.00261666089117085, 0.0034782643763247, 0.00346620797648711, 0.0025917941074276, 0.00258509407210461, 0.00171969087952739, 0.00171673861905397, 0.000857265376119187, 0.000856531101616653, 0.000855798083895465, 0.0017094021256483, 0.00170648505575954, 0.00170357792478271, 0.00254993763327249, 0.004235499766855, 0.00337553063128126, 0.00336417474563255, 0.00335289501031077, 0.00417537141048108, 0.00332779009267448, 0.00414422474602638, 0.00330305832925681, 0.00329218404347742, 0.00328138112317689, 0.00408664238545242, 0.00407000968829685, 0.00567032706008774, 0.00483482005458313, 0.00401123682645377, 0.00399521106728784, 0.0039793128514809, 0.00396354066245586, 0.00473560474583401, 0.00392927813988919, 0.00469484430420763, 0.00544960476756629, 0.00464756839654612, 0.00616334770766791, 0.0068886609951857, 0.00608366895361456, 0.00604688161489264, 0.00526119214336163, 0.00523365680611043, 0.00520640819057405, 0.00444116199996714, 0.00515654917924557, 0.00513009553102961, 0.00510391191817661, 0.0065241260769433, 0.00719945514285492, 0.00857148105014005, 0.00849863472146239, 0.00842701616188002, 0.00835659459094273, 0.00828734024856992, 0.00958255108099593, 0.00949159668157051, 0.00873367996875452, 0.00932097294306367, 0.0112027530765531, 0.0123739774874423, 0.00902067972593201, 0.00575633185257551, 0.00572338605268641, 0.00632113356336994, 0.00753299230754578, 0.00747667034301891, 0.00680485149838539, 0.0067588582951057, 0.00610502506680355, 0.0072771697738947, 0.00782429510752891, 0.0077635504899245, 0.00711324872451868, 0.0076493459184892, 0.00817284175587396, 0.00925932541279728, 0.0097449897009394, 0.010780246243792, 0.0112234623698484, 0.011650617219976, 0.0120615497338186, 0.0119178008640368, 0.0101795682134629, 0.0100769879503577, 0.0110208410142789, 0.0119326967118454, 0.0128108336717592, 0.0146578312983845, 0.014940516954951, 0.0156942897625889, 0.0149725086303825, 0.01380645591966, 0.012685159527317, 0.0120651115518431, 0.0100964602510096, 0.00682596507039968, 0.0049762599862273, 0.00270392233240102, 0.00134922440076579, 0.000449337235149727, -0.000898876465017295, -0.0013498314760465, -0.00225377602725274, -0.00225886700972744, -0.00181077460070966, -0.000453206443289389, 0, 0.000453206443289389, 0.00181077460070966, 0.00225886700972744, 0.00180342699915137, 0.000900495333249651, -0.0013510472669811, -0.00270758288154482, -0.00452899324870693, -0.00500569873443624, -0.00503088186627743, -0.0045955963233375, -0.00415417167913823, -0.00463607691747825, -0.00605921219926842, -0.00703732672057633, -0.00756147270057639, -0.00809721023262, -0.00623652776946138, -0.00482393603085285, -0.00436152860052985, -0.00389294895540893, -0.00390816325475818, -0.00294117859081666, -0.0024576075284326, -0.00345082915773354, -0.00395844158642866, -0.00347653690108984, -0.00348866538730341, -0.0040020063418531, -0.00351494210744541, -0.00453744161395342, -0.00354341043998474, -0.000507228009860583, 0.00202737018250154, 0.00303336936332954, 0.00252079790656623, -0.00151171608532152, -0.00201918291723047, -0.000505433419908385, 0.00252461633713885, 0.00402617554439999, 0.00300902935161851, 
-0.00100200409185014, -0.00301205046999264, -0.00706360151874996, -0.00813425939068146, -0.0123268124806586, -0.0135277817381336, -0.00683314091078202, 0.00263365967346196, 0.000525900616906938, -0.00263227317032033, -0.00422610243350618, -0.00424403820047914, -0.00266169973486008, 0.00159786984729493, 0.00318810046864648, -0.00106157122495887, -0.0042575902526405, 0, 0.00159872136369721, -0.00320000273067222, -0.00535619920052355, -0.00215053846322988, -0.00269469308842396, -0.00704037568232785, -0.00873367996875452, -0.00660068403135128, -0.00442478598035656, 0, 0.00442478598035656, 0.00495459312468327, 0.00383667228648576, 0.00327690080231591, 0.00163443239871519, -0.00217983737542049, 0.00109051264896465, 0.00705949171296005, 0.0118344576470033, 0.0085197533447694, 0.00264760546505549, -0.00211752328990755, -0.00318471606752091, -0.00533050302693994, -0.00643433855264597, -0.00431267514794698, -0.00162206037186685, 0.00054097918549445, -0.0021656749124972, 0, 0.0032467560988696, 0.00377460680028641, 0.00322407587175277, 0.00267881221002675, 0.00267165534280522, 0.00266453661509658, 0.00477834729198179, 0.00528263246442684, 0.00525487283835879, 0.00366204960651473, 0.00364868791549178, 0.00104004169541305, -0.0015604684570949, -0.00312826115380993, -0.00261438057407126, 0.00104657257067053, 0.00573665197743445, 0.00673752224774127, 0.00515199490942919, 0.0035906681307285, 0.00306748706786131, 0, 0.00102040825180616, 0.00508648095637376, 0.00455581653586101, 0.00654419652421367, 0.00899106955985651, 0.00496032763096999, 0.000989119764124524, 0.00148184764088199, 0.000493461643371162, -0.000493461643371162, 0.00639608576582695, 0.0102464912091715, 0.0106229873912866, 0.0109864969415678, 0.00945633524203515, 0.00656662772389005, 0.00837993730674924, 0.0115235200038608, 0.00866991513344573, 0.00453309933098467, 0.00271002875886417, 0, 0, 0.00450045764105589, 0.00403316701762257), .Tsp = c(1, 22.9166666666667, 12), class = "ts") 

ARIMA(3,1,1)(0,0,2)[12]

Coefficients:

ar1 ar2 ar3 ma1 sma1 sma2

-0.5119 0.0622 -0.3987 0.9257 -0.4116 -0.3586

s.e. 0.0611 0.0649 0.0578 0.0259 0.0666 0.0703

sigma^2 estimated as 2.524e-06: log likelihood=1320.17

AIC=-2626.35 AICc=-2625.91 BIC=-2601.34

We can see this is a bit of a mess, but everything we need is there. The parameter estimates and their names, as stated before, are conveniently attached to the object as vectors which can be easily accessed. The orders, however, are not conveniently stored, and will be extracted by converting this output into a string and indexing it to get the desired information. Below are two functions, one which can extract the parameters and another which can extract the orders from this object.
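As a sketch of what those two helpers might look like (the implementations in the project notebooks may differ; the regex assumes the 'ARIMA(p,d,q)(P,D,Q)[s]' line shown in the printout above):

```python
import re

def get_coefs(fit):
    """Return the coefficient names and values from an auto.arima result."""
    coefs = fit.rx2('coef')  # named numeric vector attached to the R object
    return list(coefs.names), list(coefs)

def get_orders(fit):
    """Parse (p, d, q) and (P, D, Q, s) out of the printed model specification."""
    text = '\n'.join(robjects.r['capture.output'](fit))
    spec = re.search(r'ARIMA\((\d+),(\d+),(\d+)\)(?:\((\d+),(\d+),(\d+)\)\[(\d+)\])?', text)
    order = tuple(int(spec.group(i)) for i in (1, 2, 3))
    if spec.group(4) is not None:
        seasonal_order = tuple(int(spec.group(i)) for i in (4, 5, 6, 7))
    else:
        seasonal_order = (0, 0, 0, 0)
    return order, seasonal_order
```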

Now we have helper functions to extract the information we want from our auto.arima output. There is one snag left to manage: when the auto.arima output includes a constant, we need to provide an exogenous variable to the statsmodels SARIMAX object in the form of a vector of ones to which the constant's coefficient will be assigned. Also, creating the auto.arima response object by hand each time is repetitive, so we can create yet another helper function to give us everything we need for the creation of our statsmodels SARIMAX model in one step, as below:
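One way to bundle those steps into a single call, again as a sketch rather than the exact project code:

```python
def auto_arima_to_sarimax_inputs(returns):
    """Run auto.arima on a returns series and translate the result into the
    pieces a statsmodels SARIMAX needs: orders, a parameter vector, and an
    exogenous constant column when auto.arima fit one."""
    r_ts = stats.ts(FloatVector(returns.values), frequency=12)
    fit = forecast.auto_arima(r_ts)

    names, values = get_coefs(fit)
    order, seasonal_order = get_orders(fit)

    # auto.arima reports a constant as 'intercept', 'mean', or 'drift';
    # statsmodels takes it as a coefficient on an all-ones exogenous column
    const_names = {'intercept', 'mean', 'drift'}
    const = [v for n, v in zip(names, values) if n in const_names]
    arma = [v for n, v in zip(names, values) if n not in const_names]

    exog = np.ones((len(returns), 1)) if const else None
    sigma2 = fit.rx2('sigma2')[0]

    # statsmodels parameter order: [exog coefs, AR, MA, seasonal AR, seasonal MA, sigma2],
    # which matches the order auto.arima reports its ARMA coefficients in
    params = const + arma + [sigma2]
    return order, seasonal_order, params, exog
```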

We now have everything we need to quickly generate an appropriate model to a time series. Let’s try this on our first zip code’s log returns, to see if it worked:
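Smoothing a statsmodels SARIMAX with those values and inspecting the fit (a sketch; the stationarity and invertibility constraints are relaxed because the coefficients come from outside statsmodels):

```python
order, seasonal_order, params, exog = auto_arima_to_sarimax_inputs(returns)

model = sm.tsa.statespace.SARIMAX(returns, exog=exog, order=order,
                                  seasonal_order=seasonal_order,
                                  enforce_stationarity=False,
                                  enforce_invertibility=False)

# Smooth with auto.arima's coefficients instead of re-estimating them
results = model.smooth(params)

print(results.summary())
results.plot_diagnostics(figsize=(12, 8))
plt.show()
```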

[Figures: SARIMAX summary and residual diagnostics for the first zip code]

We can see this model has some issues but that it is fairly effective. The residuals are heteroscedastic, as expected, and there is some mild autocorrelation in the residuals at lag 5, which is not ideal. However, the residuals are centered around zero, and although the leptokurtosis (excess kurtosis, i.e., a sharper peak and heavier tails, caused by the volatility clustering) causes the Jarque-Bera test's null hypothesis of normally distributed residuals to be rejected, they are not too far off, having reasonable skew and kurtosis. Testing this process on a few other zip codes shows that the models generated tend not to have autocorrelated residuals, and have close-to but not quite normally distributed residuals, with heteroscedasticity following the same general pattern of increasing volatility over time and the resulting leptokurtosis. It is widely acknowledged that a perfect SARIMAX fit is uncommon for financial time series, and these models are fairly solid for this domain, especially considering the rapidity with which they are being generated.

Comparing models of all zip codes

We now want to apply this modeling process to all zip codes which have complete data and whose most recent average sale prices are below $500,000. Each model will be used to make a forecast over 5 years, and we will then take the average of the predicted mean and of the lower confidence limit over the projection period for each model for comparisons. These two metrics make a good combination for comparing zip codes: a high mean of expected returns tells us that the model has an optimistic outlook on future returns, and when this is paired with a higher mean of the lower confidence limit over the same period, it means that not only does the zip code have a positive outlook for gains, it also has a relatively tight confidence interval, and thus lower expected variance. Zip codes which experienced less intense volatility clustering during the financial crash will have lower estimates for sigma2 in their models, and thus will have tighter confidence bands in their forecasts. This method should lead us to zip codes which are at the top of the list for 3 of our investment criteria: high expected capital appreciation rate, low expected volatility, and resilience to poor economic conditions during the crash. Once we have our contenders, we can then look at their P2R ratios to find those which also satisfy our need for leasing profitability.

Although we have a novel and efficient way to generate the models, 14,723 zip codes is still a lot, and generating a model for each one will take a few hours. The following code will run the appropriate loop to compile a dataframe of the metrics which we will use to select our top zip code candidates.
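A hedged sketch of that loop, continuing from the helpers above; candidate_zips stands in for the list of zip codes with complete data and latest prices under $500,000, and the metric names are my own:

```python
horizon = 60  # 5 years of monthly steps
records = []

for i, zipcode in enumerate(candidate_zips):
    prices = groups.get_group(zipcode).set_index('date')['value']
    returns = np.log(prices).diff().dropna().asfreq('MS')

    order, seasonal_order, params, exog = auto_arima_to_sarimax_inputs(returns)
    print(f'{i} - {zipcode}')
    print(f'{order}{seasonal_order} model with '
          + ('a constant' if exog is not None else 'no constant'))

    res = sm.tsa.statespace.SARIMAX(returns, exog=exog, order=order,
                                    seasonal_order=seasonal_order,
                                    enforce_stationarity=False,
                                    enforce_invertibility=False).smooth(params)

    fc = res.get_forecast(steps=horizon,
                          exog=np.ones((horizon, 1)) if exog is not None else None)
    records.append({'zipcode': zipcode,
                    'mean_return': fc.predicted_mean.mean(),
                    'mean_lower_ci': fc.conf_int().iloc[:, 0].mean()})

results_df = pd.DataFrame(records)
```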

0 - 1001
(3, 1, 1)(0, 0, 2, 12) model with no constant
1 - 1002
(0, 1, 3)(0, 0, 2, 12) model with no constant
2 - 1005
(2, 1, 1)(0, 0, 2, 12) model with no constant
3 - 1007
(4, 1, 3)(1, 0, 1, 12) model with no constant
4 - 1008
(0, 1, 3)(0, 0, 1, 12) model with no constant
...and so on forever

Excellent. A few short hours later and we have some nice forecast metrics to lead us to superior zip codes for investment. The most informative metric we have taken is the mean of the lower confidence limit throughout the 5 year forecast. There is a lot of information built into this one number, because it can't be high without both a high predicted mean of returns over time and a low expected variance. This means that we can sort our results by this metric in descending order to find our top contenders so far, and then include P2R to finalize a list of our top 5 zip codes for investment.
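The sort itself is a single line:

```python
# Rank zip codes by the average lower confidence limit of their 5-year forecasts
top = results_df.sort_values('mean_lower_ci', ascending=False)
top.head(10)
```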

[Figure: zip codes ranked by the mean lower confidence limit of their forecasts]

Choosing the top 5 zip codes

The results are in, but we need a more descriptive dataframe, and we want the P2R ratios as well. Earlier, we saw that the P2R dataframe has the locations recorded in a ‘Region Name’ column, which is the name of the metro together with the state abbreviation. We can create a similar column from our original dataframe combining the metro and state columns, so we can merge the two dataframes. Let’s limit our results to the top 200, then iterate through the index and create an info dataframe, then merge everything together.
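A sketch of that step; rather than iterating row by row, this version uses a groupby and two merges, and the column names involved (especially the P2R column) are assumptions about the source files:

```python
# Keep the 200 best candidates
top200 = top.head(200).copy()

# Most recent value and location info per zip code, plus a 'Metro, State' key
info = (melted[['RegionName', 'City', 'State', 'Metro', 'value']]
        .groupby('RegionName').last().reset_index())
info['Region Name'] = info['Metro'] + ', ' + info['State']

# Attach location info, then the price-to-rent ratios (column name assumed)
merged = (top200
          .merge(info, left_on='zipcode', right_on='RegionName')
          .merge(p2r[['Region Name', 'Price To Rent Ratio']], on='Region Name', how='left'))
```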

[Figure: merged results with location info and P2R ratios]

Now we have all of the information we need in one place. We can see that there are missing values in the P2R column, so we will not be able to comment on the leasing profitability for these zip codes, and we will focus on those for which we do have P2R info. Let’s filter our results to everything with a P2R below the national average and go from there.
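Filtering on the national average P2R of 11.44 (again, the P2R column name is an assumption):

```python
national_avg_p2r = 11.44

# Keep only contenders with known, below-average price-to-rent ratios
contenders = merged[merged['Price To Rent Ratio'] < national_avg_p2r]
```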

[Figure: contenders with P2R ratios below the national average]

Now our search is coming to a close. What is left to do from here is inspect the models for each of these zip codes individually to find the best opportunities for investors. To save a bit of space here: the top two zip codes on this list had historically low volatility and then saw massive, unexpected growth rates in the most recent month or two, a combination which makes their model forecasts quite optimistic in the short term. However, it would be better to advise the client to invest in zip codes with more than just a couple of recent months of great returns, so we move on down the list. San Antonio has a great P2R ratio, close to Dogen's recommendation of 9.6. Since our target P2R is around 10, let's consider what this means: it takes 10 years of rent payments to pay off the cost of ownership of a property. This means that if an investor holds the property for 10 years and leases it out during that time, they will have paid off their initial investment and can still sell the property at a potentially higher value, giving them a great return on their investment. Let's look at the model for San Antonio and make forecasts over a period of 10 years.
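The same modeling steps apply as before, now for zip code 78210 (the San Antonio zip code in the final recommendations); this is a sketch of the first of the three code blocks:

```python
# Zip code 78210 (San Antonio, TX): generate and smooth its model as before
prices = groups.get_group(78210).set_index('date')['value']
returns = np.log(prices).diff().dropna().asfreq('MS')

order, seasonal_order, params, exog = auto_arima_to_sarimax_inputs(returns)
res = sm.tsa.statespace.SARIMAX(returns, exog=exog, order=order,
                                seasonal_order=seasonal_order,
                                enforce_stationarity=False,
                                enforce_invertibility=False).smooth(params)

print(res.summary())
res.plot_diagnostics(figsize=(12, 8))
plt.show()
```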

[Figures: SARIMAX summary and residual diagnostics for zip code 78210 (San Antonio, TX)]

We can see this model fits fairly well, apart from the heteroscedasticity we expect from any zip code in this dataset. The residuals are close to normally distributed and are not autocorrelated. Let's look at an in-sample and an out-of-sample 10 year forecast.

Here’s a 10 year forecast in-sample, taking us back to to the middle of the crash, when things were most uncertain.

[Figure: 10-year in-sample forecast beginning in April 2008]

We can see that even though our sigma2 may be a bit underestimated, the confidence bands of our model contained everything following April 2008 apart from the recent spike in growth. Let’s see what the model predicts about the next 10 years.
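And the out-of-sample version, projecting 120 months past the end of the data:

```python
# 10-year (120-month) out-of-sample forecast
steps = 120
fc = res.get_forecast(steps=steps,
                      exog=np.ones((steps, 1)) if exog is not None else None)
ci = fc.conf_int()

ax = returns.plot(figsize=(12, 5), label='observed log returns')
fc.predicted_mean.plot(ax=ax, label='10-year forecast')
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], alpha=0.2)
ax.legend()
plt.show()
```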

[Figure: 10-year out-of-sample forecast for zip code 78210]

This looks pretty good. The predicted mean is around 2% a month and the confidence bands are mostly above the zero line. The area in the shaded region which falls below the zero line is much smaller than the area above it, giving an investor confidence that they are unlikely to lose money in the next 10 years.

Picking the top 5 zip codes at this point was done by hand, examining individual models for each. The three code blocks above can be adjusted to generate a model and its predictions for any zip code just by changing the zip code in the initial get_group call. In picking the top 5, selecting two zip codes in the same city was avoided, since diversification is generally a good thing in capital investment; in the event that something adversely affects the market in one location, the portfolio is not over-exposed. However, two zip codes from the Dallas/Fort Worth metro were picked, although in different districts, one suburban and one urban. The best 5 zip codes were:

[Figure: the five recommended zip codes]

All of these zip codes have good forecasts of mean returns, low estimated volatility, and P2R ratios below the national average. Because these zip codes also have the highest average lower confidence limits in their forecasts, they represent the areas least likely to lose value into the future, according to the SARIMAX models. The properties in these zip codes can have the cost of ownership paid off with around 10 years of rent, so that, assuming a home does nothing other than hold its value, the investor can have 100% profit over that time interval, and any capital gains on the properties would be an addition to that. Considering these zip codes have the lowest probability of losing value, they are quite attractive choices for a real estate investor. Let's look at their price curves together.

[Figure: price histories of the five recommended zip codes]

This gives us a good picture of the resilience of these zip codes to the market crash and the recent growth that we were looking for. Let’s look at the models from the rest of the top 5 zip codes.

Bell Buckle, TN:

[Figures: model diagnostics and 10-year forecasts for Bell Buckle, TN (37020)]

Dallas, TX:

[Figures: model diagnostics and 10-year forecasts for Dallas, TX (75228)]

Markle, IN:

[Figures: model diagnostics and 10-year forecasts for Markle, IN (46770)]

Flower Mound, TX:

[Figures: model diagnostics and 10-year forecasts for Flower Mound, TX (75028)]

Conclusion

In this project, an imaginary real estate investment firm was to be advised on the top five zip codes to invest in. To answer this question, domain knowledge was combined with comparisons of SARIMAX models generated for the return series of each individual zip code. To produce such a large quantity of models in a reasonable time frame, an efficient method for generating well-fitting models for many time series in Python was demonstrated. By using the strength of the auto.arima function in R's forecast package within a Python environment, it was possible to keep a Python workflow and statsmodels functionality while generating appropriate models quickly. The zip codes were evaluated by their outlooks on capital appreciation, looking for strong growth with low expected variance. The P2R ratios of the zip codes with the best outlooks were then used to filter the results, and zip codes with both strong outlooks and above-average leasing profitability were hand selected for recommendation. Since the target P2R was around 10, 10 year forecasts were generated to see what the model expectations were over a holding period for which the owner could expect to pay off the costs of property ownership. Since the zip codes selected (78210, 37020, 75228, 46770, and 75028) were among the least likely of any of the 14,723 in the data set to depreciate over that time period, they make the best prospects for a real estate investor.
